Show HN: A search engine that lets you 'add' words as vectors

148 points by jakek over 11 years ago

36 comments

gjm11 over 11 years ago
Hmm. It's a lovely idea but I find the results uninspiring. (Which isn't surprising; it's a very difficult problem. But I was hoping to be amazed.) Here are the examples I tried (all of them, no cherrypicking):

daughter + male - female -> { The Eldest (book), Songwriter, Granddaughter }

(Hopeless; should have had "son" in there)

pc - microsoft + apple -> { Olynssis The Silver Color (Japanese book), Burger Time (arcade game), Phantasy Star (series of games) }

(Hopeless; should have had "Mac" in there)

violin - string + woodwind -> { clarinet, oboe, flute }

(OK)

mccartney - beatles + stones -> { Rolling Stone (magazine), carvedilol (pharmaceutical), stone (geological term) }

(Poor; should have had Jagger or Richards or something of the kind in the top few results)

sofa - large + small -> { relaxing, asleep, cupboard }

(Poor; I'd have hoped for "armchair" or something of the kind)
nemo1618 over 11 years ago
Lisp + JVM = Clojure

I'm sold. This is really cool! (Though it's worth noting that a Google search with the same terms returns the exact same result...)
leephillips over 11 years ago
I was really excited by this writeup, so I tried it. Four test queries returned nothing that seemed useful or even relevant:

fluid dynamics + electromagnetism : expected magnetohydrodynamics, got Maxwell’s equations and classical mechanics (not useful);

verse + 5 - rhyme : expected blank iambic pentameter, Shakespeare, etc.: got nonsense;

writer + American + Russian + Great - Nobel Prize : expected Nabokov, got Meirkhaim Gavrielov + 1 nonsense result;

plant + illegal - addictive : expected cannabis, chronic, etc; got “Plants” (thanks) and “Nuclear Weapon” (?!?) and some Hungarian village.

EDIT: I thought maybe I wasn't being sufficiently imaginative, so I tried "Nixon + Clinton - JFK" and got nothing that looked interesting. Then I noticed that the "Nixon" part of my query was "disambiguated" to something like "non_film", and the word "Nixon" was just stripped out. I think this thing is just broken.
doctoboggan over 11 years ago
Hey juxtaposicion, fascinating work. I have many questions so I am just going to shoot them rapid fire.

What is the dimensionality of each word vector and what does a word's position in this space "mean"? What is this dimensionality determined by? Have you tried any dimensionality reduction algorithms like PCA or Isomap? It would be interesting to find the word vectors that contain the most variation across all of Wikipedia. Have you tried any other nearest neighbor search methods other than a simple dot product, such as locality sensitive hashing?

I guess most of those questions are about the word2vec algorithm, but you are probably in a good place to answer them after working with it. Anyways, cool work, and I am glad you did it in Python so I can really dig in and understand it.
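
For readers wondering what the PCA question would look like in practice, here is a minimal sketch using an SVD of the centered vectors. The random matrix is only a stand-in for real word vectors, and `pca_project` is a hypothetical helper name; nothing here is taken from the site's actual code.

```python
import numpy as np


def pca_project(vectors, n_components=2):
    """Project row vectors onto their top principal components."""
    # PCA assumes zero-mean data, so subtract the per-dimension mean.
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    # Variance explained by each direction.
    explained_variance = singular_values ** 2 / (len(vectors) - 1)
    projected = centered @ vt[:n_components].T
    return projected, explained_variance[:n_components]


rng = np.random.default_rng(0)
# Hypothetical stand-in for word vectors: 1000 "words", 50 dimensions.
fake_vectors = rng.normal(size=(1000, 50))
projected, variance = pca_project(fake_vectors, n_components=2)
print(projected.shape)  # (1000, 2)
```

The directions in `vt` with the largest explained variance are the ones along which the vectors vary most, which is roughly what the "most variation" question is asking about.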
juxtaposicion over 11 years ago
Harvard - Boston + Silicon http://www.thisplusthat.me/search/Harvard%20-%20Boston%20%2B%20Silicon
SandB0x over 11 years ago
I saw what you wrote about your dot product speed issue. Did you try using NumPy's einsum function? http://docs.scipy.org/doc/numpy/reference/generated/numpy.einsum.html

It's really fast for this kind of stuff. Happy to give details about how to use it if you need.
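
As a rough illustration of the suggestion, the sketch below scores one query vector against every row of a vocabulary matrix with `np.einsum` and checks the result against a plain matrix-vector product. The matrix sizes and variable names are assumptions made for the example; whether `einsum` is actually faster than `@` depends on the NumPy build and the array shapes, so it is worth timing both.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 10,000 word vectors of dimension 100, plus one query.
vocab = rng.normal(size=(10000, 100))
query = rng.normal(size=100)

# "ij,j->i": multiply along the shared axis j and sum it out,
# leaving one dot-product score per vocabulary row i.
scores_einsum = np.einsum("ij,j->i", vocab, query)

# The same computation as an ordinary matrix-vector product.
scores_matmul = vocab @ query

# Indices of the five highest-scoring rows, best first.
top5 = np.argsort(scores_einsum)[::-1][:5]
print(np.allclose(scores_einsum, scores_matmul))  # True
```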
emehrkay over 11 years ago
Interesting. I've played around with words as vectors (with values) and the cosine similarity algorithm (http://en.wikipedia.org/wiki/Cosine_similarity). This is very cool stuff. I wonder how they're doing it in real time; it is heavy number crunching.
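
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal pure-Python sketch (the function name and the zero-vector handling are choices made for this example):

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle to a zero vector is undefined.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # close to 1 (same direction)
```

If all vectors are normalized to unit length ahead of time, the similarity reduces to a plain dot product, which is one way the per-query cost can be kept down.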
donretag over 11 years ago
Interesting concept, but how will it work with more dynamic content? You can train the model on a fairly static corpus such as Wikipedia, but what if your content changes with a greater frequency?

Since MapReduce is used, perhaps the model is already being trained on small batches, making incremental updates possible.
logn over 11 years ago
daft punk - repetitive + lyrics == La Roux

nice work!
axblount over 11 years ago
I guess we all just need a little more LeAnn Rimes. http://www.thisplusthat.me/search/the%20world%20-%20violence%20%2B%20love
est over 11 years ago
Sounds like this paper from Google

http://www.technologyreview.com/view/519581

For example, the operation ‘king’ – ‘man’ + ‘woman’ results in a vector that is similar to ‘queen’.
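
The arithmetic in that example can be sketched with hand-made toy vectors. The three axes (royalty, gender, fruitiness), the vectors themselves, and the `analogy` helper are all invented for illustration; a trained model such as word2vec learns hundreds of dimensions, and its results are much noisier, as other comments in this thread show.

```python
import numpy as np

# Hand-made toy vectors, not taken from any trained model.
# Axes: (royalty, gender, fruitiness).
toy_vectors = {
    "king": np.array([1.0, 1.0, 0.0]),
    "queen": np.array([1.0, -1.0, 0.0]),
    "man": np.array([0.0, 1.0, 0.0]),
    "woman": np.array([0.0, -1.0, 0.0]),
    "apple": np.array([0.0, 0.0, 1.0]),
}


def analogy(positive, negative, vectors):
    """Return the word closest (by cosine) to sum(positive) - sum(negative)."""
    target = sum(vectors[w] for w in positive) - sum(vectors[w] for w in negative)
    target_norm = np.linalg.norm(target)
    if target_norm == 0:
        # e.g. "sleep - sleep": a zero vector has no direction to match.
        return None
    excluded = set(positive) | set(negative)
    best_word, best_score = None, -np.inf
    for word, vec in vectors.items():
        if word in excluded:
            continue  # skip the query words themselves
        score = np.dot(target, vec) / (target_norm * np.linalg.norm(vec))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


print(analogy(["king", "woman"], ["man"], toy_vectors))  # queen
```

Excluding the query words from the candidates matters: without that step, the nearest vector to the result is often one of the inputs.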
jeorgun over 11 years ago
Is it just me, or do almost half of my searches return `Dvbc' for no apparent reason?

http://www.thisplusthat.me/search/Saturn%20-%20Rings%20%2B%20Spot

http://www.thisplusthat.me/search/Chrome%20%2B%20open%20source

http://www.thisplusthat.me/search/Unix%20%2B%20Open%20Source
toolslive over 11 years ago
Does this relate to Latent Semantic Indexing? http://en.wikipedia.org/wiki/Latent_semantic_indexing
CurtMonash over 11 years ago
Sounds like another go-around at 1990s (& early 2000s) concept search -- Excite, Northern Light, etc.

And it sounds really close to what I was trying at Elucidate.
dhammack over 11 years ago
Hey, nice work! Can you explain the "comma delimited list" functionality any more? It seems (awesomely) similar to a hack I did a while back with Word2Vec which would pick out the word which didn't belong in a list.

My hack: https://github.com/dhammack/Word2VecExample
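
One common way to pick the word that doesn't belong is to compare each word's unit vector with the average of the whole list and return the least similar one. The sketch below uses that approach with made-up two-dimensional vectors; it is an assumption about how such a hack could work and may differ from what the linked repository or the site actually does.

```python
import numpy as np

# Made-up 2-D vectors for illustration; real word vectors are learned.
toy_vectors = {
    "cat": np.array([1.0, 0.1]),
    "dog": np.array([0.9, 0.2]),
    "horse": np.array([1.0, 0.0]),
    "car": np.array([0.0, 1.0]),
}


def odd_one_out(words, vectors):
    """Return the word whose unit vector is least similar to the list's mean."""
    unit = np.array([vectors[w] / np.linalg.norm(vectors[w]) for w in words])
    centroid = unit.mean(axis=0)
    # Dot product of each unit vector with the centroid; lowest is the outlier.
    scores = unit @ centroid
    return words[int(np.argmin(scores))]


print(odd_one_out(["cat", "dog", "horse", "car"], toy_vectors))  # car
```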
Danieru over 11 years ago
Fun bug: handheld - sony + nintendo = {Wii, Wii, Snes}

I was hoping for the DS or Gameboy but expecting at least something handheld.
grishma over 11 years ago
Interesting. Currently it generates garbage for a lot of queries, but some stuff is kinda fun. Forrest Gump - comedy + romance gives Pulp Fiction (!), As Good as It Gets (match) and Polar Express (?). Avatar - action + comedy gives The Office (haha!)
yetanotherphd over 11 years ago
I know people like to keep things positive, but this is completely useless. Apart from a few cherry-picked examples, subtracting words makes no sense most of the time, and there is no clear advantage for their method when it comes to adding words.
jboynyc over 11 years ago
This is neat, and I found a few queries that added interesting results. However, I tried

    Slavoj Žižek - Jacques Lacan - Hegel

which yielded an internal server error, probably due to the diacritics not being encoded properly.
cocoflunchy over 11 years ago
Bug report: using some non-ascii characters crashes the server (for example é or É).
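
The cause of the crash is not known from this thread; the encoding explanation is a commenter's guess. For what it's worth, here is a small sketch of how a non-ASCII query can be percent-encoded as UTF-8 for a URL path and decoded again, using only Python's standard library. The query string is the one reported above.

```python
from urllib.parse import quote, unquote

query = "Slavoj Žižek - Jacques Lacan - Hegel"

# quote() encodes the string as UTF-8 by default and percent-escapes
# every byte that is not an unreserved URL character.
encoded = quote(query, safe="")
print(encoded)

# unquote() reverses the process, decoding the bytes as UTF-8.
decoded = unquote(encoded)
print(decoded == query)  # True
```

A server that decodes the path with a different or ASCII-only codec would fail on exactly this kind of input, which would be consistent with the reports, but that remains speculation.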
zhemao over 11 years ago
Albert Einstein - Smart = Niels Bohr, Werner Heisenberg, Wolfgang Pauli

Ouch, that's cold
breck over 11 years ago
Neat stuff juxtaposicion.

Seems like this is how Numenta's AI works: http://www.youtube.com/watch?v=iNMbsvK8Q8Y
akennberg over 11 years ago
Stanford - American + Canadian = University of Toronto

I think it should be Waterloo.
whistlerbrk over 11 years ago
Works for me:

http://www.thisplusthat.me/search/Dick%20Cheney%20-%20evil%20%2B%20good
somberi over 11 years ago
Fantastic work and is relevant to something we are working on in this space. Thanks.

On a lighter note I tried "sarah palin + sexy" and I got John McCain, Hillary Clinton and Mitt Romney.
pit over 11 years ago
Also interesting to try something like:

sleep - sleep

http://www.thisplusthat.me/search/sleep%20-%20sleep
bocanaut over 11 years ago
http://www.thisplusthat.me/search/Germany%20-%20Fun Germany - Fun = USA

:)
corobo over 11 years ago
Hey this is pretty cool!

superman - male + female:

    - Lex Luthor (hmm..)
    - Superman's pal Jimmy Olsen (haha, what?)
    - Wonder Woman (That'll do it!)
ppymou over 11 years ago
Great writeup. Curious, are there clear advantages that the vector representation has over graph models (FB graph search, Google Knowledge graph)?
SergeyHack over 11 years ago
The default example "justin bieber - man + women" was ok, but I have found a better one - "justin bieber - women + man"
Lucy_karpova over 11 years ago
What are the use cases for this fancy feature? I'm thinking of e-advisor for fun, but what are the real life serious use cases?
iLoch over 11 years ago
ThisPlusThat.me - fast + slow...

Just kidding! :)

You could also say...

ThisPlusThat.me - another rant + something cool

Thanks for posting this, very interesting work!
iamchmod over 11 years ago
I thought this one was good "Stanford - Red + Smart" = Berkeley
elwell over 11 years ago
Server apparently wasn't ready for HN frontpage load
epaga over 11 years ago
Pretty impressive for my first try.

iPad - cool -> Windows Phone
dlsym over 11 years ago
reddit - dumb

Expected: HN, Got: Digg