Show HN: A search engine that lets you 'add' words as vectors

148 points by jakek, over 11 years ago

36 comments

gjm11, over 11 years ago
Hmm. It's a lovely idea but I find the results uninspiring. (Which isn't surprising; it's a very difficult problem. But I was hoping to be amazed.) Here are the examples I tried (all of them, no cherry-picking):

daughter + male - female -> { The Eldest (book), Songwriter, Granddaughter }

(Hopeless; should have had "son" in there)

pc - microsoft + apple -> { Olynssis The Silver Color (Japanese book), Burger Time (arcade game), Phantasy Star (series of games) }

(Hopeless; should have had "Mac" in there)

violin - string + woodwind -> { clarinet, oboe, flute }

(OK)

mccartney - beatles + stones -> { Rolling Stone (magazine), carvedilol (pharmaceutical), stone (geological term) }

(Poor; should have had Jagger or Richards or something of the kind in the top few results)

sofa - large + small -> { relaxing, asleep, cupboard }

(Poor; I'd have hoped for "armchair" or something of the kind)
nemo1618, over 11 years ago
Lisp + JVM = Clojure

I'm sold. This is really cool! (Though it's worth noting that a Google search with the same terms returns the exact same result...)
leephillips, over 11 years ago
I was really excited by this writeup, so I tried it. Four test queries returned nothing that seemed useful or even relevant:

fluid dynamics + electromagnetism : expected magnetohydrodynamics, got Maxwell’s equations and classical mechanics (not useful);

verse + 5 - rhyme : expected blank iambic pentameter, Shakespeare, etc.: got nonsense;

writer + American + Russian + Great - Nobel Prize : expected Nabokov, got Meirkhaim Gavrielov + 1 nonsense result;

plant + illegal - addictive : expected cannabis, chronic, etc; got “Plants” (thanks) and “Nuclear Weapon” (?!?) and some Hungarian village.

EDIT: I thought maybe I wasn't being sufficiently imaginative, so I tried "Nixon + Clinton - JFK" and got nothing that looked interesting. Then I noticed that the "Nixon" part of my query was "disambiguated" to something like "non_film", and the word "Nixon" was just stripped out. I think this thing is just broken.
doctoboggan, over 11 years ago
Hey juxtaposicion, fascinating work. I have many questions so I am just going to shoot them rapid fire.

What is the dimensionality of each word vector and what does a word's position in this space "mean"? What is this dimensionality determined by? Have you tried any dimensionality reduction algorithms like PCA or Isomap? It would be interesting to find the word vectors that contain the most variation across all of Wikipedia. Have you tried any other nearest neighbor search methods other than a simple dot product, such as locality sensitive hashing?

I guess most of those questions are about the word2vec algorithm, but you are probably in a good place to answer them after working with it. Anyways, cool work, and I am glad you did it in Python so I can really dig in and understand it.
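For readers unfamiliar with the locality sensitive hashing mentioned above, here is a minimal sketch of the random-hyperplane variant, which approximates cosine similarity with short bit signatures. It is not the site's implementation: the function names, the 16-bit signature length, and the three-dimensional toy vector are all assumptions made for illustration.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplane normals, each of length dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its positive side."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vector)) > 0 else 0
        for plane in hyperplanes
    )


def hamming(sig_a, sig_b):
    """Number of differing bits; fewer differences suggests a smaller angle."""
    return sum(a != b for a, b in zip(sig_a, sig_b))


planes = make_hyperplanes(num_bits=16, dim=3)
v = [0.2, -1.0, 0.5]            # hypothetical word vector
scaled = [2 * x for x in v]     # same direction, different length
opposite = [-x for x in v]      # opposite direction

print(hamming(signature(v, planes), signature(scaled, planes)))    # 0
print(hamming(signature(v, planes), signature(opposite, planes)))  # 16
```

Vectors whose signatures land in the same bucket become candidate neighbors, so only those candidates need an exact dot product. The trade-off is that true neighbors can occasionally be missed.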
juxtaposicion, over 11 years ago
Harvard - Boston + Silicon http://www.thisplusthat.me/search/Harvard%20-%20Boston%20%2B%20Silicon
SandB0x, over 11 years ago
I saw what you wrote about your dot product speed issue. Did you try using NumPy's einsum function? http://docs.scipy.org/doc/numpy/reference/generated/numpy.einsum.html

It's really fast for this kind of stuff. Happy to give details about how to use it if you need.
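As a rough illustration of the suggestion above, the sketch below scores a query vector against every row of a matrix with `np.einsum` and checks the result against the ordinary matrix-vector product. The array shapes and random data are assumptions; the site's actual vector store is not described in this thread.

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 50))   # hypothetical: 1000 word vectors, 50 dims
query = rng.normal(size=50)             # hypothetical query vector

# "ij,j->i": multiply row i of `vectors` by `query` element-wise and sum
# over j, giving one dot product per row.
scores_einsum = np.einsum("ij,j->i", vectors, query)
scores_matmul = vectors @ query          # same computation as a matrix-vector product

assert np.allclose(scores_einsum, scores_matmul)
best = int(np.argmax(scores_einsum))     # index of the highest-scoring row
```

Whether `einsum` beats `@` or `np.dot` depends on the array shapes and the BLAS build in use, so it is worth timing both on real data before switching.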
emehrkay, over 11 years ago
Interesting. I've played around with words as vectors (with values) and the cosine similarity algorithm (http://en.wikipedia.org/wiki/Cosine_similarity). This is very cool stuff. I wonder how they're doing it in real time; it is heavy number crunching.
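For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal pure-Python sketch follows; the function name and the choice to raise on zero vectors are assumptions, not anything taken from the site.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))   # 1.0  (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))   # 0.0  (orthogonal)
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # -1.0 (opposite)
```

If every stored vector is normalized to unit length ahead of time, the similarity reduces to a plain dot product, which is one common way to keep query-time cost down.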
donretag, over 11 years ago
Interesting concept, but how will it work with more dynamic content? You can train the model on a fairly static corpus such as Wikipedia, but what if your content changes with a greater frequency?

Since MapReduce is used, perhaps the model is already being trained on small batches, making incremental updates possible.
logn, over 11 years ago
daft punk - repetitive + lyrics == La Roux

nice work!
axblount, over 11 years ago
I guess we all just need a little more LeAnn Rimes. http://www.thisplusthat.me/search/the%20world%20-%20violence%20%2B%20love
est, over 11 years ago
Sounds like this paper from Google

http://www.technologyreview.com/view/519581

For example, the operation ‘king’ – ‘man’ + ‘woman’ results in a vector that is similar to ‘queen’.
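The king/queen arithmetic quoted above can be reproduced on a toy scale. The sketch below uses hand-made two-dimensional vectors (roughly "royalty" and "maleness"), not real word2vec output, and the `analogy` helper is a hypothetical name; it only shows the add, subtract, then nearest-neighbor pattern.

```python
import math

# Hand-made toy vectors: (royalty, maleness). NOT real word2vec output.
VECTORS = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [0.2, 0.1],
}


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def analogy(positive, negative, vectors=VECTORS):
    """Add the positive word vectors, subtract the negative ones, and return
    the closest remaining word by cosine similarity."""
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for word in positive:
        target = [t + x for t, x in zip(target, vectors[word])]
    for word in negative:
        target = [t - x for t, x in zip(target, vectors[word])]
    used = set(positive) | set(negative)   # query words are excluded from results
    candidates = (w for w in vectors if w not in used)
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy(positive=["king", "woman"], negative=["man"]))  # queen
```

Excluding the query words from the candidates matters: without that step the nearest vector to the result is often one of the inputs.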
jeorgun, over 11 years ago
Is it just me, or do almost half of my searches return `Dvbc' for no apparent reason?

http://www.thisplusthat.me/search/Saturn%20-%20Rings%20%2B%20Spot

http://www.thisplusthat.me/search/Chrome%20%2B%20open%20source

http://www.thisplusthat.me/search/Unix%20%2B%20Open%20Source
toolslive, over 11 years ago
Does this relate to Latent Semantic Indexing? http://en.wikipedia.org/wiki/Latent_semantic_indexing
CurtMonash, over 11 years ago
Sounds like another go-around at 1990s (& early 2000s) concept search -- Excite, Northern Light, etc.

And it sounds really close to what I was trying at Elucidate.
dhammack, over 11 years ago
Hey, nice work! Can you explain the "comma delimited list" functionality any more? It seems (awesomely) similar to a hack I did a while back with Word2Vec which would pick out the word which didn't belong in a list.

My hack: https://github.com/dhammack/Word2VecExample
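One common way to pick the word that does not belong is to average the unit-length vectors of the list and return the word least similar to that average. The sketch below assumes that approach and uses hand-made toy vectors; it is not taken from the linked repository or from the site, and `odd_one_out` is a hypothetical name.

```python
import math

# Hand-made toy vectors, NOT real word2vec output.
VECTORS = {
    "apple": [1.0, 0.1],
    "banana": [0.9, 0.2],
    "cherry": [1.0, 0.0],
    "car": [0.0, 1.0],
}


def unit(v):
    """Scale a non-zero vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def odd_one_out(words, vectors=VECTORS):
    """Return the word whose direction is farthest from the group's mean direction."""
    units = {w: unit(vectors[w]) for w in words}
    dim = len(next(iter(units.values())))
    mean = [sum(u[i] for u in units.values()) / len(units) for i in range(dim)]
    # Dot product with the mean direction; the lowest score fits the group least.
    return min(words, key=lambda w: sum(a * b for a, b in zip(units[w], mean)))


print(odd_one_out(["apple", "banana", "cherry", "car"]))  # car
```

Normalizing before averaging keeps a single long vector from dominating the mean.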
Danieru, over 11 years ago
Fun bug: handheld - sony + nintendo = {Wii, Wii, Snes}

I was hoping for the DS or Gameboy but expecting at least something handheld.
grishma, over 11 years ago
Interesting. Currently it generates garbage for a lot of queries, but some stuff is kinda fun. Forrest Gump - comedy + romance gives Pulp Fiction (!), As Good as It Gets (match) and Polar Express (?). Avatar - action + comedy gives The Office (haha!)
yetanotherphd, over 11 years ago
I know people like to keep things positive, but this is completely useless. Apart from a few cherry-picked examples, subtracting words makes no sense most of the time, and there is no clear advantage for their method when it comes to adding words.
jboynyc, over 11 years ago
This is neat, and I found a few queries that added interesting results. However, I tried

    Slavoj Žižek - Jacques Lacan - Hegel

which yielded an internal server error, probably due to the diacritics not being encoded properly.
cocoflunchy, over 11 years ago
Bug report: using some non-ASCII characters crashes the server (for example é or É).
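The actual cause of the crash is not stated in this thread, so the following is only a sketch of how a query with diacritics is normally percent-encoded as UTF-8 for a URL path and decoded again, using Python's standard `urllib.parse`. The first query is the one reported above; the second matches a search URL posted earlier in the thread.

```python
from urllib.parse import quote, unquote

query = "Slavoj Žižek - Jacques Lacan - Hegel"

# quote() encodes the string as UTF-8 by default and percent-escapes every
# byte that is not an unreserved URL character.
encoded = quote(query)
print(encoded)
# Slavoj%20%C5%BDi%C5%BEek%20-%20Jacques%20Lacan%20-%20Hegel

# Decoding with the same encoding recovers the original text.
assert unquote(encoded) == query

# ASCII queries follow the same rules: space becomes %20 and "+" becomes %2B.
print(quote("Harvard - Boston + Silicon"))
# Harvard%20-%20Boston%20%2B%20Silicon
```

A server that decodes the path as ASCII or Latin-1 instead of UTF-8 would either fail or produce mangled text for input like this, which is consistent with the reports but remains a guess.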
zhemao, over 11 years ago
Albert Einstein - Smart = Niels Bohr, Werner Heisenberg, Wolfgang Pauli

Ouch, that's cold
breck, over 11 years ago
Neat stuff juxtaposicion.

Seems like this is how Numenta's AI works: http://www.youtube.com/watch?v=iNMbsvK8Q8Y
akennberg, over 11 years ago
Stanford - American + Canadian = University of Toronto

I think it should be Waterloo.
whistlerbrk, over 11 years ago
Works for me:

http://www.thisplusthat.me/search/Dick%20Cheney%20-%20evil%20%2B%20good
somberi, over 11 years ago
Fantastic work, and relevant to something we are working on in this space. Thanks.

On a lighter note I tried "sarah palin + sexy" and I got John McCain, Hillary Clinton and Mitt Romney.
pit, over 11 years ago
Also interesting to try something like:

sleep - sleep

http://www.thisplusthat.me/search/sleep%20-%20sleep
bocanaut, over 11 years ago
http://www.thisplusthat.me/search/Germany%20-%20Fun

Germany - Fun = USA

:)
corobo, over 11 years ago
Hey this is pretty cool!

superman - male + female:

    - Lex Luthor (hmm..)
    - Superman's pal Jimmy Olsen (haha, what?)
    - Wonder Woman (That'll do it!)
ppymou, over 11 years ago
Great writeup. Curious, are there clear advantages that the vector representation has over graph models (FB graph search, Google Knowledge graph)?
SergeyHack, over 11 years ago
The default example "justin bieber - man + women" was ok, but I have found a better one - "justin bieber - women + man"
Lucy_karpova, over 11 years ago
What are the use cases for this fancy feature? I'm thinking of an e-advisor for fun, but what are the serious real-life use cases?
iLoch, over 11 years ago
ThisPlusThat.me - fast + slow...

Just kidding! :)

You could also say...

ThisPlusThat.me - another rant + something cool

Thanks for posting this, very interesting work!
iamchmod, over 11 years ago
I thought this one was good: "Stanford - Red + Smart" = Berkeley
elwell, over 11 years ago
Server apparently wasn't ready for HN frontpage load
epaga, over 11 years ago
Pretty impressive for my first try.

iPad - cool -> Windows Phone
dlsym, over 11 years ago
reddit - dumb

Expected: HN, Got: Digg