Making Sense of Everything with words2map

60 points by lmcinnes almost 9 years ago

8 comments

minimaxir almost 9 years ago
I'm not fond of the "magic AI does everything" narrative, especially since the code is available on GitHub (https://github.com/overlap-ai/words2map) and it's not magic. That being said, the code is optimized for efficient memory usage (important with the pre-built word2vec models), and since it is MIT-licensed, I might be able to develop a few pretty visualizations. :)
mdda almost 9 years ago
"We are now at a point in history when algorithms can learn, like people, about pretty much anything." seems pretty disingenuously worded.

A quick read suggests something like "Algorithms are now like people, and can learn about anything." Careful parsing of the commas shows that the sentence is true, but only in the precise sense that "People can learn about anything. Now, algorithms can also learn about anything." The extent of learning/understanding is not being compared.

Perhaps I'm nit-picking, but this statement appears to have been constructed to support an AI pitch. It is literally true, but no 'actual AI' is involved (and no one is actually claiming it is... unless you /want to believe/).
ilyaeck almost 9 years ago
Question to Y-hat folks: why cluster in 2D? Granted, clustering in 300D is hard :) Still, the 2D projection must add significant metric distortion. Why not a middle ground, say, 5-10D?
vinchuco almost 9 years ago
Nitpicking: NOT (human + robot) ≈ cyborg BUT average(human + robot) ≈ cyborg

Some things that come to mind:

I'd be interested to see other vector operations in the examples, such as the projection of one word onto another. Also, the examples only cover nouns so far.

How is ≈ defined, if the closest word vector is not necessarily unique?

Finally, what proportion of words keep a human-interpretable meaning when averaged, compared with those that turn into nonsense? What are the most "meaningful" words, in that sense?
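
To make the sum-versus-average point and the tie question concrete, here is a minimal sketch. It assumes ≈ means "highest cosine similarity", which is an assumption on my part; the thread does not say which metric words2map uses. Under cosine similarity the sum and the average point in the same direction, so they rank neighbours identically, and the distinction only matters for a scale-sensitive metric such as Euclidean distance. The toy vectors and the helpers `cosine`, `euclidean`, and `nearest` are hypothetical, not taken from the project.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical 3-D toy embeddings, not from any real word2vec model.
vectors = {
    "human":   [1.0, 0.0, 1.0],
    "robot":   [0.0, 1.0, 1.0],
    "cyborg":  [0.5, 0.5, 1.0],
    "android": [0.5, 0.5, 1.0],  # deliberate duplicate to force a tie
}

summed = [a + b for a, b in zip(vectors["human"], vectors["robot"])]
averaged = [x / 2 for x in summed]

def nearest(query, vocab, exclude=()):
    """Return every word tied for the highest cosine similarity, sorted."""
    scores = {w: cosine(query, v) for w, v in vocab.items() if w not in exclude}
    best = max(scores.values())
    return sorted(w for w, s in scores.items() if math.isclose(s, best, rel_tol=1e-9))

# Cosine ignores vector length, so sum and average score the same.
print(cosine(summed, vectors["cyborg"]), cosine(averaged, vectors["cyborg"]))
# Euclidean distance does not: only the average lands on "cyborg".
print(euclidean(summed, vectors["cyborg"]), euclidean(averaged, vectors["cyborg"]))
# Ties are reported explicitly instead of being broken arbitrarily.
print(nearest(summed, vectors, exclude=("human", "robot")))
```

Returning the full tie set is one possible answer to the uniqueness question; an implementation that returns a single word has to pick some tie-breaking rule, and that rule then becomes part of what ≈ means.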
vonnik almost 9 years ago
How is this different from t-SNE?

https://lvdmaaten.github.io/tsne/

Anyone looking for an explanation of word2vec may find this helpful:

http://deeplearning4j.org/word2vec
ganeshkrishnan almost 9 years ago
Hi, I was in the middle of creating "user personalities" using k-means clustering.

Is it OK to reference your document in our papers? The MIT license is awesome and lets us reuse your tech. Our site is at www.shoten.xyz if you are interested in what we are doing.
sixhobbits almost 9 years ago
human + robot ≈ cyborg

electricity + silicon ≈ solar cells

virtual reality + reality ≈ augmented reality

--

These always seem impressive in word vector models, but in reality, I imagine that "robot" and "cyborg" were already pretty close. The fact that adding "human" nudged the vector closer is likely not as meaningful as it would be nice to believe. The same goes for "electricity/solar cells" and "virtual reality/augmented reality".

Still a really nice application for word2vec, and I'm looking forward to seeing other similarly practical implementations in future.
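
One way to test this suspicion is to compare the similarity of "robot" alone to "cyborg" against the similarity of "human + robot" to "cyborg". The sketch below does that with made-up vectors, so the numbers say nothing about any real model; `toy`, `cosine`, and `analogy_gain` are hypothetical names, and cosine similarity is an assumed metric.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings chosen so that "robot" already sits near "cyborg".
toy = {
    "human":  [1.0, 0.1, 0.2],
    "robot":  [0.0, 1.0, 1.0],
    "cyborg": [0.5, 1.0, 1.0],
}

def analogy_gain(a, b, target, vectors):
    """Return (baseline, combined, gain) cosine similarities to `target`.

    baseline: similarity of `b` alone to `target`
    combined: similarity of `a + b` to `target`
    gain:     combined - baseline, i.e. what adding `a` actually bought
    """
    baseline = cosine(vectors[b], vectors[target])
    combined_vec = [x + y for x, y in zip(vectors[a], vectors[b])]
    combined = cosine(combined_vec, vectors[target])
    return baseline, combined, combined - baseline

baseline, combined, gain = analogy_gain("human", "robot", "cyborg", toy)
print(f"robot alone: {baseline:.3f}, human + robot: {combined:.3f}, gain: {gain:.3f}")
```

Running the same comparison against a real pre-trained model would show whether the gain is large or whether the baseline was doing most of the work.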
visarga almost 9 years ago
I think you can also get pretty good suggestions with plain old bag-of-words, tf-idf and k-means.
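
For comparison, a rough standard-library sketch of that baseline: bag-of-words counts, a smoothed tf-idf weighting, and Lloyd's k-means with fixed starting points. This is not the words2map pipeline; the four documents, the idf smoothing formula, and the function names `tfidf_vectors` and `kmeans` are all assumptions made for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised tf-idf vectors) for tokenised docs."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed idf, so a token present in every document keeps a small weight.
    idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = [counts[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors

def kmeans(points, init_indices, iterations=10):
    """Lloyd's algorithm with caller-chosen initial centroids (deterministic)."""
    centroids = [list(points[i]) for i in init_indices]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its closest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical mini-corpus with two obvious topics.
docs = [
    "solar panel electricity grid".split(),
    "electricity grid power solar".split(),
    "robot arm factory automation".split(),
    "factory robot automation sensor".split(),
]
vocab, vecs = tfidf_vectors(docs)
labels = kmeans(vecs, init_indices=[0, 2])
print(labels)
```

The limitation of this baseline is that documents only end up close when they share literal tokens; it has no notion that "robot" and "cyborg" are related unless they co-occur in the text.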