I'm not fond of the "magic AI does everything" narrative, especially since the code is available on GitHub (https://github.com/overlap-ai/words2map) and it's not magic. That said, the code is optimized for efficient memory usage (important with the pre-built word2vec models), and since it's MIT-licensed, I might be able to develop a few pretty visualizations. :)
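For example, one common memory-saving trick when loading pre-built word2vec binaries is gensim's `limit` parameter, which caps the vocabulary at the most frequent entries. A minimal sketch; the file name and cutoff are illustrative, not taken from the repo:

```python
# Sketch: loading a pre-built word2vec model with bounded memory via gensim.
from gensim.models import KeyedVectors

# limit=500_000 keeps only the 500k most frequent vectors,
# trading vocabulary coverage for a much smaller RAM footprint.
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin",  # illustrative path
    binary=True,
    limit=500_000,
)
print(vectors.most_similar("cyborg", topn=5))
```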
"We are now at a point in history when algorithms can learn, like people, about pretty much anything. " seems pretty disingenuously worded.<p>One infers from a quick read ~"Algorithms are now like people, and can learn about anything." But careful parsing of the commas shows that the sentence is true, but in the precise sense that "People can learn about anything. Now, algorithms can also learn about anything." - and the extent of learning/understanding is not being compared.<p>Perhaps I'm nit-picking, but this statement appears to have been constructed to support an AI pitch, and is literally true, but no 'actual AI' is involved (and no-one is actually claiming it is... unless you /want to believe/).
Question to the Y-hat folks: why cluster in 2D? Granted, clustering in 300D is hard :) Still, the 2D projection must add significant metric distortion. Why not a middle ground, say 5-10D?
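For illustration, here is a rough sketch of that middle ground with sklearn: reduce to ~10D, cluster there, and project to 2D only for display. Random data stands in for the 300D word vectors, and this pipeline is my assumption, not the words2map code:

```python
# Sketch: cluster in a mid-dimensional space, project to 2D only for plotting.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

X = np.random.randn(1000, 300)  # placeholder for 300D word vectors

# Cluster in 10D: far less metric distortion than clustering in 2D.
X_mid = PCA(n_components=10).fit_transform(X)
labels = KMeans(n_clusters=8, n_init=10).fit_predict(X_mid)

# The 2D projection is then purely cosmetic, used only for the visualization.
X_2d = TSNE(n_components=2).fit_transform(X_mid)
```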
Nitpicking:
It's NOT (human + robot) ≈ cyborg BUT average(human, robot) ≈ cyborg.

Some things that come to mind:

I'd be interested to see other vector operations in the examples, such as the projection of one word onto another. Also, so far it's only nouns.

How is ≈ defined, given that the nearest word vector is not necessarily unique?

Finally, what proportion of averaged words retain a human meaning, versus nonsense? What are the most "meaningful" words, in that sense?
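On the ≈ question: in most word2vec tooling, ≈ is simply the top-ranked nearest neighbor by cosine similarity, which indeed need not be unique in case of ties. A minimal sketch of the averaging reading using gensim's KeyedVectors; the model file is an illustrative assumption:

```python
# Sketch: "≈" read as nearest neighbor (by cosine) to the mean of two vectors.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)  # illustrative path

mean = (vectors["human"] + vectors["robot"]) / 2.0

# similar_by_vector ranks the vocabulary by cosine similarity to `mean`,
# so "≈ cyborg" means "cyborg is the top-1 neighbor" — a ranking, not an identity.
print(vectors.similar_by_vector(mean, topn=5))
```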
How is this different from t-SNE?

https://lvdmaaten.github.io/tsne/

Anyone looking for an explanation of word2vec may find this helpful:

http://deeplearning4j.org/word2vec
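For context, t-SNE by itself only handles the projection step, mapping high-dimensional vectors down to 2D; any clustering or labeling has to be layered on top. A minimal sklearn sketch, with random vectors standing in for word embeddings:

```python
# Sketch: t-SNE is just dimensionality reduction, not a full pipeline.
import numpy as np
from sklearn.manifold import TSNE

X = np.random.randn(500, 300)  # placeholder for word vectors
coords = TSNE(n_components=2, perplexity=30).fit_transform(X)
print(coords.shape)  # (500, 2): one 2D point per input vector
```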
Hi, I was in the middle of creating "user personalities" using k-means clustering.

Is it OK to reference your document in our papers?
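For anyone curious about that kind of approach, a minimal k-means sketch with sklearn; the feature matrix is a made-up stand-in for real user data:

```python
# Sketch: grouping users into "personalities" with k-means.
import numpy as np
from sklearn.cluster import KMeans

users = np.random.rand(200, 20)  # 200 users, 20 hypothetical behavioral features
km = KMeans(n_clusters=5, n_init=10).fit(users)

print(km.labels_[:10])            # cluster id ("personality") per user
print(km.cluster_centers_.shape)  # (5, 20): one centroid per personality
```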
The MIT license is awesome and lets us reuse your tech. Our site is at www.shoten.xyz if you're interested in what we're doing.
human + robot ≈ cyborg

electricity + silicon ≈ solar cells

virtual reality + reality ≈ augmented reality

--

These always seem impressive in word vector models, but in reality I imagine "robot" and "cyborg" were already pretty close. The fact that adding "human" nudged the vector closer is likely not as meaningful as it would be nice to believe. The same goes for "electricity/solar cells" and "virtual reality/augmented reality."

Still, it's a really nice application of word2vec, and I'm looking forward to seeing other similarly practical implementations in the future.
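That hunch is easy to check: compare the baseline cosine similarity of "robot" and "cyborg" directly against the neighbors of the averaged vector. A hedged gensim sketch; the model path is assumed, not from the post:

```python
# Sketch: test whether "cyborg" was already close to "robot" before averaging.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)  # illustrative path

# Baseline: how close are the two words without any arithmetic?
print(vectors.similarity("robot", "cyborg"))

# Does averaging in "human" actually change the top neighbor, or just confirm it?
mean = (vectors["human"] + vectors["robot"]) / 2.0
print(vectors.similar_by_vector(mean, topn=5))
```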