The "deep" in deep learning refers to hierarchical layers of representations (to note: you can do "deep learning" without neural networks).<p>Word embeddings using skipgram or CBOW are a shallow method (single-layer representation). Remarkably, in order to stay interpretable, word embeddings <i>have</i> to be shallow. If you distributed the predictive task (eg. skip-gram) over several layers, the resulting geometric spaces would be much less interpretable.<p>So: this is not deep learning, and this not being deep learning is in fact the core feature.
> <i>word2vec is a Deep Learning technique first described by Tomas Mikolov only 2 years ago but due to its simplicity of algorithm and yet surprising robustness of the results, it has been widely implemented and adopted.</i><p>… And patented <a href="http://www.freepatentsonline.com/9037464.html" rel="nofollow">http://www.freepatentsonline.com/9037464.html</a>
I am not getting the "Obama + Russia - USA = Putin" piece, nor the "King + Woman - Man" bit. Nothing particularly meaningful came up when I searched for the latter. Could someone explain?
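For reference, the arithmetic those analogies describe is just vector addition and subtraction followed by a nearest-neighbour lookup in the embedding space. A rough sketch with gensim, assuming some pretrained word2vec vectors (the file path below is a placeholder):<p>

    # Analogy arithmetic sketch: king - man + woman, then find the nearest words.
    from gensim.models import KeyedVectors

    vectors = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)  # placeholder path

    # Add the positive vectors, subtract the negative one, and return the
    # vocabulary words whose vectors are closest to the result.
    print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
    # With well-trained vectors, "queen" typically ranks near the top.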
When I see things like this, it makes me wonder how much data forms each of these vectors; if a single article were to say things about Obama, or humans and animals, would it produce these results?