I implemented the new t-SNE in sklearn, so I've got some experience in reading these diagrams. Unfortunately, as wonderful as the algorithm is, it's extremely hard to interpret rigorously what its output means. I've seen many diagrams that look like this one -- and they were generated from actual noise. So take the plots with a big grain of salt :)<p>I'd be interested in seeing more direct evidence, like factorizing the PMI matrix with SVD (which is similar to what word2vec is doing) and seeing how much of the variance is explained by the first few components. If you want to do this, check out: <a href="https://minhlab.wordpress.com/2015/06/08/a-new-proof-for-the-equivalence-of-word2vec-skip-gram-and-shifted-ppmi/" rel="nofollow">https://minhlab.wordpress.com/2015/06/08/a-new-proof-for-the...</a>
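For anyone who wants to try the SVD check suggested above, here is a minimal sketch, not a definitive implementation. It assumes a positive PMI matrix built from a symmetric context window, and it reports each component's share of the squared singular values (no mean-centering) as a stand-in for "variance explained". The names `ppmi_matrix` and `explained_share`, the window size, and the toy token list are all made up for illustration; a real run would use a transcription of the manuscript.

```python
import numpy as np


def ppmi_matrix(tokens, window=2):
    """Build a symmetric positive-PMI matrix from a token sequence."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    # Count co-occurrences within `window` positions on either side.
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[index[w], index[tokens[j]]] += 1
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    # PMI = log( p(w, c) / (p(w) * p(c)) ); unseen pairs give -inf or nan.
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0
    # Keep only positive associations (PPMI).
    return vocab, np.maximum(pmi, 0.0)


def explained_share(matrix):
    """Share of the total squared singular values held by each component."""
    s = np.linalg.svd(matrix, compute_uv=False)  # sorted in descending order
    return s ** 2 / np.sum(s ** 2)


# Toy input (hypothetical); replace with real word tokens.
tokens = "a b a b c d c d a b".split()
vocab, ppmi = ppmi_matrix(tokens, window=1)
print(vocab)
print(np.round(explained_share(ppmi), 3))
```

If the first few components carry most of the mass on real text but not on a shuffled copy of the same tokens, that is somewhat more direct evidence of structure than a t-SNE plot.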
I think this approach has a lot of potential, and I wonder what a statistical comparison of character co-occurrences between the Voynich manuscript and other writing systems would reveal. For anyone curious, here is Stephen Bax's video on his 2014 findings.<p><a href="https://m.youtube.com/watch?index=1&v=fpZD_3D8_WQ&list=LLATcCtXq6Eg7iFjmWQ1CNkA" rel="nofollow">https://m.youtube.com/watch?index=1&v=fpZD_3D8_WQ&list=LLATc...</a><p>He believes he has translated about 10 words in the manuscript, which is huge, and he thinks the script may have been invented to express a language once spoken somewhere between the Near East and the Himalayas, maybe Turkic or Caucasian...
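One way to start on the character co-occurrence comparison wondered about above is sketched below, under some assumptions. Different scripts have different alphabets, so comparing raw pair counts directly does not work; this sketch instead computes an alphabet-independent summary, the conditional entropy of a character given the one before it. The function name `conditional_entropy` and the sample strings are hypothetical placeholders, not real transcriptions.

```python
import math
from collections import Counter


def conditional_entropy(text):
    """H(next char | previous char) in bits, from adjacent-character counts."""
    pairs = Counter(zip(text, text[1:]))   # counts of (previous, next) pairs
    firsts = Counter(text[:-1])            # how often each char is the "previous" one
    n = sum(pairs.values())
    h = 0.0
    for (prev, _nxt), c in pairs.items():
        # p(prev, next) * log2 p(next | prev), summed with a minus sign
        h -= (c / n) * math.log2(c / firsts[prev])
    return h


# Placeholder strings; substitute a manuscript transcription and
# reference texts in other writing systems.
samples = {
    "sample_a": "the quick brown fox jumps over the lazy dog",
    "sample_b": "abababababababab",
}
for name, text in samples.items():
    print(name, round(conditional_entropy(text), 3))
```

A lower value means the next character is more predictable from the previous one. The number depends on text length and on how the transcription splits glyphs, so it is only meaningful between samples prepared the same way.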
You know, I really like this, because it's an example of the kind of structure machine learning finds without my own understanding of the training set clouding my understanding of the machine's understanding.
Very interesting approach, but I would say this only scratches the surface. There are several factors that might seriously limit statistical analysis of this manuscript [1].<p>[1] <a href="http://www.ciphermysteries.com/2013/03/09/this-week-a-talk-at-stanford-on-the-voynich-manuscript" rel="nofollow">http://www.ciphermysteries.com/2013/03/09/this-week-a-talk-a...</a>
This is the first time I've heard of the cultural extinction theory. If that were the case, shouldn't there be more documents using the same script? But assuming the theory is right, is there any way to decipher it without finding a Rosetta stone?