TechEcho

Voynich Manuscript: word vectors and t-SNE visualization of some patterns

131 points by perone over 9 years ago

7 comments

juxtaposicion over 9 years ago
I implemented the new t-SNE in sklearn, so I've got some experience in reading these diagrams. Unfortunately, as wonderful as the algorithm is, it's extremely hard to interpret its output rigorously. I've seen many diagrams that look like this one, and they were generated from actual noise. So take the plots with a big grain of salt :)

I'd be interested in seeing more direct evidence, like factorizing the PMI matrix with SVD (which is similar to what word2vec is doing) and seeing how much of the variance is explained by the first components. If you want to do this, check out: https://minhlab.wordpress.com/2015/06/08/a-new-proof-for-the-equivalence-of-word2vec-skip-gram-and-shifted-ppmi/
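The check suggested above can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the commenter's code: a whitespace-tokenized transcription, symmetric co-occurrence counts in a small window, PPMI optionally shifted by log k (the shifted-PPMI form named in the linked post), and an SVD of the column-centered matrix. The toy corpus and all function names are hypothetical. A shuffled copy of the corpus acts as a noise baseline, since the concern is that structured-looking plots can come from noise.

```python
import numpy as np

def cooccurrence_matrix(tokens, window=2):
    """Symmetric co-occurrence counts within +/- `window` token positions."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                counts[index[w], index[tokens[j]]] += 1
    return vocab, counts

def shifted_ppmi(counts, k=1.0):
    """max(PMI - log k, 0); k=1 gives plain PPMI."""
    total = counts.sum()
    rows = counts.sum(axis=1, keepdims=True)
    cols = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore"):
        pmi = np.log(counts * total / (rows * cols))  # unseen pairs -> -inf, clipped below
    return np.maximum(pmi - np.log(k), 0.0)

def explained_variance_ratio(matrix):
    """Share of variance captured by each SVD component of the column-centered matrix."""
    centered = matrix - matrix.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    return s ** 2 / np.sum(s ** 2)

# Hypothetical toy corpus in EVA-style spelling, standing in for a real transcription:
# two "registers" whose words only co-occur with each other.
structured = ("qokeedy qokedy qokain " * 40 + "chedy shedy daiin " * 40).split()
# Noise baseline: same word frequencies, word order destroyed.
shuffled = list(np.random.default_rng(0).permutation(structured))

for name, toks in [("structured", structured), ("shuffled", shuffled)]:
    _, counts = cooccurrence_matrix(toks, window=2)
    ratios = explained_variance_ratio(shifted_ppmi(counts))
    print(name, np.round(ratios[:3], 3))
```

If the first few components of a real transcription capture clearly more variance than those of its shuffled baseline, that is some evidence of structure that doesn't rely on reading a t-SNE plot.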
vonnik over 9 years ago
I think this approach has a lot of potential, and I wonder what a statistical comparison of character co-occurrences between the Voynich manuscript and other writing systems would reveal. For anyone curious, here is Stephen Bax's video on his 2014 findings:

https://m.youtube.com/watch?index=1&v=fpZD_3D8_WQ&list=LLATcCtXq6Eg7iFjmWQ1CNkA

He believes he has translated about 10 words in the manuscript, which is huge, and he thinks the script may have been invented to write a language once spoken between the Near East and the Himalayas, maybe a Turkic or Caucasian one...
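One way to compare character co-occurrences across different scripts is a statistic that doesn't depend on the alphabet itself, such as the conditional entropy of the next character given the current one. The sketch below is a hypothetical illustration, assuming each text has already been transliterated so that one character stands for one glyph (for the Voynich, an EVA-style transcription, whose glyph mapping is itself an assumption). The sample strings are made up and far too short for real conclusions.

```python
import math
from collections import Counter

def conditional_char_entropy(text):
    """H(next char | current char) in bits, over adjacent pairs inside words."""
    pairs = Counter()
    for word in text.split():
        pairs.update(zip(word, word[1:]))
    firsts = Counter()
    for (a, _), c in pairs.items():
        firsts[a] += c  # how often each character starts a pair
    total = sum(pairs.values())
    return -sum((c / total) * math.log2(c / firsts[a]) for (a, _), c in pairs.items())

# Hypothetical short samples; a real comparison needs full transliterations.
samples = {
    "voynich_eva_style": "qokeedy qokedy chedy shedy daiin okaiin qokaiin",
    "latin": "in principio erat verbum et verbum erat apud deum",
}
for name, text in samples.items():
    print(name, round(conditional_char_entropy(text), 3))
```

Lower values mean the next character is more predictable from the current one. Comparing that number, computed on large samples, against texts in candidate languages is one concrete form the comparison could take.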
danharaj over 9 years ago
You know, I really like this, because it's an example of the kind of structure machine learning finds without my own understanding of the training set clouding my understanding of the machine's understanding.
haddr over 9 years ago
Very interesting approach, but I would say this only scratches the surface. There are several factors that might seriously limit any statistical analysis of this manuscript [1].

[1] http://www.ciphermysteries.com/2013/03/09/this-week-a-talk-at-stanford-on-the-voynich-manuscript
lawpoop over 9 years ago
>>> model.most_similar("queen")

[(u'princess', 0.519856333732605), (u'latifah', 0.47644317150115967),
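The call above appears to be gensim's Word2Vec `most_similar` (the comment doesn't say), which ranks words by cosine similarity between their vectors. Below is a minimal NumPy sketch of that kind of nearest-neighbour lookup. It is not gensim's implementation, and the four-word vocabulary and 3-dimensional vectors are made up purely for illustration.

```python
import numpy as np

def most_similar(word, vocab, vectors, topn=3):
    """Rank the other words by cosine similarity to `word` (hypothetical helper)."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)  # unit-length rows
    idx = vocab.index(word)
    sims = unit @ unit[idx]  # cosine similarity of every word to `word`
    order = [i for i in np.argsort(-sims) if i != idx]  # best first, skip the query itself
    return [(vocab[i], float(sims[i])) for i in order[:topn]]

# Made-up toy embeddings, not from any trained model.
vocab = ["queen", "princess", "king", "banana"]
vectors = np.array([
    [0.9, 0.1, 0.3],
    [0.8, 0.2, 0.4],
    [0.7, -0.1, 0.2],
    [-0.2, 0.9, 0.0],
])
print(most_similar("queen", vocab, vectors))
```

Normalizing the rows once turns every cosine similarity into a plain dot product, so a single matrix-vector product scores the whole vocabulary.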
splitbrain over 9 years ago
This is the first time I've heard of the cultural extinction theory. If that were the case, shouldn't there be more documents using the same script? But assuming the theory is right, is there any way to decipher it without finding a Rosetta Stone?
acqq over 9 years ago
Do we get any new insight with this?