It seems like your cluster quality will be sensitive to the words used to seed each cluster.

Why not use a standard word clustering algorithm like Brown clustering? http://acl.ldc.upenn.edu/J/J92/J92-4003.pdf

Percy Liang wrote a great implementation in C++ that you could plug into your visualization: http://cs.stanford.edu/~pliang/software/

Also of interest is that Brown clustering is hierarchical, so you can get coarse or fine-grained clusters.

[Aside: here are some 2-D visualizations I made of word embeddings from a neural language model: http://metaoptimize.com/projects/wordreprs/ ]
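If it helps, here is a minimal sketch of how that hierarchical output could feed a coarse-or-fine view. It assumes the "paths" file format that Liang's brown-cluster tool writes (one tab-separated line per word: bit-string, word, count); the file name and prefix length below are just placeholders, so adjust to whatever your run actually produces:

    # Group words from a Brown clustering "paths" file at a chosen granularity.
    # Assumption: each line looks like "<bit-string>\t<word>\t<count>";
    # tweak the parsing if your output differs.
    from collections import defaultdict

    def load_clusters(paths_file, prefix_len=None):
        """Group words by their Brown cluster bit-string.

        Truncating the bit-string to prefix_len merges sibling clusters into
        coarser ones; prefix_len=None keeps the full fine-grained clusters.
        """
        clusters = defaultdict(list)
        with open(paths_file, encoding="utf-8") as f:
            for line in f:
                bitstring, word, _count = line.rstrip("\n").split("\t")
                key = bitstring[:prefix_len] if prefix_len else bitstring
                clusters[key].append(word)
        return clusters

    # Hypothetical usage:
    # fine = load_clusters("paths")                  # full-depth clusters
    # coarse = load_clusters("paths", prefix_len=4)  # merge down to 4-bit prefixes

Truncating the bit-string prefix is the usual way to trade off cluster granularity without re-running the clustering.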
First of all, great work and thanks for sharing!

I guess I know less about NLP and clustering than I thought, but what exactly does the visualization indicate? On Iteration 1/3, when I click "husband" in the sidebar and "first" shows up, what does that mean? That it's the closest cluster by distance?

The visualization looks nice, but the accompanying text doesn't shed much light...