Syntactic: A lexical categorizer with a pretty visualization

61 points by omershapira over 12 years ago

4 comments

bravura over 12 years ago
It seems like your cluster quality will be sensitive to the words used to seed each cluster.

Why not use a standard word clustering algorithm like Brown clustering? http://acl.ldc.upenn.edu/J/J92/J92-4003.pdf

Percy Liang wrote a great implementation in C++ that you could plug into your visualization: http://cs.stanford.edu/~pliang/software/

Also of interest is that Brown clustering is hierarchical, so you can get coarse or fine-grained clustering.

[Aside: Here are some 2-d visualizations I made of word embeddings from a neural language model: http://metaoptimize.com/projects/wordreprs/ ]
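
To illustrate the coarse-versus-fine point, here is a minimal sketch in Python. It assumes each word has already been assigned a bit-string path in a binary cluster hierarchy, which is how Brown clusters are commonly represented. The `PATHS` data and the `clusters_at_depth` helper are made up for illustration and are not taken from Syntactic or from any particular implementation.

```python
from collections import defaultdict

# Hypothetical word -> bit-string paths in a binary cluster hierarchy.
# These values are invented for illustration, not real clustering output.
PATHS = {
    "husband": "0010",
    "wife": "0011",
    "doctor": "0100",
    "nurse": "0101",
    "first": "1100",
    "second": "1101",
}


def clusters_at_depth(paths, depth):
    """Group words by the first `depth` bits of their path.

    A short prefix cuts the hierarchy near the root (few, coarse clusters);
    a long prefix cuts it near the leaves (many, fine-grained clusters).
    """
    groups = defaultdict(list)
    for word, path in sorted(paths.items()):
        groups[path[:depth]].append(word)
    return dict(groups)


if __name__ == "__main__":
    for depth in (1, 3, 4):
        print(depth, clusters_at_depth(PATHS, depth))
```

Truncating the paths to a shorter prefix merges sibling clusters, so the same clustering run can feed both a coarse and a detailed view without re-clustering.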
JunkDNA over 12 years ago
Would love to see what clusters from PubMed would look like. Anyone planning to run this on it?
danso over 12 years ago
First of all, great work and thanks for sharing!

I guess I know less about NLP and clustering than I thought, but what exactly does the visualization indicate?

On Iteration 1/3, when I click "husband" on the sidebar and "first" shows up... what does that mean? That that's the closest cluster by distance?

The visualization looks nice but the accompanying text doesn't shed much light...
username3 over 12 years ago
need horizontal scrollbar