
Latent Semantic Analysis Tutorial

63 points by rouli over 12 years ago

5 comments

beagle3 over 12 years ago
State-of-the-art implementation, using random projection for reasonably accurate results hundreds to thousands of times faster: http://radimrehurek.com/gensim/

Random projection is something you should be aware of if you do any kind of high-dimensional modeling. It *is* magic.
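For readers unfamiliar with the technique: a minimal sketch of Gaussian random projection in plain NumPy (this is not the gensim implementation, and all the sizes below are made-up toy values). By the Johnson-Lindenstrauss lemma, projecting onto a random low-dimensional subspace approximately preserves pairwise distances:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 1,000 "documents" in a 10,000-dimensional term space.
X = rng.random((1000, 10000))

# Gaussian random projection down to k dimensions. Scaling by
# 1/sqrt(k) makes the projection distance-preserving in expectation.
k = 300
R = rng.normal(size=(10000, k)) / np.sqrt(k)
X_proj = X @ R

# Compare one pairwise distance before and after projection; the
# two values should agree to within a few percent.
d_orig = np.linalg.norm(X[0] - X[1])
d_proj = np.linalg.norm(X_proj[0] - X_proj[1])
```

The projection is a single matrix multiply with no training step, which is where the speedup over a full SVD comes from.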
textminer over 12 years ago
Truncated SVD has been a wonderful tool for "cleaning up" pairwise cosine-similarity data in text-document comparison, in graph/network building (for a visual representation of entities represented by documents, embedded in something like Gephi/Sigma.js/D3), and in item-based recommendation systems.

The biggest problem I then run into is choosing "k" (the number of dimensions kept in your truncation). I have had some thoughts about training this unsupervised method (providing labeled data for what "oughta" be the top nearest neighbors for a particular entity, and optimizing toward that), or about building an ensemble on top of many truncated SVD vector spaces (though the combination method is unclear to me: pick kNN from a linear combination of each model's outcomes? Take the intersection of each model's k nearest neighbors?).

To novices looking at this tutorial: NumPy is a wonderful tool for small toy examples, but at a certain scale you will depend heavily on the sparse matrix formats provided by SciPy. That, plus random projections, should curb any memory problems you'll run into for most vector-space problems, short of operating at Google/Yahoo scale or on terabytes of logging data.
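To illustrate the sparse-matrix point, here is a hedged sketch (not from the comment; the matrix sizes, density, and k are arbitrary) of computing a truncated SVD directly on a SciPy sparse matrix with `scipy.sparse.linalg.svds`, which never densifies the input:

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# A sparse "document-term" matrix: 5,000 docs x 20,000 terms,
# about 0.1% of entries nonzero (toy stand-in for tf-idf counts).
X = sparse_random(5000, 20000, density=0.001, format="csr", random_state=0)

# Truncated SVD keeping only the top k singular triplets.
k = 50
U, s, Vt = svds(X, k=k)

# Each document embedded in the k-dimensional latent space.
docs_k = U * s
```

Cosine similarities or kNN queries can then be run on the `docs_k` rows instead of the full sparse matrix, which is where the "cleaning up" effect of the truncation shows itself.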
elchief over 12 years ago
FYI:

What are the differences among latent semantic analysis (LSA), latent semantic indexing (LSI), and singular value decomposition (SVD)?

http://stats.stackexchange.com/questions/4735/what-are-the-differences-among-latent-semantic-analysis-lsa-latent-semantic-i
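For context on how the three relate: the SVD is the linear-algebra operation at the core of both LSA and LSI. A toy example (entirely made up, not from the linked answer) of the rank-k approximation on a small term-document count matrix:

```python
import numpy as np

# Tiny toy term-document count matrix (rows = terms, columns = docs).
A = np.array([
    [2, 0, 1, 0],   # "cat"
    [1, 0, 2, 0],   # "dog"
    [0, 3, 0, 1],   # "svd"
    [0, 1, 0, 2],   # "matrix"
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-2 "latent semantic" approximation: by the Eckart-Young theorem,
# this is the closest rank-2 matrix to A in Frobenius norm.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

err = np.linalg.norm(A - A_k)
```

LSA/LSI then compare terms and documents in the k-dimensional space rather than in the original sparse count space.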
kleiba over 12 years ago
Also: "LSA was patented in 1988 (US Patent 4,839,853) by Scott Deerwester, Susan Dumais, George Furnas, Richard Harshman, Thomas Landauer, Karen Lochbaum and Lynn Streeter."

https://en.wikipedia.org/wiki/Latent_semantic_analysis
jackhammer2022 over 12 years ago
Also, here is a nice paper to get started with LSI: http://www2.denizyuret.com/ref/berry/berry95using.pdf