TechEcho
Latent Dirichlet Allocation Surprisingly Well Correlated w/ Google Rankings

58 points by randfish almost 15 years ago

5 comments

nkurz almost 15 years ago
This is a good layman's introduction to modern search techniques, but to someone not in the SEO field it feels like a very strange inversion of priorities. To me, like most people, the surprise is how effective techniques like LDA[1] can be in characterizing a document, but the 'surprise' in the article is that LDA correlates with Google search order better than a more simplistic model.

To a technologically savvy but naive outsider, this might seem obvious: shouldn't pages that rank highly in Google have strong topic-based correlation to pages that the user wants to see? But from the SEO perspective, I guess the conclusion would be that your page is more likely to be ranked highly if it includes all the trappings of other highly ranked pages, with, you know, like synonyms and stuff. At a certain point, one has to start thinking, wouldn't it be simpler to make a page that people actually want to find?

Are there good examples of actually useful pages that Google doesn't do a good job of ranking? Lately I occasionally find myself getting frustrated with Google for ignoring my rarer search terms, but generally I find the good pages are at the top if they exist at all.

[1] LDA is Latent Dirichlet Allocation, which is very similar to Latent Semantic Analysis, which in turn is very similar to Principal Component Analysis and Singular Value Decomposition. So it's possible you've already heard of the concept, but coming from another angle in another field.
moultano almost 15 years ago
All good ranking functions are pretty correlated. There are many ways for a ranking to be bad, and few ways for it to be good.
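To make "correlated rankings" concrete: one common way to measure agreement between two rankings is Spearman's rank correlation. The thread does not say which measure the linked article used, so treat this as a minimal sketch under that assumption; the function names `ranks` and `spearman` are made up for illustration.

```python
import math


def ranks(scores):
    """Return average ranks (1 = highest score); tied scores share a mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j over the run of items tied with the item at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(a, b):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant ranking carries no ordering information
    return cov / math.sqrt(var_a * var_b)
```

Two scoring functions that order the same pages identically give a value of 1, fully reversed orderings give -1, and unrelated orderings land near 0.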
nl almost 15 years ago
This is news? Seriously????

They have found a correlation between a set of words related to a topic you are searching for and how highly a search engine ranks that page?

Well duh! Did anyone really think search engines did a keyword search and then applied PageRank/HITS (http://en.wikipedia.org/wiki/HITS_algorithm) or whatever? That would give dreadful results.

If you really want to understand this, I recommend "Building a Vector Space Search Engine in Perl": http://perl.about.com/b/2007/05/24/building-a-vector-space-search-engine-in-perl.htm

I built the vector space classifier in http://classifier4j.sf.net based almost entirely on that article, even though I don't know Perl. It's very readable, and gives you a great understanding.
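For readers who have not seen the vector space idea before, here is a rough Python sketch of it. This is not code from the linked Perl article or from classifier4j; the whitespace tokenizer, the smoothed IDF formula, and all function names are assumptions chosen for brevity. Documents and queries become TF-IDF weighted term vectors, and results are ordered by cosine similarity to the query.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive tokenizer (an assumption): lowercase and split on whitespace."""
    return text.lower().split()


def to_vector(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unknown to the index are dropped."""
    tf = Counter(tokens)
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def build_index(docs):
    """Return (idf, vectors) for a list of document strings."""
    tokenized = [tokenize(doc) for doc in docs]
    n = len(docs)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    # Smoothed inverse document frequency; one of several common variants.
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [to_vector(tokens, idf) for tokens in tokenized]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, idf, vectors):
    """Return (doc_index, score) pairs with score > 0, best match first."""
    q = to_vector(tokenize(query), idf)
    scored = [(cosine(q, vec), i) for i, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, score) for score, i in scored if score > 0]


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "dogs chase cats in the park",
        "stock markets fell sharply today",
    ]
    idf, vectors = build_index(docs)
    print(search("cat mat", idf, vectors))
```

Note that this sketch matches exact tokens only ("cat" does not match "cats"). Topic models such as LDA, discussed above, address that gap by scoring documents on shared topics instead of shared terms.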
mark_l_watson almost 15 years ago
I sometimes use LDA (using Hadoop and Mahout), and it is not an inexpensive calculation for large document sets. I wonder what the costs are for using this at large scale.
madridorama almost 15 years ago
I'm sorry, but this is overthinking something that is relatively simple to understand.