The Weaknesses of Full Text Searching (2008) [pdf]

29 points by lemonspat over 4 years ago

3 comments

btrettel over 4 years ago
As a (junior) patent examiner, the weaknesses of text search were discussed in my training and have become very clear over time. Many people today think that text search is the "be-all and end-all" of search, but if one wants to be comprehensive, text search should only be one part of a search strategy. Other major components include citation search (forwards and backwards) and classification search.

Google can identify many synonyms today, but my experience has been that Google frequently misses important synonyms. I've started compiling lists of synonyms and even partial search queries (medical searchers call these "hedges") to use when searching. The problem of synonyms is one place where citation and classification search shine, as they are independent of the terminology used (and even *language* independent in the case of a classification like the IPC). There's no one "best" approach; these approaches complement one another. And you can do a "combination" search, e.g., of all the documents citing this document, return all that contain a keyword.

Unfortunately classification search has fallen out of favor among the general population, but I can see systems like the Dewey Decimal System being extremely useful when the terminology in a field varies appreciably. Classification search is extremely useful in my work.

When I have the time I'll take a close look at this article. Thanks for posting it.
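
To make the "combination" search and the synonym lists ("hedges") concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the toy documents, the citation graph, the `HEDGES` table, and the function names are all hypothetical, not taken from any real patent search tool.

```python
import re
from typing import Dict, List, Optional, Set

# Hypothetical toy corpus: document id -> text.
DOCS: Dict[str, str] = {
    "A": "A fastener for joining two panels.",
    "B": "An improved screw with a self-tapping tip.",
    "C": "A bolt assembly for automotive panels.",
    "D": "A method of brewing coffee.",
}

# Hypothetical citation graph: document id -> ids of the documents it cites.
CITES: Dict[str, Set[str]] = {
    "B": {"A"},
    "C": {"A"},
    "D": {"A"},
}

# Hand-compiled synonym list ("hedge"), keyed by the term being searched.
HEDGES: Dict[str, Set[str]] = {
    "fastener": {"fastener", "screw", "bolt", "rivet"},
}


def forward_citations(doc_id: str) -> List[str]:
    """Return the ids of all documents that cite doc_id, in sorted order."""
    return sorted(d for d, cited in CITES.items() if doc_id in cited)


def contains_any(text: str, terms: Set[str]) -> bool:
    """True if any of the terms appears as a whole word in text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & terms)


def combination_search(
    doc_id: str,
    keyword: str,
    hedges: Optional[Dict[str, Set[str]]] = None,
) -> List[str]:
    """Of all documents citing doc_id, return those containing the keyword.

    If a hedge table is given, the keyword is expanded to its synonym list.
    """
    terms = {keyword}
    if hedges is not None:
        terms = hedges.get(keyword, terms)
    return [d for d in forward_citations(doc_id) if contains_any(DOCS[d], terms)]


print(combination_search("A", "fastener"))          # plain keyword only
print(combination_search("A", "fastener", HEDGES))  # keyword expanded by the hedge
```

In this toy data the plain keyword finds nothing among the citing documents, because they say "screw" and "bolt" rather than "fastener"; the hedge-expanded query finds both. The citation step narrows the candidate set without depending on terminology at all.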
aduffy over 4 years ago
At least w.r.t. the synonym problem, more modern search techniques that rely on language modeling and word representations seem to solve that.
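
As a rough illustration of how word representations can surface synonyms, here is a minimal sketch using cosine similarity. The three-dimensional vectors are made up for the example and the `nearest` helper is hypothetical; real systems learn much higher-dimensional vectors from large text corpora.

```python
import math
from typing import Dict, List

# Hypothetical word vectors, invented purely for illustration.
VECTORS: Dict[str, List[float]] = {
    "screw": [0.9, 0.1, 0.0],
    "bolt": [0.8, 0.2, 0.1],
    "coffee": [0.0, 0.1, 0.9],
}


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word: str, k: int = 1) -> List[str]:
    """Return the k other words whose vectors are most similar to word's."""
    query = VECTORS[word]
    scored = [(cosine(query, vec), w) for w, vec in VECTORS.items() if w != word]
    return [w for _, w in sorted(scored, reverse=True)[:k]]


print(nearest("screw"))
```

With these invented vectors, "bolt" ranks closer to "screw" than "coffee" does, so a query for one term could be expanded with the other without a hand-compiled synonym list.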
visarga over 4 years ago
In theory there is amazing progress in retrieval, but in practice we have Google. Maybe their motives are not aligned with search improvement after all.

The problem of meaning disambiguation has been solved with neural nets to a much higher degree than it appears in Google's search engine.