Ask HN: Books about full text search?

232 points by sopromo over 2 years ago
I would love to learn more about FTS at a very low level, and I'm looking for books on the topic. Any good suggestions?

17 comments

binarymax over 2 years ago
“Relevant Search” by Doug Turnbull and John Berryman, published by Manning, is THE best book to get started with tuning search engines.

I’ve been a search engineer for >10 years and this is always the first book I recommend.

https://www.manning.com/books/relevant-search
ssn over 2 years ago
Three reference textbooks are available openly:

* Introduction to Information Retrieval, http://informationretrieval.org/
* Information Retrieval in Practice, http://www.search-engines-book.com/
* Entity-Oriented Search, https://eos-book.org/

Modern Information Retrieval is also a classic reference. Not openly available but some contents are (were?) available online. Their site seems to be down but the Internet Archive has a copy.

Additional resources here:

* https://nlp.stanford.edu/IR-book/information-retrieval.html
* http://web.archive.org/web/20220708135205/http://grupoweb.upf.es/mir2ed/
100k over 2 years ago
At a general audience level, "Index" is on my list to read. It covers the invention of the index up to digital search engines. https://www.nytimes.com/2022/02/09/books/review-index-history-of-dennis-duncan.html

"Introduction to Information Retrieval" is a textbook which is available online: https://nlp.stanford.edu/IR-book/ Here's a review: http://glinden.blogspot.com/2009/02/book-review-introduction-to-information.html

Another textbook which IMHO is a bit lower level is "Information Retrieval: Implementing and Evaluating Search Engines". The book website is down for me right now, but you can find it on Amazon here: https://www.amazon.com/Information-Retrieval-Implementing-Evaluating-Engines/dp/0262026511

Another commenter linked to "Relevant Search", which is great if you want to learn how to effectively use a search engine to improve relevance (as opposed to how to implement a search engine). It's old, but another book in that vein that was really helpful for me earlier in my career is Lucene in Action: https://www.amazon.com/Lucene-Action-Second-Covers-Apache/dp/1933988177/
DamonHD over 2 years ago
Managing Gigabytes

https://books.google.co.uk/books/about/Managing_Gigabytes.html?id=2F74jyPl48EC&redir_esc=y

Old but good!
francoisprunier over 2 years ago
Not a book, but this paper from 2019 covers a lot of ground and reviews the different topics extensively: https://tonellotto.github.io/publication/fntir/fntir_main.pdf
pixelmonkey over 2 years ago
Take a look at my post “Lucene: The Good Parts”:

https://blog.parse.ly/lucene/

The book mentioned there is Lucene in Action.

And then this YouTube presentation by a Lucene/Elasticsearch committer will give you a nice overview of some related algorithms:

https://youtu.be/eQ-rXP-D80U
brudgers over 2 years ago
Not a book but Hellerstein’s CS186 from 2015 starting with Lecture 17 gave me a basic understanding (I think).

Playlist: https://youtube.com/playlist?list=PLhMnuBfGeCDPtyC9kUf_hG_QwjYzZ0Am1

Also from that lecture series, the low level is always IO. One disk read tends to dwarf n^2 in-memory algorithms.

And IO is all about tuning caches and hardware for the specific structural relationships in the data, the way in which it is accessed, and the hardware everything runs on.

Good luck.
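To put rough numbers on the "one disk read dwarfs n^2 in-memory work" remark, here is a back-of-envelope sketch. The two latency constants are order-of-magnitude assumptions chosen for illustration (a spinning-disk seek of about 10 ms, a simple in-memory operation of about 1 ns), not measurements from the lecture series, and `break_even_n` is a hypothetical helper name.

```python
import math

# Assumed, order-of-magnitude latencies for illustration only (not measured).
DISK_SEEK_SECONDS = 10e-3   # one random read on a spinning disk
MEMORY_OP_SECONDS = 1e-9    # one simple in-memory operation


def break_even_n(io_seconds: float, op_seconds: float) -> int:
    """Largest n for which n * n in-memory operations cost no more than one IO."""
    ops_per_io = round(io_seconds / op_seconds)
    return math.isqrt(ops_per_io)


if __name__ == "__main__":
    n = break_even_n(DISK_SEEK_SECONDS, MEMORY_OP_SECONDS)
    print(f"One seek costs about as much as an n^2 pass over n = {n} items")
```

Under these assumed figures, a quadratic in-memory pass over roughly 3,000 items costs about the same as a single seek. Faster storage such as SSDs would shrink that gap, so the constants matter more than the exact result.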
MonkoftheFunk over 2 years ago
Hotz... Is that you... Trying to learn to improve Twitter search? ;)
fiedzia over 2 years ago
https://www.manning.com/books/relevant-search

Also "Taming Text".
vdfs over 2 years ago
Lucene in Action is a good introduction to Lucene, which can be helpful for learning Elasticsearch (the most used FTS these days).
tgv over 2 years ago
Check the literature of open courses on Text Retrieval. E.g. https://stanford.edu/class/cs276/
Beefin over 2 years ago
A series of tutorials and comparisons that aim to teach the foundations of vector search:

https://vectorsearch.dev/
cb321 over 2 years ago
It's all in the Nim programming language, but if you prefer reading code or running diffs then you might get a vague sense of (some) low level nuts & bolts from: https://github.com/c-blake/nimsearch
User23 over 2 years ago
Is there some better alternative to Knuth-Morris-Pratt or Boyer-Moore? Both can easily be adapted to regular expression matching and as far as I know there’s no faster algorithm that doesn’t do preprocessing.
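For readers unfamiliar with the first algorithm named above, here is a minimal sketch of Knuth-Morris-Pratt substring search. It is a textbook-style illustration, not code from any of the linked books, and the function names are made up for this example. Note that KMP does preprocess the pattern (the failure table); what it avoids is building an index over the text.

```python
def build_failure_table(pattern: str) -> list[int]:
    """failure[i] is the length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it."""
    failure = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]  # fall back to a shorter border
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def kmp_find_all(text: str, pattern: str) -> list[int]:
    """Return the start index of every (possibly overlapping) match."""
    if not pattern:
        return []
    failure = build_failure_table(pattern)
    matches = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # keep going to find overlapping matches
    return matches


if __name__ == "__main__":
    print(kmp_find_all("abababa", "aba"))
```

The scan never moves backwards in the text, which is why the running time is linear in the combined length of text and pattern. Full-text search engines take a different route and index the text ahead of time, so the two approaches answer different questions.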
Beefin over 2 years ago
Stanford's NLP course:

https://www.youtube.com/playlist?list=PLoROMvodv4rOSH4v6133s9LFPRHjEmbmJ
leeseonwook over 2 years ago
123
unixhero over 2 years ago
Just use Postgres full-text search, it's good enough: http://rachbelaid.com/postgres-full-text-search-is-good-enough/