Ask HN: Books about full text search?

232 points by sopromo over 2 years ago

I would love to learn more about FTS at a very low level, and I'm looking for books on the topic. Any good suggestions?

17 comments

binarymax over 2 years ago

“Relevant Search” by Doug Turnbull and John Berryman, published by Manning, is THE best book to get started with tuning search engines. I’ve been a search engineer for >10 years and this is always the first book I recommend. <a href="https://www.manning.com/books/relevant-search" rel="nofollow">https://www.manning.com/books/relevant-search</a>

ssn over 2 years ago

Three reference textbooks are available openly:
* Introduction to Information Retrieval, <a href="http://informationretrieval.org/" rel="nofollow">http://informationretrieval.org/</a>
* Information Retrieval in Practice, <a href="http://www.search-engines-book.com/" rel="nofollow">http://www.search-engines-book.com/</a>
* Entity-Oriented Search, <a href="https://eos-book.org/" rel="nofollow">https://eos-book.org/</a>
Modern Information Retrieval is also a classic reference. Not openly available, but some contents are (were?) available online. Their site seems to be down, but the Internet Archive has a copy.
Additional resources here:
* <a href="https://nlp.stanford.edu/IR-book/information-retrieval.html" rel="nofollow">https://nlp.stanford.edu/IR-book/information-retrieval.html</a>
* <a href="http://web.archive.org/web/20220708135205/http://grupoweb.upf.es/mir2ed/" rel="nofollow">http://web.archive.org/web/20220708135205/http://grupoweb.up...</a>

100k over 2 years ago

At a general-audience level, "Index" is on my list to read. It covers the invention of the index up to digital search engines. <a href="https://www.nytimes.com/2022/02/09/books/review-index-history-of-dennis-duncan.html" rel="nofollow">https://www.nytimes.com/2022/02/09/books/review-index-histor...</a>
"Introduction to Information Retrieval" is a textbook which is available online: <a href="https://nlp.stanford.edu/IR-book/" rel="nofollow">https://nlp.stanford.edu/IR-book/</a> Here's a review: <a href="http://glinden.blogspot.com/2009/02/book-review-introduction-to-information.html" rel="nofollow">http://glinden.blogspot.com/2009/02/book-review-introduction...</a>
Another textbook, which IMHO is a bit lower level, is "Information Retrieval: Implementing and Evaluating Search Engines". The book website is down for me right now, but you can find it on Amazon here: <a href="https://www.amazon.com/Information-Retrieval-Implementing-Evaluating-Engines/dp/0262026511" rel="nofollow">https://www.amazon.com/Information-Retrieval-Implementing-Ev...</a>
Another commenter linked to "Relevant Search", which is great if you want to learn how to effectively use a search engine to improve relevance (as opposed to how to implement a search engine). It's old, but another book in that vein that was really helpful for me earlier in my career is Lucene in Action: <a href="https://www.amazon.com/Lucene-Action-Second-Covers-Apache/dp/1933988177/" rel="nofollow">https://www.amazon.com/Lucene-Action-Second-Covers-Apache/dp...</a>

DamonHD over 2 years ago

Managing Gigabytes: <a href="https://books.google.co.uk/books/about/Managing_Gigabytes.html?id=2F74jyPl48EC&redir_esc=y" rel="nofollow">https://books.google.co.uk/books/about/Managing_Gigabytes.ht...</a> Old but good!

francoisprunier over 2 years ago

Not a book, but this paper from 2019 covers a lot of ground and reviews the different topics extensively: <a href="https://tonellotto.github.io/publication/fntir/fntir_main.pdf" rel="nofollow">https://tonellotto.github.io/publication/fntir/fntir_main.pd...</a>

pixelmonkey over 2 years ago

Take a look at my post “Lucene: The Good Parts”: <a href="https://blog.parse.ly/lucene/" rel="nofollow">https://blog.parse.ly/lucene/</a>
The book mentioned there is Lucene in Action.
And then this YouTube presentation by a Lucene/Elasticsearch committer will give you a nice overview of some related algorithms: <a href="https://youtu.be/eQ-rXP-D80U" rel="nofollow">https://youtu.be/eQ-rXP-D80U</a>

brudgers over 2 years ago

Not a book, but Hellerstein’s CS186 from 2015, starting with Lecture 17, gave me a basic understanding (I think). Playlist: <a href="https://youtube.com/playlist?list=PLhMnuBfGeCDPtyC9kUf_hG_QwjYzZ0Am1" rel="nofollow">https://youtube.com/playlist?list=PLhMnuBfGeCDPtyC9kUf_hG_Qw...</a>
Also from that lecture series: the low level is always IO. One disk read tends to dwarf n^2 in-memory algorithms.
And IO is all about tuning caches and hardware for the specific structural relationships in the data, the way in which it is accessed, and the hardware everything runs on.
Good luck.

MonkoftheFunk over 2 years ago

Hotz... Is that you... Trying to learn to improve Twitter search? ;)

fiedzia over 2 years ago

<a href="https://www.manning.com/books/relevant-search" rel="nofollow">https://www.manning.com/books/relevant-search</a> Also "Taming Text".

vdfs over 2 years ago

Lucene in Action is a good introduction to Lucene, which can be helpful for learning Elasticsearch (the most used FTS these days).

tgv over 2 years ago

Check the literature of open courses on Text Retrieval. E.g. <a href="https://stanford.edu/class/cs276/" rel="nofollow">https://stanford.edu/class/cs276/</a>

Beefin over 2 years ago

A series of tutorials and comparisons that aims to teach the foundations of vector search: <a href="https://vectorsearch.dev/" rel="nofollow">https://vectorsearch.dev/</a>

cb321 over 2 years ago

It's all in the Nim programming language, but if you prefer reading code or running diffs then you might get a vague sense of (some) low level nuts & bolts from: <a href="https://github.com/c-blake/nimsearch" rel="nofollow">https://github.com/c-blake/nimsearch</a>

User23 over 2 years ago

Is there some better alternative to Knuth-Morris-Pratt or Boyer-Moore? Both can easily be adapted to regular expression matching and as far as I know there’s no faster algorithm that doesn’t do preprocessing.
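
For readers who have not seen it, here is a minimal Python sketch of the Knuth-Morris-Pratt matcher mentioned above. The function names are illustrative (not from any library), and it covers only plain substring search, not the regular-expression adaptation. Note that KMP preprocesses the pattern (the failure table), not the text being searched.

```python
def build_failure_table(pattern: str) -> list[int]:
    """failure[i] is the length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it."""
    failure = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        # On mismatch, fall back to the next shorter border.
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def kmp_find_all(text: str, pattern: str) -> list[int]:
    """Return the start index of every (possibly overlapping) match."""
    if not pattern:
        return []
    failure = build_failure_table(pattern)
    matches = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        # The text index never moves backwards; only k is reduced.
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # keep going to find overlapping matches
    return matches


print(kmp_find_all("abababa", "aba"))
```

Because the text pointer never backs up, the scan is linear in the length of the text plus the pattern, which is the property that makes it a reasonable baseline when the text itself cannot be indexed ahead of time.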

Beefin over 2 years ago

Stanford's NLP course: <a href="https://www.youtube.com/playlist?list=PLoROMvodv4rOSH4v6133s9LFPRHjEmbmJ" rel="nofollow">https://www.youtube.com/playlist?list=PLoROMvodv4rOSH4v6133s...</a>

leeseonwook over 2 years ago

123

unixhero over 2 years ago

Just use Postgres full-text search; it's good enough: <a href="http://rachbelaid.com/postgres-full-text-search-is-good-enough/" rel="nofollow">http://rachbelaid.com/postgres-full-text-search-is-good-enou...</a>
