TechEcho

8 comments

This is a an interesting way of doing text search over a collection of documents, that I suppose makes sense if you only have a small number of documents to search.Generally, an inverted index [1] is used rather than a trie, and collection-based term weights are calculated using something like IDF [2]. You could then use something like cosine to compute length normalised rankings for documents.There are of course many different approaches to text search, but this is the first time I've personally come across this one.If anyone is interested in an in depth look at how text search works, an Introduction to Information Retrieval [3] is available for free and is a fantastic book.[1] <a href="https://en.wikipedia.org/wiki/Inverted_index" rel="nofollow">https://en.wikipedia.org/wiki/Inverted_index</a>[2] <a href="https://en.wikipedia.org/wiki/Tf%E2%80%93idf" rel="nofollow">https://en.wikipedia.org/wiki/Tf%E2%80%93idf</a>[3] <a href="https://nlp.stanford.edu/IR-book/" rel="nofollow">https://nlp.stanford.edu/IR-book/</a>

评论 #15677315 未加载

ameliusover 7 years ago

Finally a post on search algorithms on HN again!Pardon my surprise, but I think it's kind of strange that we see so little on HN of this topic, which is one of the pillars of Computer Science, and also at the core of one of the most successful IT companies.

评论 #15677627 未加载

wycover 7 years ago

If you're a working programmer looking for a bit more depth on text searching, check out Taming Text by Ingersoll. It was written by some real pros, and it's very practical. It goes over everything from the first naive approaches to what drives many engines like Lucene today.<a href="https://www.amazon.com/Taming-Text-Find-Organize-Manipulate/dp/193398838X" rel="nofollow">https://www.amazon.com/Taming-Text-Find-Organize-Manipulate/...</a>

stochastic_monkover 7 years ago

I'm a little surprised to not be seeing some classical search algorithms.For online pattern searching, I'd want Boyer-Moore with the the Galil rule for guaranteed linear search. [Due to an unfortunate accident in C++ history, Boost and C++17 skip the Galil rule, leading to O(n+m) runtime vs. O(nm).]For offline, Burrows-Wheeler has performance linear with query string and independent of the length of the searched document.

评论 #15677703 未加载

bootcatover 7 years ago

Certainly its worth taking a look at another javascript library which works for fuzzy search called, fuse, <a href="http://fusejs.io/" rel="nofollow">http://fusejs.io/</a>.

jihadjihadover 7 years ago

This is neat, and reminds me a bit of the tricks used by RediSearch. I came across it on HN recently and have been geeking out on it since! <a href="http://redisearch.io" rel="nofollow">http://redisearch.io</a>

johnhenryover 7 years ago

Lunr, named to correspond to Apache Solr, is also worth a look. <a href="https://lunrjs.com/" rel="nofollow">https://lunrjs.com/</a>

hnrcover 7 years ago

<a href="https://news.ycombinator.com/item?id=14805395" rel="nofollow">https://news.ycombinator.com/item?id=14805395</a>

评论 #15680216 未加载

评论 #15680206 未加载

8 comments

JamesMcMinnover 7 years ago

评论 #15677315 未加载

ameliusover 7 years ago

评论 #15677627 未加载

wycover 7 years ago

stochastic_monkover 7 years ago

评论 #15677703 未加载

bootcatover 7 years ago

Certainly its worth taking a look at another javascript library which works for fuzzy search called, fuse, <a href="http://fusejs.io/" rel="nofollow">http://fusejs.io/</a>.

jihadjihadover 7 years ago

johnhenryover 7 years ago

Lunr, named to correspond to Apache Solr, is also worth a look. <a href="https://lunrjs.com/" rel="nofollow">https://lunrjs.com/</a>

hnrcover 7 years ago

<a href="https://news.ycombinator.com/item?id=14805395" rel="nofollow">https://news.ycombinator.com/item?id=14805395</a>

评论 #15680216 未加载

评论 #15680206 未加载

The math and algorithms behind a search library

8 comments

The math and algorithms behind a search library

8 comments