TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Faster string searching and matching with Rabin-Karp algorithm

38 点作者 freeatnet大约 13 年前

6 条评论

ratzkewatzke大约 13 年前
This article is a mess, and people need to stop voting it up. For example, it contains this perplexing passage:<p>"Let’s say we have the string “hello world”, and let’s assume that its hash is hash(‘hello world’) = 12345. So if hash(‘he’) = 1 we can say that the pattern “he” is contained in the text “hello world”"<p>What the heck is he talking about here?<p>His hash function is O(m), where m is the length of the pattern being searched for. This makes the implementation trivially O(mn), which he notes, writing in another passage:<p>"The Rabin-Karp algorithm has the complexity of O(nm) where n, of course, is the length of the text, while m is the length of the pattern. [...] However it’s considered that Rabin-Karp’s complexity is O(n+m) in practice"<p>Next time I submit a code review, I'm going to "consider" my complexity this way.<p>This isn't a good guide to Rabin-Karp, which is effectively discussed on Wikipedia, in Cormen, etc. It is probably a very effective way of attracting ad dollars for the banners all over the page.<p>Avoid, avoid.
评论 #3811304 未加载
albertzeyer大约 13 年前
It lacks an important detail: For any random hash function, the complexity is O(n * m) as in the naive implementation. You need a rolling hash function where you can get the next hash value of the next sub-string in constant type.<p>The implementation is also wrong. First of all, it has that detail wrong, thus it is O(n * m). But the wrong part is the comparison; it just compares the hash and not the value itself.<p>More about Rabin-Karp: <a href="http://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm" rel="nofollow">http://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm</a><p>(Btw., are those tags auto-generated? It's funny that it got the 'rolling hash' tag but the article doesn't mention it at all.)
kevingadd大约 13 年前
No discussion of whether it's actually faster (no benchmarks, etc) other than some handwaving about algorithmic complexity. Meh. :(<p>Clever algorithm, but I would be shocked if it were actually faster than Boyer-Moore or KMP on a modern computer for average search cases.
评论 #3809865 未加载
tomkinstinch大约 13 年前
We shouldn't forget about Boyer-Moore or Knuth-Morris-Pratt either.<p><a href="http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm" rel="nofollow">http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_sear...</a><p><a href="http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm" rel="nofollow">http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pr...</a>
评论 #3809954 未加载
perlgeek大约 13 年前
If I want to match multiple patterns at the same time, wouldn't it be more efficient to build a (possibly tagged) DFA from them? Then the performance would be independent from the number of patterns to match (modulo some startup cost to construct the DFA in the first place).
tocomment大约 13 年前
Does this beat burrows wheeler?
评论 #3809950 未加载
评论 #3810269 未加载