科技回声

6 条评论

This article is a mess, and people need to stop voting it up. For example, it contains this perplexing passage:"Let’s say we have the string “hello world”, and let’s assume that its hash is hash(‘hello world’) = 12345. So if hash(‘he’) = 1 we can say that the pattern “he” is contained in the text “hello world”"What the heck is he talking about here?His hash function is O(m), where m is the length of the pattern being searched for. This makes the implementation trivially O(mn), which he notes, writing in another passage:"The Rabin-Karp algorithm has the complexity of O(nm) where n, of course, is the length of the text, while m is the length of the pattern. [...] However it’s considered that Rabin-Karp’s complexity is O(n+m) in practice"Next time I submit a code review, I'm going to "consider" my complexity this way.This isn't a good guide to Rabin-Karp, which is effectively discussed on Wikipedia, in Cormen, etc. It is probably a very effective way of attracting ad dollars for the banners all over the page.Avoid, avoid.

评论 #3811304 未加载

albertzeyer大约 13 年前

It lacks an important detail: For any random hash function, the complexity is O(n * m) as in the naive implementation. You need a rolling hash function where you can get the next hash value of the next sub-string in constant type.The implementation is also wrong. First of all, it has that detail wrong, thus it is O(n * m). But the wrong part is the comparison; it just compares the hash and not the value itself.More about Rabin-Karp: <a href="http://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm" rel="nofollow">http://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm</a>(Btw., are those tags auto-generated? It's funny that it got the 'rolling hash' tag but the article doesn't mention it at all.)

kevingadd大约 13 年前

No discussion of whether it's actually faster (no benchmarks, etc) other than some handwaving about algorithmic complexity. Meh. :(Clever algorithm, but I would be shocked if it were actually faster than Boyer-Moore or KMP on a modern computer for average search cases.

评论 #3809865 未加载

tomkinstinch大约 13 年前

We shouldn't forget about Boyer-Moore or Knuth-Morris-Pratt either.<a href="http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm" rel="nofollow">http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_sear...</a><a href="http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm" rel="nofollow">http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pr...</a>

评论 #3809954 未加载

perlgeek大约 13 年前

If I want to match multiple patterns at the same time, wouldn't it be more efficient to build a (possibly tagged) DFA from them? Then the performance would be independent from the number of patterns to match (modulo some startup cost to construct the DFA in the first place).

Faster string searching and matching with Rabin-Karp algorithm

6 条评论

Faster string searching and matching with Rabin-Karp algorithm

6 条评论