科技回声

5 条评论

stabbles大约 1 年前

Regarding "A final optimization", there's another way:`_mm256_cmpeq_epi32` produces `0xFFFFFFFF`, the article suggests shifting to produce a `1` and then add.Instead you can interpret `0xFFFFFFFF` as negative one, and subtract. That saves a shift.Flip the sign when accumulating.In general I think this is a pretty common counting trick. I don't think those shift operations even exists for epi8, so there you really need to use it to avoid reduction to a narrow register. Also, in the case of epi8 you need to deal with overflow, so the pattern is like this in pseudo code:<pre><code> v[1:32] = 0 total = 0 for j = 0 to N / 256 for i = 0 to 255 v[1:32] -= cmpeq(..., ...) end total += sum(v) end</code></pre>

评论 #39653296 未加载

pvg大约 1 年前

Short related discussion a month ago: <a href="https://news.ycombinator.com/item?id=39254176">https://news.ycombinator.com/item?id=39254176</a>

perihelions大约 1 年前

I wonder what a designed-for-SIMD rolling hash would look like. Like—just a trivial toy experiment—suppose hash ↤ (hash * input[j]) << 1. That's a (very crappy) rolling-hash function with a window size of sizeof(hash). It's also algebraically equivalent to hash ↤ hash * (input[j] << 1), which is a parallelizable horizontal-product over the input bytevector. This shows that you can keep the "rolling window" property, without the "evaluates serially from left-to-right" requirement: it's possible to rearrange things algebraically so that the "folding" step can be evaluated in any order.Are there nontrivial constructions with this property?

EdSchouten大约 1 年前

Are there actually practical cases where Rabin-Karp hashing is what dominates the running time of an application? The naive implementation already gives you 0.75 GB/s. Seems pretty fast.

评论 #39650366 未加载

评论 #39650376 未加载

评论 #39650204 未加载

mis1988大约 1 年前

A response to a question posed by Daniel Lemire.

5 条评论

stabbles大约 1 年前

评论 #39653296 未加载

pvg大约 1 年前

Short related discussion a month ago: <a href="https://news.ycombinator.com/item?id=39254176">https://news.ycombinator.com/item?id=39254176</a>

perihelions大约 1 年前

EdSchouten大约 1 年前

Are there actually practical cases where Rabin-Karp hashing is what dominates the running time of an application? The naive implementation already gives you 0.75 GB/s. Seems pretty fast.

Optimizing Rabin-Karp Hashing

5 条评论

Optimizing Rabin-Karp Hashing

5 条评论