This is a good-looking hash; just from the construction I can see that it should be extremely fast for bulk hashing of large objects, with strong statistical properties.<p>I do want to comment for the wider audience on why the details of this construction do not lend themselves to small keys, and how you would fix that. I have an (unpublished, I'm lazy) AES-based hash function that is very fast for both bulk hashing and small keys, which required a mostly performance-neutral tweak to the bulk hashing loop (it likely fits in an unused ALU execution port).<p>Large-key performance is entirely in the bulk loop algorithm (an AES round in this case), and small-key performance is entirely in the mixer/finalizer algorithm. If you have a very wide bulk loop with no mixing of the individual lanes, it usually requires a deep mixing stage that makes small keys very expensive, since that is a fixed overhead. The question then becomes: how do you cheaply "bank" dispersal of bits across lanes in the bulk loop to shorten the mixing stage <i>without</i> creating an operation dependency between lanes that would massively reduce throughput?<p>As an important observation, vectorizing the AES lanes makes it difficult to disperse bits because there are no good, cheap operations that move bits between lanes. However, the CPU will happily run multiple AES operations and other 128-bit operations concurrently across ALU ports if they are in independent registers. This lets you trivially create a "mixing" lane that exists solely to aid bit dispersal, because you are no longer trying to do sideways operations on a vector. So what does a mixing lane look like that doesn't create dependencies on your hashing lanes? It is so simple it is embarrassing: XOR the input data lanes.
The mixing lane is worthless as a hash but it only exists to represent all the bits from all the lanes.<p>At the mixing stage, you simply fold the mixing lane into each of the hashing lanes by running it through a round of AES -- one concurrent operation. This allows you to avoid the deep mixing stage altogether, which means small keys can be hashed very quickly since almost all of the computation is in the finalizer. Using this technique, I've been able to hash small keys in <30 cycles with concurrent AES lanes.
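The scheme described above can be sketched in plain C. To be clear, this is my own illustration, not actual code from either hash: `mix()` is a hypothetical stand-in for an AESDEC round (the real thing would use `_mm_aesdec_si128` on 128-bit lanes), and the seeds and constants are arbitrary.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for one AES round; any decent mixing step
   illustrates the data flow (real code would use _mm_aesdec_si128). */
static uint64_t mix(uint64_t state, uint64_t input)
{
    state ^= input;
    state *= 0x9E3779B97F4A7C15ULL;   /* arbitrary odd constant */
    return state ^ (state >> 29);
}

#define LANES 4

uint64_t hash_with_mixing_lane(const uint64_t *blocks, size_t nblocks)
{
    uint64_t lane[LANES] = {1, 2, 3, 4};  /* arbitrary seeds */
    uint64_t mixing = 0;                  /* the extra "mixing" lane */

    for (size_t i = 0; i < nblocks; ++i) {
        /* Each hashing lane advances independently... */
        lane[i % LANES] = mix(lane[i % LANES], blocks[i]);
        /* ...while the mixing lane just XORs the raw input: it never
           reads the hashing lanes, so it creates no dependency on them. */
        mixing ^= blocks[i];
    }

    /* Short finalizer: fold the mixing lane into every hashing lane.
       The folds are independent of each other, so a CPU can run them
       concurrently across ALU ports. */
    uint64_t h = 0;
    for (int j = 0; j < LANES; ++j)
        h ^= mix(lane[j], mixing);
    return h;
}
```

The point of the sketch is the shape of the dependency graph: the mixing lane costs one XOR per block, and the finalizer is one fold per lane instead of a deep cross-lane mixing cascade.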
<i>To our surprise, we found a lack of published, well-optimized, large-data hash functions. Most hash work seems to focus on small input sizes (for things like dictionary lookup) or on cryptographic quality. We wanted the fastest possible hash that would be collision-free in practice (like SHA-1 was), and we didn’t need any cryptograhic [sic] security.</i><p>This is an important passage that could be explained in more detail. I can look up a list of non-cryptographic hash functions, nine of which support 128-bit output:<p><a href="https://en.wikipedia.org/wiki/List_of_hash_functions#Non-cryptographic_hash_functions" rel="nofollow">https://en.wikipedia.org/wiki/List_of_hash_functions#Non-cry...</a><p>Given that the claim is that Meow Hash is fast, where's the benchmarking to prove it's faster than the alternatives?
Late to the party, some actual benchmarks using an older CPU with AES-NI:
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
Kernel 4.18.14
GCC 7.3.0<p>Short version: it's ~2x as fast as anything else on my machine for data > 4k (maybe down to 1k), all the way up past cache size (where various other algorithms also max out memory bandwidth). Other non-cryptographic hashes (for comparison) on my local machine: xxHash64 -> ~13G/s, Metrohash128 -> ~13G/s, Metrohash128_crc -> ~18G/s, t1ha_crc -> ~17.5G/s, etc., but Meow doesn't break the 1G/s barrier until 64-96 bytes (the others mentioned reach that speed at 8 bytes).<p>So yeah, faster for longer keys due to the fast AES mixing + parallel pipeline usage, but slower for shorter keys due to the final mixing dependency at the end.<p>Using a hacked-up version of SMHasher for a wider range of hashing sizes (cache on this CPU is 20 megs):<p><pre><code> --- Testing Meow1_64 "parallel aes internals"
[[[ Speed Tests ]]]
Bulk speed test - 67108864-byte keys
Average - 4.556 bytes/cycle - 14337.09 MiB/sec @ 3 ghz
Bulk speed test - 16777216-byte keys
Average - 7.882 bytes/cycle - 24805.52 MiB/sec @ 3 ghz
Bulk speed test - 4194304-byte keys
Average - 9.813 bytes/cycle - 30884.30 MiB/sec @ 3 ghz
Bulk speed test - 1048576-byte keys
Average - 9.657 bytes/cycle - 30390.48 MiB/sec @ 3 ghz
Bulk speed test - 262144-byte keys
Average - 10.642 bytes/cycle - 33490.19 MiB/sec @ 3 ghz
Bulk speed test - 65536-byte keys
Average - 11.871 bytes/cycle - 37360.76 MiB/sec @ 3 ghz
Bulk speed test - 16384-byte keys
Average - 11.336 bytes/cycle - 35675.98 MiB/sec @ 3 ghz
Bulk speed test - 4096-byte keys
Average - 8.241 bytes/cycle - 25936.55 MiB/sec @ 3 ghz
Bulk speed test - 1024-byte keys
Average - 4.056 bytes/cycle - 12765.26 MiB/sec @ 3 ghz</code></pre>
I just ran a simple collision test against it with real life, prod data. It resulted in 13 collisions on 116275 files in 512b mode.<p><a href="https://github.com/cmuratori/meow_hash/issues/7#issuecomment-431657191" rel="nofollow">https://github.com/cmuratori/meow_hash/issues/7#issuecomment...</a>
How does it compare to:<p>FarmHash<p>CityHash<p>SMHasher<p>MetroHash<p>FNV1a_YT<p>Note that if you're just using it for a hash table, where you'll compare the full strings to resolve hash collisions, you don't actually need good randomness. FNV1a_YT is faster than most and has worse collision properties, yet provides a faster overall hash table, because the faster hash (paid on every operation) outweighs the extra full-string checks (rare).<p><a href="https://github.com/rurban/smhasher" rel="nofollow">https://github.com/rurban/smhasher</a>
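For reference on why an FNV-style hash can win in a hash table despite weak dispersion: it is about as small as a hash gets, one XOR and one multiply per byte. (FNV1a_YT in SMHasher is a vectorized variant; this is the plain scalar 64-bit FNV-1a.)

```c
#include <stdint.h>
#include <stddef.h>

/* 64-bit FNV-1a: XOR each byte into the state, then multiply by the
   FNV prime. Weak avalanche behavior, but only two cheap ops per byte. */
uint64_t fnv1a64(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = 14695981039346656037ULL;  /* FNV-1a 64-bit offset basis */
    for (size_t i = 0; i < len; ++i) {
        h ^= p[i];
        h *= 1099511628211ULL;             /* FNV 64-bit prime */
    }
    return h;
}
```

With a loop this short, the per-lookup cost is dominated by the table probe itself, which is the commenter's point about total hash-table throughput.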
Yep, unrolled AES instructions are the way to go if you need to hash gigs of stuff and you're on a CPU that supports them.<p>-Austin, SMHasher author.
Never heard of these guys, but they have several hundred hours of video footage of building a game from scratch: <a href="https://www.youtube.com/user/handmadeheroarchive/videos" rel="nofollow">https://www.youtube.com/user/handmadeheroarchive/videos</a>. Looks interesting.
I wonder if speed will still be a reason for doing something like this going forward now that Intel is starting to include SHA acceleration instructions in its processors.
I just happen to need a fast, non-cryptographic checksum for a large amount of data. I assumed CRC32C would be the fastest thing available on x86_64, since SSE 4.2 includes an intrinsic for it.<p>Can I expect Meow Hash to be faster than that on x86_64? What about on other platforms, such as common ARM chips? Which one is more collision resistant?<p>Also: the author seems to be Casey Muratori, who also made the Handmade Hero series! Kudos.
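For anyone comparing against it: CRC32C is the Castagnoli polynomial (reflected form 0x82F63B78). Below is a portable bit-at-a-time reference version; the SSE 4.2 intrinsics (`_mm_crc32_u8`/`_mm_crc32_u64`) compute the same function in hardware, several bytes per cycle.

```c
#include <stdint.h>
#include <stddef.h>

/* CRC-32C (Castagnoli), bitwise reference implementation. A
   hardware-accelerated version would feed 8 bytes at a time to
   _mm_crc32_u64 instead of looping over bits. */
uint32_t crc32c(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= p[i];
        for (int k = 0; k < 8; ++k)
            crc = (crc >> 1) ^ ((crc & 1) ? 0x82F63B78u : 0);
    }
    return ~crc;
}
```

The standard check value is `crc32c("123456789") == 0xE3069283`, which is a quick way to verify any implementation, hardware or software, is really CRC-32C and not plain CRC-32.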
Will this hash be useful for checking whether the content of files of all sizes has changed? I'm thinking about storing the hash of a file in a database to detect when changes have taken place.
<a href="https://github.com/cmuratori/meow_hash/blob/master/meow_hash.h" rel="nofollow">https://github.com/cmuratori/meow_hash/blob/master/meow_hash...</a><p>What do AESRotate(foo,bar) and AESMerge(foo,bar) do?<p>[edit]<p>So it's a simple cyclic rotation preceded by A.Q0 = _mm512_aesdec_epi128(A.Q0, B.Q0)<p>From Googling, _mm512_aesdec_epi128 looks like an Intel intrinsic.
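Going by the name and the edit above, the step appears to be: merge input into each wide lane with an AES round, then cyclically rotate the lanes so every lane eventually sees every input position. A rough plain-C reconstruction of that shape (this is a hypothetical sketch, not the actual source; 64-bit lanes and a stand-in round function replace the real 128-bit `aesdec` lanes):

```c
#include <stdint.h>
#include <stddef.h>

#define LANES 4

/* Hypothetical stand-in for the AES decrypt round the real code uses. */
static uint64_t aes_round_stub(uint64_t state, uint64_t key)
{
    state ^= key;
    state *= 0xFF51AFD7ED558CCDULL;  /* arbitrary mixing constant */
    return state ^ (state >> 33);
}

/* Merge input into each lane, then rotate the lanes by one position so
   that, over successive calls, every lane absorbs every input slot. */
void aes_merge_rotate(uint64_t lane[LANES], const uint64_t in[LANES])
{
    uint64_t merged[LANES];
    for (int i = 0; i < LANES; ++i)
        merged[i] = aes_round_stub(lane[i], in[i]);

    for (int i = 0; i < LANES; ++i)
        lane[i] = merged[(i + 1) % LANES];  /* cyclic rotation */
}
```

The rotation is what slowly disperses bits across lanes without any cross-lane instruction inside a single step.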
> Don’t use the Meow hash for tiny inputs. The minimum block size for the Meow hash is 256 bytes, so if you’re feeding it 8 byte data, 16 byte data, etc., it’s going to waste all of its time padding the input<p>This seems like an important caveat that means it will only be used in specialized applications.
I think releasing it under yet another open-source license, instead of one of the well-known ones, severely reduces the chances of this code being used by companies.