Use fast data algorithms (2021)

221 points by Ice_cream_suit, about 3 years ago

8 comments

mastax, about 3 years ago
> Try zstd --adapt. This feature automatically adapts the compression ratio as the IO conditions change to make the current optimal tradeoff between CPU and “keeping the pipe fed”.

I didn't know about that, that's neat.

These types of articles come up often, and it's good to proselytize about better algorithms. However, the end of the article hints at an issue. Most of the hashing and compression in my life are done embedded in some system or protocol that I can't easily change. Yeah, Docker and Debian and Firefox should use zstd, but there's not much I can do about it. I may reach for zstd when I'm moving a big file between systems, but I'd have to install it first, and much of the time that's not worthwhile.

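As a small illustration of the "moving a big file" case, a sketch using the third-party zstandard Python bindings (--adapt itself is a feature of the zstd command-line tool, so it is not shown here):

    import zstandard  # third-party bindings: pip install zstandard

    # Stream-compress a large file before shipping it to another machine.
    # Level 3 is the usual default; decompression speed barely depends on it.
    cctx = zstandard.ZstdCompressor(level=3)
    with open("data.bin", "rb") as src, open("data.bin.zst", "wb") as dst:
        cctx.copy_stream(src, dst)
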
oconnor663, about 3 years ago
> For example, one situation where you want an intentionally slow algorithm is when dealing with passwords.

This is true, but there's also more to the story. Modern password hashes like Argon2 force attackers into a "time-memory tradeoff", to try to reduce the advantage that specialized hardware has over the general-purpose computers that human beings use. I find that a lot of folks have memorized a summary like "slow hashes are good", and that's doubly unfortunate, because 1) like the article says, fast hashes are good, and 2) slow hashes in and of themselves are no longer the best we can do for password security. I often wish that password hashes weren't even called "hashes", because it's just a very different problem that they're solving.

> or even just a high number of rounds of SHA-512

Please god no :)

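A minimal sketch of the memory-hard approach described above, assuming the third-party argon2-cffi package; the parameter values are placeholders, not a tuned recommendation:

    from argon2 import PasswordHasher  # third-party: pip install argon2-cffi

    # time_cost, memory_cost (KiB) and parallelism set the time-memory tradeoff;
    # the memory requirement is what erodes the attacker's hardware advantage.
    ph = PasswordHasher(time_cost=3, memory_cost=64 * 1024, parallelism=4)

    encoded = ph.hash("correct horse battery staple")
    ph.verify(encoded, "correct horse battery staple")  # raises on a wrong password
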
HWR_14, about 3 years ago
There are a lot of interesting algorithms, but as the author points out, he had to move the data to a RAM drive to avoid the disk access being the limiting factor. For a lot of use cases, it's not the CPU that is going to limit you.

I did love the anecdote about adding gzip moving the bottleneck to the CPU from the network, and actually slowing down the whole system.

jonstewart, about 3 years ago
These are some of my favorite tricks. SHA2-256 is never “fast”, but it can be really slow if you're using a non-SIMD implementation (OpenSSL uses SIMD).

A trick not mentioned here: for Python devs, import "orjson" instead of the json standard library; it is usually a drop-in replacement.

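A minimal sketch of the swap; the one common caveat is that orjson.dumps returns bytes where json.dumps returns str, so it is not always a literal drop-in:

    import json
    import orjson  # third-party: pip install orjson

    record = {"id": 42, "tags": ["fast", "hashing"]}

    blob = orjson.dumps(record)   # returns bytes
    text = json.dumps(record)     # returns str
    assert orjson.loads(blob) == json.loads(text)
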
ur-whale, about 3 years ago
There's always a huge focus on speed in these types of articles, and rightfully so for most applications.

However, I often find myself in situations where I care neither about speed nor any other of the traditional performance metrics (memory consumption, latency, bandwidth, parallelizability, etc ...).

In these situations, what I actually care about is:

    a) code size (as in: fits compiled in 512 bytes)
    b) code simplicity (as in: fits in head, takes up around 20 C++ LOC)

Very unfortunately, there are very, very few algorithms that are designed to optimize along these lines.

Exceptions:

TEA: https://en.wikipedia.org/wiki/Tiny_Encryption_Algorithm

Speck: https://en.wikipedia.org/wiki/Speck_(cipher)

Both of these are ciphers, and both can be perverted into becoming hashes for various scenarios.

But there aren't any compression or *native* crypto-hard hashing algorithms that I know of that are specifically designed to optimize along that particular dimension.

Blake3 or zstd are *large* pieces of code.

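To make the "fits in head" point concrete, a rough Python transliteration of TEA encryption (the reference version is a dozen lines of C; this sketch is for illustration, not for real use):

    MASK = 0xFFFFFFFF    # keep arithmetic in 32 bits, as the C original does
    DELTA = 0x9E3779B9   # key schedule constant derived from the golden ratio

    def tea_encrypt(v0: int, v1: int, key: tuple) -> tuple:
        """Encrypt one 64-bit block (two 32-bit halves) with a 128-bit key (four 32-bit words)."""
        k0, k1, k2, k3 = key
        total = 0
        for _ in range(32):  # 32 rounds
            total = (total + DELTA) & MASK
            v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK
            v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK
        return v0, v1

    # Example: tea_encrypt(0x01234567, 0x89ABCDEF, (0xA56BABCD, 0x00000000, 0xFFFFFFFF, 0xABCDEF01))
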
jorangreef, about 3 years ago
This is a great list. To add two categories:

If you need Bloom filters, then use split block Bloom filters [1], which use SIMD to increase speed by 30%-450%.

If you need erasure coding, then use Cauchy Reed-Solomon instead of Reed-Solomon. Cauchy Reed-Solomon uses pure XOR, so you can do erasure coding at the speed of per-core memory bandwidth.

[1] https://arxiv.org/abs/2101.01719

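A rough, scalar Python sketch of the split block construction: one 256-bit block per key, one bit set in each of the block's eight 32-bit words, so a lookup touches a single cache line. The multiplier constants and class name are illustrative assumptions rather than values from the paper, key_hash is assumed to be a 64-bit hash of the key, and real implementations compute all eight words in one SIMD step.

    SALTS = [0x47B6137B, 0x44974D91, 0x8824AD5B, 0xA2B7289D,
             0x705495C7, 0x2DF1424B, 0x9EFC4947, 0x5C6BFB31]

    class SplitBlockBloom:
        def __init__(self, num_blocks: int):
            # Each block is eight 32-bit words = 256 bits = one cache line.
            self.blocks = [[0] * 8 for _ in range(num_blocks)]

        def _masks(self, h: int):
            # Derive one bit position (0-31) per word from the hash.
            return [1 << (((h * s) & 0xFFFFFFFF) >> 27) for s in SALTS]

        def add(self, key_hash: int):
            block = self.blocks[key_hash % len(self.blocks)]
            for i, m in enumerate(self._masks(key_hash >> 32)):
                block[i] |= m

        def contains(self, key_hash: int):
            block = self.blocks[key_hash % len(self.blocks)]
            return all(block[i] & m for i, m in enumerate(self._masks(key_hash >> 32)))
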
theden, about 3 years ago
AWS S3 right now supports CRC32, CRC32C, SHA-1, and SHA-256 (https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html), which is interesting given that the announcement to support the four wasn't long ago (https://aws.amazon.com/blogs/aws/new-additional-checksum-algorithms-for-amazon-s3/). It seems like they went with what's more widely used rather than what's actually faster?

getup8, about 3 years ago
We are considering using MD5 for user id + salt hashing for randomizing users for A/B testing. Should blake3 or xxhash work as a replacement for that use case?

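For that kind of deterministic bucketing any fast hash is sufficient; a small sketch using the standard library's BLAKE2 (blake3 or xxhash via their packages would be used the same way), with made-up function names, ids, and bucket counts:

    import hashlib

    def ab_bucket(user_id: str, salt: str, num_buckets: int = 100) -> int:
        """Deterministically map a user id + salt to a bucket; no slow password hash needed."""
        digest = hashlib.blake2b(f"{salt}:{user_id}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % num_buckets

    # The same user and salt always land in the same bucket.
    variant = "B" if ab_bucket("user-12345", "experiment-42") < 50 else "A"
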