Use fast data algorithms (2021)

221 points by Ice_cream_suit, about 3 years ago

8 comments

mastax, about 3 years ago
> Try zstd --adapt. This feature automatically adapts the compression ratio as the IO conditions change to make the current optimal tradeoff between CPU and “keeping the pipe fed”.

I didn't know about that, that's neat.

These types of articles come up often, and it's good to proselytize about better algorithms. However, the end of the article hints at an issue. Most of the hashing and compression in my life are done embedded in some system or protocol that I can't easily change. Yeah, Docker and Debian and Firefox should use zstd, but there's not much I can do about it. I may reach for zstd when I'm moving a big file between systems, but I'd have to install it first, and much of the time that's not worthwhile.

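As a small illustration of the "moving a big file" case, a sketch using the third-party zstandard Python bindings (--adapt itself is a feature of the zstd command-line tool, so it is not shown here):

    import zstandard  # third-party bindings: pip install zstandard

    # Stream-compress a large file before shipping it to another machine.
    # Level 3 is the usual default; decompression speed barely depends on it.
    cctx = zstandard.ZstdCompressor(level=3)
    with open("data.bin", "rb") as src, open("data.bin.zst", "wb") as dst:
        cctx.copy_stream(src, dst)
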
oconnor663, about 3 years ago
> For example, one situation where you want an intentionally slow algorithm is when dealing with passwords.

This is true, but there's also more to the story. Modern password hashes like Argon2 force attackers into a "time-memory tradeoff", to try to reduce the advantage that specialized hardware has over the general-purpose computers that human beings use. I find that a lot of folks have memorized a summary like "slow hashes are good", and that's doubly unfortunate, because 1) like the article says, fast hashes are good, and 2) slow hashes in and of themselves are no longer the best we can do for password security. I often wish that password hashes weren't even called "hashes", because it's just a very different problem that they're solving.

> or even just a high number of rounds of SHA-512

Please god no :)

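A minimal sketch of the memory-hard approach described above, assuming the third-party argon2-cffi package; the parameter values are placeholders, not a tuned recommendation:

    from argon2 import PasswordHasher  # third-party: pip install argon2-cffi

    # time_cost, memory_cost (KiB) and parallelism set the time-memory tradeoff;
    # the memory requirement is what erodes the attacker's hardware advantage.
    ph = PasswordHasher(time_cost=3, memory_cost=64 * 1024, parallelism=4)

    encoded = ph.hash("correct horse battery staple")
    ph.verify(encoded, "correct horse battery staple")  # raises on a wrong password
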
HWR_14, about 3 years ago
There are a lot of interesting algorithms, but as the author points out, he had to move the data to a RAM drive to avoid the disk access being the limiting factor. For a lot of use cases, it's not the CPU that is going to limit you.

I did love the anecdote about adding gzip moving the bottleneck to the CPU from the network, and actually slowing down the whole system.

jonstewart, about 3 years ago
These are some of my favorite tricks. SHA2-256 is never “fast”, but it can be really slow if you're using a non-SIMD implementation (OpenSSL uses SIMD).

A trick not mentioned here: for Python devs, import "orjson" instead of the json standard library; it is usually a drop-in replacement.

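A minimal sketch of the swap; the one common caveat is that orjson.dumps returns bytes where json.dumps returns str, so it is not always a literal drop-in:

    import json
    import orjson  # third-party: pip install orjson

    record = {"id": 42, "tags": ["fast", "hashing"]}

    blob = orjson.dumps(record)   # returns bytes
    text = json.dumps(record)     # returns str
    assert orjson.loads(blob) == json.loads(text)
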
ur-whale, about 3 years ago
There's always a huge focus on speed in these types of articles, and rightfully so for most applications.

However, I often find myself in situations where I care neither about speed nor any other of the traditional performance metrics (memory consumption, latency, bandwidth, parallelizability, etc ...).

In these situations, what I actually care about is:

    a) code size (as in: fits compiled in 512 bytes)
    b) code simplicity (as in: fits in head, takes up around 20 C++ LOC)

Very unfortunately, there are very, very few algorithms that are designed to optimize along these lines.

Exceptions:

TEA: https://en.wikipedia.org/wiki/Tiny_Encryption_Algorithm

Speck: https://en.wikipedia.org/wiki/Speck_(cipher)

Both of these are ciphers, and both can be perverted into becoming hashes for various scenarios.

But there aren't any compression or *native* crypto-hard hashing algorithms that I know of that are specifically designed to optimize along that particular dimension.

Blake3 or zstd are *large* pieces of code.

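To make the "fits in head" point concrete, a rough Python transliteration of TEA encryption (the reference version is a dozen lines of C; this sketch is for illustration, not for real use):

    MASK = 0xFFFFFFFF    # keep arithmetic in 32 bits, as the C original does
    DELTA = 0x9E3779B9   # key schedule constant derived from the golden ratio

    def tea_encrypt(v0: int, v1: int, key: tuple) -> tuple:
        """Encrypt one 64-bit block (two 32-bit halves) with a 128-bit key (four 32-bit words)."""
        k0, k1, k2, k3 = key
        total = 0
        for _ in range(32):  # 32 rounds
            total = (total + DELTA) & MASK
            v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK
            v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK
        return v0, v1

    # Example: tea_encrypt(0x01234567, 0x89ABCDEF, (0xA56BABCD, 0x00000000, 0xFFFFFFFF, 0xABCDEF01))
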
jorangreef, about 3 years ago
This is a great list. To add two categories:

If you need Bloom filters, then use split block Bloom filters [1], which use SIMD to increase speed by 30%-450%.

If you need erasure coding, then use Cauchy Reed-Solomon instead of Reed-Solomon. Cauchy Reed-Solomon uses pure XOR, so you can do erasure coding at the speed of per-core memory bandwidth.

[1] https://arxiv.org/abs/2101.01719

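A rough, scalar Python sketch of the split block construction: one 256-bit block per key, one bit set in each of the block's eight 32-bit words, so a lookup touches a single cache line. The multiplier constants and class name are illustrative assumptions rather than values from the paper, key_hash is assumed to be a 64-bit hash of the key, and real implementations compute all eight words in one SIMD step.

    SALTS = [0x47B6137B, 0x44974D91, 0x8824AD5B, 0xA2B7289D,
             0x705495C7, 0x2DF1424B, 0x9EFC4947, 0x5C6BFB31]

    class SplitBlockBloom:
        def __init__(self, num_blocks: int):
            # Each block is eight 32-bit words = 256 bits = one cache line.
            self.blocks = [[0] * 8 for _ in range(num_blocks)]

        def _masks(self, h: int):
            # Derive one bit position (0-31) per word from the hash.
            return [1 << (((h * s) & 0xFFFFFFFF) >> 27) for s in SALTS]

        def add(self, key_hash: int):
            block = self.blocks[key_hash % len(self.blocks)]
            for i, m in enumerate(self._masks(key_hash >> 32)):
                block[i] |= m

        def contains(self, key_hash: int):
            block = self.blocks[key_hash % len(self.blocks)]
            return all(block[i] & m for i, m in enumerate(self._masks(key_hash >> 32)))
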
theden, about 3 years ago
AWS S3 right now supports CRC32, CRC32C, SHA-1, and SHA-256 (https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html), which is interesting given that the announcement to support the four wasn't long ago (https://aws.amazon.com/blogs/aws/new-additional-checksum-algorithms-for-amazon-s3/). It seems like they went with what's more widely used rather than what's actually faster?

getup8, about 3 years ago
We are considering using MD5 for user id + salt hashing for randomizing users for A/B testing. Should blake3 or xxhash work as a replacement for that use case?

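For that kind of deterministic bucketing any fast hash is sufficient; a small sketch using the standard library's BLAKE2 (blake3 or xxhash via their packages would be used the same way), with made-up function names, ids, and bucket counts:

    import hashlib

    def ab_bucket(user_id: str, salt: str, num_buckets: int = 100) -> int:
        """Deterministically map a user id + salt to a bucket; no slow password hash needed."""
        digest = hashlib.blake2b(f"{salt}:{user_id}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % num_buckets

    # The same user and salt always land in the same bucket.
    variant = "B" if ab_bucket("user-12345", "experiment-42") < 50 else "A"
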