TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

The Hunt for the Fastest Zero

133 pointsby nikbackmover 5 years ago

15 comments

heisenbitover 5 years ago
Way back in my 6502 days I entered a competition for the fastest sieve program. My program had self modifying code running on the zero page. To reset the 8K memory the program employed 24k memory for the store instructions.<p>The winning entry left me in the dust - rather than zeroing the memory in the winning program’s array was initialized with a template of the first few primes. There can be solutions that are even faster than the fastest possible zeroing.
drfuchsover 5 years ago
I’ve always wondered how much CPU time and memory bandwidth is taken up by the OS zeroing out pages before handing them out, as well as programs and libraries clearing chunks of memory.<p>I guess it’s enough that I’m surprised that there’s no hardware support for the memory system to support a way to handle it by itself on command, without taking up bus bandwidth or CPU cycles. Kind of like old-fashioned REP STOS but handled off-chip, as it were.<p>[Added:] Concerning various instructions for clearing whole cache lines in one go, you still end up with lots of dirty cache lines that have to be sent to L1, L2, ..., RAM (not to mention the stuff that was previously in those cache lines), so there’s still lots of bus bandwidth being consumed.
评论 #22105651 未加载
评论 #22110334 未加载
评论 #22105595 未加载
评论 #22105603 未加载
评论 #22105489 未加载
评论 #22105392 未加载
wildmusingsover 5 years ago
If you&#x27;re routinely zeroing this much memory and the performance matters, you might benefit from idle-zeroing it. That is, when you need to zero the massive block, just switch to a different block that has already been zeroed or partly-zeroed in the background. Whatever hasn&#x27;t already been zeroed, finish synchronously. The background thread doing the zeroing would be scheduled with the lowest priority, so that it only runs when the system otherwise has nothing to do.<p>At first, I thought you might just want to get fresh pages from the kernel (which are always zeroed), but this answer convinced me that might not actually be faster because of the overhead from syscalls and fiddling with virtual memory <a href="https:&#x2F;&#x2F;stackoverflow.com&#x2F;questions&#x2F;49896578&#x2F;fastest-way-to-zero-pages-in-linux" rel="nofollow">https:&#x2F;&#x2F;stackoverflow.com&#x2F;questions&#x2F;49896578&#x2F;fastest-way-to-...</a> . And Linux doesn&#x27;t idle-zero or pre-zero pages (though I believe there&#x27;s a switch to enable pre-zeroing for purposes of security hardening), so you&#x27;re probably gonna end up with the OS doing a synchronous[1] zeroing anyway.<p>[1] Synchronous from when you actually first write to each page. My understanding is that when your process gets new pages, they&#x27;re all mapped to a special zero-page and set to copy-on-write. So there is still some efficiency here in theory: you don&#x27;t have a long wait for the entire range to be zeroed all at once and you never have to zero pages that you don&#x27;t modify.
评论 #22106816 未加载
评论 #22108111 未加载
评论 #22105608 未加载
评论 #22111593 未加载
wruzaover 5 years ago
<p><pre><code> jmp memset </code></pre> Classic plot twist.
rob74over 5 years ago
This goes to show that sometimes it really pays to read the docs: the doc comment of &quot;fill&quot; says &quot;For char types filling contiguous areas of memory, this becomes an inline call to @c memset or @c wmemset&quot;...
评论 #22105526 未加载
评论 #22108276 未加载
评论 #22105342 未加载
eb0laover 5 years ago
&gt; If this were C, we would probably reach for memset<p>Actually I was thinking about bzero().<p>Seeing memset() made me smile :-)
评论 #22105673 未加载
评论 #22105429 未加载
评论 #22106807 未加载
BeeOnRopeover 5 years ago
Author here, happy for any feedback or questions.
评论 #22105530 未加载
评论 #22105947 未加载
评论 #22107662 未加载
thestoicattackover 5 years ago
Interestingly the std::array::fill member function is identical in the case of int or char, I suppose because there&#x27;s only one overload of fill and it has to take the element type. No idea if the generated stosq is as fast as built-in memset: <a href="https:&#x2F;&#x2F;godbolt.org&#x2F;z&#x2F;4iYGup" rel="nofollow">https:&#x2F;&#x2F;godbolt.org&#x2F;z&#x2F;4iYGup</a>
评论 #22108239 未加载
评论 #22108203 未加载
sradmanover 5 years ago
Daniel Lemire compares std::fill in C++ with memset in C in agreement with Travis Down <a href="https:&#x2F;&#x2F;lemire.me&#x2F;blog&#x2F;2020&#x2F;01&#x2F;20&#x2F;filling-large-arrays-with-zeroes-quickly-in-c&#x2F;" rel="nofollow">https:&#x2F;&#x2F;lemire.me&#x2F;blog&#x2F;2020&#x2F;01&#x2F;20&#x2F;filling-large-arrays-with-...</a>
PeterHacker123over 5 years ago
Thumbs up, but at least on macOS both is equally fast :-(
评论 #22101927 未加载
pharringtonover 5 years ago
Types are meaningful. At least to me, the discovery would have been to be mindful of your types when invoking idioms you&#x27;ve learned.
评论 #22107368 未加载
PaulDavisThe1stover 5 years ago
another day, another reason I am glad I only know <i>some</i> C++ idioms. If I had a byte-oriented block of data, it would never occur to me to use std::fill() ... because it would never occur to me use std::fill() for anything at all ! :)
gumbyover 5 years ago
As an anonymous `mmap()` returns zeroed pages it&#x27;s likely the fastest mechanism for large arrays.
评论 #22107261 未加载
评论 #22107854 未加载
评论 #22107276 未加载
underdeserverover 5 years ago
Another leaky abstraction.
aleccoover 5 years ago
Discussion on cpp subreddit <a href="https:&#x2F;&#x2F;www.reddit.com&#x2F;r&#x2F;cpp&#x2F;comments&#x2F;erialk&#x2F;the_hunt_for_the_fastest_zero&#x2F;" rel="nofollow">https:&#x2F;&#x2F;www.reddit.com&#x2F;r&#x2F;cpp&#x2F;comments&#x2F;erialk&#x2F;the_hunt_for_th...</a>