Way back in my 6502 days I entered a competition for the fastest sieve program. My program had self-modifying code running on the zero page. To reset the 8K sieve memory, the program used 24K of memory for the unrolled store instructions.<p>The winning entry left me in the dust: rather than zeroing the memory, the winning program’s array was initialized with a template of the first few primes. There can be solutions that are even faster than the fastest possible zeroing.
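A rough C++ sketch of that template trick (the original was 6502 assembly; the sizes and cutoffs here are my assumptions): resetting becomes a copy of a pre-sieved template, and only the primes not already baked into the template need to be crossed off.

```cpp
#include <cstring>
#include <vector>

constexpr int N = 8192;  // 8K sieve, as in the original competition

// Build a reusable template: a sieve with multiples of the first few
// primes already marked, so each "reset" is a copy rather than a
// zero-everything pass followed by re-sieving the small primes.
std::vector<unsigned char> make_template() {
    std::vector<unsigned char> t(N, 1);          // 1 = possibly prime
    t[0] = t[1] = 0;
    for (int p : {2, 3, 5, 7})
        for (int m = p * p; m < N; m += p) t[m] = 0;
    return t;
}

// Reset by copying the template, then sieve only the remaining primes.
void sieve(unsigned char* s, const unsigned char* tmpl) {
    std::memcpy(s, tmpl, N);                     // replaces the zeroing pass
    for (int p = 11; p * p < N; ++p)
        if (s[p])
            for (int m = p * p; m < N; m += p) s[m] = 0;
}
```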
I’ve always wondered how much CPU time and memory bandwidth is taken up by the OS zeroing out pages before handing them out, as well as programs and libraries clearing chunks of memory.<p>I guess it’s enough that I’m surprised there’s no hardware support for the memory system to handle it by itself on command, without taking up bus bandwidth or CPU cycles. Kind of like old-fashioned REP STOS but handled off-chip, as it were.<p>[Added:] Concerning various instructions for clearing whole cache lines in one go, you still end up with lots of dirty cache lines that have to be written back through L1, L2, ..., RAM (not to mention the stuff that was previously in those cache lines), so there’s still lots of bus bandwidth being consumed.
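For reference, invoking the REP STOS primitive mentioned above looks like this (a sketch using GCC/Clang inline assembly; the fallback path is mine):

```cpp
#include <cstddef>
#include <cstring>

// x86's microcoded bulk store: fill `count` bytes at `dst` with `val`.
// Modern CPUs special-case this pattern ("enhanced REP MOVSB/STOSB"),
// but the stores still flow through the cache hierarchy, which is the
// point being made above. Falls back to memset off x86-64.
static void bulk_store(void* dst, unsigned char val, std::size_t count) {
#if defined(__x86_64__)
    // rep stosb: store AL to [RDI], RCX times, incrementing RDI.
    __asm__ volatile("rep stosb"
                     : "+D"(dst), "+c"(count)
                     : "a"(val)
                     : "memory");
#else
    std::memset(dst, val, count);
#endif
}
```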
If you're routinely zeroing this much memory and the performance matters, you might benefit from idle-zeroing it. That is, when you need to zero the massive block, just switch to a different block that has already been zeroed or partly-zeroed in the background. Whatever hasn't already been zeroed, finish synchronously. The background thread doing the zeroing would be scheduled with the lowest priority, so that it only runs when the system otherwise has nothing to do.<p>At first, I thought you might just want to get fresh pages from the kernel (which are always zeroed), but this answer convinced me that might not actually be faster because of the overhead from syscalls and fiddling with virtual memory <a href="https://stackoverflow.com/questions/49896578/fastest-way-to-zero-pages-in-linux" rel="nofollow">https://stackoverflow.com/questions/49896578/fastest-way-to-...</a> . And Linux doesn't idle-zero or pre-zero pages (though I believe there's a switch to enable pre-zeroing for purposes of security hardening), so you're probably gonna end up with the OS doing a synchronous[1] zeroing anyway.<p>[1] Synchronous from when you actually first write to each page. My understanding is that when your process gets new pages, they're all mapped to a special zero-page and set to copy-on-write. So there is still some efficiency here in theory: you don't have a long wait for the entire range to be zeroed all at once and you never have to zero pages that you don't modify.
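A minimal sketch of that idle-zeroing scheme (the class and names are mine, not from the thread). A real version would lower the zeroing thread's priority with an OS-specific call such as nice() or SetThreadPriority(), and block on a condition variable instead of polling:

```cpp
#include <cstring>
#include <deque>
#include <mutex>
#include <vector>

// Blocks cycle: acquire() -> use -> release() -> background zeroing -> ready.
class ZeroPool {
public:
    explicit ZeroPool(std::size_t block_size) : size_(block_size) {}

    // Get a zeroed block; fall back to synchronous zeroing if none is ready.
    std::vector<char> acquire() {
        {
            std::lock_guard<std::mutex> lk(mu_);
            if (!ready_.empty()) {
                auto b = std::move(ready_.front());
                ready_.pop_front();
                return b;
            }
        }
        return std::vector<char>(size_, 0);   // synchronous fallback
    }

    // Hand a used block back; the idle thread will zero it later.
    void release(std::vector<char> b) {
        std::lock_guard<std::mutex> lk(mu_);
        dirty_.push_back(std::move(b));
    }

    // Body of the low-priority background thread: zero one dirty block
    // if there is one. Returns false when there is nothing to do.
    bool zero_one() {
        std::vector<char> b;
        {
            std::lock_guard<std::mutex> lk(mu_);
            if (dirty_.empty()) return false;
            b = std::move(dirty_.front());
            dirty_.pop_front();
        }
        std::memset(b.data(), 0, b.size());   // done outside the lock
        std::lock_guard<std::mutex> lk(mu_);
        ready_.push_back(std::move(b));
        return true;
    }

private:
    std::size_t size_;
    std::mutex mu_;
    std::deque<std::vector<char>> ready_, dirty_;
};
```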
This goes to show that sometimes it really pays to read the docs: the doc comment of "fill" says "For char types filling contiguous areas of memory, this becomes an inline call to @c memset or @c wmemset"...
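A quick way to see what that doc comment promises: for a char-typed range, both of these functions should end up as a memset call (easy to confirm on Compiler Explorer):

```cpp
#include <algorithm>
#include <cstring>
#include <vector>

// Per the doc comment quoted above, libstdc++ lowers std::fill over
// contiguous char ranges to an inline memset, so these two should
// compile to essentially the same code for large buffers.
void clear_with_fill(std::vector<char>& v) {
    std::fill(v.begin(), v.end(), 0);
}

void clear_with_memset(std::vector<char>& v) {
    std::memset(v.data(), 0, v.size());
}
```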
Interestingly, the std::array::fill member function generates identical code for int and char; I suppose that's because there's only one overload of fill and it has to take the element type. No idea if the generated stosq is as fast as built-in memset: <a href="https://godbolt.org/z/4iYGup" rel="nofollow">https://godbolt.org/z/4iYGup</a>
Daniel Lemire compares std::fill in C++ with memset in C, and his findings agree with Travis Downs's: <a href="https://lemire.me/blog/2020/01/20/filling-large-arrays-with-zeroes-quickly-in-c/" rel="nofollow">https://lemire.me/blog/2020/01/20/filling-large-arrays-with-...</a>
another day, another reason I am glad I only know <i>some</i> C++ idioms. If I had a byte-oriented block of data, it would never occur to me to use std::fill() ... because it would never occur to me to use std::fill() for anything at all ! :)