Drepper's "What Every Programmer Should Know About Memory" [1] is a good resource on a similar topic. More recently, a series of blog posts [2] analyzed it from a modern perspective.<p>[1] <a href="https://people.freebsd.org/~lstewart/articles/cpumemory.pdf" rel="nofollow">https://people.freebsd.org/~lstewart/articles/cpumemory.pdf</a><p>[2] <a href="https://samueleresca.net/analysis-of-what-every-programmer-should-know-about-memory/" rel="nofollow">https://samueleresca.net/analysis-of-what-every-programmer-s...</a>
In a similar vein, Andrew Kelley, the creator of Zig, gave a nice talk about how to make use of the different speeds of different CPU operations when designing programs: Practical Data-Oriented Design <a href="https://vimeo.com/649009599" rel="nofollow">https://vimeo.com/649009599</a>
In case you are wondering about your cache-line size on a Linux box, you can find it in sysfs.. something like..<p><pre><code> cat /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size</code></pre>
Something I've experienced first hand. Programming the PS3 forced you to do manually what CPU caches do in the background, which is why the PS3 was a pain in the butt for programmers who were used to object-oriented style programming.<p>It forced you to think in terms of: [array of input data -> operation -> array of intermediate data -> operation -> array of final output data]<p>Our OOP game engine had to transform its OOP data into arrays of input data before feeding them into each operation, which meant a lot of unnecessary memory copies. We had to break objects into "operations", which was not intuitive, but it got rid of a lot of those memory copies. Only then did we manage to get decent performance.<p>The good thing is that by doing this we also got an automatic performance increase on the Xbox 360, because we were consciously (unconsciously?) optimizing for cache usage.
I learned so much from this blog and from the discussion. HN is so awesome. +1 for learning about lscpu -C here.<p>A while back I had to create a high speed streaming data processor (not a Spark cluster or similar creatures), but a C program that could sit in-line in a high speed data stream, match specific patterns, and take actions based on the type of pattern that hit. As part of optimizing for speed and throughput, a colleague and I did an obnoxious level of experimentation with read sizes (slurps of data) to minimize I/O wait queues and memory pressure. Being aligned with the cache-line size, either 1x or 2x, was the winner. Good low-level, close-to-the-hardware C fun for sure.
I think cache coherency protocols are less intuitive and less talked about when people discuss caching, so it would be nice to have some discussion on that too.<p>But otherwise this is a good general overview of how caching is useful.
Really cool stuff and a nice introduction, but I'm curious how much modern compilers already do for you. Especially if you shift to the JIT world: what ends up being the difference between code where people optimize for this vs. code written in a style optimized for readability/reuse/etc.?
"On the other hand, data coming from main memory cannot be assumed to be sequential and the data cache implementation will try to only fetch the data that was asked for."<p>Not correct. Prefetching has been around for a while, and is rather important in optimization.