Drepper's "What Every Programmer Should Know About Memory" [1] is a good resource on a similar topic. More recently, a series of blog posts [2] analyzed it from a modern perspective.<p>[1] <a href="https://people.freebsd.org/~lstewart/articles/cpumemory.pdf" rel="nofollow">https://people.freebsd.org/~lstewart/articles/cpumemory.pdf</a><p>[2] <a href="https://samueleresca.net/analysis-of-what-every-programmer-should-know-about-memory/" rel="nofollow">https://samueleresca.net/analysis-of-what-every-programmer-s...</a>
In a similar vein, Andrew Kelley, the creator of Zig, gave a nice talk about how to make use of the different speeds of different CPU operations when designing programs: Practical Data-Oriented Design <a href="https://vimeo.com/649009599" rel="nofollow">https://vimeo.com/649009599</a>
In case you are wondering about your cache-line size on a Linux box, you can find it in sysfs.. something like..<p><pre><code> cat /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size</code></pre>
Something I've experienced first hand. Programming the PS3 forced you to do manually what CPU caches do in the background, which is why the PS3 was a pain in the butt for programmers who were used to object-oriented style programming.<p>It forced you to think in terms of: [array of input data -> operation -> array of intermediate data -> operation -> array of final output data]<p>Our OOP game engine had to transform its OOP data into arrays of input data before feeding them into each operation, which meant a lot of unnecessary memory copies. We had to break objects into "operations", which was not intuitive, but it got rid of a lot of those memory copies. Only then did we manage to get decent performance.<p>The good thing is that by doing this we also got an automatic performance increase on the Xbox 360, because we were consciously (unconsciously?) optimizing for cache usage.
I learned so much from this blog and from the discussion. HN is so awesome. +1 for learning about lscpu -C here.<p>A while back I had to create a high speed streaming data processor (not a Spark cluster or similar creatures), but a C program that could sit in-line in a high speed data stream, match specific patterns, and take actions based on the type of pattern that hit. As part of optimizing for speed and throughput, a colleague and I did an obnoxious level of experimentation with read sizes (slurps of data) to minimize I/O wait queues and memory pressure. Being aligned with the cache-line size, either 1x or 2x, was the winner. Good low-level, close-to-the-hardware C fun for sure.
I think cache coherency protocols are less intuitive and less talked about when people discuss caching, so it would be nice to have some discussion on that too.<p>But otherwise this is a good general overview of how caching is useful.
Really cool stuff and a nice introduction, but I'm curious how much modern compilers already do for you. Especially if you shift to the JIT world: what ends up being the difference between code where people optimize for this vs. code written in a style optimized for readability/reuse/etc.?
"On the other hand, data coming from main memory cannot be assumed to be sequential and the data cache implementation will try to only fetch the data that was asked for."<p>Not correct. Prefetching has been around for a while, and is rather important in optimization.