A friend of mine is working on a custom memory controller for specialized devices at Google Cloud. The software people are putting tremendous pressure on the hardware people to make sure the memory controller handles all common power-of-two strides well, in addition to random strides.

There are some (very primitive by software standards) "hash functions" implemented in hardware to make that work. IIRC, at the time they treated the relevant bits of an address as a vector over GF(2) and multiplied it by a certain hardcoded invertible matrix. (Of course that matrix needs properties beyond being invertible.) The idea is that if a single bit in the input address changes, every output bit should be equally likely to flip or stay the same (the avalanche effect).
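To make that concrete, here's a minimal sketch of what such a GF(2) hash might look like if you modeled it in software. This assumes GCC/Clang's __builtin_parity, and the matrix constants below are made up; a real design would have to verify the matrix is invertible and has good avalanche behavior:

    #include <stdint.h>

    /* Sketch of a GF(2) address hash: each output bit is the parity (XOR)
     * of the input address bits selected by one row of a hardcoded bit
     * matrix.  The row values here are arbitrary placeholders. */
    static uint16_t hash_index_bits(uint16_t addr_bits)
    {
        /* One 16-bit row per output bit; row i selects which input bits
         * feed output bit i. */
        static const uint16_t matrix[16] = {
            0x9d2f, 0x4b71, 0xe8a3, 0x35c6, 0x7a19, 0xc4e2, 0x5b38, 0xa671,
            0x1f4c, 0x83d5, 0x6e92, 0xd027, 0x49b8, 0xb153, 0x2c7e, 0xf605,
        };
        uint16_t out = 0;
        for (int i = 0; i < 16; i++) {
            /* parity(addr_bits AND row) = dot product over GF(2) */
            out |= (uint16_t)(__builtin_parity(addr_bits & matrix[i]) << i);
        }
        return out;
    }

Each matrix row just says which input bits get XORed together to form one output bit, which is why this is cheap to do in hardware: it's a tree of XOR gates per output bit.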
I can't tell you how many large power-of-two-stride loops I've seen. I'd propose that CPU (or compiler) makers make sure those aren't 10x slower than nearby non-power-of-two strides.
I’m confused by the cache graphs: it looks like the direct-mapped cache has multiple lines drawn from a single cache line to different blocks in main memory.

From what I understand, a single cache line wouldn’t hold data from multiple main-memory blocks at once.
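For reference, the index computation I have in mind for a direct-mapped cache is roughly this (the 64-byte line size and 512-line count are just toy numbers to make the question concrete):

    #include <stdint.h>
    #include <stdio.h>

    /* Toy direct-mapped cache geometry: 64-byte lines, 512 lines.
     * Every memory block whose index bits match lands in the same
     * cache line. */
    #define LINE_SIZE  64
    #define NUM_LINES  512

    static unsigned cache_line_index(uintptr_t addr)
    {
        return (addr / LINE_SIZE) % NUM_LINES;
    }

    int main(void)
    {
        /* Addresses exactly NUM_LINES * LINE_SIZE apart collide:
         * all three of these print the same index. */
        printf("%u %u %u\n",
               cache_line_index(0x10000),
               cache_line_index(0x10000 + NUM_LINES * LINE_SIZE),
               cache_line_index(0x10000 + 2 * NUM_LINES * LINE_SIZE));
        return 0;
    }

So many memory blocks can map to the same line, but only one of them is resident there at a time. Is that many-to-one mapping what the arrows in the graph are supposed to show?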
Old memory from the 1990s: some high-end (for the time) CPU had a large, fast L2 cache...which was merely 2(?)-way associative. But to mitigate the performance hit from cache thrashing, it also had a small-ish 64(?)-way associative "helper" D-cache.

(It might have been HP PA-RISC, from an article in Byte Magazine - but that's digging deep into the old memory murk.)

Might anyone recall more?
I have a(nother) question. If you’re iterating through a long array with a stride larger than the cache line size, then every iteration of the loop causes a cache miss. So whether the stride is a power of two or not shouldn’t matter: every a[i]++ misses either way. What gives?
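Concretely, the loop I'm picturing is roughly this (the function name and the assumption of 64-byte cache lines are just placeholders for the sake of the question):

    #include <stddef.h>

    /* Stride is larger than a 64-byte cache line, so each access
     * touches a new line and should miss regardless of whether the
     * stride is, say, 4096 or 4100. */
    void strided_increment(int *a, size_t len, size_t stride)
    {
        for (size_t i = 0; i < len; i += stride)
            a[i]++;
    }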