Reminds me of the hilarious column by James Mickens, "The Slow Winter", on how these kinds of improvements used to be easy:<p>"“I wish that we could predict branches more accurately,” and you’d think, “maybe we can leverage three bits of state per branch to implement a simple saturating counter,” and you’d laugh and declare that such a stupid scheme would never work, but then you’d test it and it would be 94% accurate, and the branches would wake up the next morning and read their newspapers and the headlines would say OUR WORLD HAS BEEN SET ON FIRE."<p>PDF: <a href="https://www.usenix.org/system/files/1309_14-17_mickens.pdf" rel="nofollow">https://www.usenix.org/system/files/1309_14-17_mickens.pdf</a>
I just recently took an advanced computer architecture course, and I was stunned by how little I knew about how computers ~actually~ work.<p>Most people think "memory is fast, disk is slow," but the reality is "the caches closest to the core are fast, the lower-level caches are slower, memory is really slow, and stay the hell away from disk." You're taught to think in terms of memory, but the good news is that very intelligent cache designers have made going all the way to memory a relatively infrequent event — and that's one of the only reasons our processors get to do any work at all.
Link to the paper: <a href="http://people.csail.mit.edu/devadas/pubs/acc-hpca14.pdf" rel="nofollow">http://people.csail.mit.edu/devadas/pubs/acc-hpca14.pdf</a>
Cache misses are a major problem that many people don't think about, or shy away from, because you just have to assume they'll happen.<p>A miss in L1 that hits in L2 costs on the order of 10 cycles, and an L2 miss that hits in L3 roughly 10x that. Miss L3 too and you're going all the way to DRAM, which runs 200+ cycles — hundreds of instructions' worth of stall on a wide core.<p>Glad to see we're moving forward.