Up until now, the gap between processor and memory speed has been growing. Do you know of any hardware advances that will shrink this gap sometime in the future?

I'm thinking about data structures that are currently outperformed by smarter, cache-line-friendly alternatives: for instance, balanced mutable binary search trees vs. clever trie or B-tree implementations. These index structures are of course just one example of the wide range of algorithms that have to take cache lines into account nowadays.

Do you think these "old" algorithms will see some kind of renaissance in the future?
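To illustrate what I mean by cache-line-friendly, here's a minimal C sketch of the layout difference (sizes are illustrative; it assumes 64-byte cache lines and 8-byte keys):

```c
#include <stdint.h>

/* Classic balanced BST: each comparison chases a pointer, so a
 * lookup over N keys costs roughly log2(N) dependent cache misses. */
struct bst_node {
    int64_t key;
    struct bst_node *left;
    struct bst_node *right;
};

/* Cache-line-friendly B-tree node: seven keys packed into one
 * 64-byte line, so a single memory fetch serves several
 * comparisons and the tree is far shallower for the same N. */
#define BTREE_KEYS 7
struct btree_node {
    int64_t keys[BTREE_KEYS];   /* 56 bytes                         */
    uint8_t nkeys;              /* 1 byte; padding fills the line   */
    /* child pointers would live in a separate array or arena       */
};

_Static_assert(sizeof(struct btree_node) == 64, "one cache line");
```

The BST pays about one miss per comparison, while the packed node amortizes one miss over several keys; that amortization is exactly what the latency gap rewards.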
The gap will probably widen. Bandwidth is increasing in some cases (HBM, Apple's wide unified memory) but latency is not getting any lower, and there's nothing on the horizon that would lower it. Unless your data is small enough that it fits completely in L3...
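You can see the latency wall directly with a dependent pointer chase. A rough POSIX C sketch (the working-set size and RNG are illustrative choices of mine): each load depends on the previous one, so throughput is bounded by memory latency, not bandwidth.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Small self-contained xorshift64 RNG, so we don't depend on RAND_MAX. */
static uint64_t rng = 0x9E3779B97F4A7C15ull;
static uint64_t xorshift64(void) {
    rng ^= rng << 13; rng ^= rng >> 7; rng ^= rng << 17;
    return rng;
}

int main(void) {
    size_t n = (size_t)1 << 26;           /* 512 MiB of pointers: well past L3 */
    size_t *chain = malloc(n * sizeof *chain);
    if (!chain) return 1;
    for (size_t i = 0; i < n; i++) chain[i] = i;
    /* Sattolo's algorithm: a random single-cycle permutation, so the
     * chase visits every slot and the prefetcher can't predict it. */
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64() % i);
        size_t t = chain[i]; chain[i] = chain[j]; chain[j] = t;
    }
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t p = 0;
    for (size_t i = 0; i < n; i++) p = chain[p];  /* serial dependent loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
    printf("%.1f ns per dependent load (p=%zu)\n", ns / (double)n, p);
    free(chain);
    return 0;
}
```

Shrink n until the array fits in L3 and the per-load time should drop by an order of magnitude; that cliff is the gap we're talking about.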
The way to close the gap is incompatible with the way we're all used to programming. Alternative computing architectures are required to get there.

Imagine a RISC CPU with a chunk of static RAM, running slowly (100 MHz) and synchronously, with no need for any of the usual tweaks like caches, pipelines, branch prediction, etc. Such a CPU would be very small, and if you sized the RAM correctly, you could tile many of them into a grid that would more than make up for the slow clock rate, and still be fairly low power.

In the end, though, I think we'll end up programming arrays of 4x4-bit look-up tables that talk to their neighbors. But not directly: through a few layers of optimizing compilers. The advantage is that it could be fast and fault tolerant, and it would all run in parallel. You could compute as fast as you could shovel data in one end and out the other.
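To make the LUT-grid idea concrete, here's a toy C simulation; the cell wiring and update rule are invented assumptions for illustration, not any real architecture. Each cell takes one bit from each of its four neighbors (16 possible inputs) and a 16-entry table maps that to the 4 bits it drives back out, one per edge.

```c
#include <stdio.h>
#include <stdint.h>

#define W 8
#define H 8

/* out[buf][y][x] holds the 4 output bits of cell (x,y):
 * bit 0 = north, 1 = east, 2 = south, 3 = west. Double-buffered
 * so every cell updates synchronously, in parallel. */
static uint8_t out[2][H][W];
static uint8_t lut[H][W][16];   /* per-cell 4-in/4-out truth table */

static uint8_t bit(uint8_t v, int b) { return (v >> b) & 1; }

static void step(int cur) {
    int nxt = cur ^ 1;
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++) {
            /* Gather the bit each neighbor drives toward us (torus wrap). */
            uint8_t n = bit(out[cur][(y + H - 1) % H][x], 2);
            uint8_t e = bit(out[cur][y][(x + 1) % W], 3);
            uint8_t s = bit(out[cur][(y + 1) % H][x], 0);
            uint8_t w = bit(out[cur][y][(x + W - 1) % W], 1);
            uint8_t in = (uint8_t)(n | e << 1 | s << 2 | w << 3);
            out[nxt][y][x] = lut[y][x][in] & 0xF;
        }
}

int main(void) {
    /* "Compile" a toy program: every cell broadcasts the parity
     * of its four inputs on all edges. */
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            for (int in = 0; in < 16; in++) {
                uint8_t p = (uint8_t)((in ^ (in >> 1) ^ (in >> 2) ^ (in >> 3)) & 1);
                lut[y][x][in] = p ? 0xF : 0x0;
            }
    out[0][0][0] = 0xF;                 /* seed one cell */
    for (int t = 0; t < 4; t++) step(t & 1);
    printf("cell(0,0) outputs after 4 steps: 0x%X\n", out[0][0][0]);
    return 0;
}
```

Every cell updates from the previous buffer in lockstep, so the whole fabric advances one "wavefront" per clock: that's where the "compute as fast as you can shovel data through" property comes from, and the optimizing compilers' job would be mapping ordinary programs onto those per-cell tables.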