The article looks great, but a doubling of speed actually seems kind of disappointing (what I'd like to see is something like a "big O" improvement).<p>I have been investigating cache-oblivious data structures for a while. They're impressive, complex, and a bit difficult to fathom: for a cache-oblivious tree, you have to consider both the relational structure created by the pointers and the layout of all those pointers in memory. That said, the article gives a nice overview compared to the academic articles I've looked at.<p>The thing is, in most of the large applications I've tried to optimize, you could squeeze several speed doublings out of fairly easy inner-loop rewrites as soon as you started profiling the app. After that, you got more marginal results from optimizing this and that. Consider: if 25% of the time is spent retrieving data, halving the time to do that yields only a 12.5% reduction in total runtime. Is that 12.5% worth the complexity of implementing a cache-oblivious structure? Even if your application is a full database, I'm not sure the trade-off is worth it.
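That back-of-the-envelope arithmetic is essentially Amdahl's law. A minimal sketch (the 25% fraction and 2x factor here are just the illustrative numbers from above, not measurements):

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of runtime is sped up by `factor` (Amdahl's law)."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

# Halving the 25% of time spent retrieving data:
new_runtime = (1 - 0.25) + 0.25 / 2   # 0.875 of the original -> 12.5% less runtime
speedup = amdahl_speedup(0.25, 2.0)   # ~1.14x overall, despite a 2x local win
```

The point being: the bigger the fraction of time you're *not* optimizing, the more the local gains get diluted.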