Here's a fun and relevant paper:<p>A Comparison of Cache Aware and Cache Oblivious Static Search Trees Using Program Instrumentation<p><a href="http://www.cs.amherst.edu/~ccm/cs34/papers/ladnerbst.pdf" rel="nofollow">http://www.cs.amherst.edu/~ccm/cs34/papers/ladnerbst.pdf</a>
...This is news? You've got a tight loop with a branch, and depending on which way the branch goes, you're going to touch completely different locations in memory.<p>The solution (at least for some problems) is just to lay your data out in memory better - optimal is a binary search tree in an array, like the way heaps are implemented.<p>That's what I did for bcache: <a href="http://evilpiepirate.org/git/linux-bcache.git/tree/drivers/md/bcache/bset.h#n56" rel="nofollow">http://evilpiepirate.org/git/linux-bcache.git/tree/drivers/m...</a>