I read the presentation here: http://tokutek.com/downloads/mysqluc-2010-fractal-trees.pdf

The math looks a bit hand-wavy to me, but I get the basic data structure: you have sorted arrays whose lengths are each a power of two, so 1, 2, 4, 8, and so on.
When you insert a set of values, each array ends up either completely full or completely empty. For example, with 5 values you'd have a full array of 1 element, an empty array of 2 slots, and a full array of 4 elements.

Each array is sorted internally, but the smaller arrays do not necessarily contain values that are all less than the values in the bigger arrays.

To make lookups efficient, you have forward pointers from each element to the next-bigger element in the next array, so you can start your binary search in the bigger array at a narrowed-down index instead of from scratch.

The cost shows up when you insert lots of values, because you have to merge arrays repeatedly and rewrite a bunch of data. The point, though, is that those merges are sequential I/O: you get better disk throughput than rebalancing a B-tree, because you're not making the disk head seek around all the time.

I'm curious about the details of this benchmark, though. What kind of values are we talking about? Are the rows all the same size, or does the benchmark include variable-sized values (like strings)?

This looks promising, but I'd love more explanation from the authors...
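To make the merge mechanics concrete, here's a minimal Python sketch of this kind of structure (a cache-oblivious lookahead array) as I understand it. The class and method names are mine, and it omits the forward pointers, so lookups fall back to one binary search per level:

    import bisect
    from heapq import merge

    class COLA:
        """Level k holds either 0 or 2**k sorted keys."""

        def __init__(self):
            self.levels = []  # levels[k]: sorted list of 0 or 2**k keys

        def insert(self, key):
            # Like incrementing a binary counter: a full level "carries"
            # into the next one via a sequential (disk-friendly) merge.
            carry = [key]
            for k, level in enumerate(self.levels):
                if not level:
                    self.levels[k] = carry
                    return
                carry = list(merge(level, carry))
                self.levels[k] = []
            self.levels.append(carry)  # all levels were full: grow by one

        def contains(self, key):
            # Without forward pointers this is O(log^2 n):
            # one binary search per level.
            for level in self.levels:
                i = bisect.bisect_left(level, key)
                if i < len(level) and level[i] == key:
                    return True
            return False

After inserting 5 values you get exactly the layout described above:

    c = COLA()
    for x in [5, 3, 8, 1, 9]:
        c.insert(x)
    print(c.levels)  # e.g. [[9], [], [1, 3, 5, 8]] -- sizes 1, 0, 4

Each key gets rewritten O(log n) times over its lifetime, but always as part of a sequential merge, which is the amortized-I/O win over in-place B-tree node updates.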
Please note that the actual name for a "fractal tree" in the research literature is "streaming (cache-oblivious) B-tree" (or at least they're very closely related).

Writing code with good memory locality is SUPER important for high performance, whether it's in-memory work or larger-than-RAM work (e.g. for the DB). It's also a fun exercise to try to understand how!
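To see the locality effect for yourself, here's a toy comparison (admittedly crude, since Python adds pointer-chasing overhead of its own) of summing the same grid in row order versus column order; the row-order walk touches memory sequentially and is typically noticeably faster:

    import time

    N = 2000
    grid = [[i * N + j for j in range(N)] for i in range(N)]

    def sum_rows(g):
        # Row-major: each inner list is walked sequentially,
        # which is friendly to the cache and prefetcher.
        return sum(x for row in g for x in row)

    def sum_cols(g):
        # Column-major: jumps to a different list on every access,
        # defeating spatial locality.
        return sum(g[i][j] for j in range(N) for i in range(N))

    for fn in (sum_rows, sum_cols):
        t0 = time.perf_counter()
        fn(grid)
        print(fn.__name__, time.perf_counter() - t0)

The exact numbers depend on your machine, but the gap is the same phenomenon that streaming B-trees exploit at disk scale.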
> At 3.5 million inserted documents, the exit velocity of standard MongoDB was 2.11 inserts per second...

That's terrifying. Do people expect performance like this, or was this crafted to be a pathological case? 100-element arrays don't seem too common, but that only makes this about 350 million entries (3.5 million documents × 100 index entries each) in e.g. a SQL table. I suspect my laptop running MySQL could outdo that kind of performance (but I have no proof; I could be very wrong).
The title is a little misleading: it looks like an asymptotic improvement, one that happens to be 532x at the scale the benchmark ran, so the ratio depends on where you stop measuring. Looking at the graph, it does appear to be a significant improvement: the old version clearly dropped to roughly zero, while the new version looks constant. (It took some staring to see the downward trend.)
I've discussed these results with the team at 10gen, and their comment is basically "that's interesting, but we're not looking at it at this time."

All told, based on my experience, MongoDB's performance still has a ways to go.
Nobody believes me when I say fractals solve literally everything efficiency-related, even though it's true.

Nobody believed me when I introduced on-demand script injection for JavaScript; today it's IT etiquette.

The power of popularity, I guess.