I read the presentation here: http://tokutek.com/downloads/mysqluc-2010-fractal-trees.pdf

The math looks a bit hand-wavy to me, but I get the basic data structure: you have sorted arrays whose lengths are each a power of two, so 1, 2, 4, 8, and so on.
When you insert a set of values, each array ends up either completely full or completely empty. For example, with 5 values you'd have a full array of 1 element, an empty array of 2 slots, and a full array of 4 elements.

Each array is sorted internally, but the smaller arrays do not necessarily contain values that are all less than the values in the bigger arrays.

To make lookups efficient, you have forward pointers from each element to the next-bigger element in the next array, so you can start your binary search in the bigger array at a narrowed-down index instead of from scratch.

The cost shows up when you insert lots of values, because you have to merge arrays repeatedly and rewrite a bunch of data. The point, though, is that those merges are sequential I/O: you get better disk throughput than rebalancing a B-tree, because you're not making the disk head seek around all the time.

I'm curious about the details of this benchmark, though. What kind of values are we talking about? Are the rows all the same size, or does the benchmark include variable-sized values (like strings)?

This looks promising, but I'd love more explanation from the authors...
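To make the merge mechanics concrete, here's a minimal Python sketch of this kind of structure (a cache-oblivious lookahead array) as I understand it. The class and method names are mine, and it omits the forward pointers, so lookups fall back to one binary search per level:

    import bisect
    from heapq import merge

    class COLA:
        """Level k holds either 0 or 2**k sorted keys."""

        def __init__(self):
            self.levels = []  # levels[k]: sorted list of 0 or 2**k keys

        def insert(self, key):
            # Like incrementing a binary counter: a full level "carries"
            # into the next one via a sequential (disk-friendly) merge.
            carry = [key]
            for k, level in enumerate(self.levels):
                if not level:
                    self.levels[k] = carry
                    return
                carry = list(merge(level, carry))
                self.levels[k] = []
            self.levels.append(carry)  # all levels were full: grow by one

        def contains(self, key):
            # Without forward pointers this is O(log^2 n):
            # one binary search per level.
            for level in self.levels:
                i = bisect.bisect_left(level, key)
                if i < len(level) and level[i] == key:
                    return True
            return False

After inserting 5 values you get exactly the layout described above:

    c = COLA()
    for x in [5, 3, 8, 1, 9]:
        c.insert(x)
    print(c.levels)  # e.g. [[9], [], [1, 3, 5, 8]] -- sizes 1, 0, 4

Each key gets rewritten O(log n) times over its lifetime, but always as part of a sequential merge, which is the amortized-I/O win over in-place B-tree node updates.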
Please note that the actual name for a "fractal tree" in the research literature is "streaming (cache-oblivious) B-tree" (or at least they're very closely related).

Writing code with good memory locality is SUPER important for high performance, whether it's in-memory work or larger-than-RAM work (e.g. for the DB). It's also a fun exercise to try to understand how!
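To see the locality effect for yourself, here's a toy comparison (admittedly crude, since Python adds pointer-chasing overhead of its own) of summing the same grid in row order versus column order; the row-order walk touches memory sequentially and is typically noticeably faster:

    import time

    N = 2000
    grid = [[i * N + j for j in range(N)] for i in range(N)]

    def sum_rows(g):
        # Row-major: each inner list is walked sequentially,
        # which is friendly to the cache and prefetcher.
        return sum(x for row in g for x in row)

    def sum_cols(g):
        # Column-major: jumps to a different list on every access,
        # defeating spatial locality.
        return sum(g[i][j] for j in range(N) for i in range(N))

    for fn in (sum_rows, sum_cols):
        t0 = time.perf_counter()
        fn(grid)
        print(fn.__name__, time.perf_counter() - t0)

The exact numbers depend on your machine, but the gap is the same phenomenon that streaming B-trees exploit at disk scale.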
> At 3.5 million inserted documents, the exit velocity of standard MongoDB was 2.11 inserts per second...

That's terrifying. Do people expect performance like this, or was this crafted to be a pathological case? 100-element arrays don't seem too common, but that only makes this about 350 million entries (3.5 million documents × 100 index entries each) in e.g. a SQL table. I suspect my laptop running MySQL could outdo that kind of performance (but I have no proof; I could be very wrong).
The title is a little misleading: it looks like an asymptotic improvement, one that happens to be 532x at the scale the benchmark ran, so the ratio depends on where you stop measuring. Looking at the graph, it does appear to be a significant improvement: the old version clearly dropped to roughly zero, while the new version looks constant. (It took some staring to see the downward trend.)
I've discussed these results with the team at 10gen, and their comment is basically "that's interesting, but we're not looking at it at this time."

All told, based on my experience, MongoDB's performance still has a ways to go.
Nobody believes me when I say fractals solve literally everything efficiency-related, even though it's true.

Nobody believed me when I introduced on-demand script injection for JavaScript; today it's IT etiquette.

The power of popularity, I guess.