OP here. Peloton has been posted here before, but didn't get any attention.

I think this database is very interesting even if you don't care about the self-driving part, since it claims to be a hybrid (OLAP and OLTP), it implements Postgres's wire protocol, and it claims to compile queries to machine code using LLVM [1].

[1]: https://www.youtube.com/watch?v=mzMnyYdO8jk (slideshow: http://www.cs.cmu.edu/~pavlo/slides/selfdriving-nov2016.pdf)
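If anyone wants to poke at the wire-protocol claim, a stock Postgres client should just work against it. A minimal sketch in Python; the port and credentials here are my guesses, not anything from their docs:

    # Hypothetical connection settings; adjust to whatever Peloton
    # actually listens on. Since it speaks the Postgres wire protocol,
    # any ordinary Postgres driver should be usable.
    import psycopg2

    conn = psycopg2.connect(host="localhost", port=15721,
                            user="postgres", password="postgres",
                            dbname="default_database")
    cur = conn.cursor()
    cur.execute("SELECT 1;")
    print(cur.fetchone())   # (1,)
    conn.close()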
Side note, but I really dislike the current trend (in in-memory databases, to be clear) of not bothering to include any real provisions for durability and justifying it by saying "NVRAM exists." It effectively doesn't exist for anyone who needs to deploy to off-the-shelf environments, and it's super expensive (and if you're chasing performance, like most of these research projects are, falling back on a clustered configuration for durability would be counterproductive). Are there *any* cloud providers who offer NVRAM in any configuration?
The idea of write-behind logging is slick.

http://www.cs.cmu.edu/~pavlo/papers/p337-arulraj.pdf
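The gist, as I read it: with NVM the tuple write itself can be made durable, so instead of logging after-images ahead of the data (classic WAL), you write the data first and log only a small commit marker afterward. A toy Python sketch of the contrast (my reading of the idea, not the paper's code):

    # Toy contrast between write-ahead and write-behind logging.
    class Log:
        def __init__(self):
            self.records = []
        def append(self, rec):
            self.records.append(rec)
        def flush(self):
            pass  # stands in for fsync / a persist barrier

    def commit_wal(txn_id, writes, log, table):
        # WAL: persist full after-images *before* touching the data.
        for key, val in writes.items():
            log.append(("redo", txn_id, key, val))
        log.flush()                     # log must be durable first
        for key, val in writes.items():
            table[key] = val            # data pages can trail behind

    def commit_wbl(txn_id, writes, log, nvm_table):
        # WBL: on NVM the data write itself is durable, so the log
        # shrinks to a tiny commit marker written *afterward*.
        for key, val in writes.items():
            nvm_table[key] = val        # durable immediately on NVM
        # (real hardware would need a persist barrier here)
        log.append(("commit", txn_id))  # no before/after images
        log.flush()

Recovery then doesn't replay changes at all; it only has to work out which transactions' writes to ignore, which is where the speedup in the paper comes from.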
Does anyone know what happens after the query plan is generated in most databases? I'm assuming the individual steps, like index scans and hash joins, are already coded, and the plan steps are iterated over with the respective methods called? So the execution steps are already compiled, but the step traversal is kind of interpreted. With Peloton's LLVM engine, is everything merged together into a single sequence of machine code?

How much of an advantage does this give you? Are there really that many steps in the execution plan? The visible steps are usually < 50, but what about the internal, actually-compiled steps? Unless this allows merging and further simplification, identifying redundant operations that get trimmed off, I'm not sure where a 100x performance improvement would come from.

Though I remember seeing a Scala-based in-memory query engine that was doing this sort of simplification of the actual steps and doing very well in benchmarks; maybe this is similar.
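To illustrate what I mean by the traversal being interpreted versus everything being fused, a toy Python sketch (nothing to do with Peloton's actual engine):

    # Volcano-style: each operator is a pull-based iterator; every
    # tuple pays per-operator dispatch overhead up the plan tree.
    def seq_scan(table):
        for row in table:
            yield row

    def filter_op(child, pred):
        for row in child:
            if pred(row):
                yield row

    def project_op(child, cols):
        for row in child:
            yield tuple(row[c] for c in cols)

    table = [(1, 5), (2, 42), (3, 17)]
    plan = project_op(filter_op(seq_scan(table), lambda r: r[1] > 10), [0])
    print(list(plan))        # [(2,), (3,)]

    # What compilation buys you, conceptually: the same plan fused
    # into one tight loop with no per-tuple dispatch. This is the
    # loop an LLVM-based engine would emit as native code.
    def fused_plan(table):
        return [(row[0],) for row in table if row[1] > 10]

    print(fused_plan(table)) # [(2,), (3,)]

So the win wouldn't come from the number of plan steps, but from eliminating per-tuple dispatch in the hot loop, at least if my mental model is right.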
I wonder why they try to support both OLTP and OLAP workloads. Supporting both requires too much work (both row and columnar storage types, different algorithms for both storage and querying, etc.), and they haven't even proven that autonomous systems (which are the main point of the project) can replace existing databases.
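To make the "too much work" point concrete: the same logical table has to exist in both shapes, and every operator needs a variant for each. A toy Python sketch, nothing Peloton-specific:

    # Row store (OLTP): a record's fields are contiguous, so point
    # lookups and updates touch one place.
    rows = [
        (1, "alice", 130),
        (2, "bob",   250),
    ]
    rows[1] = (2, "bob", 300)      # cheap transactional update

    # Column store (OLAP): each attribute is contiguous, so scans
    # and aggregates over one column stream through a dense array.
    cols = {
        "id":      [1, 2],
        "name":    ["alice", "bob"],
        "balance": [130, 300],
    }
    print(sum(cols["balance"]))    # cheap analytical scan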
This sure has a lot to live up to: trying to do two things and do them well isn't very unix-y. There's a reason relational databases are set up to have OLTP schemas (highly normalized tables for supporting transactions, etc.) and OLAP schemas (star schemas, for example, with large, sometimes flat, fact and dimension tables). Also, I'm not sure about the learning part: any decent database these days will cache frequently used data, and tables can be built as in-memory ones.
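A toy illustration of why the two schema styles pull in opposite directions (Python dicts standing in for tables, purely illustrative):

    # OLTP, normalized: each fact lives in exactly one place, so a
    # transactional update touches a single row.
    customers = {1: {"name": "alice", "city_id": 10}}
    cities    = {10: "Pittsburgh"}
    orders    = [{"order_id": 100, "customer_id": 1, "amount": 42}]
    customers[1]["name"] = "alicia"   # one write, no redundancy to chase

    # OLAP, star/flat: redundant, denormalized rows that make scans
    # and group-bys cheap at the cost of update anomalies.
    fact_sales = [
        {"order_id": 100, "customer": "alice",
         "city": "Pittsburgh", "amount": 42},
        {"order_id": 101, "customer": "alice",
         "city": "Pittsburgh", "amount": 17},
    ]
    by_city = {}
    for f in fact_sales:
        by_city[f["city"]] = by_city.get(f["city"], 0) + f["amount"]
    print(by_city)   # {'Pittsburgh': 59}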