* [...] unified execution engine<p>* accelerating data management systems<p>* [...] streamlining their development<p>* [...] consolidate and unify data management systems<p>Can someone translate this to English? I can see and recognize the individual meanings of the words, but I don't understand what they're trying to say.
"In common usage scenarios, Velox takes a fully optimized query plan as input and performs the described computation. Considering Velox does not provide a SQL parser, a dataframe layer, or a query optimizer, it is usually not meant to be used directly by end-users; rather, it is mostly used by developers integrating and optimizing their compute engines."<p>So the way you use it is that you describe some computation over your data as a query plan, and you implement a dataframe layer so Velox knows how to retrieve data from your database, and then Velox will efficiently execute the query plan? But it doesn't even optimize, so the problem it solves is that systems like Spark and Presto don't efficiently execute already-optimized queries?<p>This world is very far removed from me; does anyone have a concrete example of how Velox might help them? Why is Velox better than both the Presto worker and the Spark engine? Aren't those core components of those systems?
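To make the division of labor in the quoted paragraph concrete, here is a toy sketch in Python. It is not Velox's actual C++ API; every class and function name below is hypothetical. The host system's parser and optimizer are assumed to have already produced the plan tree, and the executor only walks it and computes the result. Data access is modeled by a `Scan` node over in-memory rows, standing in for whatever storage reader the host system wires up. Velox itself works on columnar batches rather than one row at a time, so this only illustrates who does what, not how it is made fast.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]

# Hypothetical plan nodes for illustration only; NOT Velox's API.
@dataclass
class Scan:
    rows: List[Row]  # stands in for a reader over real storage

@dataclass
class Filter:
    source: Any
    predicate: Callable[[Row], bool]

@dataclass
class Project:
    source: Any
    columns: List[str]

@dataclass
class SumAggregate:
    source: Any
    key: str    # group-by column
    value: str  # column to sum

def execute(node: Any) -> Iterator[Row]:
    """Walk an already-optimized plan. No parsing or optimization happens here."""
    if isinstance(node, Scan):
        yield from node.rows
    elif isinstance(node, Filter):
        for row in execute(node.source):
            if node.predicate(row):
                yield row
    elif isinstance(node, Project):
        for row in execute(node.source):
            yield {c: row[c] for c in node.columns}
    elif isinstance(node, SumAggregate):
        totals: Dict[Any, Any] = {}
        for row in execute(node.source):
            totals[row[node.key]] = totals.get(row[node.key], 0) + row[node.value]
        for k, v in totals.items():
            yield {node.key: k, node.value: v}
    else:
        raise TypeError(f"unknown plan node: {node!r}")

orders = [
    {"region": "eu", "amount": 10, "status": "paid"},
    {"region": "us", "amount": 5, "status": "paid"},
    {"region": "eu", "amount": 7, "status": "refunded"},
    {"region": "eu", "amount": 3, "status": "paid"},
]

# The host engine (SQL front end + optimizer) would build this tree from:
#   SELECT region, SUM(amount) FROM orders WHERE status = 'paid' GROUP BY region
plan = SumAggregate(
    Project(Filter(Scan(orders), lambda r: r["status"] == "paid"), ["region", "amount"]),
    key="region",
    value="amount",
)
print(list(execute(plan)))  # [{'region': 'eu', 'amount': 13}, {'region': 'us', 'amount': 5}]
```

In this split, swapping the front end (SQL, a dataframe library, another engine's planner) does not touch `execute`, which is the sense in which one execution layer can be shared by several systems.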
Anyone know how this compares to Photon by Databricks? That’s probably the benchmark + arch comp I’d like to see…<p><a href="https://www.databricks.com/product/photon" rel="nofollow">https://www.databricks.com/product/photon</a>
> Ultimately, this fragmentation results in systems with different feature sets and inconsistent semantics — reducing the productivity of data users that need to interact with multiple engines to finish tasks.<p>> In order to address these challenges and to create a stronger, more efficient data infrastructure for our own products and the world, Meta has created and open sourced Velox.<p>Maybe I'm missing something here, but it sounds like a lot of separate services got created that solve the same or similar problem in slightly different ways. These services became hard to use because they were fragmented. So the solution is to keep <i>all</i> the services and build a complex service as a middleman?<p>Why not unify the good parts of all the services into one central service? Then deprecate and transition off all the old fragmented ones? I understand that it's really hard to coordinate all of this and properly transition, but isn't the alternative of having to maintain many slightly different services (and now a complex middleman) more detrimental long term?
> Velox leverages numerous runtime optimizations, such as filter and conjunct reordering, key normalization for array and hash-based aggregations and joins, dynamic filter pushdown, and adaptive column prefetching.<p>That's a strong set of capabilities. I'm excited to see where this goes -- this could catalyze a Cambrian explosion of data systems that offload execution to Velox.
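Conjunct reordering, the first item in that list, is easy to illustrate in isolation. Below is a minimal Python sketch of one common approach, assuming independent predicates and per-conjunct cost estimates supplied by the caller. The class names, the `cost` values, and the `reorder_every` interval are all hypothetical, and this is not Velox's implementation. The filter counts how often each conjunct passes and, every N rows, re-sorts the conjuncts by cost / (1 - pass rate), the classic ordering rule for ANDed predicates, so cheap, highly selective checks run first and short-circuit the rest.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class Conjunct:
    name: str
    fn: Callable[[Any], bool]
    cost: float      # hypothetical per-row cost estimate supplied by the caller
    seen: int = 0    # rows this conjunct was evaluated on
    passed: int = 0  # rows for which it returned True

    def pass_rate(self) -> float:
        # Laplace-smoothed: always strictly between 0 and 1, so rank() never divides by zero
        return (self.passed + 1) / (self.seen + 2)

    def rank(self) -> float:
        # Ordering rule for independent ANDed predicates:
        # evaluate in ascending cost / (1 - pass_rate).
        return self.cost / (1.0 - self.pass_rate())

class AdaptiveAnd:
    """Hypothetical adaptive AND filter that reorders its conjuncts at runtime."""

    def __init__(self, conjuncts: List[Conjunct], reorder_every: int = 1000):
        self.conjuncts = list(conjuncts)
        self.reorder_every = reorder_every
        self.rows = 0

    def __call__(self, row: Any) -> bool:
        self.rows += 1
        if self.rows % self.reorder_every == 0:
            # Re-sort using the pass rates observed so far
            self.conjuncts.sort(key=Conjunct.rank)
        for c in self.conjuncts:
            c.seen += 1
            if c.fn(row):
                c.passed += 1
            else:
                return False  # short-circuit: later conjuncts never run for this row
        return True

# The useless check is listed first on purpose.
f = AdaptiveAnd([
    Conjunct("always_true", lambda x: x >= 0, cost=1.0),
    Conjunct("selective", lambda x: x % 100 == 0, cost=1.0),
])
matches = [x for x in range(10_000) if f(x)]
print(len(matches))                   # 100
print([c.name for c in f.conjuncts])  # ['selective', 'always_true']
```

After the first reorder the 1%-selective check runs first, so the always-true check is evaluated only on the rows that survive it. A real engine would also measure cost at runtime instead of taking it as an input, and would do this per batch of columnar data rather than per row.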
I see this as a continuation of the effort to rewrite middleware in C++, Rust, and Go instead of Java. It seems the common wisdom that "Java can be as fast as C" is finally being abandoned as this trend progresses (Kubernetes and other newer cloud middleware are written in Go instead of Java, etc.).
It sounds very similar to Apache Beam. You can actually create runners for various data management systems [1].<p>[1] <a href="https://beam.apache.org/documentation/runners/" rel="nofollow">https://beam.apache.org/documentation/runners/</a>
Is this similar to Arrow DataFusion, but in C++? Tbh I think every hot new dataframe library or analytics DB has such components. At first glance, the basic idea is not too different from the textbook one.
So this is an Apache Arrow database engine integrated into other databases? My main takeaway is that it's great to see more projects standardizing on Arrow and pushing it further down the stack.
Could this be a name conflict with this product?
<a href="https://www.thermofisher.com/order/catalog/product/VELOX" rel="nofollow">https://www.thermofisher.com/order/catalog/product/VELOX</a>