TechEcho

7 comments

jauntywundrkind about 1 year ago

Python's Substrait seems like the biggest/most-used competitor-ish project out there. I'd love some compare & contrast; my sense is that Substrait has a smaller ambition and wants to be a language for talking about execution rather than a full-on optimization/execution engine. https://github.com/substrait-io/substrait

(Edit: ah, there's a recent talk discussing PyVelox trying to get Substrait integration. https://www.youtube.com/watch?v=l_kHxkGkNRg#t=18m22s However, there's also discussion about the un-maintainedness of some of the current Substrait work here; status is unclear. https://github.com/facebookincubator/velox/issues/8895)

We can also see from the Apache Arrow DataFusion discussion that they too see themselves as a bit of a Velox competitor. https://github.com/apache/arrow-datafusion/discussions/6441

It's cool to see this space mature. I like that even Velox sees that Apache Arrow (which also underlies Apache Arrow DataFusion) is industry-standard tech that they ought to work with. https://engineering.fb.com/2024/02/20/developer-tools/velox-apache-arrow-15-composable-data-management/

There's a solid Influx post that speaks to how they are composing the assorted technologies to build their next-gen 3.0, which I find helpful for getting a sense of how all the pieces of a modern high-performance data engine slot together. https://www.influxdata.com/blog/flight-datafusion-arrow-parquet-fdap-architecture-influxdb/

sakras about 1 year ago

My general take is that while the idea of composability is good, the implementations of these things are just frankly not of high quality. Velox/Acero in particular are all plagued by what I've come to call "Java syndrome", where everything is written as idiomatic Java but with C++ syntax. Virtual methods, std::shared_ptr galore (in lieu of garbage collection), random heap allocations, etc. As a result these systems tend to be bloated and significantly slower than they need to be.

DuckDB is good though, and I predict its quality of implementation will keep "monolithic databases" relevant for a while longer.

redskyluan about 1 year ago

Velox could be a competitor to DataFusion. It is more focused on the execution engine and could be a great fit for integrating into other high-performance databases.

Databases will be split into pieces and rebuilt!

sgt101 about 1 year ago

I wonder how many FAANG projects of this sort really get used where they are built. I went for an interview at a FAANG years ago to work on a very big consumer product (when it was in relative infancy) and expected to find a hyper-tech data backend to use... they told me that they were using MySQL.

I didn't get the job, so maybe they were just joking around with me, but the general despair that they evinced about their data situation makes me wonder!

pvg about 1 year ago

A thread from late 2022: https://news.ycombinator.com/item?id=32673873

HermitX about 1 year ago

To the best of my knowledge, Meta has significantly reduced its investment in the Velox project. Apart from Meta, I'm not aware of any other major company that really uses Velox in a production environment. Frankly speaking, Velox may have already missed the window of opportunity for rapid development. If you're looking for a vectorized execution engine, you could consider ClickHouse (www.clickhouse.com) or StarRocks (www.starrocks.io). If your data analysis scenarios require more multi-table join operations, StarRocks is clearly a better choice.

zX41ZdbW about 1 year ago

Many ideas look like they were influenced by ClickHouse, and some are direct copies. I'm surprised they didn't provide references to ClickHouse, where the implementations are proven in production in the first place.

Velox: Meta's Unified Execution Engine [pdf]
