Arrow has been the most exciting piece of technology I've seen in the last few years. The ecosystem being built around it is amazing, and it's standardizing a bunch of disparate data ecosystems.<p>The Arrow ecosystem nets you a great compute implementation, a columnar storage format (Parquet), and a great RPC framework (Arrow Flight).
SQL streaming engines really seem to be having a moment.<p>As someone who is less familiar with all the players in the space, how should I think about Arroyo vs. streaming databases like Materialize or caching tools like Readyset?
Nice work on the performance boost :).<p>How does it compare with things like:
1. <a href="https://github.com/bytewax/bytewax">https://github.com/bytewax/bytewax</a>
2. <a href="https://github.com/pathwaycom/pathway">https://github.com/pathwaycom/pathway</a><p>I recently read this article (<a href="https://materializedview.io/p/from-samza-to-flink-a-decade-of-stream" rel="nofollow">https://materializedview.io/p/from-samza-to-flink-a-decade-o...</a>) about Flink, and it commented on how Flink grew to fit all of these different use cases (applications, analytics, and ETL) with disjoint requirements, the same ones Confluent built kafka-streams, ksql, and connectors for. Which of those would you say Arroyo is best suited for?
Not exactly on-topic, but does anyone know of SQL-to-SQL optimisers or simplifiers (perhaps DataFusion would be able to do this)? I work with generated query systems and SQL macro systems that make fairly complex queries quite easy to generate, but that often produce unnecessary joins/subqueries, etc.<p>I find myself needing to mechanically transform and simplify SQL every now and then, and it hardly seems out of reach of automation, yet somehow I've never been able to find software that simplifies and transforms SQL source-to-source. When I last looked, I only found optimisers for SQL execution plans.
Hi! Just reading the docs, this looks really slick. I had a few questions:<p>- When you create tables, are they always connected to a source? How does that work for the cloud version? (i.e., is the source a filesystem? It seems we would just use S3.)
- Does Arroyo poll an S3 bucket for new files and automatically ingest?
- Are you able to do ALTER TABLE? (What if data, or data types, are mismatched?)
- Similarly, am I able to change the primary key (ie, clickhouse's ORDER BY or projections?) or change indexes?<p>Any plans for HTTP as a source? (This is what we build and I'd be happy to prototype an integration!)
Especially factoring in the streaming capabilities, an Arrow-based SQL database is an exciting prospect!<p>My assumption is that throughput could be increased quite a bit when loading data into Arrow-based libraries like polars or pandas, since the data doesn't have to be converted. Any idea if that works out?
I have one question that I couldn't quite find an answer to.<p>In Flink you can set timers to wake an event up at an arbitrary time without applying a window. Is this supported in Arroyo?
This is a great writeup, I work on batch/streaming stuff at Google and I'm very excited by some of the stuff I see in the Rust ecosystem, Arroyo included.