Yes! I've been thinking about something like this for a while.

For the sake of simple data integration, I think this sort of architecture is optimal. As it stands, Spark is basically already a distributed database without its own storage engine; tighter integration with a transactional storage engine means you could get the full power of OLTP and OLAP (HTAP) under the same interface.

Imagine that you could process transactions in Spark (pushing them down to the distributed storage engine), that Spark could automatically use the resulting changes to update a materialized view, and that you could serve the updated view directly from Spark for real-time decision support, using SQL plus richer analytics like machine learning, graph processing, etc. It's not *quite* a one-size-fits-all [1] database, but it's close.

Put a PostgreSQL or MySQL wire protocol server in front of it, and application developers wouldn't even have to know they're using Spark.

(I'm glossing over the fact that Spark currently isn't very good at transaction processing, in the sense that it doesn't have much of a write API right now -- i.e., no equivalents of SQL `begin`, `commit`, and `rollback`, and no updates/upserts in general -- but I think that's reasonably easy to add by pushing the functionality down to capable storage engines.)

[1] https://cs.brown.edu/~ugur/fits_all.pdf
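
To make the materialized-view half of this concrete, here's a minimal sketch using Structured Streaming APIs that exist in Spark today. The `orders_changelog` topic, its schema, and the Kafka connection details are hypothetical stand-ins for whatever change feed a transactional storage engine would actually expose; the transactional half (begin/commit/rollback pushed down to the engine) is the piece that doesn't exist yet, so there's nothing real to show there.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object MaterializedViewSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("materialized-view-sketch")
      .getOrCreate()

    // Schema of the (hypothetical) change feed emitted by the storage engine.
    val orderSchema = new StructType()
      .add("order_id", LongType)
      .add("customer_id", LongType)
      .add("amount", DoubleType)
      .add("ts", TimestampType)

    // Consume the change feed; here it is modeled as a Kafka topic of JSON rows.
    val changes = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "orders_changelog")
      .load()
      .select(from_json(col("value").cast("string"), orderSchema).as("o"))
      .select("o.*")

    // Continuously maintain an aggregate "materialized view" of revenue per
    // customer, exposed as an in-memory table queryable through Spark SQL.
    val revenuePerCustomer = changes
      .groupBy(col("customer_id"))
      .agg(sum(col("amount")).as("revenue"))

    revenuePerCustomer.writeStream
      .outputMode("complete")
      .format("memory")
      .queryName("revenue_per_customer")
      .start()

    // Real-time decision support: any SQL client routed through Spark
    // (e.g. via a wire-protocol frontend) could read the updated view.
    spark.sql(
      "SELECT * FROM revenue_per_customer ORDER BY revenue DESC LIMIT 10"
    ).show()
  }
}
```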