Yes! I've been thinking about something like this for a while.

For the sake of simple data integration, I think this sort of architecture is optimal. As it stands, Spark is basically already a distributed database without its own storage engine; tighter integration with a transactional storage engine means you could get the full power of OLTP and OLAP (HTAP) under the same interface.

Imagine that you could process transactions in Spark (pushing them down to the distributed storage engine), that Spark could automatically use the resulting changes to update a materialized view, and that you could serve the updated view directly from Spark for real-time decision support, using SQL plus richer analytics like machine learning, graph processing, etc. It's not *quite* a one-size-fits-all [1] database, but it's close.

Put a PostgreSQL or MySQL wire protocol server in front of it, and application developers wouldn't even have to know they're using Spark.

(I'm glossing over the fact that Spark currently isn't very good at transaction processing, in the sense that it doesn't have much of a write API right now -- i.e., no equivalents of SQL `begin`, `commit`, and `rollback`, and no updates/upserts in general -- but I think that's reasonably easy to add by pushing the functionality down to capable storage engines.)

[1] https://cs.brown.edu/~ugur/fits_all.pdf
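
To make the materialized-view half of this concrete, here's a minimal sketch using Structured Streaming APIs that exist in Spark today. The `orders_changelog` topic, its schema, and the Kafka connection details are hypothetical stand-ins for whatever change feed a transactional storage engine would actually expose; the transactional half (begin/commit/rollback pushed down to the engine) is the piece that doesn't exist yet, so there's nothing real to show there.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object MaterializedViewSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("materialized-view-sketch")
      .getOrCreate()

    // Schema of the (hypothetical) change feed emitted by the storage engine.
    val orderSchema = new StructType()
      .add("order_id", LongType)
      .add("customer_id", LongType)
      .add("amount", DoubleType)
      .add("ts", TimestampType)

    // Consume the change feed; here it is modeled as a Kafka topic of JSON rows.
    val changes = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "orders_changelog")
      .load()
      .select(from_json(col("value").cast("string"), orderSchema).as("o"))
      .select("o.*")

    // Continuously maintain an aggregate "materialized view" of revenue per
    // customer, exposed as an in-memory table queryable through Spark SQL.
    val revenuePerCustomer = changes
      .groupBy(col("customer_id"))
      .agg(sum(col("amount")).as("revenue"))

    revenuePerCustomer.writeStream
      .outputMode("complete")
      .format("memory")
      .queryName("revenue_per_customer")
      .start()

    // Real-time decision support: any SQL client routed through Spark
    // (e.g. via a wire-protocol frontend) could read the updated view.
    spark.sql(
      "SELECT * FROM revenue_per_customer ORDER BY revenue DESC LIMIT 10"
    ).show()
  }
}
```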