TiSpark sits Spark SQL on top of a storage engine to answer complex OLAP queries

49 points by jinqueeny, almost 8 years ago

1 comment

elvinyung, almost 8 years ago
Yes! I've been thinking of something like this for a while.

For the sake of simple data integration, I think this sort of architecture is optimal. As it stands, Spark is basically already a distributed database without its own storage engine; tighter integration with a transactional storage engine means that you could get the full power of OLTP and OLAP (HTAP) under the same interface.

Imagine that you could process transactions in Spark (pushing them down to the distributed storage engine), and then Spark could automatically use the changes to update a materialized view, and you could serve the updated materialized view directly from Spark for real-time decision support, using SQL plus richer analytics like machine learning, graph processing, etc. It's not *quite* a one-size-fits-all [1] database, but it's close.

Put a PostgreSQL or MySQL wire protocol server in front of it, and application developers won't even have to know that they're using Spark.

(I'm glossing over the fact that Spark currently isn't very good at transaction processing in the sense that it literally doesn't have much of a write API right now -- i.e. support for equivalents of SQL `begin`, `commit`, and `rollback`, and updates/upserts in general -- but I think that's reasonably easy to add by pushing down this functionality to capable storage engines.)

[1] https://cs.brown.edu/~ugur/fits_all.pdf
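
To make the pushdown idea above concrete, here is a rough sketch in plain Python. It is not Spark or TiSpark code: `sqlite3` stands in for the transactional storage engine, and `PushdownSession`, `insert_order`, and the `orders` table are hypothetical names invented for illustration. The session forwards `begin`, `commit`, and `rollback` to the store and folds only committed changes into an in-memory materialized view.

```python
import sqlite3


class PushdownSession:
    """Toy analytics layer that delegates transactions to a storage engine.

    Hypothetical illustration only: sqlite3 plays the role of the
    transactional store, and a dict plays the role of the materialized view.
    """

    def __init__(self):
        # isolation_level=None disables implicit transactions, so the
        # BEGIN/COMMIT/ROLLBACK statements below are the only ones issued.
        self.db = sqlite3.connect(":memory:", isolation_level=None)
        self.db.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount INTEGER)"
        )
        self.view = {}     # materialized view: region -> total amount
        self.pending = []  # deltas staged by the open transaction

    def begin(self):
        self.db.execute("BEGIN")
        self.pending = []

    def insert_order(self, order_id, region, amount):
        # The write itself is pushed down to the store; we only remember
        # the delta that the view will need if the transaction commits.
        self.db.execute(
            "INSERT INTO orders VALUES (?, ?, ?)", (order_id, region, amount)
        )
        self.pending.append((region, amount))

    def commit(self):
        self.db.execute("COMMIT")
        # Incremental maintenance: apply just the committed deltas.
        for region, amount in self.pending:
            self.view[region] = self.view.get(region, 0) + amount
        self.pending = []

    def rollback(self):
        self.db.execute("ROLLBACK")
        # Discard staged deltas so the view never sees aborted writes.
        self.pending = []

    def recompute(self):
        # Full recomputation from the store, useful for checking the view.
        rows = self.db.execute(
            "SELECT region, SUM(amount) FROM orders GROUP BY region"
        )
        return dict(rows.fetchall())


session = PushdownSession()
session.begin()
session.insert_order(1, "eu", 10)
session.insert_order(2, "us", 5)
session.commit()

session.begin()
session.insert_order(3, "eu", 99)
session.rollback()  # the aborted write reaches neither the store nor the view
```

After the second transaction rolls back, `session.view` should match `session.recompute()`. A real system would also have to handle updates, deletes, concurrent writers, and a view distributed across nodes, none of which this sketch attempts.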