Delta Lake vs. Parquet: A Comparison

32 points by MrPowers, over 1 year ago

10 comments

fractaloop, over 1 year ago
Iceberg (https://iceberg.apache.org) is an open source alternative to Delta Lake that I cannot recommend enough. It organizes your Parquet files (or other serialization formats) in a logical structure with snapshots, which allows time travel, git-like semantics for data management, and Write-Audit-Publish strategies. My favorite recent use is idempotent change data capture, which eases replication after failures. When your publishing job fails, you can simply replay the same diff between two snapshots and pick up where you left off.
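A rough sketch of why replaying a snapshot diff is safe to retry. This is a toy model in plain Python, not the Iceberg API: `Snapshot`, `diff`, and `replay` are hypothetical names, and a snapshot is reduced to a set of file names.

```python
# Toy model of snapshot-based change replay. NOT the Iceberg API;
# Snapshot, diff, and replay are hypothetical names for illustration.
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    files: frozenset  # data file names visible in this snapshot


def diff(old: Snapshot, new: Snapshot) -> dict:
    """Return the files added and removed between two snapshots."""
    return {
        "added": sorted(new.files - old.files),
        "removed": sorted(old.files - new.files),
    }


def replay(target: set, change: dict) -> None:
    """Apply a diff to a replica. Set operations make this idempotent:
    applying the same diff twice leaves the replica unchanged."""
    target.difference_update(change["removed"])
    target.update(change["added"])


s1 = Snapshot(1, frozenset({"a.parquet", "b.parquet"}))
s2 = Snapshot(2, frozenset({"b.parquet", "c.parquet"}))

replica = set(s1.files)
change = diff(s1, s2)
replay(replica, change)  # first attempt (imagine the job crashed right after)
replay(replica, change)  # retry with the same diff
```

Because the snapshots are immutable, the diff between them is always the same, so a retried job ends in the same state as one that succeeded the first time.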
BadHumans, over 1 year ago
Comparing Delta Lake to Parquet is a bit nonsensical, isn't it? Like comparing Postgres to a zip file. After trying all of the major open table formats, Iceberg is the future, in my opinion. Delta is great if you use Databricks, but otherwise I don't see a compelling reason to use it over Iceberg.
xnx, over 1 year ago
More comparisons (from a competitor?):

"Apache Hudi vs Delta Lake vs Apache Iceberg - Data Lakehouse Feature Comparison" https://www.onehouse.ai/blog/apache-hudi-vs-delta-lake-vs-apache-iceberg-lakehouse-feature-comparison
querez, over 1 year ago
I'm not well versed in these things, but at this point, aren't you re-inventing database systems? Talking about things like ACID transactions, schema evolution, dropping columns, ... in the context of a file format feels bizarre to me.
alexmolas, over 1 year ago
Isn't Delta Lake using Parquet files? I don't understand the comparison.

Also

> Parquet tables are OK when data is in a single file but are hard to manage and unnecessarily slow when data is in many files

This is not true. Having worked with Spark, I find it's much better to have multiple "small" files than only one big file.
gregw2, over 1 year ago
This is a weird comparison to make nowadays. A more relevant question is Delta Lake vs Iceberg.
Zizizizz, over 1 year ago
Delta is pretty great; it lets you do upserts into tables in Databricks much more easily than without it.

I think the website is here: https://delta.io
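For readers unfamiliar with the term, here is a minimal sketch of what an upsert does, in plain Python over in-memory rows. It is not Delta Lake's API: the `upsert` helper and the `id` key column are assumptions made for illustration.

```python
# Toy upsert (merge) keyed on "id". Hypothetical helper, not Delta Lake's API.
def upsert(table, updates, key="id"):
    """Return new rows: update rows whose key matches, insert the rest."""
    merged = {row[key]: row for row in table}
    for row in updates:
        # Overlay the incoming fields on the existing row, if there is one.
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return list(merged.values())


table = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
updates = [{"id": 2, "name": "B"}, {"id": 3, "name": "c"}]
result = upsert(table, updates)
```

Here row 2 is updated in place and row 3 is inserted. On a plain directory of Parquet files the same operation means finding and rewriting the affected files yourself, which is the part a table format takes over.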
orthoxerox, over 1 year ago
Delta is nice, but a lot of features are missing from the FOSS version.

Hudi is nice, but they are in the middle of a big format change right now.

Iceberg is nice, but it is the most conservative and slowest-moving format of the three.
lgsilver, over 1 year ago
Databricks has been struggling to defend Delta against the fast-moving improvements and widening adoption of Iceberg, championed by two of its major competitors, AWS and Snowflake. This article seems like a bizarre, and maybe even misleading, artifact, given that no one in the industry is comparing Parquet to Delta. They're weighing Iceberg, which, like Delta, can organize and structure groups of Parquet (or other format) files…
MrPowers, over 1 year ago
Data lakes (i.e. Parquet files in storage without a metadata layer) don't support transactions, require expensive file listing operations, and don't support basic DML operations like deleting rows.

Delta Lake stores data in Parquet files and adds a metadata layer to provide support for ACID transactions, schema enforcement, versioned data, and full DML support. Delta Lake also offers concurrency protection.

This post explains all the features offered by Delta Lake in comparison to a plain vanilla Parquet data lake.
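To make the "metadata layer" idea concrete, here is a toy sketch in plain Python. It is not Delta Lake's implementation or API: `ToyTable`, `commit`, and `files` are hypothetical names, and the log is an in-memory list where a real table format would use files in storage. It only illustrates how an append-only log of add/remove actions over immutable data files can give versioning and deletes without listing a directory.

```python
# Toy metadata layer over immutable data files. NOT Delta Lake's real
# implementation; ToyTable and its methods are hypothetical.
import json


class ToyTable:
    def __init__(self):
        self.log = []  # one JSON string per committed version

    def commit(self, add=(), remove=()):
        """Record, as a single log entry, the files a version adds and removes."""
        self.log.append(json.dumps({"add": list(add), "remove": list(remove)}))
        return len(self.log) - 1  # version number of this commit

    def files(self, version=None):
        """Replay the log up to `version` to find the live files,
        without listing the storage directory."""
        if version is None:
            version = len(self.log) - 1
        live = set()
        for entry in self.log[: version + 1]:
            actions = json.loads(entry)
            live -= set(actions["remove"])
            live |= set(actions["add"])
        return sorted(live)


table = ToyTable()
v0 = table.commit(add=["part-0.parquet", "part-1.parquet"])
# Deleting rows: rewrite the affected file, then swap old for new in one commit.
v1 = table.commit(add=["part-1-rewritten.parquet"], remove=["part-1.parquet"])

print(table.files())    # live files at the latest version
print(table.files(v0))  # live files at the earlier version
```

Readers see either the old file or the rewritten one, never a mix, because the swap is a single log entry. The sketch leaves out what makes this hard in practice, such as concurrent writers and schema checks.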