Delta Lake vs. Parquet: A Comparison

32 points by MrPowers over 1 year ago

10 comments

fractaloop over 1 year ago
Iceberg (https://iceberg.apache.org) is an open source alternative to Delta Lake that I cannot recommend enough. It organizes your Parquet files (or other serialization formats) in a logical structure with snapshots to allow time travel and git-like semantics for data management and Write-Audit-Publish strategies. My favorite use recently is the idempotent change data capture to ease replication in the event of failures. When your publishing job fails, you can simply replay the same diff between two snapshots and pick up where you left off.
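To make the "replay the same diff between two snapshots" idea concrete, here is a minimal sketch in plain Python. It is a toy model at file granularity, not the Iceberg API: `SnapshotTable`, `publish`, and the file names are all hypothetical, and a real table format tracks far more metadata per snapshot.

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotTable:
    """Toy table: each snapshot is an immutable set of data file names."""
    snapshots: list = field(default_factory=lambda: [frozenset()])

    def commit(self, added=(), removed=()):
        # A commit never mutates old snapshots; it appends a new one.
        current = self.snapshots[-1]
        self.snapshots.append((current - frozenset(removed)) | frozenset(added))
        return len(self.snapshots) - 1  # id of the new snapshot

    def diff(self, from_id, to_id):
        # Files added and removed between two snapshots.
        old, new = self.snapshots[from_id], self.snapshots[to_id]
        return sorted(new - old), sorted(old - new)


def publish(target, added, removed):
    # Set operations make this idempotent: applying the same diff twice
    # leaves the target in the same state as applying it once.
    target.difference_update(removed)
    target.update(added)


table = SnapshotTable()
s1 = table.commit(added=["a.parquet", "b.parquet"])
s2 = table.commit(added=["c.parquet"], removed=["a.parquet"])

replica = {"a.parquet", "b.parquet"}  # the replica is at snapshot s1
added, removed = table.diff(s1, s2)
publish(replica, added, removed)
publish(replica, added, removed)  # replay after a simulated failure
```

Because old snapshots are never modified, the diff between `s1` and `s2` is the same every time it is computed, which is what makes a retry safe in this sketch.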
BadHumans over 1 year ago
Comparing Delta Lake to Parquet is a bit nonsense, isn't it? Like comparing Postgres to a zip file. After trying all of the major open table formats, Iceberg is the future in my opinion. Delta is great if you use Databricks, but otherwise I don't see a compelling reason to use it over Iceberg.
xnx over 1 year ago
More comparisons (from a competitor?):

"Apache Hudi vs Delta Lake vs Apache Iceberg - Data Lakehouse Feature Comparison" https://www.onehouse.ai/blog/apache-hudi-vs-delta-lake-vs-apache-iceberg-lakehouse-feature-comparison
querez over 1 year ago
I'm not well versed in these things, but at this point, aren't you re-inventing database systems? Talking about things like ACID transactions, schema evolution, dropping columns, ... in the context of a file-format feels bizarre to me.
alexmolas over 1 year ago
Isn't Delta Lake using Parquet files? I don't understand the comparison.

Also

> Parquet tables are OK when data is in a single file but are hard to manage and unnecessarily slow when data is in many files

This is not true. Having worked with Spark, it's much better to have multiple "small" files than only one big file.
gregw2 over 1 year ago
This is a weird comparison to make nowadays. A more relevant question is Delta Lake vs Iceberg.
Zizizizz over 1 year ago
Delta is pretty great, lets you do upserts into tables in Databricks much more easily than without it.

I think the website is here: https://delta.io
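For readers unfamiliar with the term, an upsert updates rows whose key already exists and inserts the rest. The sketch below shows only those semantics on in-memory rows; the `upsert` helper and the sample data are made up for illustration and are not how Delta Lake implements its merge operation.

```python
def upsert(table, updates, key):
    """Return new rows: update rows whose key matches, insert the rest."""
    merged = {row[key]: row for row in table}
    for row in updates:
        # Matched keys are overwritten field by field (update);
        # unseen keys are appended at the end (insert).
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return list(merged.values())


customers = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Linus", "city": "Helsinki"},
]
changes = [
    {"id": 2, "city": "Portland"},               # existing key: update
    {"id": 3, "name": "Grace", "city": "NYC"},   # new key: insert
]
result = upsert(customers, changes, key="id")
```

Doing the same thing over plain Parquet files generally means finding and rewriting every file that contains a matching key by hand, which is the work a table format takes over.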
orthoxerox over 1 year ago
Delta is nice, but a lot of features are missing from the FOSS version.

Hudi is nice, but they are in the middle of a big format change right now.

Iceberg is nice, but is the most conservative and slow format out of the three.
lgsilver over 1 year ago
Databricks has been struggling to defend Delta against the fast-moving improvements and widening adoption of Iceberg, championed by two of its major competitors, AWS and Snowflake. This article seems like a bizarre, and maybe even misleading, artifact, given that no one in the industry is comparing Parquet to Delta. They’re weighing Iceberg, which, like Delta, can organize and structure groups of Parquet (or other format) files…
MrPowers over 1 year ago
Data Lakes (i.e. Parquet files in storage without a metadata layer) don't support transactions, require expensive file listing operations, and don't support basic DML operations like deleting rows.

Delta Lake stores data in Parquet files and adds a metadata layer to provide support for ACID transactions, schema enforcement, versioned data, and full DML support. Delta Lake also offers concurrency protection.

This post explains all the features offered by Delta Lake in comparison to a plain vanilla Parquet data lake.
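A rough sketch of what such a metadata layer can look like, loosely modelled on the layout of the Delta transaction log (a `_delta_log` directory of zero-padded JSON commit files holding `add` and `remove` actions). It is heavily simplified: the `commit` and `live_files` helpers are hypothetical, the actions carry only a path, and real implementations depend on the storage system for atomic commits.

```python
import json
import tempfile
from pathlib import Path


def commit(log_dir, version, actions):
    # One JSON file per version, one action per line. Mode "x" fails if the
    # file already exists, so two writers cannot both claim a version.
    path = log_dir / f"{version:020d}.json"
    with open(path, "x") as f:
        f.write("\n".join(json.dumps(a) for a in actions))


def live_files(log_dir, as_of=None):
    # Replay the log in version order instead of listing the data files.
    files = set()
    for path in sorted(log_dir.glob("*.json")):
        if as_of is not None and int(path.stem) > as_of:
            break
        for line in path.read_text().splitlines():
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files


log_dir = Path(tempfile.mkdtemp()) / "_delta_log"
log_dir.mkdir()
commit(log_dir, 0, [{"add": {"path": "part-0.parquet"}}])
commit(log_dir, 1, [{"add": {"path": "part-1.parquet"}}])
# Deleting rows rewrites a file: drop the old one, add its replacement.
commit(log_dir, 2, [{"remove": {"path": "part-0.parquet"}},
                    {"add": {"path": "part-2.parquet"}}])

latest = live_files(log_dir)              # current table contents
version_1 = live_files(log_dir, as_of=1)  # versioned read ("time travel")

try:
    commit(log_dir, 2, [{"add": {"path": "conflict.parquet"}}])
    conflict_rejected = False
except FileExistsError:
    conflict_rejected = True  # a second writer loses the race for version 2
```

In this toy version, readers learn the table's contents from the log alone, older versions stay readable as long as their data files are kept, and a conflicting commit fails rather than silently overwriting another writer's work.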