Iceberg (https://iceberg.apache.org) is an open source alternative to Delta Lake that I cannot recommend enough.
It organizes your Parquet files (or files in other serialization formats) into a logical structure with snapshots, enabling time travel, git-like semantics for data management, and Write-Audit-Publish strategies.
My favorite recent use is idempotent change data capture, which eases replication in the event of failures: when your publishing job fails, you can simply replay the same diff between two snapshots and pick up where you left off.
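For the curious, here's a minimal sketch of that replay using Iceberg's incremental read in PySpark. The table name, snapshot IDs, and sink path are illustrative assumptions, not anything from my actual setup:

```python
# Sketch: replay the same diff between two Iceberg snapshots (idempotent CDC).
# Assumes a SparkSession with an Iceberg catalog configured; the table name,
# snapshot IDs, and sink path below are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

last_published_id = 1111111111111111111  # snapshot the failed publish started from
target_id = 2222222222222222222          # snapshot the failed publish aimed at

# Incremental read of the rows appended between the two snapshots.
# Re-running with the same IDs produces the same diff, so a failed
# publish can be replayed safely.
diff = (
    spark.read.format("iceberg")
    .option("start-snapshot-id", last_published_id)  # exclusive
    .option("end-snapshot-id", target_id)            # inclusive
    .load("db.events")
)

diff.write.mode("append").parquet("s3://bucket/published/events/")
```

Note that Iceberg's incremental read currently covers append snapshots, which is exactly the shape of an append-style CDC feed.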
Comparing Delta Lake to Parquet is a bit nonsensical, isn't it? Like comparing Postgres to a zip file. After trying all of the major open table formats, I think Iceberg is the future. Delta is great if you use Databricks, but otherwise I don't see a compelling reason to use it over Iceberg.
More comparisons (from a competitor?):

"Apache Hudi vs Delta Lake vs Apache Iceberg - Data Lakehouse Feature Comparison"

https://www.onehouse.ai/blog/apache-hudi-vs-delta-lake-vs-apache-iceberg-lakehouse-feature-comparison
I'm not well versed in these things, but at this point, aren't you re-inventing database systems? Talking about things like ACID transactions, schema evolution, dropping columns, ... in the context of a file format feels bizarre to me.
Isn't Delta Lake using Parquet files? I don't understand the comparison.

Also:

> Parquet tables are OK when data is in a single file but are hard to manage and unnecessarily slow when data is in many files

This is not true. Having worked with Spark, it's much better to have multiple "small" files than only one big file.
Delta is pretty great; it lets you do upserts into tables in Databricks much more easily than without it.

I think the website is here: https://delta.io
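For reference, the upsert is a MERGE under the hood. A minimal sketch with the delta-spark Python API, where the paths and the join key are made up for illustration:

```python
# Sketch of a Delta upsert (MERGE) via the delta-spark Python API.
# Assumes a SparkSession `spark` with the Delta extension enabled;
# the paths and the customer_id join key are hypothetical.
from delta.tables import DeltaTable

target = DeltaTable.forPath(spark, "/mnt/tables/customers")
updates = spark.read.parquet("/mnt/staging/customer_updates")

(
    target.alias("t")
    .merge(updates.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()     # matching rows: overwrite with the new values
    .whenNotMatchedInsertAll()  # new rows: insert them
    .execute()
)
```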
Delta is nice, but a lot of features are missing from the FOSS version.

Hudi is nice, but they are in the middle of a big format change right now.

Iceberg is nice, but it is the most conservative and slow-moving of the three formats.
Databricks has been struggling to defend Delta against the fast-moving improvements and widening adoption of Iceberg, championed by two of its major competitors, AWS and Snowflake. This article seems like a bizarre, and maybe even misleading, artifact, given that no one in the industry is comparing Parquet to Delta. They’re weighing Iceberg, which like Delta, can organize and structure groups of parquet (or other format) files…
Data Lakes (i.e. Parquet files in storage without a metadata layer) don't support transactions, require expensive file listing operations, and don't support basic DML operations like deleting rows.

Delta Lake stores data in Parquet files and adds a metadata layer to provide support for ACID transactions, schema enforcement, versioned data, and full DML support. Delta Lake also offers concurrency protection.

This post explains all the features offered by Delta Lake in comparison to a plain vanilla Parquet data lake.
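As a rough illustration of the DML and versioning described above, here's a sketch with the delta-spark Python API; the table path and delete predicate are assumptions for the example:

```python
# Sketch: row-level DELETE plus time travel on a Delta table, operations a
# plain directory of Parquet files can't do in place. Assumes a SparkSession
# `spark` with Delta enabled; the path and predicate are hypothetical.
from delta.tables import DeltaTable

events = DeltaTable.forPath(spark, "/mnt/tables/events")

# Row-level DML: recorded in the transaction log as a new table version.
events.delete("event_date < '2020-01-01'")

# Versioned data: read the table as it was before the delete.
before = (
    spark.read.format("delta")
    .option("versionAsOf", 0)
    .load("/mnt/tables/events")
)
```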