TechEcho
101x Airbyte, 11x Estuary, Postgres to Iceberg

5 points by pkhodiyar 14 days ago
Hi HN, we've been developing OLake, an open-source connector designed specifically for replicating data from PostgreSQL into Apache Iceberg. We recently ran detailed benchmarks comparing its performance and cost against several popular data movement tools: Fivetran, Debezium (using the memiiso setup mentioned), Estuary, and Airbyte.

We wanted to share the results, as they show OLake performing very competitively, often exceeding the speed of both open-source and commercial alternatives, while offering the cost advantages of a self-hosted open-source solution.

The benchmarks covered both full initial loads and Change Data Capture (CDC) on a large dataset (billions of rows for full load, tens of millions of changes for CDC) over a 24-hour window.

Link to the full Postgres benchmark: https://olake.io/docs/connectors/postgres/benchmarks

For full loads, OLake achieved throughput of around 46,262 rows/sec (RPS), processing over 4 billion rows in 24 hours.

This was essentially on par with Fivetran (46,395 RPS) and significantly faster than Debezium (14,839 RPS, 3.1x slower), Estuary (3,982 RPS, 11.6x slower on a smaller processed dataset), and Airbyte (457 RPS, 101x slower before it failed the long test).

The most striking results were in CDC performance.

For processing 50 million changes, OLake completed the task in 22.5 minutes at 36,982 rows/sec. Fivetran took 31 minutes (1.4x slower), Debezium took 60 minutes (2.7x slower), Estuary took 4.5 hours (12x slower), and Airbyte took 23 hours (63x slower).

This indicates OLake delivers significantly lower latency for propagating changes from PostgreSQL to Iceberg.

On the cost side, OLake is open source and self-hosted, so the cost is simply the infrastructure. Running the benchmarks on a substantial VM (64 vCPUs, 128 GiB memory) for 24 hours cost less than $75.

Compare this to the vendor list prices for the data synced in the tests: Fivetran's full load cost $7,446 ($1.86/M rows), Estuary's full load cost $4,462 ($12.97/M rows), and Airbyte Cloud's partial full load cost $5,560 ($438.8/M rows).

For CDC, Fivetran cost $2,257 ($45.14/M rows), Estuary cost $22.72 ($0.45/M rows), and Airbyte Cloud cost $148.95 ($2.98/M rows).

While Estuary shows a low per-row cost for CDC in this specific test, the overall picture strongly favors the predictable, infra-based cost of self-hosted OLake, especially for large-scale replication.

In summary, these benchmarks suggest OLake can match or exceed the speed of leading proprietary tools for PostgreSQL to Iceberg replication, offers superior CDC latency compared to all tested alternatives, and provides a significantly lower and more predictable cost structure due to being open source and self-hosted.

You can find more details on the benchmarks and the tool itself in our documentation.

Happy to discuss the results and our approach.
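
For anyone who wants to re-derive the multiples quoted above, here is a minimal sketch that recomputes them from the figures in this post. The dictionary and function names are illustrative assumptions and are not part of OLake or of the benchmark harness; the numbers are copied from the text as-is.

```python
# Figures copied from the post above; all names here are hypothetical helpers.
FULL_LOAD_RPS = {
    "OLake": 46_262,
    "Fivetran": 46_395,
    "Debezium": 14_839,
    "Estuary": 3_982,
    "Airbyte": 457,
}

# Time to process 50 million CDC changes, in minutes.
CDC_MINUTES = {
    "OLake": 22.5,
    "Fivetran": 31,
    "Debezium": 60,
    "Estuary": 4.5 * 60,
    "Airbyte": 23 * 60,
}


def slowdown_by_throughput(baseline: str, rps: dict[str, float]) -> dict[str, float]:
    """Times slower than the baseline, where higher rows/sec is better."""
    base = rps[baseline]
    return {tool: round(base / value, 1) for tool, value in rps.items()}


def slowdown_by_duration(baseline: str, minutes: dict[str, float]) -> dict[str, float]:
    """Times slower than the baseline, where a shorter duration is better."""
    base = minutes[baseline]
    return {tool: round(value / base, 1) for tool, value in minutes.items()}


def implied_million_rows(total_cost_usd: float, usd_per_million_rows: float) -> float:
    """Row volume (in millions) implied by a total cost and a per-million price."""
    return total_cost_usd / usd_per_million_rows


if __name__ == "__main__":
    print(slowdown_by_throughput("OLake", FULL_LOAD_RPS))
    print(slowdown_by_duration("OLake", CDC_MINUTES))
    # Fivetran CDC: $2,257 at $45.14 per million rows.
    print(round(implied_million_rows(2257, 45.14), 1))
```

From the rounded durations, the Airbyte CDC ratio comes out closer to 61x than the 63x quoted; that gap may simply reflect rounding in the reported timings. The cost helper is only a cross-check that a total and its per-million price describe the same row volume (about 50 million rows for Fivetran's CDC run).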

no comments
