TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Reliably replicating data between Postgres and ClickHouse

97 点作者 saisrirampur3 个月前

5 条评论

woodhull3 个月前
We&#x27;ve used PeerDB&#x27;s hosted offering for sync&#x27;ing data from Postgres to Clickhouse both pre and post acquisition by Clickhouse Inc. We&#x27;ve also helped test the integrated sync features in Clickhouse Cloud built on top of PeerDB. We&#x27;re using it to power customer facing features within our product.<p>It works well. Their team is great. I feel a bit spoiled having had as much access to the engineering team during the private beta as we&#x27;ve experienced.<p>It&#x27;s great for use cases where it makes sense to sync postgres tables across to clickhouse without denormalizing them. PeerDB can transform rows in a single table sent via CDC using a lua scripting language, but it can&#x27;t (yet!) denormalize data into clickhouse that is stored in 3NF on Postgres across multiple tables.<p>On the clickhouse query side, we end up wanting denormalized data for query performance and to avoid JOINs. It&#x27;s frequently not a great idea to query in clickhouse using the same table structure as you&#x27;re using in your transactional db.<p>In our experience we sync a few tables with PeerDB but mostly end up using app-level custom code to sync denormalized data into Clickhouse for our core use-cases. Most of the PeerDB sync&#x27;d tables end up as Clickhouse Dictionaries which we then use in our queries.<p>PeerDB works well and I like it for what it is. Just don&#x27;t expect to be satisfied with querying in Clickhouse against the same table structure as you&#x27;ve got in Postgres unless your data size is tiny.<p>Curious to know about how others are using it and the architectures you&#x27;ve developed.
评论 #43145958 未加载
评论 #43163355 未加载
评论 #43150095 未加载
评论 #43147823 未加载
jascha_eng3 个月前
Not to sound too sales-y but if you are looking into clickhouse and are currently based on postgres, you might also want to check out timescale. Since we&#x27;re just a postgres extension it&#x27;s 100% compatible with existing systems but provides a lot of the same speed benefits as clickhouse for analytical queries.<p>Don&#x27;t be confused by the timeseries branding.
评论 #43143956 未加载
评论 #43143873 未加载
评论 #43143941 未加载
评论 #43144257 未加载
评论 #43143882 未加载
评论 #43147144 未加载
评论 #43144098 未加载
评论 #43143753 未加载
spapas823 个月前
If anybody&#x27;s interested I&#x27;ve written a script in python that executes a query in a postgres db and saves the results to a parquet file (the script is not as simple as one would expect because of the fact that parquet is not as simple as csv and because the amount of data may be huge).<p>The parquet file is a columnar friendly friendly that can then be simply inserted to clickhouse or duckdb or even queried directly.<p>This script and a cron job are enough for my (not very complex) needs on replicating my postgres data on clickhouse for fast queries.<p><a href="https:&#x2F;&#x2F;github.com&#x2F;spapas&#x2F;pg-parquet-py">https:&#x2F;&#x2F;github.com&#x2F;spapas&#x2F;pg-parquet-py</a>
评论 #43144272 未加载
评论 #43149219 未加载
评论 #43152493 未加载
jedberg3 个月前
&gt; Once the databases are selected, the next challenge is ensuring reliable, ongoing data replication from PostgreSQL to ClickHouse. While it’s possible to handle this at the application layer by writing data directly into ClickHouse, replicating data at the database level tends to be simpler and more reliable.<p>I&#x27;m curious if you have data that backs this up, or if it&#x27;s more of a &quot;gut feeling&quot; sort of thing. At first blush, I agree with you, but at the same time, by doing it at the application level, it opens up so many more possibilities, such as writing &quot;pre-coalesced&quot; data to the data warehouse or pre-enriching the data that goes into the data warehouse.
评论 #43147079 未加载
lukaslalinsky3 个月前
With all these CDC solutions based on logical replication, how are PostgreSQL failovers handled? You can&#x27;t really rewind ClickHouse. Does it leave it with inconsistencies? Does it rebuild the DB fully?
评论 #43151912 未加载
评论 #43151918 未加载