Show HN: PeerDB Streams – Simple, native Postgres change data capture

177 points by saisrirampur, about 1 year ago
Hello HN, I am Sai Srirampur, one of the co-founders of PeerDB (https://github.com/PeerDB-io/peerdb). We spent the past 7 months building a solid experience for replicating data from Postgres to data warehouses. Now we're expanding to queues.

PeerDB Streams provides a simple and native way to replicate changes as they happen in Postgres to queues (Kafka, Redpanda, Google PubSub, etc.). We use Postgres logical decoding to enable Change Data Capture (CDC).

Blog post: https://blog.peerdb.io/peerdb-streams-simple-native-postgres-change-data-capture
10-minute quickstart: https://docs.peerdb.io/quickstart/streams-quickstart

We chose queues because many users found existing tools complex. Debezium is the most used tool for this use case and has large production usage. However, a common pain point among our users is its significant learning curve: it can take months to productionize.

A few issues are:
a) Interacting through a command-line interface, understanding the various settings, and learning best practices for running it in production is not trivial. Debezium UI, released to address usability concerns [1], is still in an incubating state [2]. Additionally, reading Debezium resources to get started can be overwhelming [3].
b) Supporting data formats and transformations isn't easy. It needs a Java project, building JAR packages, and setting up a runtime path on the Kafka Connect plugin.
c) Debezium is not as native for other queues as it is for Kafka, and it doesn't offer the same level of configurability.
For example, with Event Hubs, it is difficult to stream to topics spread across namespaces and subscriptions.

TL;DR: Debezium aims to provide a comprehensive experience for engineers to implement CDC rather than making it dead simple for them. So you can do a lot with Debezium, but you need to know a lot about it.

At PeerDB, we are building a simple yet comprehensive experience for Postgres CDC. The goal is to enable engineers to build prod-grade Postgres CDC with a minimal learning curve, within a few days.

PeerDB's feature set isn't at Debezium's level yet, and as we evolve, we might face similar challenges. However, we're putting usability at the forefront, and we believe we can achieve the above goal.

First, PeerDB offers a simple UI to set up Postgres and Kafka by creating PEERs and to initiate CDC by creating a MIRROR. Through the UI, users can monitor the progress of CDC, including throughput and latency; set up alerts to Slack/Email based on replication slot growth; and investigate Postgres-native metrics, including slot size. Here is a demo of the PeerDB UI in action:

https://www.loom.com/share/ebcfb7646a1e48738835853b760e5d04

Second, for users who prefer a CLI, we provide a Postgres-compatible SQL layer to manage CDC. This offers the same level of features as the UI and is more intuitive than bash scripts.

Third, users can perform row-level transformations using Lua scripts executed at runtime. This enables features such as encrypting/masking PII data, supporting various data formats (JSON, MsgPack, Protobuf, etc.), and more. We offer a script editor along with a bunch of useful templates [5].

Fourth, we provide native connectors to non-Kafka targets. We also provide native configurability options tailored to these platforms.
For example, with Event Hubs, users can perform CDC to topics distributed across different namespaces and subscriptions [4].

Finally, we are laser-focused on Postgres, enabling specific optimizations like native metrics for replication, wait events, and number of connections. Features like faster initial loads through parallel snapshotting and decoding transactions in-flight are in private beta.

Our hope is to provide the best data-movement experience for Postgres. PeerDB Streams is another step in that direction. We would love to get your feedback on the product experience, our thesis, and anything else that comes to mind. It would be super useful for us. Thank you!

References:

[1] https://debezium.io/blog/2020/10/22/towards-debezium-ui/
[2] https://debezium.io/documentation/reference/stable/operations/debezium-ui.html
[3] https://medium.com/@cooper.wolfe/i-hated-debezium-so-much-i-did-it-myself-b43b0efc20a9
[4] https://blog.peerdb.io/enterprise-grade-replication-from-postgres-to-azure-event-hubs
[5] https://github.com/PeerDB-io/examples
[5] https://app.peerdb.cloud
[6] https://github.com/PeerDB-io/PeerDB
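
To make the row-level transformation idea from the post concrete, here is a minimal sketch written in Python rather than Lua (the post says PeerDB runs such scripts as Lua at runtime; this is not PeerDB's API). The event shape, the `PII_COLUMNS` set, and the `mask_value` and `transform_row` helpers are all hypothetical and chosen only for illustration.

```python
import hashlib
import json

# Hypothetical set of columns to treat as PII; a real pipeline would configure this.
PII_COLUMNS = {"email", "phone"}


def mask_value(value: str) -> str:
    """Replace a PII value with a short, stable SHA-256 digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def transform_row(event: dict) -> bytes:
    """Mask PII columns in one row change and serialize it as JSON bytes.

    `event` uses an assumed shape: {"table": ..., "op": ..., "row": {...}}.
    """
    row = {
        col: mask_value(str(val)) if col in PII_COLUMNS and val is not None else val
        for col, val in event["row"].items()
    }
    payload = {"table": event["table"], "op": event["op"], "row": row}
    # sort_keys makes the output deterministic, which helps downstream comparison.
    return json.dumps(payload, sort_keys=True).encode("utf-8")


# Example: one insert on a hypothetical "users" table.
event = {
    "table": "users",
    "op": "insert",
    "row": {"id": 1, "email": "a@example.com", "plan": "free"},
}
message = transform_row(event)
```

The resulting `message` is what would be handed to the queue producer. Swapping `json.dumps` for another encoder is where a different wire format (MsgPack, Protobuf) would plug in.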

11 comments

gniting, about 1 year ago
Nice to see more product development and offerings in this area. Well done.

[Full disclosure: I work for Prisma, and we have a similar product called Pulse (https://prisma.io/pulse)]

Another use case for CDC is compliance. I reckon that in the near future, to ensure compliance with data regulations, CDC will become the better option for devs vs. traditional seek/update/delete functions.
flockonus, about 1 year ago
What is Change Data Capture (CDC)?

PeerDB doesn't seem to explain the core of the problem it solves. Here's a reference from Debezium (mentioned in the text):

> set up and configure Debezium to monitor your databases, and then your applications consume events for each row-level change made to the database. Only committed changes are visible, so your application doesn't have to worry about transactions or changes that are rolled back.

It's good to know! This model seems to turn row changes into effectively a side-effect invocation for a queue.
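
As a rough illustration of the model described in this comment, the sketch below has a consumer apply a stream of committed row-level change events to a local copy of a table. The event shape (`op`, `key`, `row`) and the `apply_change` helper are assumptions made for the example; they are not Debezium's or PeerDB's actual message format.

```python
def apply_change(replica: dict, event: dict) -> None:
    """Apply one committed row-level change to an in-memory copy of a table.

    `replica` maps primary key -> row; `event` uses a hypothetical shape.
    """
    op = event["op"]
    key = event["key"]
    if op in ("insert", "update"):
        # The event carries the full new row image, so both ops are an upsert.
        replica[key] = event["row"]
    elif op == "delete":
        replica.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op!r}")


# A made-up stream of events, in commit order.
events = [
    {"op": "insert", "key": 1, "row": {"id": 1, "name": "Ada"}},
    {"op": "insert", "key": 2, "row": {"id": 2, "name": "Bob"}},
    {"op": "update", "key": 1, "row": {"id": 1, "name": "Ada L."}},
    {"op": "delete", "key": 2, "row": None},
]

replica = {}
for event in events:
    apply_change(replica, event)
```

Because only committed changes are emitted, the consumer never has to undo anything: replaying the events in order is enough to converge on the source table's state.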
arsalanb, about 1 year ago
Noob question: What is the advantage of replicating data into a warehouse vs. just querying it in place on a Postgres database?
jensneuse, about 1 year ago
We're currently in the process of adding Kafka support to EDFS (Event Driven Federated Subscriptions), a specification to add NATS, Kafka, and other pub/sub or streaming services to a federated graph.

There's one thing missing, and we could hook it up to your CDC solution. For each message, we need to set the __typename field. Does your solution have a way to accomplish this?

EDFS reference: https://cosmo-docs.wundergraph.com/router/event-driven-federated-subscriptions-edfs
take-five, about 1 year ago
How do you handle Postgres cluster failover? Does PeerDB automatically restore the logical replication slot on a new primary?
sayadxiarkakh, about 1 year ago
Are BigQuery's clustered and partitioned tables supported (both as a source and sink)?

Plus, how is the deduplication process handled? Fivetran, for example, creates staging tables and scans the target table. Since it does support BigQuery's integer-based partitioning, a table partitioned by primary key helps with cost optimization.
zknill, about 1 year ago
Why do you recommend a heartbeat table to mitigate WAL slot growth if the PeerDB Stream targets a specific table? Presumably this means that the WAL slot is subscribed to all table changes, even if only specific tables are actually included in the CDC? Why not just subscribe to the WAL that you need?
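
For context on the slot growth mentioned here: one common way to estimate how much WAL a replication slot is holding back is to subtract the slot's `restart_lsn` (from `pg_replication_slots`) from the server's current WAL position (`pg_current_wal_lsn()`). Postgres prints an LSN as two hexadecimal numbers separated by a slash (high 32 bits, then low 32 bits). Below is a small sketch of that arithmetic; the helper names and the LSN values are made up for illustration, and on the server itself `pg_wal_lsn_diff` does the same subtraction.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a Postgres LSN string such as '0/16B3748' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL between the slot's restart point and the current position."""
    return parse_lsn(current_lsn) - parse_lsn(restart_lsn)


def slot_needs_alert(current_lsn: str, restart_lsn: str, threshold_bytes: int) -> bool:
    """True when the slot retains more WAL than the (hypothetical) threshold."""
    return retained_wal_bytes(current_lsn, restart_lsn) > threshold_bytes


# Made-up values: the slot is 32 bytes behind, across a 4 GiB boundary.
lag = retained_wal_bytes("1/10", "0/FFFFFFF0")
```

If the slot's position does not advance while the server keeps writing WAL, this difference grows without bound, which is the growth a slot-size alert would watch for.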
vivzkestrel, about 1 year ago
What is the difference between using this library and PG_LISTEN/NOTIFY with triggers for each row change?
semicognitive, about 1 year ago
Would love to know how this compares to Supabase's realtime: https://github.com/supabase/realtime
adontz, about 1 year ago
SQS and EventBridge targets not even on the roadmap? Why? Any specific reason?
tarun_anand, about 1 year ago
Does PeerDB help in replicating from one Citus cluster to another Citus cluster?

If not, is there any other solution?