ClickHouse for usage metering with Kafka Connect

9 points by hekike, over 1 year ago
Hi HN, at OpenMeter, we switched to using ClickHouse as our database to pre-aggregate and query usage.

Our initial choice of stream processor, ksqlDB, showed its limits when scaling for small to medium-sized producers. The constraints were ksqlDB's limited capacity to run persistent queries per instance (20 on Confluent Cloud) and its lack of support for clustering.

With ClickHouse, the new processing pipeline looks like this: API -> Kafka -> Kafka Connect -> ClickHouse

Under the hood, we use the ClickHouse Kafka Connect Sink plugin to ensure consistent data movement. This plugin guarantees exactly-once delivery between Kafka topics and ClickHouse tables, which is critical, as Kafka Connect tasks are only aware of the latest topic offset acknowledged by the consumer. For example, a consumer can fail to acknowledge a processed offset due to a network error or an exception. Exactly-once inserts matter here because they prevent usage from being dropped or inserted twice, either of which would lead to incorrect billing.

In OpenMeter, we pre-aggregate usage events into one-minute tumbling windows to reduce the number of rows we need to scan at query time. To do this with ClickHouse, we use the AggregatingMergeTree table engine, which enables incremental data aggregation when combined with a MaterializedView. In ClickHouse, MaterializedViews are trigger-based and update when new records are inserted into the source table. Consequently, the corresponding materialized views are updated whenever Kafka Connect transfers a batch of events to ClickHouse. This also means an insert can fail when the view cannot process a record at trigger time. We send failed events to a Dead Letter Queue topic for later processing.

To help ClickHouse with hot topics, we will consider adding an extra streaming aggregation step for high-volume producers, but this time with a more horizontally scalable stream processor like Arroyo. This would reduce ClickHouse insert batch sizes. Based on our tests, ClickHouse works best if batch sizes are 50-100k and inserts happen less often than once per second.

To see it in action, check out our open-source repo: https://github.com/openmeterio/openmeter
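
For readers who want a feel for the one-minute tumbling windows mentioned above, here is a minimal Python sketch of the idea. It is not OpenMeter's code and does not use ClickHouse; the event shape `(subject, timestamp, value)`, the names `window_start` and `aggregate`, and the choice of a sum as the aggregate are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 60  # one-minute tumbling windows, as described in the post


def window_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its tumbling window (hypothetical helper)."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % WINDOW_SECONDS, tz=timezone.utc)


def aggregate(events):
    """Sum values per (subject, window start). The sum is an assumed aggregate."""
    totals = defaultdict(float)
    for subject, ts, value in events:
        totals[(subject, window_start(ts))] += value
    return dict(totals)


# Hypothetical usage events: (subject, event time, value)
events = [
    ("customer-1", datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc), 2.0),
    ("customer-1", datetime(2024, 1, 1, 12, 0, 59, tzinfo=timezone.utc), 3.0),
    ("customer-1", datetime(2024, 1, 1, 12, 1, 0, tzinfo=timezone.utc), 1.0),
]

result = aggregate(events)
for (subject, start), total in sorted(result.items()):
    print(subject, start.isoformat(), total)
```

Because windows do not overlap, each event lands in exactly one row, so a query over a time range scans one row per subject per minute instead of every raw event.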

No comments yet