
ClickHouse for usage metering with Kafka Connect

9 points by hekike over 1 year ago
Hi HN, at OpenMeter, we switched to using ClickHouse as our database to pre-aggregate and query usage.

Additionally, our initial choice of streaming processor, ksqlDB, revealed limitations when scaling for small to medium-sized producers. The constraints were ksqlDB's limited capacity to run persistent queries per instance (20 on Confluent Cloud) and its lack of support for clustering.

With ClickHouse, the new processing pipeline looks like: API -> Kafka -> Kafka Connect -> ClickHouse

Under the hood, we use the ClickHouse Kafka Connect Sink plugin to ensure consistent data movement. The plugin guarantees exactly-once delivery between Kafka topics and ClickHouse tables, which is critical because Kafka Connect tasks are only aware of the latest topic offset acknowledged by the consumer; a consumer can fail to acknowledge a processed offset due to a network error or an exception. Exactly-once inserts prevent dropping or double-inserting usage, either of which would lead to incorrect billing.

In OpenMeter, we pre-aggregate usage events into one-minute tumbling windows to reduce the number of rows we need to scan at query time. To do this in ClickHouse, we use the AggregatingMergeTree table engine, which enables incremental data aggregation when combined with a MaterializedView. ClickHouse materialized views are trigger-based and update when new records are inserted into the source table, so the corresponding views are refreshed whenever Kafka Connect transfers a batch of events to ClickHouse. This also means an insert can fail when the view cannot process a record at trigger time; we send failed events to a Dead Letter Queue topic for later processing.

To help ClickHouse with hot topics, we will consider adding an extra streaming aggregation step for high-volume producers, this time with a more horizontally scalable stream processor like Arroyo. This would reduce ClickHouse insert batch sizes. Based on our tests, ClickHouse works best with batch sizes of 50-100k rows and inserts less frequent than once per second.

To see it in action, check out our open-source repo: https://github.com/openmeterio/openmeter
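For readers who haven't used this pattern, here is a minimal ClickHouse sketch of the pre-aggregation setup described above: a raw events table fed by the Kafka Connect sink, an AggregatingMergeTree target, and a trigger-based materialized view that folds inserts into one-minute windows. The table and column names (usage_events, usage_events_1m, subject, type, value) are illustrative assumptions, not OpenMeter's actual schema.

    -- Raw usage events, as delivered by the Kafka Connect sink.
    -- Names are illustrative, not OpenMeter's real schema.
    CREATE TABLE usage_events
    (
        subject String,    -- e.g. customer or API key
        type    String,    -- meter/event type
        value   Float64,   -- reported usage amount
        time    DateTime
    )
    ENGINE = MergeTree
    ORDER BY (subject, type, time);

    -- Pre-aggregated one-minute tumbling windows, stored as partial aggregate states.
    CREATE TABLE usage_events_1m
    (
        subject      String,
        type         String,
        window_start DateTime,
        value_sum    AggregateFunction(sum, Float64)
    )
    ENGINE = AggregatingMergeTree
    ORDER BY (subject, type, window_start);

    -- Trigger-based materialized view: runs on every insert batch into usage_events
    -- (i.e. every time Kafka Connect flushes a batch) and folds rows into states.
    CREATE MATERIALIZED VIEW usage_events_1m_mv TO usage_events_1m AS
    SELECT
        subject,
        type,
        toStartOfMinute(time) AS window_start,
        sumState(value)       AS value_sum
    FROM usage_events
    GROUP BY subject, type, window_start;

    -- At query time, merge the per-minute states instead of scanning raw events.
    SELECT
        subject,
        toStartOfHour(window_start) AS hour,
        sumMerge(value_sum)         AS usage
    FROM usage_events_1m
    GROUP BY subject, hour;

Because the view stores sumState partial aggregates, billing queries only have to sumMerge the per-minute states, which is what keeps the scanned row count low compared to querying the raw event stream.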

no comments