
Show HN: Trench – Open-source analytics infrastructure

155 points by pancomplex 7 months ago
Hey HN! I want to share a new open source project I've been working on called Trench (https://trench.dev). It's open source analytics infrastructure for tracking events, page views, and identifying users, and it's built on top of ClickHouse and Kafka.

https://github.com/frigadehq/trench

I built Trench because the Postgres table we used for tracking events at our startup (http://frigade.com/) was getting expensive and becoming a performance bottleneck as we scaled to millions of end users.

Many companies run into the same problem we did (e.g. Stripe, Heroku: https://brandur.org/fragments/events). They often start by adding a basic events table to their relational database, which works at first but can become an issue as the application scales. It's usually the biggest table in the database, the slowest one to query, and the longest one to back up.

With Trench, we've put together a single Docker image that gives you a production-ready tracking event table built for scale and speed. When we migrated our tracking table from Postgres to Trench, we saw a 42% reduction in cost to serve on our primary Postgres cluster, and all lag spikes from autoscaling under high traffic were eliminated.

Here are some of the core features:

* Fully compliant with the Segment tracking spec, e.g. track(), identify(), group(), etc.
* Can handle thousands of events per second on a single node
* Query tracking data in real time with read-after-write guarantees
* Send data anywhere with throttled and batched webhooks
* Single production-ready Docker image. No need to manage and roll your own Kafka/ClickHouse/Node.js/etc.
* Easily plugs into any cloud-hosted ClickHouse and Kafka solution, e.g. ClickHouse Cloud, Confluent

Trench can be used for a range of use cases. Here are some possibilities:

1. Real-time monitoring and alerting: Set up real-time alerts and monitoring for your services by tracking custom events like errors, usage spikes, or specific user actions, and send that data anywhere with Trench's webhooks.
2. Event replay and debugging: Capture all user interactions in real time for event replay.
3. A/B testing platform: Capture events from different users and groups in real time. Segment users by querying in real time and serve the right experiences to the right users.
4. Product analytics for SaaS applications: Embed Trench into your existing SaaS product to power user audit logs or tracking scripts on your end users' websites.
5. Build a custom RAG model: Easily query event data and give users answers in real time. LLMs are really good at writing SQL.

The project is open source and MIT-licensed. If there's interest, we're thinking about adding support for Elasticsearch, direct data integrations (e.g. Redshift, S3, etc.), and an admin interface for creating queries, webhooks, etc.

Have you experienced the same issues with your events tables? I'd love to hear what HN thinks about the project.
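For readers who haven't used the Segment spec mentioned above, a track() call is just a small JSON payload sent over HTTP. The TypeScript sketch below shows the general shape; the endpoint URL, port, auth header, and batch envelope are illustrative assumptions rather than Trench's documented API, and only the Segment-style field names (type, event, userId, properties) come from the spec the post references.

    // Sketch of sending a Segment-style track() call to a self-hosted tracking
    // endpoint. The URL, port, and auth header below are hypothetical; consult
    // the Trench docs for the real ingestion API.
    const TRACKING_URL = "http://localhost:4000/events"; // assumed endpoint
    const API_KEY = "your-public-api-key";               // assumed auth scheme

    interface TrackEvent {
      type: "track";
      event: string;
      userId: string;
      properties?: Record<string, unknown>;
      timestamp?: string;
    }

    async function track(event: TrackEvent): Promise<void> {
      const res = await fetch(TRACKING_URL, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${API_KEY}`,
        },
        // Segment-style payloads are usually sent in batches; a single-element
        // array keeps the example small.
        body: JSON.stringify({ events: [event] }),
      });
      if (!res.ok) {
        throw new Error(`Tracking request failed: ${res.status}`);
      }
    }

    // Example usage: record a sign-up event for one user.
    track({
      type: "track",
      event: "user_signed_up",
      userId: "user_123",
      properties: { plan: "free", country: "Denmark" },
      timestamp: new Date().toISOString(),
    }).catch(console.error);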

13 comments

bosky101 7 months ago
1) Appreciate the single image to get started, but am particularly curious how you handle different events of a new user going to different nodes.

2) Any admin interface, or just the REST API?

3) A little bit on the ClickHouse table and engine choices?

4) Stats on ingesting and querying at the same time?

5) Node doesn't support the ClickHouse TCP interface. This was a major bottleneck even with batching of 50k events (or 30 secs, whichever comes first).

6) CH indexes?

7) How are events partitioned to a Kafka partition? By userId? Any assumptions on minimum fields?

Will try porting our in-house marketing automation backend (PostHog frontend compatible) to this and see how it goes (150M+ events per day).

Kudos all around. Love all 3 of your technology choices.
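Point 5 refers to batching 50k events or flushing every 30 seconds, whichever comes first. As a rough TypeScript sketch of that size-or-time flush pattern (the EventBatcher class and sendBatch callback are hypothetical, not part of Trench or the commenter's backend):

    // Size-or-time batching: flush when the buffer reaches maxBatchSize events
    // or when maxWaitMs has elapsed since the first buffered event.
    type Event = Record<string, unknown>;

    class EventBatcher {
      private buffer: Event[] = [];
      private timer: ReturnType<typeof setTimeout> | null = null;

      constructor(
        private readonly sendBatch: (events: Event[]) => Promise<void>,
        private readonly maxBatchSize = 50_000, // 50k events, as in the comment
        private readonly maxWaitMs = 30_000,    // or 30 seconds, whichever comes first
      ) {}

      add(event: Event): void {
        this.buffer.push(event);
        if (this.buffer.length >= this.maxBatchSize) {
          void this.flush();
        } else if (!this.timer) {
          this.timer = setTimeout(() => void this.flush(), this.maxWaitMs);
        }
      }

      async flush(): Promise<void> {
        if (this.timer) {
          clearTimeout(this.timer);
          this.timer = null;
        }
        if (this.buffer.length === 0) return;
        const batch = this.buffer;
        this.buffer = [];
        await this.sendBatch(batch); // e.g. one bulk INSERT over ClickHouse's HTTP interface
      }
    }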
hitradostava 7 months ago
Looks interesting, we solved this problem with Kinesis Firehose, S3 and Athena. Pricing is cheap, you can run any arbitrary SQL query and there is zero infrastructure to maintain.
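For context, the ingestion side of a Firehose-to-S3-to-Athena pipeline like this is only a few lines; a minimal TypeScript sketch using the AWS SDK for JavaScript v3, with a placeholder delivery stream name and event shape:

    // Sketch: push a tracking event into a Kinesis Data Firehose delivery stream
    // that lands in S3, where Athena can query it with SQL. The region, stream
    // name, and event shape are placeholders.
    import { FirehoseClient, PutRecordCommand } from "@aws-sdk/client-firehose";

    const firehose = new FirehoseClient({ region: "us-east-1" });

    async function trackEvent(event: Record<string, unknown>): Promise<void> {
      await firehose.send(
        new PutRecordCommand({
          DeliveryStreamName: "tracking-events", // placeholder stream name
          // Newline-delimited JSON keeps the S3 objects easy for Athena to parse.
          Record: { Data: new TextEncoder().encode(JSON.stringify(event) + "\n") },
        })
      );
    }

    trackEvent({ event: "page_view", userId: "user_123", ts: Date.now() })
      .catch(console.error);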
antman 7 months ago
How does it scale? Can you spin up multiple containers? For upcoming features, auto-archiving old data to cloud storage would be great.
Attummm 7 months ago
Looks great, but what is missing for me are use cases.

Why should I use it? What are the unique selling points of your project?
codegeek 7 months ago
Looks good. I'm in the market for something like this and I just ran it locally. How do I visualize data? Is Grafana not included by default?

Also, a minor issue in your docs: there is an extra comma in the sample JSON under the sample event. The fragment below:

      "properties": {
        "totalAccounts": 4,
        "country": "Denmark"
      },
    }]

I had to remove that comma at the end.
d_watt 7 months ago
Looks super interesting. Any positioning thoughts on this vs https://jitsu.com?
brody_slade_ai 7 months ago
I've been exploring open source data analytics software and it's been a game-changer. I mean, the flexibility and cost savings are huge perks. I've been looking into Apache Spark and KNIME, and they both seem like great options.
Incipient 7 months ago
> LLMs are really good at writing SQL

Unfortunately not my experience. Possibly not well prompted, but trying to get VS Code Copilot to generate anything involving semi-basic joins falls quite flat.
oulipo 7 months ago
What is the advantage of this rather than using a Postgres plugin for ClickHouse and S3 storage of the data to build a kind of data warehouse, which wouldn't require the bloat of Kafka?
remram 7 months ago
If you don't mind me asking, why the name "Trench"?
asdev 7 months ago
How is this different from PostHog?
oulipo 7 months ago
Could this be used to log IoT object events? Or is it more for app analytics?
biddendidden 7 months ago
I _totally_ associate 'trench' with 'analytics'. Oh, perhaps the author associates it with 'infrastructure'? Just stupid.