TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

We Built a 19 PiB Logging Platform with ClickHouse and Saved Millions

54 points | by samber | about 1 year ago

5 comments

rorycrispin, about 1 year ago
Hey! I'm the original author of this post. I'm excited to share our journey with ClickHouse and the open-source observability world, and I'll be happy to answer any questions you may have!
everfrustrated, about 1 year ago
Great write-up.

> The recent efforts to move the JSON type to production-ready status will be highly applicable to our logging use case. This feature is currently being rearchitected, with the development of the Variant type providing the foundation for a more robust implementation. When ready, we expect this to replace our map with more strongly typed (i.e. not uniformly typed) metadata structures that are also possibly hierarchical.

Very happy to see ClickHouse dogfooding itself for storing logs - hope this will help hasten the work on making the JSON type more suitable for dynamic documents.
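For context, the Map-based metadata the quoted passage describes can be sketched as a ClickHouse table. The schema below is an illustrative assumption, not the actual table from the post: with a `Map`, every attribute value is coerced to a single type (here `String`), which is the limitation the typed JSON/Variant work is meant to lift.

```sql
-- Illustrative sketch (assumed schema, not the post's actual table).
-- With Map, all metadata values share one type; the future JSON type
-- would let each key keep its own (possibly nested) type.
CREATE TABLE logs
(
    Timestamp     DateTime64(9),
    PodName       LowCardinality(String),
    Body          String,
    LogAttributes Map(LowCardinality(String), String)  -- uniformly typed
)
ENGINE = MergeTree
ORDER BY (PodName, Timestamp);

-- Reading a map key: the value comes back as String even when it is
-- conceptually numeric, so filters compare against string literals.
SELECT count()
FROM logs
WHERE LogAttributes['status_code'] = '500';
```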
评论 #39912049 未加载
ankitnayan, about 1 year ago
Interesting post.

How do you apply restrictions on your queries? Otherwise, a few concurrent queries scanning huge amounts of data, or running slowly due to GROUP BY etc., can slow down the whole system.

Also, I see a sorting key of `ORDER BY (PodName, Timestamp)`. While debugging, filtering by service_name, deployment_name, env, region, etc. is probably going to be slow?
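ClickHouse does ship per-query guardrails for the concern raised here. A minimal sketch using real setting names, where the specific values and the user name are assumptions rather than anything from the post:

```sql
-- Sketch: per-profile query limits (the setting names are real ClickHouse
-- settings; the values here are illustrative assumptions).
CREATE SETTINGS PROFILE IF NOT EXISTS log_reader SETTINGS
    max_execution_time = 30,           -- kill queries after 30 seconds
    max_memory_usage = 10000000000,    -- ~10 GB of memory per query
    max_rows_to_read = 1000000000;     -- abort scans past 1B rows

-- Attach the profile to a hypothetical read-only analyst account.
CREATE USER IF NOT EXISTS log_analyst SETTINGS PROFILE 'log_reader';
```

With limits like these, a runaway GROUP BY hits a hard ceiling instead of starving the other tenants of the cluster.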
GrumpyNl, about 1 year ago
It's a log. When a log is that big, is it still useful?
rthnbgrredf, about 1 year ago
The use case of 19 PiB of logging data feels very contrived to me. I have worked for smaller and bigger companies and never faced logging in the petabyte range. I'm not saying it's not a thing - FAANG-level companies certainly have such needs, but they have their own large-scale solutions already. The question remains: besides bragging, who is the average Joe with 19 PiB of logging data you might want to address as a potential customer?

What would be useful from my perspective are benchmarks in the more common terabyte range. How much faster is it to query compared to existing cloud offerings, and what features do e.g. Datadog vs. ClickHouse have to analyze the data? In the end, the raw data is not much use if you cannot easily find and extract meaningful data out of it.