Scaling product analytics built on ClickHouse

64 points by macobo about 3 years ago

2 comments

barrkel about 3 years ago
ClickHouse is awesome, but most of its benefits come from columnar storage, and you need to design around that. Be aware of how the thing works and how computer architecture works, because sympathy with the machine is what reaps rewards.

You want to minimize the number and size of columns touched when filtering and aggregating. If you need the source data, store it relationally or in a document store and select only the key from CH. Don't put JSON in CH; fat columns don't make sense. CH can be just as slow as MySQL if you select a whole wide row but apply predicates to only a handful of columns. Only touch the columns you need.

Joins are very expensive because looking up a row on a per-value basis costs a lot of instructions. CH can use vectorized operations to eliminate or aggregate many "rows" at once with instruction-level parallelism, because the column data is contiguous. Joins will be an order of magnitude slower, if only because of the memory latency of hopping randomly around a hash table. Insert data pre-joined; use the LowCardinality(String) column type; replace conditions on low-cardinality relations with precalculated integer IN tests; and denormalize high-cardinality relations.

Partitions and other storage-level tricks are a way to eke out better performance for mutations when they're needed: rebuild a subset of the data and swap it in and out. This is common with Hadoop-based columnar formats like Parquet, and last I looked, CH was getting better ways to shuffle partitions around (ATTACH PARTITION ... FROM and so on).
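The advice about touching only the needed columns, dictionary-encoding low-cardinality strings, and turning a join into a precomputed integer IN test can be illustrated with a toy, pure-Python sketch. It is not ClickHouse: the ColumnarTable class, the dictionary_encode helper, and the events/users data are all hypothetical, and plain Python loops do not reproduce CH's vectorized execution. The point is only the data layout and the shape of the query.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnarTable:
    """Hypothetical toy column store: each column is stored as its own list."""
    columns: dict[str, list] = field(default_factory=dict)

    def scan(self, *names: str) -> list[list]:
        # Only the requested columns are returned; the others are never read.
        return [self.columns[n] for n in names]


def dictionary_encode(values: list[str]) -> tuple[list[str], list[int]]:
    """Replace repeated strings with small integer codes plus a dictionary,
    similar in spirit to ClickHouse's LowCardinality(String)."""
    dictionary: list[str] = []
    index: dict[str, int] = {}
    codes: list[int] = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


# Hypothetical fact table: one row per analytics event.
events = ColumnarTable()
events.columns["user_id"] = [1, 2, 3, 1, 4, 2]
dictionary, codes = dictionary_encode(
    ["pageview", "click", "pageview", "signup", "click", "pageview"])
events.columns["event_code"] = codes
events.columns["duration_ms"] = [120, 40, 300, 900, 15, 80]
# A wider column that the query below never scans.
events.columns["referrer"] = ["https://example.com/a"] * 6

# Hypothetical small dimension table: each user's plan.
users_plan = {1: "free", 2: "paid", 3: "free", 4: "paid"}

# Instead of joining every event row to users, precompute the matching
# keys once and turn the join into an integer IN test on the fact table.
paid_users = {uid for uid, plan in users_plan.items() if plan == "paid"}
pageview_code = dictionary.index("pageview")

# Scan only the three columns the filter and aggregate actually need.
user_col, code_col, dur_col = events.scan("user_id", "event_code", "duration_ms")
total = sum(d for u, c, d in zip(user_col, code_col, dur_col)
            if u in paid_users and c == pageview_code)
print(total)  # 80: only user 2's pageview matches both filters
```

The string filter becomes an integer comparison against one dictionary code, and the plan lookup is resolved once into a small set rather than once per event row, which is the trade the comment recommends.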
tiffanyh about 3 years ago
I like PostHog (and ClickHouse).

If the author of this post is reading, just a recommendation: when you write posts about performance improvements, readers expect to see some kind of before/after graph that shows the improvements visually.