TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

ClickHouse gets lazier and faster: Introducing lazy materialization

366 点作者 tbragin22 天前

19 条评论

tmoertel22 天前
This optimization <i>should</i> provide dramatic speed-ups when taking random samples from massive data sets, especially when the wanted columns can contain large values. That&#x27;s because the basic SQL recipe relies on a LIMIT clause to determine which rows are in the sample (see query below), and this new optimization promises to defer reading the big columns until the LIMIT clause has filtered the data set down to a tiny number of lucky rows.<p><pre><code> SELECT * FROM Population WHERE weight &gt; 0 ORDER BY -LN(1.0 - RANDOM()) &#x2F; weight LIMIT 100 -- Sample size. </code></pre> Can anyone from ClickHouse verify that the lazy-materialization optimization speeds up queries like this one? (I want to make sure the randomization in the ORDER BY clause doesn&#x27;t prevent the optimization.)
评论 #43764309 未加载
评论 #43764290 未加载
评论 #43766771 未加载
jurgenkesker22 天前
I really like Clickhouse. Discovered it recently, and man, it&#x27;s such a breath of fresh air compared to suboptimal solutions I used for analytics. It&#x27;s so fast and the CLI is also a joy to work with.
评论 #43768992 未加载
评论 #43765131 未加载
评论 #43767097 未加载
评论 #43766011 未加载
评论 #43768253 未加载
simonw22 天前
Unrelated to the new materialization option, this caught my eye:<p>&quot;this query sorts all 150 million values in the helpful_votes column (which isn’t part of the table’s sort key) and returns the top 3, in just 70 milliseconds cold (with the OS filesystem cache cleared beforehand) and a processing throughput of 2.15 billion rows&#x2F;s&quot;<p>I clearly need to update my mental model of what might be a slow query against modern hardware and software. Looks like that&#x27;s so fast because in a columnar database it only has to load that 150 million value column. I guess sorting 150 million integers in 70ms shouldn&#x27;t be surprising.<p>(Also &quot;Peak memory usage: 3.59 MiB&quot; for that? Nice.)<p>This is a really great article - very clearly explained, good diagrams, I learned a bunch from it.
评论 #43764126 未加载
评论 #43767805 未加载
评论 #43767921 未加载
评论 #43764431 未加载
评论 #43764361 未加载
kwillets22 天前
Late Materialization, 19 years later.<p><a href="https:&#x2F;&#x2F;dspace.mit.edu&#x2F;bitstream&#x2F;handle&#x2F;1721.1&#x2F;34929&#x2F;MIT-CSAIL-TR-2006-078.pdf;sequence=1" rel="nofollow">https:&#x2F;&#x2F;dspace.mit.edu&#x2F;bitstream&#x2F;handle&#x2F;1721.1&#x2F;34929&#x2F;MIT-CSA...</a>
评论 #43771894 未加载
mmsimanga22 天前
IMHO if ClickHouse had Windows native release that does not need WSL or a Linux virtual machine it would be more popular than DuckDB. I remember for years MySQL being way more popular than PostgreSQL. One of the reasons being MySQL had a Windows installer.
评论 #43768015 未加载
评论 #43769174 未加载
Onavo22 天前
Reminder clickhouse can be optionally embedded, you don&#x27;t need to reach for Duck just because of hype (it&#x27;s buggy as hell everytime I tried it).<p><a href="https:&#x2F;&#x2F;clickhouse.com&#x2F;blog&#x2F;chdb-embedded-clickhouse-rocket-engine-on-a-bicycle" rel="nofollow">https:&#x2F;&#x2F;clickhouse.com&#x2F;blog&#x2F;chdb-embedded-clickhouse-rocket-...</a>
评论 #43766023 未加载
justmarc22 天前
Clickhouse is a masterpiece of modern engineering with absolute attention to performance.
skeptrune22 天前
&gt;Despite the airport drama, I’m still set on that beach holiday, and that means loading my eReader with only the best.<p>What a nice touch. Technical information and diagrams in this were top notch, but the fact there was also some kind of narrative threaded in really put it over the top for me.
xiasongh22 天前
Has anyone compared ClickHouse and StarRocks[0]? Join performance seems a lot better on StarRocks a few months ago but I&#x27;m not sure if that still holds true.<p>[0] <a href="https:&#x2F;&#x2F;www.starrocks.io&#x2F;" rel="nofollow">https:&#x2F;&#x2F;www.starrocks.io&#x2F;</a>
评论 #43774606 未加载
vjerancrnjak22 天前
It&#x27;s quite amazing how a db like this shows that all of those row-based dbs are doing something wrong, they can&#x27;t even approach these speeds with btree index structures. I know they like transactions more than Clickhouse, but it&#x27;s just amazing to see how fast modern machines are, billions of rows per second.<p>I&#x27;m pretty sure they did not even bother to properly compress the dataset, with some tweaking, could have probably been much smaller than 30GBs. The speed shows that reading the data is slower than decompressing it.<p>Reminds me of that Cloudflare article where they had a similar idea about encryption being free (slower to read than to decrypt) and finding a bug, that when fixed, materialized this behavior.<p>The compute engine (chdb) is a wonder to use.
评论 #43766075 未加载
simianwords22 天前
Maybe I&#x27;m too inexperienced in this field but reading the mechanism I think this would be an obvious optimisation. Is it not?<p>But credit where it is due, obviously clickhouse is an industry leader.
评论 #43764704 未加载
评论 #43765284 未加载
ohnoesjmr22 天前
Wonder how well this propagates down to subqueries&#x2F;CTE&#x27;s
hexo21 天前
Whats up with these unscrollable websites? i dont get it. i scroll down a bit and it jumps up making it impossible to use.
apwell2322 天前
is apache druid still a player in this space ? Never seem to hear about it anymore. why would someone choose it over clickhouse?
评论 #43771095 未加载
higeorge1322 天前
That’s an awesome change. Will that also work for limit offset queries?
评论 #43774557 未加载
meta_ai_x22 天前
can we take the &quot;packing your luggage&quot; analogy and only pack the things we actually use in the trip and apply that to clickhouse?
评论 #43769581 未加载
jangliss21 天前
Thought this was Clickhole.com and was waiting for the payoff to the joke
dangoodmanUT22 天前
God clickhouse is such great software, if it only it was as ergonomic as duckdb, and management wasn&#x27;t doing some questionable things (deleting references to competitors in GH issues, weird legal letters, etc.)<p>The CH contributors are really stellar, from multiple companies (Altinity, Tinybird, Cloudflare, ClickHouse)
评论 #43767210 未加载
评论 #43767905 未加载
评论 #43767987 未加载
评论 #43773820 未加载
tnolet21 天前
We adopted ClickHouse ~4 years ago. We COULD have stayed on just Postgres. With a lot of bells, whistles, aggregation, denormalisation, aggressive retention limits and job queues etc. we could have gotten acceptable response times for our interactive dashboard.<p>But we chose ClickHouse and now we just pump in data with little to no optimization.
评论 #43770780 未加载
评论 #43770553 未加载
评论 #43771473 未加载
评论 #43773821 未加载