TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

“SQL queries ran up to 29 times faster on CrateDB than they did on PostgreSQL”

7 pointsby nslaterabout 8 years ago

1 comment

MarkusWinandabout 8 years ago
I&#x27;m curios if this is a fair benchmark (considering it was conducted by one of the vendors).<p>So I downloaded the mentioned whitepaper to look at the indexes they created in PostgreSQL.<p>The only thing mentioned in the whitepaper is this table definition:<p><pre><code> CREATE TABLE IF NOT EXISTS t1 ( &quot;uuid&quot; VARCHAR, &quot;ts&quot; TIMESTAMP, &quot;tenant_id&quot; INT4, &quot;sensor_id&quot; VARCHAR, &quot;sensor_type&quot; VARCHAR, &quot;v1&quot; INT4, &quot;v2&quot; INT4, &quot;v3&quot; FLOAT4, &quot;v4&quot; FLOAT4, &quot;v5&quot; BOOL, &quot;week_generated&quot; TIMESTAMP, &quot;taxonomy&quot; ltree ); </code></pre> The table definition in Crate is a little different:<p><pre><code> CREATE TABLE IF NOT EXISTS b.t2 ( &quot;uuid&quot; STRING, &quot;ts&quot; TIMESTAMP, &quot;tenant_id&quot; INTEGER, &quot;sensor_id&quot; STRING, &quot;sensor_type&quot; STRING, &quot;v1&quot; INTEGER, &quot;v2&quot; INTEGER, &quot;v3&quot; FLOAT, &quot;v4&quot; FLOAT, &quot;v5&quot; BOOLEAN, &quot;week_generated&quot; TIMESTAMP GENERATED ALWAYS AS date_trunc(&#x27;week&#x27;, ts), INDEX &quot;taxonomy&quot; USING FULLTEXT (sensor_type) WITH (analyzer=&#x27;tree&#x27;) ) PARTITIONED BY (&quot;week_generated&quot;) CLUSTERED BY (&quot;tenant_id&quot;) INTO 3 SHARDS; </code></pre> I don&#x27;t know anything about how Crate works, but I see PARTITIONED BY (&quot;week_generated&quot;) and CLUSTERED BY (&quot;tenant_id&quot;) INTO 3 SHARDS.<p>Now looking at the first query, both, their PARTITION BY as well as CLUSTER BY columns appear in the query:<p><pre><code> SELECT min(v1) as v1_min, max(v1) as v1_max, avg(v1) as v1_avg, sum(v1) as v1_sum FROM b.t2 WHERE tenant_id = ? AND week_generated BETWEEN ? AND ?; </code></pre> Unless there is a sufficiently good index used in PostgreSQL, this doesn&#x27;t seem to be a fair benchmark.<p>In the Appendix they mention how the partitioning could be implemented in PostgreSQL, also mention the indexes to be used together with the partitioning, but also say: &quot;Because of this, we abandoned the partitioning approach for our benchmark, instead opting for a single PostgreSQL table with no partition logic.&quot;<p>I didn&#x27;t find any statement about if&#x2F;which indexes they were using in PostgreSQL when measuring their benchmarks.<p>They also use a rather old PostgreSQL version (9.2 — released in 2012). Now, we have PostgreSQL 9.6 which has some parallel query execution support that might also change this figures dramatically.
评论 #14275066 未加载