I'm curios if this is a fair benchmark (considering it was conducted by one of the vendors).<p>So I downloaded the mentioned whitepaper to look at the indexes they created in PostgreSQL.<p>The only thing mentioned in the whitepaper is this table definition:<p><pre><code> CREATE TABLE IF NOT EXISTS t1 (
"uuid" VARCHAR,
"ts" TIMESTAMP, "tenant_id" INT4, "sensor_id" VARCHAR,
"sensor_type" VARCHAR, "v1" INT4,
"v2" INT4,
"v3" FLOAT4,
"v4" FLOAT4,
"v5" BOOL,
"week_generated" TIMESTAMP, "taxonomy" ltree
);
</code></pre>
The table definition in Crate is a little different:<p><pre><code> CREATE TABLE IF NOT EXISTS b.t2 ( "uuid" STRING,
"ts" TIMESTAMP,
"tenant_id" INTEGER, "sensor_id" STRING, "sensor_type" STRING,
"v1" INTEGER,
"v2" INTEGER,
"v3" FLOAT,
"v4" FLOAT,
"v5" BOOLEAN,
"week_generated" TIMESTAMP GENERATED ALWAYS AS date_trunc('week', ts),
INDEX "taxonomy" USING FULLTEXT (sensor_type) WITH (analyzer='tree')
) PARTITIONED BY ("week_generated")
CLUSTERED BY ("tenant_id") INTO 3 SHARDS;
</code></pre>
I don't know anything about how Crate works, but I see PARTITIONED BY ("week_generated") and
CLUSTERED BY ("tenant_id") INTO 3 SHARDS.<p>Now looking at the first query, both, their PARTITION BY as well as CLUSTER BY columns appear in the query:<p><pre><code> SELECT min(v1) as v1_min, max(v1) as v1_max, avg(v1) as v1_avg, sum(v1) as v1_sum
FROM b.t2
WHERE tenant_id = ? AND week_generated BETWEEN ? AND ?;
</code></pre>
Unless there is a sufficiently good index used in PostgreSQL, this doesn't seem to be a fair benchmark.<p>In the Appendix they mention how the partitioning could be implemented in PostgreSQL, also mention the indexes to be used together with the partitioning, but also say: "Because of this, we abandoned the partitioning approach for our benchmark, instead opting for a single PostgreSQL table with no partition logic."<p>I didn't find any statement about if/which indexes they were using in PostgreSQL when measuring their benchmarks.<p>They also use a rather old PostgreSQL version (9.2 — released in 2012). Now, we have PostgreSQL 9.6 which has some parallel query execution support that might also change this figures dramatically.