A composite index can also be used for a partial index scan[1], so if you frequently filter on just int100 as well as on the int100,int1000 combination (as in the article's example), a composite index on (int100,int1000) can also serve the queries that filter only on int100.<p>The order of the columns in a composite index therefore matters. We got some nice savings by reordering columns in indexes and changing the join order in queries (our DB wasn't <i>that</i> smart), or by adding "useless" filters to the where clause, which let us consolidate multiple indexes into one composite index.<p>[1]: I might be using the wrong terminology here; I don't mean a partial index[2], but using only the first N columns of an M-column index.<p>[2]: <a href="https://www.postgresql.org/docs/current/indexes-partial.html" rel="nofollow">https://www.postgresql.org/docs/current/indexes-partial.html</a>
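Not the article's setup, but here's a quick way to watch the leading-column (prefix) behavior with SQLite's query planner; the table and index names are made up for illustration:

```python
import sqlite3

# In-memory database with a composite index on (int100, int1000).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, int100 INTEGER, int1000 INTEGER)")
con.execute("CREATE INDEX idx_composite ON t (int100, int1000)")

# Filtering on both columns can seek the composite index...
plan_both = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE int100 = 1 AND int1000 = 1"
).fetchall()
print(plan_both)

# ...and so can filtering on the leading column alone.
plan_prefix = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE int100 = 1"
).fetchall()
print(plan_prefix)

# Filtering only on the trailing column cannot seek it; the planner
# falls back to some kind of scan instead of an index search.
plan_trailing = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE int1000 = 1"
).fetchall()
print(plan_trailing)
```

The prefix query's plan mentions idx_composite with an int100 equality, while the trailing-column query's plan is a scan, which is the asymmetry that makes column order worth thinking about.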
Comparison on ClickHouse: this also runs in about 30-40ms, but no index is used; it's a full-table scan.<p><pre><code> create table if not exists test_table
(
id UInt64,
text1 String,
text2 String,
int1000 UInt64,
int100 UInt64,
int10 UInt64,
int10_2 UInt64
)
engine = MergeTree()
order by (id)
;
insert into test_table
with
repeat('b', 1024) as one_kib,
repeat('b', 255) as bytes_255
select
number as id,
one_kib,
bytes_255,
rand() % 1000 as int1000,
rand() % 100 as int100,
rand() % 10 as int10,
rand() % 10 as int10_2
from numbers(10e6)
;
> select count(*) from test_table where int1000 = 1 and int100 = 1;
┌─count()─┐
│    9949 │
└─────────┘
1 row in set. Elapsed: 0.034 sec. Processed 10.00 million rows, 160.00 MB (290.93 million rows/s., 4.65 GB/s.)
</code></pre>
The same table, but with 1B rows instead, runs in ~1800ms<p><pre><code> > select count(*) from test_table where int1000 = 1 and int100 = 1;
┌─count()─┐
│  999831 │
└─────────┘
1 row in set. Elapsed: 1.804 sec. Processed 1.00 billion rows, 16.00 GB (554.24 million rows/s., 8.87 GB/s.)
</code></pre>
[1] Converted the table create and insert logic from here: <a href="https://github.com/sirupsen/napkin-math/blob/master/newsletter/20-compound-vs-combining-indexes/test.rb" rel="nofollow">https://github.com/sirupsen/napkin-math/blob/master/newslett...</a>
<p><pre><code> For these 64-bit index entries we’d expect to have to scan roughly:
index_row_size⋅rows=2⋅64bit⋅10^5=1.5MiB
</code></pre>
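Spelling out the quoted arithmetic (assuming, as the formula implies, two 64-bit values per index entry and 10^5 entries scanned):

```python
# Two 64-bit columns per composite index entry, 10^5 entries scanned.
bits = 2 * 64 * 10**5
mib = bits / 8 / 2**20  # bits -> bytes -> MiB
print(round(mib, 2))  # 1.53, i.e. roughly the quoted 1.5 MiB
```

So the 1.5 MiB figure is consistent with the formula; the open question is only where the 10^5 row count itself comes from.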
Where do the 10^5 rows come from? With a composite index and a point query, doesn't the database scan just the 100 returned rows?
Isn't the cause of the difference between MySQL and PostgreSQL in this particular case that COUNT is (or was?) implemented weirdly in PostgreSQL and was always slower than in MySQL?<p>Can PostgreSQL answer a query straight from the index, without touching the rows, if the index contains all the necessary fields?
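The feature being asked about is usually called an index-only scan (or a covering index). As a sketch of the concept, here's SQLite answering a COUNT entirely from an index that contains every column the query touches; PostgreSQL has an analogous index-only scan, though there it additionally depends on the visibility map, which is why its COUNTs can still be slower:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER, b INTEGER, payload TEXT)")
con.execute("CREATE INDEX idx_ab ON t (a, b)")
con.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(i % 10, i % 7, "x" * 100) for i in range(1000)],
)

# The query only references columns a and b, both of which are in
# idx_ab, so the planner can satisfy it from the index alone and
# never has to read the (payload-heavy) table rows.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT count(*) FROM t WHERE a = 1 AND b = 1"
).fetchall()
print(plan)
```

The plan reports a covering-index search, confirming the base table is never touched.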