This is a super useful thing to know and I'm thankful for this article highlighting this aspect of Postgres.<p>But I would disagree with the takeaway to focus on optimizing your indexes and not your tables. The reason is that the order of columns in a multi-column index is highly meaningful and intentional in order to support match on a range of values for the last column in the index. The way databases work you can only utilize a multi-column index on (customer_id int4, date timestamp) if have an equality match on customer_id, like "WHERE customer_id = 1 AND BETWEEN '2024-01-01' and '2025-01-01'". If you reorder these columns in the index to put the larger date column first, then, sure, you save space in the index, but you also make it worthless – it will never be used by the query above. As such, optimizing a multi-column index is only useful when all the columns are queried for equality rather than a range.<p>In contrast, when you are creating a NEW table you might not think hard about the order of the columns in the table, and especially not about the data-sizes of each column and their alignment. But doing so at the time you create the table can be tremendously beneficial if it is going to be very large. It is important to note that you not only save space on-disk, but in precious RAM when the tuples are loaded.
From the article:<p><pre><code> Having the data aligned ensures faster access time when retrieving pages from disk.
</code></pre>
Byte-level alignment cannot possibly have anything to do with retrieving pages from disk, simply because the unit of retrieval is the whole page. From the hardware/OS perspective, a page is just an opaque blob of bytes (comprised from one or more blocks on the physical drive).<p>Only after these bytes have reached RAM does the byte-level alignment play a role, because CPU works slower on misaligned data.<p>The article itself then goes on to illustrates the above (and seemingly contradict itself):<p><pre><code> SQLite does not pad or align columns within a row. Everything is tightly packed together using minimal space. Two consequences of this design:
SQLite has to work harder (use more CPU cycles) to access data within a row once it has that row in memory.
SQLite uses fewer bytes on disk, less memory, and spends less time moving content around because there are fewer bytes to move.</code></pre>
The ever-useful postgres_dba tool (<a href="https://github.com/NikolayS/postgres_dba">https://github.com/NikolayS/postgres_dba</a>) can help with this. Install it, start psql, run :dba and choose option p1 to see how much table space you would save by reordering your columns.
Interesting. But if this is important, why doesn't Postgres do it invisibly, automatically? Surely there's a most-correct sequence, with the longest values first, and the shortest values last, and then (probably) the variable-length values?<p>It could then happily report back the data in the order that the DB Admin/developer expects.
It’s really something you don’t know how this applies to your Postgres DB, until you run into a situation where you do.<p>The author explains this very well, it’s a good read! I’ve learned about this padding little over a year ago, while I was designing a data intensive application with a colleague. I was skeptical about the advantage at first, but for our specific design, where we have 100 to 480+ columns in one table it makes a huge difference on table store size. Not so much on the indexes, though.
Wild guess: this is a left-over from the old days of Sun SPARC processors which could only access doubles on 8-byte aligned memory addresses.<p>For Intel processors you just paid a performance penality for unaligned accesses but SPARC processors would generate a TRAP for unaligned accesses.<p>I'd really like to see some performance testing of alignment for modern Intel/ARM CPUs. My guess is that tightly packing would be better as more data would fit into caches and you would have less i/o.<p>Modern CPUs are very different from back in the 90's when PostgreSQL was designed. Posgresql has traditionally been very slow to stop supporting old OSes that are no longer used.
It's been awhile since I dug into the bowels of PG; but the overall row size with respect to page size used to be important. If you had a table where every row took up 4097 bytes and the page size was 8K; the your disk footprint for that table was double. Only one row fit in a page and the other (nearly) half of each page was wasted.<p>If that is still true, then alignment issues could push you over the edge if you are close to it.
Very interesting, but I think the author overstates the importance of alignment a little. Unless your data/indexes are already of a challenging size (or you expect them to be imminently) for your hardware then fiddling with byte alignment details feels like a fairly premature optimisation.<p>Disk is cheap, memory is plentiful, your time is expensive etc.
Good material!<p>It has to be said, the order of columns in correctly designed multicolumn BTREE indexes is governed by the shape of the queries the indexes support.<p>So don't arbitarily reorder columns in your indexes to handle alignment.
Since in most applications the order doesn't matter, why postegres don't make itself the recording to archive better data alignment instead the padding approach?
The content itself is fine but the unnecessary image above the label “ <i>You have the right to remain aligned</i>” with the usual AI-generated garbage text cheapens the entire article.