Optimizing Postgres table layout for maximum efficiency

204 pointsby napsterbr8 months ago

14 comments

This is a super useful thing to know and I'm thankful for this article highlighting this aspect of Postgres.But I would disagree with the takeaway to focus on optimizing your indexes and not your tables. The reason is that the order of columns in a multi-column index is highly meaningful and intentional in order to support match on a range of values for the last column in the index. The way databases work you can only utilize a multi-column index on (customer_id int4, date timestamp) if have an equality match on customer_id, like "WHERE customer_id = 1 AND BETWEEN '2024-01-01' and '2025-01-01'". If you reorder these columns in the index to put the larger date column first, then, sure, you save space in the index, but you also make it worthless – it will never be used by the query above. As such, optimizing a multi-column index is only useful when all the columns are queried for equality rather than a range.In contrast, when you are creating a NEW table you might not think hard about the order of the columns in the table, and especially not about the data-sizes of each column and their alignment. But doing so at the time you create the table can be tremendously beneficial if it is going to be very large. It is important to note that you not only save space on-disk, but in precious RAM when the tuples are loaded.

评论 #41773526 未加载

评论 #41770482 未加载

评论 #41770450 未加载

branko_d7 months ago

From the article:<pre><code> Having the data aligned ensures faster access time when retrieving pages from disk. </code></pre> Byte-level alignment cannot possibly have anything to do with retrieving pages from disk, simply because the unit of retrieval is the whole page. From the hardware/OS perspective, a page is just an opaque blob of bytes (comprised from one or more blocks on the physical drive).Only after these bytes have reached RAM does the byte-level alignment play a role, because CPU works slower on misaligned data.The article itself then goes on to illustrates the above (and seemingly contradict itself):<pre><code> SQLite does not pad or align columns within a row. Everything is tightly packed together using minimal space. Two consequences of this design: SQLite has to work harder (use more CPU cycles) to access data within a row once it has that row in memory. SQLite uses fewer bytes on disk, less memory, and spends less time moving content around because there are fewer bytes to move.</code></pre>

评论 #41774212 未加载

gnfargbl7 months ago

The ever-useful postgres_dba tool (<a href="https://github.com/NikolayS/postgres_dba">https://github.com/NikolayS/postgres_dba</a>) can help with this. Install it, start psql, run :dba and choose option p1 to see how much table space you would save by reordering your columns.

评论 #41772369 未加载

wccrawford7 months ago

Interesting. But if this is important, why doesn't Postgres do it invisibly, automatically? Surely there's a most-correct sequence, with the longest values first, and the shortest values last, and then (probably) the variable-length values?It could then happily report back the data in the order that the DB Admin/developer expects.

评论 #41770566 未加载

评论 #41770049 未加载

rtuin7 months ago

It’s really something you don’t know how this applies to your Postgres DB, until you run into a situation where you do.The author explains this very well, it’s a good read! I’ve learned about this padding little over a year ago, while I was designing a data intensive application with a colleague. I was skeptical about the advantage at first, but for our specific design, where we have 100 to 480+ columns in one table it makes a huge difference on table store size. Not so much on the indexes, though.

评论 #41770548 未加载

silvestrov7 months ago

Wild guess: this is a left-over from the old days of Sun SPARC processors which could only access doubles on 8-byte aligned memory addresses.For Intel processors you just paid a performance penality for unaligned accesses but SPARC processors would generate a TRAP for unaligned accesses.I'd really like to see some performance testing of alignment for modern Intel/ARM CPUs. My guess is that tightly packing would be better as more data would fit into caches and you would have less i/o.Modern CPUs are very different from back in the 90's when PostgreSQL was designed. Posgresql has traditionally been very slow to stop supporting old OSes that are no longer used.

didgetmaster7 months ago

It's been awhile since I dug into the bowels of PG; but the overall row size with respect to page size used to be important. If you had a table where every row took up 4097 bytes and the page size was 8K; the your disk footprint for that table was double. Only one row fit in a page and the other (nearly) half of each page was wasted.If that is still true, then alignment issues could push you over the edge if you are close to it.

remus7 months ago

Very interesting, but I think the author overstates the importance of alignment a little. Unless your data/indexes are already of a challenging size (or you expect them to be imminently) for your hardware then fiddling with byte alignment details feels like a fairly premature optimisation.Disk is cheap, memory is plentiful, your time is expensive etc.

评论 #41770032 未加载

评论 #41770615 未加载

评论 #41769807 未加载

评论 #41772221 未加载

SoftTalker7 months ago

This is the sort of thing that good DBAs used to know about and manage, but nowadays that isn't a fashionable job.

评论 #41772155 未加载

OliverJones7 months ago

Good material!It has to be said, the order of columns in correctly designed multicolumn BTREE indexes is governed by the shape of the queries the indexes support.So don't arbitarily reorder columns in your indexes to handle alignment.

SchwKatze7 months ago

Since in most applications the order doesn't matter, why postegres don't make itself the recording to archive better data alignment instead the padding approach?

koolba7 months ago

The content itself is fine but the unnecessary image above the label “ You have the right to remain aligned” with the usual AI-generated garbage text cheapens the entire article.

评论 #41769841 未加载

tiffanyh7 months ago

How much of this article is rooted in hard drives with physically mechanical spinning disk latency assumptions, vs ssd/nvme?

评论 #41776592 未加载

评论 #41775566 未加载

delduca7 months ago

Could the Django ORM deal with this?

评论 #41770578 未加载

评论 #41770740 未加载