You can go faster than this. (Mind you, the following suggestions will only matter if the DB is still your real bottleneck. Once you're using COPY, you start to have to optimize the rest of your ETL pipeline as well.)

I've done all of the following (except the last bit about pg_bulkload) in my own ETL stack. They work!

——

1. COPY supports a binary format in place of CSV. It's Postgres-specific, and libpq won't generate it for you (you have to encode the rows yourself), but it's also pretty easy to implement. Here's a Python package implementing it: https://pypi.org/project/pgcopy/. As long as you've got the rows in memory at some point, just generate that instead of CSV. The rest of the architecture is the same.

2. Run concurrent SQL COPY commands on separate connections, one per [staging] table you're populating. This lets PG populate the tables in parallel. (There's a sketch of #1 and #2 together below, right after the key-insight paragraph.)

3. Create (or TRUNCATE) the staging tables in the same transaction that contains the COPY. This way, the COPY won't need to operate on a differential MVCC state, but can operate on the "raw" table (and, with wal_level = minimal, PG can skip WAL-logging the load entirely). (And if this is the "base" data load for a given table, rather than an incremental one against an already-populated table, you don't need a staging table; the target table itself is now your staging table. Just ensure that any CREATE INDEX, ADD PRIMARY KEY, or ADD CONSTRAINT happens *after* you populate the table with data. No need for a TEMPORARY/unlogged intermediate; the performance would actually be worse if you added one.)

4. Pre-bake your .sql, embedding the `FORMAT binary` data, *then* stream it once you're done. I can't tell you how many times the problem with an ETL pipeline wasn't PG's ingestion speed, but rather the speed at which the ETL transformer process was writing to the socket. The rest of your pipeline can be asynchronous, but actually streaming into your PG instance takes up valuable shared resources on the instance. Assuming that the instance being ETLed into isn't just a replication master, and actually receives read traffic, you'll want writes to that instance to complete as quickly as possible—even at the expense of making the rest of your ETL pipeline more complicated. So pre-bake your .sql files, and then stream them. (One advantage, once you do this: you don't need a runtime with an SQL client library any more. You can just spawn instances of `psql foo.sql`. These processes don't even have to coincide with your transform stage any more! Push your `.sql` files to object storage, and then run `psql` on a dedicated loader VM—or from an object-storage-lifecycle-event-triggered Lambda function, if you like.)

Fun fact: when you `pg_dump --format=directory`, pg_restore(1) will do techniques #2 - #4 for you automatically (pass `--jobs` to get the parallelism of #2).

Key insight: the pg_dump(1) and pg_restore(1) pair of binaries are optimized to heck and back. If pg_dump(1) or pg_restore(1) has some special way of doing something, it's probably for performance; look into doing that thing in your pipeline. (Even if there's no library for your runtime that lets you do that thing. Write one!)
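To make #1 and #2 concrete, here's a minimal sketch, assuming psycopg2 plus the pgcopy package linked above. The table names, columns, DSN, and rows are all made up; the only real requirement is that you have the rows in memory:

    from concurrent.futures import ThreadPoolExecutor

    import psycopg2
    from pgcopy import CopyManager

    # Hypothetical staging tables: name -> (columns, rows already in memory).
    STAGING = {
        'staging_orders': (('id', 'customer_id'), [(1, 10), (2, 11)]),
        'staging_events': (('id', 'kind'), [(1, 'click'), (2, 'view')]),
    }

    def load_one(table, cols, rows):
        # One connection per staging table, so PG can ingest the COPYs in parallel (#2).
        conn = psycopg2.connect('dbname=warehouse')  # hypothetical DSN
        try:
            # pgcopy emits COPY ... (FORMAT binary) rather than CSV (#1).
            CopyManager(conn, table, cols).copy(rows)
            conn.commit()
        finally:
            conn.close()

    with ThreadPoolExecutor(max_workers=len(STAGING)) as pool:
        futures = [pool.submit(load_one, t, cols, rows)
                   for t, (cols, rows) in STAGING.items()]
        for f in futures:
            f.result()  # re-raise any per-table load errors

Any driver that exposes COPY FROM STDIN works the same way; pgcopy just saves you from hand-rolling the binary encoding.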
——

You can go faster than even pg_dump+pg_restore will "naively" do, while still being compatible with "standard Postgres", if you're willing to mangle your data architecture a bit. Specifically, you can take advantage of Postgres's table-partitioning support. (Yes, now you have to deal with partitions. But hey, you probably already had to at the data sizes we're talking about.)

5. Turn your incremental loads into table foo (via fresh staging tables) into base loads of fresh partitions of table foo. Each load creates its own partition. Techniques #1 and #2 will accelerate and parallelize the loading of sibling partitions of a parent table just as well as they accelerate the loading of separate tables.

Once you do this, you don't need *any* staging tables, because every load is a "base" load of a table. (Yes, you'll have a lot of tables. You'll need some agent sitting around to do async rollups using `CREATE TABLE AS ...` to consolidate them. If you have access to the TimescaleDB extension, you might be able to trick it into doing this for you automatically.)

#6 (or #3b). Include your CREATE INDEX statements in the same transaction that CREATEs and COPYs into the partitions (after the data is populated, though). One of the worst things for the production performance of a data warehouse is contention between application reads and an index build. If you always create new partitions, and only ever index them when you're creating them, then your indices will be built against tables that aren't visible to new clients yet, so clients and the indexer will never get in one another's way. (There's a sketch of #5/#6 at the end of this comment.)

——

If, after all this, you're looking for the absolute fastest way to load data into Postgres, *and* you have control over your PG instance (i.e. it's not a hosted DBaaS), you could try https://pgxn.org/dist/pg_bulkload/. It's a PG extension that serves as a faster replacement for the COPY command, bypassing most of the buffers and locks involved in PG's individual request handling. It shouldn't cause any problems with MVCC, but only because it assumes/requires that you're also doing all of the above, so you're never trying to load into a table users can "see." (I haven't tried it personally, but if PG's ingest speed becomes my bottleneck again, I just might.)
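And to make #5/#6 concrete: here's a sketch of one incremental load becoming a base load of a brand-new partition, with the CREATE, COPY, and CREATE INDEX all in a single transaction. Everything in it is hypothetical; it assumes a parent table along the lines of `CREATE TABLE events (batch_id int, id bigint, kind text) PARTITION BY LIST (batch_id)`, and reuses pgcopy from the sketch above:

    import psycopg2
    from pgcopy import CopyManager

    BATCH_ID = 42                                           # hypothetical load ID
    ROWS = [(BATCH_ID, 1, 'click'), (BATCH_ID, 2, 'view')]  # made-up data
    PART = 'events_batch_%d' % BATCH_ID

    conn = psycopg2.connect('dbname=warehouse')             # hypothetical DSN
    with conn.cursor() as cur:
        # Partition is created, populated, and indexed in ONE transaction:
        # nothing is visible to other clients until COMMIT, so the index build
        # can't contend with application reads (#6), and the COPY writes into
        # a table with no prior MVCC history (#3).
        cur.execute('CREATE TABLE %s PARTITION OF events FOR VALUES IN (%d)'
                    % (PART, BATCH_ID))
        CopyManager(conn, PART, ('batch_id', 'id', 'kind')).copy(ROWS)  # binary COPY
        cur.execute('CREATE INDEX ON %s (kind)' % PART)
    conn.commit()  # partition, data, and index become visible atomically
    conn.close()

The pre-baked-.sql variant from #4 is the same transaction shape, just serialized to a file and replayed with `psql` instead of driven from a client library.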