I love postgres too. We went from a brittle event sourcing architecture to postgres_fdw, works really well.
We do something similar with a few differences:

We swap entire schemas on every run (one schema being one upstream database or one collection of analysis) atomically, instead of deleting data and then inserting it. That way anyone querying the DW never sees an empty table: it's yesterday's data until, in an instant, it's today's data, consistently for the entire source. If something breaks, the data is just stale, never missing. (Rough sketch at the end of this comment.)

We don't do long-running selects from foreign tables, because a long-running select of a huge table can take many minutes. If an upstream database deploys a migration that changes its schema, that migration could time out and block the deploy while waiting for our select to finish so it can acquire its ACCESS EXCLUSIVE lock. So we never hold even ACCESS SHARE locks for more than a couple of seconds; basically the agreement is "the DW should never interrupt the operation of the upstream databases". We do this with a custom batched_create_table_from procedure. For integer-id tables it's trivial. For uuid tables we fetch all the uuids in one go into a temp table with an integer id, then pull chunks of uuid values at a time (also sketched below). This works well because we're always sending indexed id values to the foreign server, so it can use its primary key indexes to return only the data for that chunk without scanning the entire table. It's a bit of a hack, and I wish postgres_fdw's fetch_size parameter would release locks between fetches. Of course that would break atomicity, so I doubt it's gonna happen.

We run an external cron schedule which launches an ECS task that runs a bash script over our sql files, which define the DW's schema. It's a really primitive DAG built with bash & and wait (sketch at the bottom). No scheduling happens in the db itself, which means we can schedule, check logs, monitor, etc. like all our other ECS tasks. Some bash magic lets us define basically a DAG, though we run blocks of independent tasks (sql files) instead of building a proper dependency tree between individual tasks.

I learned about ctid recently; that could have worked for the chunking.

We're currently moving to bigquery and DBT, for a few reasons. Primarily to handle silly amounts of user interaction data, where the experience for analysts is just much better in bigquery. DBT is also really nice, though I do miss the simplicity of just sql files. The development setup for the old DW relied on rebuilding it locally on a developer machine, which eventually ran into storage limitations and people getting lazy and not rebuilding from scratch before submitting their PRs; we just never got around to building proper CI for it. Rebuilding our postgres DW from scratch takes a few hours now; the same logic in bigquery runs in 10 minutes, which is undeniably amazing.
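For the curious, the swap itself is roughly this (schema names are made up; the batch copies write into the staging schema beforehand):

    -- Everything below runs in one transaction, so readers see either the old
    -- schema or the new one, never an empty or half-built table.
    BEGIN;
    DROP SCHEMA IF EXISTS salesdb_old CASCADE;
    ALTER SCHEMA salesdb RENAME TO salesdb_old;
    ALTER SCHEMA salesdb_staging RENAME TO salesdb;
    COMMIT;

If building the staging schema fails, this transaction never runs and the DW just keeps serving yesterday's data.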
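And a very rough sketch of the uuid chunking, just the shape of it rather than our actual batched_create_table_from (table and procedure names are invented; needs Postgres 11+ so the procedure can COMMIT between batches):

    -- remote.orders is a postgres_fdw foreign table, staging.orders a local copy.
    CREATE PROCEDURE copy_orders_in_batches(batch_size int DEFAULT 10000)
    LANGUAGE plpgsql AS $$
    DECLARE
      lo bigint := 1;
      hi bigint;
      batch uuid[];
    BEGIN
      -- Grab only the primary keys up front; cheap even for a huge table.
      CREATE TEMP TABLE ids (n bigserial PRIMARY KEY, id uuid NOT NULL);
      INSERT INTO ids (id) SELECT id FROM remote.orders;
      SELECT max(n) INTO hi FROM ids;
      COMMIT;  -- release the foreign table's ACCESS SHARE lock from the key fetch

      WHILE lo <= hi LOOP
        SELECT array_agg(id) INTO batch
        FROM ids WHERE n >= lo AND n < lo + batch_size;
        -- The id = ANY(...) qual is shipped to the foreign server, so it can use
        -- its primary key index and return just this chunk, no full table scan.
        INSERT INTO staging.orders
        SELECT * FROM remote.orders WHERE id = ANY (batch);
        lo := lo + batch_size;
        COMMIT;  -- keep each lock-holding transaction down to a couple of seconds
      END LOOP;

      DROP TABLE ids;
    END $$;

It has to be run with CALL from outside an explicit transaction block, otherwise the COMMITs inside the loop will error.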
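The bash "DAG" is nothing fancier than this (file names and the DW_URL variable are made up):

    #!/usr/bin/env bash
    # Blocks of independent sql files run in parallel with &, and waiting on the
    # pids closes each block (and surfaces failures) before the next one starts.
    set -euo pipefail

    run() { psql "$DW_URL" -v ON_ERROR_STOP=1 -f "$1"; }
    wait_all() { for pid in "$@"; do wait "$pid"; done; }

    # Block 1: copies from the upstream databases, independent of each other
    run sql/copy_users.sql  & p1=$!
    run sql/copy_orders.sql & p2=$!
    run sql/copy_events.sql & p3=$!
    wait_all "$p1" "$p2" "$p3"

    # Block 2: analysis that only depends on block 1
    run sql/daily_revenue.sql & p1=$!
    run sql/user_funnels.sql  & p2=$!
    wait_all "$p1" "$p2"

    # Block 3: swap the freshly built staging schemas into place
    run sql/swap_schemas.sql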