What is the best way to ETL data from one Postgres cluster to another (by ETL I mean things like data sanitization and denormalization)?<p>a) I think most people end up doing some kind of batch synchronization, but I'm interested in streaming solutions.<p>b) A lot of folks use trigger based replication, but triggers have to be on the primary/master node, and not just on the replicas.<p>c) Another common solution is to force the database client to write to a message broker, but that opens the door to data discrepancies and synchronization issues.<p>d) In theory I think the best way is to do something like bottledwater-pg or pg_kafka [1] [2] [3], but I'm not sure how battle hardened these are. I think logical replication of the WAL is the right approach, but there is still not much tooling around this.<p>[1] <a href="https://github.com/confluentinc/bottledwater-pg" rel="nofollow">https://github.com/confluentinc/bottledwater-pg</a><p>[2] <a href="https://github.com/xstevens/pg_kafka" rel="nofollow">https://github.com/xstevens/pg_kafka</a><p>[3] <a href="https://github.com/xstevens/decoderbufs" rel="nofollow">https://github.com/xstevens/decoderbufs</a><p>PS: There are a bunch of interesting MySQL solutions out there, such as Zendesk's Maxwell:<p><a href="https://github.com/zendesk/maxwell" rel="nofollow">https://github.com/zendesk/maxwell</a>
With regards to psql2csv: The default psql can already do this very nicely. Just use \copy.<p>\copy (select whatever from whatever) to 'yourlocalfile.csv' with (format 'csv')<p>and if you want column headers, add a "header true" inside of the with clause.<p>Generally, \copy works just like COPY[1] but it does so from the remote server to the local machine, whereas file names given to COPY are relative to the server.<p>Yes. A dedicated tool might feel easier initially, but once you know how \copy works, you can always get a CSV file from whatever database you're connected to and no matter what machine you're on.<p>[1]: <a href="http://www.postgresql.org/docs/current/static/sql-copy.html" rel="nofollow">http://www.postgresql.org/docs/current/static/sql-copy.html</a>
I found this talk by Christophe Pettus [1] very informative. The title is somewhat misleading as most of the talk has little to do with Python, but it's a good introduction to more advanced Postgres concepts. Also available in PDF form [2].<p>[1] <a href="https://www.youtube.com/watch?v=0uCxLCmzaG4" rel="nofollow">https://www.youtube.com/watch?v=0uCxLCmzaG4</a><p>[2] <a href="http://thebuild.com/presentations/pycon-2014-pppp.pdf" rel="nofollow">http://thebuild.com/presentations/pycon-2014-pppp.pdf</a>
Does anyone know of sample PostgreSQL databases? Something akin to Northwind (<a href="https://northwinddatabase.codeplex.com/" rel="nofollow">https://northwinddatabase.codeplex.com/</a>)?
Can't it just be a 'list' of Postgres stuff? Do we really expect that it's generated by an algorithm if someone doesn't write 'curated'? If it's a good list, I'm going to find it interesting no matter how it was created.