Is it not common knowledge that dropping indexes improves database insert performance?<p>Of course, that's often not an option when you you're loading records into a live database that's also getting queries, you usually don't want every query to result in a full table scan.<p>This was well known 20+ years ago when I was an entry-level DBA, and I assumed it was still well known today.
I must be getting old.<p>Kids, many years ago, even before jQuery, software would come with documentation that you could read and it would tell you how to use it effectively.<p>I know, crazy right? But to this day some of that old software, of which PostgreSQL is an example, still has this documentation that you can read, even before you use the software in a production system.<p>Yeah, yeah, I know Agile and Docker solved the problem of ever having to document anything, but this is the way things used to be and a few of us are stuck in our ways and still like it.
Wow. You've gotta love the audacity in making a big announcement like this based on the developer finally getting around to <i>reading the docs</i>. "Drop constraints and indexes for faster imports" is mass import 101.<p>The entirety of the "Why We Did It" section:<p>-----<p>> This improvement was pioneered by James Gordon, the Coalition’s lead developer.<p>> He drew instruction from PostgreSQL’s official documentation, which reads:<p>>> [snipping quoted sections from PostgreSQL manual at <a href="https://www.postgresql.org/docs/10/static/populate.html#POPULATE-RM-INDEXES" rel="nofollow">https://www.postgresql.org/docs/10/static/populate.html#POPU...</a> ]<p>>Gordon’s code handles this task using rarely utilized, low-level tools in Django’s database manager.<p>-----<p>Sadly, in the current day and age, a developer actually taking the time to RTFM may indeed qualify as "pioneering" work!<p>Perhaps the rest of us need to start trumpeting our accomplishments when we find some clearly-stated performance gain in the manual, rather than hiding our heads in embarrassment for not finding out until we released version 2.2 of our mass DB import tool.
Do you have any numbers on how much <i>extra</i> time (that is, time spent servicing queries above the normal query times when your tables are indexed) application queries take after the loads, but before the index rebuilds are complete?<p>If so, how does that compare, in aggregate, to the time saved in the loads?<p>Or are you simply not putting the application back into service until the index rebuilds have finished? How long does that take, compared to the time saved?<p>EDIT: I'm mostly asking these questions to nudge people to think about them in the course of trying this in their own environments. It's my day job to think about these kinds of things; I've worn the PostgreSQL DBA hat for over a decade now.
From personal experience: PostgreSQL's COPY commands aren't really all that performant, indexes or no.<p>Our project saw SIGNIFICANTLY better performance with batched multi-threaded INSERTs. If you can run a few hundred load threads and manage the concurrency correctly (not trivial), it will chew through big loads like a monster.<p>If I ever have the time/excuse, I want to go back and try a multi-threaded COPY. But if you need speed and have a choice between multi-threaded INSERTs or a single-threaded COPY, go with the INSERTs every time.
Why is everyone acting so pissy about the fact that someone happened to find a performance improvement trick by reading the docs? Isn't this what's supposed to happen?<p>None of you read all the performance "tricks" to Postgres before writing your first SQL statement.<p>Every day, somebody's born who doesn't know how to boos the performance of COPY by dropping indexes.
It looks like the article is directed at those doing bulk uploads to their system and not at developers necessarily. This may not be universally intuitive to people outside of Database admins and developers.
They should have stuck with bog-standard mySQL to get this time saving for free - if you restore a SQL dump created with phpMyAdmin or mysqldump then all the disable index commands are already in there, good to go, and in SQL.<p>Whoever wrote the Django bit didn't really do a good job on the defaults.