TechEcho

12 comments

Johnny555over 7 years ago

Is it not common knowledge that dropping indexes improves database insert performance?Of course, that's often not an option when you you're loading records into a live database that's also getting queries, you usually don't want every query to result in a full table scan.This was well known 20+ years ago when I was an entry-level DBA, and I assumed it was still well known today.

评论 #16236833 未加载

评论 #16238484 未加载

评论 #16238139 未加载

drblastover 7 years ago

I must be getting old.Kids, many years ago, even before jQuery, software would come with documentation that you could read and it would tell you how to use it effectively.I know, crazy right? But to this day some of that old software, of which PostgreSQL is an example, still has this documentation that you can read, even before you use the software in a production system.Yeah, yeah, I know Agile and Docker solved the problem of ever having to document anything, but this is the way things used to be and a few of us are stuck in our ways and still like it.

评论 #16236838 未加载

评论 #16236367 未加载

评论 #16236781 未加载

评论 #16237472 未加载

cookiecaperover 7 years ago

Wow. You've gotta love the audacity in making a big announcement like this based on the developer finally getting around to reading the docs. "Drop constraints and indexes for faster imports" is mass import 101.The entirety of the "Why We Did It" section:-----> This improvement was pioneered by James Gordon, the Coalition’s lead developer.> He drew instruction from PostgreSQL’s official documentation, which reads:>> [snipping quoted sections from PostgreSQL manual at <a href="https://www.postgresql.org/docs/10/static/populate.html#POPULATE-RM-INDEXES" rel="nofollow">https://www.postgresql.org/docs/10/static/populate.html#POPU...</a> ]>Gordon’s code handles this task using rarely utilized, low-level tools in Django’s database manager.-----Sadly, in the current day and age, a developer actually taking the time to RTFM may indeed qualify as "pioneering" work!Perhaps the rest of us need to start trumpeting our accomplishments when we find some clearly-stated performance gain in the manual, rather than hiding our heads in embarrassment for not finding out until we released version 2.2 of our mass DB import tool.

评论 #16236341 未加载

评论 #16236395 未加载

rosserover 7 years ago

Do you have any numbers on how much extra time (that is, time spent servicing queries above the normal query times when your tables are indexed) application queries take after the loads, but before the index rebuilds are complete?If so, how does that compare, in aggregate, to the time saved in the loads?Or are you simply not putting the application back into service until the index rebuilds have finished? How long does that take, compared to the time saved?EDIT: I'm mostly asking these questions to nudge people to think about them in the course of trying this in their own environments. It's my day job to think about these kinds of things; I've worn the PostgreSQL DBA hat for over a decade now.

评论 #16236131 未加载

jacobkgover 7 years ago

Do you also disable autovacuum while this is running? That is another good trick for speeding up large database imports.

评论 #16236976 未加载

p2t2pover 7 years ago

N̶o̶ ̶s̶h̶i̶t̶,̶ ̶S̶h̶e̶r̶l̶o̶c̶k̶!̶ You don't say?

评论 #16236267 未加载

评论 #16236847 未加载

karlmdavisover 7 years ago

From personal experience: PostgreSQL's COPY commands aren't really all that performant, indexes or no.Our project saw SIGNIFICANTLY better performance with batched multi-threaded INSERTs. If you can run a few hundred load threads and manage the concurrency correctly (not trivial), it will chew through big loads like a monster.If I ever have the time/excuse, I want to go back and try a multi-threaded COPY. But if you need speed and have a choice between multi-threaded INSERTs or a single-threaded COPY, go with the INSERTs every time.

评论 #16237432 未加载

rtpgover 7 years ago

Why is everyone acting so pissy about the fact that someone happened to find a performance improvement trick by reading the docs? Isn't this what's supposed to happen?None of you read all the performance "tricks" to Postgres before writing your first SQL statement.Every day, somebody's born who doesn't know how to boos the performance of COPY by dropping indexes.

评论 #16237726 未加载

beached_whaleover 7 years ago

It looks like the article is directed at those doing bulk uploads to their system and not at developers necessarily. This may not be universally intuitive to people outside of Database admins and developers.

Theodoresover 7 years ago

They should have stuck with bog-standard mySQL to get this time saving for free - if you restore a SQL dump created with phpMyAdmin or mysqldump then all the disable index commands are already in there, good to go, and in SQL.Whoever wrote the Django bit didn't really do a good job on the defaults.

评论 #16236301 未加载

keredsonover 7 years ago

yep

j_sover 7 years ago

Yikes <a href="https://www.xkcd.com/1053" rel="nofollow">https://www.xkcd.com/1053</a>

12 comments

Johnny555over 7 years ago

评论 #16236833 未加载

评论 #16238484 未加载

评论 #16238139 未加载

drblastover 7 years ago

评论 #16236838 未加载

评论 #16236367 未加载

评论 #16236781 未加载

评论 #16237472 未加载

cookiecaperover 7 years ago

评论 #16236341 未加载

评论 #16236395 未加载

rosserover 7 years ago

评论 #16236131 未加载

jacobkgover 7 years ago

Do you also disable autovacuum while this is running? That is another good trick for speeding up large database imports.

评论 #16236976 未加载

p2t2pover 7 years ago

N̶o̶ ̶s̶h̶i̶t̶,̶ ̶S̶h̶e̶r̶l̶o̶c̶k̶!̶ You don't say?

评论 #16236267 未加载

评论 #16236847 未加载

karlmdavisover 7 years ago

评论 #16237432 未加载

rtpgover 7 years ago

评论 #16237726 未加载

beached_whaleover 7 years ago

Theodoresover 7 years ago

评论 #16236301 未加载

keredsonover 7 years ago

yep

j_sover 7 years ago

Yikes <a href="https://www.xkcd.com/1053" rel="nofollow">https://www.xkcd.com/1053</a>

Boosting the performance of PostgreSQL’s COPY command by dropping indexes

12 comments

Boosting the performance of PostgreSQL’s COPY command by dropping indexes

12 comments