TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Boosting the performance of PostgreSQL’s COPY command by dropping indexes

37 pointsby palewireover 7 years ago

12 comments

Johnny555over 7 years ago
Is it not common knowledge that dropping indexes improves database insert performance?<p>Of course, that&#x27;s often not an option when you you&#x27;re loading records into a live database that&#x27;s also getting queries, you usually don&#x27;t want every query to result in a full table scan.<p>This was well known 20+ years ago when I was an entry-level DBA, and I assumed it was still well known today.
评论 #16236833 未加载
评论 #16238484 未加载
评论 #16238139 未加载
drblastover 7 years ago
I must be getting old.<p>Kids, many years ago, even before jQuery, software would come with documentation that you could read and it would tell you how to use it effectively.<p>I know, crazy right? But to this day some of that old software, of which PostgreSQL is an example, still has this documentation that you can read, even before you use the software in a production system.<p>Yeah, yeah, I know Agile and Docker solved the problem of ever having to document anything, but this is the way things used to be and a few of us are stuck in our ways and still like it.
评论 #16236838 未加载
评论 #16236367 未加载
评论 #16236781 未加载
评论 #16237472 未加载
cookiecaperover 7 years ago
Wow. You&#x27;ve gotta love the audacity in making a big announcement like this based on the developer finally getting around to <i>reading the docs</i>. &quot;Drop constraints and indexes for faster imports&quot; is mass import 101.<p>The entirety of the &quot;Why We Did It&quot; section:<p>-----<p>&gt; This improvement was pioneered by James Gordon, the Coalition’s lead developer.<p>&gt; He drew instruction from PostgreSQL’s official documentation, which reads:<p>&gt;&gt; [snipping quoted sections from PostgreSQL manual at <a href="https:&#x2F;&#x2F;www.postgresql.org&#x2F;docs&#x2F;10&#x2F;static&#x2F;populate.html#POPULATE-RM-INDEXES" rel="nofollow">https:&#x2F;&#x2F;www.postgresql.org&#x2F;docs&#x2F;10&#x2F;static&#x2F;populate.html#POPU...</a> ]<p>&gt;Gordon’s code handles this task using rarely utilized, low-level tools in Django’s database manager.<p>-----<p>Sadly, in the current day and age, a developer actually taking the time to RTFM may indeed qualify as &quot;pioneering&quot; work!<p>Perhaps the rest of us need to start trumpeting our accomplishments when we find some clearly-stated performance gain in the manual, rather than hiding our heads in embarrassment for not finding out until we released version 2.2 of our mass DB import tool.
评论 #16236341 未加载
评论 #16236395 未加载
rosserover 7 years ago
Do you have any numbers on how much <i>extra</i> time (that is, time spent servicing queries above the normal query times when your tables are indexed) application queries take after the loads, but before the index rebuilds are complete?<p>If so, how does that compare, in aggregate, to the time saved in the loads?<p>Or are you simply not putting the application back into service until the index rebuilds have finished? How long does that take, compared to the time saved?<p>EDIT: I&#x27;m mostly asking these questions to nudge people to think about them in the course of trying this in their own environments. It&#x27;s my day job to think about these kinds of things; I&#x27;ve worn the PostgreSQL DBA hat for over a decade now.
评论 #16236131 未加载
jacobkgover 7 years ago
Do you also disable autovacuum while this is running? That is another good trick for speeding up large database imports.
评论 #16236976 未加载
p2t2pover 7 years ago
N̶o̶ ̶s̶h̶i̶t̶,̶ ̶S̶h̶e̶r̶l̶o̶c̶k̶!̶ You don&#x27;t say?
评论 #16236267 未加载
评论 #16236847 未加载
karlmdavisover 7 years ago
From personal experience: PostgreSQL&#x27;s COPY commands aren&#x27;t really all that performant, indexes or no.<p>Our project saw SIGNIFICANTLY better performance with batched multi-threaded INSERTs. If you can run a few hundred load threads and manage the concurrency correctly (not trivial), it will chew through big loads like a monster.<p>If I ever have the time&#x2F;excuse, I want to go back and try a multi-threaded COPY. But if you need speed and have a choice between multi-threaded INSERTs or a single-threaded COPY, go with the INSERTs every time.
评论 #16237432 未加载
rtpgover 7 years ago
Why is everyone acting so pissy about the fact that someone happened to find a performance improvement trick by reading the docs? Isn&#x27;t this what&#x27;s supposed to happen?<p>None of you read all the performance &quot;tricks&quot; to Postgres before writing your first SQL statement.<p>Every day, somebody&#x27;s born who doesn&#x27;t know how to boos the performance of COPY by dropping indexes.
评论 #16237726 未加载
beached_whaleover 7 years ago
It looks like the article is directed at those doing bulk uploads to their system and not at developers necessarily. This may not be universally intuitive to people outside of Database admins and developers.
Theodoresover 7 years ago
They should have stuck with bog-standard mySQL to get this time saving for free - if you restore a SQL dump created with phpMyAdmin or mysqldump then all the disable index commands are already in there, good to go, and in SQL.<p>Whoever wrote the Django bit didn&#x27;t really do a good job on the defaults.
评论 #16236301 未加载
keredsonover 7 years ago
yep
j_sover 7 years ago
Yikes <a href="https:&#x2F;&#x2F;www.xkcd.com&#x2F;1053" rel="nofollow">https:&#x2F;&#x2F;www.xkcd.com&#x2F;1053</a>