I think it takes some real humility to post this. No doubt someone will follow up with an “of course...” or “if you don’t understand the tech you use...” comment.<p>But thank you for this. It takes a bit of courage to point out you’ve been doing something grotesquely inefficient for years and years.
Better title: A one-line change decreased our "git clone" times by 99%.<p>It's a bit misleading to use "build time" to describe this improvement, as it makes people think about build systems, compilers, header files, or caches. On the other hand, the alternative title is descriptive and helpful to all developers, not just build engineers; people who simply need to clone a branch from a large repository can benefit from this tip as well.
This reminds me of my first programming job in 2005, working with Macromedia Flash. They had one other Flash programmer who only worked there every once in a while because he was actually studying in college, and he was working on some kind of project from hell that, among other problems, took about two minutes to build to SWF.<p>Eventually they stopped asking him to come because he couldn't get anything done, and so I had a look at it. In the Movie Clip library of the project I found he had an empty text field somewhere that was configured to include a copy of <i>almost the entire Unicode range</i>, including thousands of CJK characters, so each time you built the SWF it would collect and compress numerous different scripts from different fonts as vectors for use by the program. And it wasn't even being used by anything.<p>Once I removed that one empty text field, builds went down to about 3 seconds.
This is the most I've ever gotten out of Pinterest. Other than this, it's just the "wrong site that Google turns up, that I can't use because it wants me to create an account just to view the image I searched for".
On my first job, 20 years ago, we used a custom Visual C framework that generated one huge .h file that connected all sorts of stuff together. Amongst other things, that .h file contained a list of 10,000 const uints, which were included in every file, and compiled in every file. Compiling that project took hours. At some point I wrote a script that changed all those const uints to #define, which cut our build time to a much more manageable half hour.<p>Project lead called it the biggest productivity improvement in the project; now we could build over lunch instead of over the weekend.<p>If there's a step in your build pipeline that takes an unreasonable amount of time, it's worth checking why. In my current project, the slowest part of our build pipeline is the Cypress tests. (They're also the most unreliable part.)
I sympathise a lot with this post! Git cloning can be shockingly slow.<p>As a personal anecdote, clones of the Rust repository in CI used to be pretty slow, and on investigating we found out that one key problem was cloning the LLVM submodule (which Rust has a fork of).<p>In the end we put in place a hack to download the tar.gz of our LLVM repo from github and just copy it in place of the submodule, rather than cloning it. [0]<p>Also, as a counterpoint to some other comments in this thread - it's really easy to just shrug off CI getting slower. A few minutes here and there adds up. It was only because our CI would hard-fail after 3 hours that the infra team really started digging in (on this and other things) - had we left it, I suspect we might be at around 5 hours by now! Contributors want to do their work, not investigate "what does a git clone really do".<p>p.s. our first take on this was to have the submodules cloned and stored in the CI cache, then use the rather neat `--reference` flag [1] to grab objects from this local cache when initialising the submodule - incrementally updating the CI cache was way cheaper than recloning each time. Sadly the CI provider wasn't great at handling multi-GB caches, so we went with the approach outlined above.<p>[0] <a href="https://github.com/rust-lang/rust/blob/1.47.0/src/ci/init_repo.sh#L50-L68" rel="nofollow">https://github.com/rust-lang/rust/blob/1.47.0/src/ci/init_re...</a><p>[1] <a href="https://github.com/rust-lang/rust/commit/0347ff58230af512c9521bdda7877b8bef9e9d34#diff-a14d83f2e928fc5906d026a42cb16f021b452709b88bc3fd85c63e741cbd9a42R70" rel="nofollow">https://github.com/rust-lang/rust/commit/0347ff58230af512c95...</a>
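For anyone curious, a minimal sketch of what the `--reference` approach looks like for a submodule; the cache path, URL, and submodule path here are illustrative, not necessarily Rust's actual setup:

    # Keep a long-lived object cache on the CI host, refreshed occasionally:
    git clone --mirror https://github.com/rust-lang/llvm-project.git /ci-cache/llvm-project.git
    # When initialising the submodule, borrow objects from the cache
    # instead of downloading them all again:
    git submodule update --init --reference /ci-cache/llvm-project.git src/llvm-project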
> Even though we’re telling Git to do a shallow clone, to not fetch any tags, and to fetch the last 50 commits ...<p>What is the reason for cloning 50 commits? Whenever I clone a repo off GitHub for a quick build and don't care about sending patches back, I always use --depth=1 to avoid any history or stale assets. Is there a reason to get more commits if you don't care about having a local copy of the history? Do automated build pipelines need more info?
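For comparison, a minimal history-free clone looks roughly like this (the URL is just an example); note that --depth already implies --single-branch unless you ask otherwise:

    # Fetch only the tip commit of one branch, no tags, no extra history:
    git clone --depth=1 --no-tags --branch main https://github.com/example/repo.git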
I expected this to be some micro-optimization of moving a thing from taking 10 seconds to 100ms.<p>> Cloning our largest repo, Pinboard went from 40 minutes to 30 seconds.<p>This is both very impressive as well as very disheartening. If a process in my CI was taking 40 minutes, I would have started investigating long before it got anywhere near a 40-minute delay.<p>I don't mean to throw shade on the Pinterest engineering team, but it speaks to an institutional complacency with things like this.<p>I'm sure everyone was happy when the clone took 1 second.<p>I doubt anyone noticed when the clone took 1 minute.<p>Someone probably started to notice when the clone took 5 minutes but didn't look.<p>Someone probably tried to fix it when the clone was taking 10 minutes and failed.<p>I wonder what 'institutional complacencies' we have. Problems we assume are unsolvable but are actually very trivial to solve.
I’ve found as an industry we’ve moved to more complex tools, but haven’t built the expertise in them to truly engineer solutions using them. I think lots of organizations could find major optimizations, but it requires really learning about the technology you’re utilizing.
When I first joined one of my previous jobs, the build process had a checkout stage that blew away the git folder and checked out the whole repo from scratch every time (!). Since the build machine was reserved for that build job, I simply made some changes to do git clean -dfx && git reset --hard && git checkout origin/<branch> instead. It shaved off like 15 minutes of the build time, which was something like 50% of the total build time.
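A sketch of that reuse-the-checkout pattern, with the branch name left as a placeholder:

    # Reuse the existing clone: remove untracked and ignored files,
    # fetch the latest commits, then hard-reset to the branch tip.
    git clean -dfx
    git fetch origin <branch>
    git reset --hard origin/<branch>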
> In the case of Pinboard, that operation would be fetching more than 2,500 branches.<p>Ok, I'll ask: why does a single repository have over 2,500 branches? Why not delete the ones you no longer use?
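For what it's worth, pruning is cheap; a few standard commands cover most of it (the branch name here is an example):

    # Drop local remote-tracking refs for branches already deleted on the server:
    git fetch --prune
    # List remote branches whose work is already merged into master:
    git branch -r --merged origin/master
    # Delete a stale branch on the server:
    git push origin --delete old-feature-branch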
One of the (many) things that drives me batty about Jenkins is that there are two different ways to represent everything. These days the "declarative pipelines" style seems to be the first class citizen, but most of the documentation still shows the old way. I can't take the code in this example and compare it trivially to my pipelines because the exact same logic is represented in a completely different format. I wish they would just deprecate one or the other.
I find the self-congratulatory tone in the post kind of off-putting, akin to "I saved 99% on my heating bill when I started closing doors and windows in the middle of winter."<p>If your repos weigh in at 20GB in size, with 350k commits, subject to 60k pulls in a single day, having someone with half a devops clue take a look at what your Jenkinsfile is doing with git is not exactly rocket science or a needle in a haystack. (Here's hoping they discover branch pruning too; how many of those 2500 branches are active?)<p>As a consultant I've seen plenty of appallingly poor workflows and practices, so this isn't all that remarkable... but for me the post seems kind of pointless.
Can someone explain the intended meaning behind calling six different repositories "monorepos"?<p>It sounds to me like you don't have a monorepo at all and instead have six repositories for six project areas.
I'm a git noob, so I'm sorry if this sounds dumb but wouldn't<p>git clone --single-branch<p>achieve the same thing (i.e, check out only the branch you want to build) ?<p>Also, why would you <i>not</i> only check out one branch when doing CI ?
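For reference, --single-branch works by narrowing the fetch refspec written into the clone's config, which is essentially the same knob the post turns on the Jenkins side; a quick way to see it (URL is an example):

    git clone --single-branch --branch master https://github.com/example/repo.git
    cd repo
    # The clone records a narrowed refspec, so later fetches and pulls
    # only consider that one branch:
    git config --get remote.origin.fetch
    # +refs/heads/master:refs/remotes/origin/master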
I truly appreciate articles like this — it’s heartening to see other companies running into the kinds of issues I’ve run into or had to deal with, and even more so that their culture openly discusses and shares these learnings with the broader community.<p>The most effective organizations I’ve worked at built mechanisms and processes to disseminate these kinds of learnings and held regular brown bags on how a particular problem was solved or how others can apply the lessons.<p>Keep it up, Pinterest engineering folks.
He says that "Pinboard has more than 350K commits and is 20GB in size when cloned fully." I'm not clear though, exactly what "cloned fully" means in context of the unoptimized/optimized situation.<p>He says it went from 40 minutes to 30 seconds. Does this mean they found a way to grab the whole 20GB repo in 30 seconds? seems pretty darn fast to grab 20GB, but maybe on fast internal networks?<p>Or maybe they meant that it was 20GB if you grabbed all of the many thousands of garbage branches, when Jenkins really only needed to test "master", and finding a solution that allowed them to only grab what they needed made things faster.<p>I'm also curious about the incremental vs "cloning fully" aspect of it. Does each run of Jenkins clone the repo from scratch or does it incrementally pull into a directory where it has been cloned before? I could see how in a cloning-from-scratch situation the burden of cloning every branch that ever existed would be large, whereas incrementally I would think it wouldn't matter that much.
My similar story goes like this: We had CRM software that let you set up user-defined menu options. Someone at our organization decided to make a set of nested menu options where you could configure a product, with every possible combination being assigned a value!<p>So if you had a large, blue second generation widget with a foo accessory and option buzz, you were value 30202, and if it was the same one except red, it was 26420...<p>Every time the CRM software started up, it cycled through the options and generated a new XML file with all the results; this took about a minute and created something like a 60MB file.<p>The fix was to basically version the XML file and the options definition file. If someone had already generated that file, just load the XML file instead of parsing and looping through the options file. Started up in 5 seconds!<p>What was the excuse that it took so long in the first place? "The CRM software is written in Java, so it's slow."
Seems like there's a lot of hostility towards the title, which might be considered the engineering blog equivalent of clickbait. If the authors are around: the post was quite informative and interesting to read, but I'm sure it would have been much more palatable with a more descriptive title.<p>But back on topic: does anyone have any insight into when git fetches things, and what it chooses to grab? Is it just "when we were writing git we chose these things as being useful to have a 'please update things before running this command' implicitly run before them"? For example, git pull seems to run a fetch for you, etc.
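On the fetch question, roughly speaking git pull is just a fetch followed by a merge (or a rebase with --rebase); a sketch, assuming the current branch tracks origin/master:

    # 'git pull' is approximately these two steps:
    git fetch origin            # download new objects, update remote-tracking refs
    git merge origin/master     # merge the upstream branch into the current one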
Ok, I'll ask the obvious question: why did setting the branches option to master not already do this?<p>EDIT<p><a href="https://www.jenkins.io/doc/pipeline/steps/workflow-scm-step/" rel="nofollow">https://www.jenkins.io/doc/pipeline/steps/workflow-scm-step/</a> makes it sound like the branches option specifies which branches to monitor for changes, after which all branches are fetched. This still seems like a counter-intuitive design that doesn't fit the most common cases.
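In plain git terms, the change in the post amounts to narrowing the fetch refspec; something like the following (the exact commands the Jenkins plugin generates may differ):

    # Default refspec: fetch every branch on the remote.
    git fetch origin '+refs/heads/*:refs/remotes/origin/*'
    # Narrowed refspec: fetch only master.
    git fetch origin '+refs/heads/master:refs/remotes/origin/master'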
This is good info. Need to check my own build pipelines now and see if we are just blindly cloning everything or not. 40 minutes to do a clone is a pretty long time to wait, though.
Parkinson's Law of builds: "work expands so as to fill the time available for its completion", or in this case the available time is the point at which people can't stand how long the build takes. 30-60 minutes is normal because anything > 1 minute requires you to context-switch anyway, and > 60 minutes means you are now at risk of a build taking a day if you have the work queue of a 1-pizza team. So the [1..60] minute range causes a grumble, but nothing gets done.
Is there any way to do this for GitLab CI [1]? I'm using GIT_DEPTH=1, but I'm not sure how to set refspecs. It's not too important right now since it only takes about 11 seconds to clone the git repo, but maybe it's a quick win as well.<p>[1] <a href="https://docs.gitlab.com/ee/ci/large_repositories/" rel="nofollow">https://docs.gitlab.com/ee/ci/large_repositories/</a>
> For Pinboard alone, we do more than 60K git pulls on business days.<p>Can anyone explain this? Seems ripe for another 99% improvement even with hundreds of devs.
My CI servers have to build branches as well, though. A fresh clone for every build? No wonder it was slow, but even this solution seems inefficient. My preferred general solution is a persistent repository clone per build host, maintained by incremental fetch, and use <i>git worktree add</i>, not <i>git clone</i>, to checkout each build.
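A sketch of that persistent-clone-plus-worktree setup (paths are illustrative, and /builds/repo.git is assumed to be a bare clone made earlier):

    # One long-lived clone per build host, updated incrementally...
    git -C /builds/repo.git fetch origin
    # ...and a throwaway detached worktree per build:
    git -C /builds/repo.git worktree add --detach /tmp/build-1234 origin/master
    # clean up after the build:
    git -C /builds/repo.git worktree remove /tmp/build-1234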
Well, good advice, and good for them, but<p>> Cloning monorepos that have a lot of code and history is time consuming, and we need to do it frequently throughout the day in our continuous integration pipelines.<p><i>No you don't!</i><p>If removing per-build clones was the only way to speed things up, I'm absolutely sure you could figure out how with medium difficulty at most.
This just shows how poor visibility into git is; I hope it gets better.<p>Building a product with poor visibility and ridiculing users for not knowing its internals is the worst practice in computer science.<p>Hadoop did the same, and set a record for the fastest software to become legacy.<p>Super nice to see the great comments here and the nice article.
Looks like Pinterest’s team is confused about Git branches. These are not real full copies of the main branch like in SVN or TFS. A branch in the Git world is simply a pointer to a specific commit in the push history.<p>Having said that, happy to be proven wrong and to learn about it.
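The pointer part is easy to see for yourself: a branch ref is just a 40-character commit id (example output shown below), so the refs themselves are tiny; what a full fetch pays for is the set of objects reachable from each of those tips.

    # A branch is a tiny file holding a commit hash
    # (it may live in .git/packed-refs instead if refs are packed):
    cat .git/refs/heads/master
    # e83c5163316f89bfbde7d9ab23ca2e25604af290
    git rev-parse master
    # e83c5163316f89bfbde7d9ab23ca2e25604af290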
For CI on large repos, you can do much better than this by using a persistent git cache. It takes a little finessing to destroy it if it's corrupt and avoid concurrent modifications, but it's extremely worth it.
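A minimal sketch of a reference-clone cache (paths and URL are illustrative):

    # Shared object cache on the build host, refreshed out of band:
    git clone --mirror https://github.com/example/repo.git /git-cache/repo.git
    git -C /git-cache/repo.git fetch origin
    # Per-build clones borrow objects from the cache; --dissociate copies the
    # borrowed objects in, so the checkout survives if the cache disappears:
    git clone --reference /git-cache/repo.git --dissociate https://github.com/example/repo.git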
Regarding the strife over the 99% claim: if the pull time was 39.9 min (and thus the build took 0.1 min = 6 sec), then a 99% decrease in pull time would result in a 99% decrease in total time, and you would get 30 sec total time in the end (rounding to 0 decimal places).<p>Not that any of this is important for the article to be interesting. In a previous job we had to fight long pull times, and we quickly created a git repo for CI that would sit on a machine next to the CI server and periodically pull from GitHub, so the CI didn't have to do pulls over the Internet.
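That local-mirror setup can be as simple as the following; the hostname and paths are made up, and the CI machines would clone over whatever protocol (ssh, http, git daemon) is convenient internally:

    # On a machine near the CI server: keep a mirror, refresh it from cron:
    git clone --mirror https://github.com/example/repo.git /mirrors/repo.git
    git -C /mirrors/repo.git remote update --prune
    # CI machines clone from the nearby mirror instead of going to GitHub:
    git clone ci-mirror.internal:/mirrors/repo.git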
The title is a bit of a misnomer, isn't it?<p>> This simple one line change reduced our clone times by 99% and significantly reduced our build times as a result.<p>Sounds like it didn't reduce build times quite by 99%.
I'm not impressed by the author of the post, since this is documented in the plugin, which says you should not check out all the branches if you're not interested in them. The default behaviour, of course, is to get all of them.
So git doesn't scale well with wide, deep source histories? That's a failing of git, I think, not of the engineers, who may even have written that line when the source base was far less gnarly.
I once reduced the runtime of our test suite from 10 mins to < 5 minutes by changing 2 characters in 1 line...<p>The bcrypt work factor! It was originally 12; I reduced it to 1 (don’t worry, production is still 12)
Is it a common practice to clone the repo on every build (especially on web apps)? I just have Jenkins navigate to an app folder, run a few git commands (hard reset, pull), and build (webpack).
The article is erroneous in many ways, as others have described, but the main error I see is that it says 'git clone' is run before the fetch.<p>It should be 'git init'.
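For context, the sequence the Jenkins git plugin performs is roughly the following (simplified; URL and refspec are examples):

    # Not a 'git clone' at all: an empty repo, a narrowed fetch, a checkout.
    git init workspace
    cd workspace
    git remote add origin https://github.com/example/repo.git
    git fetch --no-tags origin '+refs/heads/master:refs/remotes/origin/master'
    git checkout -f origin/master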
It is pinteresting that a webapp for making your image-saving obsession easier to satisfy takes hundreds to thousands of developer actions per day and repositories tens of gigabytes in size.
Semi-related for JS developers: if you do `eslint` as part of your build, make sure `node_modules` (and `node_modules` in subfolders if you have a monorepo-ish solution) is excluded.
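One way to enforce that from the command line; the ignore pattern shown is an example, and project setups vary:

    # Lint the workspace while explicitly excluding every node_modules directory:
    npx eslint . --ignore-pattern '**/node_modules/**'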
“We have six main repositories at Pinterest: Pinboard, Optimus, Cosmos, Magnus, iOS, and Android. Each one is a monorepo and houses a large collection of language-specific services.”<p>What is an “iOS monorepo” supposed to be like?