Why Google Stores Billions of Lines of Code in a Single Repository (2016)

478 points by bwag, almost 7 years ago

38 comments

hobls, almost 7 years ago

I feel terrible for anyone who sees this and thinks, “ah! I should move to a monorepo!” I’ve seen it several times, and the thing they all seem to overlook is that Google has THOUSANDS of hours of effort put into the tooling for their monorepo. Slapping lots of projects into a single git repo without investing in tooling will not be a pleasant experience.

jgibson, almost 7 years ago

Is it just me, or are a lot of people here conflating source control management and dependency management? The two don't have to be combined. For example, if you have Python Project X that depends on Python Project Y, you can either A) have them in different SCM repos, with a requirements.txt link to a server that hosts the wheel artifact, B) have them in the same repo and refer to each other from source, or C) have them in the same repository, but still have Project X list its dependency on Project Y in a requirements.txt file at a particular version. With the last option, you get the benefit of monorepo tooling (easier search, versioning, etc.) but you can control your own dependencies if you want.

edit: I do have one question though: does Google's internal tool handle permissions on a granular basis?

senozhatsky, almost 7 years ago

Well, it's not so uncommon. For instance, the OpenBSD and NetBSD repos are sort of monolithic. And, believe it or not, there are some advantages. For instance, let's take a look at the OpenBSD 5.5 [0] release notes:

> OpenBSD is year 2038 ready and will run well
> beyond Tue Jan 19 03:14:07 2038 UTC

OpenBSD 5.5 was released on May 1, 2014, while Linux is still "not quite there yet" y2038-wise. y2038 is a very complex issue, although it may look simple: time_t and clock_t should be 64-bit. This requires changes both on the kernel side (new syscall interfaces [stat()], new structure layouts [struct stat], new sizeof()-s, etc.) and on the user space side. This basically means ABI breakage: newer kernels will not be able to run older user space binaries. So how did OpenBSD handle that? The reason why the y2038 problem looked so simple to OpenBSD was a "monolithic repository". It's a self-contained system, with the kernel and user space built together out of a single repository. OpenBSD folks changed both user space and kernel space in "one shot".

IOW, a monolithic repository makes some things easier:

a) make a dramatic change to A
b) rebuild the world
c) see what's broken, patch it
d) while there are regressions or build breakages, goto (b)
e) commit everything

[0] http://www.openbsd.org/55.html?hn

[UPDATE: fixed spelling errors... umm, some of them]

-ss
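For readers wondering where the date in the quoted release note comes from, here is a minimal Python sketch of the arithmetic. It assumes a signed 32-bit time_t counting seconds since the Unix epoch and two's-complement wraparound on overflow (C itself leaves signed overflow undefined); the function names are hypothetical and chosen only for this illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit time_t can hold


def to_utc(seconds):
    """Convert seconds since the Unix epoch to an aware UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


def wrap_int32(seconds):
    """Simulate storing a value in a signed 32-bit integer (assumed wraparound)."""
    return (seconds + 2**31) % 2**32 - 2**31


# Last moment representable with a signed 32-bit time_t.
print(to_utc(INT32_MAX).isoformat())

# One second later the counter wraps to a large negative value.
print(to_utc(wrap_int32(INT32_MAX + 1)).isoformat())
```

The first line printed is 2038-01-19T03:14:07+00:00, matching the release note; the second lands in December 1901, which is why widening time_t to 64 bits touches both kernel structures and every user space binary that embeds them.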

ChrisCinelli, almost 7 years ago

Managing dependencies and versions across repos is a pain. Refactoring is quite hard when your code spreads across repos, considering the tree of dependencies.

Unfortunately, Git checks out all the code, including history, at once, and it does not scale to big codebases.

The approach that Facebook chose with Mercurial seems a good compromise ( https://code.fb.com/core-data/scaling-mercurial-at-facebook/ )

whack, almost 7 years ago

Maybe I'm not cool enough to understand this, but I don't see the draw for monorepos. Imagine if you're a tool owner, and you want to make a change that presents significant improvements for 99.9% of people, but causes significant problems for 0.1% of your users. In a versioned world, you can release your change as a new version, and allow your users to self-select if/when/how they want to migrate to the new version. But in a monorepo, you have to either trample over the 0.1%, or let the 0.1% hold everyone else hostage.

Conversely, imagine if you're using some tools developed by a far off team within the company. Every time the tooling team decides to make a change, it will immediately and irrevocably propagate into your stack, whether you like it or not.

If you were at a startup and had a production critical project, would you hardcode specific versions for all your dependencies, and carefully test everything before moving to newer versions? Or would you just set everything to LATEST and hope that none of your dependencies decide to break you the next day? Working with a monorepo is essentially like the latter.

makecheck, almost 7 years ago

This is clearly detrimental to external projects such as Go packaging, since their own developers will never be looking at dependency problems in the same way as outside groups.

Monorepo also bugs me because there will always be some external package you need, and invariably it’s almost impossible to integrate due to years of colleagues making internal-only things assume everything imaginable about the structure and behavior of the monorepo. There will be problems not handled, etc. and it leads to a lot of NIH development because it’s almost easier in the end.

Also, it just feels risky from an engineering perspective: if your repository or tools have any upper limits, it seems like you will inevitably find them with a humongous repo. And that will be Break The Company Day because your entire process is essentially set up for monorepo and no one will have any idea how to work without it.

tzhenghao, almost 7 years ago

Having worked at different companies adopting both monorepo and the multiple repos approach, I find monorepo a better normalizer at scale in consolidating all "software" that runs the company.

Just like what many commenters here have mentioned, the monorepo approach is a forcing function on keeping compatibility issues at bay.

What you don't want is to end up in a situation where teams reinvent their own wheels instead of building on top of existing code, and at scale, I think the multiple repo approach tends to breed such codebase smell. [1] I'm sure 8000 repos is living hell for most organizations.

[1] - https://www.youtube.com/watch?v=kb-m2fasdDY

mlthoughts2018, almost 7 years ago

One of my former managers had worked a long time at Google and was present for the advent of Google’s in-house tooling developed around their monorepo.

His account was that it was basically accidental, at first resulting from short term fire drills, and then creating a snowball effect where the momentum of keeping things in the Perforce monorepo and building tooling around it just happened to be the local optimum, and nobody was interested in slowing down or assessing a better way.

He personally thought working with the monorepo was horrible, and in the company where I worked with him, we had dozens of isolated project repos in Git, and used packaging to deploy dependencies. His view, at least, was that the development experience and reliability of this approach was vastly better than Google’s approach, which practically required hiring amazing candidates just to have a hope of a smooth development experience for everyone else.

I laugh cynically to myself about this any time I ever hear anyone comment as if Google’s monorepo or tooling are models of success. It was an accidental, path-dependent kludge on top of Perforce, and there is really no reason to believe it’s a good idea, certainly not the mere fact that Google uses this approach.

haglin, almost 7 years ago

Google's handling of their source code makes me wanna work there.

I don't like distributed version control systems with hundreds of repositories spread out. It makes management more complicated. I understand this is a minority view, but that is my experience. It was easier to work in a single Perforce repository than hundreds of Git or Mercurial repos.

a-dub, almost 7 years ago

It should be noted that the monolithic model is somewhat encouraged by the client mapping system in Perforce, which was Google's first version control system, so it is unclear to me if this was deliberate or just a side effect of the best VCS of the time.

I also still have doubts around the value of a monorepo. In the article they claim it's valuable because you get:

- Unified versioning, one source of truth;
- Extensive code sharing and reuse;
- Simplified dependency management;
- Atomic changes;
- Large-scale refactoring;
- Collaboration across teams;
- Flexible team boundaries and code ownership; and
- Code visibility and clear tree structure providing implicit team namespacing.

With the exception of the niceness of atomic changes for large scale refactoring, I don't really see how the rest are better supported by throwing everything into one, rather than having a bunch of little repos and a little custom tooling to keep them in sync.

techbio, almost 7 years ago

Previous thread: https://news.ycombinator.com/item?id=11991479

ridiculous_fish, almost 7 years ago

> Google's monolithic software repository, which is used by 95% of its software developers worldwide, meets the definition of an ultra-large-scale system, providing evidence the single-source repository model can be scaled successfully

This 95% number is the most surprising part of the article. That implies that the sum of engineers working on Android + Chrome + ChromeOS + all the Google X stuff + long tail of smaller non-google3 projects (Chromecast, etc) constitute only 5% of their engineers. Is e.g. Android really that small?

stevesimmons, almost 7 years ago

My company has a 50m LOC Python codebase in a monorepo. It works really well, given the rate of change of thousands of developers globally. That is only possible because of the significant investment in devtools, testing and the deployment infrastructure.

Here is "Python at Massive Scale", my talk about it at PyData London earlier this year: https://youtu.be/ZYD9yyMh9Hk

timkrueger, almost 7 years ago

We have worked with a monorepo since September 2017. I wrote about the migration: https://timkrueger.me/a-maven-git-monorepo/

Our developers like it, because they can use 'mkdir' to create a new component, search through the complete codebase with 'grep' and navigate with 'cd'.

jamesmiller5, almost 7 years ago

I wish more developers knew of the wonderful "repo" tool [0] developed by the Android devs, which allows a monorepo _perspective_ of many git repositories. Breakdown of the repo tool and example manifest files: http://blog.udinic.com/2014/05/24/aosp-part-1-get-the-code-using-the-manifest-and-repo/

[0] https://source.android.com/setup/develop/repo

wrayjustin, almost 7 years ago

> includes approximately one billion files...

> including approximately two billion lines of code

_also_

> in nine million unique source files

I should insert a joke about how well the system would do if each source file contained more than two lines of code.

But seriously, this summary could use some work.

tsycho, almost 7 years ago

It's not just devops that you need to pull off a large monorepo; the other big thing is a strong testing culture. You have to be able to rely on unit tests from across the code base being a sufficient indicator of whether your commit is good. AND you need a presubmit process that can compute which parts of the monorepo are affected by your diff, and run tests against them automatically before committing your diff.

Google not only has the above but also has a strong pre-submission code review process which catches large classes of bugs in advance.
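To make the "compute which parts of the monorepo are affected by your diff" step concrete, here is a minimal Python sketch. It is not Google's presubmit tooling: the build graph, the target names, and the functions are all hypothetical, and it assumes the changed files have already been mapped to the build targets that own them.

```python
from collections import defaultdict, deque

# Hypothetical build graph: each target maps to the targets it depends on.
DEPS = {
    "//lib/time": [],
    "//lib/net": ["//lib/time"],
    "//app/mail": ["//lib/net"],
    "//app/search": ["//lib/time"],
    "//app/maps": [],
}


def reverse_graph(deps):
    """Invert the graph so each target maps to the targets that depend on it."""
    rdeps = defaultdict(set)
    for target, needs in deps.items():
        for dep in needs:
            rdeps[dep].add(target)
    return rdeps


def affected_targets(changed, deps):
    """Return the changed targets plus everything that transitively depends on them."""
    rdeps = reverse_graph(deps)
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in rdeps[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)


# A change to the networking library only needs its own tests and the mail app's.
print(affected_targets(["//lib/net"], DEPS))
```

In this toy graph a change to //lib/net selects only //app/mail and //lib/net, while a change to //lib/time selects everything except //app/maps. A real system would also need the file-to-target mapping and test discovery, which are left out here.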

malkia, almost 7 years ago

Here is the video (with Rachel Potvin), predating the article by some months: https://www.youtube.com/watch?v=W71BTkUbdqE

vbezhenar, almost 7 years ago

I've used a monorepo for a few small related projects and it worked just fine for me. Much easier to make related changes across several projects.

joe_fishfish, almost 7 years ago

This is probably a stupid question, but I couldn't find an answer. Does this mean Google keeps all of its different products in all their different languages and environments in one repo? So like, Android lives in the same repo as Gmail, which is the same repo as all the Waymo code and the Google search engine code as well? That seems insane to me.

paulddraper, almost 7 years ago

Version controlled repositories are like business offices.

You can have your entire company in one location, or the entire company in separate locations. The most important thing is the logical rather than physical organization: team structure, executive leadership, inter-org dependencies, etc. You can achieve autonomy and good structure with or without separate locations.

A single location reduces barriers, but at some point multiple locations can solve physical and logistical challenges. General rule of thumb is to own and operate office space in as few locations as possible, but at some point you have to take drastic measures one way or another.

(Notice that Google had to invent their own proprietary version control system just for their monorepo. And not even Google actually uses a single repo as the source of truth: e.g. Chromium and Android.)

paulie_a, almost 7 years ago

I'm sure properly organized it's okay, but from what I've seen it's mediocre at best; especially with legacy/technical debt, it's a huge mistake.

Start breaking that repo apart, because it probably isn't very/hopefully depending on the debt that exists.

the_arun, almost 7 years ago

Seems like Google uses its own custom source control & tools: https://www.quora.com/What-version-control-system-does-Google-use-and-why

carapace, almost 7 years ago

https://en.wikipedia.org/wiki/Conway%27s_law

> "organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations."

Interestingly, in light of the above adage, this massive repo is organized (if that's the word for it) like a bazaar or flea market. (Rather than like a phone book https://en.wikipedia.org/wiki/Yellow_pages )

alexeiz, almost 7 years ago

> Trunk-based development. ... is beneficial in part because it avoids the painful merges that often occur when it is time to reconcile long-lived branches. Development on branches is unusual and not well supported at Google, though branches are typically used for releases.

This sounds like the SVN model to me, where branches are cumbersome and therefore very rare. After getting used to the Git branching model where branches are free and merges are painless, it would be very hard to go back to the old development model without branches.

jbergknoff, almost 7 years ago

How does CI work with a monorepo? Do you always have to run all the tests and build all the artifacts? Or are there nice ways to say "just build this part of the repo"?

nicodjimenez, almost 7 years ago

I have slight experience with both monorepos and smaller repos and I think they can both work. The advantage of smaller repos is that they force different components to expose well designed APIs. Bigger repos make sense for products and embedded software; smaller repos make sense for platforms built up of small services communicating over the internet.

jorblumesea, almost 7 years ago

Is this really relevant for anyone except for "google scale" companies? For most teams, managing 30-40 services backed by git repos isn't a huge task and doesn't cause many problems.

Is there mature tooling that helps teams manage this, or is this proprietary google magic tooling?

testcross, almost 7 years ago

I don't understand why gitlab/github/bitbucket don't provide better tools for monorepos. This is a pretty trendy topic. But there are absolutely no tools helping with access control, good CI, ...

IloveHN84, almost 7 years ago

The giant monorepo works only if you're using SVN, with Git it would be tremendous

axaxs, almost 7 years ago

Sorry, but as someone who has been in orgs that do both, mono repo is a mistake. Constantly needing to pull unrelated changes before pushing, pipelines needing to grab the whole repo for dependencies, etc. I understand the arguments for mono repo, but I never think there's anything that outweighs the cons.

prepend, almost 7 years ago

I love these articles. Is there a wiki or collection of detailed descriptions of large company tech practices that isn’t marketing blargh?

I read years ago about Google data ingest, locator process but neglected to bookmark it, so now I can’t find the reference.

gervase, almost 7 years ago

Should probably have a [2016] tag.

emmelaich, almost 7 years ago

(2016)

tflinton, almost 7 years ago

A repo including configuration and data.

How about we stop considering google an engineering leader and just a search leader?

tflinton, almost 7 years ago

A repo with configuration, secrets and data?

Can we stop considering google an engineering leader and just a search algorithm leader?

curtis, almost 7 years ago

I think monorepos make a lot of sense when you're talking about millions of lines of code. I'm not at all sure they make sense when you're talking about billions.

fizixer, almost 7 years ago

I don't care about that. For me this is incomprehensible:

Why the eff does Google have billions of lines of code in their repo?

I hope they are not counting revisions (e.g., if a single 1 million line project has 100 revisions, that's 1 million, not 100 million).

I have heard that they do count generated code (so it's not all handwritten code). In that case again, I have two things to say:

- That's a bad metric. I could overnight generate a billion lines of code with each line a printf of number_to_word of numbers from 1 to a billion. They want to measure the size of the repo? They should tell us the gigabytes, terabytes etc. But when it's lines of code, it's cheesy and childish to blow up the measure by including lines of generated code.

- But more importantly, I hope the generated code is 90% or more of that repository. Because any less than that would mean that Google engineers have handwritten 100 million or more lines of code throughout the lifetime of the company, in which case I have to ask: what bloated mess do you have on your hands? I thought you guys were the top engineers of the world.
