Advantages of Monorepos (2015)

183 points by Naac about 3 years ago

30 comments

lisper about 3 years ago

It's very simple: with a monorepo you always have access to everything you need, together with a ton of stuff you don't. Whether or not this is advantageous boils down to whether the cost of not having access to something you need is greater than the cost of having access to a bunch of stuff you don't. As long as your system is reasonably efficient at letting you select small subsets of everything you could potentially have access to, the cost of having access to a bunch of stuff you don't need is essentially zero. Perforce is good at that. Git isn't. So people who use Perforce tend to think that monorepos are good and people who use git don't. And they're both right.

captainmuon about 3 years ago

One upside of smaller repos that I rarely hear about is that they force you to think about versioning. If you have a monorepo, you often don't version individual components; you just have a master that always builds. If your product is a user-facing website, that is fine. But if you make releases, and have multiple components in different versions that have a stable API and are expected to work in different combinations, then it is a real hassle. Of course you can tag individual library versions in a monorepo, but that is not the path of least resistance.

One place I've worked at, the ATLAS experiment at CERN, migrated to a monorepo. It was not bad, although there were the usual problems with long checkout times. But it worked because we tended to version every single piece of software together in a big "release" anyway (to make scientific results reproducible).

bob1029 about 3 years ago

We've been doing this for a few years now. The biggest unintentional thing that came out of it was that the entire team started speaking in terms of commit hashes.

Once a non-technical person learns that the entire state of a product/project/organization can be described by a hash, they will begin to abuse it for literally everything. And I totally endorse this. It's incredible to watch unfold. An employee passively noting the current commit hash like it's the time of day puts a bit of joy into my brain every time.

Everyone can speak this language. The semantics are ridiculously simple.

akshayshah about 3 years ago

I hope that Amazon open-sources Brazil and the surrounding version set ecosystem someday. They're the only large company I know of that uses individual project repos at scale, and they've built tools that solve many of these problems. (I've never worked there, so I don't know how loved those tools are internally.)

Edit: I worked at Microsoft, which also uses tons of tiny repos (at least within Azure). I didn't encounter any good cross-repo management tools, though; apart from having a Jira-like ticketing system built in, Azure DevOps seemed quite a bit worse than GitHub.

lliamander about 3 years ago

All these advantages really come down to making it easier to manage tightly coupled systems. It's great that the monorepo approach used by large tech companies with whole departments devoted to developer tooling can make that work.

However, I think the "polyrepo" response to most of these advantages would be to focus on decoupling your systems instead.

Take for instance:

> With a monorepo, you just refactor the API and all of its callers in one commit. That's not always trivial, but it's much easier than it would be with lots of small repos. I've seen APIs with thousands of usages across hundreds of projects get refactored and with a monorepo setup it's so easy that no one even thinks twice.

Like, that's really cool that you can do that. But why are you doing that?! Why are you breaking your API contract and forcing all of your clients to change all at once?

Of course, proper decoupling also requires good engineering. A polyrepo environment can still get horribly tangled, but the natural response to all of these tangling problems in a polyrepo is to move in the direction of looser coupling.
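
As a rough illustration of the looser-coupling approach this comment argues for (a sketch, not anything taken from the article or from the commenter's codebase), an API owner can keep the old entry point alive as a thin shim so that callers migrate on their own schedule instead of all at once. The function names `fetch_user` and `get_user` are hypothetical.

```python
import warnings


def get_user(user_id: int, *, include_profile: bool = False) -> dict:
    """New API (hypothetical): the option is keyword-only."""
    user = {"id": user_id}
    if include_profile:
        user["profile"] = {}
    return user


def fetch_user(user_id: int, with_profile: bool = False) -> dict:
    """Old API (hypothetical), kept as a shim so existing callers keep working."""
    warnings.warn(
        "fetch_user() is deprecated; use get_user()",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not the shim
    )
    return get_user(user_id, include_profile=with_profile)
```

The trade-off is that the shim has to be maintained until the last caller has moved, which is exactly the cost the monorepo-wide refactor avoids.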

oceanplexian about 3 years ago

I know some of the FAANGs do monorepo (Google being the biggest), but AWS does not.

A monorepo is an organizational mess when you try to manage and transfer ownership across thousands of teams or contain the blast radius of changes, unless you invest a ton of resources into proprietary tooling that requires a bunch of maintenance, since all the open source solutions are terrible at this and the whole data model is built around splitting out individual project repositories. And then after all that effort, why wouldn't you just use tooling the way it was intended, and the way it's used in the open source model, so you can partition your CI/CD without a bunch of hacks and don't run into bizarre scaling issues with your VCS?

It perplexes me that people advocate for this strategy. All I can think is it's another one of those cargo-cult ideas that everyone is doing because Google did it (so it must be good).

pbiggar about 3 years ago

Monorepos are also great at small scale, with just a few projects. The darklang monorepo [1] has a devcontainer that installs all the build tools for 4 projects which create 21 different services, using 6 languages, and building everything is one step.

In fact, it makes it so easy to add new stuff that I didn't even realize we had 21 services until I counted. My first guess was 12.

[1] https://github.com/darklang/dark

codenesium about 3 years ago

Having been down the route of a repo for every service, I would always choose a monorepo in the future. I could see separate repos for libraries. There is just too much overhead in trying to manage multiple repos. With a single repo it's possible to build a package that represents all of your software, versus being forced to version everything. Tasks almost always touch multiple services unless you are so big that you have a team per service.

hardwaregeek about 3 years ago

I agree that monorepos are great if you're using version control systems in their current state. But I can't help but wonder if it's a question of monorepos being good, or version control/tooling inhibiting other options. If you had a VC tool that could compose repositories with ease, that could understand multiple histories and allow for atomic commits across repos, perhaps monorepos wouldn't be the best? Or you could keep the monorepo, but allow a "lens" into a specific subsection.

Even with Dan's point about monorepos making tooling easier, if a VC tool had a good API, perhaps this point would be moot. Why is it hard to query files and repository dependencies? Should there be some way to model dependencies in your version control system? It'd be interesting to see someone tackle these problems in version control.

benreesman about 3 years ago

Dan is diplomatic to a fault. Splitting repos on boundaries that aren't required by access control, legal obligation, or infrastructure constraints is for people who have nothing better to do.

All the big shops have multiple repositories. They all broke each one out grudgingly and under some kind of pressure.

honkycat about 3 years ago

The thing about monorepos is similar to the thing about microservices: they require a lot of tooling and discipline and documentation that most organizations do not have.

On our multi-repos I have consistently seen dozens, if not hundreds, of stale pull requests and branches and issues piling up, never to be merged. This compounds with a monorepo.

Additionally, how do you avoid doing pointless builds when new features are pushed? I can only imagine what the `.github` folder in a monorepo looks like.

For me it is similar to the "one large file" argument, and here is why I don't agree: obfuscation is bad, but information hiding is GOOD. When I open a file, I want the information relevant to the current domain I am working in, not all of the information all at once.

Similarly, when I open a GitHub page, I want its issues, pull requests, branches, and wiki to represent the state of a single project: the one I am currently interested in. You lose this with a monorepo.

You can argue "well, tooling can...". Yes, tooling that does not exist and that I do not want to implement. Similar to the "one large file" argument, editors are set up to manage many different files with tabs. You COULD just compile the code and navigate symbols, but that isn't the world we currently live in.
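
One common answer to the "pointless builds" question above is path-based change detection: map the files touched by a push to the projects that own them, and build only those. The sketch below is a minimal illustration under stated assumptions, not a description of any particular CI system. The directory layout and the function name `affected_projects` are hypothetical, and the list of changed files is assumed to come from something like `git diff --name-only`.

```python
from pathlib import PurePosixPath


def affected_projects(changed_files, project_roots):
    """Return the project roots that contain at least one changed file."""
    affected = set()
    for changed in changed_files:
        path = PurePosixPath(changed)
        for root in project_roots:
            # A file belongs to a project if the project root is the
            # file's parent directory or any ancestor of it.
            if PurePosixPath(root) in path.parents:
                affected.add(root)
    return sorted(affected)


# Hypothetical layout: only the billing service was touched.
changed = ["services/billing/api.py", "docs/README.md"]
roots = ["services/billing", "services/search", "libs/common"]
print(affected_projects(changed, roots))
```

This only looks at file ownership. A change to a shared library should also rebuild its dependents, which requires a dependency graph and is where the real tooling cost the comment describes comes in.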

jsnell about 3 years ago

(2015)-ish. Significant previous discussions:

https://news.ycombinator.com/item?id=9562923
https://news.ycombinator.com/item?id=16362345

trollied about 3 years ago

> With a monorepo, projects can be organized and grouped together in whatever way you find to be most logically consistent, and not just because your version control system forces you to organize things in a particular way. Using a single repo also reduces overhead from managing dependencies.

I don't actually understand this. You can do this with git submodules. It's just a directory structure. Can somebody please explain? If the problem is committing to multiple things at the same time for a point-in-time release, then the answer is tags, rather than terabytes of git history for a gigantic organisation that has many unrelated projects.

A good example for you: Google periodically releases the Google Ad Manager API externally, with dated releases. How does having that in a huge monorepo make sense?

jkaptur about 3 years ago

> the downsides are already widely discussed.

Does anyone have any useful pointers? I'm in such total agreement with the article that I actually don't know the counterarguments.

denimnerd42 about 3 years ago

Git seems like the wrong tool for monorepos, so what is used instead if you can't immediately just build your own tools?

Naac about 3 years ago

I think it's worth calling out that there are different types of monorepos.

For example, I've worked in a monorepo that was one giant binary, but I've also worked in a monorepo that contained four or so independent services (all in a single git repo).

no_wizard about 3 years ago

I'm a big fan of monorepos. If they get too unwieldy or you need granular VCS permissions, you should use Perforce over git, but generally speaking I think either git or Perforce works fine for monorepos. The tooling has come such a long way from even 10 years ago, especially for front-end codebases, but even for things like Rust the story is really strong.

It comes down to how efficient you can be with tooling. That's the one thing monorepos really do require: a good upfront investment in tooling, and long-term maintenance. However, I've found the initial "cost" of setting up a complex monorepo with correct tooling is far outweighed by the reduced operational overhead of working inside it.

atx42 about 3 years ago

Our team is unique at our company, having a "monorepo" with 9 components versus the standard 1 component / 1 repo that other teams use. With Maven, we can use one command to build any one or all components. If we split, we'd tell Jenkins how to build everything, but would say goodbye to simple local builds. Without introducing some more technology or complexity and likely specifying how the build works in two different places, I didn't see a good solution to this.

I mention this here, as maybe I'm missing some obvious solution.

paulvnickerson about 3 years ago

How do you address the blast radius problem with monorepos? For instance, I want to have a single GitLab repo for PostgreSQL clusters. Using jsonnet, I deploy and configure a cluster for each customer, and adding a new cluster is as easy as adding a config file.

However, my colleague explained that it's a bad idea because any config changes or accidental button presses on GitLab's CI/CD page can bring down or wipe out everybody's cluster. How can that problem be mitigated? It seems intrinsic to the monorepo style.

Thaxll about 3 years ago

How do you manage versions / tags with a monorepo? If you need to tag something (a lib), everyone gets the same tag: the entire repo now has a tag v0.0.1 even though only your library changed.
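
One convention that addresses this question is to prefix each tag with the component's path, for example `libs/parser/v0.0.1`; Go modules use directory-prefixed tags of this kind for modules that live in a subdirectory. The sketch below is an illustration under that assumption, showing how tooling could recover the latest version of each component from a flat tag list. The component names and the helper `latest_versions` are hypothetical.

```python
import re
from collections import defaultdict

# Matches tags of the assumed form "<component path>/v<major>.<minor>.<patch>".
TAG_PATTERN = re.compile(
    r"^(?P<component>.+)/v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)$"
)


def latest_versions(tags):
    """Map each component to its highest (major, minor, patch) version."""
    versions = defaultdict(list)
    for tag in tags:
        match = TAG_PATTERN.match(tag)
        if match is None:
            continue  # skip repo-wide or malformed tags
        version = tuple(int(match[part]) for part in ("major", "minor", "patch"))
        versions[match["component"]].append(version)
    return {component: max(found) for component, found in versions.items()}


tags = ["libs/parser/v0.0.1", "libs/parser/v0.1.0", "services/api/v2.3.0", "v1.0.0"]
print(latest_versions(tags))
```

The tags still live in one shared namespace, so this organizes the problem rather than removing it: every clone still sees every component's tags.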

pbalau about 3 years ago

For multi-repo I will need to build automation to manage all the repos and enforce a consistent experience across them, including syncing the repos if we end up using stuff like submodules. And I need to do this now. We tried to "trust" every repo owner to do the right thing, but it was a clusterfuck.

With a monorepo, I had to set things up once and go on my merry way. And I will be able to kick the monorepo-is-too-slow can down the road for a few years.

trasz about 3 years ago

The monorepo is one of the features I really like in FreeBSD. It makes adding functionality that goes across layers (e.g. adding a syscall implementation, its manual page, its libc stub, and making use of it in some userspace component) trivial, compared to the hurdles necessary in the Linux world, where you'd need to interact with kernel folks, libc folks, and some random userspace project folks, and then wait until it goes into distributions.

wjmao88 about 3 years ago

It's Conway's Law: your code organization is a reflection of your engineering team's organization.

The number of repos you have should be roughly equal to the number of autonomous engineering "groups" you can divide into that work largely independently of other groups. Anything a group touches should probably be in the same repo as everything else that the same group touches.

Maksadbek about 3 years ago

We use git with monorepos. The codebase is so large that the git status command takes about 3-6 seconds. Do you also use git with monorepos?

switch33 about 3 years ago

Large repos make sense or don't make sense based on companies that work with large data or not based on predicate calculus and derivatives usually dealing with repos as well as stories and have more problems with SSDs too.

There are lots of problems associated with SSDs as well as large monorepos. They are more complicated than people realize but if you did Google Code Jam it teaches them somewhat but needs to be explained too. The problem is stories sort of intersect with programming too. Clockwork with SSDs needs to be reworked for Google Code Jams. The problem is Elixir sort of works with stories and programming. Predicate calculus and proof theories sort of are the only way programming will really make sense in a world full of SSDs. LevelDB could be a more interesting problem for Google Code Jams if it has some newer features too. Conflict resolution is Tower of Hanoi and that has problems with consensus algorithms and concat too.

SSDs need to do derivatives for piecing and parting software too and that is more interesting too.

liminal about 3 years ago

Any suggestions for how to go from multiple Git repos to a monorepo? Preserving history would be really nice. I've looked at submodules and subtrees and both seem to have huge downsides and don't deliver the same benefits of a true monorepo.
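
One approach sometimes used for this, sketched here under stated assumptions rather than as a recommendation, is to rewrite each source repository so its files live under a subdirectory and then merge its history into the monorepo with `git merge --allow-unrelated-histories`. The sketch assumes the third-party `git filter-repo` tool is installed and is run on a fresh clone of the source repo; the repo names and paths are hypothetical. It only prints the plan and does not execute anything.

```python
def import_plan(name, source_clone, branch, subdir, monorepo="."):
    """Return (working_directory, command) pairs for importing one repository."""
    return [
        # Rewrite the source clone so every file lives under `subdir`.
        (source_clone, ["git", "filter-repo", "--to-subdirectory-filter", subdir]),
        # Fetch the rewritten history into the monorepo and merge it.
        (monorepo, ["git", "remote", "add", name, source_clone]),
        (monorepo, ["git", "fetch", name]),
        (monorepo, ["git", "merge", "--allow-unrelated-histories", f"{name}/{branch}"]),
        # The temporary remote is no longer needed after the merge.
        (monorepo, ["git", "remote", "remove", name]),
    ]


# Hypothetical example: import a "billing" repo under services/billing.
for cwd, command in import_plan("billing", "/tmp/billing-clone", "main", "services/billing"):
    print(f"(in {cwd}) {' '.join(command)}")
```

Because the history is rewritten, commit hashes in the imported history will differ from those in the original repository, so links to old commits stop resolving.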

MichaelMoser123 about 3 years ago

One problem with multiple repos: you may end up with multiple binary components, like shared libraries, static libraries, etc., where each binary is produced from the sources of a separate repository. Now it may turn out to be a bit tricky to trace a given binary found in a deployment back to its sources. (On the JVM you could partially get by without the sources, as you have good decompilers.)

I have never worked with monorepos, but I guess that this task would be somewhat easier, given that all sources are under a single repository.

dqpb about 3 years ago

Use a monorepo, but organize your code as if it will someday be split into many repos.

88913527 about 3 years ago

In my experience, the developer experience is too much for juniors. Yarn + Lerna is just too much of a learning curve. Having one repo and one CI/CD pipeline is convenient, but we've decided to divest from monorepos. Your situation may not match mine, and that's okay.

exfascist about 3 years ago

I'd argue that the optimal configuration is really a compromise: use submodules with a DVCS tool like git. You get the organizational benefits of monorepos with the isolation benefits of individual repos. Your branches go stale in weeks rather than days, cloning even with full history can be very fast, and you don't need to learn new tools when you change organizations.