Monorepos and the Fallacy of Scale

104 points by loevborg over 6 years ago

21 comments

ken over 6 years ago

Whenever I hear smart and reasonable people argue well for both sides of an engineering issue, my experience is that it will turn out that we're arguing the wrong question. The perspective is wrong. We can't get past thinking in terms of our old terminology.

What we all really want is a VCS where repos can be combined and separated easily, or where one repo can gain the benefits of a monorepo without the drawbacks of one.

Another crazy tech prediction from me: just as DVCS killed off pre-DVCS practically overnight, the thing that will quickly kill off DVCS is a new type of VCS where you can trivially combine/separate repos and sections of repos as needed. You can assign, at the repo level, sub-repos to include in this one, get an atomic commit hash for the state of the whole thing, and where my VCS client doesn't need to actually download every linked repo, but where tools are available to act like I have.

(This will also enable it to replace the 10 different syntaxes I've had to learn for one project to reference another, via some Dependencies list and its generated Dependencies.lock list.)

In a sense, we already have all of these features, in folders. You can combine and separate them, you can make a local folder mimic a folder on a remote system, and access its content without needing to download it all ahead of time. They just don't have any VCS features baked in. We've got {filesystems, network filesystems, and VCS}, and each of the three has some features the others would like!

I don't have much money right now but I'd pay $1000 for a good solution to this. I'd use it for my home directory, my backups, my media server, etc.
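
One way to picture the "atomic commit hash for the state of the whole thing" is a single digest derived from the head commit of each linked sub-repo. The following is only a sketch of that idea, not how any existing VCS does it; the function name `composite_hash`, the sub-repo paths, and the commit ids are all hypothetical.

```python
import hashlib


def composite_hash(sub_repo_heads: dict[str, str]) -> str:
    """Derive one hash from the head commit of every linked sub-repo."""
    digest = hashlib.sha256()
    # Sort by path so the result does not depend on dictionary order.
    for path in sorted(sub_repo_heads):
        digest.update(f"{path}\0{sub_repo_heads[path]}\n".encode("utf-8"))
    return digest.hexdigest()


# Hypothetical sub-repo paths and commit ids, for illustration only.
before = composite_hash({"libs/auth": "a1b2c3", "services/api": "d4e5f6"})
after = composite_hash({"libs/auth": "a1b2c3", "services/api": "0f9e8d"})
```

Because only the head ids are hashed, a client could compute or verify the combined state without downloading the contents of every linked repo, which is the property the comment asks for.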

jupp0r over 6 years ago

While certainly interesting, the article leaves out the juicy bits monorepos and their workflows offer:

- library developers can easily see the impact their changes will have on their consumers
- library changes that introduce regressions in their consumers can be caught pre-merge given good test coverage
- dependency version updates between packages cause less mayhem because they are performed atomically and only merged when green

At the same time, many drawbacks are also left out:

- the incentive to have long living branches for stability reasons can negate most benefits mentioned above
- build times for compiled languages can become problematic even for moderately sized organizations (I’m looking at you, C++)
- in my experience, you pretty much need a dedicated dev team working on workflow tooling because ready solutions are fragmented and hard to integrate (code review, merge bots, CI/CD, ...)
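
The first two benefits depend on knowing which consumers a library change can reach. As a rough sketch, assuming the dependency graph is available as a plain mapping (the package names and the `affected_consumers` helper below are hypothetical), the set of packages to re-test before merging could be computed like this:

```python
from collections import defaultdict, deque


def affected_consumers(deps: dict[str, list[str]], changed: str) -> set[str]:
    """Return every package that depends, directly or transitively, on `changed`."""
    # Invert "package -> its dependencies" into "package -> its consumers".
    consumers = defaultdict(set)
    for package, requirements in deps.items():
        for requirement in requirements:
            consumers[requirement].add(package)

    # Breadth-first walk over the consumer edges, starting at the changed package.
    affected, queue = set(), deque([changed])
    while queue:
        for consumer in consumers[queue.popleft()]:
            if consumer not in affected:
                affected.add(consumer)
                queue.append(consumer)
    return affected


# Hypothetical packages, for illustration only.
deps = {"web": ["api-client"], "api-client": ["core"], "billing": ["core"], "docs": []}
to_test = affected_consumers(deps, "core")
```

Here a change to `core` reaches `api-client`, `billing`, and, through `api-client`, `web`. Build tools aimed at monorepos offer their own queries for this; the sketch only shows the underlying graph walk.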

Joe8Bit over 6 years ago

The benefits of monorepos (in my experience) are all people/organisation based, e.g. it can be easier to enforce standards/processes among 100s-1000s of engineers with a monorepo, or it can be easier to manage/release a very large interdependent codebase/eco-system being worked on/coordinated between dozens of teams.

However, this linked post makes a great point: those benefits are all 'scale' problems which 99.99% of orgs don't have. The corollary is I've seen how hard it is to go from multi-repo -> monorepo when you reach the scale where you would see some benefit.

I also think that the tooling/UX doesn't publicly exist to solve the multi-repo problem with 100s-1000s of engineers working on 100s of repos. It becomes so hard to navigate, understand and grok, and so much is buried in dark corners. My experience is that that tooling is less hard to build around monorepos (Google for example).

justinwp over 6 years ago

We are a small 20ish dev company using a monorepo with mostly Python. Our tooling is Bazel and Drone.

- Ease of onboarding. Being able to quickly build or test any target is awesome for the new employee.
- Ease of collaboration. I can see all of the code easily and can learn from these patterns. I can also quickly contribute or extend APIs and fix all usages without concern for breaking changes.

Our use of Bazel quickly gets us around git scale issues by enabling external dependencies that can be loaded into the workspace without fully vendoring everything.

azhenley over 6 years ago

I’ve never understood the debate between mono and multi repos. With the right tooling, the line seems to vanish and you just have folders anyway.

Each repo may have its own policies and permissions, which is the biggest reason I see to keep them separate, but again the distinction still seems little more than a folder.

Am I missing something?

grey-area over 6 years ago

We're pretty small scale (< 20 services, < 10 devs), and happily use a monorepo (recently moved from multiple repos when that became unwieldy as services grew). If you have a lot of services/projects with some shared dependencies they can make tracking that easier. I agree with the article that in general they make life easier.

It depends on what tooling you're using, and whether it is tied to the version control system. Clearly if the tooling makes assumptions about one deployable per repo and works on git hooks that's going to cause pain, but the answer is don't use monorepos if your tooling doesn't support it, or change the tooling so it does.

Most companies won't scale past a few hundred employees, so they're never going to hit any sort of scale issues with monorepos, and if they do, they'll have the resources to deal with it.

Does this have to be a religious war? Does one size fits all really apply here?

nwhatt over 6 years ago

The right way to write this kind of content is like Digital Ocean did: https://blog.digitalocean.com/cthulhu-organizing-go-code-in-a-scalable-repo/

Rather than this back and forth about the theoretical implications of a monorepo, actual stories of implementing one are 10x more useful to me.

monksy over 6 years ago

> Developers are not arguing children that need to be confined to separate rooms to prevent fights

Has the author seen the fights that go on? We're extremely opinionated.

asdfasdfasdfa over 6 years ago

The original "Monorepos please don't" article really just convinced me how great monorepos are when you aren't at scale. So you know, put your shit in a monorepo, and then when it gets painful, break it out.

jph over 6 years ago

Monorepo vs polyrepo summary notes from previous HN discussions: https://github.com/joelparkerhenderson/monorepo_vs_polyrepo

I'm adding notes from this HN discussion today. Feedback welcome.

nijave over 6 years ago

Merging/integrating code & styles is difficult and error prone. At the end of the day, if two systems interact they will need to be "merged" at some point. It seems to make more sense to handle this in tests/at a source code level than to risk doing it in the runtime environment alone.

I think tooling and granular permissions (still part of tooling) can be blockers, though. A monorepo makes less sense outside an enterprise/company perspective, such as when developing a discrete component that gets pushed to a public repo (Maven, PyPI, npm, etc.)

EGreg over 6 years ago

What are the actually serious downsides of having a repo for each project again? Serious question. Mercurial supports subrepositories, for example. Just define your rules for pulling stuff.

From my own experience, if you are arguing about whether to use convention A or convention B, the answer should be to have C that allows both, and then configurations on top of C for A and B.

This applies for example to lookups in the database by an index.

austincheney over 6 years ago

> Does the practice of keeping all code together in one place lead to better code sharing? In my experience that's clearly the case.

This is where abstraction comes in. When done correctly abstractions are necessary so that you can separate your work from things you don't want to work on. In my application I want to be able to access and modify files on the local filesystem. I don't care about the differences between opening files in Windows versus Linux or the intricacies of how filesystems work at the bit level. My application evaluates some code and writes some output to a file. I use Node.js to solve for a universal file management API. This is an example of a good abstraction because the separation is clear and explicit.

The simple rule for abstractions is if you can do the very same job in a lower level you don't need the higher level code. In the Node.js example you cannot access the filesystem in a lower level, because no such standard library exists for JavaScript.

Bad abstractions don't provide separation. Many times developers want to use an abstraction to solve for complexity, but inadvertently do the very same things the abstraction is supposedly solving for, just in a different style or syntax. Many JavaScript developers use abstractions to access the DOM or XHR. XHR is simple: assign a handler to the onreadystatechange property, open the connection, and then send the request. You lose huge amounts of performance by abstracting these and dramatically increase your code base. The separation between the API, the framework performing the abstraction, and the code you are writing is superficial and self-imposed.

By using and enforcing good abstractions while avoiding bad abstractions you keep your application far more lean and restrict the focus of your development team to the goals of the project. Without that your code isn't a monorepo, it's a dependent library of another repo.

oblio over 6 years ago

I just have this to say: the discussions here are painfully oriented around SaaS. Once you're doing stuff on-premise or making desktop applications (things requiring long lived release branches), the discussion is totally different.

eddieh over 6 years ago

I think I can summarize my thinking on this pretty succinctly: I want to build a product, not tooling for software development, and I certainly don't want to spend any time trying to keep different repos synchronized, etc.

vorpalhex over 6 years ago

I'm part of a company that went from a boring VCS strategy to jumping on the monorepo bandwagon against my advice to keep our git usage simple. It's been fairly terrible - merge conflicts, code going to the wrong environment, nobody can actually do a hot patch, and even long running feature branches which should be stupidly simple run into immense problems.

It also caused issues with our npm repo solution, and has created the worst case of dependency lock we've ever had.

Do yourself a favor and say no to monorepos. It is massive complexity for no benefit.

DannyBee over 6 years ago

1. It's really hard to tell if any of the people writing blog posts about these things have ever experienced the larger scale monorepos or not for any length of time.

As best I can tell, the answer is "no", and they are mostly writing based on perception. They don't appear to even do things like "try to talk to people who have experienced the good and bad of it".

While the writing is fun, it makes it a lot less useful in both directions, IMHO.

2. The author is right that planning more than 6 months for smaller scale companies makes no sense. However, both of these authors seem to fundamentally miss the actual problem in large companies, which they assume is around engineering and scaling large systems. In fact, it is not. The underlying issue is that engineering a thing is no longer your main cost. This is one of many reasons larger teams/companies are fundamentally different (as this author does correctly point out).

There are 2080 work hours in a year.

If I have 8000 developers, and I have to spend an hour teaching them a new thing, I just spent ~4 people for a year.

If I spend a day teaching them something new, I just spent ~31 people for a year.

If I spend a work week teaching them something new, I just spent ~154 people for a year.

That's just the basic learning costs, it doesn't include migration costs for code base or anything else[1].

But these costs certainly dominate the cost to engineer a solution as you get larger - the systems being talked about here (which have scaled engineering wise) are not 50 people a year (I work next to them :P). Not even close.

In some sense, talking about the engineering challenges makes no sense - they basically don't matter to the overall cost at large scale.

These same things apply to most of the broader (in the sense of who it touches) pieces of developer infrastructure like programming languages, etc.

As you can also imagine, you can't stand still, and so you will pay costs here, and need to be able to amortize these costs over the longer term. In turn, this means you have to plan much longer term, because you want to pay these costs over a 5-10 year scale, not a 1 year scale.

[1] It also excludes the net benefits, but you still pay the costs in actual time even if you get the benefits in actual time as well :)

Also, productivity benefits from new developer infrastructure are wildly overestimated in practice. Studies I've seen basically show people perceive a lot of benefit that either doesn't pan out or doesn't translate into real time saved. So at best you may get happiness, which while great, doesn't pay down your cost here ;)
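
The person-year figures in this comment can be reproduced with a few lines of arithmetic. The sketch below assumes an 8-hour day and a 40-hour week, which the comment does not state but which match its 2080-hour year (52 weeks of 40 hours); the names are illustrative.

```python
WORK_HOURS_PER_YEAR = 2080  # 52 weeks * 40 hours, the figure used in the comment
DEVELOPERS = 8000


def person_years(hours_per_developer: float) -> float:
    """Total developer time spent, expressed in full-time person-years."""
    return DEVELOPERS * hours_per_developer / WORK_HOURS_PER_YEAR


# Assumed durations: 8-hour day, 40-hour week.
for label, hours in [("one hour", 1), ("one 8-hour day", 8), ("one 40-hour week", 40)]:
    print(f"{label}: ~{person_years(hours):.1f} person-years")
```

Rounded to whole people, the three cases come out at about 4, 31, and 154, matching the numbers quoted above.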

SideburnsOfDoom over 6 years ago

This can't really be discussed without also clarifying how your target language and tools ecosystem does package management, i.e. are you expecting to generate and version internal packages and then consume them from a feed?

It seems like, if you don't have this facility, then the monorepo becomes more compelling.

Not to mention, are you building 1 or 2 apps, or a whole host of microservices?

Without knowing that, your experience of mono vs. multi-repos won't be much use.

pm90 over 6 years ago

I have a simple question: are monorepos possible in git? What is the upper limit of contributions per day for git to be effective in a monorepo?

lmm over 6 years ago

> Looking at it from the other side, could introducing strict borders somehow make it easier to reuse logic? I think it's clear that borders can only take away from your ability to perceive opportunities to use abstractions or to unify code.

This is not at all clear. I'd argue that a visible organization of your codebase into repositories makes it easier to reuse code in the same way that interface/implementation splits do: it makes it clearer which parts felt domain-specific and which felt like reusable libraries.

> The bottom line is that you should pick the right abstraction and the right place for a function or class based on the individual merits of the case - and not driven by facts about repos created a long time ago.

This seems to be assuming that repository boundaries are defined in the beginning and fixed for all time - the same mistake I see opponents of static typing making. Your repository structure reflects your logic and business structure; as those change you change your code structure to match.

> True, touching multiple subprojects in a single commit is not always desirable. For example, updating backend and frontend components incrementally in backward-compatible ways can be the better approach. But even so, it's useful to retain the option of cross-boundary commits for many reasons including simplicity and enforced coordination.

This needs to be justified. In much of programming we consider the benefits of strict isolation to outweigh the costs - e.g. private fields in OO languages, true parametric polymorphism, microservices. You can't just assert that having the option of bypassing the good practice is worthwhile.

> If you think about it, splitting a codebase into sub-repos is a ham-fisted way to enforce ownership boundaries. Developers are not arguing children that need to be confined to separate rooms to prevent fights. With sufficient communication and good practices, a monorepo will allow you to avoid the question “which repo does this piece of code belong to?” Instead of thinking about repo boundaries - effectively a distraction - a monorepo allows you to focus on the important question: where should we draw the boundaries between modules to keep the code maintainable, understandable and malleable in the light of changing requirements?

Communication and good practice are the most costly way to enforce important things; you could equally well argue that e.g. unit tests are a ham-fisted way to enforce non-breaking of code and developers are not arguing children that need to be reminded not to break each other's functionality.

Repo boundaries are higher-level than directory boundaries. No-one is arguing for having each directory in its own repo, but being able to represent "not directly involved, but versioned together" and "separate enough to be versioned separately" is a very valuable distinction to have in your toolbox.

> Many of us, especially in the world of startups, work in smaller teams - let's say less than 100 developers.

Do you find it practical to communicate and co-ordinate with 100 other developers before making any changes? Because that's the only case where a single repo makes sense - when you are working closely enough with every other developer sharing the repository that you don't need to go to any extra effort to organize who is changing what.

Once you're not attending the same standup, you shouldn't be working on the same repository. You need to have a release cycle with SemVer etc. so that people who aren't in close communication with you can understand the impact of changes to your code area. Since tags are repository-global, the repository should be the unit of versioning/releasing.
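
To make the SemVer point concrete: under semantic versioning, a consumer can tell from two version numbers alone whether a release claims to be a breaking change, a backward-compatible feature, or a fix. A minimal sketch, assuming plain `MAJOR.MINOR.PATCH` strings without pre-release or build suffixes (the helper names are hypothetical):

```python
def parse(version: str) -> tuple[int, int, int]:
    """Split a plain MAJOR.MINOR.PATCH string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_kind(old: str, new: str) -> str:
    """Name the most significant version component that differs."""
    old_parts, new_parts = parse(old), parse(new)
    for name, before, after in zip(("major", "minor", "patch"), old_parts, new_parts):
        if before != after:
            return name
    return "none"


# A major bump signals incompatible changes; minor and patch bumps
# are meant to be backward compatible.
kinds = [bump_kind("1.4.2", new) for new in ("2.0.0", "1.5.0", "1.4.3")]
```

This only reads the signal the publisher chose to send; it does not verify that a release labelled as a patch is actually compatible.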

chrismatheson over 6 years ago

Glad you wrote this and saved me the effort, pretty much what I was thinking of jotting down