Wrangling 2000 Git Repos at Reddit

55 points by jdorfman about 1 year ago

14 comments

conjecTech about 1 year ago
I worked at Reddit in the not so distant past. The entire recommendation system lived in 3 repos. I'm pretty sure there are just 2000 repos because the onboarding tutorials have you create one, and that number is probably around the number of engineers that have worked there. I'd guess 100-200 have some production component.
airstrike about 1 year ago
You know, big sweeping refactors deservedly get a bad rep, but as with everything else in life, there are always exceptions.

At some point, I don't know, maybe when you cross the 100 repos mark, you've gotta ask yourself "maybe we could try a different approach?"

It's not like reddit has been known for its wonderful stability over the years.

I'm sure the scale here is completely unlike anything I've ever worked on, but how hard can it be to write a sane implementation of a message board?

I'd be curious how much of this problem is caused by the junk that is "new reddit". I've been there since 2007... The day old.reddit.com goes away is the day I abandon it for good.
mebazaa about 1 year ago
Yes, the Reddit dev team might have spawned a 2000+ repo mess, but they also host it under the snooguts.net domain name, which is objectively adorable, so all is forgiven.
tayo42 about 1 year ago
I worked with a monorepo and multiple teams dedicated to the dev experience. I had my complaints, but I was spoiled in hindsight.

I know they did a lot with git to make it manageable. Hopefully whatever they did makes it to the open source world eventually so we can all avoid these crazy thousand-repo worlds.
heads about 1 year ago
We used to have an ecosystem like this. In our case it reflected an entrenched set of divisions between warring teams. In some ways it may have then enhanced those positions and we still bear a few of the scars today.

A lot of the old guard have left the company though and our main product moved from four repos to just one. The threat from the legal team to have enforced OWNERS files (essentially replicating the divisive politics of the old repos but in the monorepo) thankfully withered on the vine. We still audit what goes into each release but it’s no longer part of any active permissions thing. We trust our developers but verify, for legal reasons, that nothing went wrong.

You either want one engineering team to act in unison behind your company’s mission, or you want to live a divisive narrative that you are actually multiple teams “working” together with none of the advantages of living under one roof and all the disadvantages of hard repository boundaries crisscrossing your intellectual property.

So many factors threaten to curdle your team dynamic: multiple offices, multiple floors, work from home hermits, bad management, etc. It’s simply org entropy and it takes much effort to keep the weeds out of the garden. Multiple repositories is one less bullet you can keep out of your feet while fighting all the other battles that threaten to turn your team from 1990s Sun Microsystems into 2010 Sun Microsystems.
IshKebab about 1 year ago
That's crazy. Monorepo definitely makes more sense.

Though I always wonder - how do Google, Microsoft, Facebook etc. deal with developing code near the root of their dependency tree? Utility libraries for example. Technically you're going to have every change you make there building *all* the code and running *all* the tests, which is obviously unworkable. What do they do?
miduil about 1 year ago
I can't believe how it is working in such a big structure with just GitHub alone. GitLab with groups/subgroups and also integrated Sourcegraph seems much more practical at this scale.
nolist_policy about 1 year ago
I don't get the hating here. With the right tooling it doesn't matter if it's 10, 100 or 2000 repos. And it buys you some nice things like per-repo permission settings.
ivanjermakov about 1 year ago
Out of those 2k repos, how many of them are actually used in production?
ydnaclementine about 1 year ago
Sounds made up to explain why the R&D costs in their IPO docs were 450 million or whatever
sethammons about 1 year ago
To those who are swinging towards monorepos, I don't think that is a good solution. The reason is that developers simply cannot be trusted to "do the right thing" on data and module boundaries. Someone comes in new to the project and does something they don't know they shouldn't. It is the honor system backed by weak linters and tooling.

In our monorepo, everyone passes around Django ORM objects and boundaries are practically non-existent. N+1 queries abound. Tests are full of patching and mocking and are _slow_. Our build takes over an hour to run tests. Someone on team A can and absolutely will mess up what someone on team B is doing. We are now having to spend quarter upon quarter as we define and enforce domain boundaries within the Python code base. It is all bolted-on checks. Tests are getting worse and people are actively trying to figure out ways around the testing system because it sucks.

Compare to my last gig. We had several hundred production repos. Each repo starts from a template with its own build pipeline. All production repos are gated so that any PR must pass tests before it can merge. Any merge has to pass tests before it could be deployed. As the base build processes matured, teams could, at their leisure, pull their services up to the latest and greatest. We even migrated from Jenkins to Buildkite; yeah, it took N pulls into N repos. Not a big deal. Most projects' tests and builds could get code out to production in under 10 minutes, including all those checks. Due to the network boundary, you couldn't accidentally get around someone's abstraction. And if one team blew up their build doing something dumb? No problem, it only affects that one team.

The argument is "gah, managing all those services!" Keep data behind APIs. Keep APIs backwards compatible. Keep dependencies acyclic. This is _possible_ with monorepos, but you have to do extra work compared to networked services -- yes, when any particular team/service can deploy in minutes due to low build system complexity you are winning. Can you get that wrong and make strange cyclic dependencies and introduce performance issues due to network hops? Yeah, of course. However, we were processing, literally, 10s of billions of API requests on this system and teams could work untethered from one another. The new gig does eerily similar software, but is several orders of magnitude slower in their ability to process data and their ability to move new features.

Yes, yes, you could have networked services and a monorepo and you can leverage tooling like Pants to minimize the testing to only account for changed files. It is just fighting what I have found to be a better model. Keep things separate. Keep things fast to change.
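
For readers who want the two mechanical ideas above ("keep dependencies acyclic" and "only test what changed") made concrete, here is a rough sketch. It is not how Pants or any real build tool is implemented; the `deps` mapping, the target names, and both function names are hypothetical and only illustrate the graph reasoning.

```python
from collections import defaultdict, deque

# Hypothetical dependency map: target -> set of targets it depends on.
deps = {
    "utils": set(),
    "billing": {"utils"},
    "search": {"utils"},
    "web": {"billing", "search"},
}


def affected_targets(deps, changed):
    """Return the changed targets plus everything that transitively depends on them."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents = defaultdict(set)
    for target, requirements in deps.items():
        for requirement in requirements:
            dependents[requirement].add(target)

    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


def has_cycle(deps):
    """Depth-first search; reaching a node that is still in progress means a cycle."""
    NOT_VISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = defaultdict(int)  # every node starts as NOT_VISITED

    def visit(node):
        state[node] = IN_PROGRESS
        for requirement in deps.get(node, ()):
            if state[requirement] == IN_PROGRESS:
                return True
            if state[requirement] == NOT_VISITED and visit(requirement):
                return True
        state[node] = DONE
        return False

    return any(state[node] == NOT_VISITED and visit(node) for node in list(deps))


print(sorted(affected_targets(deps, {"billing"})))  # ['billing', 'web']
print(has_cycle(deps))                              # False
```

A change to `utils` in this toy graph marks all four targets as affected, which is the "root of the dependency tree" cost raised elsewhere in the thread; a change to `billing` only pulls in `web`.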
ZephyrBlu about 1 year ago
2000 repos what the fuck. More repos than engineers sounds terrible. Having worked in a large monolithic repo, I much prefer that. Everything (Shipping, testing, debugging, etc) is much easier that way.
MilStdJunkie about 1 year ago
Holy Jesus Buddha Muhammad on a Harley. *2000 repos* for a *messageboard?!*

I don't think someone knows what "repository" means.

At least they're bringing in Sourcegraph. That tool's helped me make sense of some chaos. Not *2000 repos'* worth of chaos, but still, *some* chaos.
hackmiester about 1 year ago
Non-legacy Reddit link: https://reddit.com/r/RedditEng/comments/1bdtrjq/wrangling_2000_git_repos_at_reddit/