> The Google codebase includes approximately one billion files and has a history of approximately 35 million commits spanning Google’s entire 18-year existence.<p>Wait, that's an average of nearly 30 new files per commit. Not 30 files changed per commit, but whatever changes are happening to existing files, plus 30 brand new files. For every single commit.<p>Although...<p>> The total number of files also includes source files copied into release branches, files that are deleted at the latest revision, [...]<p>I'm not quite sure what this is saying.<p>Is it saying that if `main` contains 1,000 files, and then someone creates a branch called `release`, then the repo now contains 2,000 files? And if someone then deletes 500 files from `main` in the next commit, the repo still contains 2,000 files, not 1,500?<p>If that's the case, why not just call every different version of every file in the repo a different file? If I have a new repo and in the first commit I create a single 100-line file called `foo.c`, and then I change one line of `foo.c` for the second commit, do I now have a repo with two files?<p>I mean, if you look at the plumbing for e.g. `git`, yes, the repo is storing two file objects for the repo history. But I don't think I've ever seen someone discuss the Linux git repo and talk about the total number of file objects in the repo object store. And when the linked paper itself mentions Linux, it says "The Linux kernel is a prominent example of a large open source software repository
containing approximately 15 million lines of code in 40,000 files" - and in that case it's definitely not talking about the total number of file objects in the store.<p>I don't think it's entirely clear what the paper even means when it talk about "a file" in a source code repository, or if it even means the same thing consistently. I'm not sure it's using the most obvious interpretation, but I can't understand why it would pick a non-obvious interpretation. Especially if it's not going to explain what it means, let alone explain <i>why</i> it chose one meaning over another.