There is one more piece to the puzzle that would make git perfect for every use case I can think of: store large files as a list of blobs, broken down by a rolling hash a la rsync/borg/bup.

That would, for example, make it reasonable to check virtual machine images or ISO images into a repository. Extra storage (and by extension, network bandwidth) would be proportional to the size of the change.

git has delta compression as an optimization for text, but it's not used on big binary files and isn't even done online (only when making a pack). This would provide it online for large files.

Junio posted a patch that did that ages ago, but it was pushed back until after the sha1->sha256 transition.
Has anyone used Git submodules to isolate large binary assets into their own repos? Seems like the obvious solution to me. You already get fine-grained control over which submodules you initialize. And, unlike Git LFS, it might be something you’re already using for other reasons.
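A minimal sketch of that setup, in case it's not obvious (repo URLs and paths are made up):

    # in the main repo: track the asset repo as a submodule
    git submodule add https://example.com/big-assets.git assets
    git commit -m "Add assets submodule"

    # consumers who don't need the assets just clone normally
    git clone https://example.com/main.git
    # and only pull the big stuff when they actually want it
    git submodule update --init assets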
Also known as workspace views in P4.

It's interesting to see the wheel reinvented. We used to run a 500 GB art sync / 200 GB code sync with a ~2 TB back-end repo back when I was in gamedev. P4 also has proper locking; it really is the right tool if you've got large assets that need to be coordinated and versioned.

Only downside, of course, is that it isn't free.
This is interesting and could be a savior for Machine Learning (ML) engineering teams. In a typical ML workflow, there are three main entities to be managed:
1. Code
2. Data
3. Models
Systems like Data Version Control (DVC) [1] are useful for versioning 2 & 3. DVC improves on usability by residing inside the project's main git repo while maintaining versions of the data/models on a remote. With Git partial clone, it seems like the gap between 1 and 2/3 could be narrowed even further.

[1] - <a href="https://dvc.org/" rel="nofollow">https://dvc.org/</a>
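For context, the basic DVC flow looks roughly like this (a sketch; the remote name, bucket, and data path are made up):

    dvc init                         # set up DVC inside the existing git repo
    dvc add data/train.csv           # track the data file; writes data/train.csv.dvc
    git add data/train.csv.dvc data/.gitignore
    git commit -m "Track training data with DVC"
    dvc remote add -d storage s3://my-bucket/dvc-store
    dvc push                         # upload the actual data to the remote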
Also, --reference (or --shared) is a good option to speed up cloning (for builds, for example) if you already have the repository cached in some other place.
I used it a long time ago when I was working on a system that required cloning 20-40 repos to build. This approach cut clone times by an order of magnitude.
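For reference, that looks something like this (paths and URLs are illustrative):

    # keep a local cache of the repository somewhere on disk
    git clone --mirror https://example.com/big-repo.git /cache/big-repo.git

    # later clones borrow objects from the cache instead of the network
    git clone --reference /cache/big-repo.git https://example.com/big-repo.git work

    # add --dissociate if the resulting clone should not depend on the cache afterwards
    git clone --reference /cache/big-repo.git --dissociate https://example.com/big-repo.git work2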
That seems quite useful, though Git LFS mostly does the job.

One of my biggest remaining pain points is resumable clone/fetch. I find it nearly impossible to clone large repos (or fetch, if there have been lots of new commits) over a slow, unstable link, so I almost always end up cloning a copy to a machine closer to the repo and rsyncing it over to my machine.
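That workaround, roughly (hostname and paths are made up):

    # on a machine with a fast link to the server
    ssh buildbox 'git clone --bare https://example.com/huge-repo.git huge-repo.git'

    # then pull it over the slow link; rsync can resume a partial transfer
    rsync -avz --partial --progress buildbox:huge-repo.git ./
    git clone ./huge-repo.git huge-repo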
This is great. We use git LFS extensively, and one of the biggest complaints we have is that users have to clone 7 GB of data just to get the source files. There's a workaround where you don't enter your username and password for the LFS remote and just let it time out, but that's a kludge.
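For what it's worth, git-lfs can also skip downloading LFS content at clone time, which gets a similar effect without the credential trick (assuming a reasonably recent git-lfs; the URL and path pattern are made up):

    # clone only the pointer files, not the LFS objects
    GIT_LFS_SKIP_SMUDGE=1 git clone https://example.com/repo-with-lfs.git

    # fetch specific LFS objects later, only where needed
    cd repo-with-lfs
    git lfs pull --include="textures/**"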
In the AAA games industry, git has been a bit slower on the uptake (although that's changing quickly), as large warehouses of data are often required (e.g. version history of video files, 3D audio, music, etc.). It's nice to see git gain more options for this sort of thing.
This could actually be a really good solution to the maximum supported size of a Go module. If you place a go.mod in the root of your repo, then every file in the repo becomes part of the module. There's also a hardcoded maximum size for a module: 500M. Problem is, I've got 1G+ of vendored assets in one of my repos. I had to trick Go into thinking that the vendored assets were a different Go module [0]. Go would have to add support for this, but it would be a pretty elegant solution to the problem.

[0]: <a href="https://github.com/golang/go/issues/37724" rel="nofollow">https://github.com/golang/go/issues/37724</a>
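If I understand the trick correctly, it boils down to giving the asset directory its own module, since a nested go.mod excludes that subtree from the parent module (a sketch; the directory and module path are made up):

    cd vendored-assets/
    go mod init example.com/myrepo/vendored-assets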
I started a project recently and, for the first time ever, I've wanted to keep large files in my repo. I looked into Git LFS and was disappointed to learn that it requires either third-party hosting or setting up a Git LFS server myself. I looked into git-annex and it seems decent. This, once it is ready for prime time, will hopefully be even better.
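For reference, the partial-clone flow looks roughly like this (assuming a server that supports object filters; the URL is made up):

    # skip all blobs at clone time; they are fetched lazily when checked out
    git clone --filter=blob:none https://example.com/big-repo.git

    # or only skip blobs above a certain size
    git clone --filter=blob:limit=1m https://example.com/big-repo.git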
> One reason projects with large binary files don't use Git is because, when a Git repository is cloned, Git will download every version of every file in the repository.

Wrong? There's a --depth option for the git fetch command which allows the user to specify how many commits they want to fetch from the repository.
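The shallow-clone options mentioned here, for anyone unfamiliar (the URL is made up):

    # only fetch the most recent commit and the files it references
    git clone --depth 1 https://example.com/repo.git

    # deepen later, or fetch the full history when needed
    git fetch --deepen=50
    git fetch --unshallow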