I'm sure GitHub did their due diligence before starting to work on this, but I can't lie: it bums me out a bit that they didn't find git-bigstore [1] (a project I wrote about 2 years ago) before they started, since it works in almost the exact same way. Three-line pointer files, smudge and clean filters, use of .gitattributes for which files to sync, and remote service integration.<p>Compare "Git Large File Storage"'s file spec:<p><pre><code> version https://git-lfs.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 12345
</code></pre>
And bigstore's:<p><pre><code> bigstore
sha256
96e31e44688cee1b0a56922aff173f7fd900440f
</code></pre>
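The wiring behind both is the same: a .gitattributes pattern decides which paths get the treatment, and a clean/smudge filter pair registered in git config swaps real contents for pointers and back. Roughly — the filter name and commands below are illustrative, not the exact ones either tool installs:<p><pre><code> # route matching paths through the filter
echo '*.psd filter=bigfiles' >> .gitattributes
# clean: real contents in, pointer out (runs on add/commit)
# smudge: pointer in, real contents out (runs on checkout)
git config filter.bigfiles.clean  'bigfiles-clean %f'
git config filter.bigfiles.smudge 'bigfiles-smudge %f'
</code></pre>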
Bigstore has the added benefit of keeping track of file upload / download history _entirely in Git_, using Git notes (an otherwise not-so-useful feature). Additionally, Bigstore is _not_ tied to any specific service: there are built-in hooks for Amazon S3, Google Cloud Storage, and Rackspace.<p>Congrats to GitHub, but this leaves a sour taste in my mouth. FWIW, contributions are still welcome! And I hope there is still a future for bigstore.<p>[1]: <a href="https://github.com/lionheart/git-bigstore" rel="nofollow">https://github.com/lionheart/git-bigstore</a>
It's interesting that this uses smudge/clean filters. When I considered using those for git-annex, I noticed that the smudge and clean filters both had to consume the entire content of the file from stdin, which means that, e.g., git status will need to feed all the large files in your work tree through git-lfs's clean filter.<p>I'm interested to see how this scales. My feeling when I looked at it was that it was not sufficiently scalable without improving the smudge/clean filter interface. I mentioned this to the git devs at the time and even tried to develop a patch, but AFAICS, nothing yet.<p>Details: <a href="https://git-annex.branchable.com/todo/smudge" rel="nofollow">https://git-annex.branchable.com/todo/smudge</a>
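To make the concern concrete: a filter is just a program on a stdin-to-stdout pipe, so whenever it runs, git streams the full contents of every matching file through it. A do-nothing filter (purely illustrative) shows the shape of the interface:<p><pre><code> # passthrough filter: git still pipes every byte of each tracked file through it
git config filter.passthru.clean  cat
git config filter.passthru.smudge cat
echo '*.iso filter=passthru' >> .gitattributes
</code></pre>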
This looks like it misses the mark a bit.<p>As anyone who's worked on a project with large binary files knows (the docs assume PSDs), you need to be able to lock unmergeable binary assets. Otherwise you get two people touching the same file and someone has to destroy their changes. That never makes anyone happy.<p>It also remains to be seen how good the disk performance is. These two areas are the reason why Perforce is still my go-to solution for large binary files.
So basically it's git-annex, but tied to GitHub. <a href="http://git-annex.branchable.com/" rel="nofollow">http://git-annex.branchable.com/</a>
This looks really interesting. You basically trade the ability to have diffs (nearly meaningless on binary files anyway) for representing large files by their SHA-256 hashes, with the actual content stored on a remote server.<p>What will be interesting is to see whether GitHub's implementation of LFS allows a "bring your own server" option. Right now the answer seems to be no -- the server knows about all the SHAs, and GitHub's server only supports their own storage endpoint. So you couldn't use, say, S3 to host your Git LFS files.
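One nice consequence: the oid in the pointer is just the SHA-256 of the file's contents, so you can verify (or compute) it with standard tools. For a hypothetical design.psd:<p><pre><code> shasum -a 256 design.psd
# prints the hex digest that should match the pointer's "oid sha256:..." line
</code></pre>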
> Every user and organization on GitHub.com with Git LFS enabled will begin with 1 GB of free file storage and a monthly bandwidth quota of 1 GB.<p>Does this mean that with the free tier I can upload a 1GB file which can be downloaded at most <i>once a month</i>?
Even a small 10MB file, which fits comfortably in a git repo, could be downloaded only 100 times a month. Maybe they meant 1TB bandwidth?
The "filter-by-filetype" approach used here is going to work a lot better for mixed-content repositories than git-annex, which doesn't have that capability built-in (to my knowledge).<p>git-annex has been great for my photo collection (which is strictly binary files). It lets me keep a partial checkout of photos on my laptop and desktop, while replicating the backup to multiple hosts around the internet.<p>At work we have a bunch of video themes that are partially XML and INI files and partially JPG and MP4. LFS would work great for us, except we don't use github (we don't have a need for it.) It looks like this is going to be very simple for that kind of workflow.<p>Just yesterday HN user dangero was looking for this exact sort of thing, large file support in git that didn't add too much complexity to the workflow: <a href="https://news.ycombinator.com/item?id=9330125" rel="nofollow">https://news.ycombinator.com/item?id=9330125</a>
This solves a real problem, but I can't help but feel it is a band-aid hack.<p>The main fundamental advantage (vs implementation quirks of git) I can see is that these files are only fetched on a git checkout. But (of course) this breaks offline support, and it requires additional user action.<p>Wouldn't it have been fairly easy to build exactly the same functionality into git itself? "Big" blobs aren't fetched until they are checked out? This also has the advantage that the definition of "big" could depend on your connectivity / disk space / whatever, rather than being set per-repo.
After a quick scan, I'm a bit worried that this is too tied to a server in practice. For example, if I've downloaded everything locally, can I easily clone the whole download (including all lfs files) into a separate repo? If I can, can changes to each be swapped back and forth?
Our solution is likely a lot more duct-tape-y, but we developed a straightforward tool in Go for managing large assets in git: <a href="https://github.com/dailymuse/git-fit" rel="nofollow">https://github.com/dailymuse/git-fit</a><p>There are a number of other open-source solutions out there, some of which are documented in our readme.
Can someone explain to me what problem this solves in layman's terms... How are version control systems "impractical" for large files?<p>Or to put it another way, what problems will I run into if I just commit large media files without using this?
Is there any hint on pricing? Slightly annoying to have a section titled "Pricing" which... doesn't tell you the price. I would much rather use my own external server for hosting large files; it is going to need to be price-competitive with other options to be interesting, I would think.
What's stopping git from storing large files using Merkle trees + a rolling hash?<p>I'm probably missing something, since there's this, and git-annex, and git-bigstore, and others...
I like the ease of use of 'git lfs track "*.psd"' and being able to use normal git commands after that.<p>Would it be possible to extend git-annex with a command that lets you set one or more extensions? By using git hooks you can probably ensure that the normal git commands work reliably.
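As far as I can tell, the track command just records a filter attribute in .gitattributes (the exact attributes may differ by version), which is why the rest of the normal git workflow keeps working afterwards:<p><pre><code> git lfs track "*.psd"
cat .gitattributes
# something like: *.psd filter=lfs diff=lfs merge=lfs -text
</code></pre>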
To celebrate the broader support for git with large files, we just raised the storage limit of GitLab.com to 10GB: <a href="https://about.gitlab.com/2015/04/08/gitlab-dot-com-storage-limit-raised-to-10gb-per-repo/" rel="nofollow">https://about.gitlab.com/2015/04/08/gitlab-dot-com-storage-l...</a><p>Also, we're glad GitHub open-sourced it and didn't call it assman.
Has anyone seen what happens for a user who doesn't have this installed when cloning? I've tried it out but it seems to not affect local clones.
Does GitHub really do this using git's "smudge" and "clean" filters? That would mean reprocessing the whole file for each access. That's inefficient. It's useful only if someone else is paying for the disk bandwidth, and necessary only if you don't have control of the storage system. Why would GitHub do that to itself?
BitKeeper has had a better version of this since around 2007. Better in that we support a cloud of servers, so there is no "close to the server" issue: everyone is close to the server.<p>What we don't have is the locking. I agree with the people commenting here that locking is a requirement, because you can't merge these files. We need to do that.
Wouldn't it be nicer if we had something like this on the level of the filesystem, instead of on the level of a version control system? Advantages would be that git and any other user-space application wouldn't need much extension, and files could be opened as if they were on the local file system.
> Every user and organization on GitHub.com with Git LFS enabled will begin with 1 GB of free file storage and a monthly bandwidth quota of 1 GB.<p>A GB doesn't get you very far if you are working with raw audio and video.<p>Does it make sense to think about storing virtual machine images (.vmdk) in git on GitHub with LFS?