I'm sure GitHub did their due diligence before starting to work on this, but I can't lie: it bums me out a bit that they didn't find git-bigstore [1] (a project I wrote about 2 years ago) before they started, since it works in almost the exact same way. Three-line pointer files, smudge and clean filters, use of .gitattributes for which files to sync, and remote service integration.<p>Compare "Git Large File Storage"'s file spec:<p><pre><code> version https://git-lfs.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 12345
</code></pre>
And bigstore's:<p><pre><code> bigstore
sha256
96e31e44688cee1b0a56922aff173f7fd900440f
</code></pre>
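The wiring behind both is the same: a .gitattributes pattern decides which paths get the treatment, and a clean/smudge filter pair registered in git config swaps real contents for pointers and back. Roughly — the filter name and commands below are illustrative, not the exact ones either tool installs:<p><pre><code> # route matching paths through the filter
echo '*.psd filter=bigfiles' >> .gitattributes
# clean: real contents in, pointer out (runs on add/commit)
# smudge: pointer in, real contents out (runs on checkout)
git config filter.bigfiles.clean  'bigfiles-clean %f'
git config filter.bigfiles.smudge 'bigfiles-smudge %f'
</code></pre>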
Bigstore has the added benefit of keeping track of file upload / download history _entirely in Git_, using Git notes (an otherwise not-so-useful feature). Additionally, Bigstore is _not_ tied to any specific service: there are built-in hooks for Amazon S3, Google Cloud Storage, and Rackspace.<p>Congrats to GitHub, but this leaves a sour taste in my mouth. FWIW, contributions are still welcome! And I hope there is still a future for bigstore.<p>[1]: <a href="https://github.com/lionheart/git-bigstore" rel="nofollow">https://github.com/lionheart/git-bigstore</a>
It's interesting that this uses smudge/clean filters. When I considered using those for git-annex, I noticed that the smudge and clean filters both had to consume the entire content of the file from stdin, which means that, e.g., git status will need to feed all the large files in your work tree through git-lfs's clean filter.<p>I'm interested to see how this scales. My feeling when I looked at it was that it was not sufficiently scalable without improving the smudge/clean filter interface. I mentioned this to the git devs at the time and even tried to develop a patch, but AFAICS, nothing yet.<p>Details: <a href="https://git-annex.branchable.com/todo/smudge" rel="nofollow">https://git-annex.branchable.com/todo/smudge</a>
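To make the concern concrete: a filter is just a program on a stdin-to-stdout pipe, so whenever it runs, git streams the full contents of every matching file through it. A do-nothing filter (purely illustrative) shows the shape of the interface:<p><pre><code> # passthrough filter: git still pipes every byte of each tracked file through it
git config filter.passthru.clean  cat
git config filter.passthru.smudge cat
echo '*.iso filter=passthru' >> .gitattributes
</code></pre>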
This looks like it misses the mark a bit.<p>As anyone who's worked on a project with large binary files knows (the docs assume PSDs), you need to be able to lock unmergeable binary assets. Otherwise you get two people touching the same file and someone has to destroy their changes. That never makes anyone happy.<p>It also remains to be seen how good the disk performance is. These two areas are the reason why Perforce is still my go-to solution for large binary files.
So basically it's git-annex, but tied to GitHub. <a href="http://git-annex.branchable.com/" rel="nofollow">http://git-annex.branchable.com/</a>
This looks really interesting. You basically trade the ability to have diffs (nearly meaningless on binary files anyway) for representing large files by their SHA-256 hashes, with the actual content stored on a remote server.<p>What will be interesting is to see whether GitHub's implementation of LFS allows a "bring your own server" option. Right now the answer seems to be no -- the server knows about all the SHAs, and GitHub's server only supports their own storage endpoint. So you couldn't use, say, S3 to host your Git LFS files.
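One nice consequence: the oid in the pointer is just the SHA-256 of the file's contents, so you can verify (or compute) it with standard tools. For a hypothetical design.psd:<p><pre><code> shasum -a 256 design.psd
# prints the hex digest that should match the pointer's "oid sha256:..." line
</code></pre>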
> Every user and organization on GitHub.com with Git LFS enabled will begin with 1 GB of free file storage and a monthly bandwidth quota of 1 GB.<p>Does this mean that with the free tier I can upload a 1GB file which can be downloaded at most <i>once a month</i>?
Even a small 10MB file, which fits comfortably in a git repo, could be downloaded only 100 times a month. Maybe they meant 1TB bandwidth?
The "filter-by-filetype" approach used here is going to work a lot better for mixed-content repositories than git-annex, which doesn't have that capability built-in (to my knowledge).<p>git-annex has been great for my photo collection (which is strictly binary files). It lets me keep a partial checkout of photos on my laptop and desktop, while replicating the backup to multiple hosts around the internet.<p>At work we have a bunch of video themes that are partially XML and INI files and partially JPG and MP4. LFS would work great for us, except we don't use github (we don't have a need for it.) It looks like this is going to be very simple for that kind of workflow.<p>Just yesterday HN user dangero was looking for this exact sort of thing, large file support in git that didn't add too much complexity to the workflow: <a href="https://news.ycombinator.com/item?id=9330125" rel="nofollow">https://news.ycombinator.com/item?id=9330125</a>
This solves a real problem, but I can't help but feel it is a band-aid hack.<p>The main fundamental advantage (vs implementation quirks of git) I can see is that these files are only fetched on a git checkout. But (of course) this breaks offline support, and it requires additional user action.<p>Wouldn't it have been fairly easy to build exactly the same functionality into git itself? "Big" blobs aren't fetched until they are checked out? This also has the advantage that the definition of "big" could depend on your connectivity / disk space / whatever, rather than being set per-repo.
After a quick scan, I'm a bit worried that this is too tied to a server in practice. For example, if I've downloaded everything locally, can I easily clone the whole download (including all lfs files) into a separate repo? If I can, can changes to each be swapped back and forth?
Our solution is likely a lot more duct-tape-y, but we developed a straightforward tool in Go for managing large assets in git: <a href="https://github.com/dailymuse/git-fit" rel="nofollow">https://github.com/dailymuse/git-fit</a><p>There are a number of other open-source solutions out there, some of which are documented in our readme.
Can someone explain to me what problem this solves in layman's terms... How are version control systems "impractical" for large files?<p>Or to put it another way, what problems will I run into if I just commit large media files without using this?
Is there any hint on pricing? Slightly annoying to have a section titled "Pricing" which... doesn't tell you the price. I would much rather use my own external server for hosting large files; it is going to need to be price-competitive with other options to be interesting, I would think.
What's stopping git from storing large files using Merkle trees + a rolling hash?<p>I'm probably missing something, since there's this, and git-annex, and git-bigstore, and others...
I like the ease of use of 'git lfs track "*.psd"' and being able to use normal git commands after that.<p>Would it be possible to extend git-annex with a command that lets you set one or more extensions? By using git hooks you can probably ensure that the normal git commands work reliably.
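As far as I can tell, the track command just records a filter attribute in .gitattributes (the exact attributes may differ by version), which is why the rest of the normal git workflow keeps working afterwards:<p><pre><code> git lfs track "*.psd"
cat .gitattributes
# something like: *.psd filter=lfs diff=lfs merge=lfs -text
</code></pre>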
To celebrate the broader support for git with large files, we just raised the storage limit of GitLab.com to 10GB: <a href="https://about.gitlab.com/2015/04/08/gitlab-dot-com-storage-limit-raised-to-10gb-per-repo/" rel="nofollow">https://about.gitlab.com/2015/04/08/gitlab-dot-com-storage-l...</a><p>Also, we're glad GitHub open-sourced it and didn't call it assman.
Has anyone seen what happens for a user who doesn't have this installed when cloning? I've tried it out but it seems to not affect local clones.
Does GitHub really do this using git's "smudge" and "clean" filters? That would mean reprocessing the whole file for each access. That's inefficient. It's useful only if someone else is paying for the disk bandwidth, and necessary only if you don't have control of the storage system. Why would GitHub do that to itself?
BitKeeper has had a better version of this since around 2007. Better in that we support a cloud of servers, so there is no "close to the server" issue: everyone is close to the server.<p>What we don't have is the locking. I agree with the people commenting here that locking is a requirement, because you can't merge these files. We need to do that.
Wouldn't it be nicer if we had something like this on the level of the filesystem, instead of on the level of a version control system? Advantages would be that git and any other user-space application wouldn't need much extension, and files could be opened as if they were on the local file system.
> Every user and organization on GitHub.com with Git LFS enabled will begin with 1 GB of free file storage and a monthly bandwidth quota of 1 GB.<p>A GB doesn't get you very far if you are working with raw audio and video.<p>Does it make sense to think about storing virtual machine images (.vmdk) in git on GitHub with LFS?