Something missing from the list of problems: Git LFS is an HTTP(S) protocol, so it is problematic at best when you are using Git over SSH[1].<p>The git-lfs devs obviously don't use ssh, so you get the feeling they are a bit exasperated by this call to support an industry-standard protocol that is widely used as part of ecosystems and workflows involving Git.<p>[1] <a href="https://github.com/git-lfs/git-lfs/issues/1044" rel="nofollow">https://github.com/git-lfs/git-lfs/issues/1044</a>
Did he really just try to make the argument that we shouldn’t use LFS because Git will have large file support at some unspecified point in the future?<p>LFS has existed for several years, and as far as I know Git still doesn’t have support for large files. At this point I’m not holding out much hope.
I despise LFS.<p>I’m sure that if you know how to use it... maybe... you can figure it out.<p>That said, here’s my battle story:<p>Estimate the time it’ll take to move all our repositories from A to B, they said.<p>Us: with all branches?<p>Them: just main and develop.<p>Us: you just clone and push to the new origin; it’s not zero effort, but it’s trivial.<p>Weeks later...<p>Yeah. LFS is now banned.<p>LFS is not a distributed version control system; once you use it, a clone is no longer “as good” as the original, because it refers to an LFS server that is independent of your clone.<p>...also, actually cloning all the LFS content from GitLab is both slow and occasionally broken in a way that requires you to <i>restart</i> the clone.<p>:(
The latest version of Git has a feature called “partial clones” that is very similar to what the author describes for Mercurial. All the data is still in your history and no extra tools are needed, but you only fetch the blobs from the server for the commits you check out. So, just like with LFS, large blobs not on master are effectively free, but you still grab all the blobs for your current commit.<p>You need server-side support, which GitHub and GitLab have, and then a special clone command:<p><pre><code> git clone --filter=blob:none
</code></pre>
Some background about the feature is here: <a href="https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone/" rel="nofollow">https://github.blog/2020-12-21-get-up-to-speed-with-partial-...</a>
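For example, a blobless clone and the lazy fetch that follows a checkout look roughly like this (the repository URL and tag are just placeholders):<p><pre><code> # Blobless clone: commits and trees come down, file contents are fetched on demand
 git clone --filter=blob:none https://example.com/some/repo.git
 cd repo

 # Checking out another commit pulls down only the blobs that commit actually needs
 git checkout v1.0
</code></pre>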
All three points are really just the same point repeated three times: that it isn't part of core/official Git (a "stop gap" until an official solution exists, irreversible once a later official solution arrives, and extra complexity from third-party tooling that an official version would lack).<p>I'm frankly surprised Git hasn't made LFS an official part by now. It fixes the problem, the problem is common and real, and Git hasn't offered a better alternative.<p>If LFS were made official it would resolve this critique, since that is really the only critique here.
git-annex is an interesting alternative if the HTTP-first nature of Git LFS and the one-way door bother you.<p>You can remove it after the fact if you don't like it, it supports a ton of protocols, and it's distributed just like git is (you can share the files managed by git-annex among different repos or even among different non-git backends such as S3).<p>The main issue that git-annex does <i>not</i> solve is that, like Git LFS, it's not a part of git proper, and it shows in its occasionally clunky integration. By virtue of having more knobs and dials it also potentially has more to learn than Git LFS.
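A minimal sketch of what that looks like in practice, assuming an S3 special remote (the remote name, bucket, and file are made up, and AWS credentials are expected in the environment):<p><pre><code> # Track a large file with git-annex instead of storing it directly in git
 git annex init "my laptop"
 git annex add bigfile.iso
 git commit -m "Add bigfile via git-annex"

 # Add an S3 special remote and copy the content there
 git annex initremote mys3 type=S3 bucket=my-annex-bucket encryption=none
 git annex copy bigfile.iso --to=mys3

 # In another clone, fetch the content from whichever remote has it
 git annex get bigfile.iso
</code></pre>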
I've been using Git LFS with several large Unity projects over the past few years. Never really had any problems. It was always just an "enable and forget" kind of thing.
A side topic: is there a concrete reason why GitHub's LFS solution has to be so expensive?<p>IIRC, it's $5 per 50 GB per month? That's really a deal breaker to me, and I wonder whether people who actually use LFS at volume will avoid LFS-over-GitHub.
Just this past week, git lfs was throwing smudge errors for me. Not really sure what the issue was; I followed the recommendations to disable, pull, and re-enable. And got them again. So I disabled. And left it disabled.<p>Not a solution.<p>This said, the whole git-lfs bit feels like a (bad) afterthought the way it's implemented. I'd love to see some significant reduction of complexity (you shouldn't need to run 'git lfs install', it should happen automatically), and increases in resiliency (sharding into FEC'ed blocks with distributed checksums, etc.) so we don't have to deal with 'bad' files.<p>I was a fan of mercurial before I switched to git ... it was IMO an easier/better system at the time (early 2010s). Not likely to switch now though.
This is really overstating the cost of a one-time setup step. History rewriting is only necessary for preexisting projects and you can use things like GitLab’s push rules to ensure that it’s never necessary in the future.<p>I get that a mercurial developer has different preferences but I don’t think that this is an especially effective form of advocacy.
Okay, so I should avoid it. What is the alternative?<p>I see so many git repos with READMEs saying download this huge pretrained weights file from {Dropbox link, Google drive link, Baidu link, ...} and I don't think that's a very good user experience compared to LFS.<p>LFS itself sucks and should be transparent without having to install it, but it's slightly better than downloading stuff from Dropbox or Google Drive.
I'm honestly super content with LFS. Wrote our own little API server to hook it up to Azure Blob Storage, and never have issues with it. I don't recognize the issues mentioned in the article at all. Our whole team has relied on it for years, and it delivers. No problems. Keep up the great work, git-lfs maintainers! Much love.
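For anyone wondering how a repo gets pointed at a self-hosted endpoint like that: Git LFS reads its server URL from config, so a committed .lfsconfig or a one-off git config is enough (the URL below is a placeholder):<p><pre><code> # .lfsconfig committed at the repository root:
 #   [lfs]
 #       url = https://lfs.example.com/api/my-repo

 # Or set it per clone without committing anything:
 git config lfs.url "https://lfs.example.com/api/my-repo"
</code></pre>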
If you have a hundred images in git, and one cannot be downloaded for any reason, the smudge filter will not be able to run, and you won't be able to git pull at all.<p>We had an image on AWS go bad, still not sure how. Our devs lost the ability to pull. Disabling LFS could not be done (because it would require rewriting history). "Disable smudge" is not an official option, and none of the hacks work reliably. We finally excluded all images from smudge, and downloaded them with SFTP. Git status shows all the images as having changed, and we are downright unhappy...<p>I would be happy to hear that I just don't know how to use LFS - but even if so, that means the docs are woefully inadequate.<p>I want to: 1) Tell LFS to get whatever files it can, and just throw a warning on issues. 2) If an image is restored outside of LFS, git should still know the file has not been modified (by comparing the checksum or whatever smudge would do).
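For what it's worth, the workaround usually suggested for the pull-blocking problem is to skip the smudge filter at clone/pull time and fetch the LFS objects as a separate, non-fatal step - though I can't say whether it behaves any better against a corrupted object (the URL below is a placeholder):<p><pre><code> # Clone without running the LFS smudge filter; LFS files stay as small pointer files
 GIT_LFS_SKIP_SMUDGE=1 git clone https://example.com/some/repo.git
 cd repo

 # Then fetch whatever LFS objects the server can actually serve
 git lfs pull
</code></pre>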
As much as Git LFS is a bit of a pain, on recent projects I've resorted to committing my node_modules with Yarn 2 to Git using LFS and it works really well.<p>Note that with Yarn 2 you're committing .tar.gz's of packages rather than the JS files themselves, so it lends itself quite well to LFS as there are a smaller number of large files.<p><a href="https://yarnpkg.com/features/zero-installs#how-do-you-reach-this-zero-install-state-youre-advocating-for" rel="nofollow">https://yarnpkg.com/features/zero-installs#how-do-you-reach-...</a>
<a href="https://yarnpkg.com/features/zero-installs#is-it-different-from-just-checking-in-the-node_modules-folder" rel="nofollow">https://yarnpkg.com/features/zero-installs#is-it-different-f...</a>
The reasons the author provides are, in my opinion, weak compared to both of his alternatives.<p>Sure, LFS contaminates a repository, but so do large files, sensitive-data removal, and references to packages and package managers that might become obsolete or non-existent in the future. The chance of your project compiling after 15 years (the age of git, by the way) is very slim, and the chance that an entirely compilable history would even be useful is slimmer still.<p>And I think the author's statement about setting up LFS being hard is exaggerated. It's a handful of command lines that should be in the "welcome at our company" manual anyway.<p>I've used LFS in the past and while it can be misused, as with all other tools, it does the job without too many headaches compared to submodules and ignored tracked files.
My practice for storing large files with Git is to include the metadata for the large file in one or more tiny files:<p>1. Type information. Enough to synthesize a fake example.<p>2. A simple preview. This can be a thumb or video snippet, for example.<p>3. Checksum and URL of the big file.<p>This way your code can work at compile/test time using the snippet or synthesized data, and you can fetch the actual big data at ship time.<p>You can then also use the best version-control tool for the job for the particular big files in question.
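A sketch of what one of those tiny metadata files might contain, and how to produce the checksum for it (the file names and URL are invented for illustration):<p><pre><code> # Print the digest to embed in the metadata file
 sha256sum big-training-set.bin

 # big-training-set.meta - small committed pointer file describing the real asset:
 #   type: binary/dataset
 #   preview: previews/big-training-set-sample.bin
 #   sha256: <digest printed by the command above>
 #   url: https://assets.example.com/big-training-set.bin
</code></pre>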
FWIW, rsync.net is currently deploying LFS support such that operations like:<p><pre><code> ssh user@rsync.net git clone blah
</code></pre>
... will properly handle LFS assets, etc.<p>This is in response to several requests we have had for this feature...
This opinion only lists issues, not solutions. Sure, they advertise mercurial, but migrating from git to mercurial is unrealistic for many cases.<p>I'd title it: "Why Mercurial is better than git+LFS"
Here's another fun one: <a href="https://github.com/git-lfs/git-lfs/issues/2434" rel="nofollow">https://github.com/git-lfs/git-lfs/issues/2434</a><p>> Git on Windows client corrupts files > 4Gb<p>It's apparently an upstream issue with Git on Windows, but if you depend on something, you inherit its issues.
Pushing GitHub past the 100 MB limit has to be the most requested feature. It's ridiculous that we have to use the fudge that is Git LFS.<p>It just adds complication for a limit that shouldn't be there anyway.
The solution I've tended to use in classes (where there'll always be some student who hasn't installed LFS) is to store the large files in Artifactory, so they are pulled in at build time in the same way as libraries.<p>This seemed to me a sensible approach, as Artifactory is a repository for binaries (usually, the compiled output of a project). It also seemed to me that deciding which versions to retain, when an update to a binary is expected, and when a resource is frozen so that any replacement becomes a new version, is similar to deciding when a build is a snapshot versus a release.
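Concretely, the build step just fetches the asset like any other dependency; the repository path and version here are invented for illustration:<p><pre><code> # Pull a pinned version of the asset from Artifactory at build time
 curl -fSL -o weights.bin \
   "https://artifactory.example.com/artifactory/course-assets/weights/1.2.0/weights.bin"
</code></pre>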
If you just don't jump on random tech without good reasons, you already naturally apply this advice. Especially since once you <i>really</i> need it and also want Git, there is not much of an alternative (as the author recognizes). In this context, just waiting for potential "better support for handling of large files" in official Git makes little sense; plus, I'll make the wild prediction that what will actually happen is that Git LFS will (continue to) be improved and used by most people (and maybe even be integrated into "official Git"?)
You don't need to rewrite history unless you weren't already using LFS or accidentally committed large files to the repository. Nothing about LFS "requires" rewriting history.<p>Not to mention, many users are paying for a service that provides LFS, and hosting an LFS service yourself isn't crazy hard. It's a file server with a custom API, and it's mostly doable using S3 as a backend. It's not like this is crazy complicated stuff.
I keep hearing the mantra that "svn is better for large files than git" but never really understood why. To me a large file is a large file; if you make changes, worst case scenario you add the entire new file to the commit, best case you add some sort of binary diff. Does git do the former and svn the latter by any chance?
The thing that always rubbed me the wrong way about git-lfs was that they cloned the git-scm.org site design. It's not part of git!<p>[1]<a href="https://git-lfs.github.com/" rel="nofollow">https://git-lfs.github.com/</a><p>[2]<a href="https://git-scm.com/" rel="nofollow">https://git-scm.com/</a>
I’m using Git+LFS because my issue tracker, CI/CD etc natively speaks it. Not because it’s in any way superior or even on par with the large file handling of Mercurial (or even SVN to be honest).
Is rewriting the history for large repos really that difficult besides coordinating with other contributors? My understanding is that it shouldn't be that much worse than "git gc --aggressive". Yes it is expensive, but it is the sort of thing you can schedule to do overnight or on a weekend.
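For reference, git-lfs ships a migration command that performs this rewrite (the file pattern is only an example); the expensive part is usually coordinating the force-push with everyone afterwards rather than the rewrite itself:<p><pre><code> # Rewrite all history so files matching the pattern become LFS pointers
 git lfs migrate import --include="*.psd" --everything

 # Everyone then has to re-clone or hard-reset onto the rewritten branches
 git push --force --all origin
</code></pre>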
Maybe I am missing the point. What is the alternative this article proposes, then?... Also, Git is not centralized, so how can you ever integrate large file support without a separate server?
Good points, but it seems optimistic to assume that git will have good, native, large file support anytime soon. I've been waiting quite a while for git submodules to improve...
The main argument here seems to be that we shouldn’t use LFS because Git will have large file support at some unspecified point in the future?
Similarly, you could argue that we shouldn't use a Covid vaccine because we'll develop a cure in the future... why vaccinate billions of people when we can just treat the 1% of people who get ill?
Clearly that argument doesn't work. People need a solution now. Ironically, we had to stop using Mercurial because it didn't have an LFS alternative, even though I prefer it. LFS is definitely not ideal, but as a solution to a real-world problem, it works. There may be issues around cloning repos and losing history in the future, but those are one-off issues where you have to accept the pain, rather than living in pain every day.