Something missing from the list of problems: Git LFS is an HTTP(S) protocol, so it is problematic at best when you are using Git over SSH[1].<p>The git-lfs devs obviously don't use ssh, so you get the feeling they are a bit exasperated by this call to support an industry-standard protocol that is widely used as part of ecosystems and workflows involving Git.<p>[1] <a href="https://github.com/git-lfs/git-lfs/issues/1044" rel="nofollow">https://github.com/git-lfs/git-lfs/issues/1044</a>
Did he really just try to make the argument that we shouldn’t use LFS because Git will have large file support at some unspecified point in the future?<p>LFS has existed for several years, and as far as I know Git still doesn’t have support for large files. At this point I’m not holding out much hope.
I despise LFS.<p>I’m sure that if you know how to use it... maybe... you can figure it out.<p>That said, here’s my battle story:<p>Estimate the time it’ll take to move all our repositories from A to B, they said.<p>Us: with all branches?<p>Them: just main and develop.<p>Us: you just clone and push to the new origin; it’s not zero effort, but it’s trivial.<p>Weeks later...<p>Yeah. LFS is now banned.<p>LFS is not a distributed version control system; once you use it, a clone is no longer “as good” as the original, because it refers to an LFS server that is independent of your clone.<p>...also, actually cloning all the LFS content from GitLab is both slow and occasionally broken in a way that requires you to <i>restart</i> the clone.<p>:(
The latest version of Git has a feature called “partial clones” that is very similar to what the author describes for Mercurial. All the data is still in your history and no extra tools are needed, but you only fetch the blobs from the server for the commits you check out. So, just like with LFS, large blobs not on master are effectively free, but you still grab all the blobs for your current commit.<p>You need server-side support, which GitHub and GitLab have, and then a special clone command:<p><pre><code> git clone --filter=blob:none
</code></pre>
Some background about the feature is here: <a href="https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone/" rel="nofollow">https://github.blog/2020-12-21-get-up-to-speed-with-partial-...</a>
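For example, a blobless clone and the lazy fetch that follows a checkout look roughly like this (the repository URL and tag are just placeholders):<p><pre><code> # Blobless clone: commits and trees come down, file contents are fetched on demand
 git clone --filter=blob:none https://example.com/some/repo.git
 cd repo

 # Checking out another commit pulls down only the blobs that commit actually needs
 git checkout v1.0
</code></pre>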
All three points are really just the same point repeated three times: that it isn't part of core/official Git (a "stop gap" until an official solution exists, irreversible once a later official solution arrives, and extra complexity from third-party tooling that an official version would lack).<p>I'm frankly surprised Git hasn't made LFS an official part by now. It fixes the problem, the problem is common and real, and Git hasn't offered a better alternative.<p>If LFS were made official it would resolve this critique, since that is really the only critique here.
git-annex is an interesting alternative if the HTTP-first nature of Git LFS and the one-way door bother you.<p>You can remove it after the fact if you don't like it, it supports a ton of protocols, and it's distributed just like git is (you can share the files managed by git-annex among different repos or even among different non-git backends such as S3).<p>The main issue that git-annex does <i>not</i> solve is that, like Git LFS, it's not a part of git proper, and it shows in its occasionally clunky integration. By virtue of having more knobs and dials it also potentially has more to learn than Git LFS.
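A minimal sketch of what that looks like in practice, assuming an S3 special remote (the remote name, bucket, and file are made up, and AWS credentials are expected in the environment):<p><pre><code> # Track a large file with git-annex instead of storing it directly in git
 git annex init "my laptop"
 git annex add bigfile.iso
 git commit -m "Add bigfile via git-annex"

 # Add an S3 special remote and copy the content there
 git annex initremote mys3 type=S3 bucket=my-annex-bucket encryption=none
 git annex copy bigfile.iso --to=mys3

 # In another clone, fetch the content from whichever remote has it
 git annex get bigfile.iso
</code></pre>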
I've been using Git LFS with several large Unity projects over the past few years. Never really had any problems. It was always just an "enable and forget" kind of thing.
A side topic: is there a concrete reason why GitHub's LFS solution has to be so expensive?<p>IIRC, it's $5 per 50 GB per month? That's really a deal breaker to me, and I wonder whether people who actually use LFS at volume will avoid LFS-over-GitHub.
Just this past week, git lfs was throwing smudge errors for me. Not really sure what the issue was; I followed the recommendations to disable, pull, and re-enable. And got them again. So I disabled. And left it disabled.<p>Not a solution.<p>This said, the whole git-lfs bit feels like a (bad) afterthought the way it's implemented. I'd love to see some significant reduction of complexity (you shouldn't need to run 'git lfs install', it should happen automatically), and increases in resiliency (sharding into FEC'ed blocks with distributed checksums, etc.) so we don't have to deal with 'bad' files.<p>I was a fan of mercurial before I switched to git ... it was IMO an easier/better system at the time (early 2010s). Not likely to switch now though.
This is really overstating the cost of a one-time setup step. History rewriting is only necessary for preexisting projects and you can use things like GitLab’s push rules to ensure that it’s never necessary in the future.<p>I get that a mercurial developer has different preferences but I don’t think that this is an especially effective form of advocacy.
Okay, so I should avoid it. What is the alternative?<p>I see so many git repos with READMEs saying download this huge pretrained weights file from {Dropbox link, Google drive link, Baidu link, ...} and I don't think that's a very good user experience compared to LFS.<p>LFS itself sucks and should be transparent without having to install it, but it's slightly better than downloading stuff from Dropbox or Google Drive.
I'm honestly super content with LFS. Wrote our own little API server to hook it up to Azure Blob Storage, and never have issues with it. I don't recognize the issues mentioned in the article at all. Our whole team has relied on it for years, and it delivers. No problems. Keep up the great work, git-lfs maintainers! Much love.
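For anyone wondering how a repo gets pointed at a self-hosted endpoint like that: Git LFS reads its server URL from config, so a committed .lfsconfig or a one-off git config is enough (the URL below is a placeholder):<p><pre><code> # .lfsconfig committed at the repository root:
 #   [lfs]
 #       url = https://lfs.example.com/api/my-repo

 # Or set it per clone without committing anything:
 git config lfs.url "https://lfs.example.com/api/my-repo"
</code></pre>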
If you have a hundred images in git, and one cannot be downloaded for any reason, the smudge filter will not be able to run, and you won't be able to git pull at all.<p>We had an image on AWS go bad, still not sure how. Our devs lost the ability to pull. Disabling LFS could not be done (because it would require rewriting history). "Disable smudge" is not an official option, and none of the hacks work reliably. We finally excluded all images from smudge, and downloaded them with SFTP. Git status shows all the images as having changed, and we are downright unhappy...<p>I would be happy to hear that I just don't know how to use LFS - but even if so, that means the docs are woefully inadequate.<p>I want to: 1) Tell LFS to get whatever files it can, and just throw a warning on issues. 2) If an image is restored outside of LFS, git should still know the file has not been modified (by comparing the checksum or whatever smudge would do).
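For what it's worth, the workaround usually suggested for the pull-blocking problem is to skip the smudge filter at clone/pull time and fetch the LFS objects as a separate, non-fatal step - though I can't say whether it behaves any better against a corrupted object (the URL below is a placeholder):<p><pre><code> # Clone without running the LFS smudge filter; LFS files stay as small pointer files
 GIT_LFS_SKIP_SMUDGE=1 git clone https://example.com/some/repo.git
 cd repo

 # Then fetch whatever LFS objects the server can actually serve
 git lfs pull
</code></pre>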
As much as Git LFS is a bit of a pain, on recent projects I've resorted to committing my node_modules with Yarn 2 to Git using LFS and it works really well.<p>Note that with Yarn 2 you're committing .tar.gz's of packages rather than the JS files themselves, so it lends itself quite well to LFS as there are a smaller number of large files.<p><a href="https://yarnpkg.com/features/zero-installs#how-do-you-reach-this-zero-install-state-youre-advocating-for" rel="nofollow">https://yarnpkg.com/features/zero-installs#how-do-you-reach-...</a>
<a href="https://yarnpkg.com/features/zero-installs#is-it-different-from-just-checking-in-the-node_modules-folder" rel="nofollow">https://yarnpkg.com/features/zero-installs#is-it-different-f...</a>
The reasons the author provides are, in my opinion, weak compared to both of his alternatives.<p>Sure, LFS contaminates a repository, but so do large files, sensitive-data removal, and references to packages and package managers that might become obsolete or non-existent in the future. The chance of your project compiling after 15 years (the age of git, by the way) is very slim, and the chance that an entirely compilable history would even be useful is slimmer still.<p>And I think the author's statement about setting up LFS being hard is exaggerated. It's a handful of command lines that should be in the "welcome at our company" manual anyway.<p>I've used LFS in the past and while it can be misused, as with all other tools, it does the job without too many headaches compared to submodules and ignored tracked files.
My practice for storing large files with Git is to include the metadata for the large file in one or more tiny files:<p>1. Type information. Enough to synthesize a fake example.<p>2. A simple preview. This can be a thumb or video snippet, for example.<p>3. Checksum and URL of the big file.<p>This way your code can work at compile/test time using the snippet or synthesized data, and you can fetch the actual big data at ship time.<p>You can then also use the best version-control tool for the job for the particular big files in question.
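A sketch of what one of those tiny metadata files might contain, and how to produce the checksum for it (the file names and URL are invented for illustration):<p><pre><code> # Print the digest to embed in the metadata file
 sha256sum big-training-set.bin

 # big-training-set.meta - small committed pointer file describing the real asset:
 #   type: binary/dataset
 #   preview: previews/big-training-set-sample.bin
 #   sha256: <digest printed by the command above>
 #   url: https://assets.example.com/big-training-set.bin
</code></pre>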
FWIW, rsync.net is currently deploying LFS support such that operations like:<p><pre><code> ssh user@rsync.net git clone blah
</code></pre>
... will properly handle LFS assets, etc.<p>This is in response to several requests we have had for this feature...
This opinion only lists issues, not solutions. Sure, they advertise mercurial, but migrating from git to mercurial is unrealistic for many cases.<p>I'd title it: "Why Mercurial is better than git+LFS"
Here's another fun one: <a href="https://github.com/git-lfs/git-lfs/issues/2434" rel="nofollow">https://github.com/git-lfs/git-lfs/issues/2434</a><p>> Git on Windows client corrupts files > 4Gb<p>It's apparently an upstream issue with Git on Windows, but if you depend on something, you inherit its issues.
Pushing GitHub past the 100 MB limit has to be the most requested feature. It's ridiculous that we have to use the fudge that is Git LFS.<p>It just adds complication for a limit that shouldn't be there anyway.
The solution I've tended to use in classes (where there'll always be some student who hasn't installed LFS) is to store the large files in Artifactory, so they are pulled in at build time in the same way as libraries.<p>This seemed to me a sensible approach, as Artifactory is a repository for binaries (usually, the compiled output of a project). It also seemed to me that deciding which versions to retain, when an update to a binary is expected, and when a resource is frozen so that any replacement becomes a new version, is similar to deciding when a build is a snapshot versus a release.
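Concretely, the build step just fetches the asset like any other dependency; the repository path and version here are invented for illustration:<p><pre><code> # Pull a pinned version of the asset from Artifactory at build time
 curl -fSL -o weights.bin \
   "https://artifactory.example.com/artifactory/course-assets/weights/1.2.0/weights.bin"
</code></pre>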
If you just don't jump on random tech without good reasons, you already naturally apply this advice. Especially since once you <i>really</i> need it and also want Git, there is not much of an alternative (as the author recognizes). In this context, just waiting for potential "better support for handling of large files" in official Git makes little sense; plus, I'll make the wild prediction that what will actually happen is that Git LFS will (continue to) be improved and used by most people (and maybe even be integrated into "official Git"?)
You don't need to rewrite history unless you weren't already using LFS or accidentally committed large files to the repository. Nothing about LFS "requires" rewriting history.<p>Not to mention, many users are paying for a service that provides LFS, and hosting an LFS service yourself isn't crazy hard. It's a file server with a custom API, and it's mostly doable using S3 as a backend. It's not like this is crazy complicated stuff.
I keep hearing the mantra that "svn is better for large files than git" but never really understood why. To me a large file is a large file; if you make changes, worst case scenario you add the entire new file to the commit, best case you add some sort of binary diff. Does git do the former and svn the latter by any chance?
The thing that always rubbed me the wrong way about git-lfs was that they cloned the git-scm.org site design. It's not part of git!<p>[1]<a href="https://git-lfs.github.com/" rel="nofollow">https://git-lfs.github.com/</a><p>[2]<a href="https://git-scm.com/" rel="nofollow">https://git-scm.com/</a>
I’m using Git+LFS because my issue tracker, CI/CD etc natively speaks it. Not because it’s in any way superior or even on par with the large file handling of Mercurial (or even SVN to be honest).
Is rewriting the history for large repos really that difficult besides coordinating with other contributors? My understanding is that it shouldn't be that much worse than "git gc --aggressive". Yes it is expensive, but it is the sort of thing you can schedule to do overnight or on a weekend.
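For reference, git-lfs ships a migration command that performs this rewrite (the file pattern is only an example); the expensive part is usually coordinating the force-push with everyone afterwards rather than the rewrite itself:<p><pre><code> # Rewrite all history so files matching the pattern become LFS pointers
 git lfs migrate import --include="*.psd" --everything

 # Everyone then has to re-clone or hard-reset onto the rewritten branches
 git push --force --all origin
</code></pre>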
Maybe I am missing the point. What is the alternative this article proposes, then?... Also, Git is not centralized, so how can you ever integrate large file support without a separate server?
Good points, but it seems optimistic to assume that git will have good, native, large file support anytime soon. I've been waiting quite a while for git submodules to improve...
The main argument here seems to be that we shouldn’t use LFS because Git will have large file support at some unspecified point in the future?
Similarly, you could argue that we shouldn't use a Covid vaccine because we'll develop a cure in the future... why vaccinate billions of people when we can just treat the 1% of people who get ill?
Clearly that argument doesn't work. People need a solution now. Ironically, we had to stop using Mercurial because it didn't have an LFS alternative, even though I prefer it. LFS is definitely not ideal, but as a solution to a real-world problem, it works. There may be issues around cloning repos and losing history in the future, but those are one-off issues where you have to accept the pain, rather than living in pain every day.