This is similar to what Google uses internally. See <a href="http://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext" rel="nofollow">http://cacm.acm.org/magazines/2016/7/204032-why-google-store...</a>:<p>"Most developers access Piper through a system called Clients in the Cloud, or CitC, which consists of a cloud-based storage backend and a Linux-only FUSE file system. Developers see their workspaces as directories in the file system, including their changes overlaid on top of the full Piper repository. CitC supports code browsing and normal Unix tools with no need to clone or sync state locally. Developers can browse and edit files anywhere across the Piper repository, and only modified files are stored in their workspace. This structure means CitC workspaces typically consume only a small amount of storage (an average workspace has fewer than 10 files) while presenting a seamless view of the entire Piper codebase to the developer."<p>This is a very powerful model when dealing with large code bases, as it solves the issue of downloading all the code to each client. Kudos to Microsoft for open sourcing it, and under the MIT license no less.
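Conceptually, that "changes overlaid on top of the full repository" model is similar to an overlay mount: reads fall through to a read-only full tree, writes land in a small private layer. A rough Linux analogy (this is not what CitC or GVFS actually use, and the paths are hypothetical):<p><pre><code>    # lowerdir: the full read-only tree; upperdir: your private edits
    sudo mount -t overlay overlay \
      -o lowerdir=/repo/full,upperdir=/home/me/edits,workdir=/home/me/.work \
      /home/me/workspace
    # reads fall through to /repo/full; only modified files
    # end up stored in /home/me/edits
</code></pre>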
There is a discussion thread on r/programming where the MS folks who implemented this answer questions. A lot of questions, like why not use multiple repos, why not git-lfs, why not git subtree, etc., are answered there.<p><a href="https://www.reddit.com/r/programming/comments/5rtlk0/git_virtual_file_system_from_microsoft/" rel="nofollow">https://www.reddit.com/r/programming/comments/5rtlk0/git_vir...</a>
It's interesting how all the cool things seem to come from Microsoft these days.<p>I still think we need something better than Git, though.
Git brought some very cool ideas and the inner workings are reasonably understandable, but the UI is atrociously complicated. And yes, dealing with large files is a very sore point.<p>I'd love to see a second attempt at a distributed version control system.<p>But I applaud MS's initiative. Git's got a lot of traction and mind share already, and they'd probably be heavily criticized if they tried to invent their own thing, even if it was open sourced. It will take a long time for Microsoft to overcome its embrace, extend, and extinguish history.
Using git with large repos and large (binary blob) files has been a pain point for quite a while. There have been several attempts to solve the problem, none of which have really taken off. I think all the attempts have been (too) proprietary – without wide support, it doesn’t get adopted.<p>I'll be watching this to see if Microsoft can break the logjam. By open sourcing the client and protocol, there is potential...<p>Other attempts:<p>* <a href="https://github.com/blog/1986-announcing-git-large-file-storage-lfs" rel="nofollow">https://github.com/blog/1986-announcing-git-large-file-stora...</a><p>* <a href="https://confluence.atlassian.com/bitbucketserver/git-large-file-storage-794364846.html" rel="nofollow">https://confluence.atlassian.com/bitbucketserver/git-large-f...</a><p>Article on GitHub’s implementation and issues (2015):
<a href="https://medium.com/@megastep/github-s-large-file-storage-is-no-panacea-for-open-source-quite-the-opposite-12c0e16a9a91" rel="nofollow">https://medium.com/@megastep/github-s-large-file-storage-is-...</a>
It's disappointing that all the comments are so negative. This is a great idea and solves a real problem for a lot of use cases.<p>I remember that years ago Facebook said it had this problem. A lot of the comments back then centered on the idea that you should change your codebase to fit what git can do. I'm glad there's another option now.
I'm immediately reminded of MVFS and ClearCase. Lots of companies still use ClearCase, but IMO it's not the best tool for the job; git is superior in most dimensions. From what this article says, it's not quite the same as ClearCase, but there are certainly some hints of similarities.<p>The biggest PITA with ClearCase was keeping their lousy MVFS kernel module in sync with ever-advancing Linux distros.<p>I really liked ClearCase in 1999; it was an incredible advancement over other offerings then. MVFS was like "yeah! this is how I'd design a sweet revision control system. Transparent revision access according to a ranked set of rules, read-only files until checked out." But with global collaborators, multi-site was too complex IMO. And overall, ClearCase was so different from other revision control systems that training people on it was a headache. Performance for dynamic views would suffer for elements whose vtrees took a lot of branches. Derived objects no longer made sense -- just too slow. Local disk was cheap by then; it got bigger much faster than object files did.<p>> However, we also have a handful of teams with repos of unusual size! ... You can see that in action when you run “git checkout” and it takes up to 3 hours, or even a simple “git status” takes almost 10 minutes to run. That’s assuming you can get past the “git clone”, which takes 12+ hours.<p>This seems like a way-out-there use case, but it's good to know that there are other solutions. I'd be tempted to partition the codebase by decades or something.
The article doesn't directly say it, but are they migrating the Windows source code repository to git? That seems like a big deal.<p>I seem to recall that Microsoft has previously used a custom Perforce "fork" for their larger code bases (Windows, Server, Office, etc.).
If I understand this correctly, unlike git-annex and git-lfs, this is not about extending the git format with special handling for large files, but about changing the algorithms over the current data format.<p>A custom filesystem is indeed the correct approach, and one that git itself should probably have supported long ago. In fact, there should really only be one "repo" per machine, with name-spaced branches and multiple mountpoints a la `git worktree`. In other words, there should be a system daemon managing a single global object store.<p>I wonder/hope IPFS can benefit from this implementation on Windows, where FUSE isn't an option.
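For the per-repo version of that idea, git already lets several checkouts share a single object store via worktrees. A sketch (URL, paths, and branch names hypothetical):<p><pre><code>    git clone https://example.com/big.git big
    cd big
    git worktree add ../big-feature feature-branch
    git worktree add ../big-hotfix hotfix-branch
    # both extra checkouts share big/.git's single object database
</code></pre>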
This is pretty big news. I know that when I was at Adobe, the only reason Perforce was used for things like Acrobat is that it was simply the only source control solution that could handle the size of the repo. Smaller projects were starting to use Git, but the big projects all stuck with Perforce.
I love this approach. From working at Google I appreciate the virtual filesystem; it makes a lot of things a lot easier. However, all my repos are small enough to fit on a single machine, so I wish there were a mode backed by a local repository where the filesystem still lets git avoid tree scans.<p>Basically, most operations in git are O(modified files), but a few are O(working tree size); checkout and status were the examples mentioned in the article. Those operations can also be made O(modified files) if git doesn't have to scan the working tree for changes.<p>So pretty much I would be all over this if:<p>- It worked locally.<p>- It worked on Linux.<p>Maybe I'll look at how it's implemented and see if I could add the features required. I'm really excited for the future of this project.
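As a crude local approximation of what the virtual filesystem would give you, you can tell git to stop stat()-ing paths you know you won't touch, which takes those paths out of the working-tree scan. It's a blunt instrument (it will also hide real edits to those paths), and the path below is hypothetical:<p><pre><code>    # mark everything under a subtree you never modify
    git ls-files -z vendor/ | xargs -0 git update-index --assume-unchanged
    time git status   # no longer stats those files
    # undo later with: git update-index --no-assume-unchanged
</code></pre>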
Assuming the repo was this big from the beginning, I wonder why they ever migrated to git (I'm assuming they did, because they can tell how long a checkout takes). When somebody first tried the migration, wouldn't they have realized that maybe git is not the right tool for them? Or did they actually migrate and then live with "git status" runs that take 10 minutes until they realized they needed to change something?<p>Also, it would have been interesting if the article had mentioned whether they tried the approaches taken by Facebook (Mercurial, AFAIK) or Google.
Did they really need to make a name collision?<p><a href="https://en.wikipedia.org/wiki/GVfs" rel="nofollow">https://en.wikipedia.org/wiki/GVfs</a>
This sounds like a solid use case and a solid extension for that use case, but definitely not the end-all-be-all.<p>For one, it's not really distributed if you're only downloading a file when you need that specific file.<p>But that doesn't change the merits of this at all, I think.
My sysadmin: "we won't switch to git because it can't handle binary files and our code base is too big"<p>Our whole codebase is 800MB.
Just to make sure I have this right: this has to do with the _number_ of files in their repo and not the _size_ of the files? So projects like git-annex and LFS would not help the speed of these git repos?
> <i>when you run “git checkout” and it takes up to 3 hours, or even a simple “git status” takes almost 10 minutes to run. That’s assuming you can get past the “git clone”, which takes 12+ hours.</i><p>How on Earth can anybody work like that?<p>I'd have thought you may as well ditch git at that point, since nobody's going to be <i>using</i> it as a tool, surely?<p><pre><code>    git commit -m "Add today's work - night all!" && git push; shutdown</code></pre>
Or how about we start compartmentalizing your codebase so that you can, like, you know, organize your code and restore sanity to the known universe.<p>I think when the powers that be said that whole thing about geniuses and clutter, they were specifically talking about their living spaces and not their work...
Does anyone know how Microsoft's open source policy works internally? I'm thinking from a governance perspective, as I'm involved in a similar effort at $WORK.
I had a medium-sized Ruby on Rails project as a git repo inside a VM.<p>It was slow to run 'git status' and other common commands. Restarting the RoR app was also slow. I put the repo on a RAM disk, which made the whole experience at least a few times faster.<p>Since it was all in a VM that I rarely restarted, I didn't have to recreate the files on the RAM disk all that often. I synced changes back to the persistent disk with rsync running periodically.
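For anyone wanting to replicate this, a minimal sketch of the setup described (Linux; paths and sizes hypothetical):<p><pre><code>    sudo mount -t tmpfs -o size=2g tmpfs /mnt/ramrepo
    rsync -a ~/projects/myapp/ /mnt/ramrepo/myapp/
    cd /mnt/ramrepo/myapp        # work here; git status is now fast
    # periodically sync back to persistent disk (e.g. from cron)
    rsync -a --delete /mnt/ramrepo/myapp/ ~/projects/myapp/
</code></pre>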
"For example, the Windows codebase has over 3.5 million files and is over 270 GB in size."<p>Okay, so this is a networking issue. Or is it a stick everything in the same branch issue?<p>Whatever the reason here the issue is pure size vs. network pipe, pure and simple. Hum, when can I get a laptop with a 10GBaseT interface?<p>One of the issue with the way they are doing this (only grab files when needed) is you cannot really work offline anymore.
I'm no expert, but if most individual developers only use 5-10% of the codebase in their daily life, wouldn't it make sense to break the project into multiple codebases of about 5% each and use a build pipeline that combines them when needed?<p>I could definitely be wrong, but this sounds a lot like monolith vs. microservices to me.
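One existing way to stitch split repos back together is submodules, though they bring their own pain. A sketch (URLs and paths hypothetical):<p><pre><code>    git submodule add https://example.com/ui.git components/ui
    git submodule add https://example.com/core.git components/core
    git submodule update --init --recursive   # fetch only the pieces you need
</code></pre>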
Microsoft is moving away from Source Depot to git, it seems. I think it's fantastic that a company like Microsoft is adopting git for its big king and queen projects such as Office and Windows. Also, open sourcing the underlying magic says a lot about the new Microsoft. They're really moving away from not-invented-here syndrome.
MS has been doing really neat stuff lately. I've never worked on a project that takes hours to clone; the largest repository I regularly clone is the Linux repo, and it still takes only a few minutes. Yet I can see GVFS being beneficial for me, as I spend most of my time just reading code (so no need to compile) on my laptop.
Could this also help a smaller repo with a long history that makes the total repo size too large?<p>Every developer needs the whole tree (i.e. it's not possible to do a sparse checkout), but there are many gigs of old versions of small binaries that I would prefer to keep only on the server until I need them (which is never).
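The closest workaround today that I know of is a shallow clone, which leaves old history (and old blobs) on the server. It doesn't give you the per-file on-demand fetching GVFS promises, but for the "never need old versions" case it helps (URL hypothetical):<p><pre><code>    git clone --depth 1 https://example.com/repo.git
    # later, if you ever do need the full history:
    git fetch --unshallow
</code></pre>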
And for all those who still try to stick to anything older:<p><a href="https://github.com/Microsoft/gvfs" rel="nofollow">https://github.com/Microsoft/gvfs</a><p>"GVFS requires Windows 10 Anniversary Update or later."
Check out the GVFS back story and details here:
<a href="https://news.ycombinator.com/item?id=13563439" rel="nofollow">https://news.ycombinator.com/item?id=13563439</a>
Is it really that fucking hard to check if your package name is unique?<p>Here is another virtual filesystem with the exact same name: <a href="https://wiki.gnome.org/Projects/gvfs" rel="nofollow">https://wiki.gnome.org/Projects/gvfs</a><p>Debian package for it: <a href="https://packages.debian.org/jessie/gvfs" rel="nofollow">https://packages.debian.org/jessie/gvfs</a>
Why is that so hard to believe? America is run by Donald Trump.<p>The problem with these companies is that developers aren't making the technical decisions; it's executives who know nothing about computer science. That's why Windows 10 is such a mess with spyware and adware.<p>Now they have some FOSS advocate who doesn't really know anything about software or VCS but saw that an internal problem they were trying to solve was making their code base work with git. So he decided it would be really cool for Microsoft's image to develop an open source extension of git, instead of actually solving the underlying problems (because he didn't recognize them). Now he's probably got a promotion at Microsoft for "fixing" their problem with git.
Interesting: M$ is moving to Git, and the rest of the world is pretty much on GitHub and its alternatives, while Facebook and Google are going with Mercurial. I actually liked Mercurial, apart from its name being a little hard to pronounce, but it doesn't seem to get used anywhere.<p>So are DVCSes converging on Git and Git only?