Announcing GVFS: Git Virtual File System

805 points by janwh over 8 years ago

40 comments


greg7mdp over 8 years ago

This is similar to what Google uses internally. See http://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext:

"Most developers access Piper through a system called Clients in the Cloud, or CitC, which consists of a cloud-based storage backend and a Linux-only FUSE file system. Developers see their workspaces as directories in the file system, including their changes overlaid on top of the full Piper repository. CitC supports code browsing and normal Unix tools with no need to clone or sync state locally. Developers can browse and edit files anywhere across the Piper repository, and only modified files are stored in their workspace. This structure means CitC workspaces typically consume only a small amount of storage (an average workspace has fewer than 10 files) while presenting a seamless view of the entire Piper codebase to the developer."

This is a very powerful model when dealing with large code bases, as it solves the issue of downloading all the code to each client. Kudos to Microsoft for open sourcing it, and under the MIT license no less.
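
A minimal sketch of the overlay idea described in the quote above, where reads fall through to the full repository and only modified files are stored in the workspace. The class and method names are hypothetical, and a plain dictionary stands in for the remote repository; this is not how CitC or GVFS is actually implemented.

```python
class OverlayWorkspace:
    """Toy overlay view: reads fall through to a shared base, writes stay local."""

    def __init__(self, base):
        self.base = base    # stands in for the full (remote) repository
        self.local = {}     # only the files this workspace has modified

    def read(self, path):
        # Prefer the workspace's own copy, then fall back to the base.
        if path in self.local:
            return self.local[path]
        if path in self.base:
            return self.base[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        # The base is never touched; the change lives only in the overlay.
        self.local[path] = data

    def stored_files(self):
        # What the workspace actually has to keep on its own storage.
        return sorted(self.local)
```

A workspace over a base of any size stores only what it has written, which is why the quoted average workspace can be so small.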

chokolad over 8 years ago

There is a discussion thread on r/programming where the MS folks who implemented this answer questions. A lot of questions, like why not use multiple repos, why not git-lfs, why not git subtree, etc., are answered there:

https://www.reddit.com/r/programming/comments/5rtlk0/git_virtual_file_system_from_microsoft/

tambourine_man over 8 years ago

It's interesting how all the cool things seem to come from Microsoft these days.

I still think we need something better than Git, though. It brought some very cool ideas and the inner workings are reasonably understandable, but the UI is atrociously complicated. And yes, dealing with large files is a very sore point.

I'd love to see a second attempt at a distributed version control system.

But I applaud MS's initiative. Git's got a lot of traction and mind share already, and they'd probably be heavily criticized if they tried to invent their own thing, even if it was open sourced. It will take a long time to overcome their embrace, extend and extinguish history.

gvb over 8 years ago

Using git with large repos and large (binary blob) files has been a pain point for quite a while. There have been several attempts to solve the problem, none of which have really taken off. I think all the attempts have been (too) proprietary – without wide support, it doesn’t get adopted.

I'll be watching this to see if Microsoft can break the logjam. By open sourcing the client and protocol, there is potential...

Other attempts:

* https://github.com/blog/1986-announcing-git-large-file-storage-lfs

* https://confluence.atlassian.com/bitbucketserver/git-large-file-storage-794364846.html

Article on GitHub’s implementation and issues (2015): https://medium.com/@megastep/github-s-large-file-storage-is-no-panacea-for-open-source-quite-the-opposite-12c0e16a9a91

kentt over 8 years ago

It's disappointing that all the comments are so negative. This is a great idea and solves a real problem for a lot of use cases.

I remember years ago Facebook said it had this problem. A lot of the comments were centered around the idea that you could change your codebase to fit what git can do. I'm glad there's another option now.

wyldfire over 8 years ago

I'm immediately reminded of MVFS and clearcase. Lots of companies still use clearcase, but IMO it's not the best tool for the job. git is superior in most dimensions. From what this article says, it's not quite the same as clearcase but there's certainly some hints of similarities.

The biggest PITA with clearcase was keeping their lousy MVFS kernel module in sync with ever-advancing linux distros.

I really liked Clearcase in 1999, it was an incredible advancement over other offerings then. MVFS was like "yeah! this is how I'd design a sweet revision control system. Transparent revision access according to a ranked set of rules, read-only files until checked out." But with global collaborators, multi-site was too complex IMO. And overall, clearcase was so different from other revision control systems that training people on it was a headache. Performance for dynamic views would suffer for elements whose vtrees took a lot of branches. Derived objects no longer made sense -- just too slow. Local disk was cheap now, it got bigger much faster than object files.

> However, we also have a handful of teams with repos of unusual size! ... You can see that in action when you run “git checkout” and it takes up to 3 hours, or even a simple “git status” takes almost 10 minutes to run. That’s assuming you can get past the “git clone”, which takes 12+ hours.

This seems like a way-out-there use case, but it's good to know that there's other solutions. I'd be tempted to partition the codebase by decades or something.

dewyatt over 8 years ago

I think they could have picked a name that doesn't conflict with GNOME Virtual File System (GVfs).

daigoba66 over 8 years ago

The article doesn't directly say it, but are they migrating the Windows source code repository to git? That seems like a big deal.

I seem to recall that Microsoft has previously used a custom Perforce "fork" for their larger code bases (Windows, Server, Office, etc.).

Ericson2314 over 8 years ago

If I understand this correctly, unlike git-annex and git lfs, this is not about extending the git format with special large files, but about changing the algorithm for the current data format.

A custom filesystem is indeed the correct approach, and one that git itself should have probably supported long ago. In fact, there should really only be one "repo" per machine, name-spaced branches, and multiple mountpoints a la `git worktree`. In other words there should be a system daemon managing a single global object store.

I wonder/hope IPFS can benefit from this implementation on Windows, where FUSE isn't an option.
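
A rough sketch of the "single global object store" idea from the comment above, assuming a content-addressed store shared by name-spaced branches. `SharedObjectStore` and its methods are hypothetical names for illustration. Only the blob id calculation follows git's actual object format (SHA-1 over a `blob <size>\0` header plus the content).

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the id git assigns to a blob with this content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class SharedObjectStore:
    """Hypothetical machine-wide store: objects are shared, refs are name-spaced."""

    def __init__(self):
        self.objects = {}   # blob id -> content, stored once per machine
        self.refs = {}      # "namespace/branch" -> {path: blob id}

    def put(self, content: bytes) -> str:
        oid = git_blob_id(content)
        self.objects.setdefault(oid, content)  # identical content is deduplicated
        return oid

    def commit_tree(self, ref: str, files: dict) -> None:
        self.refs[ref] = {path: self.put(data) for path, data in files.items()}

    def checkout(self, ref: str) -> dict:
        # Materialize one "mountpoint" view from the shared objects.
        return {path: self.objects[oid] for path, oid in self.refs[ref].items()}
```

Two projects that contain the same file would then share one stored object, however many branches or checkouts refer to it.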

hoov over 8 years ago

This is pretty big news. I know that when I was at Adobe, the only reason Perforce was used for things like Acrobat is that it was simply the only source control solution that could handle the size of the repo. Smaller projects were starting to use Git, but the big projects all stuck with Perforce.

kevincox over 8 years ago

I love this approach. From working at Google I appreciate the virtual filesystem; it makes a lot of things a lot easier. However, all my repos are small enough to fit on a single machine, so I wish there was a mode where it was backed by a local repository while the filesystem still allowed git to avoid tree scans.

Basically most operations in git are O(modified files), however there are a few that are O(working tree size). For example, checkout and status were mentioned by the article. However, these operations can be made O(modified files) if git doesn't have to scan the working tree for changes.

So pretty much I would be all over this if:

- It worked locally.

- It worked on Linux.

Maybe I'll see how it's implemented and see if I could add the features required. I'm really excited for the future of this project.
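
The complexity argument above can be sketched with a toy in-memory working tree. The assumption here is that a virtual filesystem can record which paths were written, so a status check only needs to look at those. `TrackedTree` and its methods are hypothetical and are not taken from git or GVFS.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


class TrackedTree:
    """In-memory stand-in for a working tree behind a virtual filesystem."""

    def __init__(self, files):
        self.files = dict(files)
        # Content hashes as of the last commit.
        self.index = {path: digest(data) for path, data in self.files.items()}
        # Paths written since then; the "filesystem" records them on write.
        self.dirty = set()

    def write(self, path, data):
        self.files[path] = data
        self.dirty.add(path)

    def status_by_scan(self):
        """Examine every file: cost grows with the working tree size."""
        examined, modified = 0, []
        for path, data in self.files.items():
            examined += 1
            if self.index.get(path) != digest(data):
                modified.append(path)
        return sorted(modified), examined

    def status_by_journal(self):
        """Examine only recorded writes: cost grows with the number of modified files."""
        examined, modified = 0, []
        for path in self.dirty:
            examined += 1
            if self.index.get(path) != digest(self.files[path]):
                modified.append(path)
        return sorted(modified), examined
```

Both methods report the same modified files; the second value in each result shows how many files had to be examined to get there.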

rethab over 8 years ago

Assuming that the repo was this big in the beginning, I wonder why they ever migrated to git (I'm assuming they did, because they can tell how long it takes to checkout). At least when somebody "tries" to do the migration, wouldn't they realize that maybe git is not the right tool for them? Or did they actually migrate and then work with a "git status" that takes 10 minutes for some time until they realized they may need to change something?

Also, it would have been interesting if the article mentioned whether they tried other approaches taken by facebook (mercurial afaik) or google.

imron over 8 years ago

> repos of unusual size

Sounds like they've almost solved the secrets of the fire swamp!

rbanffy over 8 years ago

Did they really need to make a name collision?

https://en.wikipedia.org/wiki/GVfs

Navarr over 8 years ago

This sounds like a solid use case and a solid extension for that use case - but definitely not the end-all-be-all.

For one, it's not really distributed if you're only downloading when you need that specific file.

But that doesn't change the merits of this at all, I think.

cafebabbe over 8 years ago

My sysadmin: "we won't switch to git because it can't handle binary files and our code base is too big"

Our whole codebase is 800MB.

yakk0 over 8 years ago

I appreciated the Princess Bride reference with "repos of unusual size"

0X1A over 8 years ago

Just to make sure I have this right, this has to do with the _number_ of files in their repo and not the _size_ of the files? So projects like git annex and LFS would not help the speed of the git repos?

OJFord over 8 years ago

> when you run “git checkout” and it takes up to 3 hours, or even a simple “git status” takes almost 10 minutes to run. That’s assuming you can get past the “git clone”, which takes 12+ hours.

How on Earth can anybody work like that?

I'd have thought you may as well ditch git at that point, since nobody's going to be using it as a tool, surely?

    git commit -m 'Add today'\''s work - night all!' && git push; shutdown

mortdeus over 8 years ago

Or how about we start compartmentalizing your codebase so that you can, like, you know, organize your code and restore sanity to the known universe.

I think when the powers that be said that whole thing about geniuses and clutter, they were specifically talking about their living spaces and not their work...

zwischenzug over 8 years ago

Does anyone know how Microsoft's open source policy works internally? I'm thinking from a governance perspective, as I'm involved in a similar effort at $WORK.

scotty79 over 8 years ago

I had a medium-sized Ruby on Rails project as a git repo inside a VM.

It was slow to do 'git status' and other common commands. Restarting the RoR app was also slow. I put the repo on a RAM disk, which made the whole experience at least a few times faster.

Since it was all in a VM that I rarely restarted, I didn't have to recreate the files on the RAM disk all that often. I was syncing changes to the persistent disk with rsync running periodically.

myrandomcomment over 8 years ago

"For example, the Windows codebase has over 3.5 million files and is over 270 GB in size."

Okay, so this is a networking issue. Or is it a stick-everything-in-the-same-branch issue?

Whatever the reason, the issue here is size vs. network pipe, pure and simple. Hmm, when can I get a laptop with a 10GBaseT interface?

One of the issues with the way they are doing this (only grab files when needed) is that you cannot really work offline anymore.
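
A rough back-of-the-envelope check of the "size vs. network pipe" point above, assuming an ideal link with no protocol overhead and decimal gigabytes. Both are simplifications and not figures from the article, and `transfer_seconds` is a hypothetical helper. It models only raw transfer, so it says nothing about local operations such as `git status`.

```python
def transfer_seconds(size_gb: float, link_gbps: float) -> float:
    """Ideal transfer time for size_gb gigabytes over a link rated in gigabits per second."""
    return size_gb * 8 / link_gbps


if __name__ == "__main__":
    # 270 GB is the repository size quoted from the article.
    for gbps in (1, 10):
        minutes = transfer_seconds(270, gbps) / 60
        print(f"{gbps} Gbit/s: {minutes:.1f} minutes")
```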

amingilani over 8 years ago

I'm no expert, but if most single developers only use 5-10% of the codebase in their daily life, wouldn't it make sense to break the project into multiple codebases of about 5% each and use a build pipeline that combines them together when needed?

I could definitely be wrong, but this sounds a lot like monolith vs. microservices to me.

nojvek over 8 years ago

Microsoft is moving away from Source Depot to git, it seems. I think it's fantastic that a company like Microsoft is adopting git for its big king and queen projects such as Office and Windows. Also, open sourcing the underlying magic says a lot about the new Microsoft. They're really moving away from not-invented-here syndrome.

krishoog over 8 years ago

Does this article imply that Microsoft itself is also moving towards Git? Instead of e.g. using their own product like TFS?

b1gtuna over 8 years ago

MS has been doing really neat stuff lately. I never worked on a project that takes hours to clone. The largest repository I regularly clone is the Linux repo. It still takes only a few minutes. Yet I can see the GVFS being beneficial for me as I spend most of the time just reading the code (so no need to compile) on my laptop.

alkonaut over 8 years ago

Could this also help a smaller repo but with long history, making the total repo size too large?

The whole repo is needed for every developer, i.e. it's not possible to do a sparse checkout, but there are many gigs of old versions of small binaries that I would prefer to keep only on the server until I need them (which is never).

acqq over 8 years ago

And for all those who still try to stick to anything older:

https://github.com/Microsoft/gvfs

"GVFS requires Windows 10 Anniversary Update or later."

srott over 8 years ago

I remember a few years ago Git under Windows was very slow. Is it still true?

dstaheli over 8 years ago

Check out the GVFS back story and details here: https://news.ycombinator.com/item?id=13563439

pjmlp over 8 years ago

Quite nice use of C# and C++/CX for a virtual system implementation.

lolikoisuru over 8 years ago

Is it really that fucking hard to check if your package name is unique?

Here is another virtual filesystem with the exact same name: https://wiki.gnome.org/Projects/gvfs

Debian package for it: https://packages.debian.org/jessie/gvfs

mfontani over 8 years ago

So... what happens when one runs "git grep foo" on it?

igtztorrero over 8 years ago

Does anybody know what Linus thinks about it?

cikey over 8 years ago

Can we use this together with git LFS?

ianopolous over 8 years ago

Couldn't they use git over IPFS?

zahreeley over 8 years ago

Don't believe in modular development with smaller repos?

testUser69 over 8 years ago

Why is that so hard to believe? America is run by Donald Trump.

The problem with these companies is that developers aren't making technical decisions, it's executives who know nothing about computer science. That's why Windows 10 is such a mess with spyware and adware.

Now they have some FOSS advocate who doesn't really know anything about software or VCS but saw that an internal problem they were trying to solve was making their code base work with git. So he decided it would be really cool for Microsoft's image to develop an open source extension of git, instead of actually solving the underlying problems (because he didn't recognize them). Now he's probably got a promotion at Microsoft for "fixing" their problem with git.

ksec over 8 years ago

Interesting: M$ is moving to Git and the rest of the world is pretty much GitHub & alternatives, while Facebook and Google are going with Mercurial. I actually liked Mercurial, apart from its name being a little hard to pronounce, but it doesn't seem to get used anywhere.

So are the DVCS converging to Git and Git only?