Mercurial with Largefiles: Why it is not a solution for game development

50 points by noch over 7 years ago

11 comments

pjc50 over 7 years ago
Important point here is that if:

- you have lots of large files which are not amenable to diff and change frequently;

- everyone is working within the same company, and usually on the same network;

then a DVCS is unhelpful, because you have to pay the disk cost of every machine holding a full copy of everything that's ever been checked in, regardless of whether they need it or not.

Many games are tens of gigabytes when shipped. It's easy to imagine a process which accumulates hundreds of megabytes of asset changes every single day over a multi-year development process. Then you can imagine having to buy expensive terabyte SSDs just to work on it with all your tools.

I'm actually looking at this problem at work, for possibly converting a large repository from svn which is a decade old and merely tens of gigabytes. Frankly, svn handles it just fine, so I'm going to defer the problem until I absolutely have to migrate.
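A back-of-envelope sketch of the disk math pjc50 is describing; the churn rates here are assumptions for illustration, not figures from any real project:

    # Rough cost of full-history DVCS clones for a binary-heavy game repo.
    # All rates below are assumed, for illustration only.
    mb_per_day = 300               # asset churn checked in per working day
    working_days_per_year = 250
    years = 3

    history_gb = mb_per_day * working_days_per_year * years / 1024
    print(f"history alone: ~{history_gb:.0f} GB on every developer machine")
    # ~220 GB of dead history per clone, on top of the working copy,
    # before any texture is revised a second or third time.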
vostok4 over 7 years ago
I actually set up a small studio to work exclusively on a Mercurial largefiles-based VCS. I would say it's one of the best solutions on the market today for self-hosted free users.

Why? Perforce's integration with Unity is quite poor (they have a huge untapped market here), so you end up having to resolve a lot of things slowly in their tools. Git/Hg are much faster in my experience at detecting changes and interacting sanely with them. Also, the team could never learn why a file is checked out, why they can't commit, etc.

We regularly clean out our largefiles cache on disk, so most of the time everyone just has the latest version of a given binary file on disk. The server of course has every revision, but I want that.

And most important of all: with small tweaks we're able to use Phabricator for our entire task management/documentation workflow. Getting VCS hooks out of the box that let artists say "Adding typewriter model, please review T555" in their commit, and having that task automatically get assigned to the reviewer, is priceless.

Most of my team doesn't have any idea what a VCS is, but they've learned to use TortoiseHg (they call it "the turtle") and Phabricator to organize ourselves.

While Mercurial isn't the only way to get there, it's free, it's fast, it's simple, and it unlocks the power of Phabricator (so does Git+LFS, I believe).

So in my experience, I would say hg+largefiles is an excellent solution for game development.
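A minimal sketch of the kind of commit hook vostok4 describes, assuming a Phabricator-style "T555" task convention; the notify function is a placeholder, not Phabricator's real Conduit API:

    import re

    TASK_RE = re.compile(r"\bT(\d+)\b")   # Phabricator-style task references

    def notify_phabricator(task_id: str, message: str) -> None:
        # Placeholder only: a real hook would call Phabricator's Conduit API.
        print(f"would attach commit to task T{task_id}: {message!r}")

    def on_commit(message: str) -> None:
        # Scan the commit message for task references and fan them out.
        for task_id in TASK_RE.findall(message):
            notify_phabricator(task_id, message)

    on_commit("Adding typewriter model, please review T555")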
zubspace over 7 years ago
The only free option with unlimited storage seems to be Microsoft Team Services with Git-LFS support. (1) (2)

I have been using Bitbucket and Mercurial for all my side projects for quite a while. But when you start with game development, you reach the repository limits quite fast. Textures, meshes, sound, music, concept art and other binary blobs eat a lot of storage.

Git-LFS is a bit of a pain to set up, because you need to define *before* checking in which extensions should be stored as large files. And then there are check-in hooks, which sometimes seemed unreliable. Visual Studio's git integration is also quite limited, but SourceTree has served me well.

It's quite liberating to be able to check in code and assets together without worrying about the space needed.

1) https://blogs.msdn.microsoft.com/devops/2015/10/01/announcing-git-lfs-on-all-vso-git-repos/

2) https://www.visualstudio.com/vso/
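The track-before-commit ordering zubspace mentions looks roughly like this; the extension list is an example, not a complete set of game-asset types:

    import subprocess

    ASSET_PATTERNS = ["*.png", "*.psd", "*.fbx", "*.wav"]

    for pattern in ASSET_PATTERNS:
        # `git lfs track` appends the pattern to .gitattributes; files
        # committed *before* this step stay in history as ordinary blobs.
        subprocess.run(["git", "lfs", "track", pattern], check=True)

    # .gitattributes itself must be committed, or collaborators will
    # keep pushing files with these extensions as regular git objects.
    subprocess.run(["git", "add", ".gitattributes"], check=True)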
luckydude over 7 years ago
BitKeeper solved this years ago with one or more centralized servers that hold the binaries. When you clone, you only get the tip, but you can retrieve any version you want when you need it. Scales to terabytes easily.

Free and open source (Apache v2) at http://bitkeeper.org
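A sketch of the clone-the-tip, fetch-on-demand model luckydude describes; the server URL, cache layout, and protocol are invented for illustration, not BitKeeper's actual implementation:

    import hashlib
    import pathlib
    import urllib.request

    CACHE = pathlib.Path(".binpool")            # local blob cache
    SERVER = "http://binaries.example.com"      # hypothetical central store

    def fetch(sha: str) -> bytes:
        # History carries only content hashes; the blob itself is pulled
        # from the central server the first time a revision needs it.
        cached = CACHE / sha
        if cached.exists():                     # tip already materialized
            return cached.read_bytes()
        with urllib.request.urlopen(f"{SERVER}/{sha}") as resp:
            data = resp.read()
        # Content-addressed: verify before trusting the server.
        assert hashlib.sha1(data).hexdigest() == sha
        CACHE.mkdir(exist_ok=True)
        cached.write_bytes(data)
        return data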
corysama over 7 years ago
Blizzard gave a great GDC talk titled "The Data Building Pipeline of 'Overwatch'". It covered their in-house, HTTP-based asset distribution system.

Take-aways: http://seanmiddleditch.com/my-gdc-17-talk-retrospective/

https://twvideo01.ubm-us.net/o1/vault/gdc2017/Presentations/Clyde_David_TheDataBuilding.pdf
LeoJiWoo over 7 years ago
Pretty interesting.

Game development seems to have such a different workflow than most of the stuff I'm familiar with, like backend, web dev, and the occasional network programming.

This link recommends Perforce as the standard: https://gamedev.stackexchange.com/questions/480/version-control-for-game-development-issues-and-solutions

What do people in the industry actually use?
b0rsuk over 7 years ago
For 2D graphics, SVG (and vector graphics more generally) fits very well in git. For 3D there are text-based model formats, but there's still the problem of textures.

Is there anything close to a text-based procedural texture format? Textures could be procedurally generated at startup and transformed into bitmaps. I am aware of kkrieger, but is there anything beyond a proof of concept? No one takes voxels seriously anymore...
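To make the idea concrete, here is a toy sketch of what such a format could look like: a few diffable parameters in the repo, baked to a bitmap at startup. The recipe format and generator are invented for illustration:

    import math

    RECIPE = {"size": 64, "freq": 8.0, "phase": 0.5}   # the whole "source file"

    def bake(recipe: dict) -> list[list[int]]:
        n, f, p = recipe["size"], recipe["freq"], recipe["phase"]
        # One sine-interference operator; kkrieger-style tools chain
        # many such operators (noise, blur, distort) into a pipeline.
        return [[int(127.5 + 127.5 * math.sin(f * (x + y) / n + p))
                 for x in range(n)] for y in range(n)]

    pixels = bake(RECIPE)   # 64x64 grayscale bitmap, never stored in the repo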
kuschku over 7 years ago
I've been checking gigabyte-sized assets into my git repos with Git-LFS and self-hosted GitLab, and it's been working fine so far.

Are there any issues with this approach I should be aware of, considering that Hg with largefiles seems to have some, too?
twic over 7 years ago
> Okay, so once we have started, the most important thing to know is: This does not in any way change how Hg handles files in memory. [...] Hg has to take the file and consume many times more memory during the commit than the size of the file, to try to figure out what the differences are.

This isn't true. Largefiles aren't stored as deltas, but as complete blobs. Mercurial still reads them in their entirety, so it still uses a lot of memory, but it's not diffing.

> The next problem is everyone collaborating on the project would have to take a huge Pull with the new large files, for every version of the large file *they don't yet have* [...] if you want to go back to a revision you haven't pulled yet and the Server is not up you're out of luck. That means you should get all the commits at some point anyway (because you want all the code versions at your side), so what's the point?

Largefiles' *mechanism* doesn't require that you download every version of every large file. That's a key part of its design. If you decide as a matter of *policy* that you want to download them all anyway, then no, largefiles won't help much.

> Well, they handle files by placing them outside the repo, and storing only the hash of the file in the repo itself (all bigfile hashes inside one file). This has an unfortunate effect that you won't be able to tell which exact bigfile/largefile has actually been modified when looking in the history – the only thing you'd see in the repo is the cumulative file that holds the hashes of all bigfiles as having a change.

This isn't true either. Largefiles stores the hashes in separate files, and the history machinery is able to interpret the records properly.

I've written a little script to demonstrate the structure of a largefiles repository:

https://bitbucket.org/snippets/twic/7eeAxy

One thing that would be really useful, which largefiles doesn't do (that I know of), would be to opt out of downloading some largefiles at all. If I'm checking out an old revision just to read some old code, I don't want to spend ages pulling largefiles that I'm not going to look at. You can do this with Facebook's remotefilelog extension, which lets you make shallow clones that omit the large files, but it's awkward:

https://bitbucket.org/facebook/hg-experimental/src/default/remotefilelog/
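A rough sketch of the standin mechanism twic describes, alongside his linked script; paths loosely mirror largefiles' layout (standins under .hglf/, blobs in a hash-addressed store), but details are simplified:

    import hashlib
    import pathlib

    def add_largefile(path: pathlib.Path) -> None:
        data = path.read_bytes()
        sha = hashlib.sha1(data).hexdigest()
        # The blob lives outside history, addressed by its hash, and can
        # be fetched on demand rather than with every pull.
        store = pathlib.Path(".hg/largefiles") / sha
        store.parent.mkdir(parents=True, exist_ok=True)
        store.write_bytes(data)
        # History tracks only this one-line standin, one file per
        # largefile, so the log shows exactly which asset changed.
        standin = pathlib.Path(".hglf") / path.name
        standin.parent.mkdir(parents=True, exist_ok=True)
        standin.write_text(sha + "\n")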
raugustinus over 7 years ago
Would a Maven-like repository possibly be a solution to this problem? Simply have dependencies on binary files/artifacts distributed by Nexus?
z3t4 over 7 years ago
Filesystem snapshots (like in ZFS) would work nicely with binary data.