Commits are snapshots, not diffs (2020)

323 points, posted by warpech about 4 years ago

32 comments

whack, about 4 years ago

From a storage perspective, describing commits as snapshots seems like a bad mental model. Suppose I have a directory that is 100MB in size. If I take a snapshot of it, my snapshot would be 100MB in size. If I take a 2nd snapshot of it tomorrow, my 2nd snapshot would also be 100MB in size. My total storage needs would now be 300MB.

Whereas if I had used git, and created 2 additional commits, each making a change to a small text file, my total storage size would be barely larger than 100MB. Describing the commits as a diff, as opposed to a snapshot, leads to a better intuitive understanding of why this would be the case.

Not to mention other features the article discussed, such as cherry-picking. What does it even mean to "cherry-pick a snapshot"? In comparison, cherry-picking a diff and applying it to your current state is far more intuitive.

And let's not forget commit messages. If a commit is a snapshot, I would expect the commit message to be descriptive of the entire snapshot. Whereas if a commit is a diff, I would expect the commit message to be descriptive of the diff. Which is exactly how most people use commit messages.

Obviously both "diffs" and "snapshots" are leaky abstractions. If you insist on using the "snapshot" abstraction, you will need to resolve all of the above points of confusion by adding more complexity to your abstraction. And if you prefer to use the "diff" abstraction, you will eventually need to explain that a commit is actually a combination of diffs, along with some other metadata like a pointer to a parent commit. As a teaching tool, you can make either abstraction work. But I find it far more intuitive and useful to think of commits as "diffs + some metadata".
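
A minimal Python sketch of the storage question raised here, assuming a content-addressed store in the spirit of git's object database. The `ObjectStore` class, the file names, and the sizes are hypothetical illustrations, not git's actual code; the point is only that full snapshots can share unchanged content.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store: identical content is stored only once."""

    def __init__(self):
        self.objects = {}  # hex digest -> bytes

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        self.objects.setdefault(key, data)
        return key

    def snapshot(self, files: dict) -> dict:
        """Record a full snapshot: every path maps to the ID of its content."""
        return {path: self.put(data) for path, data in files.items()}

    def stored_bytes(self) -> int:
        return sum(len(data) for data in self.objects.values())


store = ObjectStore()
files = {"big.bin": b"x" * 1_000_000, "notes.txt": b"first draft\n"}
snap1 = store.snapshot(files)

files["notes.txt"] = b"second draft\n"
snap2 = store.snapshot(files)

files["notes.txt"] = b"third draft\n"
snap3 = store.snapshot(files)

# Three complete snapshots, but big.bin is stored only once.
print(store.stored_bytes())
```

Each snapshot still lists every path, so any of the three can be restored without replaying changes; only the small text file adds new stored bytes.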

iudqnolq, about 4 years ago

Why have so many people written long thoughtful explanations about how the author is wrong to suggest snapshots are a better mental model, and that you think all abstractions are leaky, but you find diffs a better mental model?

The entire article is literally about how commits are literally snapshots. I would say people didn't read TFA, but a lot of people are quoting lines from TFA and then going on to argue with/expand on them in a way that is directly contradicted by the next few lines.

I think it's because most of the people here have spent years working with git, and are so deeply attached to their understanding that they didn't hear most of what the article said.

(Some commenters have pointed out specific oversimplifications the author makes, like glossing over pack files. I'm referring to the people who say a git blob is a diff when the entire point of TFA is that it isn't.)

necovek, about 4 years ago

> I believe that Git becomes understandable if we peel back the curtain and look at how Git stores your repository data.

I agree, and like many, I have been saying that for years (nay, for more than a decade): and that's exactly the problem!

You don't need to understand how an internal combustion engine works to drive a car... You don't need to understand how your graphics card renders stuff to develop a web page... You don't need to know how a brushless motor works to use a drill...

There is a pattern there, and it's the one that makes sense.

I've read up on the internals of git a dozen times by now. But I only occasionally need to do something weird that makes me go back to it, so I usually forget the relevant bits.

The trouble is that I've used a distributed VCS that did not ask me to understand internals and it had a sane UI and a good model (like tree-like commit history, so a top-level commit log would only have merges, but you could dive deeper into individual commits if you so pleased). It wasn't perfect, but it's hard for me to accept that we have gone with a subpar solution where every "tutorial" starts with how you need to understand the internals! But you also need to memorise them, dammit!

Just like I keep forgetting the Emacs rectangle editing shortcuts since I seldom use them, I'll keep forgetting the specifics of git internals that I might need once every 12 months.

And it's not me, it's _you_, git!

samatman, about 4 years ago

This blog post is the most compelling argument I've yet seen for pijul.

Git should work the way we think it does! It's confusing that snapshots are being converted into a few different forms of change object, which can be reconciled with merges or rebases or applying patches.

Pijul (and darcs before it) actually works on the basis of patches, pijul with a robust theory of patches. A cherry-pick just moves a patch from one history-of-patches (branch) to another history-of-patches. One can share just a patch, and applying it is guaranteed to be the same action everywhere if that's possible, which it often is.

I'm patiently waiting for pijul to be mature enough that I can move everything over to using it; it's one of the more exciting projects in the last ten years.

fraculus, about 4 years ago

I think merge commits are key to why "snapshots" are a better model than "diffs", and a stronger argument would emphasize this more.

Like people have said, the two models:

- a commit is a snapshot plus a pointer to a parent commit
- a commit is a pointer to a parent commit plus a diff

are sort of isomorphic. And some commands in the git porcelain (like git cherry-pick, or git rebase) indeed make more sense if you think of commits as diffs.

But this isomorphism becomes really strained when you have commits with more than one parent (or even zero parents). (And I think it's telling that those commands don't play very nicely with merge commits or the root commit.)

If you really want to incorporate merge commits and the root commit, the alternatives become:

- a commit is a snapshot, together with a list of zero or more pointers to parent commits
- a commit is a list of M >= 0 pointers to parent commits, together with N > 0 diffs, subject to the invariant that:
  a) M = N, except that for exactly one commit, which we will call the "root", we are allowed to have M = 0 but N = 1
  b) starting from any commit, if you traverse a path back to the root commit by following parent pointers, and then sequentially (in reverse order) apply, for each commit in the path, the diff that corresponds to the parent pointer chosen, then the result of composing all those diffs is independent of the path chosen.

And when you put it like that, it's pretty clear that the "diffs" model is really impractical, and that's why it's a lot better to think of commits as snapshots.
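
The first alternative can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Commit` class, `diff`, and `diffs_against_parents` are hypothetical names, and a "diff" here is just a per-path (old, new) pair rather than a line-level patch.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    """Snapshot model: one full tree plus zero or more parent pointers."""
    tree: dict            # path -> file content
    parents: tuple = ()
    message: str = ""


def diff(old: dict, new: dict) -> dict:
    """Derive a per-path change set (old, new) between two snapshots."""
    paths = set(old) | set(new)
    return {p: (old.get(p), new.get(p)) for p in paths if old.get(p) != new.get(p)}


def diffs_against_parents(commit: Commit) -> list:
    """A root commit yields one diff against the empty tree; a merge yields one per parent."""
    bases = [p.tree for p in commit.parents] or [{}]
    return [diff(base, commit.tree) for base in bases]


root = Commit({"a.txt": "1\n"}, message="root")
left = Commit({"a.txt": "1\n", "b.txt": "left\n"}, (root,), "add b")
right = Commit({"a.txt": "2\n"}, (root,), "edit a")
merge = Commit({"a.txt": "2\n", "b.txt": "left\n"}, (left, right), "merge")

for c in (root, left, merge):
    print(c.message, len(c.parents), diffs_against_parents(c))
```

The stored object is the same shape for root, ordinary, and merge commits; the M = N bookkeeping from the second alternative only appears when diffs are derived.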

tsimionescu, about 4 years ago

It's nice to understand this, but I fail to see it helping much in practice. Sure, you'll know why the thing you want to do is hard for git to do, but that won't make it much easier.

And without knowing even further implementation details, it's a bad idea to rely on this knowledge. For example, the article states that committing a rename separately from edits in the renamed files helps git track the renames. But that's not obviously true from the discussion above, because it's not obvious if, when computing a diff between two commits, git will follow the entire history or just apply the diff algorithm on the two commits.

If it were the latter, then it doesn't really matter which order you commit things in: git would simply see commit1: fileA, fileB with contents cA and cB; commit2: fileD, fileE with contents cD and cE, and would do the quadratic work anyway, even if commit1.5 had fileE, fileD with contents cA, cB.
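
A rough Python sketch of the second reading, where rename detection sees only the two snapshots being compared and no intermediate history. Everything here is an assumption for illustration: `detect_renames` is a hypothetical helper, the 0.5 threshold is arbitrary, and `difflib.SequenceMatcher` stands in for whatever similarity measure git actually uses.

```python
from difflib import SequenceMatcher


def detect_renames(old: dict, new: dict, threshold: float = 0.5) -> dict:
    """Pair each deleted path with the most similar added path, using only two snapshots."""
    deleted = {p: c for p, c in old.items() if p not in new}
    added = {p: c for p, c in new.items() if p not in old}
    renames = {}
    for old_path, old_content in deleted.items():
        best_path, best_score = None, threshold
        for new_path, new_content in added.items():
            if new_path in renames.values():
                continue  # already claimed by an earlier deleted file
            # Quadratic work: every deleted file is compared with every added file.
            score = SequenceMatcher(None, old_content, new_content).ratio()
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            renames[old_path] = best_path
    return renames


commit1 = {"fileA": "alpha\nbeta\ngamma\n", "fileB": "one\ntwo\nthree\n"}
commit2 = {"fileD": "alpha\nbeta\ngamma\ndelta\n", "fileE": "one\ntwo\nthree\nfour\n"}
print(detect_renames(commit1, commit2))
```

Under this reading the pairing depends only on how similar the contents still are, so a separate rename commit would matter for tools that walk history one commit at a time, not for a direct comparison of two endpoints.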

Tomminn, about 4 years ago

It strikes me as bizarre that something as old and as important to the general version control problem as git doesn't have a beautiful, complete and helpful user interface.

With the status quo how it is, I definitely love articles like this because every time I use git I get a kind of anxiety that fades only in proportion to the depth with which I understand actual git mechanics.

The thing I find strange is that when I interact with databases that have beautiful, helpful user interfaces, I have almost none of this anxiety, and just kind of accept "black box that handles things", and move on with my life.

I figure I must not be alone in this psychological niche. Which again, makes it bizarre that the problem of giving git a beautiful, complete, helpful front end has not been solved.

gpspake, about 4 years ago

I think it's a tragedy that just about every developer uses git but most learn add, commit, branch, and merge and then just stop learning.

A lot of people are scared of rebase and cherrypick and shut down or get defensive when you mention them or try to encourage their use.

The result is, because developers only have a hammer, they brute force merge everything, which results in grotesque conflict resolutions and commit histories and makes it hard to untangle problems.

At a previous job, another developer was kind enough to walk through rebasing on the command line with vim. I was receptive and in about 10 minutes, I realized there was a significant set of standard features and day to day Git use I was previously just oblivious to.

These days, the UI for rebasing and cherry picking in Gitkraken is state of the art and effortless and I use them every day without hesitation and without the fear that comes from not understanding or knowing what I'm doing. Still, I constantly struggle with coworkers merging feature branches from 100 commits ago into new feature branches and brute force resolving conflicts across half a dozen files in one commit without any context.

I see it all because I have visibility into the history and branch relationships but I still get shrugs and eye rolls when I bring it up. I don't necessarily want to dictate nitpicky git usage but I have a hard time accepting when people just refuse to learn how rebasing and cherrypicking work when they're both core basic features of a tool we all use every day. Proper Git use is one of those hills I'll die on, though, so I don't intend to shut up about it any time soon :)

Edit: My practical advice: If you use git every day and you don't know how to rebase, reset, cherrypick, and stash from the command line, make it a goal. Then, once you're comfortable, learn how to do it in a visual tool like Gitkraken and make an effort to incorporate them into your daily workflow. My guess is things will become a lot less tedious and confusing when things get messy.

divbzero, about 4 years ago

This is a good overview of Git internals. If this stuff interests you, Chapter 10 of Pro Git offers similar descriptions of Git objects [1] and Git references [2], and then continues onto Git packfiles [3], which are not covered by OP.

[1]: https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
[2]: https://git-scm.com/book/en/v2/Git-Internals-Git-References
[3]: https://git-scm.com/book/en/v2/Git-Internals-Packfiles

aarchi, about 4 years ago

Whereas in Pijul and Darcs, commits (called patches) are diffs, not snapshots. They are based on a sound theory of patches, which allows for operations not supported by Git like commuting, as long as the commits aren't interdependent. Plus, language-specific tools can extend the notion of dependency from line-based to semantic.

ashton314, about 4 years ago

I really liked this video: the guy first walks you through how to build your own git-like utility with a handful of shell commands, then goes and walks through an actual git repo: https://youtu.be/qq_s2Hh--aQ

Even the first 20 minutes was enough for me to have a substantially better understanding of how git works.

aequitas, about 4 years ago

This article goes into a little too much detail imho. I have had great success explaining Git to coworkers using post-its, permanent marker and a flip board (no computer!) and going through the steps Git would take (abstractly, not exactly) when performing certain commands. All commits (and their relations) are written down on the board with the marker because they don't change (eg: rebasing just creates a new line of commits). The branches are written down on post-its and can move around (like this article explains, they are just pointers). You can use a whiteboard with non-permanent marking for the working directory and index if you want to go that deep.

davesque, about 4 years ago

Neat overview of some of the core concepts in Git that often go unnoticed. Although I'll say that the fact that commits are technically not diffs doesn't seem to matter much in day to day use. Git does a decent job of abstracting that detail away to the point that you could just as well believe commits are diffs. Also, I want to say that technically I believe Git does use deltas to compress an object's history in the blob store. But the different blobs that comprise an object's history can be thought of somewhat as being separate. Git could just as easily not perform this internal, space-saving optimization and things would all work the same. The SHA hashes would be the same and based on the same input.
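
The point about hashes being independent of storage can be sketched in Python. The blob ID format used here (SHA-1 over a `blob <size>\0` header plus the raw content) is the one described in the Pro Git internals chapter; `git_blob_id` is a hypothetical helper name, and the zlib calls only stand in for "some storage optimization", not for git's packfile delta format.

```python
import hashlib
import zlib


def git_blob_id(content: bytes) -> str:
    """Object ID of a blob: SHA-1 over a 'blob <size>\\0' header plus the raw content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


payload = b"test content\n" * 1000
object_id = git_blob_id(payload)

# Storage is a separate concern: different compression levels give different
# bytes on disk, yet the ID is computed on the uncompressed content.
fast = zlib.compress(payload, 1)
small = zlib.compress(payload, 9)
print(object_id, len(fast), len(small))
```

Whatever the stored bytes look like, decompressing either form gives back the same content and therefore the same ID.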

zwieback, about 4 years ago

Cherry-pick is what messes up the commit-as-snapshot idea for me. If I see a small commit that I feel I can merge into my branch then that commit feels like a diff and I don't want to care about the rest of the stuff that commit snapshots. I guess that's a good thing.

siawyoung, about 4 years ago

Commits are conceptually snapshots, and everything else Git does is just an optimization over the naive "keep all versions of all files ever" (imagine implementing a version of Git that is just zipping the entire folder). Diffs are isomorphic to commits and are generated as needed.

I wrote about it (albeit imprecisely) here: https://siawyoung.com/git-intuition
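
"Generated as needed" can be sketched with Python's standard `difflib`: only the snapshots are kept, and a unified diff is computed when someone asks for one. The `diff_snapshots` helper and the sample file contents are hypothetical, and this is not how git implements its diff machinery.

```python
import difflib


def diff_snapshots(old: dict, new: dict) -> str:
    """Compute a unified diff on demand from two stored snapshots (path -> text)."""
    chunks = []
    for path in sorted(set(old) | set(new)):
        before = old.get(path, "").splitlines(keepends=True)
        after = new.get(path, "").splitlines(keepends=True)
        chunks.extend(difflib.unified_diff(before, after, f"a/{path}", f"b/{path}"))
    return "".join(chunks)


snapshot1 = {"README": "hello\nworld\n"}
snapshot2 = {"README": "hello\ngit\n", "LICENSE": "MIT\n"}
print(diff_snapshots(snapshot1, snapshot2), end="")
```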

cryptonector, about 4 years ago

Yes, exactly, this is a very good post on the nature of Git.

> Branches are pointers

Yes. I would say they are named pointers. Commit hashes are weak, unnamed pointers.
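
A toy Python model of "named pointers", assuming nothing beyond two dictionaries. The `commits` and `branches` tables and the `commit` function are hypothetical, and the ID scheme is made up for the example rather than git's real commit hashing.

```python
import hashlib

commits = {}   # commit ID -> (snapshot, parent ID or None)
branches = {}  # branch name -> commit ID (the "named pointer")


def commit(branch: str, snapshot: dict) -> str:
    """Store a new commit and move the branch pointer to it."""
    parent = branches.get(branch)
    payload = repr((sorted(snapshot.items()), parent)).encode()
    commit_id = hashlib.sha1(payload).hexdigest()
    commits[commit_id] = (snapshot, parent)
    branches[branch] = commit_id
    return commit_id


first = commit("main", {"a.txt": "1\n"})
branches["feature"] = branches["main"]  # creating a branch copies one pointer
second = commit("feature", {"a.txt": "2\n"})

print(branches["main"] == first, branches["feature"] == second)
```

Creating the branch copies no snapshot data at all; only committing on it moves its pointer while `main` stays where it was.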

maweki, about 4 years ago

I think we're running into a naming issue here. It's useful to think of a single commit in itself as a diff. The DAG is a useful model for an accumulation of changes. The question is, what changes and operations make up a node in the DAG (i.e. what code is in this branch, compared to that? What code do they have in common)?

To answer this: take the node and follow the predecessors until you reach one (or more) roots. All commits along the path to the root are contained in the commit at hand. That's the history.

Adding changes is, I think, the most useful mental model, even if it is not the implementation.

Now what the author is saying is: a commit is not only the diff, but also the whole tree/history that the diff is based on. And that is also true, and then the commit (the adding plus the past) is a snapshot.

Do we have a good naming convention for the single node in the tree with its changes, compared to the single node in the tree with its changes AND the references to the parents with all their changes etc.?

karmakaze, about 4 years ago

This comes up from time to time and each time the comments debate the correctness/effectiveness of the title.

The contents of the post do shed much light on how git operates and introduce a view that can help in navigating how to use git.

Whether or not you want to think of a commit as a snapshot or a diff isn't material. It's best to think of it as a dual, since a diff on any base can create a snapshot, and a snapshot can create a diff from a snapshot.

This very much mirrors the idea of a transaction log (of diffs) and a 'current' state. The current state is convenient, can benefit performance, but is not absolutely necessary. It doesn't even have to be the most recent, e.g. key frames in video compression. These are all just ideas; getting used to them and being able to move viewpoints between them is better than clinging to any one of them.

slumpt_, about 4 years ago

Most developers think of commits as diffs and they can for all intents and purposes be thought of as such. It's actually best for the understanding of how to practically get things done to think of them in this way.

Odd semantic argument to make.

dmuth, about 4 years ago

If anyone does want to get more into the internals of Git without playing with a production repo, I built a "playground" a while ago which creates a simple Git repo of synthetic commits which you can then play around with: https://github.com/dmuth/git-rebase-i-playground

I know it says "rebase -i", which is originally what I built it for (and what the exercises in the README are for), but you can really do whatever you want in it, and blow away/rebuild the repo with the included script.

Enjoy!

grawprog, about 4 years ago

> Commits are snapshots....commits are diffs....

Neither model really encompasses commits for me.

I prefer...

Commits are a point in history I can return to after I inevitably fuck up or look back on so I can convince myself, yes I am indeed making progress.

mberning, about 4 years ago

Am I the only person that doesn’t want to understand the inner workings of my VCS in lurid detail? I don’t have to know as much about any other developer tool in order to use it effectively.

ndand, about 4 years ago

I used to think of commits as snapshots, but it was confusing. Then I read "Git Internals".

A commit contains the "whole" content of each file that we've committed. But since a commit has a pointer to a root tree, it also represents a working directory. Even though a commit contains "whole" files, git internally stores only parts of the files as an optimization.

When we diff two commits, we see the difference of the file contents in the corresponding working directories that the commits represent.

Tomminn, about 4 years ago

Great article but:

"one of my favorite analogies is to think of commits as having a wave/particle duality.."

is a hilariously misguided object to build an analogy from. Theoretical physicist checking in, and my community has been searching for about 100 years for an analogy to explain that shit, so it's hilarious to see someone try to use it as a concrete object people can use as a touchstone to better understand a purely classical database.

hongsy, about 4 years ago

i convinced myself that commits are snapshots by doing the following:

<pre><code>
# generate a 100M text file
# (-b 76 is the line-wrap flag of macOS/BSD base64; GNU base64 spells it -w 76)
base64 -b 76 /dev/urandom | head -c 100000000 > file.txt
git add . && git commit -m "1"

# remove first line and add a new line to bottom
tail -n +2 file.txt > tmp && rm file.txt && mv tmp file.txt && base64 -b 76 /dev/urandom | head -n 1 >> file.txt
git add . && git commit -m "2"

# repeat
tail -n +2 file.txt > tmp && rm file.txt && mv tmp file.txt && base64 -b 76 /dev/urandom | head -n 1 >> file.txt
git add . && git commit -m "3"
...
du -sh . # a very big folder
</code></pre>

each of the commits is almost 80M big in the git folder. if you run `du -h .` you can see how git stores each object individually (80M big)
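
A small Python check of why each object might come out at "almost 80M" rather than 100M, assuming git stores each new blob as one zlib-compressed loose object (as the Pro Git internals chapter describes). The sizes below are scaled down to about 1 MB and the exact ratio is an observation from running this sketch, not a git guarantee.

```python
import base64
import os
import zlib

# About 1 MB of random base64 text, wrapped at 76 columns like the shell example.
raw = base64.encodebytes(os.urandom(750_000))
compressed = zlib.compress(raw)
ratio = len(compressed) / len(raw)
print(f"{len(raw)} bytes -> {len(compressed)} bytes ({ratio:.0%})")
```

Random base64 carries 6 bits of information per 8-bit character, so no compressor can get it much below roughly three quarters of its size. The experiment therefore shows whole-file storage for loose objects; whether a later repack would delta-compress these near-identical blobs is a separate question that the snippet does not test.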

da39a3ee, about 4 years ago

The fact that in the implementation "commit" means one thing does not mean that people need to / should use "commit" to mean the same thing, nor that it is necessarily helpful to do so. In any case a commit is more than a snapshot because it has a parent, thus diff is a sensible mental model for the pair.

breck, about 4 years ago

I think this is incorrect, no?

Can't all commits be turned into patches? Thus, aren't commits isomorphic to diffs?

d_tr, about 4 years ago

My first tutorial was the Pro Git book, and this fact was stressed well there so it stuck. Thinking of commits as snapshots also has the small advantage of making the first commit less special.

rhabarba, about 4 years ago

Darcs users disagree.

ChrisMarshallNY, about 4 years ago

That's a cool explanation.

I'm a bit slow on the uptake, so I had to re-read a couple of sections, but it was helpful.

masukomi, about 4 years ago

this... seems so very flawed and disprovable to me. Ignoring the obvious storage issues that have been discussed, if commits were snapshots you could rebase and reorder them without ever worrying about conflicts. In reality you very much DO have to worry about conflicts because they are change instructions that transform a file from A->B->C. If you try and reorder it as A->C->B you're going to have serious issues (assuming these all touch the same code) because C is a transformation from the B state to the C state. It blows up attempting to convert A->C because the instructions in that transformation describe going from B->C.

> A commit is a snapshot in time. Each commit contains a pointer to its root tree,

it so... _so_ very much isn't. It's not even a snapshot in time of a section of a file.

It's a change instruction. No, it's not a "diff" but it also isn't a snapshot.
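
For comparison, here is a hedged Python sketch of how the snapshot reading accounts for the same conflicts: the stored commits are full snapshots, and the change being replayed is derived from a commit and its parent at rebase time. `change_set` and `replay` are hypothetical helpers working at whole-file granularity; git's real merge machinery works on hunks with a three-way merge.

```python
def change_set(parent: dict, commit: dict) -> dict:
    """Derive path -> (old, new) from two snapshots; nothing like this is stored."""
    paths = set(parent) | set(commit)
    return {p: (parent.get(p), commit.get(p)) for p in paths if parent.get(p) != commit.get(p)}


def replay(base: dict, changes: dict) -> dict:
    """Apply a derived change set onto another snapshot, as a cherry-pick would."""
    result = dict(base)
    for path, (old, new) in changes.items():
        if base.get(path) != old:
            raise ValueError(f"conflict in {path}")
        if new is None:
            result.pop(path, None)
        else:
            result[path] = new
    return result


A = {"f.txt": "v1\n"}
B = {"f.txt": "v2\n"}
C = {"f.txt": "v3\n", "g.txt": "new\n"}

print(replay(B, change_set(B, C)) == C)  # replaying onto the original parent reproduces C
try:
    replay(A, change_set(B, C))          # reordering to A->C: the derived B->C change no longer fits A
except ValueError as err:
    print(err)
```

In this model the conflict comes from the derived B->C change not matching A, which is the behaviour described in the comment, while what is stored for each commit remains a complete snapshot.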

rektide, about 4 years ago

simonedon chuckles in allagmatic.
