Git is Inconsistent

157 points by sheffield, about 14 years ago

16 comments

Here's the short version:

<pre><code>I am the original sentence.
</code></pre>

Alice commits a change in her repo:

<pre><code>I am a different sentence.
</code></pre>

Bob commits a change in his repo:

<pre><code>I am the original sentence.
I am the original sentence.
</code></pre>

Now Alice pulls Bob's commit. What should happen?

The argument is that in certain cases it can be known which of Bob's 2 sentences is the original and which is the copy (due to context provided by an intermediate commit) and that therefore a correct VCS will figure out that the original is on the bottom:

<pre><code>I am the original sentence.
I am a different sentence.
</code></pre>

But git doesn't look at history so will always produce:

<pre><code>I am a different sentence.
I am the original sentence.
</code></pre>

I don't care. If you force me to care then I actually prefer git's behavior. Git is consistent: a merge will always produce the same result for the same files. I don't want history to matter.

The problem is not actually solvable. So git doesn't try to solve it. I think that's why it's called "the stupid content tracker."

EDIT: Is there anything worse than "smart" features that only work, say, 80% of the time? The closer they get to 100% the worse it gets, because then you start relying on them and they break right when you stop paying attention.
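
To make the ambiguity concrete, here is a minimal Python sketch of the scenario above. It is not git's algorithm; the helper `merge_with_alignment` and the variable names are made up for illustration. It only shows that the same three files admit two alignments, and that the choice of alignment decides the merged result.

```python
ORIGINAL = "I am the original sentence."
DIFFERENT = "I am a different sentence."

base = [ORIGINAL]             # common ancestor
alice = [DIFFERENT]           # Alice rewrote the only line
bob = [ORIGINAL, ORIGINAL]    # Bob duplicated the line


def merge_with_alignment(base_index_in_bob):
    """Apply Alice's rewrite to whichever of Bob's lines is treated as the base line."""
    merged = list(bob)
    merged[base_index_in_bob] = alice[0]
    return merged


# Alignment 1: Bob's first line is the original; the copy was added below it.
top_is_original = merge_with_alignment(0)
# Alignment 2: Bob's second line is the original; the copy was added above it.
bottom_is_original = merge_with_alignment(1)

print(top_is_original)
print(bottom_is_original)
```

The first alignment gives the result attributed to git above; the second gives the result the history-based argument asks for. Nothing in the three files alone says which alignment is the right one.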

tytso, about 14 years ago

I've contributed a tiny amount to git (the high-level "git mergetool") so I can't speak for all of the git developers, but I've spent enough time hanging around them to say that the general feeling they have is that git's algorithm, which is "3-way merge, and then look at the intervening commits to fix any merge conflicts", is good enough.

You can always try to spend more time trying to use more data, or deducing more semantic information, but past a certain point, it's what Linus Torvalds has called "mental masturbation".

For example, you could try to create an algorithm that notices that in branch A a method function has been renamed, and in branch B, a call to that method function was introduced, and when you merge A and B, it will also automatically rename the method function invocation that was added in branch B. That might be closer to "doing the right thing". But does it matter? In practice, a quick trial compile check of the sources before you finalize the merge will solve the problem, and that way you don't have to start adding language-specific semantic parsers for C++, Java, etc. So just because something could be done to make merges smarter doesn't mean that it should be done.

It's a similar case going on here. Yes, if you prepend and append identical text, a 3-way merge can get confused. And since git doesn't invoke its extra resolution magic unless the merge fails, the "wrong" result, at least according to the darcs folks, can happen. But the reason why git has chosen this result is that Linus wanted merges to be fast. If you have to examine every single intermediate node to figure out what might be going on, merges would become much slower, since in real life there will be many, many more intermediate nodes that darcs would have to analyze. Given that this situation doesn't happen much in real life (notwithstanding SCM geeks who spend all day dreaming up artificial merge scenarios), it's considered a worthwhile tradeoff.
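
For readers who haven't seen it spelled out, the decision rule at the heart of a 3-way merge can be sketched in a few lines of Python. This is an illustration only, not git's implementation: `merge_chunk` is a hypothetical helper, and it assumes the three versions have already been aligned into corresponding regions (which is exactly the step that gets confused by identical text).

```python
def merge_chunk(base, ours, theirs):
    """Resolve one aligned region of a three-way merge.

    Returns (lines, conflict), where lines is None when there is a conflict.
    """
    if ours == theirs:
        return ours, False      # both sides agree (or neither side changed)
    if ours == base:
        return theirs, False    # only their side changed: take theirs
    if theirs == base:
        return ours, False      # only our side changed: take ours
    return None, True           # both sides changed differently: conflict


base = ["int f();"]
renamed = ["int g();"]
other = ["int h();"]

clean_result = merge_chunk(base, renamed, base)
conflict_result = merge_chunk(base, renamed, other)

print(clean_result)
print(conflict_result)
```

Note that the rule only ever looks at three versions of a region. It never consults intermediate commits, which is why it is cheap.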

yuvadam, about 14 years ago

To quote Johannes Schindelin [1]:

<pre><code>This all just proves again that there can be no perfect merge strategy; you'll always have to verify that the right thing was done.
</code></pre>

[1] - <a href="http://thread.gmane.org/gmane.comp.version-control.git/105748" rel="nofollow">http://thread.gmane.org/gmane.comp.version-control.git/10574...</a>

saalweachter, about 14 years ago

Is there any reason to assume that merges should be associative? Hell, of the four normed division algebras, only three are associative; just because you can say "operations on octonions should be associative" doesn't mean that you can necessarily create a system of octonions where it's true.

For what it's worth, "git pull --rebase" does enforce a specific order to changes (local changes always happen after remote changes) which will produce the same results regardless of when user Bob pulls user Charlie's changes: regardless of whether Bob pulls change c1 after committing both b1 and b2 or after committing b1 and before committing b2, the final commit order will always be "a, c1, b1, b2".

Of course, if Bob commits and pushes b1 before Charlie commits and pushes c1, the final commit order will be "a, b1, c1, b2", but how could it ever be otherwise?
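
The ordering claim can be checked with a toy model in Python. This is only a sketch: commits are treated as opaque labels, `pull_rebase` is a made-up helper rather than anything git exposes, and it ignores conflicts entirely. It models a rebase as "remote history first, then whatever local commits the remote lacks".

```python
def pull_rebase(local, remote):
    """Return the remote history followed by the local commits the remote lacks."""
    local_only = [commit for commit in local if commit not in remote]
    return remote + local_only


remote = ["a", "c1"]  # Charlie has pushed c1 on top of a

# Case 1: Bob commits b1 and b2, then pulls.
late_pull = pull_rebase(["a", "b1", "b2"], remote)

# Case 2: Bob commits b1, pulls, then commits b2.
early_pull = pull_rebase(["a", "b1"], remote) + ["b2"]

# Case 3: Bob pushed b1 before Charlie pushed c1, then pulls with b2 pending.
pushed_first = pull_rebase(["a", "b1", "b2"], ["a", "b1", "c1"])

print(late_pull)
print(early_pull)
print(pushed_first)
```

Cases 1 and 2 end up with the same commit order, while case 3 differs only because the shared history itself was different.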

KirinDave, about 14 years ago

Not to be grumpy about it, but git's shortcomings are well-known and most people don't run into them on a daily basis.

Some DVCSs, like Darcs, might behave better, but they all seem almost comically slow even for medium-sized repos. If I have to sacrifice git's speed for certain types of correctness (that don't trouble me on a daily basis), I will be VERY reluctant to make that choice.

nevinera, about 14 years ago

>There are still some people who still think nothing is wrong with git; that it is okay for the result of a merge to depend on how things are merged rather than on only what is merged; that is it okay for two git repositories that pull the same patches to have different contents depending on how they pulled those patches. I don’t know what to say to those people. Such a view seems like insanity to me.

Git merges files, not file-histories. Git's behavior is simple, clear, and easy to understand.

I can see why you might expect merges to be transitive like this (it would be an elegant property, if it were true), but why does it matter to you? In what way do you use merges that could rely on this expectation?

ob, about 14 years ago

There are two things most commenters in this thread have missed:

1) The article talks about auto-merges. If the code is "too close" by some definition of close, you get a conflict that needs to be manually merged. The article does NOT talk about manual merges.

2) The article is titled "Git is Inconsistent". It doesn't claim Git is WRONG, it claims it is INCONSISTENT. It does different things depending on how you merge and when.

I think consistency in a DVCS is a desirable goal. It should not matter whether you pull A then B, or pull B then A, or whether, given a series of commits, you pull after each one or just once at the end. The end result should be the same.

That it is a rare occurrence only makes it worse. You will mostly trust the auto-merge algorithm until you hit the corner case, and it will be very expensive in terms of time/money to fix the mistake.

Git's brilliance/stupidity is precisely that it only tracks contents, so although it could get the right answer, it makes it very expensive to do so.

Groxx, about 14 years ago

Super-simple-summary:

Git doesn't use history to determine merge behavior (edit: in this circumstance). Git behaves like applying patches. Darcs uses the history to make "intelligent" patches.

It's a matter of taste. If you look at Git as having a history, and therefore as something that should use that history, then yes, it's incorrect. But if you look at it as a patch manager, it's behaving as it should, and Darcs is frighteningly unpredictable: the line numbers on the patch might not match the numbers of the lines it modifies.

I side with Git on this. I can generate patches from Git that will work anywhere, and applying them within Git behaves 100% identically to applying them manually. The same cannot be said for Darcs.

__david__, about 14 years ago

After reading this it strikes me that git is imperative: it stores files as they were when you checked them in and merges what you tell it in the order you tell it.

Darcs, however, is more declarative: it stores patches. And not just patches but patches with dependencies. This set of patches describes how the current state of the repository is constructed. So when you merge you're really just adding new patches to the repo and it knows exactly what to do to make it work.

The interesting thing is that git has all the information there... It could go through the relevant history, diff everything and put the resulting patches in a darcs-like data structure and then commute patches with darcs' patch theory.

But in the end I'm not sure I'm ready to call darcs' style right and git's wrong. Both of them have fairly easy-to-understand object models, and they both have merges that act in accordance with the internal philosophies of those object models.

etherealG, about 14 years ago

I agree with you completely, but want to know how this can be fixed in git. Surely there has to be something about the merging algorithm that can be changed to fix this, and if that's the case we can just patch it and move on.

What is the specific problem with the algorithm that causes this?

daviddavis, about 14 years ago

I wonder how Mercurial compares in this respect. Also, I'll keep using git because, for sure, it's a helluva lot better than SVN or CVS (which my company was using when I got there).

jojo1, about 14 years ago

Hmmm, nobody seems to care: <a href="http://article.gmane.org/gmane.comp.version-control.git/105748/" rel="nofollow">http://article.gmane.org/gmane.comp.version-control.git/1057...</a>

tzs, about 14 years ago

The article mentions that some systems do have the associativity property; that is, extra rungs in the merge ladder do not affect the result.

I can see how that can be achieved in the case of fully automatic merges. When merging B2 into C1+B1, you'd effectively un-merge C1+B1, merge B1 and B2, and then merge C1 and B1+B2.

But how would that work if C1+B1 had a conflict that had to be manually resolved? Assuming merging B1+B2 into C1 has the same problem (a fair assumption), will I have to do the same manual fixes again?

Or are they smart enough to look at the failed automatic C1+B1 merge, generate a patch from it to the result of my manual fixes, and then try to use that patch to resolve the merge of C1 and B1+B2?

I suspect there will be cases where this is just not going to work well.

dmoney, about 14 years ago

Off topic, but the link to the shell script and the images in the article use Data URIs, which I didn't know existed: <a href="http://en.wikipedia.org/wiki/Data_URI_scheme" rel="nofollow">http://en.wikipedia.org/wiki/Data_URI_scheme</a>

gnosis, about 14 years ago

Does anyone know how Bazaar would handle this?

mml, about 14 years ago

hmm. i was hoping the article discussed git's mind-bogglingly horrible user interface.

can't have everything i guess.
