I want all the git users to help me answer this query.<p>Quite often I worry about git hash collisions. I know the probability is as small as it can get. But what is the probability that out of all the git users so far, any one out of all had a git collision? (This is not the same as the previous prob)<p>Secondly hypothetically if I combine all the git repositories ever created in this world into one giant git repo, will I see any collisions?
See <a href="http://en.wikipedia.org/wiki/Birthday_attack" rel="nofollow">http://en.wikipedia.org/wiki/Birthday_attack</a><p>A basic estimate is that the probability of collision is (n^2 / 2) / 2^160, where n is the number of hashes that have been generated. For example, if you've generated 2^64 hashes, you have a 1/2^33 chance of collision. This works pretty well until you get too close to 2^80, and larger.<p>Edit: That formula actually is a shorthand for (n * (n-1)/2) / 2^160, you get it because on your first hash, you have 0/2^160 chance of collision, and on your second hash, you have 1/2^160 chance of collision, and on your third, you have 2/2^160 chance of collision, and on your nth, you have (n-1)/2^160 chance of collision (assuming you haven't had any collisions already). So you can add up those probabilities to get (n * (n-1)/2)/2^160 -- you can just add the probabilities because the chance of having _two_ collisions is super unlikely (until it is no longer super unlikely, around 2^80 or so). (Really it's 1 - (1 - 1 / 2^160)^k and we're approximating that with k * (1 / 2^160.)
<a href="https://www.wolframalpha.com/input/?i=number+of+combinations+of+sha1" rel="nofollow">https://www.wolframalpha.com/input/?i=number+of+combinations...</a><p>> 983.544.651.006.435.431.129.340.665.918.456.708.968.598.498.835<p>or 9.8 * 10^47 combinations are possible. Although this doesn't give the complete picture, this shows how big the sha1 space is.