The Biggest and Weirdest Commits in Linux Kernel Git History

394 points by gary_bernhardt over 8 years ago

14 comments

curuinor over 8 years ago

Clauset, Shalizi and Newman (2007) have not-nice things to say about the classic physicist's idiot trick of fitting power law distributions by drawing a straight line on a log-log graph: it's got huge bias. <a href="https://arxiv.org/abs/0706.1062" rel="nofollow">https://arxiv.org/abs/0706.1062</a>

However, the other difficult thing about power law distributions is that the amount of data needed to properly establish that something is a power law is occasionally incredibly hard to come by. So their critique is very strong, given the comparative lack of data. Computer systems, even with their overflowing reams of data, often still don't have enough. Note that the paper I cited up there suggests MLE and then a Kolmogorov-Smirnov test, so it'll say a lot of things aren't power laws that could well be.

Another way to look at it is from a more geometric point of view. The metric entropy of any generic system of variables is defined as the sum of the positive Lyapunov exponents, and as an "entropy" that quantity does have a lot of commonalities with the other entropies. But to have positive Lyapunov exponents is often to have chaotic dynamics, so it could just be conjectured that the time series of commits and octopus merge sizes in kernel git history is chaotic, so the evolution of the time series will be fractal in nature.

But it's also really fucking hard to confirm or deny that one, because there are varied and strange definitions of chaos itself, and the methods that have been suggested to measure Lyapunov exponents in real systems are arcane and difficult. You could try some synchronization methods, but they remain arcane and crap. Fractal measurement methods are also shitty and full of dark magic.

One neat little trick might be to discretize the series, symbolic dynamics-style (it's already discretized, but discretize further, into like percentiles or something) and run it through one of the dynamical machine learning dealies to see if there are patterns. Not too much literature on that, but it's a thing that some randoes in like 2004 or something did.
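As a rough illustration of the MLE step that the cited paper recommends over a log-log line fit, here is a minimal sketch of the continuous-data estimator, alpha = 1 + n / sum(ln(x_i / x_min)). The function name `powerlaw_alpha` is hypothetical, the cutoff x_min is simply supplied by the caller, and the Kolmogorov-Smirnov goodness-of-fit step is not included. Commit and merge sizes are integers, so treating them as continuous is itself an approximation.

```shell
# Hypothetical helper: continuous power-law MLE for the exponent alpha.
# Reads one positive number per line on stdin; $1 is x_min (must be > 0).
powerlaw_alpha() {
  awk -v xmin="$1" '
    # Only the tail (values at or above x_min) enters the estimate.
    $1 + 0 >= xmin + 0 { n++; s += log($1 / xmin) }
    END {
      # No tail values, or all of them equal to x_min: no estimate possible.
      if (n == 0 || s == 0) exit 1
      printf "%.4f\n", 1 + n / s
    }'
}
```

It could be fed with something like `git log --min-parents=2 --pretty=%P | awk '{ print NF }' | powerlaw_alpha 2` to look at merge parent counts, though a single number from this says nothing about whether a power law is a good model in the first place.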

cpobrien over 8 years ago

There is a mention of the 66-parent merge from Linus himself: <a href="http://marc.info/?l=linux-kernel&m=139033182525831" rel="nofollow">http://marc.info/?l=linux-kernel&m=139033182525831</a>

geofft over 8 years ago

Another interesting piece of trivia: the very first more-than-two-parent merge in the kernel history is a mistake. The second and third parents are the same commit.
<pre><code>commit 13e652800d1644dfedcd0d59ac95ef0beb7f3165
Merge: 4332bdd 88d7bd8 88d7bd8
Author: David Woodhouse <dwmw2@shinybook.infradead.org>
Date:   Sun May 8 13:23:54 2005 +0100

    Merge with master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6.git</code></pre>
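One way to look for other merges of this kind, sketched under the assumption that the input is one commit per line in the form "hash parent parent ...". The function name `dup_parent_merges` is hypothetical; it only inspects text on stdin and prints the hash of every line where a parent is listed more than once.

```shell
# Hypothetical helper: print commits that list the same parent twice.
# Expects lines of the form "<commit> <parent1> <parent2> ..." on stdin.
dup_parent_merges() {
  awk '{
    split("", seen)                 # portable way to clear the array per line
    for (i = 2; i <= NF; i++) {
      if (seen[$i]++) { print $1; next }
    }
  }'
}
```

In a kernel checkout the input could come from `git log --min-parents=3 --pretty='%H %P' | dup_parent_merges`, where `%H` is the commit hash and `%P` the parent hashes.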

SEJeff over 8 years ago

Some of my favorite commits come from Rusty Russell, who wrote the lguest toy hypervisor documentation as a story: <a href="https://github.com/torvalds/linux/commit/f938d2c892db0d80d144253d4a7b7083efdbedeb#diff-847230dec604827964905e0dfec81e42R1" rel="nofollow">https://github.com/torvalds/linux/commit/f938d2c892db0d80d14...</a>

gsylvie over 8 years ago

I don't like OP's definition of divergence. I prefer to take the size of the diff along first-parent instead.

Here's how I would do it:
<pre><code># Sum insertions + deletions of each merge's first-parent diff, sorted ascending.
time git log -m --first-parent --shortstat --pretty="%H" --min-parents=2 |
  grep -v '^$\|3e1dd193edefd2a806a0ba6cf0879cf1a95217da' |
  sed 's/.* file.* changed,//' |
  sed 's/insertion.*,/+/' |
  sed 's/deletion.*//' |
  sed 's/insertion.*//' |
  sed 's/^ \(.*\) $/$((\1))/' |
  xargs -d '\n' -L 2 echo echo |
  bash |
  sort -k 2,2 -g</code></pre>
Note: I skip 3e1dd193edefd2a806a0ba6cf0879cf1a95217da because that commit has no diff along first-parent, and thus screws up my xargs result (which depends on every 2nd line having the --shortstat output).

Of course "--first-parent" doesn't guarantee that we're walking the mainline (see: <a href="https://developer.atlassian.com/blog/2016/04/stop-foxtrots-now/" rel="nofollow">https://developer.atlassian.com/blog/2016/04/stop-foxtrots-n...</a> ), but it usually does.

On my laptop it takes 3 mins 30 seconds. Here are the 5 biggest merges by this definition:
<pre><code>099bfbfc7fbbe22356c02f0caf709ac32e1126ea 463702
3f17ea6dea8ba5668873afa54628a91aaa3fb1c0 466320
ce519e2327bff01d0eb54071e7044e6291a52aa6 500074
7ea61767e41e2baedd6a968d13f56026522e1207 504965
f063a0c0c995d010960efcc1b2ed14b99674f25c 569691</code></pre>
And here's "git show" for those 5:
<pre><code>099bfbfc7fbb 2015-06-26T13:18:51-07:00 Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
3f17ea6dea8b 2014-06-08T11:31:16-07:00 Merge branch 'next' (accumulated 3.16 merge window patches) into master
ce519e2327bf 2009-01-06T17:04:29-08:00 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6
7ea61767e41e 2009-09-16T08:11:54-07:00 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6
f063a0c0c995 2010-10-28T12:13:00-07:00 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6</code></pre>
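The same first-parent measure can also be sketched with a single awk pass instead of the sed/xargs/bash chain. This is an alternative under the assumption that the input looks like `git log --shortstat --pretty="%H"` output (a bare hash line, then optionally a "N files changed, ..." line); the function name `sum_shortstat` is hypothetical. A commit with no stat line simply produces no output row, so nothing has to be excluded by hash.

```shell
# Hypothetical helper: turn "hash" + "--shortstat" line pairs into "hash total".
sum_shortstat() {
  awk '
    /^[0-9a-f]+$/ { hash = $1; next }          # remember the most recent commit hash
    / changed/ {
      total = 0
      for (i = 2; i <= NF; i++)                # the count precedes "insertions"/"deletions"
        if ($i ~ /^(insertion|deletion)/) total += $(i - 1)
      print hash, total
    }'
}
```

Usage would be along the lines of `git log -m --first-parent --shortstat --pretty="%H" --min-parents=2 | sum_shortstat | sort -k 2,2 -n`.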

kijin over 8 years ago

> "Christ, that's not an octopus, that's a Cthulhu merge"

Perhaps git should throw a warning when you try to do an octopus merge with more parents than an octopus has legs. If you really want to proceed, add the --cthulhu option. The default behavior would be --no-cthulhu.

brongondwana over 8 years ago

It only has one parent, but this would be the commit that I'm least proud of (not in Linux, obviously): <a href="https://github.com/cyrusimap/cyrus-imapd/commit/fdc0eb3d09bcc2ce916d2790c98839a61d403937" rel="nofollow">https://github.com/cyrusimap/cyrus-imapd/commit/fdc0eb3d09bc...</a>

Showing 126 changed files with 14,128 additions and 20,617 deletions.

(ok, I'm pretty proud of reducing code size by 6k+ lines while improving lots of stuff, but the commit is a shitshow)

userbinator over 8 years ago

GitHub's logo always reminds me of the octopus merge; not sure if it was chosen for this reason, but I think it's quite suitable.

metrognome over 8 years ago

I think Gary's commit counts are off:
<pre><code>$ git log | wc -l</code></pre>
This should count the number of lines in the entire git log, including metadata (not just commits). I think he means this:
<pre><code>$ git log --oneline | wc -l</code></pre>
The number of commits for Rails should be closer to 61,000.
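For what it's worth, git can also report the count directly, without piping through `wc`. A minimal sketch, where the wrapper name `count_commits` is hypothetical and the optional argument is assumed to be a path to a repository:

```shell
# Hypothetical helper: number of commits reachable from HEAD.
# $1: path to a repository (defaults to the current directory).
count_commits() {
  git -C "${1:-.}" rev-list --count HEAD
}
```

This counts commits rather than output lines, so it should agree with the `--oneline | wc -l` form.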

cpercival over 8 years ago

> Octopuses are more common than you might expect

The etymologically correct plural is octopodes. (Some people accuse "octopodes" of being pedantic, but as I see it "pedantic" is just a euphemism for "correct in a way I don't like".)

smallnamespace over 8 years ago

Slight article nitpick: a distribution that 'looks like a straight line' in a log-log plot is often not power-law distributed.

One could say that the distribution has a fat one-sided tail though.

majewsky over 8 years ago

I used octopus merges once for a deployment system that I built when my team switched from SVN to Git. Since there were a lot of developers working on different parts, we often needed to test multiple different changes in parallel in the QA system.

I built a small web UI where developers could select and unselect development branches, and it would octopus-merge all selected branches into the master branch, and force-push that state onto the QA branch (and deploy it to QA, of course). So QA would always be master + all development branches that were currently being verified. By using a GitHub webhook, it would update the QA system whenever master or one of the branches being verified was pushed to. I'm not in that team anymore, but I think that deployment tool is still humming along nicely.
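The core of such a tool might look roughly like the sketch below. This is not the commenter's actual implementation: the function name `build_qa_branch`, the branch names, and the commit message are all hypothetical, and the web UI, webhook and deployment steps are left out. It relies on `git merge` creating an octopus merge when it is given more than one branch.

```shell
# Hypothetical helper: rebuild a QA branch as <base> + all selected branches.
# $1: base branch, $2: QA branch name, remaining args: branches to merge.
build_qa_branch() {
  base=$1
  qa=$2
  shift 2
  # Reset the QA branch to the base, discarding whatever it pointed at before.
  git checkout -q -B "$qa" "$base" || return 1
  # Several branch arguments make git attempt a single octopus merge;
  # it stops with an error if the branches conflict.
  git merge -q --no-edit -m "QA: merge selected branches" "$@"
}
```

After a successful run, something like `git push --force origin qa` would publish the result, assuming a remote named `origin`.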

tomatokiller over 8 years ago

Has anyone asked Laxman Dewangan what he was up to with that initial commit and merge thing?

behm over 8 years ago

That was the worst diagram today. <1 commits on the y-axis? Where would 30 be on the x-axis? You can't tell if you only have 3 markers on a log axis.
