This is technical recruiting done right: a lucid walkthrough of a hard problem, complete with links to the implementation. It must have taken at least a few weeks of engineer-time to write; GitHub is awesome for making this public.

I wish they had talked a little more about the tradeoff they made. They mention that splitting packfiles by fork was space-prohibitive, but they ended up with a solution that must take more space than what they had originally: if the new heuristic refuses to use some objects as delta bases, some of the choices that would have given the best compression are no longer available to git.

The performance win is incredible, but how much space did they give up in the process?
Not specific to the post, but:

> we're sending few objects, all from the tip of the repository, and these objects will usually be delta'ed against older objects that won't be sent. Therefore, Git tries to find new delta bases for these objects.

Why is this the case? Git can send thin packs when the receiver already has the base objects, so why does it still need to find a full base to diff against? (Not counting the case where the original bases come from another fork -- I don't know how often that happens.)

On top of that, as far as I understood the discussion of the pack heuristics (https://git.kernel.org/cgit/git/git.git/tree/Documentation/technical/pack-heuristics.txt?id=HEAD), the latest objects are stored whole and earlier objects are stored as deltas against them. That gives a double benefit: the object you usually want is the most recent one, which is already stored whole, and earlier objects tend to only remove content rather than add it, because "stuff grows over time". So if the objects are already stored as packs, shouldn't things already be in pretty good shape to be sent as-is?
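Not an answer, but for anyone who wants to see those delta chains for themselves: `git verify-pack -v` prints one line per object with its delta depth and base. Here is a rough Python sketch (not from the article; the field layout is assumed from the verify-pack documentation) that tallies objects by chain depth: pass it the path of a pack index such as `.git/objects/pack/pack-*.idx`.

```python
# Tally packed objects by delta-chain depth using `git verify-pack -v`.
# Verbose object lines look like:
#   <sha> <type> <size> <size-in-pack> <offset> [<depth> <base-sha>]
# Lines without the trailing two fields are objects stored whole (depth 0).
import subprocess
import sys
from collections import Counter

def delta_depths(pack_idx_path):
    """Return a Counter mapping delta-chain depth -> number of objects."""
    out = subprocess.run(
        ["git", "verify-pack", "-v", pack_idx_path],
        capture_output=True, text=True, check=True,
    ).stdout
    depths = Counter()
    for line in out.splitlines():
        fields = line.split()
        # Object lines start with a 40-character SHA-1; summary lines don't.
        if len(fields) >= 5 and len(fields[0]) == 40:
            depth = int(fields[5]) if len(fields) >= 7 else 0
            depths[depth] += 1
    return depths

if __name__ == "__main__":
    for depth, count in sorted(delta_depths(sys.argv[1]).items()):
        print(f"depth {depth}: {count} objects")
```

Depth 0 here means the object is stored whole rather than as a delta.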
This is awesome. I absolutely love <company> engineering blogs like this.

There's no real reason for companies to educate external devs/hobbyists/students like this, but some do, and it's really awesome.
Perhaps I'm a bit ignorant of git's storage and protocols, but what's the purpose of this initial count? It seems to me it's traversing the tree twice: once to count and once to send the objects across the network. So why not traverse the tree once, sending the objects that need to be sent and ignoring the ones that don't, *instead* of counting?
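For context (not from the thread): the counting phase enumerates every object reachable from what's being sent but not from what the receiver already has, and git needs that complete set up front, in part because the packfile header records the total number of objects before any object data is streamed. A rough sketch of what the enumeration amounts to, using the rev-list plumbing command (the `origin/master` ref is just an assumed stand-in for the receiver's state):

```python
# Sketch: enumerate the objects a push would have to send, i.e. everything
# reachable from `want` that is not reachable from `have`.
import subprocess

def objects_to_send(want, have):
    """Return the SHAs of commits, trees and blobs reachable from `want`
    but not from `have`, as reported by `git rev-list --objects`."""
    out = subprocess.run(
        ["git", "rev-list", "--objects", want, f"^{have}"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Each line is "<sha>" or "<sha> <path>"; only the SHA matters here.
    return [line.split()[0] for line in out.splitlines() if line.strip()]

if __name__ == "__main__":
    print(len(objects_to_send("HEAD", "origin/master")), "objects to send")
```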
The Azure team responsible for the Local Git implementation needs to read this fantastic article.

I've been putting up with 10-minute deploys due to precisely this issue of counting objects. It's slow because we don't use Local Git as our source-of-record repository (because commits initiate a deployment step), so every deploy involves a clean fetch into a new tmpdir.

At least now I know why our deploys are getting slower and slower.
I wonder why GitHub has a separate domain, githubengineering.com, for this blog instead of a subdomain like engineering.github.com.

I notice that there is an inactive user account called "engineering". If that account ever created a User Page, it would be available at engineering.github.io.
"When you fork a repository on GitHub, we create a shallow copy of it. This copy has no objects of its own, but it has access to all the objects of an alternate ..."<p>So does this mean one could attach a GitHub repository by having a lot of shill accounts cline it and add random objects (possibly having a performance impact on the original)? I understand the engineering need for the use of alternates, but wonder about the lowered degree of isolation.