The GitHub Load Balancer

438 points by logicalstack over 8 years ago

20 comments

NicoJuicy over 8 years ago

I notice a lot of negativity around here. Don't know why that is... But I'll put in my 5 cents.

NIH (Not Invented Here) and redoing an open source project:

- GitHub said they used HAProxy before, and I think GitHub's use case could very well be unique, so they created something that works best for them. They don't have to re-engineer an entire code base. When you work on small projects, you can send a merge request to make changes. I think this is something bigger than just a small bugfix ;). I totally understand them creating something new.
- They built on a number of open source projects, including haproxy, iptables, FoU and pf_ring. That is what open source is: use open source to create what suits you best. Every company has some edge cases. I have no doubt that GitHub has a lot of them ;)

Now, thanks GitHub for sharing. I'll follow up on your posts and hope to learn a couple of new things ;)

otoburb over 8 years ago

Given this is based on HAProxy and seems to improve the director tier of a typical L4/L7 split design, I'm led to believe GLB is an improved TCP-only load balancer.

But they also talk about DNS queries, which are still mainly UDP53, so I'm hoping GLB will have UDP load-balancing capability as gravy on top. I excluded zone transfers, DNSSEC traffic or (growing) IPv6 DNS requests on TCP53 because, at least in carrier networks, we're still seeing a tonne of DNS traffic that still fits within plain old 512-byte UDP packets.

Looking forward to seeing how this develops.

EDIT: Terrible wording on my part to imply that GLB is based off of HAProxy code. I meant to convey that GLB seems to have been designed with deep experience working with HAProxy, as evidenced by the quote: "Traditionally we scaled this vertically, running a small set of very large machines running haproxy [...]".
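
To make the 512-byte point concrete, here is a minimal sketch that builds a plain DNS query by hand and checks it against the classic UDP size limit from RFC 1035. The function name `build_dns_query`, the query ID and the constant name are made up for illustration; nothing here comes from GLB.

```python
import struct

CLASSIC_UDP_LIMIT = 512  # maximum DNS message size over UDP without EDNS0 (RFC 1035)


def build_dns_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (hypothetical helper)."""
    # Header: ID, flags (only "recursion desired" set), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (IN).
    question = qname + struct.pack("!HH", 1, 1)
    return header + question


query = build_dns_query("github.com")
print(len(query), "bytes; fits in classic UDP:", len(query) <= CLASSIC_UDP_LIMIT)
```

Queries are tiny; it is the responses that occasionally exceed the limit and fall back to TCP or rely on EDNS0.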

jimjag over 8 years ago

I am increasingly bothered by the "not invented here" syndrome where, instead of taking existing projects and enhancing them in true open source fashion, people re-create from scratch.

It is then justified that their creation is needed because "no one else has these kinds of problems", but then they open source it as if lots of other people could benefit from it. Why open source something if it has an expected user base of 1?

Again, I am not surprised by this. The whole push of GitHub is not to create a community which works together on a single project in a collaborative, consensus-based way, but rather lots of people doing their own thing and only occasionally sharing code. It is no wonder that they follow this meme internally.

Scaevolus over 8 years ago

Related presentations/papers about large scale load balancing:

Facebook: https://www.usenix.org/conference/srecon15europe/program/presentation/shuff

Google: http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44824.pdf

gwright over 8 years ago

While I understand that NIH syndrome is a real thing, it is very disappointing to read many of the comments here.

I think very few HN readers are really in a position to have an informed opinion regarding GitHub's decision to build a new piece of software rather than using an existing system.

Personally I find this area quite interesting to read about because it is very difficult to build highly available, scalable, and resilient network service endpoints. Plain old TCP/IP isn't really up to the job. Dealing with this without any cooperation from the client side of the connection adds to the difficulty.

I look forward to hearing more about GLB.

Ianvdl over 8 years ago

Given the title and the length of the post I was expecting a lot more detail.

> Over the last year we’ve developed our new load balancer, called GLB (GitHub Load Balancer). Today, and over the next few weeks, we will be sharing the design and releasing its components as open source software.

Is it common practice to do this? Most recent software/framework/service announcements I've read were just a single, longer post with all the details and (where applicable) source code. The only exception I can think of is the Windows Subsystem for Linux (WSL), which was discussed over multiple posts.

wtarreau over 8 years ago

Did people really read the article? For me it was pretty clear. Maybe it involves some regular load-balancing terms that people are not familiar with, because I'm seeing a lot of bullshit written in the comments, but here is what is described there:

- In a traditional L4/L7 load balancing setup (typically what is described in my very old white paper "making applications scalable with load balancing"), the first layer (L3-4 only, stateless or stateful) is often called the "director".
- The second level (L7) is necessarily based on a proxy.

For the director part, LVS was used a lot over the last decade, but over the last 3-4 years we've seen ECMP implemented in almost every router and L3 switch, offering approximately the same benefits without adding machines.

ECMP has some drawbacks (it breaks all connections during maintenance due to stateless hashing).

LVS has other drawbacks (it requires synchronization, cannot learn previous sessions upon restart, and is sensitive to SYN floods).

Basically what they did is something between the two for the director, involving consistent hashing to avoid having to deal with connection synchronization without breaking connections during maintenance periods.

This way they can hack on their L7 layer (HAProxy) without anyone ever noticing, because the L4 layer redistributes the traffic targeting stopped nodes, and only those.

Thus the new setup is now user->GLB->HAProxy->servers.

And I'm very glad to see that people finally attacked the limitations everyone has been suffering from at the director layer, so good job guys!
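
The stateless-hashing drawback can be illustrated with a small simulation. This is only a sketch under assumptions: real ECMP implementations hash packet header fields in vendor-specific ways, whereas here a flow is just a string, the hash is SHA-256, and the node names (`proxy-1` to `proxy-4`) are invented. It counts flows that get remapped when one node is drained even though their own node stayed up.

```python
import hashlib


def flow_hash(flow: str) -> int:
    """Deterministic 64-bit hash of a flow identifier (stand-in for a 5-tuple hash)."""
    digest = hashlib.sha256(flow.encode()).digest()
    return int.from_bytes(digest[:8], "big")


def pick_modulo(flow: str, nodes: list[str]) -> str:
    """Stateless selection: hash the flow, then index into the current node list."""
    return nodes[flow_hash(flow) % len(nodes)]


nodes_before = ["proxy-1", "proxy-2", "proxy-3", "proxy-4"]
nodes_after = nodes_before[:-1]  # proxy-4 is drained for maintenance

# Invented flow identifiers, one per simulated client connection.
flows = [f"10.0.{i // 256}.{i % 256}:{40000 + i}" for i in range(2000)]

# Flows that were NOT on the drained node, yet land on a different node afterwards.
collateral = [
    f for f in flows
    if pick_modulo(f, nodes_before) != "proxy-4"
    and pick_modulo(f, nodes_before) != pick_modulo(f, nodes_after)
]

print(f"{len(collateral)} of {len(flows)} flows moved although their node stayed up")
```

With modulo-style selection, changing the node count reshuffles many unrelated flows; that collateral movement is what a consistent-hashing director is meant to avoid.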

gumby over 8 years ago

They talk about running on "bare metal" but when I followed that link it looked like they were simply running under Ubuntu. Is it so much a given that everything is going to be virtualized?

When I think of "bare metal" I think of a single image with disk management, network stack, and what few services they want all running in supervisor mode. Basically the architecture of an embedded system.

yladiz over 8 years ago

I'm of two minds about this. Part of me agrees with many of the commenters here, in that Not Invented Here syndrome was probably in effect during the development of this. I don't really know GitHub's specific use case, and I don't know the various open source load balancers outside of HAProxy and Nginx, but I would be surprised if their use case hasn't been seen before and can't be handled with the current software (with some modification, pull requests, etc.). On the other hand, I would guess GitHub would research all of this, contact knowledgeable people in the business, and explore their options before spending resources on making an entirely new load balancer. Maybe it really is difficult to horizontally scale load balancing, or to load balance on "commodity hardware".

That being said, why introduce a new piece of technology without actually releasing it, if you're planning to release it, and without giving a firm deadline? This isn't a press release; this is a blog post describing the technical details of a load balancer that is apparently already in production and working, so why not release the source when the technology is introduced?

p1mrx over 8 years ago

GitHub only speaks IPv4, so I would be extra-skeptical about using any of their networking code to support a modern service.

NatW over 8 years ago

I'm curious if they looked into pf / CARP as part of their research into allowing horizontal scalability for an IP. See: https://www.openbsd.org/faq/pf/carp.html

treve over 8 years ago

I half expect a comment here explaining why Gitlab does it better ;)

jedberg over 8 years ago

Awesome. The whole time I was reading I was thinking "they need Rendezvous hashing". And then bam, last paragraph mentions that is in fact what they are using.
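
For readers who have not met rendezvous (highest-random-weight) hashing, a minimal sketch follows. It is not GLB's code: the node names, the `score` function and the use of SHA-256 are assumptions chosen for illustration. Each flow is assigned to whichever node scores highest for it, so removing a node only moves the flows that were on that node.

```python
import hashlib


def score(node: str, flow: str) -> int:
    """Pseudo-random weight for a (node, flow) pair; the hash choice is an assumption."""
    digest = hashlib.sha256(f"{node}|{flow}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def pick_rendezvous(flow: str, nodes: list[str]) -> str:
    """Highest-random-weight: the flow goes to the node with the top score."""
    return max(nodes, key=lambda node: score(node, flow))


nodes_before = ["proxy-1", "proxy-2", "proxy-3", "proxy-4"]
nodes_after = [n for n in nodes_before if n != "proxy-2"]  # proxy-2 taken out

flows = [f"client-{i}" for i in range(2000)]  # invented flow identifiers

# Flows whose assignment changed after the removal.
moved = [
    f for f in flows
    if pick_rendezvous(f, nodes_before) != pick_rendezvous(f, nodes_after)
]
# Flows that were served by the removed node.
on_removed = [f for f in flows if pick_rendezvous(f, nodes_before) == "proxy-2"]

print(len(moved), "flows moved;", len(on_removed), "were on the removed node")
```

The two counts match because a flow's winning node is unaffected by the removal of any node that was not its winner.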

lifeisstillgood over 8 years ago

I love using GitHub and appreciate the impact it has had and still has. But this post is what is wrong with the web today. They have taken a distributed-at-its-plumbing technology and centralised it so much that now we need to innovate new load balancing mechanisms.

Years ago I worked at Demon Internet and we tried to give every dial-up user a piece of webspace, just a disk always connected. Almost no one ever used them. But it is what the web is for: storing your Facebook posts and your git pushes and everything else.

No load balancing needed, because almost no one reads each repo.

The problem is that it is easier to drain each of my different things into globally centralised locations, easier for me to just load it up on GitHub than keep my own repo on my cloud server. Easier to post on Facebook than publish myself.

But it is beginning to creak. GitHub faces scaling challenges; I am frustrated that some people are on WhatsApp, some on Slack and some on Telegram, and I cannot track who is talking to me.

The web is not meant to be used like this. And it is beginning to show.

contingencies over 8 years ago

I am intrigued by their opening statement of multiple POPs, but the lack of multi-POP discussion further in the system description.

My understanding is that the likes of, for example, Cloudflare or EC2 have a pretty solid system in place for issuing geoDNS responses (historical latency/bandwidth, ASN or geolocation based DNS responses) to direct random internet clients to a nearby POP. Building such a system is not that difficult; I am fairly confident many of us could do so given some time and hardware funding.

Observation #1: No geoDNS strategy.

Observation #2: Limited global POPs.

Given that the inherently distributed nature of git probably makes providing a multi-POP experience easier than for other companies, I wonder why GitHub's architecture does not appear to have this licked. Is this a case of missing the forest for the trees?

lamontcg over 8 years ago

Why not just use DNS load balancing over VIPs served by HA pairs of load balancers?

Back in the day we did this with Netscalers doing L7 load balancing in clusters, and then Cisco Distributed Directors doing DNS load balancing across those clusters.

It can take days/weeks to bleed off connections from a VIP that is in the DNS load balancing, but since you've got an HA pair of load balancers on every VIP you can fail over and fail back across each pair to do routine maintenance.

That worked acceptably for a company with a $10B stock valuation at the time.

madmulita over 8 years ago

We are in the process of moving all of our infrastructure to OpenStack, OpenShift, Ansible, DevOps, Microservices, Docker, Agile, SDN and what not.

There are some brainiacs pushing these magic solutions on us, and one of the promises is that load balancing is not an issue; even better, it's not even being talked about.

Please, please, tell me there's something I'm missing.

squiguy7 over 8 years ago

I know they mentioned their SYN flood tool, but I recently saw a similar project from a hosting provider and thought it was neat [1]. It seems like everyone wants their own solution to this when it is a very common and non-trivial problem.

[1]: https://github.com/LTD-Beget/syncookied

bogomipz over 8 years ago

Do the Directors use Anycast then? That wasn't clear to me.

alsadi over 8 years ago

I never liked GitHub's approach; they always use larger hammers.