See also:

Facebook: https://code.fb.com/open-source/open-sourcing-katran-a-scalable-network-load-balancer/

Google: https://cloudplatform.googleblog.com/2016/03/Google-shares-software-network-load-balancer-design-powering-GCP-networking.html

The design of all three is very similar.
Cool use of SR-IOV, I like it.
We've done a few (academic) experiments with SR-IOV for flow bifurcation and wondered why no one seems to use it like this. The performance was quite good: negligible difference between the PF and a single VF, and only 5-10% overhead when running >= 8 VFs (probably cache contention somewhere in our specific setup).

You seem to be running this on X540 NICs; aren't you running into limitations on the VFs? Mostly the number of queues, which I believe is limited to 2 per VF in the ixgbe family.
I wonder whether the AF_XDP DPDK driver could be used instead if SR-IOV isn't available or feasible for some reason.

A more detailed look at performance would have been cool. I might try it myself if I find some time (or a student) :)
This basically looks like an open-source Maglev [1]. Awesome!

[1] https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44824.pdf
Looks really cool! Though a simpler solution for most people will probably be OpenBSD's CARP protocol, which shares a single virtual IP between multiple boxes (with, for example, relayd). ECMP routing can get complex fast.
I am having trouble understanding this passage. I'm wondering if someone could help me understand it, as it seems like an important design detail:

> "Another benefit to using UDP is that the source port can be filled in with a per-connection hash so that they flow within the datacenter over different paths (where ECMP is used within the datacenter), and received on different RX queues on the proxy server's NIC (which similarly use a hash of TCP/IP header fields)."

A source port in the UDP header still needs to be just that, a port number, no? Or are they actually stuffing a hash value into that UDP header field? How would the receiving IP stack know how to interpret a value other than a port number in that field?
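If I'm reading it right, the hash itself is the port number: any 16-bit value derived from the inner connection is still a perfectly valid UDP source port, and the routers doing ECMP and the NIC doing RSS just hash over it like any other port, so different inner flows spread across paths and RX queues while packets of one flow stay together. Something like this, I'd guess (my own sketch, not their actual code):

    package main

    import (
        "encoding/binary"
        "fmt"
        "hash/fnv"
        "net"
    )

    // outerSrcPort derives the encapsulating UDP source port from a hash of
    // the inner TCP connection's 4-tuple. The result is still just a 16-bit
    // port number, but it is stable per flow and varies between flows.
    func outerSrcPort(srcIP, dstIP net.IP, srcPort, dstPort uint16) uint16 {
        h := fnv.New32a()
        h.Write(srcIP.To16())
        h.Write(dstIP.To16())
        var p [4]byte
        binary.BigEndian.PutUint16(p[0:2], srcPort)
        binary.BigEndian.PutUint16(p[2:4], dstPort)
        h.Write(p[:])
        // Keep it out of the well-known range; any stable 16-bit value
        // derived from the flow would do.
        return uint16(h.Sum32()%(65535-1024)) + 1024
    }

    func main() {
        fmt.Println(outerSrcPort(net.ParseIP("203.0.113.7"), net.ParseIP("192.0.2.10"), 51744, 443))
    }

The receiving stack never has to "understand" anything special; the port is only there to feed the hash functions along the way.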
This is the first I've heard of Rendezvous hashing. It seems superior in every respect to the ring-based consistent hashing I've heard much more about. Why is the ring-based method more common?
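For anyone else who hadn't seen it, the whole algorithm fits in a few lines: hash the key together with each node and pick the highest score, so removing a node only remaps the keys that node owned. A quick sketch of the idea (mine, not GLB's implementation):

    package main

    import (
        "fmt"
        "hash/fnv"
    )

    // pick returns the node that "wins" for a given key under rendezvous
    // (highest-random-weight) hashing.
    func pick(key string, nodes []string) string {
        var best string
        var bestScore uint64
        for _, n := range nodes {
            h := fnv.New64a()
            h.Write([]byte(key))
            h.Write([]byte(n))
            if s := h.Sum64(); s >= bestScore {
                bestScore, best = s, n
            }
        }
        return best
    }

    func main() {
        nodes := []string{"proxy-1", "proxy-2", "proxy-3"}
        fmt.Println(pick("10.0.0.1:51744->203.0.113.7:443", nodes))
    }

The obvious cost is that every lookup is O(number of nodes), whereas a ring lets you binary-search, which may be part of why the ring variant caught on.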
The article states:

> "Each server has a bonded pair of network interfaces, and those interfaces are shared between DPDK and Linux on GLB director servers."

What's the distinction between DPDK and Linux here? It wasn't clear to me why SR-IOV is needed in this design. Does DPDK need to "own" the entire NIC device; is that it? In other words, are DPDK and regular kernel networking mutually exclusive options on the NIC? Is that correct?
Maybe I don't see the use case since I'm not at that scale, but it seems like a lot of added complexity for what appears to be hacking around other load-balancing solutions with a custom Layer 4 option?

Edit: It's a question; if you downvote, please let me know why it's a better solution.