There are a number of concrete problems:<p>- syscall interfaces are a mess: the primitive APIs are too slow for regular-sized packets (~1500 bytes), and the per-call overhead is too high. GSO helps, but it's a horrible API, and it's been buggy even recently due to complexity and poor code standards.<p>- syscall costs got even higher with Spectre mitigations, and that story likely isn't over. We need a replacement for the BSD sockets / POSIX APIs; they're terrible this decade. Yes, io_uring is fancy, but there's a tutorial-level API middle ground possible that should be safe and 10x lower overhead without resorting to uring-level complexity.<p>- system UDP buffers are far too small by default - they're much, much smaller than their TCP siblings, essentially no one but experts has been tuning them, and experts just retune things themselves.<p>- UDP stack optimizations are possible (such as route-lookup reuse without connect(2)); GSO demonstrates this, though as noted above GSO is highly fallible, quite expensive itself, and wholly, unnecessarily intricate in design for what we need, particularly as we want to do this safely from unprivileged userspace.<p>- several optimizations currently available only work at low/mid scale, such as connect(2) binding to (potentially) avoid route lookups, or GSO only paying off on a socket without heavy peer competition (competing peers result in short offload chains due to the single-peer constraint, eroding the overhead wins).<p>Despite all this, you can implement GSO and get substantial performance improvements; we (Tailscale) have on Linux. At some point platforms will need to increase platform-side buffer sizes for lower-end systems, high load/concurrency, BDP and so on, but buffers and congestion control are a highly complex and sometimes quite sensitive topic - nonetheless, when many applications are doing this (the presumed future state), the need will be there.
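As a concrete illustration of the GSO point (a minimal Linux-specific sketch, not Tailscale's actual code): with UDP_SEGMENT the application hands the kernel one large buffer and the kernel splits it into equal-sized datagrams, so the per-syscall cost is amortized across many packets.

```c
/* Minimal sketch, assuming Linux >= 4.18 and a path that supports UDP GSO.
 * One send() of up to ~64 KiB is segmented by the kernel into gso_size-byte
 * datagrams, amortizing syscall overhead. Error handling is trimmed. */
#include <netinet/in.h>
#include <sys/socket.h>
#include <stdio.h>

#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103   /* from linux/udp.h; older libc headers lack it */
#endif

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int gso_size = 1200;  /* e.g. one QUIC packet per segment */
    if (setsockopt(fd, IPPROTO_UDP, UDP_SEGMENT, &gso_size, sizeof(gso_size)) < 0)
        perror("UDP_SEGMENT");  /* kernel/NIC path doesn't support UDP GSO */
    /* ...connect() to the peer, then send() a multi-packet buffer... */
    return 0;
}
```

The fallibility mentioned above is real: the option can be unsupported, and a send can still fail mid-batch, so callers need a non-GSO fallback path.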
In the early days of QUIC, many people pointed out that the UDP stack has had far far less optimization put into it than the TCP stack. Sure enough, some of the issues identified here arise because the UDP stack isn't doing things that it <i>could</i> do but that nobody has been motivated to make it do, such as UDP generic receive offload. Papers like this are very likely to lead to optimizations both obvious and subtle.
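For the receive side, the UDP generic receive offload mentioned above is exposed to sockets on newer kernels; a minimal sketch (assuming Linux >= 5.0) looks like this:

```c
/* Minimal sketch, assuming Linux >= 5.0: enable UDP generic receive offload
 * so recvmsg() can return one large coalesced buffer instead of one
 * ~1500-byte datagram per syscall. The segment size of the coalesced data
 * is reported back as a UDP_GRO control message on each recvmsg(). */
#include <netinet/in.h>
#include <sys/socket.h>
#include <stdio.h>

#ifndef UDP_GRO
#define UDP_GRO 104   /* from linux/udp.h; older libc headers lack it */
#endif

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int on = 1;
    if (setsockopt(fd, IPPROTO_UDP, UDP_GRO, &on, sizeof(on)) < 0)
        perror("UDP_GRO");  /* kernel too old or GRO unavailable */
    return 0;
}
```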
Even HTTP/2 seems to have been rushed[1]. Chrome has removed support for server push. Maybe more thought should be put into these protocols instead of just rebranding whatever Google is trying to impose on us.<p>[1] <a href="https://varnish-cache.org/docs/trunk/phk/h2againagainagain.html" rel="nofollow">https://varnish-cache.org/docs/trunk/phk/h2againagainagain.h...</a>
> we identify the root cause to be high receiver-side processing overhead<p>I find this to be the issue when it comes to Google, and I bet it was known beforehand: pushing processing to the user. For example, the AV1 video codec was deployed when no consumer had HW decoding capabilities. It saved them on space at the expense of increased CPU usage for the end user.<p>I don't know what the motive was there; it would still show that they are carbon-neutral while billions of users are busy processing the data.
Seems to be available on arXiv: <a href="https://arxiv.org/pdf/2310.09423" rel="nofollow">https://arxiv.org/pdf/2310.09423</a>
>The results show that QUIC and HTTP/2 exhibit similar performance when the network bandwidth is relatively low (below ∼600 Mbps)<p>>Next, we investigate more realistic scenarios by conducting the same file download experiments on major browsers: Chrome, Edge, Firefox, and Opera. We observe that the performance gap is even larger than that in the cURL and quic_client experiments: on Chrome, QUIC begins to fall behind when the bandwidth exceeds ∼500 Mbps.<p>Okay, well, this isn't going to be a problem over the general Internet; it's more of a problem in local networks.<p>For people who have high-speed connections, how often are you getting >500 Mbps from a single source?
Don't have access to the published version, but the draft at <a href="https://arxiv.org/pdf/2310.09423" rel="nofollow">https://arxiv.org/pdf/2310.09423</a> mentions a ping RTT of 0.23ms.<p>As someone frequently at 150ms+ latency for a lot of websites (and semi-frequently 300ms+ for non-geo-distributed websites), in practice at those latencies QUIC is easily the best for throughput, HTTP/1.1 with a decent number of parallel connections is a not-that-distant second, and HTTP/2 is a remote third due to head-of-line blocking issues if/when a packet goes missing.
Currently chewing my way laboriously through RFC 9000. Definitely concerned by how complex it is. The high-level ideas of QUIC seem fairly straightforward, but the spec feels full of edge cases you must account for. Maybe there's no other way, but it makes me uncomfortable.<p>I don't mind too much as long as they never try to take HTTP/1.1 from me.
Free PDF file of the research: <a href="https://arxiv.org/pdf/2310.09423" rel="nofollow">https://arxiv.org/pdf/2310.09423</a>
I don't have access to the paper, but based on the abstract and a quick scan of the presentation, I can confirm that I have seen results like this in Caddy, which enables HTTP/3 out of the box.<p>HTTP/3 implementations vary widely at the moment, and will likely take another decade to optimize to homogeneity. But even then, QUIC requires a <i>lot</i> of state management that TCP doesn't have to worry about (even in the kernel). There's a ton of processing involved with every UDP packet, and small MTUs, still ingrained in many middle boxes and even end-user machines these days, don't make it any better.<p>So, yeah, as I felt about QUIC ... oh, about 6 years ago or so ... HTTP/2 is actually quite good enough for most use cases. The far reaches of the world and those without fast connections will benefit, but the majority of global transmissions will likely be best served with HTTP/2.<p>Intuitively, I consider each HTTP major version an increased order of magnitude in complexity. From 1 to 2, the main added complexities are binary framing (that's debatable, since it's technically simpler from an encoding standpoint), header compression, and streams; then with HTTP/3 there's _so, so much_ it does to make it work. It _can_ be faster -- that's proven -- but only when networks are slow.<p>TCP congestion control is its own worst enemy, but when networks aren't congested (and with the right algorithm)... guess what. It's fast! And in-order packet delivery (the source of head-of-line blocking) makes endpoint code so much simpler and faster. It's no wonder TCP is faster these days when networks are fast.<p>I think servers should offer HTTP/3, but clients should be choosy about when to use it, for the sake of their own experience/performance.
For us, what QUIC solves is that mobile users who move around on the subway and so on no longer get these huge latency spikes, which was one of our biggest complaints.
Something that nobody seems to be talking about here is the congestion control algorithm, which is <i>the</i> problem here. Cubic doesn't like losses. At all. In the kernel, pacing is implemented to minimise losses, allowing Cubic to work acceptably for TCP, but if the network is slightly lossy, the performance is terrible anyway. QUIC strongly recommends implementing pacing, but it's harder to do accurately in userland, where you have to cross a whole chain of components, than at the queue level in the kernel.<p>Most QUIC implementations use different variations around the protocol to make it behave significantly better, such as preserving the last metrics on a loss so that, if it turns out to have been only a reorder, they can be restored, etc. The article should have compared different server-side implementations, with different settings. We're used to seeing ratios of 1:20 in some transatlantic tests.<p>And testing a BBR-enabled QUIC implementation shows tremendous gains compared to TCP with Cubic. Ratios of 1:10 are not uncommon with moderate latency (100ms) and losses (1-3%).<p>If nothing else, what QUIC is making clear is that if TCP has worked so poorly for a very long time (remember that the rationale for QUIC was that it was impossible to fix TCP everywhere), it's in large part due to congestion control algorithms, and that because they were implemented in the kernel by people carefully reading academic papers that never consider reality, only in-lab measurements, such algorithms behave pretty poorly in front of the real internet, where jitter, reordering, losses, duplicates etc. are normal. QUIC has allowed many developers to put their fingers in the algorithms, adjust some thresholds and mechanisms, and we're seeing things improve fast (it could have improved faster if OpenSSL hadn't decided to work against QUIC a few years ago by refusing to implement the API everyone needed, forcing people to rely on locally built SSL libraries to use QUIC). I'm pretty sure that within 2-3 years we'll see some of the QUIC improvements ported to TCP, just because QUIC is a great playground for experimenting with these algorithms that for four decades had been the reserved territory of a few people who ignored the net as it is and designed for the net as they dreamed it.<p>Look at this for example, it summarizes it all: <a href="https://huitema.wordpress.com/2019/11/11/implementing-cubic-congestion-control-in-quic/" rel="nofollow">https://huitema.wordpress.com/2019/11/11/implementing-cubic-...</a>
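For contrast with the kernel side of that argument, picking the congestion controller for a TCP socket is a one-line setsockopt, but everything beyond that choice is locked inside the kernel. A minimal sketch (Linux-only; it only succeeds if the bbr module is available and allowed):

```c
/* Minimal sketch, Linux-only: request BBR for a single TCP socket via
 * TCP_CONGESTION. Whether it succeeds depends on the kernel having bbr
 * built/allowed; with QUIC, by contrast, the whole algorithm (pacing,
 * loss handling, thresholds) lives in userland and can be tweaked freely. */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <string.h>
#include <stdio.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    const char *cc = "bbr";
    if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, cc, strlen(cc)) < 0)
        perror("TCP_CONGESTION");  /* bbr not available on this kernel */
    return 0;
}
```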
I think one of the reasons Google chose UDP is that it's already a popular protocol, on which you can build reliable delivery while still having the base UDP unreliability available on the side.<p>From my perspective, which is a web developer's, having QUIC allowed web standards to easily piggyback on top of it for the WebTransport API, which is way better than the current HTTP stack and than WebRTC, which is a complete mess.
Basically it gives the web both a TCP-like and a UDP-like transport.<p>Knowing this, it makes more sense to me why Google chose this approach, which some people seem to be criticizing.
Netflix has gotten TCP/TLS up to 800 Gbps (over many streams):<p>* <a href="https://news.ycombinator.com/item?id=32519881">https://news.ycombinator.com/item?id=32519881</a><p>* <a href="https://news.ycombinator.com/item?id=33449297">https://news.ycombinator.com/item?id=33449297</a><p>hitting 100 Gbps (20k-30k customers) using less that 100W:<p>* <a href="https://twitter.com/ocochardlabbe/status/1781848334145130661" rel="nofollow">https://twitter.com/ocochardlabbe/status/1781848334145130661</a><p>* <a href="https://news.ycombinator.com/item?id=40630699#unv_40630785">https://news.ycombinator.com/item?id=40630699#unv_40630785</a>
I wonder if the trick might be to repurpose technology from server hardware: partition the physical NIC into virtual PCI-e devices with distinct addresses, and map to user-space processes instead of virtual machines.<p>So in essence, each browser tab or even each listening UDP socket could have a distinct IPv6 address dedicated to it, with packets delivered into a ring buffer in user-mode. This is so similar to what goes on with hypervisors now that existing hardware designs might even be able to handle it already.<p>Just an idle thought...
it says it isn't fast _enough_<p>but as far as I can tell it's fast _enough_, just not as fast as it could be<p>mainly they seem to test situations related to bandwidth/latency which aren't very realistic for the majority of users (because most users don't have super-fast, high-bandwidth internet)<p>this doesn't mean QUIC can't be faster or that we shouldn't look into reducing overhead, just that it's likely not as big a deal as it might initially look
There's a work in progress for kernel support: <a href="https://github.com/lxin/quic">https://github.com/lxin/quic</a>
QUIC is a standard problem across any number of clients who choose Zscaler and similar content-inspection tools. You can block it at the policy level, but you also need to have it disabled at the browser level - which sometimes magically turns back on and leads to a flurry of tickets for 'slow internet', 'Google search not working', etcetera.
Does QUIC do better with packet loss compared to TCP? TCP perceives packet loss as network congestion, so throughput over high-bandwidth, high-packet-loss links suffers.