There are a number of concrete problems:<p>- syscall interfaces are a mess: the primitive APIs are too slow for regular-sized packets (~1500 bytes), and the per-call overhead is too high. GSO helps, but it's a horrible API, and it's been buggy even recently due to complexity and poor code standards.<p>- syscall costs got even higher with Spectre mitigations, and that story likely isn't over. We need a replacement for the BSD sockets / POSIX APIs; they're terrible this decade. Yes, io_uring is fancy, but there's a tutorial-level API middle ground possible that should be safe and 10x lower overhead without resorting to uring-level complexity.<p>- system UDP buffers are far too small by default - they're much, much smaller than their TCP siblings, essentially no one but experts has been tuning them, and experts just retune things themselves.<p>- UDP stack optimizations are possible (such as route-lookup reuse without connect(2)); GSO demonstrates this, though as noted above GSO is highly fallible, quite expensive itself, and wholly, unnecessarily intricate in design for what we need, particularly as we want to do this safely from unprivileged userspace.<p>- several optimizations currently available only work at low/mid scale, such as connect(2) binding to (potentially) avoid route lookups, or GSO only paying off on a socket without heavy peer competition (competing peers result in short offload chains due to the single-peer constraint, eroding the overhead wins).<p>Despite all this, you can implement GSO and get substantial performance improvements; we (Tailscale) have on Linux. At some point platforms will need to increase platform-side buffer sizes for lower-end systems, high load/concurrency, BDP and so on, but buffers and congestion control are a highly complex and sometimes quite sensitive topic - nonetheless, when many applications are doing this (the presumed future state), the need will be there.
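As a concrete illustration of the GSO point (a minimal Linux-specific sketch, not Tailscale's actual code): with UDP_SEGMENT the application hands the kernel one large buffer and the kernel splits it into equal-sized datagrams, so the per-syscall cost is amortized across many packets.

```c
/* Minimal sketch, assuming Linux >= 4.18 and a path that supports UDP GSO.
 * One send() of up to ~64 KiB is segmented by the kernel into gso_size-byte
 * datagrams, amortizing syscall overhead. Error handling is trimmed. */
#include <netinet/in.h>
#include <sys/socket.h>
#include <stdio.h>

#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103   /* from linux/udp.h; older libc headers lack it */
#endif

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int gso_size = 1200;  /* e.g. one QUIC packet per segment */
    if (setsockopt(fd, IPPROTO_UDP, UDP_SEGMENT, &gso_size, sizeof(gso_size)) < 0)
        perror("UDP_SEGMENT");  /* kernel/NIC path doesn't support UDP GSO */
    /* ...connect() to the peer, then send() a multi-packet buffer... */
    return 0;
}
```

The fallibility mentioned above is real: the option can be unsupported, and a send can still fail mid-batch, so callers need a non-GSO fallback path.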
In the early days of QUIC, many people pointed out that the UDP stack has had far far less optimization put into it than the TCP stack. Sure enough, some of the issues identified here arise because the UDP stack isn't doing things that it <i>could</i> do but that nobody has been motivated to make it do, such as UDP generic receive offload. Papers like this are very likely to lead to optimizations both obvious and subtle.
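For the receive side, the UDP generic receive offload mentioned above is exposed to sockets on newer kernels; a minimal sketch (assuming Linux >= 5.0) looks like this:

```c
/* Minimal sketch, assuming Linux >= 5.0: enable UDP generic receive offload
 * so recvmsg() can return one large coalesced buffer instead of one
 * ~1500-byte datagram per syscall. The segment size of the coalesced data
 * is reported back as a UDP_GRO control message on each recvmsg(). */
#include <netinet/in.h>
#include <sys/socket.h>
#include <stdio.h>

#ifndef UDP_GRO
#define UDP_GRO 104   /* from linux/udp.h; older libc headers lack it */
#endif

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int on = 1;
    if (setsockopt(fd, IPPROTO_UDP, UDP_GRO, &on, sizeof(on)) < 0)
        perror("UDP_GRO");  /* kernel too old or GRO unavailable */
    return 0;
}
```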
Even HTTP/2 seems to have been rushed[1]. Chrome has removed support for server push. Maybe more thought should be put into these protocols instead of just rebranding whatever Google is trying to impose on us.<p>[1] <a href="https://varnish-cache.org/docs/trunk/phk/h2againagainagain.html" rel="nofollow">https://varnish-cache.org/docs/trunk/phk/h2againagainagain.h...</a>
> we identify the root cause to be high receiver-side processing overhead<p>I find this to be the issue when it comes to Google, and I bet it was known beforehand: pushing processing to the user. For example, the AV1 video codec was deployed when no consumer had HW decoding capabilities. It saved them on space at the expense of increased CPU usage for the end user.<p>I don't know what the motive was there; it would still show that they are carbon-neutral while billions of users are busy processing the data.
Seems to be available on arXiv: <a href="https://arxiv.org/pdf/2310.09423" rel="nofollow">https://arxiv.org/pdf/2310.09423</a>
>The results show that QUIC and HTTP/2 exhibit similar performance when the network bandwidth is relatively low (below ∼600 Mbps)<p>>Next, we investigate more realistic scenarios by conducting the same file download experiments on major browsers: Chrome, Edge, Firefox, and Opera. We observe that the performance gap is even larger than that in the cURL and quic_client experiments: on Chrome, QUIC begins to fall behind when the bandwidth exceeds ∼500 Mbps.<p>Okay, well, this isn't going to be a problem over the general Internet; it's more of a problem in local networks.<p>For people who have high-speed connections, how often are you getting >500 Mbps from a single source?
Don't have access to the published version, but the draft at <a href="https://arxiv.org/pdf/2310.09423" rel="nofollow">https://arxiv.org/pdf/2310.09423</a> mentions a ping RTT of 0.23ms.<p>As someone frequently at 150ms+ latency for a lot of websites (and semi-frequently 300ms+ for non-geo-distributed websites), in practice at those latencies QUIC is easily the best for throughput, HTTP/1.1 with a decent number of parallel connections is a not-that-distant second, and HTTP/2 is a remote third due to head-of-line blocking issues if/when a packet goes missing.
Currently chewing my way laboriously through RFC 9000. Definitely concerned by how complex it is. The high-level ideas of QUIC seem fairly straightforward, but the spec feels full of edge cases you must account for. Maybe there's no other way, but it makes me uncomfortable.<p>I don't mind too much as long as they never try to take HTTP/1.1 from me.
Free PDF file of the research: <a href="https://arxiv.org/pdf/2310.09423" rel="nofollow">https://arxiv.org/pdf/2310.09423</a>
I don't have access to the paper, but based on the abstract and a quick scan of the presentation, I can confirm that I have seen results like this in Caddy, which enables HTTP/3 out of the box.<p>HTTP/3 implementations vary widely at the moment, and will likely take another decade to optimize to homogeneity. But even then, QUIC requires a <i>lot</i> of state management that TCP doesn't have to worry about (even in the kernel). There's a ton of processing involved with every UDP packet, and small MTUs, still ingrained in many middle boxes and even end-user machines these days, don't make it any better.<p>So, yeah, as I felt about QUIC ... oh, about 6 years ago or so ... HTTP/2 is actually quite good enough for most use cases. The far reaches of the world and those without fast connections will benefit, but the majority of global transmissions will likely be best served with HTTP/2.<p>Intuitively, I consider each HTTP major version an increased order of magnitude in complexity. From 1 to 2, the main added complexities are binary framing (that's debatable, since it's technically simpler from an encoding standpoint), header compression, and streams; then with HTTP/3 there's _so, so much_ it does to make it work. It _can_ be faster -- that's proven -- but only when networks are slow.<p>TCP congestion control is its own worst enemy, but when networks aren't congested (and with the right algorithm)... guess what. It's fast! And in-order packet delivery (the source of head-of-line blocking) makes endpoint code so much simpler and faster. It's no wonder TCP is faster these days when networks are fast.<p>I think servers should offer HTTP/3, but clients should be choosy about when to use it, for the sake of their own experience/performance.
For us, what QUIC solves is that mobile users who move around on the subway and so on no longer get these huge latency spikes, which was one of our biggest complaints.
Something that nobody seems to be talking about here is the congestion control algorithm, which is <i>the</i> problem here. Cubic doesn't like losses. At all. In the kernel, pacing is implemented to minimise losses, allowing Cubic to work acceptably for TCP, but if the network is slightly lossy, the performance is terrible anyway. QUIC strongly recommends implementing pacing, but it's harder to do accurately in userland, where you have to cross a whole chain of components, than at the queue level in the kernel.<p>Most QUIC implementations use different variations around the protocol to make it behave significantly better, such as preserving the last metrics on a loss so that, if it turns out to have been only a reorder, they can be restored, etc. The article should have compared different server-side implementations, with different settings. We're used to seeing ratios of 1:20 in some transatlantic tests.<p>And testing a BBR-enabled QUIC implementation shows tremendous gains compared to TCP with Cubic. Ratios of 1:10 are not uncommon with moderate latency (100ms) and losses (1-3%).<p>If nothing else, what QUIC is making clear is that if TCP has worked so poorly for a very long time (remember that the rationale for QUIC was that it was impossible to fix TCP everywhere), it's in large part due to congestion control algorithms, and that because they were implemented in the kernel by people carefully reading academic papers that never consider reality, only in-lab measurements, such algorithms behave pretty poorly in front of the real internet, where jitter, reordering, losses, duplicates etc. are normal. QUIC has allowed many developers to put their fingers in the algorithms, adjust some thresholds and mechanisms, and we're seeing things improve fast (it could have improved faster if OpenSSL hadn't decided to work against QUIC a few years ago by refusing to implement the API everyone needed, forcing people to rely on locally built SSL libraries to use QUIC). I'm pretty sure that within 2-3 years we'll see some of the QUIC improvements ported to TCP, just because QUIC is a great playground for experimenting with these algorithms that for four decades had been the reserved territory of a few people who ignored the net as it is and designed for the net as they dreamed it.<p>Look at this for example, it summarizes it all: <a href="https://huitema.wordpress.com/2019/11/11/implementing-cubic-congestion-control-in-quic/" rel="nofollow">https://huitema.wordpress.com/2019/11/11/implementing-cubic-...</a>
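For contrast with the kernel side of that argument, picking the congestion controller for a TCP socket is a one-line setsockopt, but everything beyond that choice is locked inside the kernel. A minimal sketch (Linux-only; it only succeeds if the bbr module is available and allowed):

```c
/* Minimal sketch, Linux-only: request BBR for a single TCP socket via
 * TCP_CONGESTION. Whether it succeeds depends on the kernel having bbr
 * built/allowed; with QUIC, by contrast, the whole algorithm (pacing,
 * loss handling, thresholds) lives in userland and can be tweaked freely. */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <string.h>
#include <stdio.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    const char *cc = "bbr";
    if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, cc, strlen(cc)) < 0)
        perror("TCP_CONGESTION");  /* bbr not available on this kernel */
    return 0;
}
```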
I think one of the reasons Google chose UDP is that it's already a popular protocol, on which you can build reliable delivery while still having the base UDP unreliability available on the side.<p>From my perspective, which is a web developer's, having QUIC allowed web standards to easily piggyback on top of it for the WebTransport API, which is way better than the current HTTP stack and than WebRTC, which is a complete mess.
Basically it gives the web both a TCP-like and a UDP-like transport.<p>Knowing this, it makes more sense to me why Google chose this approach, which some people seem to be criticizing.
Netflix has gotten TCP/TLS up to 800 Gbps (over many streams):<p>* <a href="https://news.ycombinator.com/item?id=32519881">https://news.ycombinator.com/item?id=32519881</a><p>* <a href="https://news.ycombinator.com/item?id=33449297">https://news.ycombinator.com/item?id=33449297</a><p>hitting 100 Gbps (20k-30k customers) using less that 100W:<p>* <a href="https://twitter.com/ocochardlabbe/status/1781848334145130661" rel="nofollow">https://twitter.com/ocochardlabbe/status/1781848334145130661</a><p>* <a href="https://news.ycombinator.com/item?id=40630699#unv_40630785">https://news.ycombinator.com/item?id=40630699#unv_40630785</a>
I wonder if the trick might be to repurpose technology from server hardware: partition the physical NIC into virtual PCI-e devices with distinct addresses, and map to user-space processes instead of virtual machines.<p>So in essence, each browser tab or even each listening UDP socket could have a distinct IPv6 address dedicated to it, with packets delivered into a ring buffer in user-mode. This is so similar to what goes on with hypervisors now that existing hardware designs might even be able to handle it already.<p>Just an idle thought...
it says it isn't fast _enough_<p>but as far as I can tell it's fast _enough_, just not as fast as it could be<p>mainly they seem to test situations related to bandwidth/latency which aren't very realistic for the majority of users (because most users don't have super-fast, high-bandwidth internet)<p>this doesn't mean QUIC can't be faster or that we shouldn't look into reducing overhead, just that it's likely not as big a deal as it might initially look
There's a work in progress for kernel support: <a href="https://github.com/lxin/quic">https://github.com/lxin/quic</a>
QUIC is a standard problem across any number of clients who choose Zscaler and similar content-inspection tools. You can block it at the policy level, but you also need to have it disabled at the browser level - which sometimes magically turns back on and leads to a flurry of tickets for 'slow internet', 'Google search not working', etcetera.
Does QUIC do better with packet loss compared to TCP? TCP perceives packet loss as network congestion, so throughput over high-bandwidth, high-packet-loss links suffers.