I think the article should be called "How do Go standard library HTTP servers figure out Content-Length?".<p>In most HTTP server implementations from other languages I've worked with, I recall having to either:<p>- explicitly define the Content-Length up-front (clients then usually don't like it if you send too little, and servers don't like it if you send too much)<p>- have a single "write" operation with an object whose Content-Length can be figured out quite easily<p>- turn on chunking myself and handle the chunk writing myself<p>I don't recall having seen the kind of automatic chunking described in the article before (and I'm not sure I'm a fan of it).
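For reference, this is roughly what that automatic behaviour looks like from a Go handler's point of view, as I understand it (the port and bodies are made up): a response that finishes before net/http's internal buffer flushes gets a computed Content-Length, while anything bigger falls back to chunked encoding on HTTP/1.1.

```go
package main

import (
	"net/http"
	"strings"
)

func main() {
	// Small body: the handler returns before net/http's internal buffer
	// flushes, so the server computes and sets Content-Length itself.
	http.HandleFunc("/small", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello"))
	})

	// Large body: once the buffer flushes mid-handler, the server can no
	// longer know the total size up front, so it switches to
	// Transfer-Encoding: chunked (on HTTP/1.1).
	http.HandleFunc("/big", func(w http.ResponseWriter, r *http.Request) {
		big := strings.Repeat("x", 1<<20)
		w.Write([]byte(big))
	})

	http.ListenAndServe(":8080", nil)
}
```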
And if you set your own Content-Length header, most HTTP servers will respect it and not chunk. That way, you can stream a 4-gig file whose size you know from the metadata. This makes downloading nicer because browsers and such will then show a progress bar and time estimate.<p>However, you'd better be right! I just found a bug in some really old code that was gzipping every response when appropriate (i.e. requested by the client, textual content, etc.) but ignoring the Content-Length header. So if the header was set manually, it would be wrong after compression. That caused insidious bugs for years. The fix, obviously, was to just delete that manual header if the stream was going to be compressed.
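A minimal sketch of that kind of fix in Go (the names and structure are mine, not the original code): a gzip wrapper that drops any handler-set Content-Length right before the headers go out, so the stale uncompressed size never reaches the client.

```go
package main

import (
	"compress/gzip"
	"net/http"
	"strings"
)

type gzipResponseWriter struct {
	http.ResponseWriter
	gz          *gzip.Writer
	wroteHeader bool
}

func (w *gzipResponseWriter) WriteHeader(status int) {
	if w.wroteHeader {
		return
	}
	w.wroteHeader = true
	// Any manually set Content-Length describes the *uncompressed* body,
	// so drop it before the headers are sent and let the server chunk.
	w.Header().Del("Content-Length")
	w.ResponseWriter.WriteHeader(status)
}

func (w *gzipResponseWriter) Write(b []byte) (int, error) {
	if !w.wroteHeader {
		w.WriteHeader(http.StatusOK)
	}
	return w.gz.Write(b)
}

// withGzip compresses responses only when the client asked for gzip.
func withGzip(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
			next.ServeHTTP(w, r)
			return
		}
		gz := gzip.NewWriter(w)
		defer gz.Close()
		w.Header().Set("Content-Encoding", "gzip")
		next.ServeHTTP(&gzipResponseWriter{ResponseWriter: w, gz: gz}, r)
	})
}
```

With something like that in place, handlers can keep setting Content-Length for the uncompressed case and the wrapper quietly removes it whenever gzip kicks in.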
Along this theme of knowledge, there is the lost art of tuning your page and content sizes so that they fit in as few packets as possible to speed up transmission. The front page of Google, for example, famously fit in a single packet (I don't know if that's still the case). There is a brilliant book from the Yahoo Exceptional Performance Team that used to be a bit of a bible in the world of web sysadmin; it's less relevant these days, but interesting for understanding the era.<p><a href="https://www.oreilly.com/library/view/high-performance-web/9780596529307/" rel="nofollow">https://www.oreilly.com/library/view/high-performance-web/97...</a>
Unfortunately the article doesn't mention compression, which is where it gets really ugly (especially with range requests): IIRC the Content-Length reported in HTTP responses and the range defined in range requests apply to the compressed data, but at least in browsers you only get the uncompressed data back and don't even have access to the compressed bytes.
The results might be totally different now, but back in 2014 I looked at how browsers behave if the resource size differs from the Content-Length <a href="https://github.com/w3c/ServiceWorker/issues/362#issuecomment-49011736">https://github.com/w3c/ServiceWorker/issues/362#issuecomment...</a><p>Also, in 2018, some fun where, when downloading a file, browsers report bytes written to disk vs the Content-Length, which is wildly off once you factor in gzip <a href="https://x.com/jaffathecake/status/996720156905820160" rel="nofollow">https://x.com/jaffathecake/status/996720156905820160</a>
When I worked on a commercial HTTP proxy in the early 2000s, it was very common for servers to return off-by-one values for Content-Length - so much so that we had to implement heuristics to ignore and fix such errors.<p>It may be better now but a huge number of libraries and frameworks would either include the terminating NULL byte in the count but not send it, or not include the terminator in the count but include it in the stream.
Note that there can be trailer fields (the phrase "trailing header" is both an oxymoron and a good description of it): <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Trailer" rel="nofollow">https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Tr...</a>
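Since the article is about Go, here's a small sketch of how net/http exposes trailers (the X-Body-Sha256 name is just an example I made up): you declare the trailer name before writing the body, then fill in its value afterwards, and the server sends the response chunked with the trailer appended after the final chunk.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"net/http"
)

func handler(w http.ResponseWriter, r *http.Request) {
	// Announce the trailer before any write; keys declared here are
	// treated as trailers when set later.
	w.Header().Set("Trailer", "X-Body-Sha256")

	body := []byte("some body produced on the fly")
	w.Write(body)

	// Now that the body (and therefore its hash) is known, set the value;
	// it is sent after the last chunk rather than with the headers.
	sum := sha256.Sum256(body)
	w.Header().Set("X-Body-Sha256", hex.EncodeToString(sum[:]))
}

func main() {
	http.HandleFunc("/", handler)
	http.ListenAndServe(":8080", nil)
}
```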
Next up is how forms with (multiple) attachments are uploaded with Content-Type: multipart/form-data; boundary=$something_unique<p><a href="https://notes.benheater.com/books/web/page/multipart-forms-and-boundary-parameters" rel="nofollow">https://notes.benheater.com/books/web/page/multipart-forms-a...</a>
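If it helps, a sketch of how that boundary ends up in the header using Go's mime/multipart (URL and field names are made up):

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

func main() {
	var buf bytes.Buffer
	w := multipart.NewWriter(&buf) // generates a random boundary for us

	// One plain form field and one file part, each separated by the boundary.
	w.WriteField("description", "two attachments")
	part, _ := w.CreateFormFile("upload", "hello.txt")
	part.Write([]byte("hello, multipart"))
	w.Close() // writes the closing boundary

	req, _ := http.NewRequest("POST", "https://example.com/upload", &buf)
	// e.g. "multipart/form-data; boundary=<random token>"
	req.Header.Set("Content-Type", w.FormDataContentType())
	fmt.Println(req.Header.Get("Content-Type"))
}
```

Because the whole body sits in a bytes.Buffer with a known length here, the client can send a plain Content-Length; stream the parts instead (e.g. via io.Pipe) and you're back in chunked territory.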
It's a nice exercise in any web framework to figure out how you would serve a big response without buffering it in memory. This can be surprisingly hard with some frameworks that just assume you are buffering the entire response in memory. Usually, if you look hard, there is a way around this.<p>Buffering can be appropriate for small responses, or at least convenient, but for bigger responses it can be error prone. If you do this right, you serve the first byte of the response to the user before you read the last byte from wherever you are reading (database, file system, S3, etc.). If you do it wrong, you might run out of memory, or the user's request times out before you are ready to respond.<p>This is a thing that's gotten harder with non-blocking frameworks. Spring Boot in particular can be a PITA on this front if you use it with non-blocking IO. I had some fun figuring that out some years ago. Using Kotlin makes it slightly easier to deal with low-level Spring internals (fluxes and what not).<p>Sometimes the right answer is that it's too expensive to figure out the content length, or a content hash. Whatever you do, you need to send the headers with that information before you send anything else. And if you need to read everything before you can calculate that information, your choices are buffering or omitting it.
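In Go terms (since that's what the article covers) the streaming version looks roughly like this; the file path is hypothetical:

```go
package main

import (
	"io"
	"net/http"
	"os"
	"strconv"
)

// Stream a large file straight to the client: the first byte goes out long
// before the last byte is read, and memory usage stays flat.
func serveBig(w http.ResponseWriter, r *http.Request) {
	f, err := os.Open("/var/data/huge.bin") // hypothetical path
	if err != nil {
		http.Error(w, "not found", http.StatusNotFound)
		return
	}
	defer f.Close()

	// The size is cheap to get here, so we can send Content-Length up front
	// and still stream. If it weren't, we'd omit it and let chunking happen.
	if info, err := f.Stat(); err == nil {
		w.Header().Set("Content-Length", strconv.FormatInt(info.Size(), 10))
	}
	io.Copy(w, f) // copies in small chunks, never the whole file in memory
}

func main() {
	http.HandleFunc("/huge", serveBig)
	http.ListenAndServe(":8080", nil)
}
```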
Chunked progress is fun; not many people know that chunked encoding supports more than just sending the chunk size, and can synchronously multiplex extra information!<p>e.g. I drafted this a long time ago, because if you generate something live and send it in a streaming fashion, you can't have progress reporting since you don't know the final size in bytes, even though server-side you know how far into the generation you are.<p>This was used for multiple things, like generating CSV exports from a bunch of RDBMS records, or compressed tarballs from a set of files, or a bunch of other silly things like generating sequences (Fibonacci, random integers, whatever...) that could take "a while" (as in, long enough that it's friendly to report progress).<p><a href="https://github.com/lloeki/http-chunked-progress/blob/master/draft-lnageleisen-http-chunked-progress-00">https://github.com/lloeki/http-chunked-progress/blob/master/...</a>
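As a rough illustration of what that can look like on the wire, assuming the mechanism is a chunk extension (the extension name and syntax here are mine; see the draft above for the real ones), sketched in Go over a hijacked connection because net/http's built-in chunking doesn't expose extensions:

```go
package main

import (
	"fmt"
	"net/http"
)

// Hand-rolled chunked encoding with a chunk extension carrying progress.
// Purely illustrative: we hijack the connection and write the raw wire format.
func handler(w http.ResponseWriter, r *http.Request) {
	hj, ok := w.(http.Hijacker)
	if !ok {
		http.Error(w, "hijacking unsupported", http.StatusInternalServerError)
		return
	}
	conn, bufrw, err := hj.Hijack()
	if err != nil {
		return
	}
	defer conn.Close()

	bufrw.WriteString("HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\nConnection: close\r\n\r\n")

	total := 10
	for i := 1; i <= total; i++ {
		payload := fmt.Sprintf("part %d\n", i)
		// Chunk size in hex, then a ";progress=..." chunk extension:
		// the in-band side channel the progress reporting rides on.
		fmt.Fprintf(bufrw, "%x;progress=\"%d/%d\"\r\n%s\r\n", len(payload), i, total, payload)
		bufrw.Flush()
	}
	bufrw.WriteString("0\r\n\r\n") // final chunk, end of body
	bufrw.Flush()
}

func main() {
	http.HandleFunc("/stream", handler)
	http.ListenAndServe(":8080", nil)
}
```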
At least in the implementation I wrote, the default way to provide the body was a string... which has a length. For binary data I believe the API could accept either a std::vector<uint8_t> (which has a size) or a pointer and a size. If you needed chunked transfer encoding, you had to ask for it and then make repeated calls to write chunks (each of which has a fixed length).<p>To me the more interesting question is how web servers receive an incoming request. You want to be able to read the whole thing into a single buffer, but you don't know how long it's going to be until you actually read some of it. I learned recently that libc has a way to "peek" at some data without removing it from the recv buffer... I'm curious if this is ever used to optimize the receive process?
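That peek is recv(2) with MSG_PEEK. Here's a Linux-specific sketch of reaching it from Go (whether any mainstream server actually uses this to size its buffers, I don't know):

```go
package main

import (
	"fmt"
	"net"
	"syscall"
)

// peek looks at bytes sitting in the kernel receive buffer without
// consuming them, using recv() with the MSG_PEEK flag (Linux).
func peek(conn *net.TCPConn, n int) ([]byte, error) {
	raw, err := conn.SyscallConn()
	if err != nil {
		return nil, err
	}
	buf := make([]byte, n)
	var got int
	var serr error
	err = raw.Read(func(fd uintptr) bool {
		got, _, serr = syscall.Recvfrom(int(fd), buf, syscall.MSG_PEEK)
		// Returning true means "done"; a real implementation would
		// return false and retry on EAGAIN instead.
		return true
	})
	if err != nil {
		return nil, err
	}
	if serr != nil {
		return nil, serr
	}
	return buf[:got], nil
}

func main() {
	ln, _ := net.Listen("tcp", ":8080")
	c, _ := ln.Accept()
	head, _ := peek(c.(*net.TCPConn), 16)
	fmt.Printf("first bytes (still unread): %q\n", head)
}
```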
I have done crazy stuff to compute the content length of some payloads. For context, one of my clients works in cloud stuff, and I worked on converting HDD formats on the fly in a UI VM. The web server that accepts the files doesn't do chunked encoding, and there is no space to store the file. So I had to resort to passing over the input file once to transform it, computing its allocation table and transformed size, then throwing away everything but the file and the table, restarting the scan with the correct header, and re-doing the transformation.
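The generic shape of that two-pass trick, sketched in Go (transform here is a stand-in identity function, not the real disk-format conversion): run the conversion once into a counting writer to learn the output size, throw the output away, then rerun it for real with Content-Length set from the count.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// countingWriter discards output but remembers how many bytes passed through.
type countingWriter struct{ n int64 }

func (c *countingWriter) Write(p []byte) (int, error) {
	c.n += int64(len(p))
	return len(p), nil
}

// transform stands in for the on-the-fly format conversion; it must be
// deterministic for the two-pass trick to work.
func transform(dst io.Writer, src io.Reader) error {
	_, err := io.Copy(dst, src) // identity transform for the sketch
	return err
}

// measure runs the transformation once just to learn the output size.
func measure(open func() (io.ReadCloser, error)) (int64, error) {
	src, err := open()
	if err != nil {
		return 0, err
	}
	defer src.Close()
	var cw countingWriter
	if err := transform(&cw, src); err != nil {
		return 0, err
	}
	return cw.n, nil
}

func main() {
	open := func() (io.ReadCloser, error) {
		return io.NopCloser(strings.NewReader("pretend this is a disk image")), nil
	}
	size, err := measure(open)
	if err != nil {
		panic(err)
	}
	fmt.Println("Content-Length to send:", size)
	// Second pass (not shown): re-open the source and run the same
	// transformation straight into the upload request body.
}
```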
There is a whole class of attacks called HTTP Desync Attacks that target exactly that problem, since it is hard to get right, especially across multiple different HTTP stacks. And if you don't get it right, the result is that bytes are left over on the TCP connection and get read as the next request if the connection is reused.
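The textbook ambiguity looks like this (host and payload are made up): a hop that trusts Content-Length forwards 13 bytes of body, while one that trusts Transfer-Encoding stops at the empty chunk and leaves "SMUGGLED" sitting on the connection as the start of the "next" request.

```http
POST / HTTP/1.1
Host: example.com
Content-Length: 13
Transfer-Encoding: chunked

0

SMUGGLED
```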
> Anyone who has implemented a simple HTTP server can tell you that it is a really simple protocol<p>It's not. Like, hell no. It is so complex: multiplexing, the underlying TCP specifics, Server Push, stream prioritization, encryption (ALPN or NPN?), extensions like HSTS, CORS, WebDAV or HLS, ...<p>It's a great protocol, but nowhere near simple.<p>> Basically, it's a text file that has some specific rules to make parsing it easier.<p>Nope: since HTTP/2, that is just a textual representation, not the real on-the-wire protocol. HTTP/2 is 10 years old now.