Node.js incorrectly parses HTTP methods

165 points by willlll almost 12 years ago

18 comments

chrismorgan over 11 years ago

I've been working on implementing a solid HTTP library in Rust, currently at http://github.com/chris-morgan/rust-http. Servo has been using Joyent's HTTP parser, and this is a problem that I had observed with it. Yes, it is pretty badly implemented, but it's not a mess because of the style of the code; that's for performance. It's only a mess because it's inconsistent and incorrect.

Reading the HTTP method in a high-performance way does lead to superficially ugly code. That's why code generation is good. Ragel has been mentioned and I intend to seriously consider using it, but for the moment my own HTTP method reading code is generated with:

<pre><code>generate_branchified_method(
    writer,
    branchify!(case sensitive,
        "CONNECT" => Connect,
        "DELETE"  => Delete,
        "GET"     => Get,
        "HEAD"    => Head,
        "OPTIONS" => Options,
        "PATCH"   => Patch,
        "POST"    => Post,
        "PUT"     => Put,
        "TRACE"   => Trace
    ),
    1,
    "self.stream.read_byte()",
    "SP",
    "MAX_METHOD_LEN",
    "is_token_item(b)",
    "ExtensionMethod(%s)");
</code></pre>

This is pleasantly easy to read and meaningful.

This generates the high-performance artwork shown at http://sprunge.us/HdTH, which supports extension methods correctly. (Rust's algebraic data types are marvellous for many things in implementing such a spec.)
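As a rough illustration of the point about algebraic data types, here is a minimal sketch in present-day Rust of a method type with a catch-all variant for extension methods. It is not the generated code linked above; the names `Method`, `is_token_byte` and `parse_method` are hypothetical, and the token check assumes the `tchar` set from the HTTP/1.1 grammar (RFC 7230).

```rust
/// HTTP request methods, with a catch-all variant for extension methods.
/// (Hypothetical type for illustration; not taken from rust-http.)
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Method {
    Connect,
    Delete,
    Get,
    Head,
    Options,
    Patch,
    Post,
    Put,
    Trace,
    /// Any other syntactically valid token, kept verbatim.
    Extension(String),
}

/// Returns true if `b` may appear in an HTTP token (`tchar` in RFC 7230).
fn is_token_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b"!#$%&'*+-.^_`|~".contains(&b)
}

/// Parses a complete method token.
/// Known methods are matched in full and case-sensitively; any other valid
/// token becomes `Method::Extension`; an empty or invalid token yields `None`.
pub fn parse_method(token: &str) -> Option<Method> {
    if token.is_empty() || !token.bytes().all(is_token_byte) {
        return None;
    }
    Some(match token {
        "CONNECT" => Method::Connect,
        "DELETE" => Method::Delete,
        "GET" => Method::Get,
        "HEAD" => Method::Head,
        "OPTIONS" => Method::Options,
        "PATCH" => Method::Patch,
        "POST" => Method::Post,
        "PUT" => Method::Put,
        "TRACE" => Method::Trace,
        other => Method::Extension(other.to_string()),
    })
}
```

With a type like this, an unknown but well-formed method such as "PUN" is preserved as an extension method rather than being mistaken for a known one, and the caller decides how to respond to it.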

kiwidrew over 11 years ago

The most interesting part (to me) is that the server will handle e.g. "PUN" as though it actually said "PUT". I wonder if this could be used as an attack vector?

Sounds a lot like a "confused deputy" situation: imagine that your L7 firewall has a rule to reject any PUT request, but it sees PUN and thus allows the request to pass through to node.js, which then treats it as though it were actually PUT.
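The scenario can be sketched as follows. This is a deliberately simplified illustration of the class of bug, not the actual http-parser code: `lax_method` is a hypothetical parser that decides on the first bytes and the length only, and `firewall_allows` is a hypothetical rule that blocks the exact string "PUT".

```rust
/// Hypothetical lax parser: it looks at the leading bytes and the token
/// length, but never checks the remaining bytes.
fn lax_method(token: &str) -> Option<&'static str> {
    match token.as_bytes() {
        [b'G', _, _] => Some("GET"),
        [b'P', b'U', _] => Some("PUT"), // "PUN", "PUX", ... all land here
        [b'P', b'O', _, _] => Some("POST"),
        _ => None,
    }
}

/// Hypothetical firewall rule: reject only requests whose method is exactly "PUT".
fn firewall_allows(token: &str) -> bool {
    token != "PUT"
}

/// True when the firewall lets the request through but the backend
/// nevertheless treats it as a PUT.
fn bypasses_put_rule(token: &str) -> bool {
    firewall_allows(token) && lax_method(token) == Some("PUT")
}
```

Under these assumptions `bypasses_put_rule("PUN")` is true: the two components disagree about what the method is, which is the precondition for the confused-deputy problem described above.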

hosay123 over 11 years ago

The level of bikeshedded micro-optimization going on in that file is hilarious. The whole thing could be swapped out with a Ragel parser and nobody would notice a thing.

chrismorgan over 11 years ago

LINK was added after HTTP/1.0 and removed before HTTP/1.1. http://tools.ietf.org/html/draft-ietf-httpbis-method-registrations-12 is being cited, but that is still a draft, and the registry it refers to does not yet exist. I believe it is thus fair to say that LINK is not a standard method?

general_failure over 11 years ago

To people saying this is all micro-optimization: have you guys actually measured? If so, please post the numbers.

It's presumptuous to simply say 'all this is unnecessary' unless you have measured it, and we have no reason to believe the author hasn't measured it.

BTW, the file is copyright nginx.

andreineculau over 11 years ago

A few points to make:

- I value software that keeps to the spec, because it's the spec that I (as a dev or non-dev) refer to. You never hear "NodeJS has an HTTPish module", nor do you read documentation of that module's concepts and behaviour. Those are defined in the spec, and the __fill_in_with_any_language__ HTTP module just implements those definitions.
- Optimizations, simplifications and corrections should be made in the spec whenever you find them at implementation time.

But until now there has not been ONE HTTP server that grasps and handles the HTTP specs in their entirety. So I find it hilarious to read about optimizations when none of us has the whole picture.

That said, I don't think Node.js is to blame here (albeit they do have weird views of standards: https://github.com/joyent/node/issues/4850) but HTTP itself, because the spec's abstraction levels have been far out of the implementations' reach. HTTP's concepts are gorgeous, but they are worth nil if implementation is "hard" and never done properly.

Longer story at: http://andreineculau.github.io/hyperrest/2013-06-10-http-hell-no/

spullara over 11 years ago

There are plenty of good ways to optimize this code. They didn't pick any of them. What I find more surprising than anything is that they didn't just optimize GET and handle the rest generically.
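One way to read "optimize GET and handle the rest generically" is sketched below. This is an illustration under assumptions, not a proposal from the comment itself: `split_method` is a hypothetical helper, it assumes GET dominates real traffic, and the generic path accepts any non-empty run of bytes before the first space without validating token characters.

```rust
/// Splits the method off the start of a request line.
/// Returns the method bytes and the remainder after the separating space.
fn split_method(line: &[u8]) -> Option<(&[u8], &[u8])> {
    // Fast path: one exact prefix comparison for the (assumed) common case.
    if let Some(rest) = line.strip_prefix(b"GET ") {
        return Some((&line[..3], rest));
    }
    // Generic path: everything up to the first space is the method.
    let end = line.iter().position(|&b| b == b' ')?;
    if end == 0 {
        return None; // empty method
    }
    Some((&line[..end], &line[end + 1..]))
}
```

The fast path still compares every byte of "GET " exactly, so it cannot misread a different method; all other methods, including extension methods, take the slower but uniform path.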

sandfox almost 12 years ago

It should probably be noted that:

a) you don't have to use node's built-in HTTP server (yeah, I know, nearly everyone will); you are more than free to write your own or use one from its module repository (npm)

b) the entire HTTP module is currently being overhauled in the 0.11.x development branch, and the changes should appear in the stable 0.12.x releases.

Out of interest, has anyone seen what other web servers' support for these more 'esoteric' verbs is like?

nkuttler almost 12 years ago

Nice catch; reading the nodejs code is really discouraging. But the post points to the relevant specs, so I guess this should be fixed sooner rather than later. Initial response looks good: https://github.com/joyent/node/issues/6078

lnanek2 over 11 years ago

Not a big deal. Even the REST people have long been just sending normal GET and POST and including some indicator that they really want it treated as some other verb.
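The "indicator" is often a request header such as `X-HTTP-Method-Override` sent along with a POST. A minimal sketch of how a server might resolve the effective method, assuming that convention: the function name `effective_method` and the allowed set of overrides are hypothetical choices for illustration.

```rust
/// Resolves the method a handler should act on, given the method from the
/// request line and the value of an override header, if one was sent.
fn effective_method<'a>(request_method: &'a str, override_header: Option<&'a str>) -> &'a str {
    match (request_method, override_header) {
        // Honour the override only on POST, and only for an explicit allow-list.
        ("POST", Some(m)) if matches!(m, "PUT" | "PATCH" | "DELETE") => m,
        // Otherwise the request line wins.
        _ => request_method,
    }
}
```

Restricting the override to POST keeps a GET from being silently promoted to a state-changing request.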

tszming over 11 years ago

No one remembered the conversations between Ryan Dahl and Zed Shaw on the http parser two years ago?

[1] https://news.ycombinator.com/item?id=2549403

[2] https://twitter.com/zedshaw/status/15714602817

bslatkin over 11 years ago

Yet another reason to remove verbs from the HTTP spec: http://www.onebigfluke.com/2013/08/lets-remove-verbs-from-http-20.html

lamby over 11 years ago

Hm, didn't nginx do something similarly "ugly" for a while? Like inspecting the (completely making it up here) second character to see if it's 'E'?

A quick look suggests they are using a parser generator now.

smackfu over 11 years ago

I've seen this kind of thing before. Someone falls in love with their optimization ("we can parse GET with a single int switch!") and when it is pointed out that it doesn't handle the spec correctly, and that handling the spec correctly would make this as slow as the naive way, it still ends up in the code because "it's good enough."
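For readers unfamiliar with the trick, a "single int switch" might look roughly like the sketch below: the first four bytes of the request line are packed into one 32-bit word and matched against precomputed constants. The names `word` and `method_by_word` are hypothetical, and this makes no claim about how fast it is relative to a plain string comparison.

```rust
/// Packs four ASCII bytes into one big-endian word so they compare as one integer.
const fn word(bytes: &[u8; 4]) -> u32 {
    u32::from_be_bytes(*bytes)
}

const GET_SP: u32 = word(b"GET ");
const PUT_SP: u32 = word(b"PUT ");
const POST: u32 = word(b"POST");
const HEAD: u32 = word(b"HEAD");

/// Recognises a few methods at the start of a request line with one integer match.
/// Returns `None` for anything else, which a caller would hand to a generic path.
fn method_by_word(line: &[u8]) -> Option<&'static str> {
    if line.len() < 5 {
        return None;
    }
    let w = u32::from_be_bytes([line[0], line[1], line[2], line[3]]);
    match w {
        // Three-letter methods: the trailing space is part of the word.
        GET_SP => Some("GET"),
        PUT_SP => Some("PUT"),
        // Four-letter methods: the fifth byte must be the separating space.
        POST if line[4] == b' ' => Some("POST"),
        HEAD if line[4] == b' ' => Some("HEAD"),
        _ => None,
    }
}
```

Because every byte of the method takes part in the comparison, this variant rejects "PUN" instead of treating it as PUT; the shortcut becomes incorrect only when some of the bytes are skipped.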

verroq over 11 years ago

Why not use a trie?
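For reference, a trie-based lookup could be sketched like this. It is a minimal, unoptimized illustration: `TrieNode` and `method_trie` are hypothetical names, the method list is only a sample, and a `HashMap` per node is chosen for brevity rather than speed.

```rust
use std::collections::HashMap;

/// One node of a byte-keyed trie.
#[derive(Default)]
struct TrieNode {
    children: HashMap<u8, TrieNode>,
    /// Set when the path from the root to this node spells a complete method.
    method: Option<&'static str>,
}

impl TrieNode {
    /// Adds `name` to the trie, creating nodes along the way as needed.
    fn insert(&mut self, name: &'static str) {
        let mut node = self;
        for b in name.bytes() {
            node = node.children.entry(b).or_default();
        }
        node.method = Some(name);
    }

    /// Walks the trie byte by byte; succeeds only if the whole token is a known method.
    fn lookup(&self, token: &[u8]) -> Option<&'static str> {
        let mut node = self;
        for b in token {
            node = node.children.get(b)?;
        }
        node.method
    }
}

/// Builds a trie holding a sample of standard methods.
fn method_trie() -> TrieNode {
    let mut root = TrieNode::default();
    for name in ["GET", "HEAD", "POST", "PUT", "PATCH", "DELETE"] {
        root.insert(name);
    }
    root
}
```

A lookup consumes every byte of the token, so "PUN" fails at the third byte and "PU" fails because no method ends there; hand-written or generated branching code is essentially this structure unrolled at compile time.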

muloka over 11 years ago

I imagine this could be used to kill a running instance of node.js in production?

Xorlev over 11 years ago

At least they're looking at fixing it.

escaped_hn over 11 years ago

Clearly a case of premature optimization. What is the overhead of testing equality on one char vs the entire string?
