Post Mortem: A single whitespace character

333 points by goleksiak over 10 years ago

27 comments

pilif over 10 years ago

Likely "Cowboy" is a transparent proxy added by your mobile service provider. I had a similar thing happen a year ago when the mobile provider used by most of our barcode scanners decided to add a transparent proxy into the loop (without telling anybody).

The solution for this problem: Use SSL.

I mean: There are already many good reasons to use SSL, but whenever you need to send any kind of mission-critical data over the mobile network, you practically must use SSL if you want any kind of guarantee that the data you send to the server is what actually reaches the server (and vice versa).

Here's my war story from last year: http://pilif.github.io/2013/09/when-in-doubt-ssl/

goleksiak over 10 years ago

Heroku came back and said:

Looking through the system, I see that you were sent two emails (in August and September) as several of your apps were migrated to the new routing stack (https://devcenter.heroku.com/articles/heroku-improved-router). As mentioned in the documentation, the new router follows stricter adherence to the RFC specification, including sensitivity to spaces.

...and sure enough, there is a line that says:

The request line expects single spaces to separate between the verb, the path, and the HTTP version.

So the lesson is: RTFM

-G

jrochkind1 over 10 years ago

This very example -- requests were technically illegal all along without the devs realizing, but something in the stack changed to start rejecting them -- demonstrates the fallacy of the "be liberal in what you accept, strict in what you issue" principle. If all the web servers involved had been strict in rejecting the illegal request from the start, they would have noticed the bug in development before deploying the firmware in the field.

spydum over 10 years ago

The "Server: cowboy" tag is from an Erlang web server: https://github.com/ninenines/cowboy/blob/master/src/cowboy_protocol.erl#L177

I'm guessing somewhere around there would be an interesting place to add a test case for this.

As far as whose server this is? I'd guess Heroku or AWS. It's plenty possible T-Mobile could have devised some proxy to inspect traffic, but it seems unlikely they would do so with Cowboy.

asveikau over 10 years ago

<pre><code>strcpy( ( char * ) commsOrderBuffer, "GET /v1/printer/");
strcat( ( char * ) commsOrderBuffer, ( char * ) settings.getIMEI());
strcat( ( char * ) commsOrderBuffer, "/orders.txt HTTP/1.1\r\n");
strcat( ( char * ) commsOrderBuffer, "HOST: ");
strcat( ( char * ) commsOrderBuffer, SERVER_NAME);
strcat( ( char * ) commsOrderBuffer, "\r\n");
strcat( ( char * ) commsOrderBuffer, "Authorization: Basic ");
</code></pre>

What the.... O(n) string concatenations, unnecessary pointer casts, no bounds checking... I think extra whitespace in an HTTP request is not their only problem.
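
For comparison, here is a minimal sketch of how the same request head might be built with a single bounded call. It is not the authors' code: `build_request` is a hypothetical function, and its `imei` and `host` parameters are assumptions standing in for `settings.getIMEI()` and `SERVER_NAME` in the quoted snippet.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats the request line and Host header in one
 * bounded call instead of a chain of strcat calls.
 * Returns the number of bytes written (excluding the terminator),
 * or -1 if the output did not fit in `cap` bytes. */
int build_request(char *buf, size_t cap, const char *imei, const char *host)
{
    int n = snprintf(buf, cap,
                     "GET /v1/printer/%s/orders.txt HTTP/1.1\r\n"
                     "Host: %s\r\n",
                     imei, host);

    /* snprintf returns the length it wanted to write; anything that
     * reaches `cap` means the output was truncated. */
    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

Because the whole request line sits in one format string, the separators are visible in a single place, and a buffer that is too small is reported to the caller instead of being overrun.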

userbinator over 10 years ago

I saw it right away - "that HTTP/1.1 looks a bit farther away than it should be..." - and confirmed it by selecting the spaces. I thought it would be a bit more subtle than that... I remember working with a server that violated the HTTP spec by not accepting allowed extra spaces in headers.

According to the new HTTP/1.1 RFC 7230, it should be a single space - the previous RFC didn't specify this clearly in the wording, although it is implied by the grammar (SP and not 1*SP).

https://tools.ietf.org/html/rfc7230#section-3.1.1

"A request-line begins with a method token, followed by a single space (SP), the request-target, another single space (SP), the protocol version, and ends with CRLF."

I'm surprised there doesn't seem to be any widely-used and easily available HTTP conformance checker - unlike the well-known HTML validators.

This is also why monospace fonts are ideal for seeing small but significant differences like this.
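
As a rough illustration of the rule quoted from RFC 7230, a check for the separators alone could look like the sketch below. `request_line_is_strict` is a hypothetical name, and the function is deliberately narrow: it only checks that there are exactly two single-space separators and a CRLF ending, not that the method, target, or version are themselves valid.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true when `line` has the shape
 *     method SP request-target SP HTTP-version CRLF
 * with exactly one space (0x20) at each of the two separators.
 * Token and version syntax are left out of this sketch on purpose. */
bool request_line_is_strict(const char *line)
{
    size_t len = strlen(line);

    /* The line must end with CRLF. */
    if (len < 2 || line[len - 2] != '\r' || line[len - 1] != '\n')
        return false;
    len -= 2;

    size_t spaces = 0;
    for (size_t i = 0; i < len; i++) {
        char c = line[i];
        if (c == ' ') {
            /* Reject a leading, trailing, or doubled separator. */
            if (i == 0 || i == len - 1 || line[i - 1] == ' ')
                return false;
            spaces++;
        } else if (c == '\t' || c == '\r' || c == '\n') {
            /* No other whitespace is allowed inside the line. */
            return false;
        }
    }
    return spaces == 2;
}
```

A request line with a doubled space before the version, such as the one in the post-mortem, would fail this check, while the single-space form would pass.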

jlouis over 10 years ago

This illustrates a pet peeve of mine: Your modern application has a highly dynamic operating point. There is no way you can deploy a system and expect it to be static for eternity. Back in the day, with low interconnectivity, you could. But today it is impossible.

When you build stacks on top of systems over which you have no direct control, you must be able to adapt your system. This means you can't statically deploy code without an upgrade path of one kind or another.

mml over 10 years ago

Cowboy is quite a well-respected web server of the Erlang flavor. I'd guess Heroku rejiggered something in their stack, perhaps adding Cowboy as a reverse proxy or load balancer in front of their junk.

Cowboy apparently shot your no-good dirty sidewinding web requests in the face.

kirab over 10 years ago

It's technically correct: according to the HTTP spec there must be a single "SP" character between the elements in the Request-Line:

Request-Line = Method SP Request-URI SP HTTP-Version CRLF

Source: http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1

Animats over 10 years ago

Another broken network device which takes it upon itself to mess with TCP connections passing through.

I ran into this a few years ago with Coyote Point load balancers. It turns out that if you send HTTP headers to a Coyote Point load balancer, and the last header field is "User-agent", and that field ends with "m" but does not otherwise contain "m", the connection does not go through the load balancer.

Complaining to Coyote Point produced typical clueless responses such as "Upgrade your software". (The problem wasn't at my end, but at sites with Coyote Point devices. Fortunately, I knew someone who had a Coyote Point unit, and we were able to force the situation there.) I had our system ("Sitetruth.com site rating system", note the "m") put an unnecessary "Accept" header field at the end of the header to work around the problem.

Coyote Point's filtering software is regular-expression based, and I suspect that somewhere, there is a rule with a "\m" instead of "\n".

A current issue: there are some sites where, if you make three HTTP requests for the same URL from the same IP address in a short period, further requests are ignored for about 15 seconds. You can make this happen with three "wget" requests. Try "wget http://bitcointalk.org" three times in quick succession. Amusingly, this limiter only applies to HTTP sessions, not HTTPS.

Danieru over 10 years ago

That series of strcat calls caught my eye as bad practice. Fine in this case since the destination string is short, but horrible in general. Every single one of those calls needs to iterate over the entire existing string to find its end. The code could be much cleaner with a small macro hiding the incrementation and the casts.
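
One way to read that suggestion, sketched here as a small function rather than a macro: keep track of where the string ends so each append starts writing there instead of rescanning from the beginning. The `strbuf` type and `strbuf_append` are hypothetical names introduced for illustration, not anything from the original firmware.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type: a fixed buffer plus a remembered end position. */
struct strbuf {
    char  *data;  /* start of the buffer, always NUL-terminated */
    size_t len;   /* bytes used, excluding the terminator */
    size_t cap;   /* total size of the buffer in bytes */
};

/* Append `s` at the remembered end of the buffer. The cost depends only
 * on the length of `s`, not on how much text is already stored.
 * Returns 0 on success, or -1 (leaving the buffer unchanged) if `s`
 * plus the terminator would not fit. */
int strbuf_append(struct strbuf *b, const char *s)
{
    size_t n = strlen(s);

    if (b->len >= b->cap || n >= b->cap - b->len)
        return -1;

    memcpy(b->data + b->len, s, n + 1);  /* n + 1 copies the terminator too */
    b->len += n;
    return 0;
}
```

The caller is expected to start with `len` set to 0 and `data[0]` set to `'\0'`. Unlike the strcat chain, an append that would overflow is refused rather than written past the end of the buffer.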

kyberias over 10 years ago

What's the deal with all the scrollbars on this page?

colinbartlett over 10 years ago

It scares me to think all of these requests run over unencrypted HTTP.

robomartin over 10 years ago

Kudos for sorting this out quickly. Problems like this one can be really difficult to debug.

I remember one case where the coefficient table for a polyphase FIR filter we implemented in an FPGA caused huge instability problems in a design. The coefficient table, if I remember correctly, was 32 wide (32 multipliers) and 128 phases long. That's 4096 numbers. The design had about 40 of these tables that would be loaded from firmware into FPGA registers in real time as needed. We built a tool in Excel to compute these tables of FIR coefficients.

We got word from a customer that things were not behaving correctly under certain circumstances. We were able to reproduce the problem in the lab but could not find anything wrong with the FPGA, microcontroller or Excel code after about three weeks of work by three engineers. This quickly became a nightmare as it threatened several lucrative contracts and kept us from servicing our existing customer base adequately.

I had to put our other two hardware engineers back to work on their existing projects, so I took on the debugging process. This was the most intense debugging I've had to do in thirty years of software and hardware development. Lots at stake. The very reputation and financial well-being of my business was at stake. Enter 18 hour days, 7 days a week.

FOUR MONTHS LATER, at 2:00 AM on a fine Sunday morning, without having slept for three days, looking at code, the bug jumped out at me. We've all had that moment, but this one was, well, "one of those". The problem? We used "ROUND()" instead of "ROUNDUP()" in a calculation that had nothing to do with the FIR filter coefficients but rather affected the programming of counters related to them. This caused timing errors in a state machine that drove the FIR filters. If this were software, this would be exactly like having the wrong count in a loop counter. Yup.

I re-calculated after making the change and everything worked as advertised. That was the best Monday I've had in years. And I took a long vacation after that.

Over four months to find a bug.

That's why sometimes it is impossible and even unreasonable to create budgets for software development. One little bug can set you back weeks, if not months.

vvpan over 10 years ago

Way to abuse :first-letter.

peterwwillis over 10 years ago

Assuming the problem originates from something relating to eatabit's infrastructure, the important takeaway (for me) would be: Depend as little on 3rd parties as possible.

I know this is not a popular opinion among the HN crowd, mainly due to the entire web's love of linking to some other site's js/css to offload cost from their own site. But this makes no sense; you're not really reducing costs, you're just delaying them.

People talk about how 3rd parties speed up development or (potentially) reduce costs. But if the success of your business depends on providing a service that has to be reliable all the time, the reliability of your product is directly proportional to the reliability of the 3rd party. And each 3rd party adds additional points of failure. If you don't control whatever service or product the 3rd party is giving you, you will be unable to even attempt to isolate and fix it yourself.

Typically the answer to this problem is 'buy a better service contract'. But if the 3rd party doesn't provide 24/7 365 support along with multiple contact methods and harsh penalties for failing to supply you with timely service, you're wasting your money. You don't want to be the guy who has to tell the CIO "Sorry, I can't get a hold of our service provider or they aren't giving me timely updates, so I do not know when our product will be up again."

KMag over 10 years ago

When learning OCaml, I decided to write a little web client that would brute force the password on my own home router. I wrote a client, and my router wasn't responding, so I tried having my client fetch pages from Yahoo, and it worked fine.

I fired up wireshark and saw that everything looked fine... except that all of my line terminators were shift-in-formfeed instead of carriage-return-newline. It turns out that OCaml uses decimal character escapes instead of octal. (This was back when I was under the impression that portable code avoided use of \n in string literals because someone who misunderstood text mode file handles had told me that Microsoft compilers expanded \n to \015\012.)

Apparently someone at Yahoo had experienced enough terribly, terribly written web clients that they wrote their HTTP server to accept any two non-space whitespace characters as a line ending.
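
To make the mix-up concrete, here is a small C sketch of the byte values involved, assuming an ASCII execution character set. In C the escapes `\015` and `\012` are read as octal and give CR and LF; the same digits read as decimal give 15 and 12, which are SI (shift in) and FF (form feed). The identifiers below are hypothetical and exist only for this illustration.

```c
#include <string.h>

/* C reads \ooo as octal: 015 (octal) = 13 = CR, 012 (octal) = 10 = LF. */
static const char crlf_octal[] = "\015\012";

/* Hex escapes name the same two bytes: CR = 0x0D, LF = 0x0A. */
static const char crlf_hex[] = "\x0D\x0A";

/* The bytes that result when the digits 015 and 012 are read as
 * decimal, as the comment describes: 15 = SI, 12 = FF. */
static const char si_ff[] = { 15, 12, 0 };

/* Returns 1 when both C spellings above are exactly "\r\n". */
int escapes_match_crlf(void)
{
    return strcmp(crlf_octal, "\r\n") == 0 && strcmp(crlf_hex, "\r\n") == 0;
}

/* Returns 1 if the decimal reading were a CRLF pair (it is not). */
int decimal_reading_is_crlf(void)
{
    return strcmp(si_ff, "\r\n") == 0;
}
```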

jameshart over 10 years ago

"our cellular printing api has printed over 9300 food orders for our client restaurants, stadiums and golf courses"

Am I the only one who read this as a system using 3D printing to print food? Disappointed to discover it's not that kind of cellular.

justinsb over 10 years ago

Tangentially, why didn't curl escape the trailing space to %20?

jim_lawless over 10 years ago

I experienced a similar problem with a POP3 utility that I had written years ago. I had been appending an extra space to the end of each text line (before the CRLF).

There were a few people using this utility with no problems until one day a particular POP3 server no longer tolerated my utility's malformed requests.

weissadam over 10 years ago

I have some advice. Hire a real C programmer. This code is _awful_ and probably full of vulns.

rcconf over 10 years ago

I've had the same issues when developing with Flask in Python. I forgot to URL encode some query parameters and it worked fine with the local HTTP server.

But when I put nginx in front as a proxy, it denied all requests.
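
For readers unfamiliar with the encoding step being described, the sketch below shows percent-encoding of a single query value in C (the language of the firmware discussed in this thread, not of the Flask app). `percent_encode` is a hypothetical helper; it copies the RFC 3986 unreserved characters (letters, digits, `-`, `.`, `_`, `~`) unchanged and writes every other byte as `%XX`.

```c
#include <stddef.h>

/* Percent-encode `src` into `dst`, which holds `cap` bytes.
 * Returns the encoded length (excluding the terminator),
 * or -1 if the result would not fit. */
int percent_encode(char *dst, size_t cap, const char *src)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t out = 0;

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        int unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';
        size_t need = unreserved ? 1 : 3;

        /* Always keep one byte free for the terminator. */
        if (cap - out <= need)
            return -1;

        if (unreserved) {
            dst[out++] = (char)c;
        } else {
            dst[out++] = '%';
            dst[out++] = hex[c >> 4];
            dst[out++] = hex[c & 0x0F];
        }
    }

    if (out >= cap)
        return -1;
    dst[out] = '\0';
    return (int)out;
}
```

Encoding each value before it is placed in the query string means a space becomes `%20` and never appears raw in the request line, which is the kind of input a strict front end is likely to reject.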

ericcholis over 10 years ago

Slightly off-topic, but this is why dev posts like this are important. I didn't know eatabit.com was a thing, and it sounds like a great service.

kstrauser over 10 years ago

If this were my team, I would be unsettled by the fact that we never caught it in testing. Did no one write tests to exercise this part of the app - the one where we're handcrafting HTTP requests?

Objectively, you need to write more tests. At the minimum, this bug should have a regression test so that it can never accidentally happen again (say when a dev merges an old branch in for whatever reason).

cleanCodeAtWork over 10 years ago

Are there any languages out there that handle scale and many connections like Erlang does, but with an easier to swallow syntax?

mikeklaas over 10 years ago

cofcdylan over 10 years ago

I'm just glad my city made it to HN.
