Likely "Cowboy" is a transparent proxy added by your mobile service provider. I had a similar thing happen a year ago when the mobile provider used by most of our barcode scanners decided to add a transparent proxy into the loop (without telling anybody).<p>The solution to this problem: use SSL.<p>I mean: there are already many good reasons to use SSL, but whenever you need to send any kind of mission-critical data over the mobile network, you practically <i>must</i> use SSL if you want any kind of guarantee that the data you send to the server is what actually reaches the server (and vice versa).<p>Here's my war story from last year: <a href="http://pilif.github.io/2013/09/when-in-doubt-ssl/" rel="nofollow">http://pilif.github.io/2013/09/when-in-doubt-ssl/</a>
Heroku came back and said:<p>Looking through the system, I see that you were sent two emails (in August and September) as several of your apps were migrated to the new routing stack (<a href="https://devcenter.heroku.com/articles/heroku-improved-router" rel="nofollow">https://devcenter.heroku.com/articles/heroku-improved-router</a>). As mentioned in the documentation, the new router follows stricter adherence to the RFC specification, including sensitivity to spaces.<p>...and sure enough, there is a line that says:<p>The request line expects single spaces to separate between the verb, the path, and the HTTP version.<p>So the lesson is: RTFM<p>-G
This very example (requests were technically illegal all along without the devs realizing, but something in the stack changed and started rejecting them) demonstrates the fallacy of the "be liberal in what you accept, strict in what you issue" principle. If all the web servers involved had strictly rejected the illegal request from the start, the developers would have caught the bug in development, before deploying the firmware to devices in the field.
The "Server: cowboy" header comes from an Erlang web server:<p><a href="https://github.com/ninenines/cowboy/blob/master/src/cowboy_protocol.erl#L177" rel="nofollow">https://github.com/ninenines/cowboy/blob/master/src/cowboy_p...</a><p>Somewhere around here would be an interesting place to add a test case for this.<p>As for whose server this is, I'd guess Heroku or AWS. It's plenty possible that T-Mobile devised some proxy to inspect traffic, but it seems unlikely they would do so with Cowboy.
I saw it right away ("that HTTP/1.1 looks a bit farther away than it should be...") and confirmed it by selecting the spaces. I thought it would be more subtle than that. I remember working with a server that violated the HTTP spec by not accepting the extra spaces that are allowed in headers.<p>According to the new HTTP/1.1 RFC 7230, it should be a single space. The previous RFC didn't state this clearly in the wording, although it is implied by the grammar (SP rather than 1*SP).<p><a href="https://tools.ietf.org/html/rfc7230#section-3.1.1" rel="nofollow">https://tools.ietf.org/html/rfc7230#section-3.1.1</a><p>"A request-line begins with a method token, followed by a <i>single</i> space (SP), the request-target, another <i>single</i> space (SP), the protocol version, and ends with CRLF."<p>I'm surprised there doesn't seem to be any widely used, easily available HTTP conformance checker, unlike the well-known HTML validators.<p>This is also why monospace fonts are ideal for spotting small but significant differences like this.
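To make the RFC 7230 rule concrete, here is a minimal C sketch of a strict request-line check, the kind of check that would have flagged the double space during development. It is not Cowboy's or Heroku's actual parser; the name `is_strict_request_line` is hypothetical, and the request-target is validated only loosely (any run of visible ASCII characters) rather than against the full URI grammar.

```c
#include <stdbool.h>
#include <string.h>

/* tchar from RFC 7230 section 3.2.6: the characters allowed in a method token. */
static bool is_tchar(char c) {
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
        return true;
    return c != '\0' && strchr("!#$%&'*+-.^_`|~", c) != NULL;
}

/* Returns true only for: method SP request-target SP HTTP-version CRLF,
   with exactly one space between the three elements and nothing after CRLF. */
bool is_strict_request_line(const char *line) {
    const char *p = line;

    /* method = 1*tchar, followed by exactly one SP */
    const char *start = p;
    while (is_tchar(*p)) p++;
    if (p == start || *p != ' ') return false;
    p++;

    /* request-target (loose check): one or more visible ASCII characters.
       A second space here leaves the target empty, so it is rejected. */
    start = p;
    while (*p > 0x20 && *p < 0x7F) p++;
    if (p == start || *p != ' ') return false;
    p++;

    /* HTTP-version = "HTTP/" DIGIT "." DIGIT; an extra space fails this match */
    if (strncmp(p, "HTTP/", 5) != 0) return false;
    p += 5;
    if (!(p[0] >= '0' && p[0] <= '9' && p[1] == '.' && p[2] >= '0' && p[2] <= '9'))
        return false;
    p += 3;

    /* The line must end with exactly CRLF. */
    return strcmp(p, "\r\n") == 0;
}
```

Being strict in the validator is a deliberate choice: a lenient parser that tolerates `1*SP` would have hidden exactly this bug.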
This illustrates a very important pet peeve of mine: your modern application has a highly dynamic operating point. There is no way you can deploy a system and expect it to stay static for eternity. Back in the day, with low interconnectivity, you could. Today it is impossible.<p>When you build stacks on top of systems over which you have no direct control, you must be able to adapt your system. This means you can't statically deploy code without an upgrade path of one kind or another.
Cowboy is quite a well-respected web server of the Erlang flavor. I'd guess Heroku rejiggered something in their stack, perhaps adding Cowboy as a reverse proxy or load balancer in front of their junk.<p>Cowboy apparently shot your no-good, dirty, sidewinding web requests in the face.
It's technically correct: according to the HTTP spec, there must be a single "SP" character between the elements of the Request-Line:<p>Request-Line = Method SP Request-URI SP HTTP-Version CRLF<p>Source: <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1" rel="nofollow">http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1</a>
Another broken network device that takes it upon itself to mess with TCP connections passing through.<p>I ran into this a few years ago with Coyote Point load balancers. It turns out that if you send HTTP headers to a Coyote Point load balancer, and the last header field is "User-agent", and that field ends with "m" but does not otherwise contain "m", the connection does not go through the load balancer.<p>Complaining to Coyote Point produced typical clueless responses such as "Upgrade your software". (The problem wasn't at my end, but at sites with Coyote Point devices. Fortunately, I knew someone who had a Coyote Point unit, and we were able to reproduce the situation there.) I had our system ("Sitetruth.com site rating system", note the "m") put an unnecessary "Accept" header field at the end of the headers to work around the problem.<p>Coyote Point's filtering software is regular-expression based, and I suspect that somewhere there is a rule with a "\m" instead of "\n".<p>A current issue: on some sites, if you make three HTTP requests for the same URL from the same IP address in a short period, further requests are ignored for about 15 seconds. You can trigger this with three "wget" requests. Try "wget <a href="http://bitcointalk.org" rel="nofollow">http://bitcointalk.org</a>" three times in quick succession. Amusingly, this limiter applies only to HTTP, not HTTPS.
That series of strcat calls caught my eye as bad practice. It's fine in this case, since the destination string is short, but horrible in general: every one of those calls has to scan the entire existing string to find its end, so the total work grows quadratically as more pieces are appended. The code could be much cleaner with a small macro hiding the pointer increments and the casts.
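One way to avoid rescanning the string on every append, assuming the request is built in a fixed-size char buffer as the strcat calls suggest, is to remember the current length and let `vsnprintf` write at the end. This is a minimal sketch: `struct strbuf` and `sb_appendf` are hypothetical names, and it uses a small function rather than the macro suggested above, which adds type checking and bounds checking.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Invariant: len < cap and buf[len] == '\0'. Keeping len around means each
   append writes at buf + len instead of rescanning from the start like strcat. */
struct strbuf {
    char  *buf;
    size_t cap;  /* total capacity of buf, including the terminating NUL */
    size_t len;  /* current string length */
};

/* Appends printf-style text. Returns false, leaving the existing contents
   unchanged, if the result would not fit. */
static bool sb_appendf(struct strbuf *sb, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(sb->buf + sb->len, sb->cap - sb->len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= sb->cap - sb->len) {
        sb->buf[sb->len] = '\0';  /* discard any truncated partial write */
        return false;
    }
    sb->len += (size_t)n;
    return true;
}
```

Usage: start with `char out[128] = ""; struct strbuf sb = { out, sizeof out, 0 };` and then call `sb_appendf(&sb, "%s %s HTTP/1.1\r\n", "GET", "/orders");`. Putting the whole request line in one format string also makes its single spaces visible at a glance.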
Kudos for sorting this out quickly. Problems like this one can be really difficult to debug.<p>I remember one case where the coefficient table for a polyphase FIR filter we implemented in an FPGA caused huge instability problems in a design. The coefficient table, if I remember correctly, was 32 wide (32 multipliers) and 128 phases long. That's 4096 numbers. The design had about 40 of these tables, which would be loaded from firmware into FPGA registers in real time as needed. We built a tool in Excel to compute these tables of FIR coefficients.<p>We got word from a customer that things were not behaving correctly under certain circumstances. We were able to reproduce the problem in the lab, but after about three weeks of work by three engineers we could not find anything wrong with the FPGA, microcontroller or Excel code. This quickly became a nightmare, as it threatened several lucrative contracts and kept us from adequately servicing our existing customer base.<p>I had to put our other two hardware engineers back to work on their existing projects, so I took on the debugging myself. This was the most intense debugging I've had to do in thirty years of software and hardware development. Lots at stake: the very reputation and financial well-being of my business. Enter 18-hour days, 7 days a week.<p>FOUR MONTHS LATER, at 2:00 AM on a fine Sunday morning, after three days without sleep spent staring at code, the bug jumped out at me. We've all had that moment, but this one was, well, "one of those". The problem? We used "ROUND()" instead of "ROUNDUP()" in a calculation that had nothing to do with the FIR filter coefficients themselves but instead affected the programming of counters related to them. This caused timing errors in a state machine that drove the FIR filters. If this were software, it would be exactly like having the wrong count in a loop counter. Yup.<p>I recalculated after making the change and everything worked as advertised.
That was the best Monday I've had in years. And I took a long vacation after that.<p>Over four months to find a bug.<p>That's why sometimes it is impossible and even unreasonable to create budgets for software development. One little bug can set you back weeks, if not months.
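The difference between the two Excel functions is easy to show with integer arithmetic. This is a minimal C sketch with made-up numbers, not the values from this design: if a counter has to cover a 900 ns window using 400 ns ticks, 900/400 = 2.25 ticks, so rounding to nearest programs 2 ticks (800 ns, too short) while rounding up programs 3. `div_round` and `div_roundup` are hypothetical helper names.

```c
/* Hypothetical helpers. Assumes b > 0 and that a + b does not overflow. */

/* Integer analogue of Excel ROUND(a/b, 0) for non-negative values:
   round to nearest, with exact halves rounded up. */
static unsigned div_round(unsigned a, unsigned b) {
    return (a + b / 2) / b;
}

/* Integer analogue of Excel ROUNDUP(a/b, 0) for non-negative values:
   the ceiling, so b * result always covers a. */
static unsigned div_roundup(unsigned a, unsigned b) {
    return (a + b - 1) / b;
}
```

The two agree whenever the division is exact or the fraction is at least one half, which is exactly why this kind of bug only shows up "under certain circumstances".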
Assuming the problem originates from something in eatabit's infrastructure, the important takeaway (for me) would be: depend on 3rd parties as little as possible.<p>I know this is not a popular opinion among the HN crowd, mainly due to the entire web's love of linking to other sites' JS/CSS to offload cost from their own site. But this makes no sense; you're not really reducing costs, you're just deferring them.<p>People talk about how 3rd parties speed up development or (potentially) reduce costs. But if the success of your business depends on providing a reliable service all the time, your product can be no more reliable than the 3rd party it depends on, and each 3rd party adds another point of failure. If you don't control the service or product the 3rd party gives you, you can't even attempt to isolate and fix a problem in it yourself.<p>Typically the answer to this problem is 'buy a better service contract'. But if the 3rd party doesn't provide 24/7/365 support along with multiple contact methods and harsh penalties for failing to give you timely service, you're wasting your money. You don't want to be the guy who has to tell the CIO "Sorry, I can't get hold of our service provider, or they aren't giving me timely updates, so I don't know when our product will be up again."
When I was learning OCaml, I decided to write a little web client that would brute-force the password on my own home router. I wrote the client, and my router wasn't responding, so I tried having my client fetch pages from Yahoo, and that worked fine.<p>I fired up Wireshark and saw that everything looked fine... except that all of my line terminators were shift-in/form-feed instead of carriage-return/line-feed. It turns out that OCaml uses decimal character escapes instead of octal. (This was back when I was under the impression that portable code avoided \n in string literals, because someone who misunderstood text-mode file handles had told me that Microsoft compilers expanded \n to \015\012.)<p>Apparently someone at Yahoo had seen enough terribly, terribly written web clients that they wrote their HTTP server to accept any two non-space whitespace characters as a line ending.
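The mix-up is easy to see from the C side, where a backslash followed by digits is an octal escape. In this minimal sketch (the array names are illustrative only), the same characters `\015\012` spell CR LF in C, while OCaml reads them as decimal 15 and 12, i.e. shift-in (0x0F) and form feed (0x0C). In OCaml, CR LF would be written `"\r\n"` or, in decimal, `"\013\010"`.

```c
/* In C, "\015\012" uses OCTAL escapes: 015 = 13 = CR, 012 = 10 = LF. */
static const char crlf_octal[] = "\015\012";           /* same bytes as "\r\n" */

/* OCaml reads "\015\012" as DECIMAL escapes, i.e. bytes 15 and 12.
   Written in C with hex escapes, those bytes are: */
static const char ocaml_decimal_bytes[] = "\x0F\x0C";  /* SI (shift-in), FF (form feed) */
```

Using the named escapes `\r\n` sidesteps the question entirely, since both languages read them the same way.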
"our cellular printing api has printed over 9300 food orders for our client restaurants, stadiums and golf courses"<p>Am I the only one who read this as a system using 3D printing to print food? Disappointed to discover it's not that kind of cellular.
I experienced a similar problem with a POP3 utility I had written years ago. I had been appending an extra space to the end of each text line (before the CRLF).<p>A few people used this utility with no problems until one day a particular POP3 server stopped tolerating my utility's malformed requests.
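A minimal C sketch of the fix, assuming lines are assembled into a caller-supplied buffer: strip any trailing spaces or tabs before appending the CRLF, so nothing sits between the last argument and the line terminator. `terminate_line` is a hypothetical helper, not part of any POP3 library.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Copies `line` into `out` without any trailing spaces or tabs and terminates
   it with CRLF. Returns false if `out` (capacity `cap`) is too small. */
static bool terminate_line(char *out, size_t cap, const char *line) {
    size_t len = strlen(line);
    while (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t'))
        len--;
    /* %.*s prints only the first len characters of line. */
    int n = snprintf(out, cap, "%.*s\r\n", (int)len, line);
    return n >= 0 && (size_t)n < cap;
}
```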
I've had the same issue when developing with Flask in Python. I forgot to URL-encode some query parameters, and it worked fine with the local HTTP server.<p>But when I put nginx in front as a proxy, it rejected all the requests.
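The commenter's code was Python, but the idea is language-independent; here it is sketched in C. Everything outside the RFC 3986 unreserved set is percent-encoded before a value goes into a query string, so a raw space can never end up in the request line. This is a minimal sketch: `percent_encode` is a hypothetical name, and it encodes a space as `%20` rather than the `+` used by HTML form encoding.

```c
#include <stdbool.h>
#include <stddef.h>

/* RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~" */
static bool is_unreserved(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
}

/* Percent-encodes every byte of `in` outside the unreserved set as %XX and
   writes the NUL-terminated result to `out` (capacity `cap`).
   Returns false if `out` is too small; its contents are then unspecified. */
static bool percent_encode(char *out, size_t cap, const char *in) {
    static const char hex[] = "0123456789ABCDEF";
    size_t j = 0;
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        if (is_unreserved(*p)) {
            if (j + 1 >= cap) return false;  /* keep room for the final NUL */
            out[j++] = (char)*p;
        } else {
            if (j + 3 >= cap) return false;
            out[j++] = '%';
            out[j++] = hex[*p >> 4];
            out[j++] = hex[*p & 0x0F];
        }
    }
    if (j >= cap) return false;
    out[j] = '\0';
    return true;
}
```

Working on bytes means multi-byte UTF-8 characters come out as one `%XX` triplet per byte, which is what servers expect.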
If this were my team, I would be unsettled by the fact that we never caught it in testing. Did no one write tests to exercise this part of the app, the one where we're handcrafting HTTP requests?<p>Objectively, you need to write more tests. At the minimum, <i>this</i> bug should have a regression test so that it can never accidentally happen again (say, when a dev merges an old branch in for whatever reason).
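A minimal sketch of such a regression test in C, assuming the request line is produced by a single formatting function. `build_request_line` is a hypothetical stand-in for the real firmware code, and the path `/v1/printer/orders` is made up; in practice the test would call the real builder instead of a copy.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the firmware's request-line builder.
   Returns the snprintf result: the length the full line needs. */
static int build_request_line(char *out, size_t cap, const char *method, const char *target) {
    return snprintf(out, cap, "%s %s HTTP/1.1\r\n", method, target);
}

/* Regression test for the double-space bug: the line must fit, contain no
   double spaces anywhere, and match the expected bytes exactly. */
static void test_request_line_single_spaces(void) {
    char line[128];
    int n = build_request_line(line, sizeof line, "POST", "/v1/printer/orders");
    assert(n > 0 && (size_t)n < sizeof line);
    assert(strstr(line, "  ") == NULL);
    assert(strcmp(line, "POST /v1/printer/orders HTTP/1.1\r\n") == 0);
}
```

Comparing the exact bytes, not just checking for double spaces, also catches a wrong line terminator or a missing version.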