Software Folklore – A collection of weird bug stories

370 points, posted by tpaschalis about 5 years ago

45 comments

I've posted this story before, but it fits here rather nicely.

I had a function that looked like this:

<pre><code>void f()
{
    bool flag = true;
    while (flag) {
        g();
    }
}
</code></pre>

This function would sometimes exit. But that's really all there was to the function. Somehow flag was becoming false, even though nothing ever wrote to it.

So you might think about g() smashing the stack, when a variable is mysteriously changing, but you'd expect the return address to also get written, and it wasn't: the function returned from g() to f(), found flag to be false, exited the loop, and returned from f().

Eventually I got desperate enough to look at the assembly code produced by the compiler, and I became enlightened. (This was g++ on an ARM, by the way.) flag was being stored in R11, not in memory. (Might have been R12; it's been a while.) When g() was called, f() just pushed the return address. Then g() pushed R11, because it was going to have its own variable to stash there, and then created space for its stack variables. And one of those variables was smashing the stack by 4 bytes, overwriting the saved flag value from f().

Worse, the way the stack was getting smashed was on a call to mesgrecv(). This takes a pointer to a structure and a size, but the relationship between the two isn't what you'd expect. The size isn't the size of the structure, but rather the size of a substructure within that structure. A contractor had gotten that detail wrong when they used that mechanism for IPC between two chips. (They'd gotten it wrong on the sending side, too, so the data stayed in sync.)

The net result was that the flag got cleared when four next-door-but-unrelated bytes on another CPU were all zero. It took me a month, off and on, to figure that out.

gumby, about 5 years ago

I love these toughies, especially the full ssh one! A true debugging wizard.

“Dumb” problems can happen to anyone too. I once walked by the desk of $well_known_open_source_developer, who was struggling with a mysterious bug. He’d narrowed it down to the specific function and was groveling in the function setup code (what the compiler generates before your code is called). He asked me to take a look, and within seconds I saw an uninitialized variable being read.

This is not because he was a bozo! He had decades of experience. It’s simply that sometimes we get slightly wedged and can’t see the thing that is “staring us in the face”. He was embarrassed (so I'm not mentioning his name), but he should not have been. If anything, it simply proves that it can happen to anyone.

Related to this: at one organization my debugging skills were (spoiler: undeservedly) legendary... literally, word got around until some new hire asked me about it months later.

Why? I came in one morning to find some folks trying to get some new model of terminals to work with the mainframe. Back then you needed the right combo of byte length, stop bits, etc. They asked me if I could fix it and I said sure. As one does, I poked at the setting switches and walked off to get my coffee so I could come back and think clearly. By the time I came back all the terminals were in use, so I just went on with my day.

Apparently I had randomly toggled the necessary bit. But the way the story was told, I had walked in, agreed to help, rubbed my chin, then simply pushed the right button and walked off without another word. Which in some sense is true, but it gave me completely undeserved credit.

notacoward, about 5 years ago

Here's the craziest one that actually happened to me.

The company I worked for had installed what's best described as a mini-supercomputer (though we avoided the term) at a site in Boulder. We started getting reports of failures on the internal communication links between the compute nodes ... only at high load, late in the day. Since I was responsible for the software that managed those links, I got sent out. Two days in a row, after trying everything we could to reproduce or debug the problem, I got paged minutes after I'd left (and couldn't get back in) to tell me that it had failed again.

Our original theory was that it had to do with cosmic rays causing bit-flips. This was a well-known problem with installations in that area, having caused multi-month delays for some of the larger supercomputer installations there. But we'd already corrected for that. It wasn't the problem.

What it ultimately turned out to be was airflow and cooling. The air's thinner up there, so it carries less heat. But it wasn't the processors or links that were getting too hot. It was the power supply. When a power supply gets warmer it gets less efficient. Earlier in the day, or with shorter runs as we tried different things, this wasn't enough to cause a problem. With it being warmer later in the day, continuous load for longer periods was enough to cause slight brown-outs, and those were making our links flaky. And of course it would always restart just fine because it had cooled down a bit.

The fix ended up being one line in a fan-controller config.

wolfspider, about 5 years ago

"Fail on certain moon phases" reminds me of a C++ bug I encountered while trying to set up the demo for PSIP (Digital TV Guide) destined for NAB in Las Vegas. We had programming schedules resembling Excel spreadsheets and my job was just to create a good one for the demo. I would spend all night making one and send it to my boss, and each morning I would get in trouble for sending in blank schedules and had no idea why.

On one occasion I happened to be editing at 3am and noticed all of my edits rolling back one by one. It was actually viewable on the screen, as if someone had taken control of Excel and was rolling back each field. My immediate thought was that I really needed to get some sleep, but later we found the auto-save feature inverted itself after 3am exactly and would go through each delta one by one, rolling itself back as it had been edited.

The bug was found in the calculation of the vernal equinox, which moves from 3am to 9pm to 3pm. Since it was triggering the leap year code, 6 hours of time would get rolled back, edits and all! This was of course 2008, the year of the digital transition from analog cable, which happened to also be a leap year.

rich_sasha, about 5 years ago

I can't scan documents when my daughter is asleep. When she is awake, all is fine, but the minute she goes to sleep, and I'd like to use my free time to scan documents and suchlike, forget it. I could still print documents on the same device though. Here's what I found:

The printer-scanner was connected to wi-fi. The wi-fi router was in my daughter's room, as that is where the cable socket was, tucked just behind a bench in her room. It was also near that bench that her baby monitor camera was standing. It wasn't wi-fi connected, but for whatever reason it interfered with the wi-fi signal. Same with the receiver: if I put it near my laptop, the wi-fi connection would die.

The monitor was off most of the time, and on precisely when my daughter was asleep.

As for why I could still print, just not scan: presumably that's something to do with bandwidth. I'm guessing it took more wi-fi bandwidth to send a scanned image than to print a document (I never printed pictures on this printer).

VBprogrammer, about 5 years ago

My favourite crazy bug was during a university course on autonomous robotics. One of the other groups was using a metal castor at the back of the robot along with 2 driven wheels. After a little while their robot would completely crash and stop responding.

I'd previously encountered a similar issue which was due to the library code we'd been given, which opened a new /dev/i2c file for each motor command, eventually exceeding the max file handles and killing the program. So I assumed it was something sensible like that.

Some time later they got all excited and called us over to explain the real reason it was crashing. Their robot would initially work fine for a reasonable period of time. Then when the robot drove over the metallic tape on the floor of the arena it would die. The robot must have been building up a static charge while moving around, which would eventually be dissipated when the metal touched the tape.

I wouldn't have believed it had they not set up two tests, one outside the normal arena and one inside. Changing the metal castor for a bit of Lego fixed the problem.

rogierhofboer, about 5 years ago

Display intermittently blanking, flickering or losing video signal: https://support.displaylink.com/knowledgebase/articles/738618-display-intermittently-blanking-flickering-or-los

"Surprisingly, we have also seen this issue connected to gas lift office chairs. When people stand or sit on gas lift chairs, they can generate an EMI spike which is picked up on the video cables, causing a loss of sync. If you have users complaining about displays randomly flickering it could actually be connected to people sitting on gas lift chairs. Again swapping video cables, especially for ones with magnetic ferrite ring on the cable, can eliminate this problem. There is even a white paper about this issue."

angarg12, about 5 years ago

I got a tiny one of these.

One time I was writing some code in C. I found a bug, the solution seemed pretty obvious, so I fixed it, recompiled the code, and ran it again. The bug was still there.

I took a look at the rest of the code in case I had missed something. I couldn't find anything, so I added a few print statements and recompiled. I ran the code and nothing came up.

Interesting, apparently the code is not executing the branches it should. I verified the input data and code. It didn't make sense; there had to be some serious bug there that I hadn't considered. I added a bunch more prints.

Recompile and execute. Still nothing. Wait a minute, THAT doesn't look good. I added a print statement right at the entry point of the program. Nothing.

At this point the root problem became apparent: my changes just weren't getting compiled. Phew, problem solved! I cleaned all the cached files and recompiled the source code. Those print statements still weren't coming up.

In the end I had to move my source code to another machine and compile it there to get it working. I suspect some global variables or path trickery were involved, but to this day I still haven't got a clue what was wrong, nor have I seen it happen again.

schemescape, about 5 years ago

Obvious in retrospect, but very surprising to my inexperienced past self:

I'd been working on some C code for an hour or two. It wasn't behaving how I expected it to (and at the time I knew nothing of debuggers), so I added a print statement and recompiled. I got a compilation error: something like "Syntax error on line 123: #incl5de <stdio.h>". Shocked, I scrolled to that line in my text editor to fix the typo, but it wasn't there. I compiled the same code again and there were no errors.

Turns out there wasn't a bug in my code! I immediately shut down my computer because my RAM was going bad. To this day, what surprises me most is that my computer was able to successfully boot and behave normally for an hour or two, even though random bits were apparently being flipped.

arethuza, about 5 years ago

I remember reading one years ago where someone had a problem installing new software on some embedded device: whatever they did, it came up "checksum is bad".

After much testing they eventually realised that the checksum literally was the hex "bad".
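For anyone wondering how a status line like that can come about, here is a minimal Python sketch. It assumes the device simply printed the checksum in lowercase hex; the value and the message format are guesses for illustration, not details from the story.

```python
# Hypothetical checksum value; the story only says its hex form read "bad".
checksum = 0xBAD

# A perfectly healthy report that happens to read like an error message.
message = f"checksum is {checksum:x}"
print(message)
```

Printing the value with a `0x` prefix or in uppercase would have made the message much harder to misread.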

stevesimmons, about 5 years ago

I have two favourite bugs, one weird and one dumb.

The weirdest one was an IDE where the colorizer gave up on source lines longer than 998 chars. Instead it rendered the whole line as background, i.e. invisible. I once wasted two hours debugging a program with an invisible line of code!

The dumbest was a postage billing system for a bank using a third-party Print-and-Mail company. Somehow the billing system went live adding the previous day's total postage costs to itself, then adding the new day's postage. These exponentially growing totals were then paid automatically by the accounting system each night... So it goes live, ... and a week later Finance gets an alert that the account is overdrawn... They actually paid out nearly $1b in postage costs before hitting their internal credit limit with the bank's treasury.

indymike, about 5 years ago

I love this article.

My most recent bizarre bug: a coworker came to me with a bug where no matter what he tried, he could not get an "if some_var is null" check to be true. The debugger would show the value was null. The logger showed the value was null, but the if statement would not work. After a morning of trying to fix it, he asked if I would take a look. I told him to put the null in quotes in the if. It worked. Turned out a JavaScript library had a bug where it would use the string "null" instead of null.
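The original bug was in JavaScript, but the trap is easy to reproduce elsewhere. A rough Python analogue, assuming the value arrived via JSON and the logger printed strings raw (both are assumptions made for illustration; `shown_in_log` is a made-up helper):

```python
import json

# Hypothetical payloads: one carries a real JSON null, the other the string "null".
real_null = json.loads('{"some_var": null}')["some_var"]
string_null = json.loads('{"some_var": "null"}')["some_var"]


def shown_in_log(value):
    """Render a value the way a naive logger might: strings raw, the rest as JSON."""
    return value if isinstance(value, str) else json.dumps(value)


# Both values look identical in the log...
print(shown_in_log(real_null), shown_in_log(string_null))
# ...but only one of them passes a null check.
print(real_null is None, string_null is None)
```

Logging the type next to the value is usually enough to tell the two apart.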

jtlienwis, about 5 years ago

My favorite was when I was working at SGI after it had taken over Cray Research. I was one of the lowly Cray guys in Wisconsin working with the wunderkinds in California. I was to run the regression tests on the chip being designed in California using some software that they had provided. I would run the tests, but some days they would crash in the middle of the night. Then the California guys would be angry that they got no test results.

I started debugging the code and got to a program called lswalk that would dole out jobs to the dozens of servers to be 'run'. The code was written by a hotshot young MIT graduate, but I was sure that the problem was with this code. I got the source code and started looking for problems, and one thing I found was that if one of the servers responded with an error, the code would print out an error message. One problem though... The error message printed included an uninitialized string, so the printf routine would search for an end of string that was never there, probably overwriting buffers and crashing code all over the place.

So one lesson is that even the best and brightest make mistakes. Sometimes I wonder how we accomplished anything in those days with software that had so many trap doors beneath it.

rikroots, about 5 years ago

The very first time my company farmed me out to work onsite with a client. Day 1, Job 1: download the client's website code and get it running on my laptop.... It just wouldn't work. Everything I tried failed. Everything else on my laptop was working fine, except this code. Everyone else who had ever downloaded the code had managed to get it working on their machines within a couple of hours. Colleagues working onsite with me tried to help, but everything they tried failed.

Finally the decision was taken to reset my laptop to factory defaults and reinstall everything. That took up half of Day 2. Tried to get the client's site running: failed! Things were beginning to get really embarrassing, since all this was happening in full view of the clients. In desperation, my company called me back to their offices and issued me with a new laptop. Back onsite, the code downloaded ... and worked first time!

Turned out that the issue was that my hard drive filesystem had been set up (not by me!) as case-sensitive, and the client code included a file with an all-caps filename, which the code called using a lowercase string. Almost lost my job over that one.
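A check like the following would have flagged the problem on either kind of filesystem. It is only a sketch: `find_case_mismatch` and the file names are made up for illustration, since the story does not say which file was involved.

```python
import os
import tempfile


def find_case_mismatch(directory, requested_name):
    """Return the on-disk name that differs from requested_name only by case, else None."""
    for actual_name in os.listdir(directory):
        if actual_name != requested_name and actual_name.lower() == requested_name.lower():
            return actual_name
    return None


with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical file: stored in all caps, requested in lowercase by the code.
    open(os.path.join(workdir, "CONFIG.INC"), "w").close()
    mismatch = find_case_mismatch(workdir, "config.inc")
    print(mismatch)
```

It compares the names returned by the directory listing rather than asking the filesystem whether the path exists, so the result does not depend on whether the disk is case-sensitive.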

hnick, about 5 years ago

My weirdest that I can recall right now was a PDF file that would not print. Since the printer was typically unhelpful with the error message, as was support (this is a room-sized commercial printer but we didn't get the help I'd really expect), I had to dive into it myself.

Long story short, whatever had produced the PDF had also embedded a TrueType font where one character was named //something. This is fine. The character just has a weird name, but it works. It's technically up to spec AFAIK, and I got it out of the PDF with ttfdump to have a look at it.

Well, the printer's internal RIP, unknown to us, converted the PDF to PostScript when rasterising. And //something is called an "immediately evaluated name", which I forget the details of, but basically this font character, interpreted as PostScript, was causing a lookup for a named variable which did not exist. Hence the crash.

I had a similar one where Adobe InDesign had been used to make a PDF where someone had selected the words to change the font, but not the spaces between (or perhaps they did, and it was a bug). This meant that the PDF included a subset font that only included the space character. Since the space character is not drawn, this resulted in a 0-byte-long glyf table. Based on my reading of the TrueType font spec at the time, this isn't really proper.

The printer didn't like that one bit and died, as it does with anything that smells slightly wrong. Adobe said it was fine and up to spec though; apparently 'TrueType' has a different meaning inside a PDF :)

newswasboring, about 5 years ago

My favorite out of these is the 500 miles email limitation one. I work mostly on big bulky manufacturing equipment but my job is to abstract out the computing part. This story reminds me that every time I want to do something I am still limited by physics. I am reminded of this story whenever the hardware people ask me to insert an artificial delay in computation.

ljm, about 5 years ago

I love programmer stories like this. My favourite personal experience was on my first Ruby on Rails project after first moving to London. I was pretty green at the time, having had only a few years of PHP experience under my belt and little else.

We had to build a Rails app around a poker game. We didn't own the source to the poker game or its API, but we had to embed it nonetheless. We had this really strange issue where some people, under a certain circumstance, couldn't get into the game. It would just boot them out. My teammate and I must have pored over the Ruby code dozens and dozens of times and found no evidence of this bug and no way to reproduce it, bearing in mind I was still learning the ropes, and jumping head first into an unfamiliar codebase is quite daunting.

Eventually I decided to get my hands dirty and started poking into this game engine. We embedded it as a Flash widget, but the server doing most of the work was written in a mix of C++ and Python. I didn't fully comprehend what I was looking at but, even though things looked suspicious, I couldn't put my finger on an actual problem until I looked at the API written in Flask and noticed that one line of code didn't look like any other.

<pre><code> some_value = params['some_key'] </code></pre>

If the request didn't contain the parameter `some_key` then this would raise a KeyError.

After maybe three solid weeks of trying to debug this thing, I submitted a one-line patch:

<pre><code> some_value = params.get('some_key') </code></pre>

It's not quite as weird or as fun as most examples, but for me personally it was such a great lesson in debugging and being curious about unfamiliar stuff, rather than closed off or afraid.
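The difference between the two lines is easy to see in isolation. A small sketch using a plain dict as a stand-in for the request parameters (the real object in the Flask app may have been a different mapping type, and the key names here are placeholders):

```python
# Hypothetical request parameters with 'some_key' missing, as in the failing requests.
params = {"other_key": "42"}

try:
    some_value = params["some_key"]  # original line: raises KeyError when absent
except KeyError as error:
    print("KeyError:", error)

some_value = params.get("some_key")  # patched line: returns None when absent
print(some_value)
```

The patched line only moves the problem if later code cannot cope with `None`, so the calling code still needs to handle a missing parameter.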

KingOfCoders, about 5 years ago

I had this one.

We were using mSQL in the 90s for web projects. A very important customer wanted a "real" database, so we bought DB2. Because we didn't have an IBM platform or Solaris, we went with Windows NT.

Everything went fine, until one day we noticed the website was slow. Investigation pointed to the database as the culprit. So I went to the data center, logged into the NT box and checked DB2. Everything was fast. Back at my desk, the database was slow again after some time. Back to the NT server, and the same thing happened.

After quite a long time I found the real culprit: the NT Pipes GL software-rendered screen blanker. After some time without interaction the screen blanker started up and took all the CPU, so the database and the website went slow. Someone had set the screen blanker to the nice GL pipes renderer.

[Searching the web, IBM introduced DB2 for Windows NT on 31.10.1995, and I went to CeBIT that year to check it out.]

teddyh, about 5 years ago

See also:

“COMPUTER-RELATED HORROR STORIES, FOLKLORE, AND ANECDOTES”
https://www.cs.earlham.edu/~skylar/humor/Unix/computer.folklore.from.net.rumors.html

“Computer Stupidities”
http://www.rinkworks.com/stupid/

BitwiseFool, about 5 years ago

We're actually working on a collection of such stories internal to our division. We've found that these tales are a great way of helping people understand the complexities and quirks of our nearly three-decade-old code base.

marcodave, about 5 years ago

First month or so at my new employer, a big consultancy firm for a financial institution. They had a fairly complex distributed monolithic application integrated with Tibco EMS, Oracle DB and distributed XE transactions.

Regularly, but randomly, in production, after receiving a good amount of messages in the input queues (which then got rerouted to other event queues for parallel processing), some DB transactions simply got stuck. Not rolled back, but stuck in limbo -- after a while the DB simply refused new transactions because so many were stuck. Nobody had a clue why that was happening. It meant regular manual restarts of the services and re-feeding of the failing messages. Users started to get fed up and the project threatened to fail.

I got into it, and after a couple of weeks of investigation and trial and error with all possible weird flags, it turned out that the version of Tibco EMS had a weird behavior with distributed transactions when the queues got full of messages (queues had a 50MB size limit).

Instead of gracefully rolling back the JMS+JDBC XE transaction, it... kinda exited with an IO error.

Turned out that newer versions of Tibco EMS fixed that issue, but there was no way to ask ops to install that new version. Since upgrading was out of the question, the actual fix was to enable message compression to limit the size of the messages coming into the queues. Turned out that the XML we sent there was up to 1.5MB (!)

After discovering that, I became basically a war hero, respected by the client as the "savior of the project". Good times.

coreyp_1, about 5 years ago

This one happened last night. A student contacted me because her Anaconda Jupyter notebook (installed on her laptop) just wouldn't connect to the Python kernel. (The notebook itself would load, though, meaning that the server was running fine. It's just that the kernel and its websocket were failing.) I should point out that, because of COVID-19, this troubleshooting was over Zoom, which complicated the diagnosis a bit.

She had not been using Jupyter for several months, as we have been writing stand-alone programs in class using Spyder (the editor that comes with Anaconda) and the command line, and Jupyter had worked the last time that she tried it.

We restarted everything, and still the problem was there. I helped her to update everything, but that didn't solve the problem.

Finally, I looked at the error messages in the console where the Jupyter server was running. It had a huge list of errors, all relating to the pickle library.

We had done an exercise with pickle in the class, but nobody had reported a similar problem. When we looked in her classwork directory, though, we saw that she had created a "pickle.py" file when she was testing something with pickle. But, at that point in the class, we were working in the command line, and everything (including Spyder) still worked just fine.

Evidently, this was the cause of Jupyter's problem. When trying to start the Python kernel in Jupyter, it imported pickle, and evidently it imported her test file rather than the actual library. The fix was simple: we renamed her test file, and everything worked perfectly.
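The shadowing itself can be shown without breaking anything. The sketch below asks Python's import machinery which file it would pick for `pickle` when a directory containing a `pickle.py` is searched first. The temporary directory and the file contents are stand-ins for the student's classwork folder, and nothing is actually imported.

```python
import os
import sys
import tempfile
from importlib.machinery import PathFinder

with tempfile.TemporaryDirectory() as classwork:
    # Hypothetical stand-in for the student's test file.
    shadow = os.path.join(classwork, "pickle.py")
    with open(shadow, "w") as handle:
        handle.write("# not the real pickle\n")

    # Search the classwork directory first, the way a working directory on sys.path would be.
    spec = PathFinder.find_spec("pickle", [classwork] + sys.path)
    found = spec.origin
    shadowed = os.path.samefile(found, shadow)
    print(found, shadowed)
```

The first match on the search path wins, which is why a stray file in the working directory can take the place of a standard library module.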

darekkay, about 5 years ago

There's a GitHub repository for such stories [1]. I've even contributed one of my stories: "Script crashes before 10 a.m." [2]

[1]: https://github.com/danluu/debugging-stories

[2]: https://darekkay.com/blog/script-crashes-before-10/

nervousvarun, about 5 years ago

This scratches a "thedailywtf" itch I had forgotten I had :)

yxhuvud, about 5 years ago

One favorite in the category is always the good old "Can't print on Tuesdays" Ubuntu bug that has been submitted here a bunch of times: https://bugs.launchpad.net/ubuntu/+source/cupsys/+bug/255161/comments/28

me_again, about 5 years ago

I had a function which only failed at 8 or 9 minutes past the hour.

It parsed a string containing the timestamp, and "08" or "09" was interpreted as an invalid octal number. Argh.
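The failure mode is easy to mimic. The sketch below assumes the parser treated a leading zero as "octal follows", which is how some C-style number parsers behave; `parse_minutes_c_style` is a made-up helper, not the function from the story.

```python
def parse_minutes_c_style(text):
    """Mimic parsers that read a leading zero as an octal prefix (an assumption about the bug)."""
    base = 8 if len(text) > 1 and text.startswith("0") else 10
    return int(text, base)


for minutes in ["07", "08", "09", "10"]:
    try:
        print(minutes, "->", parse_minutes_c_style(minutes))
    except ValueError:
        print(minutes, "-> invalid octal number")

# Forcing base 10 sidesteps the problem.
print(int("08", 10), int("09", 10))
```

"00" through "07" parse to the same value in either base, so only the minutes 08 and 09 expose the bug.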

throwaway3563, about 5 years ago

Had a flaky unit test that would randomly fail with some random Chinese character in the output.

The test was running a log parsing tool against a temporary file. The tool had a pseudo-SQL syntax where you could “select ... from c:\Users\...\temp\abcd1234.xyz\testdata.dat”. The temporary directory had a randomly generated name so that the folder was guaranteed to be empty before every execution of the test.

The test failed on the rare occasion that the randomly generated temp dir name consisted of the letter ‘u’ plus four characters that were valid hex digits. When this happened, the randomly generated dir name interacted with the backslash before it and became a Unicode escape sequence. It was easy to fix, but that test was flaky for months before anyone worked out why.
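Here is roughly what that looks like, sketched in Python. The directory name is invented, and Python's `unicode_escape` codec stands in for whatever escape handling the log parsing tool did.

```python
import re


def decode_escapes(text):
    """Interpret backslash escapes the way many parsers do."""
    return text.encode("ascii").decode("unicode_escape")


def is_risky_dir_name(name):
    """True when a name placed right after a backslash would read as \\uXXXX."""
    return re.match(r"u[0-9a-fA-F]{4}", name) is not None


# Hypothetical temp directory name: 'u' followed by four hex digits.
fragment = r"\u4e2dk9z.xyz"
print(decode_escapes(fragment))
print(is_risky_dir_name("u4e2dk9z.xyz"), is_risky_dir_name("abcd1234.xyz"))
```

Escaping the backslashes, or using forward slashes in the query, avoids the problem whatever name gets generated.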

chiph, about 5 years ago

A bank I did some contracting for had a problem where their Token-Ring network would crash at random intervals during the day in one of their branches. It would also crash at night, but the times when it would happen were more predictable.

And that was the clue they needed to solve the problem: it turned out that the wiring installers had run the cable up the elevator shaft. When the elevator stopped at a certain floor, the door motor was sometimes interfering with the signal. The more regular nightly disruptions were because of the security guard making his rounds.

It turned out that run was pretty close to the length limit for 16 Mbps Token-Ring, so they added a repeater in the middle to boost the signal strength.

Tade0, about 5 years ago

Was looking for the 500-mile email - it's there.

nieve, about 5 years ago

I eventually gave up without finding the issue, but somewhere deep inside one version of the Sphinx full text search software was a bug that would sometimes switch which query got which result set. It only happened sometimes, when queries were within a few seconds of each other, but it wouldn't happen with only one front end process, even in multithreaded mode, and would disappear if requests were _too_ close together. If I'd found a way to reproduce it I'd have submitted it to the Sphinx team, but after a few days of potentially private info leaking I gave up and moved to PostgreSQL's FTS.

kazinator, about 5 years ago

Similar to the train being stopped by a toilet flush, in the 90's, I worked with devices based on Microsoft's Pocket PC OS. These were equipped with wireless radios. The transmission of a packet caused some interference that the device interpreted as a click on the screen. The cursor was over the [X] to close the application window, so the application would just quit, looking like it crashed.

villuv, about 5 years ago

IBM Java 1.1.8 that was embedded into Lotus Notes 5 (if I remember correctly) didn't have the 29th of April if it happened to be on a Tuesday.

When you constructed a Date object for the 29th of April in a year where it fell on a Tuesday, you got the 30th of April when you read back the value. Took a while to figure out why date calculations were sometimes off. The flux of expletives was impressive when we finally did...

themeiguoren, about 5 years ago

Not software, but along the lines of cool engineering stories, one of my favorites is this one about fixing a 230 kV, many-hundred-amp, 10-mile-long coax cable in Southern California.

https://www.jwz.org/blog/2002/11/engineering-pornography/

MaxBarraclough, about 5 years ago

Glad to see the More Magic story in the list.

benibela, about 5 years ago

I just had a weird bug in a programming competition.

You basically had to sort the English letters that occur in a text by descending frequency. Except the one letter that occurs the least needs to be sorted as if it occurs the most.

The expected output of the sample case was TPFOXLUSHB

I ran my program on the sample and the output looked correct; then I submitted it, and the judge said it failed the sample case. In fact, it was printing ͲPFOXLUSHB

That nearly looks like the correct output.

I had confused two variables and it was printing the frequency count as a codepoint rather than the letter. But such a coincidence that it looks the same.
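To make the mix-up concrete, here is a small Python sketch. It assumes the look-alike first character was U+0372, which would correspond to a frequency count of 882; both the character identity and the count are guesses, not figures from the comment.

```python
# Assumption: the look-alike first character was U+0372, i.e. a count of 882.
letter, frequency = "T", 882

buggy = chr(frequency)  # prints the count as a code point
fixed = letter          # prints the letter itself

print(buggy + "PFOXLUSHB", fixed + "PFOXLUSHB")
print(buggy == fixed, hex(ord(buggy)))
```

Comparing code points rather than glyphs, as the last line does, is a quick way to catch this kind of visual coincidence.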

arafalov, about 5 years ago

I used to work as a senior technical support for BEA Weblogic and had all sorts of crazy situations to debug remotely. Including one time when I had to get a person to edit a config file in Vim (which they had never used before), on Unix (which they had never used before), with me guiding them by phone (no visuals).

This is the one I recorded that seems to fit into the current theme: https://www.outerthoughts.com/2004/10/perfect-multicast-storm/ (tldr: multicasting on 237.0.0.1 is bad).

And if somebody really understands network and multicast, I would love to know whether I actually nailed the problem or just made it go away accidentally. I have no problems with being wrong, especially this much later :-)

blurryroots, about 5 years ago

Thanks, this is hilarious! "Okay! I'm braking now", definitely my new going-to-the-toilet catchphrase.

teddyh, about 5 years ago

I'm very disappointed that the very first entry in the list is a bogus story confirmed to be false.

I mean, if you simply want general computing legends of unconfirmed veracity, read “The Devouring Fungus” by Karla Jennings.

raverbashing, about 5 years ago

The SSH one is brilliant.

I find that most of these "hard problems" come down to something small: something almost unnoticeable, so that things don't break immediately and it kinda works.

Now, finding exactly what it is, that's the trick.

DesiLurker, about 5 years ago

For me it was a 21-day bug in a broadcast video encoder: some internal frame counter was coded with int instead of int64, so it would reset every 21 days. Fun to debug it was not!

smitty1e, about 5 years ago

It is good to pore over these legendary tales. They come in handy when we need to break out of the moment and try something outrageous to solve the problem.

robocat, about 5 years ago

Crash cows: “There were often significant food shortages in the Soviet Union, and the government plan was to mix the meat from Chernobyl-area cattle with the uncontaminated meat from the rest of the country. This would lower the average radiation levels of the meat without wasting valuable resources.”

There is some sense to that: low levels of radiation are not a cancer risk, last I read, and everything we eat is slightly radioactive. That said, I can’t think how significantly radioactive cows could be “diluted” enough.

andrepd, about 5 years ago

The Crash Bandicoot story is my favourite.

pascalmahe, about 5 years ago

Always impressed by Mel's story.

superice, about 5 years ago

So at my previous employer we still had an old frontend running an atrociously old version of EmberJS. I run the build but it suddenly fails. We were on a three-weekly release schedule, so I figure it must have happened somewhere in the past three weeks, about 400 commits. So I start git bisecting, teaching it to a handful of my more junior colleagues as I go along. It took us way too long to figure out that the original build also failed, however.

So my teammates wish me good luck, and I go off on a journey debugging what the issue actually is. As it turns out, the horribly old Ember.JS CLI package version we're using is a version called '0.2.0-beta'. That did not bode well. This frontend of course did not use the nice yarn dependency pinning, just a regular old package.json file, so I go tracing the error into the dependencies.

Eventually I trace the thing to a dependency nested three layers deep or so. A library added a deprecation warning when being used. That in itself is not so bad, but it did that using an injected logging framework from the package using that library. Except that wasn't introduced until a way later version. Of course this tiny little addition could never cause any breakage, so this was released as a semver bugfix release.

The commit time shows it was 23:00 local time (https://github.com/goldenice/ember-cli-babel/commit/c4c95d6f1637bfb8988f68dcacbaa436c6eb94bb) when I figured the problem out and committed a fix.

So I submit a PR to the library, figuring that if the author happens to be awake I won't have to figure out a way to pin the dependency to an earlier version (which would have been easy with a regular dependency, but this was a global dependency, where it's not as trivial as switching from npm 2.x or 3.x to yarn).

The author thankfully responds almost immediately, asking me to provide a fallback to console.warn instead of skipping the deprecation entirely. Makes sense, so I update my PR, submit it within a few minutes, and I see that the author immediately publishes a new version. Finally something works out for me. Or so I thought.

As it turns out, the author made a tiny stylistic fix in my code. Except that tiny stylistic fix butchers my carefully crafted if statement, and now the code is broken again. It took me a while to figure out that the new version of the dependency WAS being used, but was also broken.

So I contact the author again and explain the situation. They changed the code immediately, pushing out another update. In the meantime I figured out how to do dependency pinning and all was well with the world again.

And that, kids, is the story of how I came to appreciate transitive dependency pinning as a really useful feature.

(It's still amazing to me, by the way, that I can contact somebody who wrote some random code that our code happens to rely on and get a response within half an hour.)

nervousvarun, about 5 years ago

This scratches a "thedailywtf" itch I had forgotten I had :)

me_again, about 5 years ago

I had a function which only failed at 8 or 9 minutes past the hour. It parsed a string containing the timestamp, and "08" or "09" was interpreted as an invalid octal number. Argh.
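The comment doesn't say which language or parser was involved, so the following is only a sketch of how this class of bug can show up in C++. It assumes the minute field is parsed with `std::strtol` using base 0, which auto-detects the radix and treats a leading `0` as octal. The helper name `parse_minutes` is made up for illustration.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: parses a two-digit minute field such as "08".
// With base 0, strtol auto-detects the radix, so a leading '0' means octal.
// Returns -1 if the field is empty or not fully consumed.
long parse_minutes(const char* text, int base) {
    assert(text != nullptr);
    char* end = nullptr;
    long value = std::strtol(text, &end, base);
    if (end == text || *end != '\0') {
        return -1;  // parsing stopped early: a digit was invalid for this radix
    }
    return value;
}
```

With base 0, "00" through "07" parse fine, and "10" through "59" have no leading zero so they are read as decimal. Only "08" and "09" are rejected, which is why the failure appears only at those two minutes. Passing base 10 explicitly avoids the problem.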

Tade0, about 5 years ago

Was looking for the 500-mile email - it's there.

MaxBarraclough, about 5 years ago

Glad to see the More Magic story in the list.

blurryroots, about 5 years ago

Thanks, this is hilarious! "Okay! I'm braking now" is definitely my new going-to-the-toilet catchphrase.

DesiLurker, about 5 years ago

For me it was a 21-day bug in a broadcast video encoder: some internal frame counter was declared as int instead of int64, so it would reset every 21 days. Fun to debug it was not!
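The comment doesn't say how fast the counter ticked, so here is only a back-of-the-envelope sketch. It assumes `int` was a signed 32-bit type, and the rate of 1184 ticks per second is not from the post: it is simply the rate at which such a counter would run out of range after about 21 days. The function name `days_until_overflow` is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Days until a counter of type Counter, starting at 0, passes its maximum
// value when incremented at a steady (hypothetical) rate.
template <typename Counter>
double days_until_overflow(double ticks_per_second) {
    assert(ticks_per_second > 0.0);
    const double max_count =
        static_cast<double>(std::numeric_limits<Counter>::max());
    const double seconds_per_day = 86400.0;
    return max_count / ticks_per_second / seconds_per_day;
}
```

Under these assumptions, `days_until_overflow<std::int32_t>(1184.0)` comes out at about 21 days, while the same rate with `std::int64_t` gives tens of billions of days. A counter that only advanced 60 times per second would last about 414 days in 32 bits, so whatever this one counted was presumably ticking faster than a plain frame rate.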

smitty1e, about 5 years ago

It is good to pore over these legendary tales. They come in handy when we need to break out of the moment and try something outrageous to solve the problem.