<a href="http://web.mit.edu/6.033/2007/wwwdocs/assignments/handson-liability.html" rel="nofollow">http://web.mit.edu/6.033/2007/wwwdocs/assignments/handson-li...</a><p>> I fully recognize that there are dangers and risks to which I may be exposed by participating in Reproduction of Therac-25 accidents. The following is a description and/or examples of significant dangers and risks associated with this activity:
Acute gullibility, Failure to understand April Fool's jokes, Night terrors associated with medical radiation machines.
Leveson's fantastic Therac-25 paper is probably the most important document in the formative years of my young sweng career.<p>I still re-read it every couple of years, and it's held up a lot better than one of my other early favorites, Feynman's appendix to the Challenger report, in the sense that I still draw new thoughts and realizations from it as I re-read it with additional experience in the engineering and organizational disciplines it touches. Sad as it is, it's got a little bit of everything.<p>It's definitely got my vote for the <i>I Ching</i> of critical systems engineering.<p>Spend the time. Chances are you'll remember it.
I remember reading --- and vehemently disagreeing --- with the report on the incidents, which danced around the matter but didn't point directly at the underlying cause: excessive complexity in the software, which easily created bugs and hid them. For example, they used multiple threads and a whole OS, when a simple loop would've been sufficient, perhaps in a misguided attempt to keep the UI "responsive"; there would not be any race conditions if everything ran in the same thread.<p>As Tony Hoare says: "There are two ways to develop software: Make it so simple that there are obviously no bugs, or so complex that there are no obvious bugs."
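The race the parent comment describes boils down to a stale-read pattern: the setup task snapshots shared state, the operator edits it concurrently, and the beam fires on the stale snapshot. A minimal deterministic sketch (Python for illustration only; the function names and two-step interleaving are mine, not the actual Therac-25 code, which ran on a PDP-11):

```python
# Deterministic simulation of the stale-read pattern.
# "Task A" (beam setup) reads shared state, then "Task B" (operator
# edit) changes it before A acts on its copy. A single-threaded loop
# processes the edit before acting, so it cannot act on a stale value.

def racy_interleaving():
    shared = {"mode": "xray"}        # operator's initial entry
    setup_copy = shared["mode"]      # setup task snapshots the mode...
    shared["mode"] = "electron"      # ...operator edits it concurrently
    return setup_copy                # beam "fires" on the stale snapshot

def single_threaded_loop():
    shared = {"mode": "xray"}
    shared["mode"] = "electron"      # the edit event is handled first
    return shared["mode"]            # the beam step reads current state
```

With the racy interleaving the machine fires in "xray" mode even though the operator's final entry was "electron"; in the sequential version that window simply does not exist.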
At a Starbucks years ago, a woman next to me asked to plug into my USB charger, which had an open port. Since any interaction of mine with a new person was doomed to awkwardness, of course I ended up with the sadly comical three-try, two-rotation final insertion. She made a crack about Schroedinger's USB port and I smiled; she began to explain that it was a physics thing, and I replied that, yeah, that was my undergrad. It turned out to be hers as well. Then we found out that we had both gone into programming. She was doing programming in medicine, and, when prompted for further details, mentioned working on the software for one of those multi-beam radiation gizmos.<p>I mentioned Therac-25 and we were off to the races. I think we talked for about two hours on programming in high-risk situations, hers in those machines, mine in location services and routing for emergency services. That maniacal technician who irradiated that poor boy (the names escape me) was brought up. At one point we hit on the Challenger disaster. I was comforted to meet another programmer who had certain philosophies about making reliable niche software. I'm not at that level, but I think it is something to which I ought to occasionally aspire.
Because the linked page doesn't include a description of what a Therac-25 is:<p>> "The Therac-25 is a computer-controlled radiation therapy machine produced by Atomic Energy of Canada Limited in 1982. The Therac-25 was involved in at least six accidents between 1985 and 1987, in which some patients were given massive overdoses of radiation. Because of concurrent programming errors (also known as race conditions), it sometimes gave its patients radiation doses that were hundreds of times greater than normal, resulting in death or serious injury. These accidents highlighted the dangers of software control of safety-critical systems."<p><a href="https://en.wikipedia.org/wiki/Therac-25" rel="nofollow">https://en.wikipedia.org/wiki/Therac-25</a>
Every time I read the story of Therac-25 I feel incredibly frustrated that AECL never faced real consequences or (criminal) liability for it.<p>Maybe I'm retroactively imposing modern-day safety culture, but reading the timeline and history, it feels like AECL was completely negligent in waving off the issue as more and more fatalities kept piling up.<p>Can't believe the devices weren't pulled offline to definitively solve the issue after the first death. Instead, they basically went "can't repro, oh well".
From my blog[1]:<p>"In 2017, Leveson revisited those lessons[2] and concluded that modern software systems still suffer from the same issues. In addition, she noted:<p>* Error prevention and detection must be included from the outset.<p>* Software designs are often unnecessarily complex.<p>* Software engineers and human factors engineers must communicate more.<p>* Blame still falls on operators rather than interface designs.<p>* Overconfidence in reusing software remains rampant."<p>[1]: <a href="https://dave.autonoma.ca/blog/2019/06/06/web-of-knowledge/" rel="nofollow">https://dave.autonoma.ca/blog/2019/06/06/web-of-knowledge/</a><p>[2]: <a href="https://ieeexplore.ieee.org/document/8102762" rel="nofollow">https://ieeexplore.ieee.org/document/8102762</a>
Recently I had to get a panoramic dental x-ray and I was making small talk with the person who was running the machine.<p>I joked that I'm always cautious about machines like this, even knowing the dosage of radiation is low, simply because of the history of software safety controls and the story of Therac-25. She hadn't heard of it before and I gave her the gist of it, that an issue with the programming made it possible to accidentally dose a patient considerably more than the intended amount (in a few different ways). It was interesting to her but I then had to pause so she could run the machine. I shut up and she did her thing.<p>Then, after a few minutes of scanning, she sucked her teeth a bit and apologized, saying she needed to run it once more. No worries, let's get it done! She starts it again and as I'm getting scanned she explains that "for whatever reason I was getting an error so I just had to restart it, this happens sometimes and I'm not really sure why." I give a little half-nervous chuckle and then the scan completes. Once I pull my head out of the machine, I <i>finally</i> get to finish my lovely Therac-25 story wherein I explain that one of the issues was... a combination of non-descriptive error codes, insufficient failsafes, and operator error resulting in patient casualties as the procedure was restarted one or more times.<p>We shared a little laugh and discussed other things, cost of living primarily. I'm still alive so I'm at least 63% sure I didn't get megadosed or anything, but it's been a funny conversation to revisit now and then.
Well There's Your Problem Podcast (with slides)<p>Episode 121: Therac-25
<a href="https://www.youtube.com/watch?v=7EQT1gVsE6I" rel="nofollow">https://www.youtube.com/watch?v=7EQT1gVsE6I</a>
I assume from the paper that this simulator does not contain the software patch that fixed it? I'd be curious to see that "patch" spelled out in more modern code (not sure what the original Therac was written in).
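For what it's worth, Leveson's paper describes the eventual corrective actions as a mix of software changes and an independent hardware interlock (a single-pulse shutdown circuit). A hedged sketch of the general shape in modern code, with entirely hypothetical names, not the actual patch:

```python
# Illustrative only: re-validate all treatment parameters in one step
# immediately before beam-on, and back the software check with an
# independent hardware interlock so software alone is never trusted.

def beam_on(requested, configured, hardware_interlock_closed):
    # Single final re-check: compare the operator's last entry against
    # the state the hardware is actually configured for.
    if configured != requested:
        raise RuntimeError("parameter mismatch: abort, no silent retry")
    # Defense in depth: an interlock outside the software's control.
    if not hardware_interlock_closed:
        raise RuntimeError("hardware interlock open: abort")
    return "beam delivered"
```

The point of the sketch is the structure (atomic final check plus an independent interlock), not the specifics; the real machine's checks lived in PDP-11 assembly across several cooperating tasks.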
I remember reading about the Therac-25 when the first post-mortem explanations appeared. Those poor patients, they could feel something was wrong but were told everything was fine because the operators didn't see anything out of the ordinary.
They suggest the students should do this exercise in pairs. When doing so, the lucky partner is the one who doesn't have to sit in front of the radiation source.