Great article. I made this exact same mistake in Space Exploration before learning to transmit demand and not supply. There are so many principles like this that I learned from playing through the game. One area I've been focusing on improving is anticipating how any given subsystem might go wrong and adding even simple circuitry to detect the condition and send an alarm/signal to the factory dashboard. It has cut down on the times when I realize some part of the factory hasn't been working properly for like 5 hours. Applying this to software development has made me better about how I use Rollbar logging.
This is why 4-20 mA is a common signaling standard in industrial automation. 4 mA means zero (the bottom of the range) and 20 mA means full scale. 0 mA means <i>broken transmitter</i>!
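To make the "live zero" idea concrete, here is a rough sketch of how a receiver might decode a loop current. The function name and the 3.6 mA fault threshold are assumptions for illustration, not something stated in the thread; real installations pick their own limits.

```python
from typing import Optional

LIVE_ZERO_MA = 4.0     # bottom of the measurement range
FULL_SCALE_MA = 20.0   # top of the measurement range
FAULT_BELOW_MA = 3.6   # assumed fault threshold for this sketch


def decode_loop_current(current_ma: float) -> Optional[float]:
    """Return the reading as a fraction of range, or None for a dead loop."""
    if current_ma < FAULT_BELOW_MA:
        # Broken wire or unpowered transmitter: this is "no data", not "zero".
        return None
    fraction = (current_ma - LIVE_ZERO_MA) / (FULL_SCALE_MA - LIVE_ZERO_MA)
    # Clamp small excursions just outside the 4-20 mA band.
    return min(max(fraction, 0.0), 1.0)
```

The point is that the caller has to handle `None` separately, so a dead transmitter can never be mistaken for an empty tank.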
One reply tweet notes "a complex system always operates in a failure state", but to find more discussion on this point it's worth noting that this is a restatement of:<p>> "The Fundamental Failure-Mode Theorem (F.F.T.): complex systems usually operate in a failure mode."<p>-- John Gall, <i>General Systemantics</i> (aka <i>Systemantics</i>, aka <i>The Systems Bible</i>), 1977 < <a href="https://en.wikipedia.org/wiki/Systemantics" rel="nofollow">https://en.wikipedia.org/wiki/Systemantics</a> ><p>Searching for "fundamental failure mode theorem" will provide interesting further insights on this.
If your code has side effects, it only ever seems to be working.<p>----<p>> the newbie says "aww, why isn't it working?"<p>> the intermediate says "yay, it's working!"<p>> the expert goes "hmm, why is it working?"<p>I've also seen this as<p>My thing isn't working and I change X and now I get error J instead of error K, so now I have to change X back because it's still not working.<p>Where J is a "better" error than K, such as K = failed to connect and J = Server 500 error. With J you are at least talking to a webserver.
Reminds me of Jurassic Park (book, I think?); nobody realised that dinosaurs were loose and breeding, because the system was doing:<p>if dinosaurs < expected {
alert("escaped dino!");
}<p>and nobody anticipated dinosaurs > expected
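A sketch of the two-sided version of that check, in Python. The function name and the second alert message are made up for illustration; the only thing taken from the comment above is the idea that a count above the expected value deserves an alarm too.

```python
from typing import List


def check_population(counted: int, expected: int) -> List[str]:
    """Alert on any mismatch, not just on a shortfall."""
    alerts = []
    if counted < expected:
        alerts.append("escaped dino!")
    elif counted > expected:
        # The case the one-sided check above would never report.
        alerts.append("unexpected extra dinos!")
    return alerts
```

An empty list means the count matched; anything else is worth a look.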
There are two good lessons here:<p>1) Robust system design involves identifying the parts of your system that are mission-critical and <i>always</i> monitoring them. NASA missions have great automation and a 24/7-staffed mission control.<p>2) If a system failure <i>can</i> result in massive secondary damage, isolate that system. Warehouses receiving orbital payloads should probably be nice and far away from the base you care about.
While I have not really played SE (past initial launch, it was too daunting to build the ship in space, etc) I have run into these kinds of "issues" in Factorio. While they can have disastrous effects, I often really enjoy finding the core problem and finding a solution to make it "fail-safe".<p>Sometimes I wish I could go back and wipe all my Factorio knowledge and start from scratch again, most importantly refusing to use blueprints from the internet except for basic things like balancers. Finding early/mid/late game malls/blueprints sort of ruined the game for me. Min/maxing is fine by myself but once I'm "competing" against the internet or feel obligated to find the most efficient/best green/red/blue circuit factory, or science packs, etc it really ruins the game for me and makes it feel more like a job.<p>I got a good thousand or two hours out of the game before I hit that point and someday I want to try playing it again with self-enforced limits on what I'm "allowed" to get from the internet and what I need to just figure out on my own. The first game I played was pure bliss (and I played with Bob's mods and a number of others, yes that was stupid for my first playthrough but damn it was fun), I'd love to recapture that.
It’s really interesting how Factorio problems are pretty much the same as multiprocessing/distributed systems problems. It just shows how universal and fundamental the ideas of “work” and “workers” are. An engineer on my old team came up with a credit system to solve this problem: the receiver issues credits to the sender, which only sends when a credit is available. This ends up with the same better failure mode that the article discovers, and it lets you overlap communication with work.
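For anyone who hasn't seen credit-based flow control, a toy sketch might look like this. The class and method names are invented for illustration and this is not the system that engineer built; it only shows the shape of the idea: the sender can do nothing until the receiver actively hands it credits.

```python
from collections import deque


class Receiver:
    """Issues one credit per free buffer slot (hypothetical sketch)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.buffer = deque()
        self.outstanding = 0  # credits issued but not yet used

    def grant_credits(self) -> int:
        # A dead receiver never runs this, so the sender's credits dry up.
        free = self.capacity - len(self.buffer) - self.outstanding
        self.outstanding += free
        return free

    def receive(self, item) -> None:
        self.outstanding -= 1
        self.buffer.append(item)

    def consume(self):
        return self.buffer.popleft()


class Sender:
    def __init__(self) -> None:
        self.credits = 0

    def add_credits(self, n: int) -> None:
        self.credits += n

    def try_send(self, receiver: Receiver, item) -> bool:
        if self.credits == 0:
            return False  # no credit: hold the item instead of sending blind
        self.credits -= 1
        receiver.receive(item)
        return True
```

Because credits can be granted ahead of time, the sender can keep several items in flight while the receiver is still working through its buffer.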
Wow, Factorio is approaching Full Time Job levels of detail and work required to keep everything running. Incredible game that I will never play, I feel I’m back at my job!
Same goes for work. If you join a well oiled machine that just seems to be working great, you may never understand how it works, since you won't be exposed to various people diagnosing issues. Join a place that used to work and is now creaking, or a place that never worked, and there's more pain but also more learning.<p>Also when does a Factorio system ever work? There's permanently a pressure for it to do more, stuff stuck on the wrong belts, not enough of some input...
Just want to note that the novel noted in the first post, <i>The Moon Is A Harsh Mistress</i>, is a great read. I think I read it after seeing it recommended in Mary Robinette Kowal's Lady Astronaut series, but it's a nice balance of technical and science fiction with politics, almost like a condensed version of KSR's Mars Trilogy. It's mostly (not entirely) devoid of the more typical Heinlein sexism/objectification.
Reminds me a lot of the fascinating read on Knight Capital, and the $440M-in-28-min bug that lost them 75% of their equity.<p>> Rogue orders seemed originated from the new RLP router code, but no one could pinpoint the bug.... they reverted to last-stable.... and even <i>more</i> trades executed than before.<p><a href="https://www.henricodolfing.com/2019/06/project-failure-case-study-knight-capital.html" rel="nofollow">https://www.henricodolfing.com/2019/06/project-failure-case-...</a><p>I remember a PDF that went into even greater detail, was a very good read.
Tests that output booleans (pass/fail) are an antipattern.<p>Tests and their dashboards must distinguish “the testing system worked and the test failed” from “the testing system failed”.
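A minimal sketch of what a three-state result could look like in Python. The names `Outcome` and `run_check` are made up here; the split itself is similar in spirit to how Python's `unittest` reports failures and errors separately.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    PASSED = "passed"
    FAILED = "failed"    # the check ran and its assertion did not hold
    ERRORED = "errored"  # the check could not run to completion


def run_check(check: Callable[[], None]) -> Outcome:
    """Run one check and classify the result into three states."""
    try:
        check()
    except AssertionError:
        return Outcome.FAILED
    except Exception:
        # Anything else (network down, fixture missing) is a harness problem.
        return Outcome.ERRORED
    return Outcome.PASSED
```

A dashboard built on this can show "errored" in a different colour from "failed", so a broken test runner doesn't hide behind a wall of red (or worse, green).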
Whenever I write a non-trivial amount of code that appears to work the first time, I'm immediately suspicious. I probably spend more time testing that code than I would have if I had broken and fixed some things along the way.
In the self-driving context, this reminds me strongly of Tesla Autopilot. Good enough to work most of the time, but likely would end up in greater overall injuries/accidents per mile if actually enabled at scale.<p>Waymo, Cruise, Aurora, and others are doing it the right way.
We once learned a good lesson about this, as well as recognizing the real threat that ESD poses. We once had a terrible failure that was the result of a main signaling line being damaged by ESD such that it had much higher resistance than it should have. Our signal levels were too close such that it danced over the line a few times and got interpreted as the opposite value, but only sometimes. It also taught a good lesson that when the robots <i>do</i> become self-aware and take over the world, it's gonna hurt like hell.
Statisticians learned long ago that "missing" needs to be treated specially. It should be either an entirely separate signal or a "sentinel value" riding on the existing signal that everyone knows and that couldn't possibly be a normal operating value. Which is why sentinel values are usually a huge bunch of 9's or the maximum possible value of the field or something.<p>Interesting that the sentinel value is zero in this case. In data analysis that's usually a terrible sentinel value, but here it's the most practical one.
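A small illustration of the "separate signal" option, assuming missing readings are represented as `None` rather than as a number. The function name is hypothetical.

```python
from typing import List, Optional


def mean_present(readings: List[Optional[float]]) -> Optional[float]:
    """Average only the readings that exist; None means 'missing'."""
    present = [r for r in readings if r is not None]
    if not present:
        return None  # nothing to average, so the result is missing as well
    return sum(present) / len(present)
```

With readings of 10, missing, 20 this gives 15. If the missing reading had been recorded as 0, a plain average would give 10, which is the kind of quiet distortion a zero sentinel causes in data analysis.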
This sounds close enough to Golang where a null value is basically a zero value. I think that would be another million dollar mistake that would need fixing later in the language
A narrower lesson: this is why SQL comparisons involving NULL never evaluate to true (they yield unknown, which a WHERE clause treats as not matching).<p>> in any case, losing power means the transmitter stops transmitting.<p>> and here's the fun part: your circuit which controls that inserter is set to insert "if [ICE] < 8000"<p>> and GETTING NO SIGNAL AT ALL counts the same to it as ICE=0.<p>> 0 is < 8000.
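This is easy to see with Python's built-in `sqlite3` module. The table and row names below are invented for the example; the 8000 threshold is borrowed from the quoted tweet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (name TEXT, ice INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("powered", 5000), ("full", 9000), ("unpowered", None)],
)


def names(where: str) -> list:
    """Return the names of rows matching a WHERE clause, sorted."""
    rows = conn.execute(f"SELECT name FROM readings WHERE {where} ORDER BY name")
    return [row[0] for row in rows]


low = names("ice < 8000")            # ["powered"]
not_low = names("NOT (ice < 8000)")  # ["full"]
missing = names("ice IS NULL")       # ["unpowered"]
```

The row with no reading shows up in neither the "low" nor the "not low" result, so a missing value can't silently count as zero; you have to ask for it explicitly with `IS NULL`.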
Spoiler alert! (lol)<p>I'm in the middle of reading this book and I figured this would (finally) happen next chapter. I'll be looking forward to this pivotal moment.
Threadreaderapp of the tweets (still live):
<a href="https://threadreaderapp.com/thread/1581643415850098688.html" rel="nofollow">https://threadreaderapp.com/thread/1581643415850098688.html</a>
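Not familiar with this particular game. But in real-world systems design the default should be (?)<p><pre><code> // Fail-safe default: do not deliver unless the receiver says it is ready.
 boolean deliver = false;
 while (polling) {
     // Re-check readiness on every pass through the loop.
     deliver = readyToReceive();
     if (deliver) {
         send();
     }
 }</code></pre>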
Tweet seems to be deleted now. A brief summary of what it was:<p>* In Factorio (game) you can have space stations that send supplies to your ground base<p>* If the supplies are not caught, then they cause damage to your base<p>* To avoid sending supplies that cannot be caught, you can use logic controls<p>* A common approach is to have the station say "if (ground supply < X) send"<p>* This fails if the ground supply loses power, as no signal is interpreted as 0 and 0 < X<p>* Thus, the system will appear to work until your ground base loses power, at which point it will be destroyed<p>* A better system is to have the ground base use logic to say "if supply < X send signal" and the station to say "if signal received, send". This way, a power failure fails safe instead of fails active.<p>With a fun callout to <a href="https://en.wikipedia.org/wiki/The_Moon_Is_a_Harsh_Mistress" rel="nofollow">https://en.wikipedia.org/wiki/The_Moon_Is_a_Harsh_Mistress</a>
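The difference between the two approaches in that summary can be sketched in a few lines of Python. All function names are hypothetical, a lost signal is modelled as `None`, and the 8000 threshold is borrowed from the tweet quoted elsewhere in the thread.

```python
from typing import Optional

THRESHOLD = 8000  # borrowed from the quoted "if [ICE] < 8000" example


def read_signal(signal: Optional[int]) -> int:
    """Model of the circuit network: no signal at all reads as 0."""
    return 0 if signal is None else signal


def supply_rule(ground_stock_signal: Optional[int]) -> bool:
    """Station watches ground stock directly. A dead base looks empty."""
    return read_signal(ground_stock_signal) < THRESHOLD


def ground_request(stock: int) -> int:
    """Ground base (while powered) emits 1 only when it wants more."""
    return 1 if stock < THRESHOLD else 0


def demand_rule(request_signal: Optional[int]) -> bool:
    """Station sends only on an explicit request. A dead base asks for nothing."""
    return read_signal(request_signal) > 0
```

With the signal lost (`None`), `supply_rule` returns `True` and keeps launching, while `demand_rule` returns `False` and stops. Same missing-equals-zero behaviour, opposite failure mode.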
As I and others have previously pointed out [1], foone uses they/them pronouns and feels particularly frustrated when their posts get featured on HN, because users here seem unable to honor this simple preference of theirs. It looks like they deleted their tweet out of frustration with today's latest round of misgendering.<p>I hope people on HN will learn to respect all members of our community. Yes, that involves not assuming every person online is a man!<p>[1] <a href="https://news.ycombinator.com/item?id=32978438" rel="nofollow">https://news.ycombinator.com/item?id=32978438</a>
A bit of an off-topic: I only ever saw white traffic lights in the US and Canada (specifically, pedestrian traffic lights) and they confused me a lot at first - the rest of the world uses green. The icons displayed (hands) are confusing too: the rest of the world uses an icon of a man walking or standing.
This tweet has been deleted... I should have checked sooner, had too much trust that it'd still be working... I feel like there is a joke in there somewhere
What drama with the Factorio devs is he talking about at the end (and namesearchers?.. I don't get Twitter sometimes)<p>I've never heard anyone say anything about the Factorio devs except praise their productivity and professionalism.
Hmm, what does he mean "I woke up the next day"? Did he leave Factorio running, or did it get an online component while I wasn't looking?