This reminds me of the story one of my university teachers used to introduce himself. He was the most brilliant educator I've had, really passionate about making sure people could actually follow, and so humble. The course was called "Digital and Computer Technology" and it was a first-year course teaching you to build a CPU starting from logic gates and then program it with assembly.<p>He said that when he had recently graduated and just moved from Norway to Sweden, his first job here was as a technician for the local railway. One of his first assignments was a call where the computer controlling all the track switches had stopped working. Luckily there was a backup, but they needed him to fix it immediately.<p>He arrived, took a look at the computer and its backup running next to it. He started out by measuring voltages on both, comparing to see what the difference was. After a while some men in suits came in and asked how things were going. He said it was all good; they said it wasn't, because now all trains had stopped. Apparently, he had short-circuited the backup.<p>"And that's when I decided to go for a theoretical career!" my teacher happily concluded. The classroom was left in a stunned silence.
While having algorithms and computers dictating every aspect of train movements, crew scheduling, resource allocation is efficient and in a sense optimal, I think there's something compelling to be said about having a system that can survive with human-rememberable patterns when such an optimizer fails. That it can fall back on (or have at the core) simple patterns that people can still operate without algorithmic intervention. I.e. the algorithmic tuning enhances at the edges, and isn't at its core some tangled non-interpretable system that someone can't operate without computer assistance. The idea of a protected set of core routes and staffing patterns that operates without having to be computer-determined?<p>I used to be so impressed at European train stations that they had a single sheet that would give the entire day's arrivals and departures, down to which track. I assume that behind that were schedules that weren't so far from being hand-calculated with equivalent sheets that told people and equipment where to be at what times.<p>I think we lose something when we jump so far forward that you cannot fall back gracefully without a system completely breaking down.
My curiosity is killing me as to what exactly went wrong. And as someone living in the Netherlands I'm also kind of mad at the apparent fragility of this huge chunk of our infrastructure. If this had happened on a workday, it would have been a real national emergency, rather than just a huge national inconvenience.<p>I would appreciate one of those post-outage "what happened" reports like with the AWS and Facebook outages last year. But outside of IT I don't think anyone really expects those over here. And there might be national security considerations preventing such disclosure until any chance of a repeat has been engineered away, at which point everyone will have forgotten.
Reminds me of when the entire Swiss train network shut down in 2005[0]<p>This was, however, not due to a computer glitch, but problems in SBB's electrical grid, which led to an overload of some of the lines and then to a complete shutdown of the entire grid.<p>Warnings were displayed on control consoles, but were drowned in 1000s of other meaningless warnings (there's a lesson for a UI designer here).<p>I was in the ICE from Basel to Zurich at the time and after fuming for a few minutes I figured: "What can you do?" and moved to the bar wagon. There we had something of an impromptu party until there was no more power for the cash register and they stopped sales.<p>In the end it was a (relatively) funny adventure for most and a <i>huge</i> embarrassment for SBB. I don't think they like to be reminded of that fateful day.<p>[0] <a href="https://www.swissinfo.ch/eng/swiss-train-network-shuts-down/4578052" rel="nofollow">https://www.swissinfo.ch/eng/swiss-train-network-shuts-down/...</a>
What jumps out at me after reading this notice is how it completely lacks any apology. I am not prescribing what it should say but merely observing that, having lived in North America for some time now, it is hard for me to imagine such a notice not sounding profusely apologetic here.<p>Note: I do not consider words like 'unfortunately' and 'extremely unpleasant' apologetic.
Also, I understand there are cultural differences between the Netherlands and the USA, no joke.
I think it’s worth mentioning that while the IT system couldn’t coordinate the trains safely, it was a management decision to stop them from running. Coordination can happen myriad ways. For example, trains could operate according to their normal schedule, and if one needed to deviate they could radio in, and an operator could let downstream trains know to proceed accordingly. Maybe simple cell-phone GPS pings could suffice. Perhaps the state of the system was in such chaos that such orchestration wasn’t possible with the staff on hand. That said, a truly resilient system should have multiple coordination channels to fall back upon.
I was in the station for a different reason when I heard the announcement. The people entering and hearing the announcement were unsurprisingly very crestfallen, but it appears that Twitter helped organize a lot of travel, with people posting where they were and where they were going, and picking up other travellers.<p>Also probably worth noting is that trains here are used fairly frequently. Unfortunately, this isn't the first time trains have stopped: for example, snow is a common reason for delayed or reduced service. (Snow isn't very common here in the Netherlands.)
In 2019 there was a power blackout in London. When power was restored, a number of new trains refused to start until a technician physically came to them with a diagnostics laptop, because power had fluctuated outside the allowed bounds or something before the blackout.
More and more we are seeing "the computer says no" in our lives. We've replaced human activity and decision making with computerized versions, which are nothing more than the automation of stored human knowledge and routines.
As a developer, any day in which you don't see a national headline about how your f*#(up has just made a lot of other people sad and/or angry, is a day that could have been worse.
Sounds similar to a problem that happened a couple of weeks ago in Madrid. Although that was not the whole network, it pretty much shut down the trains for the day. That was an IT failure too.
<a href="https://www.20minutos.es/noticia/4973632/0/graves-retrasos-en-cercanias-por-una-averia-informatica-en-chamartin/" rel="nofollow">https://www.20minutos.es/noticia/4973632/0/graves-retrasos-e...</a> (in Spanish)
There is a Swiss company based in Zürich that develops this kind of software. I believe it was a C#/WPF application the last time I spoke with them. The schedules that their software produces are calculated and set for several years ahead. That baffled me! The tracks and infrastructure are booked up so far ahead! I think they mentioned that the German railway uses their software, not only SBB.
I wonder if this is connected to a similar incident in Italy. <a href="https://news.yahoo.com/italys-state-railway-may-target-163417138.html" rel="nofollow">https://news.yahoo.com/italys-state-railway-may-target-16341...</a>
Seems like trains are up and running again today as announced, with only minor delays:<p><a href="https://travic.app/?z=9&x=628581.0&y=6814491.0" rel="nofollow">https://travic.app/?z=9&x=628581.0&y=6814491.0</a>
Based on my knowledge as a public transportation enthusiast in China, I was really surprised to see an IT problem bring the entire system down. There should usually be some "more manual" backup mechanism that allows degraded operation of trains without a computerized signaling system. With some kind of token signaling it would still be possible to run trains as long as the physical infrastructure is still there. The efficiency would be much lower and operation more prone to (potentially deadly) errors, but "no more trains today" sounds unlikely.
The fact that it happened on a Sunday suggests to me that this is the result of a service outage gone awry. Realize that the trains failing on a Sunday is really not a big deal. However, if the trains don't run for Monday morning rush hour that is a big deal. I bet they made a change Saturday night and something broke Sunday morning.<p>It's now Monday morning here and the trains are back to normal in time for rush hour. So it's not really a big deal.
I do hope they publish a more detailed RCA at some point. Right now this is the only helpful paragraph in the article about what happened:<p>> The IT failure occurred at the end of the morning. It affected the system that generates up-to-date schedules for trains and staff. This system is important for safe and scheduled operations: if there is an incident somewhere, the system adjusts itself accordingly. This was not possible due to the failure.
This post is a fascinating use of the grammatical passive voice. The hallmark is gratuitous use of "being" verbs, e.g. "is" and "was". Choosing the passive over the active voice is an implicit way to avoid responsibility for an error.<p>"Due to the enormous impact of the failure in the IT system, it is unfortunately not possible to run any trains today."
// =>
"Due to the enormous impact of the failure in the IT system, NS cannot run trains today."<p>"Although the cause of the failure has now been resolved, the impact is considerable. To be able to start up reliably, systems must be updated and trains must be brought to the right place. That takes time. For our passengers, this is extremely unpleasant news."
// =>
"Although we have fixed the cause of the failure, the failure impacted most of our customers. To fix this failure so that trains would start reliably, we needed to return all trains to a central depot, which took nearly 12 hours. We apologize for the extremely unpleasant interruption."<p>"The expectation is that tomorrow morning the normal timetable can largely be resumed. The night trains can still run."
// =>
We expect to resume minor operation with night trains first and then major operations tomorrow morning.<p>"The IT failure occurred at the end of the morning. It affected the system that generates up-to-date schedules for trains and staff. This system is important for safe and scheduled operations: if there is an incident somewhere, the system adjusts itself accordingly. This was not possible due to the failure."
// =>
The IT failure occurred at 11am in the scheduling system, a major requirement for safe operation; if train operations are interrupted at one location, the system normally reroutes other trains to prevent collisions. This system failed.<p>"The international trains are not affected by this failure. For information about the timetables of other transporters, passengers can consult the websites of these transporters."
// =>
The failure did not affect international trains. If you have an inquiry about other transportation services, please consult those services.<p>"The journey planner is updated."
// =>
We updated the journey planner to account for the interruptions.<p>// Generally the use of "passive voice" indicates poorly-taught grammar; however, the use could also indicate a desire to avoid responsibility.
I was walking up to Amsterdam Centraal when I noticed floods of people all exiting the building... within minutes an Uber was booked to Schiphol before prices rocketed due to demand.
There was a massive Southwest Airlines IT failure compounded by weather issues over the weekend. This cancelled numerous flights but also snowballed into most of the Southwest flights being delayed. We were lucky to return home. The news is only reporting weather delays, yet for those at the airport this weekend... it was clearly larger.<p>If it is true that the Netherlands' and Italy's transport infrastructure were also down, I wonder if Russian-supported hackers are to blame.