I guess some of my questions are addressed in the latter half of the post, but I'm still puzzled why a prominent service didn't have a plan for what looked like a run-of-the-mill hardware outage. It's hard to know exactly what happened because I'm having trouble parsing some of the post (what is a 'network connector'? A cable? A NIC?). What were some of the 'increasingly outlandish' workarounds? Are they actually standing up production hosts manually, and was that the cause of the delay or the unwillingness to get new hardware going? Seeing as most of their technical staff are either volunteers, who may come and go, or part-timers, I think it would be important to have all of that set down either in documentation or code. Maybe they did; it's not clear.

It's also weird to see that they are still waiting on their provider to tell them exactly what was done to the hardware to get it going again. That's usually one of the first things a tech mentions: "ok, we replaced the optics in port 1" or "I replaced that cable after seeing increased error rates", something like that.