Sysadmin left finger on power button for an hour to avert SAP outage

246 points by singold about 7 years ago

17 comments

js2 about 7 years ago
I used to work for Loudcloud, an early dot-com hosting company. We used very expensive EMC Symmetrix storage for our DB tier. (Search the web for EMC Symmetrix 3830 if you want an idea of what these beasts looked like.)

The Symmetrix had an EPO (Emergency Power Off): a red button mounted in a recessed area on the back of the cabinet and protected by a plastic lid. To perform an EPO, you had to lift the lid and hold the button down for 30 seconds or so.

One of our DC ops employees was moving a heavy server into a cage and accidentally bumped the corner of the server into the plastic lid. The lid was forced inward and got jammed, depressing the EPO button. Moments later the entire Symmetrix powered off.

Later that day, as word got around, another DC ops employee in a different datacenter looked at the Symmetrix and curiosity got the better of him. He didn't see how it was possible for the plastic lid to get jammed, so he punched the lid with his hand. Moments later that Symmetrix went down too. :-(

We reported this design issue to EMC. A while later, a few of us were on a factory tour at EMC, and they pointed out the "Loudcloud Stopper" workaround to us: a rubber stopper mounted next to the EPO button that prevented the plastic lid from being pressed inward.
ilamont about 7 years ago
From "Founders at Work," James Hong recalling how they prevented Hot or Not from being accidentally turned off:

> But the Salon.com article was coming out the next morning. I called the writer and asked her if she could push the story back, but she said it was a slow news day and she couldn't. So the article came out and the server got slammed.

> My brother needed the server for XMethods, so we did the quickest thing we could think of, which was that night at 3:00 a.m., we took the site down, grabbed an extra PC--a 400 megahertz Celeron, no-memory-in-it machine that I got for free when I opened an eTrade account--and drove to Berkeley where Jim had a shared office.

> I remember taking the top off a case for pushpins and mounting it on top of the power switch of the machine so no one could turn it off. Then we put it in the corner under his desk and surrounded it with books, so it just looked like a bunch of stuff under his desk with a little Ethernet cable coming out. And as soon as we turned the site back on, the access logs started flying. It was 5 in the morning!

http://wcarss.org/founders/james_hong_hot_or_not.txt
krylon about 7 years ago
During my training, I worked on a BIND4-to-BIND9 migration in an IBM mainframe environment. One week I got bored and started "benchmarking" the server: I wrote a little Perl script that swamped the server with DNS queries. Then I realized that my feeble little antique of a desktop (Pentium II @400MHz, running NT 4.0, in 2004!) was not even capable of putting any serious load on that behemoth. And had IBM not just recently ported Perl 5.8 to z/OS?

So I scp the script over to the mainframe, ssh into it, run it again... and grow disappointed that my puny little Perl script is *still* the bottleneck. How much can this beast take, I wonder. Maybe if I forked off a couple of children?

In retrospect, I should have let it go at this point. My benchmark was already querying the nameserver at a far higher rate than it would ever encounter in production. I should have written in my report that the performance impact of some configuration changes was negligible if not zero.

But I really wanted to see how many queries this beast could handle. So I kept increasing the number of worker processes hammering BIND with the same queries over and over, until... my ssh connection dropped. I pinged the mainframe, but got no response. Oops.

I was trying to look really busy as the monitoring guy, who always looked as if he had just woken up, walked down the corridor into our open-plan office, grinning, and asked if anyone had something to tell him. Nobody replied. I do not think I have ever been that quiet in my entire life.

"Okay," he said, "the TCP/IP stack on that particular system just crashed, just in case you are wondering." *Oops.*

"Yeah, but SNA still works," the sysprog replied. "And the LPAR is scheduled for an IPL on Saturday anyway. It'll do."

Obviously, it was a testing LPAR, so nobody got hurt; they would not let a trainee anywhere near a production system. But let the record show that I did manage to disable VTAM (at least the TCP/IP side of it) with a simple Perl script from an unprivileged user account. By accident, but still. Also, I lost about a kilogram in sweat that day.
zer00eyz about 7 years ago
HA! This is my favorite interview question to ask candidates:

"What is your all-time biggest screw-up, and how did you come back from it?" I then tell them the story of me losing several hundred thousand dollars, and the funny things that happened around it, to set the tone. If you have been in tech for any length of time, you have one of these stories (if not a few). I have heard some great ones simply by asking, and it gives great insight into a candidate (humor, stress response, the things you have seen).
koolba about 7 years ago
That’s pretty funny, though I don’t think it’d work on a modern setup: every machine I’ve seen for the past 20 or so years does a hard power-off after the power button is held down for 5+ seconds.
snuxoll about 7 years ago
Makes me realize how much I take quality-of-life features in modern servers for granted. We don't need to be physically present to reboot servers, which eliminates (well, mostly) the possibility that we will power down the wrong one like this. Even if the OS is completely unresponsive, there's lights-out management that can be used to remotely manage power to the system. For the times one needs to do physical maintenance on a server, a blinking light can be toggled through the LOM interface to identify the machine, and you can have the hostname displayed on a little LCD on the front panel too.

It's really amazing to see how far computing has come in just the past two decades.
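As an illustration of the lights-out workflow described above, here is a minimal Python sketch that builds ipmitool command lines for a server's BMC. It assumes the BMC is reachable over IPMI-over-LAN (the `lanplus` interface) and that the ipmitool CLI is installed; the host name, user name and the helper functions `ipmi_argv`, `identify` and `power` are hypothetical. It only builds argument lists, so nothing reaches a real machine unless you hand them to `subprocess.run` yourself.

```python
import shlex

# Hypothetical helpers: build (but do not run) ipmitool argv lists.
# Host and user values below are placeholders, not real systems.


def ipmi_argv(host: str, user: str, *subcommand: str) -> list[str]:
    """Return an ipmitool argv using the lanplus (IPMI v2.0) interface.

    -E tells ipmitool to read the password from the IPMI_PASSWORD
    environment variable, so it never appears on the command line.
    """
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", *subcommand]


def identify(host: str, user: str, seconds: int = 15) -> list[str]:
    # Blink the chassis identify LED so the right box can be found in the rack.
    return ipmi_argv(host, user, "chassis", "identify", str(seconds))


def power(host: str, user: str, action: str) -> list[str]:
    # Remote power control through the BMC, which works even if the OS is hung.
    allowed = {"status", "on", "off", "cycle", "reset", "soft"}
    if action not in allowed:
        raise ValueError(f"unsupported power action: {action!r}")
    return ipmi_argv(host, user, "chassis", "power", action)


if __name__ == "__main__":
    # Print the commands instead of executing them.
    print(shlex.join(identify("bmc.example.internal", "admin", 30)))
    print(shlex.join(power("bmc.example.internal", "admin", "status")))
```

Returning argv lists rather than shell strings sidesteps quoting problems; `shlex.join` is only used to display them.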
scrumper about 7 years ago
This is an old-style directly connected power switch. If you release and re-press it quickly enough, the power won't go out, as there's still enough residual energy in the PSU capacitors. I used to do this all the time on my 486 as a sort of absent-minded tic.

I don't blame the guy for not trying that with a production SAP server, though...
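A rough back-of-the-envelope check on the residual-energy point: a capacitor stores E = ½CV², and a switching supply keeps regulating until its bulk capacitor sags to the converter's minimum input voltage, so the hold-up time is roughly t ≈ C(V_start² − V_min²) / (2P_in). The sketch below computes this; the capacitance, voltages and load in the example are illustrative assumptions, not measurements of any real AT supply.

```python
def holdup_time(c_farads: float, v_start: float, v_min: float,
                p_out_watts: float, efficiency: float = 1.0) -> float:
    """Approximate seconds a PSU keeps regulating after mains is removed.

    Usable energy between v_start and v_min is 0.5 * C * (v_start**2 - v_min**2).
    Dividing by the input power drawn (p_out / efficiency) gives the time.
    """
    if p_out_watts <= 0 or not 0 < efficiency <= 1:
        raise ValueError("need p_out_watts > 0 and 0 < efficiency <= 1")
    if v_min >= v_start:
        return 0.0  # no usable energy above the dropout voltage
    usable_joules = 0.5 * c_farads * (v_start ** 2 - v_min ** 2)
    p_in_watts = p_out_watts / efficiency
    return usable_joules / p_in_watts


if __name__ == "__main__":
    # Assumed values: 470 uF bulk capacitor charged to ~325 V (rectified
    # 230 V mains), dropout below ~200 V, 100 W load, losses ignored.
    t = holdup_time(470e-6, 325.0, 200.0, 100.0)
    print(f"about {t * 1000:.0f} ms")  # about 154 ms
```

Real supplies have less margin once converter losses and aging capacitors are counted, but an answer on the order of a tenth of a second is at least consistent with a quick release and re-press going unnoticed.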
jontro about 7 years ago
15 years ago, when we were hosting servers in a colocation facility, I accidentally turned off a server instead of rebooting it (from Terminal Services).

The support personnel were annoyed, as they had to drive over to the facility and manually push the power button.
zitterbewegung about 7 years ago
I have used ngrok to make my laptop work as a production server when I was user testing https://github.com/zitterbewegung/mms2text. I set up Twilio to point to the URL provided by ngrok. I just left my laptop at home and got people to test the app. Eventually I set it up on AWS, but it chugged away fine on my laptop (MacBook Pro TB 13-inch).
gk1 about 7 years ago
Speed 3: Uptime
lmilcin about 7 years ago
I did the same almost two decades ago. Old AT power supplies on ProLiant servers would turn the server off only after you lifted your finger. I pressed it on the wrong server. I had to reach with my foot for the phone lying close by on the floor to call the accounting department and have them log off the application, to prevent corruption when the Novell NetWare server powering it was rebooted.
waltwalther about 7 years ago
I RDP'd to a Windows server at a public library in another town, an hour's drive from my office. I had right-clicked on the network connection to check out some settings... and accidentally clicked DISABLE instead of PROPERTIES (or whatever it was called in Windows 2000 Server), disabling the network connection. It was a long drive... with my phone ringing the entire time. Never made that mistake again.
zaarn about 7 years ago
There is a rather similar (maybe the same, but changed to protect the not-so-innocent?) story on The Daily WTF: http://thedailywtf.com/articles/Trauma-Center
squozzer about 7 years ago
A modern interpretation of Hans Brinker.
sd6594 about 7 years ago
What about disabling the power button effect in the OS?
tomcooks about 7 years ago
Unscrew a bolt from the server rack and tape it on the button, done
hartator about 7 years ago
> Jeremy told Who, me? that his mate asked to be relieved, as he was in a bit of pain. Those requests were denied due to the risk of the power going off and also out of a desire to make the poor chap suffer for his error.

I think that's just awful.