More info on Twitter from OVH's CEO: <a href="https://twitter.com/olesovhcom" rel="nofollow">https://twitter.com/olesovhcom</a><p>and on <a href="https://twitter.com/ovh_support_en" rel="nofollow">https://twitter.com/ovh_support_en</a><p>"SBG: ERDF is trying to find out
the default. 2 separated 20kV lines are down. We are trying to restart 2 generators A+B for SBG1/SG4. 2 others generators A+B work in SBG2. 1 routing room is in SBG1, the second in SBG2. Both are down. "<p>"An incident is ongoing impacting our network. We are all on the problem. Sorry for the inconvenience."<p>"SBG: 1 gen restarted."<p>"RBX: all optical links 100G from RBX to TH2, GSW, LDN, BRU, FRA, AMS are down."
It started with all our SBG servers going down simultaneously. Approximately 1h later all our RBX servers went down as well including the OVH status page and all other OVH web applications. Either their SBG and RBX data centers are somehow connected or those are indeed two independent incidents.
I moved away from OVH after I paid 3 months in advance (~$300) for a server which burned down after 1 1/2 months. They did not issue any refunds (data, blood, sweat and tears were lost that day). I had been an OVH customer for 12 years.<p>Today, I'm glad to have moved all my production environments away as well.
Damn, every emergency power supply I have encountered (the big ones with fuel and hundreds of batteries) always fails to start when it has to... Why is that?
UPDATE: not all datacenters are down. It seems that way in Europe because OVH routing hasn't been updated, so from our point of view everything is down, but really it is not :)
Network and RBX are UP again: <a href="https://twitter.com/olesovhcom/status/928556358353539072" rel="nofollow">https://twitter.com/olesovhcom/status/928556358353539072</a> (but SBG's datacenters are still being restarted)
Wow, yesterday I was playing with their public cloud because I was considering choosing them. I had some connection problems with my private networking there (deleted it more than once) and opened a ticket. If it was me... sorry, haha. Not a good advertisement, but it can happen to anyone.
This affects DNS as well, since domaindiscount24 (a rather large registrar in Germany) happens to host all three of their nameservers with OVH.<p>Just in case you wonder why your sites don't work, even if you host them somewhere else.
The status page is up again <a href="http://status.ovh.net/" rel="nofollow">http://status.ovh.net/</a><p>I paste the report so far:<p>-------------<p>FS#15162 — SBG<p>Attached to Project— Network<p>Task Type: Incident<p>Category: Strasbourg<p>Status: In progress<p>Percent Complete: 0%<p>Details<p>We are experiencing an electrical outage on Strasbourg site.<p>We are investigating.<p>Comments (2)<p>Comment by OVH - Thursday, 09 November 2017, 10:55AM<p>SBG: ERDF repared 1 line 20KV. the second is still down. All Gens are UP. 2 routing rooms coming UP. SBG2 will be UP in 15-20min (boot time). SBG1/SBG4: 1h-2h<p>Comment by OVH - Thursday, 09 November 2017, 12:04PM<p>Traffic is getting back up. About 30% of the IP are now UP and running.<p>-------------<p>VPSes are still marked as red in the dashboard. I can't access mine.
Btw, a note for those who use OVH as their ISP like me (this is a thing in France): your connection works, only the DNS servers do not.<p>Fix (Debian-like):<p><pre><code> sudo apt-get install bind9
</code></pre>
Then put in /etc/resolv.conf, if it's not already there:<p><pre><code> nameserver 127.0.0.1
</code></pre>
This runs a local nameserver that you use directly for resolving.<p>Oh, obviously, you need working name resolution to install the resolver :) Hope you have a 4G connection available.<p>Alternatively, you can just use Google DNS:<p><pre><code> nameserver 8.8.8.8
nameserver 8.8.4.4</code></pre>
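To see which of the configured resolvers actually reply before (or after) editing /etc/resolv.conf, something like the following rough sketch may help. It assumes `dig` is installed (on Debian it comes with the dnsutils package); the function names and the `example.com` probe name are made up for illustration and are not from the original comment.

```shell
#!/bin/sh
# Print the nameserver addresses listed in a resolv.conf-style file.
# Defaults to /etc/resolv.conf when no file is given.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Succeed if the resolver at address $1 sends back any reply for name $2.
# Relies on dig's exit status (non-zero when no server replied),
# so an error reply such as SERVFAIL still counts as "replied".
resolver_replies() {
    dig @"$1" "$2" +short +time=2 +tries=1 >/dev/null 2>&1
}

# Probe every resolver in the given file and print one status line each.
# $2 is the name to look up; example.com is only a placeholder.
check_resolvers() {
    probe_name="${2:-example.com}"
    for ns in $(list_nameservers "$1"); do
        if resolver_replies "$ns" "$probe_name"; then
            echo "$ns replied"
        else
            echo "$ns no reply"
        fi
    done
}
```

Running `check_resolvers /etc/resolv.conf` then prints one line per configured nameserver, which makes it easier to tell a dead ISP resolver from a dead connection.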
They now posted their explanation [1] but I don't buy it. I find it hard to believe that the RBX incident happened shortly after the SBG incident without any connection between these two. They should have redundant networking (at least that's what they say) so one corrupted DB in RBX shouldn't have brought down the whole DC (or 7 DCs according to their system). Maybe they pulled corrupt data from SBG because it was down but I don't believe that at the same time as a power failure, two redundant network nodes got corrupted without any notice. Otherwise wouldn't that mean that one hardware issue can also bring down a whole region?<p>[1] <a href="http://status.ovh.net/?do=details&id=15162&PHPSESSID=7220be21848b5db440d2cb66c5ee7e14" rel="nofollow">http://status.ovh.net/?do=details&id=15162&PHPSESSID=7220be2...</a>
Some servers in GRA still appear to work if that's of any help. All data centres offline at once sounds more like an attack than a power failure in one location. According to them, there was a power failure in SBG but I don't see how that should affect routing in data centres several hundred miles away.<p><a href="https://twitter.com/olesovhcom/status/928541667283623936" rel="nofollow">https://twitter.com/olesovhcom/status/928541667283623936</a><p>EDIT: Maybe related to the Cisco issue?<p><a href="https://blogs.cisco.com/security/cisco-psirt-mitigating-and-detecting-potential-abuse-of-cisco-smart-install-feature" rel="nofollow">https://blogs.cisco.com/security/cisco-psirt-mitigating-and-...</a>
Details here: <a href="http://travaux.ovh.net/?do=details&id=28244" rel="nofollow">http://travaux.ovh.net/?do=details&id=28244</a><p>Apparently, the root cause of that issue is a critical software bug in Cisco NCS 2000 transponders.
Not "all"!<p>Maybe their main DCs, or their largest, but not all of them. I have virtual servers in their Quebec DC (BHS) and it hasn't gone down since the last time I rebooted it.
How Complex Systems Fail<p><a href="http://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf" rel="nofollow">http://web.mit.edu/2.75/resources/random/How%20Complex%20Sys...</a>
A website that I host on ovh is up: <a href="https://sudokugarden.de/" rel="nofollow">https://sudokugarden.de/</a><p>ovh.com looks down for me too.<p>You can check it's hosted by OVH:<p>$ whois $(dig sudokugarden.de +short)
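The one-liner above can misbehave when `dig` returns several lines (a CNAME plus multiple A records). A slightly more defensive sketch, assuming `dig` and `whois` are installed; the function names are hypothetical and the substring match on "ovh" is deliberately crude:

```shell
#!/bin/sh
# Succeed if the text on stdin mentions "ovh" anywhere (case-insensitive).
# This is a crude substring match, not a reliable ownership check.
mentions_ovh() {
    grep -qi 'ovh'
}

# Resolve domain $1 to its first IPv4 address, skipping CNAME lines
# that `dig +short` may print before the addresses.
first_ipv4() {
    dig +short "$1" A | grep -E '^[0-9]+(\.[0-9]+){3}$' | head -n 1
}

# Succeed if the whois record for the domain's first IPv4 address mentions OVH.
hosted_by_ovh() {
    ip=$(first_ipv4 "$1")
    [ -n "$ip" ] && whois "$ip" | mentions_ovh
}
```

For example, `hosted_by_ovh sudokugarden.de && echo "looks like OVH"`.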
Huh. I have services active on two dedicated machines from OVH in Canada, and I was logged into both via SSH all night, and didn't have any interruption at all.
I had never heard of this company til I saw this post. Shrugged, thought, "huh, wonder who that's affecting."<p>Opened up Age of Empires II....no connection. Go to website for game servers..."Our provider, OVH, is down...."<p>Go figure.
I imagine Mr Good Guy at OVH telling some others:<p>"guys we have a single point of failure in our architecture with SBG, maybe we should...<p>- naaah it's fine, we have neither the time nor the resources"<p>Then shit happens.<p><i>edit: I have no idea what is happening exactly, but OVH being what it is, it seems extremely weird that all datacenters "can" go down at the same time, and it looks like a serious architecture problem to me (or backup systems, like generators, not being correctly tested... whatever). I am really curious about the eventual explanation of what exactly happened</i><p><i>edit2: Why all the downvotes? Even the status page of OVH is down, do not tell me that is good design. We are not here to be charitable, but realistic.</i>
Title is misleading. Only RBX and SBG were affected.<p>06:15 UTC SBG servers failed.<p>OVH network weathermap: <a href="http://weathermap.ovh.net" rel="nofollow">http://weathermap.ovh.net</a><p>Btw. First post: <a href="https://news.ycombinator.com/item?id=15660524" rel="nofollow">https://news.ycombinator.com/item?id=15660524</a>
Trending on Twitter with the hashtag #OVHGATE<p><a href="https://twitter.com/hashtag/OVHGATE?src=hash" rel="nofollow">https://twitter.com/hashtag/OVHGATE?src=hash</a>