OVH Incident in Strasbourg

311 points by fvv over 7 years ago

40 comments

lode over 7 years ago

More info on Twitter from OVH's CEO: <a href="https://twitter.com/olesovhcom" rel="nofollow">https://twitter.com/olesovhcom</a> and on <a href="https://twitter.com/ovh_support_en" rel="nofollow">https://twitter.com/ovh_support_en</a>

"SBG: ERDF is trying to find out the default. 2 separated 20kV lines are down. We are trying to restart 2 generators A+B for SBG1/SG4. 2 others generators A+B work in SBG2. 1 routing room is in SBG1, the second in SBG2. Both are down."

"An incident is ongoing impacting our network. We are all on the problem. Sorry for the inconvenience."

"SBG: 1 gen restarted."

"RBX: all optical links 100G from RBX to TH2, GSW, LDN, BRU, FRA, AMS are down."

qwerty69 over 7 years ago

It started with all our SBG servers going down simultaneously. Approximately 1h later all our RBX servers went down as well including the OVH status page and all other OVH web applications. Either their SBG and RBX data centers are somehow connected or those are indeed two independent incidents.

arekkas over 7 years ago

I moved away from OVH after I paid 3 months in advance (~$300) for a server which burned down after 1 1/2 months. They did not issue any refunds (data, blood, sweat and tears were lost that day). I had been an OVH customer for 12 years.

Today, I'm glad to have moved all my production environments away as well.

wiz21c over 7 years ago

Damn, every emergency power supply I have encountered (the big ones with fuel and hundreds of batteries) has failed to start when it had to... Why is that?

fvv over 7 years ago

UPDATE: not all datacenters are down. It only seems that way in Europe because OVH routing hasn't been updated, so from our point of view everything is down, but really it is not :)

fxaguessy over 7 years ago

Network and RBX are UP again: <a href="https://twitter.com/olesovhcom/status/928556358353539072" rel="nofollow">https://twitter.com/olesovhcom/status/928556358353539072</a> (but SBG's datacenters are still being restarted)

therealmarv over 7 years ago

Wow, yesterday I was playing with their public cloud because I was considering choosing them. I had some connection problems with my private networking there (deleted it more than once) and opened a ticket. If it was me... sorry, haha. Not a good advertisement, but it can happen to anyone.

NiklasMort over 7 years ago

Can't wait to read the detailed follow-up on this in a few days. It is always interesting to see how such major outages happen.

jedisct1 over 7 years ago

Not "all datacenters". Only 2 of them. They have 22, not counting all the POPs.

drchaos over 7 years ago

This affects DNS as well, since domaindiscount24 (a rather large registrar in Germany) happens to host all three of their nameservers with OVH.

Just in case you wonder why your sites don't work, even if you host them somewhere else.

pmontra over 7 years ago

The status page is up again: <a href="http://status.ovh.net/" rel="nofollow">http://status.ovh.net/</a>

I paste the report so far:

-------------

FS#15162 - SBG

Attached to Project - Network

Task Type: Incident

Category: Strasbourg

Status: In progress

Percent Complete: 0%

Details

We are experiencing an electrical outage on Strasbourg site. We are investigating.

Comments (2)

Comment by OVH - Thursday, 09 November 2017, 10:55AM

SBG: ERDF repared 1 line 20KV. the second is still down. All Gens are UP. 2 routing rooms coming UP. SBG2 will be UP in 15-20min (boot time). SBG1/SBG4: 1h-2h

Comment by OVH - Thursday, 09 November 2017, 12:04PM

Traffic is getting back up. About 30% of the IP are now UP and running.

-------------

VPSes are still marked as red in the dashboard. I can't access mine.

oelmekki over 7 years ago

Btw, note for those who use OVH as their ISP like me (this is a thing in France): your connection works, only the DNS servers do not.

Fix (Debian-like):<pre><code> sudo apt-get install bind9 </code></pre>

Then put in /etc/resolv.conf, if it's not already there:<pre><code> nameserver 127.0.1.1 </code></pre>

This runs a local nameserver that you use directly for resolving.

Oh, obviously, you need resolving to install the resolver :) Hope you have a 4G connection available.

Alternatively, you can just use Google DNS:<pre><code> nameserver 8.8.8.8
 nameserver 8.8.4.4</code></pre>
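To see which of the resolvers listed in /etc/resolv.conf actually answer before or after making this change, something like the following rough sketch may help. It assumes `dig` is installed; the helper names (`list_nameservers`, `resolver_works`, `check_resolvers`) and the test name `example.com` are made up for illustration and do not come from the comment above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any OVH or BIND tooling.

# Print the resolver addresses from a resolv.conf-style file
# (defaults to /etc/resolv.conf).
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Succeed only if resolver $1 returns at least one record for name $2
# (example.com by default). dig exits non-zero when the server does not
# reply, so a timeout is treated as a failure.
resolver_works() {
    answer=$(dig "@$1" "${2:-example.com}" +short +time=2 +tries=1 2>/dev/null) || return 1
    [ -n "$answer" ]
}

# Report each configured resolver in turn.
check_resolvers() {
    for ns in $(list_nameservers "$1"); do
        if resolver_works "$ns"; then
            echo "$ns: answering"
        else
            echo "$ns: no answer"
        fi
    done
}
```

Calling `check_resolvers` with no argument reads /etc/resolv.conf; any resolver reported as "no answer" is a candidate for replacing with one of the addresses mentioned above.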

tyingq over 7 years ago

My OVH dedicated servers seem fine. Webservers, ssh, all working. All of them are in Canada.

qeternity over 7 years ago

All of our dozen or so bare metal boxes are up in GRA as well as all of our cloud instances. However object storage is down.

dx034 over 7 years ago

They have now posted their explanation [1] but I don't buy it. I find it hard to believe that the RBX incident happened shortly after the SBG incident without any connection between the two. They should have redundant networking (at least that's what they say), so one corrupted DB in RBX shouldn't have brought down the whole DC (or 7 DCs according to their system). Maybe they pulled corrupt data from SBG because it was down, but I don't believe that at the same time as a power failure, two redundant network nodes got corrupted without any notice. Otherwise, wouldn't that mean that one hardware issue can also bring down a whole region?

[1] <a href="http://status.ovh.net/?do=details&id=15162&PHPSESSID=7220be21848b5db440d2cb66c5ee7e14" rel="nofollow">http://status.ovh.net/?do=details&id=15162&PHPSESSID=7220be2...</a>

dx034 over 7 years ago

Some servers in GRA still appear to work, if that's of any help. All data centres offline at once sounds more like an attack than a power failure in one location. According to them, there was a power failure in SBG, but I don't see how that should affect routing in data centres several hundred miles away.

<a href="https://twitter.com/olesovhcom/status/928541667283623936" rel="nofollow">https://twitter.com/olesovhcom/status/928541667283623936</a>

EDIT: Maybe related to the Cisco issue?

<a href="https://blogs.cisco.com/security/cisco-psirt-mitigating-and-detecting-potential-abuse-of-cisco-smart-install-feature" rel="nofollow">https://blogs.cisco.com/security/cisco-psirt-mitigating-and-...</a>

jedisct1 over 7 years ago

Details here: <a href="http://travaux.ovh.net/?do=details&id=28244" rel="nofollow">http://travaux.ovh.net/?do=details&id=28244</a>

Apparently, the root cause of that issue is a critical software bug in Cisco NCS 2000 transponders.

dorfsmay over 7 years ago

Not "all"!

Maybe their main DCs, or their largest, but not all of them. I have virtual servers in their Quebec DC (BHS) and they haven't gone down since the last time I rebooted them.

ashitlerferad over 7 years ago

I have 30+ servers on OVH. All are online.

xmichael99 over 7 years ago

This happens to Internap almost weekly... I always wondered why they never make it into the news.

dredmorbius over 7 years ago

How Complex Systems Fail: <a href="http://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf" rel="nofollow">http://web.mit.edu/2.75/resources/random/How%20Complex%20Sys...</a>

ever1 over 7 years ago

Detailed report <a href="https://twitter.com/olesovhcom/status/928904373949919232" rel="nofollow">https://twitter.com/olesovhcom/status/928904373949919232</a>

perlgeek over 7 years ago

A website that I host on OVH is up: <a href="https://sudokugarden.de/" rel="nofollow">https://sudokugarden.de/</a>

ovh.com looks down for me too.

You can check it's hosted by OVH:

$ whois $(dig sudokugarden.de +short)

fapjacks over 7 years ago

Huh. I have services active on two dedicated machines from OVH in Canada, and I was logged into both via SSH all night, and didn't have any interruption at all.

r1ch over 7 years ago

Looks like only their routing / network was down. My servers just came back up and haven't experienced any power outage.

nstricevic over 7 years ago

I just moved 2 apps to OVH. So this was totally unexpected. My apps have been unavailable for more than 7 hours.

Does this happen often with OVH?

askmike over 7 years ago

My server hosted on OVH had some problems (DNS lookups) but has stayed up and works fine right now.

EDIT: Hosted in EU.

gizzlon over 7 years ago

Now this status page is down as well. Sucks to be them right now =/ (I'm in Europe)

pavlakoos over 7 years ago

I'm trying to find an ETA for solving the issue, but they didn't post one on Twitter.

Does anybody know the ETA?

stevenh over 7 years ago

My OVH servers in Canada and Australia are running fine.

My OVH servers in France are all inaccessible.

jagermo over 7 years ago

This has to be one of the least informative status pages I have ever seen.

treo over 7 years ago

Looks like they are starting to come back up. My VPS is accessible again.

aerovistae over 7 years ago

I had never heard of this company til I saw this post. Shrugged, thought, "huh, wonder who that's affecting."

Opened up Age of Empires II... no connection. Go to website for game servers... "Our provider, OVH, is down...."

Go figure.

oron over 7 years ago

not all of them, I have some servers in Canada, working OK

thejosh over 7 years ago

Sydney is fine.

KeitIG over 7 years ago

I imagine Mr Good Guy at OVH telling some others:

"Guys, we have a single point of failure in our architecture with SBG, maybe we should..."

"Naaah, it's fine, we have neither the time nor the resources."

Then shit happens.

edit: I have no idea what is happening exactly, but OVH being what it is, it seems extremely weird that all datacenters "can" go down at the same time, and it looks like a serious architecture problem to me (or backup systems, like generators, not being correctly tested... whatever). I am really curious about the future explanation of what exactly happened.

edit2: Why all the downvotes? Even the OVH status page is down; do not tell me that is good design. We are not here to be charitable, but realistic.

contingencies over 7 years ago

To make error is human. To propagate error to all server in automatic way is #devops. - @devopsborat

Sami_Lehtinen over 7 years ago

Title is misleading. Only RBX and SBG were affected.

06:15 UTC: SBG servers failed.

OVH network weathermap: <a href="http://weathermap.ovh.net" rel="nofollow">http://weathermap.ovh.net</a>

Btw. First post: <a href="https://news.ycombinator.com/item?id=15660524" rel="nofollow">https://news.ycombinator.com/item?id=15660524</a>

metafunctor over 7 years ago

Someone with access might wish to update the title of this post, because all OVH datacenters are definitely not down.

Hates_ over 7 years ago

Trending on Twitter with the hashtag #OVHGATE: <a href="https://twitter.com/hashtag/OVHGATE?src=hash" rel="nofollow">https://twitter.com/hashtag/OVHGATE?src=hash</a>
