GoDaddy outage caused by corrupted router tables

100 points by jakeludington, over 12 years ago

18 comments

druiid, over 12 years ago

I find this extremely suspicious (i.e., knowing routers, I call bullshit). The change to the Verisign anycast DNS service, which I noted yesterday in another thread, brought godaddy.com back up, yet did not bring other DNS services back up.

Someone is lying here, in my opinion. I hope I'm proven wrong, because this is a terrible excuse for the company to make.

EDIT: And as someone else pointed out, their IP addresses could be pinged, which further goes to disprove a routing issue. More than likely, high traffic crashed one or more routers (THIS I have seen happen) and the live and saved configs didn't match. I'd put more money on something like this happening if it was router related.

highfreq, over 12 years ago

How can they claim 99.999% uptime, when they just had several hours of service outage? I'm not sure how long they've been providing DNS hosting, but by the most generous assumption this would be the entire 15 years of their existence. 99.999% allows them about 1.3 hours of outage in 15 years.
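The arithmetic behind that figure can be checked with a short sketch. The function name `allowed_downtime_hours` and the 365.25-day year are assumptions made for illustration, not anything GoDaddy publishes.

```python
# Hypothetical helper: downtime budget implied by an availability target.
# Assumes an average year of 365.25 days.
HOURS_PER_YEAR = 365.25 * 24


def allowed_downtime_hours(availability_pct: float, years: float) -> float:
    """Return the hours of outage permitted over `years` at the given availability."""
    return (1 - availability_pct / 100) * years * HOURS_PER_YEAR


if __name__ == "__main__":
    for pct in (99.9, 99.999):
        hours = allowed_downtime_hours(pct, 15)
        print(f"{pct}% over 15 years allows {hours:.2f} hours of outage")
```

At 99.999% over 15 years this comes to roughly 1.3 hours, which matches the estimate above; a 99.9% target would allow about a hundred times more.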

staunch, over 12 years ago

Based on a long history of working in datacenters, I'd bet someone misconfigured something and later claimed it was "corrupted" to save their ass. It happens all the time. It's just so simple to make very confusing and damaging mistakes in a complicated network.

I wouldn't be surprised to hear that GoDaddy's corporate culture wouldn't respond well to someone admitting to a mistake this damaging.

kijeda, over 12 years ago

The DNS is designed to be resilient to these kinds of problems: a domain can list multiple NS records whose servers sit in different networks. It is standard practice for top-level domain operators and other high-activity domains to place their name servers in different networks to guard against these kinds of issues. When companies put all their name servers in the same network, they remove the diversity benefit and create a single point of failure. Domain operators should take this as a cautionary tale: don't put all your eggs in one basket, and make sure a single network failure can't take all your name servers offline.
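As a rough illustration of that check, here is a minimal sketch that groups name server addresses by network. Everything in it is an assumption: the host names are made up, the addresses come from the reserved documentation ranges, and sharing an IPv4 /24 is only a crude stand-in for "same network". Real diversity is better judged by origin autonomous system, which needs BGP data this sketch does not have.

```python
import ipaddress

# Hypothetical name servers mapped to addresses from documentation ranges.
EXAMPLE_NAMESERVERS = {
    "ns1.example.com": "192.0.2.10",
    "ns2.example.com": "192.0.2.20",
    "ns3.example.net": "198.51.100.5",
}


def distinct_networks(nameserver_ips, prefix_len=24):
    """Return the set of IPv4 networks (at prefix_len) that hold the name servers."""
    return {
        ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        for ip in nameserver_ips.values()
    }


def has_network_diversity(nameserver_ips, prefix_len=24):
    """True if the name servers span more than one network prefix."""
    return len(distinct_networks(nameserver_ips, prefix_len)) > 1


if __name__ == "__main__":
    for net in sorted(distinct_networks(EXAMPLE_NAMESERVERS), key=str):
        print(net)
    print("diverse:", has_network_diversity(EXAMPLE_NAMESERVERS))
```

A domain whose name servers all fall into one prefix would report no diversity, which is the single-point-of-failure situation described above.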

mootothemax, over 12 years ago

I have to wonder how many extra customers the various third-party DNS services have gained as a direct result of this.

I've just switched to DNSMadeEasy. For anyone concerned about the time involved, they have some cool timesavers, like templates you can apply to all of your domains at once. It really makes a difference not having to manually set up the entries for Google Apps on each of 20+ domains.

SilasX, over 12 years ago

Semi-OT: Why is it so hard to find yesterday's highly-rated GoDaddy outage discussion? Neither sorting by relevance nor recency nor points will find it. Or maybe there wasn't one?

oasisbob, over 12 years ago

On the outages mailing list [1], Mike Dob (GoDaddy Network Engineering Manager) just added more details, saying:

> It was BGP related and more details should be posted today

[1] http://puck.nether.net/mailman/listinfo/outages

hbz, over 12 years ago

"The service outage was not caused by external influences. It was not a 'hack' and it was not a denial of service attack."

So Anonymous0wn3r or whoever was just claiming responsibility for something they had no hand in? The router tables just corrupted themselves?

mbell, over 12 years ago

"We have determined the service outage was due to a series of internal network events that corrupted router data tables."

I'm withholding any judgement on internal vs. external involvement until this series of events is defined (I doubt it ever will be).

dingdingpop, over 12 years ago

Their engineer claims it was an issue with BGP (http://permalink.gmane.org/gmane.org.operators.isotf.outages/4279).

BGPlay (http://bgplay.routeviews.org/) does not show anything abnormal or misconfigured in the BGP default-free table (what the Internet sees). While there could have been iBGP issues, as others have stated there was (intermittent) connectivity by IP during the outage.

It's both bullshit PR and, more importantly, spreading disinformation to save face. Why?

A security breach would instill customer fear and generate negative press. Customers would leave in droves.

A DoS/DDoS would show that GoDaddy has inadequate infrastructure while competitors such as CloudFlare do not. Furthermore, why would a company that pisses off the Internet be appealing to anyone? Again, it would generate negative press, and customers would leave in droves.

Spreading disinformation by claiming it was either human error or an equipment fault? From a company perspective this is actually the best option. Just provide generous service credit to your customers, and you may generate positive press, gain customer goodwill and regain their confidence. This is GoDaddy's best option.

Until they provide actual details with proof that it was a misconfiguration or hardware fault, I will continue to call bullshit. Too many factors don't add up, especially the publicly available data that monitors the BGP DFT on the Internet.

The two conjectures that seem plausible so far are the SQL injection in their web interface for DNS and/or a DoS/DDoS attack.

TomGullen, over 12 years ago

I don't know much about hardware at all, but aren't routers fairly simple, time-tested pieces of hardware? Can they really become corrupted en masse in this way?

kevincennis, over 12 years ago

For anyone interested, the person who claimed responsibility for this is tweeting about GoDaddy's response: https://twitter.com/AnonymousOwn3r/status/245568841160196096

xtdx, over 12 years ago

Not a good week for claiming credit...

wethesheeple, over 12 years ago

"yet did not result in bringing other DNS services back up"

Can you be more specific? Which other domain names did you try?

Also, I believe some parts of the world were unaffected by the outage.

I would guess a large majority of GoDaddy customers would not even know this outage occurred. They are "casual" domain name registrants and in some cases "casual" website operators. They registered some names and then never did anything with them. Or they operate a website, but it's very low traffic and they rarely think about it. That is only a guess.

oomkiller, over 12 years ago

This doesn't do anything to explain why it was out for so long. I guess I should expect this type of thing from GoDaddy, though; they are mainly a consumer company.

goldeneye, over 12 years ago

This statement makes a lot of sense. I found it a bit suspicious that Anonymous Own3r tweeted: "When i do some DDOS attack i like to let it down by many days, the attack for unlimited time, it can last one hour or one month". That sounds like he actually has no control over what is happening, and it is a statement that is impossible to disprove.

ww520, over 12 years ago

What is their 99.9% SLA liability going to be?

overworkedasian, over 12 years ago

I wasn't really paying attention to the outage, but if it was indeed a routing issue, then you shouldn't have been able to reach any GoDaddy IP address. ICMP pings and traceroutes would have failed and shown the error.