TechEcho
More on today's Gmail issue

112 points by mgcreed over 15 years ago

9 comments

mrshoe over 15 years ago
These kinds of domino effects are one reason why scalability is so hard to get right. It reminds me of precipitation in supersaturated solutions: everything seems normal until you reach some unforeseen tipping point, and then all hell breaks loose.

I like his little veiled pitch for Google's services when he talks about how easy it was to bring more request routers online given their elastic architecture. It makes me wonder why that elasticity isn't automated -- more routers should *automatically* be brought online if any routers hit their maximum load.
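The automation mrshoe wishes for can be sketched as a simple threshold policy. Everything below (the `MAX_LOAD` threshold, the `routers_needed` helper) is hypothetical and illustrative, not anything Google has described:

```python
# Hypothetical sketch of automated router elasticity: bring extra request
# routers online whenever projected per-router load exceeds a threshold.
# MAX_LOAD and routers_needed are made-up names for illustration only.

MAX_LOAD = 0.8  # target ceiling on per-router load fraction (assumed)

def routers_needed(loads, max_load=MAX_LOAD):
    """Given current per-router load fractions, return how many extra
    routers to bring online so the average load falls below max_load."""
    total = sum(loads)
    extra = 0
    while total / (len(loads) + extra) >= max_load:
        extra += 1
    return extra

# Three routers running hot: one extra router brings the average under 0.8.
print(routers_needed([0.90, 0.95, 0.85]))  # -> 1
```

Real autoscalers also dampen this with sustained-load windows and cooldown periods so a momentary spike doesn't cause capacity to flap up and down.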
spolsky over 15 years ago
Wow, I was impressed by how closely this mea culpa matches Amazon's from their big S3 outage.

Compare to: http://developer.amazonwebservices.com/connect/message.jspa?messageID=79978#79978
smakz over 15 years ago
I admire the transparency, but I don't pretend for a second it's the whole story. This happened during work hours, and if they indeed got notified so fast, I'm wondering why it took over 90 minutes to recover.

Also, the outage, for me anyway, seemed to last much longer than the stated 100 minutes. I seem to remember being unable to access GMail for a span of about 3 hours today.
pmorici over 15 years ago
Interesting that they say the outage lasted 100 minutes instead of 1 hour 40 minutes, which to me sounds worse.
ssn over 15 years ago
Look at the bright side of this: GMail just got more reliable.
arfrank over 15 years ago
It's nice to see them being so transparent about what happened and how they plan on fixing it in the future. They're obviously working on anticipating problems, but what I wonder about is things like this, where they thought they were covered. How does one go about finding these failure points on systems that span multiple locations? I hope they follow up with lessons learned on their quest to improve reliability.
taitems over 15 years ago
This is probably the most glaring flaw in SaaS and cloud computing. Even the giants go down eventually. Couple that with your own ISP's issues and your potential downtime is doubled.
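taitems's "doubled" intuition checks out when the failures are independent: combined availability is the product of the parts, so small downtime fractions roughly add. A quick illustration with assumed (made-up) 99.9% figures for both the cloud service and the ISP:

```python
# With independent failures, availability multiplies, so small
# downtime fractions approximately add. The 99.9% inputs are assumed.
cloud_up = 0.999  # illustrative cloud-service availability
isp_up = 0.999    # illustrative ISP availability
combined_up = cloud_up * isp_up
print(combined_up)       # 0.998001
print(1 - combined_up)   # ~0.002 downtime: about double either 0.001 alone
```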
lallysingh over 15 years ago
Unless they've had other downtime on gmail, their uptime's been (after this fault) 99.99239%. Pretty good.
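Taking lallysingh's figures at face value, one can back-derive the measurement window they imply. The window itself is an inference from the two quoted numbers, not something the comment states:

```python
# Back out the window implied by 100 minutes of downtime at 99.99239% uptime.
downtime_min = 100.0          # outage length from the post-mortem
availability = 0.9999239      # uptime fraction quoted in the comment
window_min = downtime_min / (1 - availability)  # total minutes in the window
window_days = window_min / (60 * 24)
print(window_days)  # ~912.5 days, i.e. roughly two and a half years
```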
sanj over 15 years ago
Why didn't the routing servers come back online after they cleared their queue?