Indeed. As I write this we're in the midst of our third S3 outage of the day. The previous two were eventually documented on the AWS Service Dashboard; the latest has not yet received its tiny status icon to indicate an outage.

It's one thing that S3 keeps going down today; we run our own server cluster and I accept that 100% uptime isn't possible. But it's aggravating that they can't at least figure out how to give timely updates on their dashboard when something is broken.

We inevitably learn of S3 outages through our internal error reporting systems before AWS posts anything to their status page. When they finally do post, it is usually a tiny "information" icon, even when the problem makes the service unusable. The laggy, misleading status page gives the impression they must be tying bonuses to the status icons; I can't fathom why else they would be so inept at keeping us updated when something is wrong. Surely they have enough internal monitoring to pick up on these outages long before they tell their customers.
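The "our own alerts beat the dashboard" point is easy to reproduce with a trivial independent probe. Below is a minimal sketch in Python, assuming boto3 and configured credentials; the bucket name, object key, interval, and the alert() hook are all hypothetical placeholders, not anything AWS or the original poster actually uses.

```python
# Minimal sketch of an independent S3 health probe (assumes boto3 and
# AWS credentials are set up). BUCKET, KEY, INTERVAL, and alert() are
# hypothetical placeholders.
import time
import boto3
from botocore.exceptions import ClientError, EndpointConnectionError

s3 = boto3.client("s3", region_name="us-east-1")

BUCKET = "example-health-check-bucket"   # hypothetical bucket
KEY = "probe.txt"                        # hypothetical small object
INTERVAL = 30                            # seconds between probes


def alert(message):
    # Stand-in for a real pager/Slack/email hook; just prints here.
    print("ALERT:", message)


while True:
    try:
        s3.head_object(Bucket=BUCKET, Key=KEY)
    except (ClientError, EndpointConnectionError) as exc:
        # A 5xx or connection failure here typically shows up well before
        # the AWS status dashboard changes.
        alert(f"S3 probe failed: {exc}")
    time.sleep(INTERVAL)
```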
From Amazon:

"Hello, We have just become aware of EC2 network connectivity issues in the US-EAST-1 region. The impact of this issue is loss of network connectivity to EC2 instances in US-EAST-1. The AWS support and engineering teams are actively working on bringing closure to this issue. I will share additional information as soon as I learn more about this issue."
I'm risking being inflammatory here, but do people really believe they get better uptime from AWS than from renting dedicated servers?

I feel like AWS has way too many moving parts to be stable.

It's very tempting for them to reuse bits of infrastructure everywhere, which increases the chance that a failure in one place breaks your stuff. For example, hosting instance images on S3 means that when S3 has issues, EC2 has issues too.
AWS treats the us-east-1 region differently from all other regions. Part of the reason is that it is the default region, and hence the most popular. It also doesn't help that it is on the east coast and experiences more severe weather.

For those reasons, and because I work in the SF bay area, I put everything in us-west-2. us-west-2 sometimes has its own issues, but nothing quite at the level of us-east-1.
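For what it's worth, avoiding the us-east-1 default mostly comes down to always passing an explicit region. A minimal boto3 sketch, with a hypothetical bucket name:

```python
# Minimal sketch of pinning S3 usage to us-west-2 instead of the
# US Standard (us-east-1) default; the bucket name is hypothetical.
import boto3

s3 = boto3.client("s3", region_name="us-west-2")

# Buckets created without a LocationConstraint land in us-east-1,
# so specify the region explicitly.
s3.create_bucket(
    Bucket="example-uswest2-bucket",
    CreateBucketConfiguration={"LocationConstraint": "us-west-2"},
)
```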
"12:28 PM PDT Between 12:03 PM to 12:19 PM PDT we experienced elevated errors for requests made to Amazon S3 in the US-STANDARD Region. The issue has been resolved and the service is operating normally"<p>Our AWS TAM called us. I don't think he wanted the nasty call I gave him at 4:30am
Amazon yet again lying to its customers about the status of the service is the only real issue I see here. Services fail; that's a fact of life. But at least admit it's broken and that the issue is being fixed, instead of blatantly lying and calling it a minor disruption.
We saw a short burst of 503s a little while ago, but we have not seen any since. Hopefully it stays that way.

Also, for the record, S3 has otherwise been very stable for us. We have been quite happy with AWS overall.
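Short bursts of 503s like that are usually survivable if the client retries with backoff. A minimal sketch, assuming boto3; the retry budget, bucket, and key are hypothetical placeholders rather than anything S3 documentation mandates.

```python
# Minimal sketch of riding out a short burst of S3 503s with retries
# and exponential backoff (assumes boto3); bucket, key, and retry
# counts are hypothetical placeholders.
import time
import boto3
from botocore.config import Config
from botocore.exceptions import ClientError

# Let botocore retry transient errors (throttling, 5xx) automatically.
s3 = boto3.client("s3", config=Config(retries={"max_attempts": 10}))


def get_object_with_backoff(bucket, key, attempts=5):
    for attempt in range(attempts):
        try:
            return s3.get_object(Bucket=bucket, Key=key)
        except ClientError as exc:
            status = exc.response["ResponseMetadata"]["HTTPStatusCode"]
            if status == 503 and attempt < attempts - 1:
                time.sleep(2 ** attempt)  # back off before retrying
                continue
            raise
```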
What happened to that 99.99% availability? Either way, this just got posted at reddit.com/r/sysadmin, which might be useful to some for tracking the error rate: https://pulse.turbobytes.com/results/55c8751aecbe400bf80005f2/
We had problems connecting to the S3 US Standard region from us-east-1 at 19:00 UTC, but it was resolved 20 minutes later.

edit: seeing connectivity issues again at 19:50 UTC