Amazon RDS failure - data has been lost

82 points by akhkharu almost 13 years ago
Our RDS instance is in "failure" state after 8 hours of downtime. Have to restore from point in time backup which does not have actual data.

Amazon says:

Jun 15, 4:03 AM PDT The RDS service is now operating normally. All affected Multi-AZ RDS instances operated normally throughout the power event after failing over. We were able to recover many Single AZ instances successfully, but storage volumes attached to some Single-AZ instances could not be restored, resulting in those instances being placed in Storage failure mode. Customers with automated backups turned on for an affected database instance have the option of initiating a Point-in-Time Restore operation. This will launch a new database instance using a backup of the affected database instance from before the event. To do this, follow these steps:

1) Log into the AWS Management console
2) Access the RDS tab, and select DB Instances on the left-side navigation
3) Select the affected database instance
4) Click on the "Restore to Point in Time" button
5) Select "Use Latest Restorable Time"
6) Select a DB instance class that is at least the same size as the original DB instance
7) Make sure No Preference is selected for Availability Zone
8) Launch DB Instance and connect your application

We will be following up here with the root cause of this event.
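For anyone who would rather script the restore than click through the console, the steps Amazon lists can be approximated with the RDS API. This is only a sketch: it assumes the present-day boto3 client and its `restore_db_instance_to_point_in_time` call, which is not necessarily what was available at the time of the outage. The helper names `build_pitr_request` and `restore_latest` and the instance identifiers are made up for illustration.

```python
from typing import Any, Dict, Optional


def build_pitr_request(source_id: str, target_id: str,
                       instance_class: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for a Point-in-Time Restore request."""
    params: Dict[str, Any] = {
        "SourceDBInstanceIdentifier": source_id,  # step 3: the affected instance
        "TargetDBInstanceIdentifier": target_id,  # the restore launches a new instance
        "UseLatestRestorableTime": True,          # step 5
    }
    if instance_class is not None:
        # Step 6: pick a class at least as large as the original.
        params["DBInstanceClass"] = instance_class
    # Step 7: AvailabilityZone is deliberately left out, i.e. "No Preference".
    return params


def restore_latest(source_id: str, target_id: str,
                   instance_class: Optional[str] = None) -> Dict[str, Any]:
    """Send the request. Needs boto3 and AWS credentials (assumed, not shown)."""
    import boto3  # imported lazily so the builder above works without AWS access

    rds = boto3.client("rds")
    return rds.restore_db_instance_to_point_in_time(
        **build_pitr_request(source_id, target_id, instance_class)
    )


if __name__ == "__main__":
    # Hypothetical identifiers; this only prints the request, it does not call AWS.
    print(build_pitr_request("prod-db", "prod-db-restored", "db.m1.large"))
```

Note that the restore creates a new instance with a new endpoint, so the application still has to be pointed at it afterwards (step 8).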

7 comments

EwanToo almost 13 years ago
RDS should not have lost data, and if I were a user of it, I'd be annoyed too.

At the same time, if you've not spotted by now that EBS (Elastic Block Store, which powers RDS) is not reliable and not to be trusted, then you have to look at yourself too.

EBS is by far the worst product AWS offer. You simply should not use it without a very good reason, and if you do need to use it, you have to assume any given drive image will disappear at any moment, as it did here.

Beyond that, any time you're running a database, no matter who the provider is, if you're not doing backups every day or hour, then you're not doing things right.
justincormack almost 13 years ago
Use multi AZ then, which performed as expected. There have been so many warnings about single AZ that you would hope people get it by now.
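A quick way to find out which of your instances are still single AZ is to look at the `MultiAZ` flag in the output of `describe_db_instances`. The sketch below assumes the present-day boto3 response shape (`DBInstances`, `DBInstanceIdentifier`, `MultiAZ`); the helper name `single_az_instances` and the sample data are made up for illustration.

```python
from typing import Any, Dict, List


def single_az_instances(describe_response: Dict[str, Any]) -> List[str]:
    """Return identifiers of instances that are not running Multi-AZ."""
    return [
        db["DBInstanceIdentifier"]
        for db in describe_response.get("DBInstances", [])
        if not db.get("MultiAZ", False)
    ]


def audit_account() -> List[str]:
    """Query the live account. Needs boto3 and AWS credentials (assumed, not shown)."""
    import boto3  # imported lazily so the filter above works without AWS access

    rds = boto3.client("rds")
    # Pagination is omitted for brevity; a large account needs a paginator.
    return single_az_instances(rds.describe_db_instances())


if __name__ == "__main__":
    # Hypothetical response fragment, trimmed to the fields used here.
    sample = {
        "DBInstances": [
            {"DBInstanceIdentifier": "orders-db", "MultiAZ": True},
            {"DBInstanceIdentifier": "reports-db", "MultiAZ": False},
        ]
    }
    print(single_az_instances(sample))
```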
PaulHoule almost 13 years ago
If you had a database running on a dedi you could get trashed by a server failure too.

Good backups are the best defense.
bananashake almost 13 years ago
Why do you think the "Restore to Point in Time" failed to work? That puzzles me the most in this catastrophe and no one has addressed it. In theory, with Point-in-Time restoration you should not lose data from a failure on just the storage where the InnoDB data is stored.
purephase almost 13 years ago
I'm not sure I understand the "which does not have actual data" part of your statement.

Could you explain that a bit more?
mschalle almost 13 years ago
Always assume Murphy's law will hold, regardless of what service provider you use.

If you were running your own database, you surely would have had rigorous backups because the responsibility was on you.

Assume that if a service can fail, it will. If data can be lost, it will be. Then, plan accordingly.

EDIT: grammar
debacle almost 13 years ago
But but...the cloud.