So if you were backing up your data to Tarsnap, then you'd be up and running as quickly as you could launch a new instance and redownload everything. And a $500 credit is enough to power a micro droplet for 100 months, or a small droplet for 50 months. DO handled this well.<p><a href="http://www.tarsnap.com" rel="nofollow">http://www.tarsnap.com</a><p>EDIT: s/years/months/g. Thanks.
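For anyone who hasn't used it, a minimal sketch of the Tarsnap round trip (archive name and paths are just placeholders, adjust for your box):

    # Create a dated archive of whatever you care about.
    tarsnap -c -f "www-$(date +%Y-%m-%d)" /var/www /etc/nginx

    # On the replacement instance: reinstall tarsnap, put your key file back
    # (default location is /root/tarsnap.key), then pull everything down.
    tarsnap --list-archives
    tarsnap -x -f "www-2014-01-15"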
So this is a technical problem I am having right now that's preventing me from backing up a Postgres database completely (hope someone here can help).<p>I have a master Postgres database that is receiving a TON of transactions per second (I'm talking thousands of concurrent transactions). We tried running pg_dump on this database, but the DB is just too huge, and it took more than 4 days to completely dump out everything. Not only that, but it impacted performance to the point where backing it up was just not feasible.<p>No problem... just create a slave DB and run pg_dump on that, right? We did just that, but the problem is that you can't run long-running queries on a hot standby (queries that take more than a minute).<p>What would you do in my scenario? With the hot standby, I technically am backing up my data, but I would have 100% peace of mind if I could take daily backups in case someone accidentally ran a "DROP DATABASE X", which would delete the hot standby/slave DB as well.
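For what it's worth, the workaround that keeps coming up (sketched below with placeholder host/database names) is to pause WAL replay on the standby while the dump runs, so recovery conflicts can't cancel it; I'd love to hear whether anyone runs this in production:

    # Pause replay on the 9.x standby so long queries survive.
    psql -h standby-host -U postgres -c "SELECT pg_xlog_replay_pause();"

    # The custom-format dump can now run as long as it needs.
    pg_dump -h standby-host -U postgres -Fc -f /backups/mydb-$(date +%F).dump mydb

    # Let the standby catch up again afterwards.
    psql -h standby-host -U postgres -c "SELECT pg_xlog_replay_resume();"

    # Alternative: set max_standby_streaming_delay = -1 in the standby's
    # postgresql.conf, so queries win over replay (the standby just lags).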
The abrasive headline is kind of unfortunate, as the actual moral of the story given at the end is exactly the right takeaway: never assume your hardware is infallible, and always have backups that you know you can use when your server experiences a wildly improbable catastrophe.<p>Also, very impressed by Digital Ocean's response here. Given their reputation as a budget host, they really do put a lot of effort into service.
That's <i>way</i> more compensation than I would have expected. AWS usually won't even notify you until after the node has gone down.<p>Hardware failures happen; an application needs to be tolerant of them.
It's great you had backups, but why a write-up? Is it an attempt to smear DO's otherwise good name? It's an unmanaged VPS, so it's your responsibility to keep backups of your box, not theirs. And hardware fails all the time, so you can expect this to happen anywhere.
> And if you just launched and have a single instance running, let your alpha users know that there will probably be some downtime.<p>That's true. But there's no reason for extended downtime even if that instance goes down. Make sure your whole setup is described in chef/puppet/salt/ansible/cf/whatever and even a rebuild from scratch takes only minutes then. There's really little reason to skip that these days.
DO is affordable enough that the minimum you should run is two droplets. Having said that, I'm actually fairly impressed with the $500 credit, and now you have no excuse not to run two VMs. Consider it a lesson learnt.
DigitalOcean's pricing page indicates that "All cloud hosting plans include automated backups". (<a href="https://www.digitalocean.com/pricing" rel="nofollow">https://www.digitalocean.com/pricing</a>) From the email you received, it sounds like this is clearly not the case. I wonder what other claims DigitalOcean is making that are not true.
This might sound a bit glib, but RAID 5 shouldn't really be used in modern storage.<p>Even if you ignore the performance issues (which can vary by device), it's just not safe: depending on the size of the drives, a rebuild can take anywhere up to 30+ hours.<p>Bear in mind that you tend to use disks that are all from the same batch; that leaves you in the danger zone for far too long.<p>Your options are (a quick sketch follows the list):
Some sort of clever RAID (ZFS-type thing)
Another type of clever RAID (like the LSI chunk thingy in the DCS37000)
RAID 10
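If you go the ZFS route, a rough sketch (device names are placeholders):

    # Striped mirrors (the RAID 10 shape) instead of RAID 5; a resilver only
    # has to read the surviving half of one mirror, not the whole array.
    zpool create tank mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde

    # Periodic scrubs catch latent read errors before a rebuild trips over them.
    zpool scrub tank
    zpool status tank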
Was this really a dual drive failure, or was this the rather common single drive failure plus undetected errors on a backup drive, that show up when trying to rebuild?<p>Because that happens a lot, and it's very important to do a <i>full</i> read of every drive in the array at least weekly! You have two options for doing that:<p>If you are using Linux md RAID, run the "check" command, which automatically does the test using background I/O (but does still impact things). On Debian, and perhaps other distros too, the mdadm package will do it every month by default. Make sure to set a minimum speed or it might never finish if you have a busy system.<p>You can also use the built-in SMART on the disk to do a long self test. This also uses background I/O, and I think it has a bit less impact on existing operations. (But you have to have some idle time on the disk or it will never finish.) If you install smartmontools you can set smartd to do this test for you every week and keep an eye on the results.<p>I personally do both, plus a short self test of the disk every night.
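Concretely, something like this (md0 and sda stand in for your array and disks):

    # Kick off a full verify of the md array; progress shows up in /proc/mdstat.
    echo check > /sys/block/md0/md/sync_action

    # Raise the floor on background resync speed (KB/s) so it actually finishes.
    echo 50000 > /proc/sys/dev/raid/speed_limit_min

    # One-off SMART long self-test, then review the results later.
    smartctl -t long /dev/sda
    smartctl -a /dev/sda

    # Or let smartd schedule it: a line like this in /etc/smartd.conf runs the
    # long self-test every Saturday at 3am and keeps monitoring attributes.
    #   /dev/sda -a -s L/../../6/03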
I truly believe that we did the best we could in this instance. Drive failures are always unfortunate; even with backups, downtime exists.<p>That being said, we're always genuinely looking to improve, and I'd welcome your feedback on how you feel we did and how you feel we could do better. Please do reach out to me personally at john@do! Thanks. :)
Good thing you had backups.<p>With that being said, these days it's a good idea to use a deployment tool or configuration management system like puppet/salt/ansible/chef/etc, especially in a virtualized environment. This will help with scalability as well as situations such as these.
This is the reason why I moved all data away from my server instances. My images are hosted by Cloudinary (with an S3 bucket backup) and my databases are Amazon RDS instances.
I don't care if a server goes down; I can launch a new one in a matter of minutes (with Ansible) without any data loss.
The author is sweet; his conclusion was "always back up your data". If it were me, I would probably say "I'm moving away, and will never trust them with my data again".
The $500 credit from DO is quite reassuring. Usually if the HD fails and your data is lost, you're out of luck. I hear the "horror" stories of some hosts reusing consumer hard drives between servers, so I learned: your data is your responsibility. I'm glad the OP had backups, but these failures happen; thankfully DO had the business sense to compensate them.<p>Seems like good advertising for DO, as any knowledgeable system admin knows drives fail. DO could have done nothing at all.
> And if you just launched and have a single instance running, let your alpha users know that there will probably be some downtime.<p>How about instead "alpha users should know that there will probably be some downtime". Multiple instances don't really fix that.
Nice move from DO to give everyone $500 credit. As I remember, they don't guarantee data safety (you still need backups even if they did). Double disk failure is a rare thing, but it happens.