DigitalOcean lost our server

290 points by ayi over 9 years ago

51 comments

mizzao over 9 years ago

I'm more worried about the following post in that thread:

> We were loyal customers of DigitalOcean for over 2 years. We showed up to work one morning and had a client email stating that their website was down. We checked, sure enough. We tried logging in to our DI account and it said it was suspended. After searching around, we realized our CC had expired. No biggie, we'll just update it, turn the server back on and be on our way.

> Nope. We had to contact customer service to get back in to our account. After updating the card info we realize that all our droplets are gone. We reach out to customer service again, in which case they let us know that when a CC expires, an automated process kicks off and deletes the droplets. We've been a customer for 2 years, surely they could pick up the phone and call us. We've spent thousands of dollars with them.

> Plan B, let's restore them from the backups we've been paying for. Nope! When they delete your droplets, they also delete your backups.

> This is where we ask to speak to them on the phone and are denied. I then ask if they are insane and why they would delete someone's servers and backups via an automated process without a human at least checking to see if it is a loyal multi-year customer who has a simple lapse with their CC expiry.

> We had to find a backup on a developer's machine from over a month prior and rebuild data using various methods that took close to a week. In the end, DI gave us a $500 credit.

> You get what you pay for. Anyone who uses DI for production or anything more than a hobby app is playing with fire. They do not care, they are apathetic, they will screw you over and then throw you a credit for their terrible service as a half assed apology.

gdgtfiend over 9 years ago

This seems like a non-issue to me. If you're using an IaaS provider you should be treating the network as volatile from the get-go. This is the reason AWS has things like auto-scaling groups. You should be designing for failure in "the cloud".

phamilton over 9 years ago

I'm glad the author didn't turn this into a negative review of DO. "What will you do when your server is lost?" is exactly the right question to learn from a scenario like this.

vinceguidry over 9 years ago

If you're relying on backups for servers other than your database then you're keeping state on your servers and that's a Bad Thing. You should regularly destroy your own servers and recreate them using your configuration / deployment scripts if the prospect of this happening worries you. Do it before your business starts to rely on it.

For database servers, you need procedures in place to quickly switch the production application over to a database you just spun up and loaded from a recent backup; running that switch is how you test your data backups. You want this to happen as smoothly as possible in the case of a failure. Keep data backups in three different places and test your procedure on all of them.

On a production system, do not keep the database on the same server as the application.
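
As a rough illustration of the "three different places" advice above, here is a minimal sketch that checks whether several copies of a backup file exist and agree byte for byte. The function names and the `min_copies` parameter are hypothetical, the locations are assumed to be reachable as local paths, and a checksum match is only a sanity check, not a substitute for an actual restore drill.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(copies: list[Path], min_copies: int = 3) -> bool:
    """True only if at least min_copies files exist and all have identical content."""
    existing = [p for p in copies if p.is_file()]
    if len(existing) < min_copies:
        return False
    # A single distinct digest means every existing copy matches.
    return len({sha256_of(p) for p in existing}) == 1
```

Something like `verify_copies([Path("/mnt/a/db.dump"), Path("/mnt/b/db.dump"), Path("/mnt/c/db.dump")])` (paths made up for the example) could run from cron after each backup, with a restore test scheduled separately.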

chrismartin over 9 years ago

Probably due to optimization of the conversion funnel, it's VERY easy to build things on DigitalOcean without understanding that, unlike many other *-as-a-service offerings, droplets do not include backups and that is your responsibility. As a sysadmin, this is fine since I don't trust a single provider for both prod and backups anyhow, but I fear for those who are just getting started and don't realize this.

I agree that DO handled this situation appropriately, but they should warn people a little more deliberately when they sign up and create a droplet. A checkbox along the lines of "I, user, understand that I'm responsible for backing up my server", with a few links to some how-to articles of the excellent quality that DO is known for, would suffice and empower without reducing the conversion rate.

bikamonki over 9 years ago

This is yet another reason to push forward the idea of static websites/webapps. There is no single argument to support the idea of using a database to power a 5-6 page website. Not even a blog with daily posts. How many sites out there have fewer than 10 pages, are updated maybe once per month, and sport a contact form or newsletter subscription box? 90%? And how many of them use an insecure behemoth like WP or Joomla?

My new model: the client wants a pretty responsive theme, so I get the HTML version of it and turn it into a template I can use to spit out a static site (from my homebrewed CMS). I tell clients their monthly support charges are $0. If they ever need updates I charge hourly (most never call in months). Some clients want "a list of something", say products. If it is small and won't grow: a static CSV file that also gets munched by the homebrewed CMS. Large list? A BaaS with an API.

I really don't care if the servers are gone/hacked/ransomwared. Git clone, go.

kentonv over 9 years ago

I think Google Compute Engine does the right thing here: by default it uses "persistent disks" (network-attached redundant/highly-available block devices) for all disks. The only case I've heard of where persistent disk data was lost was a few acknowledged writes occurring just before an unusual lightning-induced power outage: https://status.cloud.google.com/incident/compute/15056

For added protection, you can take regular snapshots. You only pay to store the diff from the last snapshot (so go ahead and snapshot often), and snapshot storage is geographically distributed.

(Note: I have no idea what EC2 does, maybe it's similar.)

r1ch over 9 years ago

I definitely wouldn't recommend DO's built-in backup service in production based on my experience. It brings my site down every week since all I/O freezes during the backup. Sometimes the droplet doesn't recover and needs a hard reset. Apparently it's a known issue but hard to fix.

justizin over 9 years ago

"Luckily, we made the decision at Spatie to host every site on its own droplet, so only one site was affected."

I think that's a poor lesson learned here. Were this me, I would have said:

"Luckily, all of our sites run on several servers, access data in a shared, replicated cluster, and a small shell script I wrote kept me from writing this entire blog post."

IaaS has only surfaced what has always been true: your data lives on little physical things that are screwed into a thing and goes through a controller that could fuck up due to cosmic rays.

Do better for your customers.

bsg75 over 9 years ago

TL;DR: Back up your servers no matter where they are hosted, because two is one and one is none.

csomar over 9 years ago

Two things to learn from this:

1. Never have a single point of failure. Relying on DO for server + backups is putting all your eggs in one basket.
2. Your server state should be programmable. This is not quite easy for complex configurations. But today, DO has an API, we have Docker, and quite modern deployment tools.

Here is my setup:

1. GitHub for the server state. Basically, a repository to configure and deploy my infrastructure.
2. Enable DO backups in case I mess up something and want a quick come-back.
3. File backups through Tarsnap. Since I use Docker, I have a volume container. Back up the volume container with Tarsnap.
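
To make step 3 of that setup a bit more concrete, here is a minimal sketch of archiving a data directory under a timestamped name. It deliberately swaps in Python's standard `tarfile` module as a local stand-in and does not call Tarsnap; the function name `archive_volume` and both directory arguments are hypothetical, and it assumes the Docker volume's contents are available as an ordinary directory on the host.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def archive_volume(volume_dir: Path, dest_dir: Path) -> Path:
    """Pack volume_dir into a timestamped .tar.gz under dest_dir and return its path."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / f"{volume_dir.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the volume's own name.
        tar.add(volume_dir, arcname=volume_dir.name)
    return archive
```

The resulting archive would still need to be shipped off the droplet (to Tarsnap or any other provider) for it to count as an off-site backup; `dest_dir` should also live outside `volume_dir` so the archive does not try to include itself.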

justinhj over 9 years ago

This has happened to me with Amazon servers too; it's just the nature of the cloud. I think Digital Ocean handled this as well as can be expected.

captn3m0 over 9 years ago

Cache link: https://webcache.googleusercontent.com/search?q=cache%3Ahttps%3A%2F%2Fmurze.be%2F2016%2F02%2Ftoday-digitalocean-lost-our-entire-server%2F&oq=cache%3Ahttps%3A%2F%2Fmurze.be%2F2016%2F02%2Ftoday-digitalocean-lost-our-entire-server

tomphoolery over 9 years ago

That really stinks, but this should be a warning to you that in the future, you should have some kind of redundancy plan so data does not get lost. Even if Amazon decided to terminate all of my app instances right now, I would just need to rebuild them. No data is actually lost, and furthermore backups are made frequently so that if a server decides to explode one day, we're still covered. DigitalOcean droplets are really not meant to be holding your database and application server...you shouldn't have experienced "data loss" by losing a droplet because your database should not have been hosted on a droplet.

brayton over 9 years ago

They had to make space for the YC portfolio companies getting $250k of credits ;)

vidyesh over 9 years ago

To my surprise, this HN thread has no link to any guide on external backup solutions, and little to no suggestion of the best way to back up your server to an external service or another VPS/backup server.

autotune over 9 years ago

> Error establishing a database connection.

I award OP the "Most Accurate Title Of The Day" Award.

seanwilson over 9 years ago

What if you need to scale up to more servers? What if your server gets hacked and you have to recreate it? What if you accidentally delete the server or perform an upgrade that completely breaks it?

None of these would be an issue if you've scripted server recreation from scratch and permanent data is stored in high availability shared databases and services like S3. If you're relying on weekly backups to save your server state and customer data you're just asking for trouble. I like that Heroku recreates your server once a day to force you to do this properly.

trustfundbaby over 9 years ago

The weekly backup deal is something I really do not like about Digital Ocean. I think Linode does it better. I've been planning to move for a while now but have just been super lazy; this might be the push I need. 7 days of lost data is unacceptable to me, especially when I'm paying for backups.

> Three backup slots are executed and rotated automatically: a daily backup, a 2-7 day old backup, and an 8-14 day old backup. A fourth backup slot is available for on-demand backups.
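
For anyone rolling their own rotation along the lines of the quoted three-slot scheme, here is a minimal sketch of picking which backups to keep. This is one interpretation, not the provider's actual logic: the slot labels, the choice to treat 0-1 day old backups as "daily", and the rule of keeping the newest backup per slot are all assumptions, and `slot_for` and `backups_to_keep` are hypothetical names.

```python
from datetime import date
from typing import Optional


def slot_for(backup_day: date, today: date) -> Optional[str]:
    """Map a backup's age in days to a rotation slot, or None if it is too old."""
    age = (today - backup_day).days
    if 0 <= age <= 1:
        return "daily"
    if 2 <= age <= 7:
        return "2-7 days"
    if 8 <= age <= 14:
        return "8-14 days"
    return None


def backups_to_keep(backup_days: list[date], today: date) -> dict[str, date]:
    """Keep the newest backup in each slot; everything else can be rotated out."""
    keep: dict[str, date] = {}
    for day in sorted(backup_days):
        slot = slot_for(day, today)
        if slot is not None:
            keep[slot] = day  # later (newer) days overwrite older ones in the same slot
    return keep
```

With this layout the worst-case data loss is bounded by the daily slot rather than by a weekly one, which is the difference the comment above is pointing at.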

codazoda over 9 years ago

Although I have backups of my critical files, I don't want to try to rebuild from those files if I can help it.

After learning that Digital Ocean does no backups of their own (even for critical hardware failure on their own side) I've enabled weekly backups for an additional $1 per month.

Glad you posted this so that I know that option is available and really necessary. And who wouldn't pay $1?

Edit: It costs 20% but I only pay $5 for my personal server.

apocalyptic0n3 over 9 years ago

Does anyone have any good backup solutions to mitigate this? My agency uses a script we wrote in-house to more-or-less rsync our data to an AWS instance, but it's always seemed a poor way to handle it. I'm using DO's backup service for my personal site, which was always supposed to be a temporary solution (the timing of the backups is inconvenient, as mentioned by the article).

A better solution would be wonderful.

rakem over 9 years ago

I see this as a win for your processes (having backups) and DO for being honest. Stuff breaks, it happens. Good for you for having backups.

ryanmarr over 9 years ago

The comment thread on this post makes me very happy. Having recently switched all my production instances to individual docker containers, I'm very happy to hear that the consensus seems to be that server instances should be killed and spun back up like bacteria. There's hope for humanity yet.

arihant over 9 years ago

I do understand the part where they terminated and killed off the VMs. I do not, at all, understand why they deleted the backups. How can the CEO sit in his chair and make that decision? If he didn't and he doesn't know, why is he still in that chair?

Even damn Netflix preserves data for 10 months. Deleting backups is making 100% sure that the customer doesn't come back to you. It's amazingly stupid.

Unless you're as smart and as big as Google, use the damn phone. Don't use the word automated at all. It's not automated if one fool made a decision and another one wrote the script.

outworlder over 9 years ago

Oh well, it's not like DigitalOcean doesn't tell you to enable backups. And you should not expect that hardware failures are now a thing of the past just because it's in "the cloud".

charliedevolve over 9 years ago

If you're paying for backups and they get deleted the moment there's an issue with payment, ur doin' it rong. I don't care what the TOS says, they're being completely shitty to customers. Sure there should be regular off-site backups. No one is arguing against that. But the fastest restore is usually from the closest source, and that's partly why you'd pay them for it.

Glyptodon over 9 years ago

Between the lines takeaway: hardware RAID is dangerous.

ljoshua over 9 years ago

I hadn't heard of the backup providers mentioned in the article. Does anyone have experience with those or with other recommended solutions?

I can re-implement the infrastructure of my VMs fairly easily (thank you Ansible), but backing up content outside of the provider's built-in options is something I haven't played with yet, and would obviously be the crucial piece.

nikdow over 9 years ago

Tell me if I'm doing something wrong: my two EC2 servers both use EBS including for the root device, so I'm not using any device-attached storage. If Amazon "lost" my instance, it would appear to me to be just a reboot.

And yes, I do a nightly backup of databases, webroot, /etc and some other directories.

krinchan over 9 years ago

I use DO as a test bed for a lot of stuff. I ended up destroying/recreating so many droplets I just started using Chef. Also, I come from a major background in AWS, ChaosMonkey in prod, and a very strict push-button + nuke & pave deployment strategy.

So, all this strikes me as "lol you didn't know this?"

rubberstamp over 9 years ago

What's the point of RAID if a RAID card failure results in complete loss of data? I came across an article a week or two ago that discussed various alternatives, I think ZFS with checksums, which unlike RAID will not replicate corrupt data from a drive nearing failure to the healthy drive.

the_watcher over 9 years ago

Holy hell, obviously where critical data is concerned, redundancy is something you should insist on, but this definitely sucks. Seems ridiculous that all you got from DigitalOcean is a $15 credit though. I'd have expected something like 6 months of paid backups comped.

jvoorhis over 9 years ago

Sounds like a Tuesday at AWS.

mdellabitta over 9 years ago

Hey, anybody remember LeafyHost? http://arstechnica.com/civis/viewtopic.php?f=25&t=238085

kull over 9 years ago

We are also experiencing a huge problem with the performance of DO servers as we scale, especially databases. With stories like that, the decision to move to AWS seems pretty obvious.

scurvy over 9 years ago

Anyone have any experience with running VMs on Ceph storage? I've used it for other things, but I know that VM hosting is quite popular. Care to share any stories?

sauere over 9 years ago

I had two DO machines with an unrepairable file system after a crash and reboot. Shit happens. Never rely on _any_ VPS. I treat them as throw-away boxes that can fail anytime.

Lanari over 9 years ago

I recommend having a backup using Git on some reliable Git hosting(s). I can't think of anything safer and more practical.

mrnismo92 over 9 years ago

You should always build for failure, especially if you are running on IaaS. Good post & a great reminder for many!

twunde over 9 years ago

DO just gave a talk (~2 weeks ago) where they mentioned that they were moving to software RAID.

therealmarv over 9 years ago

No surprise for me. It also happened to me on DO once. It seemed my machine was on the wrong rack/server there. In the end you get what you pay for, I would say... you can get lucky with your DO server, but there is a reason why Rackspace and Amazon still exist nowadays! If you can afford one of the bigger ones, go for it.

tvvocold over 9 years ago

Lucky you; they deleted my server just because I set up a VPN server...

halfdan over 9 years ago

..and the second due to a flood of people from HN.

Server seems down :-/

20centuryboys over 9 years ago

How exactly do you "ssh" into your DO droplet?

fidget over 9 years ago

As if this can't happen on physical machines also.

tehbmar over 9 years ago

Welp, I'm verifying backups on my server now.

beachstartup over 9 years ago

you should hear what people say when we suggest that they make their own backups in addition to the ones we provide for them.

adults acting like children. nothing more, nothing less.

take responsibility for your own data. back it up yourself.

halis over 9 years ago

Lucy!? You got some splainin' to do...

z3t4 over 9 years ago

I don't know how to write this without upsetting quite a lot of people, but RAID!???!

I run redundant disk controllers and hard drives on ZFS. And it's cheaper than a decent RAID system.

First thing I do with a new server is disable RAID.

d0ugie over 9 years ago

Where's that internet permanence when you need it?

vellis over 9 years ago

Typical PHP user, thinking that cloud is like their shared HostGator box. Pathetic.