Data loss incident (snapshots)<p>Dear Customer,
Unfortunately, we have to inform you that there was a data loss incident that affects a small number of your snapshots on Hetzner Cloud.
All snapshots you create are stored on our highly available storage systems. The snapshot contents are distributed over multiple internal servers and data is stored in a way that allows up to two separate disks to fail without impacting data integrity.
This means the snapshot can still be accessed, even if two disks fail at the same time.
Due to a recent, very unfortunate series of events in one of our clusters, multiple disks failed in short succession and caused a small number of snapshots to become unavailable.
We immediately tried to recover the affected snapshots but unfortunately the data is lost and we have exhausted all our options.<p>Affected snapshots in your account:
XXXXXXXXX<p>The snapshots have been removed from our system as they are no longer accessible.
We sincerely hope this doesn’t cause too much trouble for you; we know losing data is the worst-case scenario. Also, we have added 20€ as Cloud Credits to your account (valid for one year). While we know that this will not bring back your data, we still hope that you will accept the gesture.
In response to this we will re-evaluate our snapshot cluster data replication strategies as well as our strategies for replacing disks and rebuilding redundancy after replacement.<p>Best Regards,
Hetzner Cloud
Lost an EBS snapshot on AWS once. The only way to provide some assurance in this space is to make sure your data is stored in more than one physically and commercially separate location.
Folks, remember the 3-2-1 rule. IMO, it’s still extremely relevant, even in today’s cloud-centric world.<p>For data you can’t afford to lose, please, for the love of ${DEITY}, don’t store it with just one vendor. You never know what happens.<p><a href="https://www.backblaze.com/blog/the-3-2-1-backup-strategy/" rel="nofollow">https://www.backblaze.com/blog/the-3-2-1-backup-strategy/</a>
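A minimal sketch of the 3-2-1 rule (at least 3 copies, on at least 2 different media, with at least 1 off-site), for anyone who wants to sanity-check their own setup. The `BackupCopy` class, its fields, and the example names are hypothetical and not taken from any backup tool:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of a dataset (hypothetical model, for illustration only)."""
    location: str  # e.g. "prod-server", "provider-snapshot"
    medium: str    # e.g. "local-disk", "object-storage"
    offsite: bool  # True if stored away from the primary site


def satisfies_3_2_1(copies):
    """Return True if there are >= 3 copies, on >= 2 media, with >= 1 off-site."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite
```

A production server plus a single provider snapshot fails the check; adding a copy with a second vendor on different storage passes it.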
That's not good but you do get what you pay for. Should never put all your eggs in one basket like that.<p>The €20 of credit made air accelerate out of my nostrils.
Anybody have more color on the nature of the "recent, very unfortunate series of events in one of our clusters [such that] multiple disks failed in short succession"? What kind of "events"? A failed climate-control system? A berserk employee with a sledgehammer? Russian hackers? The Spanish Inquisition?<p>It seems odd that they'd be quite so vague and circumspect about it. Why not just say what it was, so we're not all left to speculate, perhaps for the worst?
> All snapshots you create are stored on our highly available storage systems.<p>Highly available right until they are not. Nice to see that your data is worth 20 bucks to Hetzner. I always liked them, but this is a bit rude, to put it mildly.
It's really odd to see so many complaining when yesterday a thread about Atlassian was asking for precisely this type of communication.<p>They communicated what went wrong, how it happened (in a nutshell), apologized and stated how they're planning to do better.<p>They even threw in 20€ which when compared to the pricing of their snapshot storage is more than fair.<p>Shit happens, things go wrong. If you lost very important data because you only stored it in one location, then you're equally at fault. Especially when taking into account that they are a rather cheap service.<p>What more do you want?
Hetzner has been doing emergency maintenance on their backup drive systems (BX-XX) lately. Both of mine on separate accounts have had the notifications.
I've never trusted cloud provider snapshots because you're just paying more for something that might fail outside of your responsibility.<p>I have a local backup box and a remote one, and both pull data from multiple servers via rsync on a cron job. Works great. In case of catastrophic failure (which has happened a few times), putting my eggs in more than one basket has been extremely successful.
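A rough sketch of what such a pull job might look like if driven from Python and scheduled by cron. The host names and paths in the usage note are made up, and the function names are hypothetical; only the `rsync` flags `-a` and `-z` are real:

```python
import subprocess


def build_rsync_command(source, destination):
    """Build an rsync invocation that copies `source` into `destination`."""
    return [
        "rsync",
        "-a",  # archive mode: recurse and keep permissions, times, symlinks
        "-z",  # compress data in transit
        source,
        destination,
    ]


def pull_backups(sources, backup_root):
    """Pull each source into its own directory; return the names that failed."""
    failed = []
    for name, source in sources.items():
        command = build_rsync_command(source, f"{backup_root}/{name}/")
        result = subprocess.run(command)
        if result.returncode != 0:
            failed.append(name)
    return failed
```

For example, `pull_backups({"web1": "backup@web1.example.com:/var/www/"}, "/srv/backups")` would mirror one server into `/srv/backups/web1/`. This sketch deliberately leaves out `--delete`, so files removed on the server stay in the backup.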
I remember being told years back by one of the people who wrote DigitalOcean's backup system not to rely on their backups. This made sense to me, as they didn't even charge for it for the longest time. We do now use it some of the time and haven't had problems, but there's always a second place we send images to (usually S3), as well as longer-term offline archives.
I wonder if this is related to the emails I've been getting since the start of March regarding maintenance on Storage Box Hosts? There were urgent maintenance ones on the 11th of April as well.
There is no reason to rely on "triple replication" for data integrity. This has long been a solved problem. An appropriate erasure encoding can reduce the probability of loss ~ten-fold while consuming less physical space (i.e. the equivalent of 2x replication). Companies forgo this technology because they feel confident in their operational ability to address failures quickly and competently. That's what we're relying on for data integrity, not the math.
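To put rough numbers on that comparison, here is a back-of-the-envelope sketch. It assumes every disk fails independently with the same probability `p_fail` before redundancy is rebuilt, which is a simplification (real failures are often correlated, as the incident above suggests). The 6+6 shard layout is an arbitrary example of a 2x-overhead code, not anything Hetzner is known to use:

```python
from math import comb


def storage_overhead(data_shards, parity_shards):
    """Raw bytes stored per byte of user data."""
    return (data_shards + parity_shards) / data_shards


def loss_probability(total_shards, tolerated_failures, p_fail):
    """P(more than `tolerated_failures` of `total_shards` fail), assuming independence."""
    return sum(
        comb(total_shards, i) * p_fail**i * (1 - p_fail) ** (total_shards - i)
        for i in range(tolerated_failures + 1, total_shards + 1)
    )


# Triple replication: 3 copies, data survives up to 2 failures, 3x overhead.
replication_loss = loss_probability(3, 2, 0.01)

# Erasure code with 6 data + 6 parity shards: survives up to 6 failures, 2x overhead.
erasure_loss = loss_probability(12, 6, 0.01)
```

Under these assumptions the erasure-coded layout comes out with both the lower loss probability and the lower overhead. The exact ratio depends heavily on `p_fail` and the chosen layout.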