Ask HN: Best practices for backing up object storage?

1 point by ocius over 1 year ago
Our company has about 200 TiB of mission-critical data stored in regional Google Cloud buckets. We are currently planning for better protection of this data and are wondering about any best practices for backups of object storage out there.

Object versioning: our data is mostly written only once, so we plan to enable object versioning to prevent overwrite attacks (or accidental overwrites).

Regional redundancy: we are considering moving all data to multi-regional buckets. This would help protect against natural disasters, fires, etc. It would not protect against systematic errors in Google Cloud Storage, though, so we are also considering an offsite backup (for example to AWS S3 Glacier Deep Archive). Both options would come with significant upfront cost ($15,000 each). We are unsure whether we should choose both of these options or only one (and if so, which one).

Archival for external backup: about half of our data consists of frames extracted from videos (approx. 300 KiB each), and the other half is the original videos. Some of the frames have additional processing applied to them, and reproducing them from the videos is not fully deterministic, so we would prefer to keep the extracted files in the backup rather than recreating them if lost. However, the huge number of small files (currently about 200 million) causes significant cost for operations during backup, so we are considering adding a compression worker. This worker would take all the frames for a video and put them into a shared archive, which would then be sent to the backup bucket (a sketch of such a worker follows below). This mechanism introduces an additional risk of failure, though, so we are also unsure about doing this.

Syncing between Google Cloud buckets and AWS S3: if we decide to go with backups to AWS S3, we would have to put a mechanism in place that keeps the original buckets in sync with the mirror on S3. Are there any good tools for this out there? If we had to roll our own, we would probably go for some sort of event-triggered serverless function that copies files when they are created or changed in the original bucket (also sketched below), combined with a regular full sync of files (similar to gsutil rsync).

Finally, disaster recovery should be quick but doesn't need to be instant (e.g. within 24 hours).

We're thankful for any input based on your experience!
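
A minimal sketch of such a compression worker, assuming frames are stored under a per-video prefix in the source bucket and using the google-cloud-storage client; the function name archive_frames, the bucket arguments, and the prefix layout are illustrative, not the poster's actual pipeline:

    import io
    import tarfile

    from google.cloud import storage


    def archive_frames(source_bucket, backup_bucket, video_prefix):
        """Bundle every frame object under video_prefix into one tar.gz in the backup bucket."""
        client = storage.Client()

        buffer = io.BytesIO()
        with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
            # Assumes frames are laid out as <video_prefix>/<frame> in the source bucket.
            # Everything is buffered in memory here; a worker handling very large
            # videos would stream to local disk instead.
            for blob in client.list_blobs(source_bucket, prefix=video_prefix):
                data = blob.download_as_bytes()
                info = tarfile.TarInfo(name=blob.name)
                info.size = len(data)
                archive.addfile(info, io.BytesIO(data))

        buffer.seek(0)
        archive_name = video_prefix.rstrip("/") + ".tar.gz"
        client.bucket(backup_bucket).blob(archive_name).upload_from_file(
            buffer, content_type="application/gzip"
        )

Packing per video keeps the number of backup objects roughly equal to the number of videos rather than the number of frames, which is what reduces the per-operation cost during backup.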
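
A minimal sketch of the event-triggered copy, assuming a background Cloud Function subscribed to the source bucket's object-finalize event; the mirror bucket name and the function name mirror_to_s3 are assumptions:

    import boto3
    from google.cloud import storage

    S3_MIRROR_BUCKET = "my-s3-mirror"  # assumed mirror bucket name

    gcs = storage.Client()
    s3 = boto3.client("s3")


    def mirror_to_s3(event, context):
        """Triggered when an object is finalized in the source bucket; copies it to S3."""
        bucket_name = event["bucket"]
        object_name = event["name"]

        blob = gcs.bucket(bucket_name).blob(object_name)
        # Streams the object through the function's memory; large videos would
        # need multipart uploads or a managed transfer tool instead.
        s3.put_object(
            Bucket=S3_MIRROR_BUCKET,
            Key=object_name,
            Body=blob.download_as_bytes(),
        )

An event-driven copy like this would still be paired with the regular full sync the poster mentions (e.g. gsutil rsync) to catch any events the function misses.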

3 comments

piterrro over 1 year ago
200 TB is not much; depending on your constraints you could even fit it on a hard drive array in your office. There are so many options; if you listed your requirements, constraints, or the exact problem you're trying to solve, that would help in coming up with proper advice.
keep320909 over 1 year ago
I would go with two approaches. One is an in-cloud backup, maybe with the same provider (to save traffic), but under a different account.

The second is a low-tech offline, onsite backup. You do not even need a large server to plug in all the HDDs at once; just swap hard drives like CDs or VHS tapes...
blahman24 over 1 year ago
It's a good strategy, but you have to examine the costs here more closely. Google, AWS... easy to get in but never get out, and they charge so many hidden fees. I would look at IDrive e2 as a cheaper option.