TechEcho

How to do cheap backups

134 points by suhail over 13 years ago

15 comments

vlucas over 13 years ago
The quoted numbers per GB are honest calculations, but they can be a little misleading because they don't reflect the REAL costs you are going to end up paying PER MONTH and UP FRONT. As with anything, you always have to run the calculations first.

An illustrative example: My co-founders and I recently looked at a bunch of new office spaces and were doing similar comparisons with costs per square foot for each space. Option A (a much larger space) looked MUCH cheaper at $17/sq.ft, but we ended up going with option B (a small space) at nearly $40/sq.ft., because we just didn't need all the space in option A, and option B was in an office facility where we wouldn't have to buy any additional furniture or appliances like couches, chairs, a coffee maker, fridge, etc. (they were supplied to all tenants in a large common area as part of the cost). So the REAL cost to us PER MONTH was about $400 LESS with option B (the smaller, more "expensive" space), and it came with a lot of extra conveniences to boot.

So in the example given in the article, if you're not using the full 45TB of backup storage space (24 x 2TB in RAID-6), you could actually end up paying significantly more per GB for what you have stored (what you're actually using), ESPECIALLY when you include the UP FRONT costs of buying and co-locating the server and the maintenance costs that go along with it.

Moral of the story: Just because something LOOKS more expensive per unit doesn't mean it's actually GOING TO BE when it comes to cash flow. ALWAYS do the math for your own situation before making decisions like these.
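The point about paying for capacity you don't use can be made concrete with a little arithmetic. A minimal sketch, using a hypothetical helper that is not from the article: it spreads the up-front cost over an assumed amortization period, adds the recurring monthly costs, and divides by the GB you actually store rather than the raw capacity. All figures in the usage lines are made up for illustration.

```python
def effective_cost_per_used_gb(upfront, monthly, months, used_gb):
    """Cost per GB *actually stored* per month, with up-front spend amortized.

    upfront: one-time cost (hardware, setup), in dollars
    monthly: recurring cost per month (co-location, bandwidth, upkeep)
    months:  period over which the up-front cost is spread (an assumption)
    used_gb: GB you really store, not the raw capacity of the box
    """
    if months <= 0 or used_gb <= 0:
        raise ValueError("months and used_gb must be positive")
    total = upfront + monthly * months
    return total / months / used_gb


# Hypothetical numbers: the same box at full utilization vs. 10% utilization.
full = effective_cost_per_used_gb(upfront=10_000, monthly=500, months=36, used_gb=45_000)
partial = effective_cost_per_used_gb(upfront=10_000, monthly=500, months=36, used_gb=4_500)
print(f"full: ${full:.4f}/GB-month, partial: ${partial:.4f}/GB-month")
```

Storing a tenth of the capacity makes every stored GB ten times as expensive, which is exactly why the per-GB figure only holds for a full box.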
sdfjkl over 13 years ago
You need to hire a sysadmin (or operations engineer or whatever they're called these days). You're talking about developers spending time on implementing backup, which is wrong. That's not what developers should be doing. Sysadmins may occasionally write software to solve a problem, but they're not software developers. Software developers may occasionally do sysadmin-type work, but in my experience most of them are notoriously bad at it (setting up mod_python because they have bookmarked a blog post with step-by-step instructions from 2004, when that was the best way to do things, or leaving gigantic virtualenv turds with complete Python binaries in your SCM because they don't understand what virtualenv is for).

Sysadmins also have toolkits (you know, screwdrivers, torx bits, zip-ties) and have scars on their arms that prove they aren't afraid of sharp-edged hardware stuff that sometimes starts smoking for no discernible reason and doesn't turn on any blinkenlights when you push the button (many software developers panic at this point). This comes in very handy when you've just "left the cloud" and you're experiencing first-hand the reasons why people moved into the cloud in the first place (hardware sucks).

Sysadmins also know about this backup stuff and will tell you to shut up when you start talking about doing it with cobbled-together shell scripts. They'll probably recommend using something like Amanda (or a commercial equivalent), which makes sure your backups happen regularly, are complete, and actually contain the stuff you needed to back up. Good ones may even know to test the backup occasionally by restoring a server just to see if it actually works afterwards.

(Apologies to any software developers who know their sysadmin stuff.)
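Testing a backup by restoring it only helps if you check what came back. A minimal sketch, assuming you have already restored into a scratch directory: hash every file in the source tree and in the restored tree, then report what is missing or different. This is not how Amanda or any particular tool verifies backups; `tree_digests` and `diff_trees` are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB pieces so large dumps don't have to fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def diff_trees(source, restored):
    """Return (missing, changed): files absent from the restore, and files whose contents differ."""
    src, dst = tree_digests(source), tree_digests(restored)
    missing = sorted(set(src) - set(dst))
    changed = sorted(p for p in src.keys() & dst.keys() if src[p] != dst[p])
    return missing, changed
```

Compare against the same point in time the backup captured; files that changed on the live server after the backup ran will legitimately show up as `changed`.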
dexen over 13 years ago
Cheap backups? *Use de-duplication.*

I have a cronjob that makes daily dumps of several MySQL databases, in the usual textual format. One day's worth of dumps takes about 470MB now. Two years ago, we started at about 20MB/day, and it has been growing ever since.

Each dump is committed into one common Git repo. After two years (that's just over 700 dumps), this whole Git repo is about 180MB.

Yep, much less than one daily dump. Git performs thorough de-duplication and delta-compression.

Cheap backups? *Use de-duplication.*
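Git gets its savings from content-addressed objects (identical content is stored once) plus delta compression in its pack files. The toy store below illustrates a related technique, fixed-size chunk de-duplication: each chunk is stored once under its SHA-256 hash, and each snapshot is just a list of hashes. It is a hypothetical sketch, not Git's algorithm; real de-duplicating backup tools often use content-defined chunking instead, so that an insertion near the start of a dump doesn't shift every later chunk boundary.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: each unique chunk is kept exactly once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}     # sha256 hex digest -> chunk bytes
        self.snapshots = {}  # snapshot name -> ordered list of chunk digests

    def put(self, name, data):
        refs = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            key = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(key, chunk)  # store only if not seen before
            refs.append(key)
        self.snapshots[name] = refs

    def get(self, name):
        return b"".join(self.chunks[k] for k in self.snapshots[name])

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


store = DedupStore(chunk_size=4)
store.put("day1", b"AAAABBBBCCCC")
store.put("day2", b"AAAABBBBCCCCDDDD")  # yesterday's dump plus a little new data
print(store.stored_bytes())  # 16, versus 28 bytes for both snapshots stored in full
```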
brucehart over 13 years ago
I looked into backup options for my company and ended up rolling our own solution using an open-source program called Duplicity: http://duplicity.nongnu.org/ . I've been really impressed with Duplicity. Incremental backups are fast, the data is encrypted, and you can target many different source/destination types (local file system, ssh, Amazon S3, ftp).
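Incremental backups are fast because each run only ships what changed since the previous one. Duplicity itself works at the block level with rsync-style signatures and writes encrypted tar-format volumes; the sketch below shows only the simpler file-level idea, comparing two hypothetical `{path: digest}` manifests recorded by consecutive runs.

```python
def plan_incremental(previous, current):
    """Compare {path: content_digest} manifests from the last run and this run.

    Returns (to_upload, deleted): paths that are new or modified, and paths that
    disappeared. File-level only; it does not compute block-level deltas.
    """
    to_upload = sorted(p for p, digest in current.items() if previous.get(p) != digest)
    deleted = sorted(set(previous) - set(current))
    return to_upload, deleted


last_run = {"etc/app.conf": "a1", "db/dump.sql": "b2", "tmp/old.log": "c3"}
this_run = {"etc/app.conf": "a1", "db/dump.sql": "b9", "db/new.sql": "d4"}
print(plan_incremental(last_run, this_run))
# (['db/dump.sql', 'db/new.sql'], ['tmp/old.log'])
```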
ars over 13 years ago
Security reminder:

The live server should not have write access to the backup machine.

Instead, the backup machine should have read access to the live server.

This prevents disaster in case of hacks.
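One way to set this up, sketched under assumptions: the backup host runs rsync over SSH and *pulls* from the live server, so the live server never holds credentials for the backup host. Each run writes to a fresh dated directory and hard-links unchanged files against the previous snapshot with rsync's `--link-dest`, and there is deliberately no `--delete`, so a wiped or tampered live server cannot erase older snapshots. Host names, paths, and the `backup` user are hypothetical.

```python
import subprocess


def build_pull_command(live_host, remote_path, snapshot_dir,
                       user="backup", previous_snapshot=None):
    """Build an rsync command meant to run ON THE BACKUP HOST.

    The backup host opens the SSH connection and only reads from the live
    server; nothing on the live server can write to the backup host.
    """
    cmd = ["rsync", "-a", "-e", "ssh"]
    if previous_snapshot:
        # Unchanged files become hard links into the previous snapshot,
        # so each dated snapshot is complete but cheap on disk.
        cmd.append(f"--link-dest={previous_snapshot}")
    # No --delete: each run writes a new directory; older snapshots are untouched.
    cmd += [f"{user}@{live_host}:{remote_path}", snapshot_dir]
    return cmd


def pull_backup(*args, **kwargs):
    """Run the pull; raises subprocess.CalledProcessError if rsync fails."""
    subprocess.run(build_pull_command(*args, **kwargs), check=True)
```

On the live server, the account the backup host logs in as would typically be locked down further (for example, read-only access to the data being backed up), but that part is site-specific.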
gleb over 13 years ago
Tarsnap pricing is for deduplicated data. For a proper comparison, you need to divide it by the deduplication ratio, which can be orders of magnitude.

Apples and oranges: what you are doing is on-site backup; Tarsnap is off-site. Both are needed.
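Because Tarsnap bills for the bytes it stores after de-duplication and compression, its per-GB price has to be scaled by how much your data actually shrinks before it can be compared with raw disk. A hypothetical helper; the shrink ratio depends entirely on your data, and the figures in the example are made up rather than Tarsnap's actual prices.

```python
def effective_price_per_logical_gb(price_per_stored_gb, logical_gb, stored_gb):
    """Price per GB of *original* data when the provider bills de-duplicated bytes.

    logical_gb: total size of all snapshots as if each were stored in full
    stored_gb:  what actually ends up billed after de-duplication/compression
    """
    if logical_gb <= 0:
        raise ValueError("logical_gb must be positive")
    # Equivalent to price_per_stored_gb / (logical_gb / stored_gb), i.e. divided by the ratio.
    return price_per_stored_gb * stored_gb / logical_gb


# Made-up example: 100 GB of daily snapshots that de-duplicate down to 5 GB.
print(effective_price_per_logical_gb(0.30, logical_gb=100, stored_gb=5))
```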
PanMan over 13 years ago
Anybody else disappointed that an article titled "How to do cheap backups" didn't at ALL describe how they, well, actually do the backups? I was expecting some smart copy algorithm, not a post about the price of hardware. Also, they compare the cost of their hardware at full capacity with the pay-as-you-go pricing of AWS and others. Their first GB will cost a lot more than the price per GB stated here.
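The point about the first GB can be made precise: a box you own costs roughly the same each month whether it is empty or full, while metered storage scales with use, so there is a break-even volume below which metered storage is cheaper. A sketch with hypothetical inputs; a real comparison would also have to account for bandwidth and staff time.

```python
def own_cost_per_month(upfront, monthly_fixed, amortize_months):
    """Flat monthly cost of owned hardware, independent of how full it is."""
    return upfront / amortize_months + monthly_fixed


def break_even_gb(upfront, monthly_fixed, amortize_months, metered_price_per_gb_month):
    """GB stored at which owning hardware costs the same per month as metered storage."""
    return own_cost_per_month(upfront, monthly_fixed, amortize_months) / metered_price_per_gb_month


def cheaper_option(used_gb, upfront, monthly_fixed, amortize_months, metered_price_per_gb_month):
    own = own_cost_per_month(upfront, monthly_fixed, amortize_months)
    metered = used_gb * metered_price_per_gb_month
    return "own hardware" if own < metered else "metered storage"
```

With made-up inputs of $12,000 up front amortized over 24 months, $500/month co-location, and $0.10/GB-month metered, break-even is 10,000 GB: below that, the metered service wins.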
ck2 over 13 years ago
Takes about 25 lines in bash to do rotating, encrypted S3 backups with Timkay's awesome aws script.

I actually had to throttle our servers when sending to Amazon, as they seem to be able to receive at impossible maximum speeds and eat the whole pipe!
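The "rotating" part of such a script is just a retention policy. Below is a hypothetical one in Python (keep the newest 7 backups plus the newest backup from each of the 4 most recent ISO weeks); it only decides which dated backups to prune and leaves encryption, upload, and deletion to whatever tool you use.

```python
from datetime import date, timedelta


def backups_to_prune(backup_dates, keep_daily=7, keep_weekly=4):
    """Return the dates to delete under a simple daily + weekly retention policy.

    Keeps the newest `keep_daily` backups, plus the newest backup in each of the
    `keep_weekly` most recent ISO weeks that contain any backup.
    """
    ordered = sorted(set(backup_dates), reverse=True)  # newest first
    weekly = {}
    for d in ordered:
        week = d.isocalendar()[:2]  # (ISO year, ISO week number)
        if week not in weekly and len(weekly) < keep_weekly:
            weekly[week] = d  # first hit is the newest backup of that week
    keep = set(ordered[:keep_daily]) | set(weekly.values())
    return sorted(d for d in ordered if d not in keep)


january = [date(2012, 1, 1) + timedelta(days=i) for i in range(30)]
print(len(backups_to_prune(january)))  # 21
```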
underwater over 13 years ago
I hope the backup machine is connecting to the live machine and they're sealed off from each other. I've heard of cases where hackers have managed to get into a machine and then access backup machines to completely wipe all copies of a database.
rmc over 13 years ago
If all your servers are on Amazon EC2, and your backups are in S3, then you have all your eggs in one basket. One billing dispute with Amazon and your servers *and* backups are gone.

Backups are there to protect you in case the worst happens.
sehugg over 13 years ago
I use a nice OS X app called Arq which does encrypted backups to my own S3 bucket. Since I only have a few GB of stuff that warrants offline backup (git repos, etc) the ~$0.25 storage fees per month is well worth the convenience.
nico_h over 13 years ago
It's nice, but they still need someone to keep an eye on the backup machine.

Also, S3 is expensive because it keeps many copies of your data (though they appear as one) and checks them for corruption, so it would be more reliable than a single backup machine.
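The reliability argument rests on keeping several copies plus checksums. How S3 does this internally is not something the comment describes, and this sketch does not model it; the toy function below only illustrates the general idea: with a recorded SHA-256 digest and more than one copy, a corrupted copy can be detected and replaced from a good one, a kind of protection a single backup machine only has if you build it in yourself.

```python
import hashlib


def scrub(replicas, expected_sha256):
    """Check each replica against a recorded SHA-256 digest and repair bad ones.

    Returns (repaired_replicas, bad_indexes). Raises ValueError if no replica
    matches, i.e. every copy is corrupted and nothing can be repaired.
    """
    digests = [hashlib.sha256(r).hexdigest() for r in replicas]
    bad = [i for i, d in enumerate(digests) if d != expected_sha256]
    good = next((r for r, d in zip(replicas, digests) if d == expected_sha256), None)
    if good is None:
        raise ValueError("no replica matches the recorded checksum")
    repaired = [good if i in bad else r for i, r in enumerate(replicas)]
    return repaired, bad


data = b"nightly dump"
copies = [data, b"nightly dunp", data]  # one copy with a corrupted byte
fixed, bad = scrub(copies, hashlib.sha256(data).hexdigest())
print(bad)  # [1]
```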
wazoox over 13 years ago
One important feature they didn't mention: if you have a decently powerful backup server hosting all of your data, it may be relatively easy in case of an emergency to use it to serve production data directly. For instance, you could start a MySQL instance with the backup data directly from the backup server if your production server (or datacenter) is fried.

There is no easy way to achieve that from S3 or Tarsnap.
suhail over 13 years ago
Small plug, but we're always looking for awesome people to join our ops team! Link: http://mixpanel.theresumator.com/apply/Xm0tLy/Software-Engineer-Operations.html
latch over 13 years ago
And SoftLayer is by no means the cheapest provider of dedicated hosting in the US (though there are few comparable providers, in terms of price and quality, with multiple regions).