The quoted numbers per GB are honest calculations, but they can be a little misleading because they don't reflect the REAL costs you are going to end up paying PER MONTH and UP FRONT. As with anything, you always have to run the calculations first.<p>An illustrative example: My co-founders and I recently looked at a bunch of new office spaces and were doing similar comparisons of cost per square foot for each space. We ended up in a situation where option A (a much larger space) looked MUCH cheaper at $17/sq.ft., but we went with option B (a small space) at nearly $40/sq.ft., because we just didn't need all the space in option A, and option B was in an office facility where we wouldn't have to buy any additional furniture or appliances like couches, chairs, a coffee maker, a fridge, etc. (they were supplied to all tenants in a large common area as part of the cost). So the REAL cost difference to us PER MONTH was about $400 LESS with option B (the smaller, more "expensive" space), which came with a lot of extra conveniences to boot.<p>So in the example given in the article, if you're not using the full 45TB of backup storage (24 x 2TB in RAID-6), you could actually end up paying significantly more per GB for what you actually have stored - ESPECIALLY once you include the UP-FRONT costs of buying and co-locating the server and the ongoing maintenance costs that go with it.<p>Moral of the story: Just because something LOOKS more expensive per unit doesn't mean it actually IS when it comes to cashflow. ALWAYS do the math for your own situation before making decisions like these.
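To make that concrete, here's a throwaway calculation with made-up numbers ($300/month all-in for the colo'd box, 5TB actually used out of the 45TB - purely illustrative, not the article's real figures):

```shell
# Effective $/GB depends on what you actually store, not raw capacity.
# All figures are hypothetical, for illustration only.
raw_monthly=300              # colo + amortized hardware, $/month (assumed)
capacity_gb=$((45 * 1024))   # 45TB usable
used_gb=$((5 * 1024))        # only 5TB actually stored

awk -v c="$raw_monthly" -v cap="$capacity_gb" -v used="$used_gb" \
    'BEGIN { printf "capacity: $%.4f/GB, actually used: $%.4f/GB\n", c/cap, c/used }'
```

At 5TB used, the "cheap" box costs roughly 9x more per stored GB than its raw capacity number suggests.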
You need to hire a sysadmin (or operations engineer, or whatever they're called these days). You're talking about developers spending time on implementing backups, which is wrong. That's not what developers should be doing. Sysadmins may occasionally write software to solve a problem, but they're not software developers. Software developers may occasionally do sysadmin-type work, but in my experience most of them are notoriously bad at it (setting up mod_python because they have a step-by-step blog post from 2004 bookmarked, from back when that was the best way to do things - or leaving gigantic virtualenv turds with complete Python binaries in your SCM because they don't understand what virtualenv is for).<p>Sysadmins also have toolkits (you know, screwdrivers, torx bits, zip-ties) and scars on their arms proving they aren't afraid of sharp-edged hardware that sometimes starts smoking for no discernible reason and doesn't turn on any blinkenlights when you push the button (many software developers panic at this point). This comes in very handy when you've just "left the cloud" and are experiencing first-hand the reasons people moved into the cloud in the first place (hardware sucks).<p>Sysadmins also know about this backup stuff and will tell you to shut up when you start talking about doing it with cobbled-together shell scripts. They'll probably recommend something like Amanda (or a commercial equivalent) that makes sure your backups happen regularly, are complete, and actually contain the stuff you needed to back up. Good ones may even test the backups occasionally by restoring a server, just to see if it actually works afterwards.<p>(Apologies to any software developers who know their sysadmin stuff.)
Cheap backups? <i>Use de-duplication.</i><p>I have a cronjob with daily dumps of several MySQL databases, in the usual textual format. One day's worth of dumps takes about 470MB now. Two years ago we started at about 20MB/day, and it has been growing ever since.<p>Each dump is committed into one common Git repo. After two years (that's just over 700 dumps), the whole Git repo is about 180MB.<p>Yep, much less than one daily dump. Git performs thorough de-duplication and delta-compression.<p>Cheap backups? <i>Use de-duplication.</i>
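The cron job behind this can be tiny. A sketch (paths and DB name are placeholders, not my actual setup):

```shell
#!/bin/sh
# Daily dump committed into a shared Git repo; Git's delta compression
# does the de-duplication across dumps. Paths/DB name are placeholders.
REPO=/var/backups/db-dumps

daily_dump() {
    cd "$REPO" || exit 1
    mysqldump --single-transaction mydb > mydb.sql   # textual dumps delta well
    git add mydb.sql
    git commit --quiet -m "dump $(date +%F)" || true # no-op if nothing changed
    git gc --auto --quiet                            # repack into delta-compressed packfiles
}
```

The key point is that consecutive dumps are almost identical, so each commit only really costs the delta.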
I looked into backup options for my company and ended up rolling our own solution using an open-source program called Duplicity: <a href="http://duplicity.nongnu.org/" rel="nofollow">http://duplicity.nongnu.org/</a> . I've been really impressed with Duplicity. Incremental backups are fast, the data is encrypted and you can target many different source/destination types (local file system, ssh, Amazon S3, ftp).
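For anyone curious what the glue looks like, a rough sketch of a Duplicity wrapper - bucket name, paths, and retention are placeholders, not our actual configuration:

```shell
#!/bin/sh
# Hedged sketch of a Duplicity wrapper. Bucket, paths, and the monthly
# full-chain schedule are all assumptions for illustration.
SRC=/var/www
DEST=s3://my-bucket/www-backup

do_backup() {
    # Incremental by default; start a fresh full chain monthly
    duplicity --full-if-older-than 1M "$SRC" "$DEST"
}

do_prune() {
    # Keep at most six months of backup chains
    duplicity remove-older-than 6M --force "$DEST"
}

do_verify() {
    # Restore the latest snapshot somewhere safe to prove the backup works
    duplicity restore "$DEST" /tmp/restore-test
}
```

Encryption comes for free via GnuPG (set a PASSPHRASE or pass `--encrypt-key`), which is a big part of Duplicity's appeal.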
Security reminder:<p>The live server should not have write access to the backup machine.<p>Instead the backup machine should have read access to the live server.<p>This prevents disaster in case of hacks.
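A minimal sketch of what that pull looks like (hostnames, key path, and directories are made up; assumes the live server exposes only a read-only account to the backup host):

```shell
#!/bin/sh
# Pull-based backup, run from cron ON the backup machine. The live
# server never holds credentials for the backup host, so a compromised
# live box can't touch the backups. All names below are placeholders.
pull_backup() {
    rsync -az --delete \
        -e "ssh -i /root/.ssh/backup_ro_key" \
        backup-reader@live.example.com:/var/www/ /srv/backups/www/
}
# e.g. in the backup machine's crontab:
#   0 3 * * * /usr/local/bin/pull_backup.sh
```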
Tarsnap pricing is for deduplicated data. You need to divide it by your deduplication ratio - potentially orders of magnitude - for a proper comparison.<p>Apples and oranges otherwise - what you are doing is on-site backup; Tarsnap is offsite. Both are needed.
Anybody else disappointed that an article "How to do cheap backups" didn't at ALL describe how they, well, actually do the backups?
I was expecting some smart copy algorithm, not a post about the price of hardware. Also, they compare their hardware at full capacity to AWS's and others' pay-as-you-go pricing models. Their first GB will cost a lot more per GB than stated here.
It takes about 25 lines of bash to do rotating, encrypted S3 backups with Timkay's awesome aws script.<p>I actually had to throttle our servers when sending to Amazon, as they seem to be able to receive at impossibly high speeds and eat the whole pipe!
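In that spirit, a rough sketch - using gpg and the official AWS CLI rather than Timkay's script, with bucket, database, and GPG recipient as placeholders:

```shell
#!/bin/sh
# Rotating, encrypted S3 backups in a few lines of shell.
# Everything here (bucket, DB, recipient) is a placeholder.
set -e
BUCKET=s3://my-backup-bucket
SLOT=$(date +%u)   # 1..7: overwriting the same key gives seven rotating dailies

backup_db() {
    mysqldump --single-transaction mydb | gzip |
        gpg --encrypt --recipient backups@example.com |
        aws s3 cp - "$BUCKET/daily/mydb-$SLOT.sql.gz.gpg"
}

# Throttle uploads so the backup can't saturate the uplink, e.g.:
#   aws configure set default.s3.max_bandwidth 10MB/s
```

Naming the object by day-of-week is the cheapest rotation scheme I know: no pruning step at all.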
I hope the backup machine is the one connecting to the live machine, and that they're otherwise sealed off from each other. I've heard of cases where hackers got into a machine and then used its access to the backup machines to wipe every copy of a database.
If all your servers are on Amazon EC2, and your backups are in S3, then you have all your eggs in one basket. One billing dispute with Amazon and your servers <i>and</i> backups are gone.<p>Backups are there to protect you in case the worst happens.
I use a nice OS X app called Arq which does encrypted backups to my own S3 bucket. Since I only have a few GB of stuff that warrants offsite backup (git repos, etc.), the ~$0.25/month in storage fees are well worth the convenience.
It's nice, but they still need someone to keep an eye on the backup machine.<p>Also, S3 is expensive because it keeps many copies of your data (though they appear as one) and checks them for corruption, so it should be more reliable than a single backup machine.
One important feature they didn't mention: if you have a decently powerful backup server hosting all of your data, it may be relatively easy, in an emergency, to use it to serve production data directly. For instance, you could start a MySQL instance with the backup data right on the backup server if your production server (or datacenter) is fried.<p>There is no easy way to achieve that with S3 or Tarsnap.
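A sketch of that promotion step, assuming the backups are restored datadir copies (path and port are placeholders):

```shell
#!/bin/sh
# Emergency promotion of the backup box: point a MySQL server at the
# latest restored copy of the data. Path and port are placeholders, and
# this assumes the backups restore to a usable MySQL data directory.
emergency_serve() {
    mysqld --datadir=/srv/backups/mysql/latest --port=3306 &
}
```

Then repoint the application's DB host at the backup box while the real datacenter recovers.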
Small plug but we're always looking for awesome people to join our ops team! Link: <a href="http://mixpanel.theresumator.com/apply/Xm0tLy/Software-Engineer-Operations.html" rel="nofollow">http://mixpanel.theresumator.com/apply/Xm0tLy/Software-Engin...</a>
And SoftLayer is by no means the cheapest provider of dedicated hosting in the US (though there are few comparable providers, in terms of price and quality, with multiple regions).