Hi there. We're building a system that stores voice audio files. We're evaluating compression codecs for it (Speex), but we expect 30 million minutes of audio a month, which works out to roughly 40TB of storage a year. These kinds of figures scare me a little! Would you use Amazon for this or go another route? I'd be interested to hear from anyone working at a similar scale.
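A quick sanity check on those numbers may help anyone following along. The sketch below assumes "40TB" means decimal terabytes (10**12 bytes) and that the audio arrives evenly over the year; the function names are made up for illustration.

```python
# Back-of-envelope check of the figures in the question.
MINUTES_PER_MONTH = 30_000_000   # from the question
TB_PER_YEAR = 40                 # from the question, read as 10**12-byte terabytes


def implied_bitrate_kbps(minutes_per_month, tb_per_year):
    """Average codec bitrate (kbit/s) implied by a yearly storage total."""
    seconds_per_year = minutes_per_month * 12 * 60
    bits_per_year = tb_per_year * 10**12 * 8
    return bits_per_year / seconds_per_year / 1000


def yearly_storage_tb(minutes_per_month, bitrate_kbps):
    """Yearly storage (TB) needed for a given average codec bitrate."""
    seconds_per_year = minutes_per_month * 12 * 60
    return seconds_per_year * bitrate_kbps * 1000 / 8 / 10**12


if __name__ == "__main__":
    kbps = implied_bitrate_kbps(MINUTES_PER_MONTH, TB_PER_YEAR)
    print(f"implied bitrate: {kbps:.1f} kbit/s")
    print(f"storage at 8 kbit/s: {yearly_storage_tb(MINUTES_PER_MONTH, 8):.1f} TB/year")
```

With the figures from the question this comes out at about 14.8 kbit/s, which is plausible for a speech codec. Dropping to 8 kbit/s would bring the same volume down to about 21.6 TB a year, so the codec setting is worth deciding before the storage provider.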
SmugMug currently hosts 600TB of pictures on Amazon S3.<p><a href="http://gigaom.com/2008/06/25/structure-08-werner-vogels-amazon-cto/" rel="nofollow">http://gigaom.com/2008/06/25/structure-08-werner-vogels-amaz...</a><p>So yeah I'd probably use them.
40TB?!<p>Look into CDNs like Akamai or Limelight. I think the bulk pricing you can negotiate with an established CDN will be better than Amazon's flat rates.
We run a small specialized storage company, and the things that seem to matter most are storage capacity, availability, reliability, and transfer rates for both reading existing data and adding new data.<p>40TB can be handled pretty well by S3 and other storage services, and they publish pricing information good enough to model your costs. Note that they don't (yet) provide very specific SLAs for data availability, so keep that in mind when designing your system.<p>Maintaining your own drives with some sort of redundancy (RAID, automatic copies, etc.) or using something like (bias alert) our open-source project <a href="http://allmydata.org" rel="nofollow">http://allmydata.org</a>, which is effectively a software RAID layer, requires some IT and systems energy either way, so this has to be bundled into your operational costs if you choose that route.<p>Just to emphasize what others have mentioned, it is important to incorporate the rate of new data influx into your model. If you are successful, 40TB this year might turn into 120TB next year, so make sure that your cash-flow model can support the underlying cost of whatever system you choose.
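To make the growth point concrete, here is a minimal sketch of how stored data and the storage bill accumulate when each year's intake is a multiple of the previous one. It assumes nothing is ever deleted and that intake is spread evenly over each year. The price is a placeholder, not any provider's actual rate, and the function names are hypothetical.

```python
# Placeholder only: substitute the per-GB monthly rate from a real quote.
HYPOTHETICAL_PRICE_PER_GB_MONTH = 0.10


def cumulative_storage_tb(first_year_tb, growth, years):
    """TB stored at the end of each month, assuming nothing is ever deleted.

    Year 0 takes in first_year_tb; each later year takes in `growth`
    times the previous year's intake, spread evenly over 12 months.
    """
    stored = 0.0
    totals = []
    for year in range(years):
        monthly_ingest = first_year_tb * growth**year / 12
        for _ in range(12):
            stored += monthly_ingest
            totals.append(stored)
    return totals


def monthly_storage_bills(totals_tb, price_per_gb_month):
    """Storage-only bill per month (1 TB counted as 1000 GB)."""
    return [tb * 1000 * price_per_gb_month for tb in totals_tb]


if __name__ == "__main__":
    # 40 TB in year one, 120 TB in year two, as in the comment above.
    totals = cumulative_storage_tb(40, 3, 2)
    bills = monthly_storage_bills(totals, HYPOTHETICAL_PRICE_PER_GB_MONTH)
    for month in (11, 23):
        print(f"month {month + 1}: {totals[month]:.0f} TB, bill {bills[month]:.0f}/month")
```

The thing to notice is that the bill tracks the cumulative total, not the yearly intake: with 40 TB followed by 120 TB, you are paying for 160 TB by the end of the second year.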
I haven't done anything similar, but just my 2c: see if you can make use of existing file-sharing services like RapidShare or Megaupload and link to them. I believe the HotOrNot guys used free Yahoo Photos hosting and just linked to it to save on bandwidth (it worked at the time).
Do not underestimate the transfer costs. The information you have here, the size of the data stored, is only one factor. You also need estimates of your transfer in and out, and those will tell you whether Amazon makes sense or whether you have to go with a CDN.
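One rough way to see how much transfer matters is to split a month's bill into storage and transfer and watch how the split moves as playback goes up. This is only a sketch: the `Prices` fields, the `plays_per_file` parameter, and all the numbers are assumptions for illustration, not real rates from Amazon or any CDN.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    """Hypothetical per-GB prices; replace with a provider's real quote."""
    storage_per_gb_month: float
    transfer_in_per_gb: float
    transfer_out_per_gb: float


def monthly_cost(stored_gb, ingest_gb, plays_per_file, prices):
    """Split one month's bill into storage and transfer.

    plays_per_file is the assumed average number of times each newly
    uploaded file is downloaded in full during the month.
    """
    egress_gb = ingest_gb * plays_per_file
    storage = stored_gb * prices.storage_per_gb_month
    transfer = (ingest_gb * prices.transfer_in_per_gb
                + egress_gb * prices.transfer_out_per_gb)
    return {"storage": storage, "transfer": transfer, "total": storage + transfer}


if __name__ == "__main__":
    placeholder = Prices(storage_per_gb_month=0.10,
                         transfer_in_per_gb=0.10,
                         transfer_out_per_gb=0.20)
    # Same stored volume and intake, different playback assumptions.
    for plays in (0, 1, 10):
        cost = monthly_cost(stored_gb=40_000, ingest_gb=3_300,
                            plays_per_file=plays, prices=placeholder)
        print(plays, cost)
```

If the files are mostly archived and rarely played back, storage dominates and a plain storage service is the natural fit. If each file is fetched many times, the transfer line takes over, and that is the case where a negotiated CDN deal is worth pricing out.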