I run an image sharing website and AWS recently billed us over $1k for bandwidth costs:<p>http://i.imgur.com/sGrYboT.png<p>What cheaper alternatives do you recommend?
While you might be able to reduce costs through dedicated services, it also means you will have to write more code, maintain it, and maintain the machines as well. Don't underestimate the significant costs of doing this reliably and keeping it running as things grow. If you do need to do it, though, I'd suggest at least starting with csync2 and lsyncd on Linux to manage distributing the files across the machines. We use this even in AWS for one site; it is reliable and quick, and it was easier than managing a distributed FS.<p>Before going that route, though, I'd first try setting up a proper CDN for the images and take advantage of its caching settings so you can reduce your S3 bandwidth charges.<p>Also, if you don't already have it, I'd set up some monitoring on your AWS account that gives you a weekly update on account charges & usage so you can see what is going on.
Set up a cheap CDN (MaxCDN is a good bet, as someone else has mentioned). This website will help: <a href="http://www.cdncalc.com/" rel="nofollow">http://www.cdncalc.com/</a>
This will halve your monthly costs.<p>I would stick with S3 for the origin to start with. Setting up a bunch of servers to reliably store and serve data is a pain, and one you should avoid if you can. On the CDN, enable origin shielding and set the TTL of the images to never expire if you can. That way each image only needs to be served from S3 once.
When users upload, link to the CDN'd asset; that will pre-warm the CDN for subsequent requests.<p>Without knowing the trends in your traffic, how much data you are storing, etc., it's hard to give you really good advice.
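<p>To see why the cache hit ratio matters so much here, a minimal Python sketch of how much data the origin still has to serve once a CDN sits in front of it. The 11 TB total is the monthly transfer figure quoted elsewhere in this thread; the hit ratios are hypothetical values picked for illustration, not measurements, and CDN charges are not modelled.

```python
def origin_egress_tb(total_tb: float, hit_ratio: float) -> float:
    """Data still pulled from the origin (e.g. S3) behind a CDN.

    total_tb:  total data delivered to end users in the period.
    hit_ratio: fraction of that data answered from the CDN cache (0..1).
               This is an assumed input, not something measured here.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return total_tb * (1.0 - hit_ratio)


if __name__ == "__main__":
    # Hypothetical hit ratios, only to show the shape of the curve.
    for ratio in (0.0, 0.5, 0.9, 0.99):
        tb = origin_egress_tb(11, ratio)
        print(f"hit ratio {ratio:.0%}: {tb:.2f} TB from origin")
```

<p>With long TTLs and origin shielding the hit ratio should be high for images that never change, which is what pushes S3 transfer toward "served once".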
What we do is run Varnish in front of S3 to cache files on machines with cheaper bandwidth.<p><a href="https://github.com/hummingbird-me/ansible-deploy/blob/master/roles/varnish/templates/default.vcl" rel="nofollow">https://github.com/hummingbird-me/ansible-deploy/blob/master...</a><p>It is very little work to set up, and you don't need to modify your code and migrate your files to a different service. Setting up a CDN might have been easier but this works out cheaper for us.
Host it yourself with dedicated servers in a colocation center.<p>S3 is expensive for use cases like an image sharing service. Running your own servers with dedicated, unmetered bandwidth (or at least metered bandwidth in the 20TB+ range) is cheaper.<p>If 11TB of network transfer is spread out evenly over the month, a 100 Mbit uplink would handle it with plenty of room to spare. (Your traffic is probably not evenly distributed, though; it's probably very bursty.)
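<p>As a rough sanity check on the 100 Mbit claim, a minimal Python sketch of the arithmetic. It assumes decimal terabytes (10^12 bytes) and a 30-day month; `peak_factor` is a hypothetical peak-to-average ratio, added only to show how burstiness eats into the headroom.

```python
def average_mbit_per_s(terabytes: float, days: float = 30.0) -> float:
    """Average throughput needed to move `terabytes` evenly over `days`."""
    bits = terabytes * 1e12 * 8       # decimal TB -> bits
    seconds = days * 24 * 60 * 60     # days -> seconds
    return bits / seconds / 1e6       # bits/s -> Mbit/s


def fits_link(terabytes: float, link_mbit: float,
              peak_factor: float = 1.0, days: float = 30.0) -> bool:
    """True if the peak rate (average * peak_factor) fits within the link.

    peak_factor is an assumed peak-to-average ratio, not a measured one.
    """
    return average_mbit_per_s(terabytes, days) * peak_factor <= link_mbit


if __name__ == "__main__":
    avg = average_mbit_per_s(11)
    print(f"average: {avg:.1f} Mbit/s")                               # 34.0
    print("fits at 1x peak:", fits_link(11, 100))                     # True
    print("fits at 3x peak:", fits_link(11, 100, peak_factor=3.0))    # False
```

<p>So evenly spread traffic uses about a third of the link, but peaks of roughly three times the average would already saturate it.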
Set up an origin box (or a few behind a load balancer) running nginx. Then use a CDN (we love MaxCDN) to pull from the origin. Make sure you set up the cache headers correctly in nginx. Something like:<p><pre><code>  # Match common static asset extensions, case-insensitively
  location ~* \.(?:ico|js|css|gif|jpe?g|png|xml)$ {
    # Sets Expires and Cache-Control: max-age to 30 days
    expires 30d;
    add_header Pragma public;
    add_header Cache-Control "public, must-revalidate, proxy-revalidate";
  }</code></pre>
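<p>For anyone wondering what `expires 30d` actually puts on the wire, a small Python sketch that computes the equivalent `Expires` and `Cache-Control: max-age` values. The helper name `expires_headers` is made up for illustration; it only mirrors the arithmetic and does not talk to nginx.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Dict, Optional


def expires_headers(days: int, now: Optional[datetime] = None) -> Dict[str, str]:
    """Header values equivalent to a positive `expires <days>d` lifetime."""
    now = now or datetime.now(timezone.utc)
    lifetime = timedelta(days=days)
    return {
        # HTTP-date in GMT, e.g. "Sat, 31 Jan 2015 00:00:00 GMT"
        "Expires": format_datetime(now + lifetime, usegmt=True),
        # Lifetime in seconds; 30 days is 2592000
        "Cache-Control": f"max-age={int(lifetime.total_seconds())}",
    }


if __name__ == "__main__":
    for name, value in expires_headers(30).items():
        print(f"{name}: {value}")
```

<p>The CDN uses these values to decide how long it may keep an image before going back to the origin.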
This is a lot of data to transfer. As others suggested, use caching and CloudFront. Also start figuring out how to pay for it, through advertising or charging customers. With that much data being transferred, I would think there should already be a revenue stream to cover $1k a month; if there isn't, perhaps it isn't scalable.
Perhaps consider an enterprise agreement with AWS? We've saved $large without needing to switch providers just by committing to a minimum monthly volume of data transfer.<p>Worth seeing the numbers before you invest in dev & operations.
Go buy some cheap dedicated servers from places like OVH and create your own fairly simple CDN/hosting. You could easily chop that cost by 70%+.<p>Or is there a specific S3 feature that you need replicated?
Not exactly an alternative, but you can get $1000 off by finishing two entrepreneurship courses on edX.<p><a href="https://www.edx.org/AWS-activate" rel="nofollow">https://www.edx.org/AWS-activate</a>
Thanks for all the ideas. For now we are going to route traffic via CloudFront and set the cache to almost never expire. We've also compressed the images, and it looks good so far.
Other suggestions here are good. In the meantime, start saving money instantly by fronting S3 with AWS CloudFront instead of serving images directly from S3.