Here's another one for you: distribute your S3 paths/names.

Because of the way S3 is designed, where files are stored on the physical infrastructure depends on the prefix of the key name. I'm not exactly sure how much of the key name is used, but for example if you prefixed all your images with images/....jpg it's highly likely they will all be stored on the same physical hardware.

I know of at least two companies for whom this has caused large problems; one of them is Netflix. Imagine all the videos in a single bucket with key names like "/video/breaking_bad_s1_e1.mp4" (a crude example, I know): all requests hit the same physical hardware, and under high load the hardware just can't keep up. This exact issue has apparently been the cause of more than one Netflix outage.

The solution is simple: ensure your files have a random prefix ({uuid}.breaking_bad_s1_e1.mp4) and they will be spread around the datacentre :)
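A minimal sketch of the idea in Python (the helper name and key format are mine, not anything S3 requires):

    import uuid

    def randomized_key(filename):
        # A random prefix means the leading characters of the key vary,
        # so objects spread across S3 partitions instead of piling onto one.
        return '{0}.{1}'.format(uuid.uuid4().hex, filename)

    print(randomized_key('breaking_bad_s1_e1.mp4'))
    # e.g. '0f3a9c2e4b....breaking_bad_s1_e1.mp4'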
Another tip: IAM roles for EC2 instances.

http://aws.typepad.com/aws/2012/06/iam-roles-for-ec2-instances-simplified-secure-access-to-aws-service-apis-from-ec2.html

Basically, the apps you run on EC2 often need to access other AWS services, so you need to get AWS credentials onto your EC2 instances somehow, which is a nontrivial problem if you are automatically spinning up servers. IAM roles solve this by providing each EC2 instance with a temporary set of credentials in the instance metadata that get automatically rotated. Libraries like boto know to transparently fetch, cache, and refresh the temporary credentials before making API calls.

When you create an IAM role, you give it access to only the things it needs, e.g., read/write access to a specific S3 bucket, rather than access to everything.
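In practice that means credential-free application code; a sketch with boto (the bucket name is hypothetical):

    import boto

    # No access keys in code or config files: on an instance with an
    # IAM role attached, boto fetches temporary credentials from the
    # instance metadata service and refreshes them before they expire.
    s3 = boto.connect_s3()
    bucket = s3.get_bucket('my-app-bucket')
    print(bucket.get_all_keys(max_keys=5))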
I would temper the suggestion to use Glacier with a warning: make sure you thoroughly understand the pricing structure. If you don't read the details about retrieval fees in the FAQ, it's easy to shoot yourself in the foot and run up a bill that's vastly higher than necessary. You get charged based on the *peak* retrieval rate, not just the total amount you retrieve. Details here: http://aws.amazon.com/glacier/faqs/

For example, suppose you have 1TB of data in Glacier, and you need to restore a 100GB backup file. If you have the luxury of spreading out the retrieval in small chunks over a week, you only pay about $4. But if you request it all at once, the charge is more like $175.

In the worst case, you might have a single multi-terabyte archive and ask for it to be retrieved in a single chunk. I've never been foolhardy enough to test this, but according to the docs, Amazon will happily bill you tens of thousands of dollars for that single HTTP request.
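To make the peak-rate math concrete, here's a back-of-the-envelope sketch. The formula (your peak hourly rate billed as if sustained for the whole month, less a free allowance of 5% of stored data per month prorated hourly) and the roughly 4-hour retrieval job window are my reading of the FAQ, so treat the numbers as approximate:

    def glacier_retrieval_fee(peak_gb_per_hour, stored_gb,
                              price_per_gb=0.01, hours_in_month=720):
        # Free allowance: 5% of stored data per month, prorated hourly.
        free_gb_per_hour = stored_gb * 0.05 / hours_in_month
        billable = max(peak_gb_per_hour - free_gb_per_hour, 0)
        # You pay for the peak rate as if sustained all month.
        return billable * hours_in_month * price_per_gb

    # 100GB requested in one go (job spans ~4 hours -> ~25 GB/hr peak):
    print(glacier_retrieval_fee(25, 1024))    # roughly $175
    # Same 100GB trickled out over a week (~0.6 GB/hr peak):
    print(glacier_retrieval_fee(0.6, 1024))   # a few dollars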
OP here. Really cool to see people enjoying the write-up.

It started off as a list of misc AWS topics that I found myself repeatedly explaining to people. It seemed like a good idea to write them down.

I'm planning on listing out more in follow-up posts.
This is all good stuff. AWS takes a while to grok, but once you do, it offers so many new possibilities.

The Aha! moment for me came when playing with SimianArmy, the wonderful Netflix OSS project, and in particular Chaos Monkey.

Rather than build redundancy into your system, build failure in and force failures early and often. This will surface architectural problems better than any whiteboard; a toy version of the idea is sketched below.

https://github.com/Netflix/SimianArmy

Also, check out boto and the AWS CLI.

http://aws.amazon.com/sdkforpython/

http://aws.amazon.com/cli/
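A toy take on the Chaos Monkey idea with boto (the region, the tag filter, and the total lack of opt-in/scheduling safeguards are all my simplifications; the real Chaos Monkey is far more careful):

    import random
    import boto.ec2

    conn = boto.ec2.connect_to_region('us-east-1')

    # Find running instances tagged as part of the web tier...
    reservations = conn.get_all_instances(
        filters={'tag:role': 'web', 'instance-state-name': 'running'})
    instances = [i for r in reservations for i in r.instances]

    # ...and terminate one at random. If the system is truly built
    # for failure, nobody should notice.
    if instances:
        victim = random.choice(instances)
        print('Terminating %s' % victim.id)
        conn.terminate_instances(instance_ids=[victim.id])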
I for one think that S3's server-side encryption is amazing for most if not all data. We back up hundreds of gigabytes of data every day to S3, and turning on server-side encryption means we don't have to worry about generating, rotating, and managing keys and encryption strategies. It also saves us time, since we don't have to compute the encryption or decryption ourselves. The best part is that the AWS AES-256 server-side encryption at rest suffices for compliance.

Of course, the data that we store, while confidential, isn't super-mission-critical-sensitive data. We trust AWS not to peek into it, but nothing will be lost if they do.
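Enabling it per object is a one-argument change with boto (the bucket and key names are made up):

    import boto
    from boto.s3.key import Key

    conn = boto.connect_s3()
    bucket = conn.get_bucket('nightly-backups')
    key = Key(bucket, 'db-dump.tar.gz')

    # encrypt_key=True asks S3 to encrypt the object at rest with
    # AES-256; key generation, rotation, and storage are all handled
    # server-side by AWS.
    key.set_contents_from_filename('/tmp/db-dump.tar.gz', encrypt_key=True)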
Cloud66 is a good cautionary tale about refusing the temptation to double-dip when it comes to your AWS tokens:

https://news.ycombinator.com/item?id=5685406

https://news.ycombinator.com/item?id=5669315
I know this is about AWS, but it might be helpful to mention that DreamObjects (http://www.dreamhost.com/cloud/dreamobjects/) is an API-compatible, cheaper alternative to S3.
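"API-compatible" means the usual S3 tooling works once you point it at a different endpoint; a sketch with boto (the host is DreamObjects' documented endpoint, the credential placeholders are yours to fill in):

    import boto
    import boto.s3.connection

    conn = boto.connect_s3(
        aws_access_key_id='...',       # DreamObjects keys, not AWS keys
        aws_secret_access_key='...',
        host='objects.dreamhost.com',
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )
    for bucket in conn.get_all_buckets():
        print(bucket.name)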
Question about the underlying SaaS product: I'm not understanding how a database client in the cloud can connect to a database server on my own machine.

Or am I misunderstanding what it does?
If you're running an application on more than one server, it's definitely worth checking out AWS OpsWorks. It's a huge time saver, and extremely useful for integrating and managing setup, configuration, and deployment across servers, databases, caches, etc., without any loss of control or customization.
So, in older Reddit threads, I read about how you need to build RAID-1 arrays out of EBS volumes for all your EC2 servers, as well as test your EBS storage, because individual volumes can perform really badly.

Is anybody doing this for their EC2 deployments and, more importantly, automating it?
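For what it's worth, the provisioning half is scriptable with boto; a sketch (the instance ID, size, zone, and device names are placeholders, and assembling the array with mdadm still happens on the instance itself, typically via user-data or config management):

    import time
    import boto.ec2

    conn = boto.ec2.connect_to_region('us-east-1')

    # Create and attach a pair of EBS volumes for a RAID-1 mirror.
    for device in ['/dev/sdf', '/dev/sdg']:
        vol = conn.create_volume(size=100, zone='us-east-1a')
        while vol.status != 'available':  # wait until the volume is ready
            time.sleep(5)
            vol.update()
        conn.attach_volume(vol.id, 'i-0123abcd', device)

    # Then, on the instance, something like:
    #   mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/xvdf /dev/xvdg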