Fantastic list with much more depth than I expected. Some surprises that others might be interested in from this article and comments below:<p><pre><code> [1] Keeping buckets locked down and allowing direct client -> S3 uploads
 [2] Using ALIAS records for easier redirection to core AWS resources instead of CNAMEs.
[3] What's an ALIAS?
[-] Using IAM Roles
[4] Benefits of using a VPC
[-] Use '-' instead of '.' in S3 bucket names that will be accessed via HTTPS.
[-] Automatic security auditing (damn, entire section was eye-opening)
[-] Disable SSH in security groups to force you to get automation right.
</code></pre>
[1] <a href="http://docs.aws.amazon.com/AmazonS3/latest/dev/PresignedUrlUploadObject.html" rel="nofollow">http://docs.aws.amazon.com/AmazonS3/latest/dev/PresignedUrlU...</a><p>[2] <a href="http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/CreatingAliasRRSets.html" rel="nofollow">http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/Cre...</a><p>[3] <a href="http://blog.dnsimple.com/2011/11/introducing-alias-record/" rel="nofollow">http://blog.dnsimple.com/2011/11/introducing-alias-record/</a><p>[4] <a href="http://www.youtube.com/watch?v=Zd5hsL-JNY4" rel="nofollow">http://www.youtube.com/watch?v=Zd5hsL-JNY4</a>
I'd also add to the list - make sure that AWS is right for your workload.<p>If you don't have an elastic workload and are keeping all of your servers online 24/7, then you should investigate dedicated hardware from another provider. AWS really only makes sense ($$) when you can take advantage of the ability to spin up and spin down your instances as needed.
One thing the article mentions is terminating SSL on your ELB. If you want more control over your SSL setup AND want to get remote IP information (e.g. X-Forwarded-For) ELB now supports PROXY protocol. I wrote a little introduction on how to set it up[0]. They haven't promoted it very much, but it is quite useful.<p>[0]: <a href="http://jud.me/post/65621015920/hardened-ssl-ciphers-using-aws-elb-and-haproxy" rel="nofollow">http://jud.me/post/65621015920/hardened-ssl-ciphers-using-aw...</a>
Be very careful with assigning IAM roles to EC2 instances. Many web applications have some kind of implicit proxying, e.g. a function to download an image from a user-defined URL. You might have remembered to block 127.0.0.*, but did you remember 169.254.169.254? Are you aware why 169.254.169.254 is relevant to IAM roles? Did you consider hostnames pointed at 169.254.169.254? Did you consider that your HTTP client might do a separate DNS look-up? etc.<p>There are other subtleties which make roles hard to work with. The same policies can have different effects for roles and users (e.g., permission to copy from other buckets).<p>IAM Roles can be useful, especially for bootstrapping (e.g. retrieving an encrypted key store at start-up), but only use them if you know what you're doing.<p>Conversely, tips like disabling SSH have negligible security benefit if you're using the default EC2 setup (private key-based login). It's really quite useful to see what's going on in an individual server when you're developing a service.<p>Also, it does matter whether you put a CDN in front of S3. Even when requesting a file from EC2, CloudFront is typically an order of magnitude faster than S3. And even when using the website endpoint, S3 is not designed for web sites: it will serve 500s relatively frequently and does not scale instantly.
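To make the metadata-service pitfall concrete: 169.254.169.254 is the EC2 instance metadata endpoint, and with an IAM role attached, anything that can make the instance fetch a URL can read the role's temporary credentials from it. A naive URL filter, resolving the user-supplied hostname yourself, might look like the sketch below (my own illustration, not from the article; note the comment's caveat that your HTTP client may do its own DNS look-up afterwards, so this check alone is not sufficient):

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_blocked(url):
    """Reject URLs whose host resolves to a loopback, link-local, or
    private address. 169.254.0.0/16 (link-local) covers the EC2
    metadata endpoint 169.254.169.254.
    CAVEAT: if the HTTP client re-resolves DNS later, an attacker can
    swap the record between this check and the actual request (TOCTOU).
    """
    host = urlparse(url).hostname
    if host is None:
        return True  # unparseable URL: refuse
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return True  # unresolvable: refuse
    for _family, _type, _proto, _canon, sockaddr in infos:
        ip = ipaddress.ip_address(sockaddr[0])
        if ip.is_loopback or ip.is_link_local or ip.is_private:
            return True
    return False
```

The robust fix is to resolve once, connect to the vetted IP directly, and pin the Host header, rather than trusting a pre-flight check like this one.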
> you pay the much cheaper CloudFront outbound bandwidth costs, instead of the S3 outbound bandwidth costs.<p>What? CloudFront bandwidth costs are, at best, the same as S3 outbound costs, and at worst much more expensive.<p>S3 outbound costs are 12 cents per GB worldwide. [1]<p>CloudFront outbound costs are 12-25 cents per GB, depending on the region. [2]<p>Not only that, but your cost-per-request on CloudFront is way more than S3 ($0.004 per 10,000 requests on S3 vs $0.0075-$0.0160 per 10,000 requests on CloudFront)<p>[1] <a href="http://aws.amazon.com/s3/pricing/" rel="nofollow">http://aws.amazon.com/s3/pricing/</a>
[2] <a href="http://aws.amazon.com/cloudfront/pricing/" rel="nofollow">http://aws.amazon.com/cloudfront/pricing/</a>
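To put rough numbers on it (a quick sanity check using only the per-GB and per-request prices quoted above; real AWS pricing is tiered and changes, so treat these as illustrative):

```python
# Prices quoted in the comment above, in USD. Illustrative only.
S3_OUT_PER_GB = 0.12
CF_OUT_PER_GB_MAX = 0.25      # most expensive CloudFront region
S3_PER_10K_REQUESTS = 0.004
CF_PER_10K_REQUESTS_MAX = 0.016

def monthly_cost(gb_out, requests, per_gb, per_10k_requests):
    """Outbound bandwidth cost plus per-request cost for one month."""
    return gb_out * per_gb + (requests / 10_000) * per_10k_requests

# Example month: 1 TB served, 10 million requests.
s3_cost = monthly_cost(1000, 10_000_000, S3_OUT_PER_GB, S3_PER_10K_REQUESTS)
cf_worst = monthly_cost(1000, 10_000_000, CF_OUT_PER_GB_MAX,
                        CF_PER_10K_REQUESTS_MAX)
```

At these list prices the worst-case CloudFront bill is more than double the S3 bill for the same traffic, which is the commenter's point: CloudFront buys latency, not savings.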
Lots of very useful tips there!<p>There's one that I think could be improved on a little:<p><pre><code> Uploads should go direct to S3 (don't store on local filesystem and have another process move to S3 for example).
</code></pre>
You could even use a temporary URL[0,1] and have the user upload directly to S3!<p>[0]: <a href="http://stackoverflow.com/questions/10044151/how-to-generate-a-temporary-url-to-upload-file-to-amazon-s3-with-boto-library" rel="nofollow">http://stackoverflow.com/questions/10044151/how-to-generate-...</a>
[1]: <a href="http://docs.aws.amazon.com/AmazonS3/latest/dev/PresignedUrlUploadObject.html" rel="nofollow">http://docs.aws.amazon.com/AmazonS3/latest/dev/PresignedUrlU...</a>
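The linked docs use boto, but the legacy signature-v2 query-string signing that boto's generate_url performs is small enough to sketch with just the stdlib (bucket and key names below are made up, and new code should prefer SigV4 via boto3; this only shows what the "temporary URL" actually is):

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import quote

def presigned_put_url(bucket, key, access_key, secret_key, expires_in=3600):
    """Build a legacy (signature v2) presigned PUT URL for S3.
    Anyone holding this URL can upload to exactly this key until
    `expires_in` seconds from now -- no AWS credentials client-side.
    """
    expires = int(time.time()) + expires_in
    # SigV2 string-to-sign: verb, content-md5, content-type, expires,
    # canonical resource (md5/type left empty here).
    string_to_sign = "PUT\n\n\n{}\n/{}/{}".format(expires, bucket, key)
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    signature = quote(base64.b64encode(digest))
    return ("https://{}.s3.amazonaws.com/{}"
            "?AWSAccessKeyId={}&Expires={}&Signature={}").format(
                bucket, quote(key), access_key, expires, signature)
```

Hand the returned URL to the browser and have it PUT the file body directly; your servers never touch the bytes.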
Good article, but I think it says too little about persistence. The trade-off of EBS vs ephemeral storage, for example, is not mentioned at all.<p>Getting your application server up and running is the easiest part of operations, whether you do it by hand via SSH, or automate and autoscale everything with ansible/chef/puppet/salt/whatever. Persistence is the hard part.
Really useful article, though I don't agree with serving assets straight from S3 instead of through a CDN. There are multiple articles showing that S3's performance is quite poor compared to CloudFront, and that it isn't well suited to serving assets directly.
Along these lines, I recommend installing New Relic server monitoring on all your EC2 instances.<p>The server-level monitoring is free, and it's super simple to install. (The code we use to roll it out via ansible: <a href="https://gist.github.com/drob/8790246" rel="nofollow">https://gist.github.com/drob/8790246</a>)<p>You get 24 hours of historical data and a nice web UI. Totally worth the effort.
<p><pre><code> > Use random strings at the start of your keys.
> This seems like a strange idea, but one of the implementation details
> of S3 is that Amazon use the object key to determine where a file is physically
> placed in S3. So files with the same prefix might end up on the same hard disk
> for example. By randomising your key prefixes, you end up with a better distribution
> of your object files. (Source: S3 Performance Tips & Tricks)
</code></pre>
This is great advice, but one small conceptual correction: the prefix doesn't control where the file contents are stored; it controls where the index entry for that file is stored.
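The randomisation itself is cheap to do. One sketch (my own example; the 4-hex-character prefix scheme is just one option) derives the prefix from the key name, so it spreads lexicographically-close keys across index partitions while staying deterministic:

```python
import hashlib

def distributed_key(original_key):
    """Prefix an S3 key with a few hex chars derived from the key name.
    Sequential names like logs/2014-02-01.gz, logs/2014-02-02.gz get
    unrelated prefixes and land on different index partitions, but the
    mapping is deterministic, so you can recompute it for reads.
    """
    prefix = hashlib.md5(original_key.encode()).hexdigest()[:4]
    return "{}/{}".format(prefix, original_key)
```

The trade-off is that you lose meaningful prefix listing (you can no longer list all of `logs/` in one call), so keep a separate index if you need enumeration.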
One painful-to-learn issue with AWS is service limits, some of which are not at all obvious. Everything has a hard limit, and unless you have a support plan it can take days or weeks to get them lifted: each limit is handled by the respective department and lifted (or rejected) one by one. More than once we've hit a Security Group limit right before a production push, or something similar.<p>RDS and CloudFront are also extremely painful to launch. I've had incidents where RDS took nearly 2 hours to launch a blank multi-AZ instance, and CloudFront distributions take 30 minutes to complete. My CloudFormation templates easily run over an hour just blocking on those two.<p>VPC is nice, I love it, but it takes time to grasp the difference between Network ACLs and Security Groups, and especially why the heck you need to run your own NATs. Why isn't that part of the service?! The "high" availability NAT scripts they provide are outdated, buggy in fact, and support only 2 AZs.<p>Last, but not least, a CloudFront "flush" takes over 20 minutes, even for empty distributions. And you can't do a hot switch from one distribution to another: changing a CNAME also takes 30 minutes, and two distributions cannot have the same CNAME (it's a weird edge case scenario, but anyway).
> Have tools to view application logs.<p>Yes! Centralized logging is an absolute must: don't depend on being able to log in and look at logs by hand. That grows wearisome fast.
i'm a devops noob. what tools should i use to log / monitor all my servers?<p>i don't want to learn some complex stuff like chef/puppet btw.... anything SIMPLE?
Can you (or somebody else) elaborate on disabling ssh access? Is this a dogma of "automation should do everything" or is there a specific security concern you are worried about? What is the downside of letting your ops people ssh into boxes, or for that matter of their needing to do so?
How hard is it to roll your own version of AWS's security groups? I want to set up a Storm cluster, but the methods I have come up with for firewalling it while preserving elasticity all seem a bit fragile.
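One low-tech way to approximate security groups on plain hosts is to regenerate a host-firewall whitelist from the current member list (fetched however you track membership, e.g. via the EC2 API) and re-apply it on every scaling event. A sketch of the rule generation (my own illustration; chain handling is simplified and the ports are Storm's defaults by assumption):

```python
def iptables_rules(member_ips, ports):
    """Generate iptables commands that accept traffic on `ports` only
    from current cluster members and drop everyone else.
    In practice you'd flush/rebuild a dedicated chain atomically
    (e.g. via iptables-restore) instead of appending to INPUT.
    """
    rules = []
    for ip in member_ips:
        for port in ports:
            rules.append(
                "iptables -A INPUT -p tcp -s {} --dport {} -j ACCEPT"
                .format(ip, port))
    # Default-deny for the protected ports, appended after the accepts.
    for port in ports:
        rules.append("iptables -A INPUT -p tcp --dport {} -j DROP"
                     .format(port))
    return rules
```

The fragile part isn't the rules, it's convergence: every node must re-run this when membership changes, which is exactly the coordination problem security groups solve for you.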
As an Australian developer, using an EC2 instance seems to be the cheapest option if you want a server based in this country. Anyone got any other recommendations?
Can anyone explain how disabling ssh has anything to do with automation? We automate all our deployments through ssh, and I wasn't aware of another way of doing it.