So I'd modify these a bit. We run a very large AWS infrastructure as an engineering team (no dedicated ops). Rough sketches of several of these points, with made-up names and IDs throughout, follow the list.

1. Use CloudFormation only for infrastructure that largely doesn't change: VPCs, subnets, internet gateways, etc. Don't use it for your instances, databases, and so on; I can't stress that enough, or you'll end up in a place where updating them is risky. We have a regional migration (like a database migration) that runs in each region we deploy to and sets up ASGs, RDS, etc. It gives us control over how things change, e.g. when we need to change a launch configuration.

2. Use auto-scaling groups for your stateless front ends that don't have really bursty loads; auto-scaling isn't responsive enough for really sharp spikes (though not much is). Otherwise do your own cluster management if you can (though you should probably default to auto-scaling unless you can make a strong case against it). (ASG sketch below.)

3. Use different accounts for dev / qa / prod, not just different regions. Force yourself to put in the correct automation to bootstrap yourself into a new account or region (we run in 5 regions in prod and 3 in qa, and having that automation is a lifesaver). (Assume-role sketch below.)

4. Don't use IP addresses for things if you can help it; just create a private hosted zone in Route53 and map names that way. (Sketch below.)

5. Use instance roles, and in dev force devs to put their credentials in a place where they get picked up by the provider chain. Don't get into a situation where you are copying creds everywhere; assume they'll be picked up from the environment. (Sketch below.)

6. Don't use DynamoDB (or any non-relational store) until you have to (even though it is great). RDS is a great service and you should stick with it as long as you can (you can make it scale a long way with the right architecture, and bumping instance sizes is easy). IMO a relational store is more flexible than the alternatives since (at least with Postgres) you get transactional guarantees on DDL operations, which makes it easier to build correct migration logic. (Migration sketch below.)

7. If you are using CloudFormation, use troposphere: https://github.com/cloudtools/troposphere (example below).

8. Understand which instances need internet access and which don't, so you can either give them public IPs or put in a NAT. Security teams get grumpy (for good reason) when you expose machines to the internet that don't need to be, even if it's only outbound.

9. Set up ELB logging, and pay attention to CloudTrail. (Sketch below.)

10. We use CloudWatch Logs. It has its warts (and it's a bit expensive), but it's better than a lot of the infrastructure you see out there (we don't generally index our logs; we just need to be able to view them in a browser and export them for grep). It's also easy to get started with; just make sure your date formats are correct.

11. By default, stripe yourself across AZs if possible (and it's almost always possible). Don't leave it for later; take the pain up front and you'll be happy about it later.

12. Don't try to be multi-region at first if you can avoid it; just replicate your infrastructure into different regions (other than users / accounts etc.). People get hung up on being able to flip back and forth between regions, and it's usually not necessary.

edit: Track everything in CloudWatch, everything. (Metric sketch below.)
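
To make points 1 and 7 concrete, here's roughly what a troposphere template for the slow-changing networking layer looks like. The names, CIDRs, and AZ are invented for the example, not our real setup:

    # Minimal troposphere template for the networking layer that rarely
    # changes (point 1): a VPC, one subnet, and an internet gateway.
    from troposphere import Ref, Tags, Template
    from troposphere.ec2 import (InternetGateway, Subnet, VPC,
                                 VPCGatewayAttachment)

    t = Template()
    t.set_description("Base networking stack; changes rarely")

    vpc = t.add_resource(VPC(
        "BaseVPC",
        CidrBlock="10.0.0.0/16",
        EnableDnsSupport=True,
        EnableDnsHostnames=True,
        Tags=Tags(Name="base-vpc"),
    ))

    t.add_resource(Subnet(
        "PublicSubnetA",
        VpcId=Ref(vpc),
        CidrBlock="10.0.0.0/24",
        AvailabilityZone="us-east-1a",
        Tags=Tags(Name="public-a"),
    ))

    igw = t.add_resource(InternetGateway("BaseIGW"))
    t.add_resource(VPCGatewayAttachment(
        "IGWAttachment",
        VpcId=Ref(vpc),
        InternetGatewayId=Ref(igw),
    ))

    print(t.to_json())  # feed the output to CloudFormation

The win over raw JSON is that it's real Python: you get loops, functions, and a syntax error instead of a failed stack update.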
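
For points 2 and 11, a minimal boto3 sketch of an auto-scaling group striped across two AZs. Every AMI, subnet, and name here is a placeholder:

    import boto3

    autoscaling = boto3.client("autoscaling", region_name="us-east-1")

    # Launch configuration the group will use; AMI and instance type
    # are placeholders.
    autoscaling.create_launch_configuration(
        LaunchConfigurationName="web-lc-v1",
        ImageId="ami-12345678",
        InstanceType="m4.large",
        IamInstanceProfile="web-instance-role",  # instance role, per point 5
    )

    # The group spans subnets in different AZs (point 11), so losing
    # one AZ degrades capacity instead of taking you down.
    autoscaling.create_auto_scaling_group(
        AutoScalingGroupName="web-asg",
        LaunchConfigurationName="web-lc-v1",
        MinSize=2,
        MaxSize=10,
        DesiredCapacity=4,
        VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222",  # one per AZ
        HealthCheckType="ELB",
        HealthCheckGracePeriod=300,
        LoadBalancerNames=["web-elb"],
    )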
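
For point 3, the bootstrap automation boils down to assuming a role in the target account and doing everything through that session. The account ID and role name below are hypothetical:

    import boto3

    # Assume a role in the target account; from then on, all clients
    # built from this session act in that account/region.
    sts = boto3.client("sts")
    creds = sts.assume_role(
        RoleArn="arn:aws:iam::123456789012:role/BootstrapRole",
        RoleSessionName="qa-bootstrap",
    )["Credentials"]

    qa_session = boto3.Session(
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
        region_name="eu-west-1",
    )

    ec2 = qa_session.client("ec2")
    print(ec2.describe_vpcs()["Vpcs"])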
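
For point 4, a sketch of creating a private hosted zone and pointing a stable name at an RDS endpoint instead of hardcoding addresses. Zone name, VPC ID, and endpoint are made up:

    import time
    import boto3

    route53 = boto3.client("route53")

    # Private zone attached to the VPC; only resolvable from inside it.
    zone = route53.create_hosted_zone(
        Name="internal.example.com",
        CallerReference=str(time.time()),  # must be unique per request
        VPC={"VPCRegion": "us-east-1", "VPCId": "vpc-12345678"},
        HostedZoneConfig={"PrivateZone": True, "Comment": "internal names"},
    )

    # A stable name for the database; when the endpoint moves, you
    # update one record instead of chasing IPs through configs.
    route53.change_resource_record_sets(
        HostedZoneId=zone["HostedZone"]["Id"],
        ChangeBatch={"Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "db.internal.example.com",
                "Type": "CNAME",
                "TTL": 300,
                "ResourceRecords": [
                    {"Value": "mydb.abc123.us-east-1.rds.amazonaws.com"},
                ],
            },
        }]},
    )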
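
Point 5 in practice means never passing credentials explicitly. Assuming boto3 (the other official SDKs behave the same way), the default provider chain does the work:

    import boto3

    # No credentials in code. boto3 walks the default provider chain:
    # environment variables, then ~/.aws/credentials (dev laptops),
    # then the EC2 instance role metadata (prod). Same code everywhere.
    s3 = boto3.client("s3")
    for bucket in s3.list_buckets()["Buckets"]:
        print(bucket["Name"])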
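
For point 6, this is what I mean by transactional DDL making migrations easier: a hypothetical psycopg2 migration step where the schema change and the backfill either both commit or both roll back. Connection details are placeholders:

    import psycopg2

    conn = psycopg2.connect(host="db.internal.example.com",
                            dbname="app", user="app", password="secret")
    try:
        # "with conn" commits on success and rolls back on any
        # exception; Postgres runs the DDL inside that transaction too.
        with conn:
            with conn.cursor() as cur:
                cur.execute("ALTER TABLE orders ADD COLUMN region text")
                cur.execute("UPDATE orders SET region = 'us-east-1' "
                            "WHERE region IS NULL")
    finally:
        conn.close()

On MySQL the ALTER would commit implicitly and a failed backfill would leave you half-migrated; that's the flexibility I'm talking about.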
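
For point 9, turning on access logs for a classic ELB is one attribute call. Bucket and ELB names are placeholders, and the bucket needs a policy letting the regional ELB account write to it:

    import boto3

    elb = boto3.client("elb", region_name="us-east-1")

    # Ship access logs to S3 every 60 minutes.
    elb.modify_load_balancer_attributes(
        LoadBalancerName="web-elb",
        LoadBalancerAttributes={
            "AccessLog": {
                "Enabled": True,
                "S3BucketName": "my-elb-logs",
                "S3BucketPrefix": "web-elb",
                "EmitInterval": 60,
            },
        },
    )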
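
And for the edit: application-level metrics go to CloudWatch with put_metric_data. The namespace, metric name, and dimension below are invented for the example:

    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    # Emit a custom metric; CloudWatch creates it on first write, so
    # instrumenting a new code path is one call.
    cloudwatch.put_metric_data(
        Namespace="MyApp",
        MetricData=[{
            "MetricName": "OrdersProcessed",
            "Dimensions": [{"Name": "Region", "Value": "us-east-1"}],
            "Value": 1.0,
            "Unit": "Count",
        }],
    )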