> Stripe your RDS disks for better performance<p>This is a fun hack to perform, but it opens you up to another problem: latency on any one of the striped EBS volumes will slow down the entire striped array.<p>Attempts to mitigate this problem (including setting up RAID 10) work in the short term, but it really is easier to just buy a guaranteed IOPS (Provisioned IOPS) volume if you want to run a database on EC2.
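To make the striping trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes each volume independently has some small chance of being slow at any given moment (the 1% figure is purely hypothetical, not an AWS number), and that a request spanning every member of the stripe completes only when the slowest member has answered.

```python
from typing import Sequence


def p_array_stalls(p_slow: float, n_volumes: int) -> float:
    """Chance that at least one of n volumes is slow at a given moment.

    Assumes (hypothetically) that each volume is slow with probability
    p_slow, independently of the others.
    """
    return 1.0 - (1.0 - p_slow) ** n_volumes


def stripe_latency_ms(member_latencies_ms: Sequence[float]) -> float:
    """Latency of a request that touches every member of the stripe.

    The array can only return once every member has responded, so the
    slowest volume sets the pace for the whole request.
    """
    return max(member_latencies_ms)


for n in (1, 2, 4, 8):
    # Hypothetical: each individual volume is slow 1% of the time.
    print(f"{n} volume(s): {p_array_stalls(0.01, n):.2%} chance of a stall")
```

Under these assumptions, going from one volume to eight raises the chance that the array is waiting on a slow member from 1% to roughly 7.7%, which is the trade-off described above.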
This is great and all, but I'd love to see a community wiki/discussion area for AWS tips, tricks, and gotchas. Something like Quora crossed with Wikipedia, just for AWS.<p>Is anyone aware of such a thing? If not, any advice for starting one?
"9) Use Virtual Private Cloud (VPC) from the start"<p>This is now a no-brainer. New accounts in certain regions are put into a default VPC from the get-go.<p>The only inbound traffic should come through an ELB (HTTP/HTTPS) and a non-DNS-resolvable SSH bastion/NAT host (an m1.small is more than enough). Your bastion is the only host that sits on the public Internet.<p>Setting up a bastion is reasonably straightforward. There is an AMI that will do it all for you. Just make sure you've turned "src/dst check" off in the EC2 panel for that server. That tip alone can save you hours of hair-tearing.<p>Outbound traffic goes via the bastion, which you can lock down to certain protocols via VPC security groups. I limit it to HTTPS (no HTTP!).<p>The bastion should accept SSH keys only (no username/password). Put fail2ban in place on the bastion. You can also add a firewall rule that backs off repeated failed SSH attempts. This all but nukes brute-force attacks. Also, make sure you patch regularly.<p>I go as far as having separate keys for the bastion and for the hosts behind it, but sensible policies should apply here (e.g. passphrases on your keys, please).<p>Restrict superuser login on the bastion to a very select group. You can quickly revoke an employee's access by removing them from the bastion. If you're in panic mode, you can turn off the bastion and isolate your network until you can regroup (similarly with the ELB for web-based attacks).<p>This is a pretty sound foundation for a secure setup.
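A minimal sketch of two of the bastion steps, assuming you drive EC2 from Python with boto3. The helpers below only build keyword-argument dicts for the existing boto3 EC2 client methods `modify_instance_attribute`, `revoke_security_group_egress` and `authorize_security_group_egress`, so nothing touches AWS until you pass them to a client. The helper names, and which security group you choose to restrict, are hypothetical.

```python
from typing import Any, Dict


def src_dst_check_off(instance_id: str) -> Dict[str, Any]:
    # Kwargs for ec2.modify_instance_attribute. A NAT/bastion host forwards
    # packets that are neither from nor addressed to itself, so EC2's
    # source/destination check has to be disabled on that instance.
    return {"InstanceId": instance_id, "SourceDestCheck": {"Value": False}}


def revoke_default_egress(group_id: str) -> Dict[str, Any]:
    # Kwargs for ec2.revoke_security_group_egress: removes the default
    # "all protocols to 0.0.0.0/0" egress rule a new VPC security group gets.
    return {
        "GroupId": group_id,
        "IpPermissions": [
            {"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}
        ],
    }


def https_only_egress(group_id: str) -> Dict[str, Any]:
    # Kwargs for ec2.authorize_security_group_egress: allow outbound
    # TCP 443 only (no plain HTTP on port 80).
    return {
        "GroupId": group_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": 443,
                "ToPort": 443,
                "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
            }
        ],
    }
```

With `ec2 = boto3.client("ec2")`, you would call e.g. `ec2.modify_instance_attribute(**src_dst_check_off(instance_id))`, then revoke the default egress rule and authorize the HTTPS-only one on whichever security group governs your outbound traffic. Keeping the request-building separate from the client makes the rules easy to review before anything is applied.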
>Use ZFS and RAIDZ with EBS<p>That's a really cool idea, and I'm curious to see what kind of performance losses will occur during normal usage.
I don't hear enough people talking about vendor lock-in with AWS, or any other cloud provider for that matter. Too many of us are building our companies' infrastructure on AWS with little regard for the cost of switching.<p>I try to stick with Amazon's IaaS offerings, only employing PaaS products when they offer considerable advantages over anything I could roll economically on bare infrastructure.<p>Ask yourself: "In terms of time or money, what would it cost me if this cloud product were discontinued without notice?"<p>AWS might lose a patent battle, get shut down by the government, or get crippled by hackers or a long-term service interruption. If that happened, how long would you be down while you struggle to get up and running at another provider? What if Amazon raises their prices to the point where you want to change cloud providers? Would the cost of switching be prohibitive?
An important gotcha regarding CloudFormation: there is no way to recover from a failed rollback (except perhaps by contacting Amazon support). So it's basically only safe for the initial setup of resources.
The description of the noisy neighbor problem in #6 lacks some depth.<p>The AWS noisy neighbor problem is very often misunderstood. CPU steal time under Linux does NOT mean that somebody is stealing your CPU. It simply means that your instance wanted to run and the hypervisor gave the physical CPU to another instance instead. This may happen because you have exceeded your quota, or because the scheduler picked another pending instance at that very moment and will give the CPU back to you a bit later. Either way, your instance ends up with its fair share.<p>Great, detailed explanations of steal time: <a href="https://support.cloud.engineyard.com/entries/22806937-Explanation-of-Steal-Time" rel="nofollow">https://support.cloud.engineyard.com/entries/22806937-Explan...</a> and <a href="http://www.stackdriver.com/understanding-cpu-steal-experiment/" rel="nofollow">http://www.stackdriver.com/understanding-cpu-steal-experimen...</a>. The OP mentions the latter article but does not seem to have fully read or understood it.<p>Why does killing and restarting an instance help? It likely moves the instance to a different hardware node with less active neighbors. When your neighbors are idle, your instance can use their idle CPU cycles! You sort of become the noisy one. Still, the hypervisor will stop this once a neighbor starts fully using its CPU quota, and you are back to square one.<p>According to their CTO, Amazon does not oversubscribe CPU: <a href="http://itknowledgeexchange.techtarget.com/cloud-computing/amazon-does-not-oversubscribe/" rel="nofollow">http://itknowledgeexchange.techtarget.com/cloud-computing/am...</a><p>Amazon specifically states that t1.micro instances do not guarantee CPU performance:
"Micro instances are a very low-cost instance option, providing a small amount of CPU resources. Micro instances may opportunistically increase CPU capacity in short bursts when additional cycles are available. They are well suited for lower throughput applications and websites that require additional compute cycles periodically, but are not appropriate for applications that require sustained CPU performance."<p>While CPU sharing is pretty well documented, the noisy neighbor problem still exists for network and disk resources shared by multiple instances on the same hardware node. The only way to detect these problems is to track throughput and loss rate for the network, and I/O stats for the disk.<p>You are guaranteed to avoid the CPU noisy neighbor problem (at least from other customers) by using AWS Dedicated Instances:
<a href="http://aws.amazon.com/dedicated-instances/" rel="nofollow">http://aws.amazon.com/dedicated-instances/</a><p>I work for an APM (Application Performance Management) vendor; I have no business praising AWS.<p>[Edit: spelling and clarity]<p>[Edit2: changed CPU scheduling description]
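A rough sketch of measuring the signals discussed above on a Linux guest: CPU steal as a share of total CPU time, from two samples of the aggregate `cpu` line in `/proc/stat`, and average time per completed read/write request (roughly what `iostat -x` reports as `await`), from two samples of a device's line in `/proc/diskstats`. The column positions follow the Linux kernel's documented layout; the function names and sample values are made up for illustration.

```python
from typing import List

# Aggregate "cpu" line of /proc/stat, in kernel order:
# user nice system idle iowait irq softirq steal guest guest_nice
STEAL = 7  # index of the steal column once the "cpu" label is dropped


def parse_cpu_line(line: str) -> List[int]:
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(v) for v in fields[1:]]


def steal_percent(before: str, after: str) -> float:
    """Share of CPU time taken by the hypervisor between two samples."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    # guest/guest_nice are already counted inside user/nice, so only the
    # first eight columns make up total CPU time.
    deltas = [y - x for x, y in zip(b[:8], a[:8])]
    total = sum(deltas)
    return 100.0 * deltas[STEAL] / total if total > 0 else 0.0


def disk_await_ms(before: str, after: str) -> float:
    """Average milliseconds per completed read/write between two samples.

    /proc/diskstats columns: major minor name reads reads_merged
    sectors_read ms_reading writes writes_merged sectors_written
    ms_writing ...
    """
    b, a = before.split(), after.split()
    ios = (int(a[3]) - int(b[3])) + (int(a[7]) - int(b[7]))
    ms = (int(a[6]) - int(b[6])) + (int(a[10]) - int(b[10]))
    return ms / ios if ios > 0 else 0.0
```

In practice you would read the first line of `/proc/stat`, or your EBS device's line in `/proc/diskstats`, a few seconds apart and feed both samples in. A steal percentage that stays high while your instance is busy, or an await that climbs while your own load is steady, is the kind of signal worth tracking over time rather than judging from a single sample.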
Does anyone understand the point they are making in #9 about VPC?<p>Are they suggesting using HAProxy in your public subnet and ELBs in your private subnets? Is this what they mean by avoiding a NAT box (actually PAT)? Don't you still need an HAProxy box in each of your AZs?
One more nice post about this topic: <a href="http://www.elekslabs.com/2013/11/aws-10-things-youre-probably-doing.html" rel="nofollow">http://www.elekslabs.com/2013/11/aws-10-things-youre-probabl...</a>
It looks like this post was partly inspired by a recent presentation I gave:
<a href="http://www.slideshare.net/simone.brunozzi/5-thingsyoudontknowaboutaws" rel="nofollow">http://www.slideshare.net/simone.brunozzi/5-thingsyoudontkno...</a>