*Mindless rambling ahead; I love this topic*

Richard,

AWS is an amazing tool and you have a few options here, but the downside is that the more options you use to be highly available (HA), the more expensive AWS gets (as you would imagine).

Your first option is to be HA across a SINGLE region; to do this you make use of Elastic Load Balancers (ELB) + auto-scaling. You set up auto-scaling rules to launch more instances in different availability zones (AZs), either in response to demand or in response to failures (e.g. "always keep at least 3 instances running").

You complement that with an ELB to load-balance incoming requests automatically across those instances in the different AZs. This is all fairly straightforward through the web console (except auto-scaling, which is still done via the CLI for some reason). There is a sketch of this setup below.

If you want to be HA ACROSS regions you can't just use ELBs anymore; you have some added complexity and an additional AWS feature you will likely want to use: Route 53.

Route 53 is Amazon's DNS service, and it offers a lot of slick DNS features like removing dead endpoints from DNS rotation, latency-based routing, etc. There are also something like 29 deployments of Route 53 (and CloudFront) around the globe, so you'll hopefully never have Route 53 become a point of failure for you even if disaster strikes.

In this scenario you would set up the HA configuration for a single region as mentioned above, but you would do it in multiple regions. Put another way: 2+ servers in multiple AZs in *each* AWS region. Then a Route 53 DNS configuration set up to point to each ELB in each region, representing those individual pockets of servers.

On top of that you would use Route 53 to manage all routing of client requests into your entire domain; you can leverage the new latency-based routing (effectively what everyone was asking GeoDNS for all those years, but even better) and its health-monitoring capability to ensure you aren't routing anyone to a dead region.
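To make the single-region setup concrete, here is a minimal sketch in Python with boto3 (the console/CLI route described above does the same operations); every name in it -- the AMI, the ELB name, the group name -- is a hypothetical placeholder:

    # Minimal single-region HA sketch: an auto-scaling group spanning
    # three AZs, fronted by a (classic) ELB. All names are hypothetical.
    import boto3

    REGION = "us-east-1"
    ZONES = ["us-east-1a", "us-east-1b", "us-east-1c"]

    elb = boto3.client("elb", region_name=REGION)
    asg = boto3.client("autoscaling", region_name=REGION)

    # The load balancer that will spread requests across the AZs.
    elb.create_load_balancer(
        LoadBalancerName="web-elb",
        Listeners=[{"Protocol": "HTTP",
                    "LoadBalancerPort": 80,
                    "InstancePort": 80}],
        AvailabilityZones=ZONES,
    )

    # What to launch, plus the "always keep at least 3 running" rule.
    asg.create_launch_configuration(
        LaunchConfigurationName="web-lc",
        ImageId="ami-12345678",      # hypothetical AMI
        InstanceType="m1.small",
    )
    asg.create_auto_scaling_group(
        AutoScalingGroupName="web-asg",
        LaunchConfigurationName="web-lc",
        MinSize=3, MaxSize=6, DesiredCapacity=3,
        AvailabilityZones=ZONES,
        LoadBalancerNames=["web-elb"],  # new instances register here
        HealthCheckType="ELB",          # replace instances the ELB flags dead
        HealthCheckGracePeriod=300,
    )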
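The Route 53 piece of the multi-region picture is just a set of latency-based alias records, one per regional ELB, all answering for the same name. A sketch, again with hypothetical zone IDs and DNS names:

    # Latency-based routing: Route 53 answers with whichever regional
    # ELB is closest to the caller, and EvaluateTargetHealth keeps dead
    # regions out of rotation. All identifiers here are hypothetical.
    import boto3

    route53 = boto3.client("route53")

    def latency_alias(region, elb_dns, elb_zone_id):
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "www.example.com.",
                "Type": "A",
                "SetIdentifier": region,   # must be unique per record
                "Region": region,          # this makes it latency-based
                "AliasTarget": {
                    "HostedZoneId": elb_zone_id,
                    "DNSName": elb_dns,
                    "EvaluateTargetHealth": True,
                },
            },
        }

    route53.change_resource_record_sets(
        HostedZoneId="Z123EXAMPLE",  # your domain's hosted zone
        ChangeBatch={"Changes": [
            latency_alias("us-east-1", "east-elb.elb.amazonaws.com", "ZELBEAST"),
            latency_alias("us-west-2", "west-elb.elb.amazonaws.com", "ZELBWEST"),
        ]},
    )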
SIMPLIFICATION
--------------

Here is what I would recommend given the size of your budget and the need to stay up in the AWS cloud, in order of expense:

1. Launch a single instance in a region with acceptable latency that has never had an outage before (e.g. Oregon has never completely gone down but Virginia has -- yes, yes, I know VA is older, but you understand my point). This solution will be cheaper than multiple instances in *any* region.

2. Launch multiple instances using the web console, in multiple AZs in US-EAST (the cheapest option for multiple instances), and front them with an ELB. You skip any auto-scaling complexity here, but you need to keep an eye on your servers. I think ELB fixed the issue where it would effectively route traffic into the void if all the instances in an AZ went down.

OPTIONAL: If you didn't mind spending a few $ more, you could do this strategy in the region that has never gone down for added peace of mind.

3. Launch single instances in multiple REGIONS and front them with Route 53. This isn't really a recommended setup, as entire regions will disappear if you lose a single instance, BUT I said I would list possibilities in order of price, so there you go. You could mitigate this by setting up auto-scaling policies to replace any dead instances quickly, in the off chance you wanted to do exactly this but not babysit the web console all day.

4. Launch multiple instances in each region, across multiple AZs, fronted by ELBs, and then the entire collection fronted by Route 53.

NOTE: The real cost comes from the additional instances and not from Route 53 or the ELB; so if you can use smaller instances to help keep costs down (or reserved instances, also), that might allow you to afford a larger HA setup.

What about my data?
-------------------

Yes, yes... this is an issue that someone already touched on (data locality, below).

You will have to decide on a single region to hold your data; in this case I would recommend using DB services that aren't based on EC2 and have never (or rarely) experienced outages -- this includes S3, SimpleDB and/or DynamoDB. AWS's MySQL offering (RDS) is just custom EC2 instances with MySQL running on them, so any time EC2 goes down, RDS goes down.

The other DB offerings are all custom and, except for SimpleDB a long time ago, have never experienced outages that I am aware of.

Making this choice is all about latency and which DB store you are comfortable with (obviously don't choose SimpleDB if everything you do requires MySQL -- then use RDS); you'll want your data as close to your web tier as possible, so if you are spread across all regions you'll just want to pick the region with the smallest latency to MOST of your customers (typically West coast if you have a lot of Asia/Australia customers, East coast if you have a lot of European customers).

Want to Go to 11?
-----------------

If you have the money and desperately want to go to 11 with this regional scale (which I love to do, so I am sharing this), you can combine services like DynamoDB and SQS to effectively create a globally distributed NoSQL datastore, with behavior along the lines of the following (sketched in code below):

1. A write operation comes into a region; immediately write it to the local DynamoDB instance, asynchronously queue the write command in SQS, and return to the caller.

2. In 1+ additional EC2 instances running daemons, pull messages from SQS in chunk sizes that make sense and re-play them out to the other regions' DynamoDB stores; erase the messages when processed, or if the processing fails, the next daemon to spin up will replay them.

3. On reads, just hit the local DynamoDB in any region and reply; we trust our reconciliation threads to do the work to keep us all in sync *eventually*.

NOTE: If you prefer to do read-repairs here you can, but it will increase complexity and inter-region communication, which all costs money.
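A sketch of all three behaviors with boto3; the table name, queue URL and region list are hypothetical, items are assumed to be plain JSON-serializable dicts, and conflict resolution (the genuinely hard part) is deliberately left out:

    # Globally distributed DynamoDB sketch: local writes, SQS-queued
    # replication to other regions, local reads. Names are hypothetical.
    import json
    import boto3

    HOME = "us-east-1"
    OTHER_REGIONS = ["us-west-2", "eu-west-1"]
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/replication"

    local_table = boto3.resource("dynamodb", region_name=HOME).Table("users")
    sqs = boto3.client("sqs", region_name=HOME)

    def write(item):
        # 1. Write locally, queue the command for other regions, return fast.
        local_table.put_item(Item=item)
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(item))

    def replay_daemon():
        # 2. Pull messages in chunks and re-play them to the other regions'
        #    tables; only delete a message once every region has the write,
        #    so a crashed daemon's work gets re-played by the next one.
        remote_tables = [boto3.resource("dynamodb", region_name=r).Table("users")
                         for r in OTHER_REGIONS]
        while True:
            resp = sqs.receive_message(QueueUrl=QUEUE_URL,
                                       MaxNumberOfMessages=10,
                                       WaitTimeSeconds=20)
            for msg in resp.get("Messages", []):
                item = json.loads(msg["Body"])
                for table in remote_tables:
                    table.put_item(Item=item)
                sqs.delete_message(QueueUrl=QUEUE_URL,
                                   ReceiptHandle=msg["ReceiptHandle"])

    def read(key):
        # 3. Reads are local-only; the daemons keep regions in sync eventually.
        return local_table.get_item(Key=key).get("Item")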
The challenge with this approach is that you pull a lot of DB concerns up into your code: conflict resolution, re-syncing entire regions after failure, bringing new regions online and ensuring they are synchronized, diffs, etc.

There is a reason AWS doesn't offer a globally distributed data store: it is a really hard problem to get right once you make it past the 80% use case.

Your data will determine if this is an option or not; some data allows for certain amounts of inconsistency, in which case this strategy is awesome and works great, while other data (e.g. banking data) cannot allow a single wiggle of inconsistency, in which case pulling all this DB logic up into the application is a bad idea. Your failure scenarios become catastrophic (e.g. your conflict-resolution logic is wrong and wipes out the balance from an account, or keeps re-filling the balance on an empty account... something bad, basically).

It is all a trade-off, though; if you managed your own Cassandra cluster, Cassandra does all this and much more for you automatically, but then you just put your time into Cassandra administration instead of developing the logic around DynamoDB (or SimpleDB, or MySQL, or whatever); just pick which devil you feel more comfortable with.

I am not aware of a services company that offers cross-region AWS datastore deployments yet; DataStax and Iris Couch will set something like that up for you via a consulting/custom arrangement, but there isn't a dashboard for launching it automatically.

Hope that helped (and didn't bring you to tears of boredom).