Poor man's architecture for mitigating outages: avoid us-east-1 (N. Virginia). It is unequivocally the fail-whale of AWS regions. Every service there seems to have abnormally high usage numbers and an absurd amount of scale to deal with. As James Hamilton would like to remind us, at such scale, even rare events are frequent [0]. The curse of being the default region, I suppose?<p>us-east-2 (Ohio) and eu-west-1 (Dublin) are my go-to regions. Prices are the same, and most new services (and new features) are almost always ready to go on launch day.<p>[0] <a href="https://perspectives.mvdirona.com/2017/04/at-scale-rare-events-arent-rare/" rel="nofollow">https://perspectives.mvdirona.com/2017/04/at-scale-rare-even...</a>
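To put rough numbers on the "rare events are frequent at scale" point, here is a small back-of-the-envelope sketch. The per-request failure probability and the request volumes are made-up illustrative values, not measurements of any AWS region, and the function names are hypothetical.

```python
import math


def expected_daily_events(per_request_probability: float, requests_per_day: int) -> float:
    """Expected number of rare events per day, assuming independent requests."""
    return per_request_probability * requests_per_day


def prob_at_least_one(per_request_probability: float, requests_per_day: int) -> float:
    """Probability of seeing the rare event at least once in a day.

    Computes 1 - (1 - p)^n via log1p/expm1 so that tiny p stays accurate.
    """
    return -math.expm1(requests_per_day * math.log1p(-per_request_probability))


# Hypothetical "one in a billion" failure mode.
P_RARE = 1e-9

# A small service (a million requests a day) versus a very large one (a trillion).
small = expected_daily_events(P_RARE, 10**6)
large = expected_daily_events(P_RARE, 10**12)
```

Under these assumed numbers the small service expects the event about once every few years, while the large one expects it roughly a thousand times a day, which is the intuition behind treating the busiest region as the one where odd failures show up first.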
This is kind of... fluffy for an HN post. It makes no mention of distributing your application between availability zones as a first step, swats away at the issues involved in any of the suggestions, then ends with “just configure what you need to configure for RDS and S3 replication and restore the rest by hand from backup.”
As an SRE, I accept that different regions will fail every now and then. So the big question is how to mitigate those failures.<p>I think replicating a database to 2 regions and load balancing all traffic across both regions has a big impact on performance and read-after-write consistency.
The solution is what the last option suggests: the second region is prepared as a fallback only, not as a live region. This way, you accept that you will still have some downtime, and maybe some data loss too, whenever a failover takes place. But this is much better than going down for hours, and much better than running 2 regions in Active-Active mode, where your system will suffer from performance and data consistency problems.
So accept that you will have failures and work toward the better solution. The perfect solution does not exist. There is no 100%.
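As a rough illustration of the active-passive idea described above, here is a minimal sketch of a failover decision: traffic stays on the primary region until a number of consecutive health checks fail, then moves to the standby and stays there until someone resets it by hand. The class name, the threshold of 3, and the region labels are all assumptions made for the example, not part of any AWS API, and the health check itself is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class FailoverController:
    """Decides which region should receive traffic (hypothetical helper)."""

    primary: str
    standby: str
    failure_threshold: int = 3      # assumed value; tune to your tolerance for flapping
    consecutive_failures: int = 0
    failed_over: bool = False

    def record_health_check(self, primary_healthy: bool) -> str:
        """Record one health-check result and return the region to route to."""
        if self.failed_over:
            # Failover is sticky: failing back is a deliberate, manual step,
            # since the primary may need its data reconciled first.
            return self.standby

        if primary_healthy:
            self.consecutive_failures = 0
            return self.primary

        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.failed_over = True
            return self.standby
        return self.primary

    def fail_back(self) -> None:
        """Manually return traffic to the primary once it is known to be good."""
        self.failed_over = False
        self.consecutive_failures = 0
```

Requiring several consecutive failures trades a little extra downtime for fewer false failovers, which matches the comment's point that some downtime is accepted in exchange for avoiding the consistency problems of Active-Active.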
This architecture is flawed: your DNS does not seem very resilient. Also, good luck with DB replication across regions and the nasty side effects you can get with out-of-sync data.<p>My advice is: have a well-designed architecture in a single region with multiple AZs and you will cover most problems.