AppEngine had the same problems: seemingly every week some component of the service would be down for a non-negligible amount of time (laughably, it was often search, and we're talking about Google here).<p>I've generally found AWS more reliable than GCP. Even when GCP isn't having downtime, you'll occasionally get 503s from their APIs, so you need to wrap all your calls to them in retries.<p>AWS has had multiple cascading EBS backplane failures, but outside of that I've found their core services pretty reliable: 400+ days of uptime on a lot of VMs in systems I've worked on. I avoid EBS when I can.<p>My advice is to keep your stuff simple. PaaS might seem attractive but, as you mention, you have very little control when something goes down. Embrace multi-cloud by using the lowest common denominator of tech available: virtual machines, DNS, networking, and instance storage if that suits your needs. Treat VMs as disposable, and make sure you have system, service, and data redundancy at that level so your whole application can survive the failure of an entire availability zone.
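For the "wrap all your calls in retries" part, here's roughly what I mean. This is a minimal sketch, not anything from a GCP client library: `call_with_retries` and `request_fn` are hypothetical names, and I'm assuming the call hands back a `(status, body)` pair and that only 503 counts as transient. It backs off exponentially with jitter so a fleet of clients doesn't retry in lockstep.

```python
import random
import time

# Assumption: only 503 is treated as a transient, retryable status here.
RETRYABLE_STATUSES = {503}


def call_with_retries(request_fn, max_attempts=5, base_delay=0.5,
                      max_delay=8.0, sleep=time.sleep):
    """Call request_fn() until it returns a non-retryable status.

    request_fn is a hypothetical zero-argument callable that returns a
    (status, body) tuple. The last response is returned even if it is
    still a 503, so the caller decides what to do after giving up.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        status, body = request_fn()
        if status not in RETRYABLE_STATUSES or attempt == max_attempts:
            return status, body
        # Exponential backoff capped at max_delay, with full jitter.
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, delay))
```

You'd use it as `status, body = call_with_retries(lambda: my_api_call(args))`, where `my_api_call` stands in for whatever client call you're making. Only do this for calls that are safe to repeat; retrying a non-idempotent write can duplicate it.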