Are there any dependency graphs for cloud services? This seems like pretty important info for determining how a system will degrade. For example, if Google Cloud Storage is down, will Google Artifact Registry go down? And if GAR is down, will Cloud Build be down? I have heard that there are "tiers" of services, with services in a given tier only using services from lower tiers, but I haven't been able to find any info on this, not even unofficial blog posts or third-party analysis. Does anyone know where I can find this info? Or have tips for figuring it out myself?
Forseti is an open-source project to build dependency graphs of your usage of GCP services. It's primarily designed for security but can be applied to other areas. You could extend the model to understand relationships between services and SLAs, but this would be limited to how you design and run services on top of GCP.<p><a href="https://forsetisecurity.org/docs/latest/concepts/" rel="nofollow">https://forsetisecurity.org/docs/latest/concepts/</a><p>How the GCP services relate to each other under the hood is complex and definitely not available to the public. There is also a huge amount of nuance between GCP services relying on underlying Google services vs. other GCP services. Container Registry and Artifact Registry may both depend on the same underlying storage service, which isn't necessarily GCS but could be an internal Google storage service. How this is specifically managed, partitioned and run is very hard to work out from the outside. Failure modes and scenarios are well designed and understood internally, but not shared publicly.<p>If you have a very specific use case, you could approach your Google Cloud TAM/sales/customer engineer with your questions, and they will be able to help you understand.<p>Source: Former Customer Engineer in Google Cloud for 4 years
I'm not sure even the cloud providers themselves could give you this information.<p>In Nov 2020, Amazon Kinesis Data Streams went down for a few hours and took down a slew of other services that depended on each other (CloudWatch depends on Kinesis, EC2 Auto Scaling and Lambda depend on CloudWatch, everyone depends on EC2 and Lambda...)<p>It came as quite a surprise internally that a "small" component like Kinesis Data Streams could take down so much.<p><a href="https://aws.amazon.com/message/11201/" rel="nofollow">https://aws.amazon.com/message/11201/</a>
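To make the cascade above concrete, here is a minimal sketch of how you could compute the "blast radius" of one failing service once you have an edge list. The edges are only the ones named in the comment above, "everyone else" is a hypothetical placeholder node for all downstream services, and the function name `blast_radius` is made up for illustration. It is not a real map of AWS internals.

```python
from collections import defaultdict, deque

# "service -> services it depends on", taken from the comment above.
# "everyone else" is a hypothetical stand-in for all downstream services.
DEPENDS_ON = {
    "CloudWatch": ["Kinesis"],
    "EC2 Auto Scaling": ["CloudWatch"],
    "Lambda": ["CloudWatch"],
    "everyone else": ["EC2 Auto Scaling", "Lambda"],
}


def blast_radius(depends_on, failed):
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the graph so we can walk from a dependency to its dependents.
    dependents = defaultdict(set)
    for service, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(service)

    # Breadth-first walk over the inverted edges.
    affected = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents[current]:
            if service not in affected:
                affected.add(service)
                queue.append(service)
    return affected


if __name__ == "__main__":
    print(sorted(blast_radius(DEPENDS_ON, "Kinesis")))
```

With these edges, a Kinesis failure reaches every other node in the graph, while a Lambda failure only reaches the placeholder downstream node. The hard part in practice is collecting the edges, not the traversal.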
For GCP you can enumerate and graph the publicly visible dependencies as per this blog post:<p><a href="https://binx.io/blog/2020/10/03/how-to-find-google-cloud-platform-services-dependencies/" rel="nofollow">https://binx.io/blog/2020/10/03/how-to-find-google-cloud-pla...</a><p>However, that does not take account of GCP services being implemented behind the scenes using other GCP technologies in Google-managed projects - e.g. Cloud SQL uses Compute Engine and GCR (search "speckle umbrella"). Cloud Functions relies on Cloud Build to compile the function into a container. AI Platform Training uses a GKE cluster internally.<p>You can often get hints about these things from the VPC-SC documentation, which explains on a per-service basis which APIs need to be enabled to protect the perimeter:<p><a href="https://cloud.google.com/vpc-service-controls/docs/supported-products" rel="nofollow">https://cloud.google.com/vpc-service-controls/docs/supported...</a>
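If you collect hints like these by hand, you can derive the "tiers" the original question asks about. Below is a rough sketch that assumes an edge list you maintain yourself: the edges are only the examples from the comment above, and the `tiers` function is hypothetical. A service with no known dependencies gets tier 0, and every other service sits one tier above its highest dependency. It uses the standard library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# "service -> services it depends on", using only the examples from the
# comment above. This is a hand-collected, incomplete edge list.
DEPENDS_ON = {
    "Cloud SQL": ["Compute Engine", "GCR"],
    "Cloud Functions": ["Cloud Build"],
    "AI Platform Training": ["GKE"],
}


def tiers(depends_on):
    """Map each service to a tier: 0 if it has no known dependencies,
    otherwise 1 + the highest tier among its dependencies."""
    # static_order() yields dependencies before their dependents and
    # raises graphlib.CycleError if the edge list contains a cycle.
    order = TopologicalSorter(depends_on).static_order()
    result = {}
    for service in order:
        deps = depends_on.get(service, [])
        result[service] = 1 + max(result[d] for d in deps) if deps else 0
    return result


if __name__ == "__main__":
    for service, tier in sorted(tiers(DEPENDS_ON).items(), key=lambda kv: kv[1]):
        print(tier, service)
```

The tiers are only as good as the edges you feed in: a dependency that is hidden in a Google-managed project and never documented will not show up, so treat the result as a lower bound.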
You will never get this information, even with an NDA arranged through your account manager at a cloud provider. And these dependencies come as a surprise: I would never have imagined that Kinesis could take so many services down with it. Facebook, likewise, disappeared from the internet because of a BGP misconfiguration. My personal experience is that most AWS outages affect one service in one region, whereas GCP has more global outages. But AWS has had its fair share of global outages too, like Kinesis and, before that, S3.
Microsoft Azure has a feature called Application Map (part of Azure Monitor) that does this for services running inside Azure: <a href="https://docs.microsoft.com/en-us/azure/azure-monitor/app/app-map" rel="nofollow">https://docs.microsoft.com/en-us/azure/azure-monitor/app/app...</a>
This information is confidential and only provided under NDA by cloud service providers. In the past, my employer was provided with detailed service dependency documents upon request. But unless you’re a major customer spending millions of dollars every month, you likely can’t request this information.
This is pure, blatant self-promotion on my end, but I think we built, at least directionally, what you’re asking for:<p><a href="https://github.com/someengineering/cloudkeeper" rel="nofollow">https://github.com/someengineering/cloudkeeper</a><p>I’ll reply in more depth later, since I’m on the run right now, but for now I hope the link is sufficient.