Great story, and very recognizable.<p>Also, many thanks for mentioning Checkly[0]. I can totally see why you guys opted for a more APM-driven solution. In the end, you probably want both white-box and black-box monitoring.<p>Disclaimer: I am the founder of Checkly.<p>0: <a href="https://checklyhq.com" rel="nofollow">https://checklyhq.com</a>
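To make the black-box half concrete, here is a minimal sketch of a synthetic check. It assumes you point it at a real customer-facing URL, not a dedicated health endpoint, and that you know some text a correct response must contain. `synthetic_check`, `url`, and `expected_substring` are hypothetical names for illustration and are not Checkly's API. It uses only the Python standard library.

```python
import time
import urllib.error
import urllib.request


def synthetic_check(url, expected_substring, timeout=5.0):
    """Black-box check (hypothetical helper): fetch a customer-facing URL from
    the outside and judge it by status code, body content, and latency."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; record the code, treat as failure.
        status, body = err.code, ""
    except OSError as err:
        # Connection refused, DNS failure, timeouts (URLError is an OSError subclass).
        return {"ok": False, "status": None,
                "latency_s": time.monotonic() - start, "error": str(err)}
    latency = time.monotonic() - start
    # A 2xx alone is not enough: the body must also look like a real answer.
    ok = 200 <= status < 300 and expected_substring in body
    return {"ok": ok, "status": status, "latency_s": latency, "error": None}
```

Asserting on the body as well as the status is the design choice that matters here. It catches deploys that return a 200 with the wrong payload, which a bare "is it up?" ping would miss.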
Monitoring synthetic endpoints will only tell you whether your service is online and available. What if you deploy a change that breaks the endpoints your customers rely on, or some deep dependency (a third-party API, a database, whatever) starts failing?<p>You really need to look into distributed tracing and instrument the endpoints your customers rely on, measuring their response times, status codes, and availability.<p>I work for an APM company, and that's what we do at instana.com. What you've built here is cool, and it's inspiring, but it won't help anybody if your customers are getting 500s from their calls while your "health check" is sending back 200s.
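Here is a minimal sketch of the white-box side: summarizing per-route availability and latency from request records. It assumes your tracing or APM instrumentation already emits one record per real customer request with a route, status code, and latency. `RequestRecord` and `summarize` are hypothetical names and are not Instana's API. "Availability" here means the share of non-5xx responses, and p95 uses the nearest-rank method.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """One observed request (hypothetical shape of instrumentation output)."""
    route: str
    status: int
    latency_ms: float


def summarize(records):
    """Per-route request count, availability (share of non-5xx), and p95 latency."""
    by_route = {}
    for rec in records:
        by_route.setdefault(rec.route, []).append(rec)

    summary = {}
    for route, recs in by_route.items():
        good = sum(1 for r in recs if r.status < 500)
        latencies = sorted(r.latency_ms for r in recs)
        # Nearest-rank percentile: 1-based rank ceil(p * N), shifted to a 0-based index.
        idx = max(0, math.ceil(0.95 * len(latencies)) - 1)
        summary[route] = {
            "requests": len(recs),
            "availability": good / len(recs),
            "p95_ms": latencies[idx],
        }
    return summary


# The health check looks perfect while a customer-facing route is failing.
records = (
    [RequestRecord("/health", 200, 2.0)] * 10
    + [RequestRecord("/api/orders", 200, 80.0)] * 7
    + [RequestRecord("/api/orders", 500, 120.0)] * 3
)
report = summarize(records)
```

With this sample data, `/health` reports 100% availability while `/api/orders` reports 70%. That gap is exactly the one described above.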
If you are exposing an API that is going to "double your revenue" as the article says, why wouldn't you use something like the AWS API Gateway [0] or Google Cloud Endpoints [1]? What were they doing here, running a paid service through an nginx reverse proxy?<p>[0] - <a href="https://aws.amazon.com/api-gateway/" rel="nofollow">https://aws.amazon.com/api-gateway/</a><p>[1] - <a href="https://cloud.google.com/endpoints/" rel="nofollow">https://cloud.google.com/endpoints/</a>
Moesif might be worth a look too:<p><a href="https://www.moesif.com/features/api-monitoring" rel="nofollow">https://www.moesif.com/features/api-monitoring</a><p>(Disclaimer: I blog for them)
I'm a huge fan of Datadog for this sort of thing. I'm not affiliated with them in any way, but they can do all the Grafana / Logstash / Splunk stuff and are making lots of progress in the APM space with OpenAPM.