I'm researching the best ways to collect the execution duration of container and serverless workloads for billing and chargeback use cases, at per-second granularity, similar to AWS CPU-Hours. I'm finding that most engineers start by listening to start/stop events and calculating the delta between those lifecycle events. Over time, though, they find it challenging to handle lost events and long-running processes. It's also not always fair to customers when a container takes a long time to start (cold start) or stop, because that time ends up on their bill.<p>So far, I have heard of two primary approaches: (1) lifecycle events such as start/stop, and (2) heartbeat-style pinging.
I summarized them in this blog post: https://openmeter.io/blog/how-to-meter-workload-execution-duration<p>I would love to iterate on the post based on your thoughts and include other approaches, if there are any.
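To make the comparison concrete, here is a rough Python sketch of how I understand the two approaches. It is only an illustration, not what the blog post or any particular product implements: the function names, the event tuple shape, and the policy of billing a missing stop event up to "now" are all assumptions of mine.

```python
from datetime import datetime, timedelta, timezone

def duration_from_lifecycle(events, now):
    """Approach 1: sum (stop - start) deltas over (timestamp, kind) events.

    Assumption: a start with no matching stop is billed up to `now`,
    which is exactly where a lost stop event causes over-billing.
    """
    total = 0.0
    started = None
    for ts, kind in sorted(events):
        if kind == "start" and started is None:
            started = ts
        elif kind == "stop" and started is not None:
            total += (ts - started).total_seconds()
            started = None
    if started is not None:
        total += (now - started).total_seconds()
    return total

def duration_from_heartbeats(heartbeats, interval_s):
    """Approach 2: every distinct heartbeat window counts as interval_s seconds.

    Bucketing by window makes duplicate deliveries harmless, and a lost
    heartbeat under-bills by at most one interval per lost ping.
    """
    windows = {int(ts.timestamp()) // interval_s for ts in heartbeats}
    return len(windows) * interval_s

t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)

# A 90-second run measured both ways.
events = [(t0, "start"), (t0 + timedelta(seconds=90), "stop")]
beats = [t0 + timedelta(seconds=10 * i) for i in range(9)]
print(duration_from_lifecycle(events, now=t0 + timedelta(hours=1)))  # 90.0
print(duration_from_heartbeats(beats, interval_s=10))                # 90

# Same run, but the stop event is lost: billed for the full hour.
print(duration_from_lifecycle(events[:1], now=t0 + timedelta(hours=1)))  # 3600.0
```

The trade-off I see in this sketch: with lifecycle events, one lost stop event can inflate the bill without bound, while with heartbeats the error from a lost ping is capped at one interval, at the cost of sending many more events.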
How do you meter container/serverless execution duration for billing?