The costs of observability/monitoring are primarily the labor to add and maintain the tools, plus the standard costs of running them.<p>While the costs are fairly well defined, the benefits of these tools are much less clear.<p>I'm currently building an open source continuous profiling library (https://github.com/pyroscope-io/pyroscope) and trying to understand how people justify adding another observability/monitoring tool to their workflow.<p>For some, it seems there is a quantitative cost-benefit analysis: any tool that produces a net benefit gets added to the system.<p>For others, the justification is less quantifiable: some companies simply err on the side of adding anything that produces a "better" understanding of the underlying infrastructure.<p>How do you justify adding or not adding tools (e.g. logs, tracing, profiling, metrics) that give you more visibility into your systems? And how do you decide how much you're willing to pay when weighing the costs against the benefits?
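To make the "quantitative" version of the question concrete, here is a rough sketch of what such a cost-benefit calculation might look like. Every name and input here is a hypothetical placeholder (nothing comes from a real tool or real data), and it assumes the benefit can be approximated as engineer hours saved per incident, which is itself a big simplification.

```python
def net_annual_benefit(
    incidents_per_year: float,
    hours_saved_per_incident: float,
    hourly_labor_cost: float,
    tool_cost_per_year: float,
    upkeep_hours_per_year: float,
) -> float:
    """Estimated yearly benefit minus yearly cost of one monitoring tool.

    All inputs are assumptions the team has to estimate for itself.
    """
    # Benefit: debugging time the tool is expected to save, priced as labor.
    benefit = incidents_per_year * hours_saved_per_incident * hourly_labor_cost
    # Cost: the running cost plus the labor to add and maintain the tool.
    cost = tool_cost_per_year + upkeep_hours_per_year * hourly_labor_cost
    return benefit - cost


def should_adopt(net_benefit: float) -> bool:
    """The 'add anything with a net benefit' rule described above."""
    return net_benefit > 0
```

The hard part is obviously not the arithmetic but estimating `hours_saved_per_incident`, which is exactly the benefit that is difficult to pin down.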
The importance of monitoring depends on your needs.<p>Do you need to know if your website or service goes down?<p>If your website or service went down, how soon would you want to know?<p>Do you need to know if your server's CPU has been running at 90%+ for more than 10 minutes?<p>Do you need to know if your server's disk usage has passed 90%?<p>The right monitoring tools check for these kinds of conditions and can alert you as soon as problems occur, or when they are about to occur.<p>Whether they are worth the cost depends on your needs.
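As an illustration of the kind of conditions listed above, here is a minimal sketch of two checks: "CPU at 90%+ for more than 10 minutes" and "disk usage past 90%". The `Sample` type and both function names are made up for this example and are not taken from any particular monitoring tool; a real tool would also handle collecting the samples and sending the alert.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Sample:
    timestamp: float  # seconds since some reference point
    value: float      # measured percentage, 0-100


def sustained_breach(
    samples: Iterable[Sample], threshold: float, min_duration_s: float
) -> bool:
    """True if value stays >= threshold continuously for min_duration_s.

    Assumes samples are ordered by timestamp.
    """
    breach_start: Optional[float] = None
    for s in samples:
        if s.value >= threshold:
            if breach_start is None:
                breach_start = s.timestamp  # start of the current breach
            if s.timestamp - breach_start >= min_duration_s:
                return True
        else:
            breach_start = None  # dropped below threshold, reset the timer
    return False


def disk_usage_exceeds(
    used_bytes: int, total_bytes: int, threshold_pct: float = 90.0
) -> bool:
    """True if used space is above threshold_pct of total space."""
    if total_bytes <= 0:
        return False
    return 100.0 * used_bytes / total_bytes > threshold_pct
```

For the disk check, the standard library's `shutil.disk_usage(path)` returns `total` and `used` values that could be passed straight in, e.g. `u = shutil.disk_usage("/")` followed by `disk_usage_exceeds(u.used, u.total)`.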