My company uses a central monitoring tool (Datadog), and the rule is that everything should be monitored there.

However, the landscape I'm responsible for has its own monitoring tool, which comes with a lot of out-of-the-box settings and advanced options, including dedicated applications for different kinds of monitoring (health status, performance, end-to-end, etc.). It also offers an OpenTelemetry API that the central monitoring tool can call to fetch all the data.

If I use the dedicated monitoring tool, I benefit from a lot of options such as filtering, analytics, drill-down, seamless navigation, almost-ready E2E monitoring, etc. I only have to configure some thresholds and decide whether they trigger an alert, create a ticket, or start some automation. But I'm told that everything should be visible in the central tool, and that only the central tool may create incident tickets and alerts for 1st line support.

If the central monitoring tool is to be used, then I basically have to manually replicate configuration/code to process the OpenTelemetry data. I also lose a lot of flexibility, because I'm not the owner of the tool, and the team responsible for it doesn't understand my landscape and doesn't react quickly to the changes I need.

All in all, I would still use the dedicated tool to investigate issues, because it provides much more detailed information with near-zero effort. Therefore the only benefit of the central tool is that 1st line support would see the status in their dashboard and would understand the tickets they get a bit better, since they link ticket history and resolution outcomes to their monitors.

I don't want to go rogue in monitoring my landscape, and I also benefit from 1st line support having some awareness of the landscape. But apart from that, I would like to use the dedicated tool.

Do you have any ideas on how to better combine both options?

My first idea is to aggregate the monitoring for the central tool so that it is much less granular and only detects something like "There is an issue with Health Monitoring for the system XXX", while the dedicated tool provides the details, like "Certificate YYY of system XXX is going to expire in 15 days". However, I must stay granular enough to control priorities and ensure the alert is sent to the correct support team. This already forces me to rework things that I have readily available when setting thresholds in my dedicated tool.
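
To make the aggregation idea concrete, here is a minimal Python sketch of what I have in mind. It collapses detailed findings into one coarse alert per system and monitoring category, lets the most urgent finding decide the priority, and picks the support team from a routing table. Everything in it is an assumption for illustration: the `Finding` and `CentralAlert` types, the team names, the priority scale (1 = most urgent), and the two findings marked "Hypothetical". It does not call any real Datadog or OpenTelemetry API.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical routing table: monitoring category -> support team.
TEAM_BY_CATEGORY = {
    "Health Monitoring": "platform-ops",
    "Performance Monitoring": "performance-team",
}
DEFAULT_TEAM = "first-line-support"


@dataclass(frozen=True)
class Finding:
    """One detailed finding, as the dedicated tool would report it."""
    system: str
    category: str
    priority: int  # assumed scale: 1 = most urgent
    detail: str


@dataclass(frozen=True)
class CentralAlert:
    """The coarse alert that would be forwarded to the central tool."""
    system: str
    category: str
    priority: int
    team: str
    summary: str
    finding_count: int


def aggregate(findings):
    """Collapse findings into one alert per (system, category)."""
    groups = defaultdict(list)
    for f in findings:
        groups[(f.system, f.category)].append(f)

    alerts = []
    for (system, category), items in sorted(groups.items()):
        alerts.append(CentralAlert(
            system=system,
            category=category,
            # The most urgent finding decides the priority of the group.
            priority=min(f.priority for f in items),
            team=TEAM_BY_CATEGORY.get(category, DEFAULT_TEAM),
            summary=f"There is an issue with {category} for the system {system}",
            finding_count=len(items),
        ))
    return alerts


findings = [
    Finding("XXX", "Health Monitoring", 3,
            "Certificate YYY of system XXX is going to expire in 15 days"),
    Finding("XXX", "Health Monitoring", 1,
            "Hypothetical: health check failing"),
    Finding("XXX", "Performance Monitoring", 2,
            "Hypothetical: slow response time"),
]

alerts = aggregate(findings)
for alert in alerts:
    print(f"P{alert.priority} -> {alert.team}: {alert.summary} "
          f"({alert.finding_count} finding(s))")
```

The details stay in the dedicated tool; only the summary, priority, and team would reach the central tool. The catch is the same as above: the priority and routing rules still have to be maintained somewhere outside the dedicated tool's thresholds.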

1 comment

fhwang 11 months ago

How is the data getting into these tools in the first place? If your applications are instrumented with OpenTelemetry, you can use the OpenTelemetry Collector as the first hop, and then you might be able to send the data to both tools at the same time. https://opentelemetry.io/docs/collector/


Ask HN: Combine central monitoring with platform embedded monitoring
