It's kinda amazing that I've only now discovered how useful time series forecasting is for sysadmin/devops/SRE/whatever we call it now.<p>In effect, Icinga (Nagios) represents a local maximum. It's very easy to write checks without mathematical or statistical sophistication -- write a check in the language of your choice and return a status enum: OK, WARN, CRIT (or UNKNOWN). As a result, there's a massive collection of checks written by others. Sure, the language to configure it sucks, but mostly because it models so many things -- systems to be monitored, how to group systems, the checks to monitor systems with, who to alert, and on what schedule.<p>Instead of doing one thing well, Nagios does everything, kinda poorly. You can't easily schedule overrides for an on-call rotation. Event correlation is entirely manual. You have to restart the service to add or remove monitoring -- it's not prepared for autoscaling or clustering. It can't even scale itself!<p>There is a better way, using a layered approach. Break the task up into multiple steps: Sense, Store, Analyze, Alert, Route. Nagios effectively discards the Store step and divvies up the remaining work between the checks and the Nagios master: Sense+Analyze+Alert are done by the client, and the Nagios master handles alert Routing.<p>(Bringing it back to the topic.) Because Nagios has no Store step, Analyze is limited in power. Without history, a check cannot calculate standard deviations, or even rates of change. ARIMA is impossible, as is ETS. You typically define static thresholds. If there's any seasonality, you have to find a single static threshold that catches real problems while minimizing false alarms.<p>The downside to the layered model is that there are many different solutions, and every decision point leads to a more fragmented community. How many people out there run statsite+graphite+grafana+opsgenie? A good layered approach needs to fully isolate layers, so that any integration feature or problem is limited to pairs of adjacent layers. 
That's usually easy for features, but for bugs, it's almost definitionally impossible to prevent a problem in an earlier layer from causing symptoms one or two layers later.<p>tl;dr: I've mostly been learning about ETS (Holt-Winters) on the job, but would love to learn more about predicting customer traffic.
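<p>To make the Store/Analyze point concrete, here's a minimal sketch (not a production implementation) of what having history buys you: an additive Holt-Winters one-step forecast, with the size of the latest forecast error mapped back onto Nagios-style statuses. Everything specific here is an assumption for illustration: the function names `one_step_forecasts` and `check_latest`, the smoothing defaults, the 3/5 sigma cutoffs, and the crude initialization from the first two seasons. Real libraries fit these parameters instead of hard-coding them.

```python
from statistics import mean, pstdev


def one_step_forecasts(series, season_len, alpha=0.3, beta=0.05, gamma=0.2):
    """Additive Holt-Winters: one-step-ahead forecasts for series[season_len:].

    alpha, beta, gamma smooth the level, trend, and seasonal terms. The
    defaults are arbitrary placeholders, not tuned values.
    """
    if season_len < 1 or len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons of history")
    first = series[:season_len]
    second = series[season_len:2 * season_len]
    level = mean(first)                                  # starting level
    trend = (mean(second) - mean(first)) / season_len    # per-step drift
    seasonal = [x - level for x in first]                # offsets within a season
    forecasts = []
    for t in range(season_len, len(series)):
        s = seasonal[t % season_len]
        forecasts.append(level + trend + s)              # predict before seeing series[t]
        actual, prev_level = series[t], level
        level = alpha * (actual - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % season_len] = gamma * (actual - level) + (1 - gamma) * s
    return forecasts


def check_latest(series, season_len, warn_sigma=3.0, crit_sigma=5.0,
                 min_sigma=1e-9):
    """Return OK/WARN/CRIT for the newest point in a stored series.

    The newest forecast error is compared against the spread of the
    earlier forecast errors. min_sigma only guards against dividing by
    zero when the history is perfectly predictable.
    """
    forecasts = one_step_forecasts(series, season_len)
    residuals = [a - f for a, f in zip(series[season_len:], forecasts)]
    history, latest = residuals[:-1], residuals[-1]
    if not history:
        raise ValueError("need at least one earlier residual")
    sigma = max(pstdev(history), min_sigma)
    score = abs(latest) / sigma
    if score >= crit_sigma:
        return "CRIT"
    if score >= warn_sigma:
        return "WARN"
    return "OK"
```

<p>Feed it whatever the Store layer hands back, e.g. `check_latest(hourly_requests, season_len=24)`, where `hourly_requests` is a hypothetical list of hourly counts covering at least two days. The threshold then follows the seasonal pattern instead of being one static number, which is exactly what a check with no history can't do.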