Hi everyone - I've been playing with Litmus, a new chaos engine from MayaData, to create incidents in a Kubernetes cluster for an autonomous monitoring solution to detect.<p>The purpose of this repository is to build a realistic app environment running multiple services on a Kubernetes cluster, then run a series of chaos experiments to see whether an autonomous monitoring solution (without any pre-configuration) can automatically detect the incidents those experiments cause.<p>For those of you wondering what autonomous monitoring is: the key difference from current monitoring tools is that instead of you telling the tool what to look for by setting up and maintaining a long list of alert rules, the tool figures out what to alert on using machine learning. You just send it your logs/metrics/traces, and it works out when an incident is occurring and what its root cause is, without you having to configure anything. This approach is becoming more important as environments become more distributed, complex and dynamic, which makes it harder to know what to alert on.<p>You can get it up and running with just 2 commands, which spin up the cluster and run all the chaos experiments end to end. It provides a good example of running chaos experiments in Kubernetes, and also demonstrates where state-of-the-art machine learning has got to in the monitoring space today!
Thanks for this. IMO it's pretty smart of Zebrium to reach out and embrace chaos from Litmus to show off and, I guess, educate your smart monitoring.