If someone with enough experience in ML/AI were to build an application that aims to 1) reduce the overall number of alerts, 2) figure out which alerts are actionable and which are not, and 3) predict potential issues and suggest remedies - would he/she succeed? The question is whether someone can actually achieve this with ML/AI in its current state today. If something like this were possible, is it safe to assume that engineering teams at Google, Amazon, Uber, AirBnB, DropBox, and NetFlix would have already implemented it?

I read about two different views on this subject today:

1. Why Machine Learning in Monitoring is BS: https://blog.opsee.com/machine-learning-in-monitoring-is-bs-134e362faee2

2. Why it is not BS: http://mabrek.github.io/blog/machine-learning-is-not-bs/

I was curious to find out what the good people on HN think about this.
It is a tough problem, no doubt, largely because of the severe class imbalance (genuine incidents are rare compared to normal behavior).

People who work on anomaly detection in finance (anti-fraud) usually look at it as a "characterize a normal transaction and reject anything that is far from the center" problem, as opposed to a "classify transactions as good or bad" problem.

Would someone succeed or fail? I think it could go either way. Are you talking about the general problem or the problem for some particular environment? (ex. Netflix certainly does not need to solve the general problem)
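To make the "characterize normal and reject outliers" framing concrete, here is a minimal one-dimensional sketch (the function names and the 3-sigma threshold are my own illustrative choices, not anyone's production method): fit a center and spread from normal observations only, then flag anything far from that center.

```python
import statistics

def fit_center(samples):
    """Characterize 'normal' by its center (mean) and spread (stdev)."""
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)
    return mu, sigma

def is_anomalous(value, mu, sigma, k=3.0):
    """Reject anything more than k standard deviations from the center."""
    return abs(value - mu) > k * sigma

# Amounts observed during normal operation -- no labeled fraud needed
normal = [20.0, 22.5, 19.8, 21.1, 20.4, 23.0, 18.9, 21.7]
mu, sigma = fit_center(normal)

print(is_anomalous(21.0, mu, sigma))   # close to center -> False
print(is_anomalous(500.0, mu, sigma))  # far from center -> True
```

The point of the one-class framing is that the model never needs to see a fraudulent (or incident-causing) example, which sidesteps the class-imbalance problem at training time.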
From what I know, it would work.

Google uses DNNs to optimize power consumption in their data centers: https://deepmind.com/blog/deepmind-ai-reduces-google-data-centre-cooling-bill-40/
Anomaly detection is one use case. Unsupervised learning methods can be put to good use analysing large logs to flag abnormal CPU usage, user activity, web traffic, etc.
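As a hedged sketch of what that could look like on a metrics stream (the rolling-window baseline and 3-sigma cutoff here are illustrative assumptions, not a recommendation): compare each new reading against a rolling baseline of recent values and flag sharp deviations.

```python
from collections import deque
import statistics

def stream_anomalies(values, window=5, k=3.0):
    """Flag indices whose value deviates sharply from a rolling baseline."""
    history = deque(maxlen=window)
    flagged = []
    for i, v in enumerate(values):
        if len(history) == window:
            mu = statistics.mean(history)
            sigma = statistics.pstdev(history) or 1e-9  # avoid zero division
            if abs(v - mu) > k * sigma:
                flagged.append(i)
        history.append(v)
    return flagged

# Simulated CPU-usage readings with one spike
cpu = [30, 31, 29, 32, 30, 31, 95, 30, 29, 31]
print(stream_anomalies(cpu))  # [6] -- the 95% spike
```

Real systems need more than this (seasonality, multi-metric correlation, alert deduplication), which is partly why the "is it BS?" debate exists; but the unsupervised core really is this simple in spirit.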