TechEcho
Show HN: Zebrium – ML that catches software incidents and shows you root cause

76 points by stochastimus almost 5 years ago

12 comments

csears almost 5 years ago
Congrats on the launch. Having worked at a startup in the AIOps space, I can offer a few suggestions.

1. No matter how good your AI, it will make mistakes. Users need ways to provide feedback or filtering to avoid bad alert fatigue. Giving users a sense of control is critical.

2. Most larger shops will have dozens of monitoring tools already generating alerts. Consider ingesting existing alerts as another algorithmic signal.

3. The real root cause of an incident often won't show up in logs. Don't assume that the earliest event in a cluster is causal.

4. The more context you can provide an operator looking at a potential incident, the better. Modern SIEM tools do an OK job here. Consider pulling in topology or other enrichment sources and matching entity names/IDs to log data.

Good luck. Contact in profile if you'd like to chat further.
stochastimus almost 5 years ago
Hey folks,

Larry, Ajay and Rod here!

We're excited to share Zebrium's autonomous incident detection software. Zebrium uses unsupervised machine learning to detect software incidents and show you root cause. It's built to catch even "Unknown Unknowns" (problems you don't have alert rules built for), the FIRST time you hit them. We believe autonomous incident detection is an important tool for defeating complexity and crushing resolution time.

*** Get Started ***

1) Go to our website and click "Get Started Free". Enter your name, email and set a password.
2) Install our collectors from a list of supported platforms. For K8s it's a one-command install. Join the newly created private Slack channel for alerts (or add a webhook for your own).
3) That's it. Automatic incident detection starts within an hour and quickly gets good. You can drill down into logs & metrics for more context if needed.

Getting started takes less than 2 minutes. It's free for 30 days with larger limits and then free forever for up to 500MB/day.

*** Here's what you WON'T have to do ***

Manual training, code changes, connectors, parsers, configuration, waiting, hunting, searching, alert rules, etc! It works with any app or stack.

*** How It Works ***

We structure all the logs and metrics we collect in-line at ingest, leverage this structure to find normal and anomalous patterns of activity, then use a point process model to identify unusually correlated anomalies (across different data streams) to auto-detect incidents and find the relevant root-cause indicators. Experience with over a thousand real-world incidents across over a hundred stacks has confirmed that software behaves in certain fundamental ways when it breaks.

It turns out that we can detect important incidents automatically, with a root-cause indicator when the logs and metrics reflect it. Zebrium works well enough that our own team relies on it, and we believe you'll want to use it, too.
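
[Editor's sketch] The "How It Works" description above (anomalies from several data streams, a point process model, unusually correlated bursts) can be illustrated with a toy example. This is not Zebrium's algorithm, which the post does not specify. It assumes the simplest possible point process, a homogeneous Poisson process with a known baseline anomaly rate, and flags fixed time windows whose anomaly count is improbably high and spans at least two streams. The names `poisson_tail`, `find_incidents`, `baseline_rate`, `window` and `alpha` are all hypothetical.

```python
import math
from collections import defaultdict


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def find_incidents(anomalies, baseline_rate, window=60.0, alpha=1e-3):
    """Flag time windows with an improbable burst of cross-stream anomalies.

    anomalies:     iterable of (timestamp_seconds, stream_name) pairs
    baseline_rate: assumed normal anomaly rate (events per second, all streams)
    window:        bucket width in seconds (illustrative default)
    alpha:         p-value threshold (illustrative default)
    """
    buckets = defaultdict(list)
    for ts, stream in anomalies:
        buckets[int(ts // window)].append((ts, stream))

    expected = baseline_rate * window  # Poisson mean per window
    incidents = []
    for idx in sorted(buckets):
        events = sorted(buckets[idx])
        streams = sorted({stream for _, stream in events})
        p_value = poisson_tail(len(events), expected)
        # Require both an unlikely count and more than one stream involved.
        if len(streams) >= 2 and p_value < alpha:
            incidents.append({
                "window_start": idx * window,
                "p_value": p_value,
                "streams": streams,
                # Earliest event is only a starting point for investigation,
                # not proof of root cause.
                "first_event": events[0],
            })
    return incidents
```

Fixed buckets can split a burst that straddles a window boundary; a sliding window or a self-exciting (Hawkes-style) model would be more faithful to a real point-process approach, at the cost of more code.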
lalaland1125 almost 5 years ago
One question:

Where is the systematic evidence that this product actually works? What are the general false positive and false negative rates in a standard setup? Did you construct various failed environments and measure the quality of the reports? For this sort of thing I would expect a simulation of at least 10-20 failure environments with detailed false positive/false negative rate measurements. Right now you have a lot of cherry-picked examples without any sort of systematic setup (in particular, you don't seem to talk about false positives anywhere).
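
[Editor's sketch] The measurement asked for above can be made concrete. Assuming each test environment is labeled with whether a failure was actually injected and whether the detector reported an incident, the rates follow from a standard confusion matrix. The `Trial` type and `error_rates` function are hypothetical names for illustration; no results from Zebrium are implied.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    injected_failure: bool   # ground truth: a failure was deliberately injected
    incident_reported: bool  # the detector raised an incident for this run


def error_rates(trials):
    """Compute false positive/negative rates and precision from labeled trials."""
    tp = sum(t.injected_failure and t.incident_reported for t in trials)
    fn = sum(t.injected_failure and not t.incident_reported for t in trials)
    fp = sum(not t.injected_failure and t.incident_reported for t in trials)
    tn = sum(not t.injected_failure and not t.incident_reported for t in trials)
    return {
        # Share of healthy runs that still triggered an incident.
        "false_positive_rate": fp / (fp + tn) if fp + tn else None,
        # Share of injected failures the detector missed.
        "false_negative_rate": fn / (fn + tp) if fn + tp else None,
        # Share of reported incidents that were real failures.
        "precision": tp / (tp + fp) if tp + fp else None,
    }
```

A rate is returned as `None` when its denominator is zero, for example when no healthy runs were included, which is itself a sign the evaluation is incomplete.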
dgildeh almost 5 years ago
As a founder in the monitoring space, and now heading up the core monitoring team at Netflix, I had a chance to work with Zebrium and have to say the technology is impressive. Unlike other anomaly detection services, they've done a lot of work to get decent incidents without too much noise completely unsupervised - this is definitely the next generation of observability and Zebrium has a clear head start in this space!
samdung almost 5 years ago
Just ran through your intro video. If it really does what it says, this is a great product. I'll have my team test this tomorrow. Good luck on your launch.
robius almost 5 years ago
This is a game changer. I've met the team and they've got something special here.

You can see one of their talks and a great discussion at a BayLISA.org meeting.

https://www.youtube.com/watch?v=gNiWtoxJ9iM
paridiso almost 5 years ago
Cool! How does your software compare to other similar tools like BigPanda, Moogsoft, Splunk ITSI?
zumachase almost 5 years ago
Very cool, would love something like this. Your video shows a fairly straightforward incident response that traditional tools would handle equally well. Can you describe a situation that Zebrium handles better than legacy tools? Perhaps a hypothetical unknown unknown.
firefly77 almost 5 years ago
Nice website, folks. The 2-minute intro video does a great job presenting the value-prop. It looks like the solution detects events with a high probability of being a problem automatically via ML. Can I define my own events using custom condition criteria as well?
forgingahead almost 5 years ago
Congrats on the launch, and good luck! Looks fascinating. Looking forward to the future release that *fixes* the incidents as well, and just notifies us afterwards as a courtesy. =)
gingerlime almost 5 years ago
Looks really promising. Congrats on the launch. Any plans to integrate with Datadog? (or just make the transition / co-existence easier)
sekka1 almost 5 years ago
Had them at my meetup yesterday and they presented. Super interesting tool. Zero config!