TechEcho

12 comments

csears almost 5 years ago

Congrats on the launch. Having worked at a startup in the AIOps space, I can offer a few suggestions.

1. No matter how good your AI, it will make mistakes. Users need ways to provide feedback or filtering to avoid bad alert fatigue. Giving users a sense of control is critical.
2. Most larger shops will have dozens of monitoring tools already generating alerts. Consider ingesting existing alerts as another algorithmic signal.
3. The real root cause of an incident often won't show up in logs. Don't assume that the earliest event in a cluster is causal.
4. The more context you can provide an operator looking at a potential incident, the better. Modern SIEM tools do an OK job here. Consider pulling in topology or other enrichment sources and matching entity names/IDs to log data.

Good luck. Contact in profile if you'd like to chat further.

stochastimus almost 5 years ago

Hey folks,

Larry, Ajay and Rod here!

We're excited to share Zebrium's autonomous incident detection software. Zebrium uses unsupervised machine learning to detect software incidents and show you root cause. It's built to catch even "Unknown Unknowns" (problems you don't have alert rules built for), the FIRST time you hit them. We believe autonomous incident detection is an important tool for defeating complexity and crushing resolution time.

* Get Started *

1) Go to our website and click "Get Started Free". Enter your name, email and set a password.
2) Install our collectors from a list of supported platforms. For K8s it's a one-command install. Join the newly created private Slack channel for alerts (or add a webhook for your own).
3) That's it. Automatic incident detection starts within an hour and quickly gets good. You can drill down into logs & metrics for more context if needed.

Getting started takes less than 2 minutes. It's free for 30 days with larger limits and then free forever for up to 500MB/day.

* Here's what you WON'T have to do *

Manual training, code changes, connectors, parsers, configuration, waiting, hunting, searching, alert rules, etc! It works with any app or stack.

* How It Works *

We structure all the logs and metrics we collect in-line at ingest, leverage this structure to find normal and anomalous patterns of activity, then use a point process model to identify unusually correlated anomalies (across different data streams) to auto-detect incidents and find the relevant root-cause indicators. Experience with over a thousand real-world incidents across over a hundred stacks has confirmed that software behaves in certain fundamental ways when it breaks.

It turns out that we can detect important incidents automatically, with a root-cause indicator when the logs and metrics reflect it. Zebrium works well enough that our own team relies on it, and we believe you'll want to use it, too.
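
To make the idea of "unusually correlated anomalies across different data streams" concrete, here is a toy sketch. It is not Zebrium's model: it swaps the point process model for a plain fixed-window co-occurrence count, and the names `correlated_windows`, `window`, `min_streams`, and the sample events are all hypothetical.

```python
from collections import defaultdict


def correlated_windows(anomalies, window=60, min_streams=2):
    """Return time buckets in which several distinct streams were anomalous.

    anomalies: iterable of (timestamp_seconds, stream_name) pairs.
    window: bucket width in seconds (an assumed, arbitrary default).
    min_streams: how many distinct streams must be anomalous in one bucket.
    """
    buckets = defaultdict(set)
    for ts, stream in anomalies:
        # Assign each anomaly to a fixed-width time bucket.
        buckets[int(ts // window)].add(stream)
    # Keep only buckets where enough distinct streams were anomalous together.
    return [
        (bucket * window, sorted(streams))
        for bucket, streams in sorted(buckets.items())
        if len(streams) >= min_streams
    ]


# Hypothetical anomaly events: (seconds since start, stream name).
events = [
    (5, "app.log"),
    (12, "db.log"),
    (40, "cpu.metric"),
    (130, "app.log"),
    (400, "db.log"),
    (410, "db.log"),
]
incidents = correlated_windows(events, window=60, min_streams=2)
```

Only the first bucket qualifies here, because it is the only one where more than one stream is anomalous at roughly the same time. Repeated anomalies from a single stream (the two `db.log` events near 400s) do not count as correlation in this simplification.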

lalaland1125 almost 5 years ago

One question: Where is the systematic evidence that this product actually works? What are the general false positive and false negative rates in a standard setup? Did you construct various failed environments and measure the quality of the reports? For this sort of thing I would expect a simulation of at least 10-20 failure environments with detailed false positive/false negative rate measurements. Right now you have a lot of cherry-picked examples without any sort of systematic setup (in particular, you don't seem to talk about false positives anywhere).

dgildeh almost 5 years ago

As a founder in the monitoring space, and now heading up the core monitoring team at Netflix, I had a chance to work with Zebrium and have to say the technology is impressive. Unlike other anomaly detection services, they've done a lot of work to get decent incidents without too much noise completely unsupervised - this is definitely the next generation of observability and Zebrium has a clear head start in this space!

samdung almost 5 years ago

Just ran through your intro video. If it really does what it says, this is a great product. I'll have my team test this tomorrow. Good luck on your launch.

robius almost 5 years ago

This is a game changer. I've met the team and they've got something special here. You can see one of their talks and a great discussion at a BayLISA.org meeting: https://www.youtube.com/watch?v=gNiWtoxJ9iM

paridiso almost 5 years ago

Cool! How does your software compare to other similar tools like BigPanda, Moogsoft, Splunk ITSI?

zumachase almost 5 years ago

Very cool, would love something like this. Your video shows a fairly straightforward incident, one that traditional tools would handle equally well. Can you describe a situation that Zebrium handles better than legacy tools? Perhaps a hypothetical unknown unknown.

firefly77 almost 5 years ago

Nice website, folks. The 2-minute intro video does a great job presenting the value-prop. It looks like the solution detects events with a high probability of being a problem automatically via ML. Can I define my own events using custom condition criteria as well?

forgingahead almost 5 years ago

Congrats on the launch, and good luck! Looks fascinating. Looking forward to the future release that fixes the incidents as well, and just notifies us afterwards as a courtesy. =)

gingerlime almost 5 years ago

Looks really promising. Congrats on the launch. Any plans to integrate with Datadog? (or just make the transition / co-existence easier)

sekka1 almost 5 years ago

Had them at my meetup yesterday and they presented. Super interesting tool. Zero config!

Show HN: Zebrium – ML that catches software incidents and shows you root cause