Uh, yet another collector/grapher. That's nice, but...<p>We have <i>tons</i> of collectors. And tons of graphers. What we lack is a little bit of smarts in those tools: the ability to predict and the ability to react.<p>Predict. We have the Holt-Winters forecasting algorithm, implemented in RRDTool since 2005, and a couple of papers.<p>React. I'm not talking about 'fix it automagically'. But everyone wants to know 'wtf was that peak on this graph last night?'. Usually you never know, except in the simplest cases, because you cannot collect everything about everything all the time. But a monitoring system could enable 'collect everything we can' for a short period of time when it detects <i>something</i>. Something wrong or something <i>strange</i>, something out of the pattern. Has anybody heard of a system with something like that?
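To make the 'react' idea concrete, here is a rough sketch of what such a trigger might look like. It is not Holt-Winters (there are no trend or seasonal terms), just an exponentially weighted mean and deviation band, and every name in it (`BurstTrigger`, `alpha`, `k`, `burst_len`, `warmup`) is made up for illustration rather than taken from RRDTool or Heka.

```python
class BurstTrigger:
    """Flags samples outside a smoothed band and opens a short
    'collect everything' window.

    Hypothetical sketch: simple exponential smoothing, not full Holt-Winters.
    """

    def __init__(self, alpha=0.3, k=3.0, burst_len=5, warmup=5):
        self.alpha = alpha          # smoothing factor for mean and deviation
        self.k = k                  # band width, in smoothed deviations
        self.burst_len = burst_len  # samples to keep detailed collection on
        self.warmup = warmup        # samples to observe before judging anything
        self.mean = None
        self.dev = 0.0
        self.seen = 0
        self.burst_left = 0

    def observe(self, value):
        """Feed one sample; return True while detailed collection should be on."""
        self.seen += 1
        if self.mean is None:
            self.mean = float(value)
            return False
        err = abs(value - self.mean)
        strange = self.seen > self.warmup and err > self.k * max(self.dev, 1e-9)
        if strange:
            # Out of the pattern: (re)open the detailed-collection window.
            self.burst_left = self.burst_len
        else:
            # Only learn from normal samples so a spike does not widen the band.
            self.mean += self.alpha * (value - self.mean)
            self.dev += self.alpha * (err - self.dev)
        active = self.burst_left > 0
        if active:
            self.burst_left -= 1
        return active
```

A collector would call `observe()` on each sample of a cheap, always-on metric and switch on the expensive collectors while it returns True. A real system would want seasonality (which is where Holt-Winters comes in) so that the nightly backup does not count as 'strange' every night.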
Having implemented similar solutions, it's clear to me that the developers did their homework and designed accordingly. I find myself agreeing with almost every decision I could see.
- Go: light, no dependencies. This is key. If you've ever deployed something in a non-homogeneous environment with 100s/1000s of servers, you know the pain.<p>- Plugin system: The only way to scale the development of the solution.<p>- Lua for plugins: Yes! The language is not important, but not having to stop and restart the application for changes in logic, etc. is essential.<p>- Routing: Sounds great, can't wait to take a deeper look.<p>Kudos to the devs. Nicely done!
I've been experimenting with a different metrics toolchain of shh + log-shuttle + l2met recently (also written in Go):<p><a href="https://github.com/freeformz/shh" rel="nofollow">https://github.com/freeformz/shh</a><p><a href="https://github.com/ryandotsmith/log-shuttle" rel="nofollow">https://github.com/ryandotsmith/log-shuttle</a><p><a href="https://github.com/ryandotsmith/l2met" rel="nofollow">https://github.com/ryandotsmith/l2met</a><p>shh can be extended with custom pollers written in Go, but focuses on collecting system-level metrics. log-shuttle is a general-purpose tool for shipping logs over HTTP. l2met receives logs over HTTP and can be extended with custom outlets written in Go, but requires log statements in a specific format ("measure.db.latency=20" or "measure=db.latency val=20").<p>It's great to see so many new tools in this space. Previously I had a bunch of one-off "carbonize" scripts running out of cron, each collecting a specific kind of metric and sending it to Graphite or statsd. This worked OK but required quite a bit of code to get things done. Heka's plugin system looks like a nice way to structure things.
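For anyone wondering what the two log formats mentioned above amount to in practice, here is a minimal sketch of a parser for them. It is based only on the two example lines quoted in the comment; `parse_measure` is a hypothetical helper, not l2met's actual code, and the real parser may well accept more than this.

```python
def parse_measure(line):
    """Return (name, value) for the two formats quoted above, or None.

    Assumption: a log line is a run of space-separated key=value pairs.
    """
    fields = dict(
        part.split("=", 1) for part in line.split() if "=" in part
    )
    try:
        # Second format: measure=db.latency val=20
        if "measure" in fields and "val" in fields:
            return fields["measure"], float(fields["val"])
        # First format: measure.db.latency=20
        for key, raw in fields.items():
            if key.startswith("measure."):
                return key[len("measure."):], float(raw)
    except ValueError:
        # The value was not numeric; treat the line as not a measurement.
        return None
    return None
```

Both `parse_measure("measure.db.latency=20")` and `parse_measure("measure=db.latency val=20")` yield the name `db.latency` with a numeric value of 20, and lines with no measurement yield None.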
Very interesting. Does this fit in conceptually with circus at all? It seems like there's a fair amount of overlap between the process/HTTP management done by circus and this stats/data collection/analysis (specifically hekad agent in the architecture diagram):
<a href="http://heka-docs.readthedocs.org/en/latest/architecture/index.html" rel="nofollow">http://heka-docs.readthedocs.org/en/latest/architecture/inde...</a><p>I'm curious if Mozilla is using these two tools in combination internally, and what that architecture looks like.<p><a href="https://github.com/mozilla-services/circus" rel="nofollow">https://github.com/mozilla-services/circus</a>
Off the top of my head, this is a reimplementation of the following:
* SNMP
* CollectD
* Carbon
* JMX
* WMI
* CMIP<p>And a whole host of other proprietary transports. So it's cool and looks awesome, but what does it give me that the entirety of other monitoring protocols doesn't?
I see it more as a syslog replacement. It does a lot more than syslog of course, but it doesn't do what "collectd" and the like do. Heka seems to "just" do logging/routing/etc. and be extremely fast and reliable doing so. And it has no dependencies and a small footprint.<p>Which is what syslog can't do.