Looks like another faster horse. A pretty GUI on /proc is not the most burning issue to solve in Linux performance monitoring. I wish anyone making these tools would spend 30 minutes watching my Monitorama talk about instance monitoring requirements at Netflix: http://www.brendangregg.com/blog/2015-06-23/netflix-instance-analysis-requirements.html . I still hate gauges.

Where is the PMC support? At Facebook a few days ago, they said their number one issue was memory bandwidth. Try analyzing that without PMCs. You can't. And that's their number one issue. And it shouldn't be a surprise that you need PMC access to have a decent Linux monitoring/analysis tool. If that's a surprise to you, you're creating a tool without actual performance expertise.

It should front BPF tracing as well... Maybe it will in the future and I can check again.
Don't use the red-green combination in charts: it makes them really hard to read for those of us with a degree of red-green color blindness (the most common type, present in ~5% of the male and ~1% of the female population).

Other than that it looks AWESOME.
Well, it's pretty. It's probably great if you have one to five machines you care about, or you really want a pretty dashboard.

Notable features that I would need all relate to multi-server usage:

- central config across hosts

- alerting when values go over or under thresholds

- a mode for automatically selecting and viewing the machines which are working hardest, or not working

- a mode for viewing a few stats across all machines

- a mode for slide-show viewing of a few stats across all machines
https://github.com/firehol/netdata/wiki/Installation#nodejs

> I believe the future of data collectors is node.js

:(
Is there such a thing as 95th-percentile CPU monitoring?

Consider an application that spikes (close to 100% on a core) for 2-3s on some web requests -- let's assume this is normal (nothing can be done about it). Now, let's say the average user of the system is idle for 2 minutes per web request. So users won't see performance degradation unless $(active-users) > $(cores) within the same 2-3 second window.

For most monitoring systems, CPU is reported as an average over a minute, and even if it's pinned for only 2-3s per 60s, that's only 5% usage. Presume a 2-CPU system with 5 users, who all happen to be in a conference call... and hit the system at exactly the same time (but are otherwise mostly idle). The CPU graph might show 10-15% usage (no flag). Yet those 5 users will report significant application performance issues (one of them will have to wait 6-9s).

What I'd like to monitor, as a system administrator, is the 95th-percentile utilization of the CPUs -- that is, over the minute, throw away the bottom 94% of samples (mostly idle) and report the CPU utilization of the next highest percentile. This should show me those pesky CPU spikes. Does anything do that?
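Roughly what I have in mind, as a sketch (illustrative Python; the 1-second sampling interval, the nearest-rank percentile, and the whole-box /proc/stat aggregation are my own assumptions, not something any existing tool does):

    # Sketch: report the 95th-percentile of per-second CPU utilization each
    # minute, instead of the one-minute average. Interval and window size
    # are illustrative choices.
    import time

    def read_cpu_times():
        """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
        with open("/proc/stat") as f:
            fields = [float(x) for x in f.readline().split()[1:]]
        idle = fields[3] + fields[4]          # idle + iowait
        total = sum(fields)
        return total - idle, total

    def percentile(samples, pct):
        """Nearest-rank percentile of a list of numbers."""
        ordered = sorted(samples)
        rank = max(0, int(round(pct / 100.0 * len(ordered))) - 1)
        return ordered[rank]

    prev_busy, prev_total = read_cpu_times()
    window = []                               # one utilization sample per second

    while True:
        time.sleep(1)
        busy, total = read_cpu_times()
        delta_total = total - prev_total
        if delta_total > 0:
            window.append(100.0 * (busy - prev_busy) / delta_total)
        prev_busy, prev_total = busy, total

        if len(window) >= 60:                 # once per minute
            print("p95 CPU over last minute: %.1f%%" % percentile(window, 95))
            window = []

Per-core or per-process variants would need finer-grained sources (the per-cpu lines of /proc/stat, or /proc/<pid>/stat), but the percentile idea is the same.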
Gave it a try. Definitely not useful for running the daemon and viewing the UI on the same machine: Chrome alone eats 50% of one of the cores just to show the realtime data.

On my RPi B, the daemon eats 4% on average across all four cores, with almost all the time spent in the kernel. I assume polling the various entries under /proc/ is costly.
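A quick way to sanity-check that assumption is to time the reads themselves (rough sketch; the file list is just a guess at what a collector like this might scrape every second):

    # Rough timing of a /proc polling pass: read a handful of commonly
    # scraped entries in a loop and report the average cost per pass.
    # The file list is illustrative, not what netdata actually reads.
    import time

    PROC_FILES = ["/proc/stat", "/proc/meminfo", "/proc/net/dev",
                  "/proc/diskstats", "/proc/vmstat"]

    def poll_once():
        for path in PROC_FILES:
            with open(path) as f:
                f.read()

    N = 1000
    start = time.monotonic()
    for _ in range(N):
        poll_once()
    elapsed = time.monotonic() - start
    print("avg per polling pass: %.3f ms" % (1000.0 * elapsed / N))

Since reading /proc makes the kernel generate the text on the fly, a high per-pass cost on a slow SoC would line up with the time showing up as system time.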
The dashboard is gorgeous, one of the prettiest I've ever seen.

But I wish it were a Riemann/Graphite/whatever dashboard instead of reimplementing its own data collection system.

There is a need for great dashboards, but I don't feel any need for yet another format of data collection plugins.
Interesting! Really gorgeously rendered dashboards.

But also weird. The fact that the collectors, the storage *and* the UI all run on each box makes this more of a small-scale replacement for top and assorted command-line tools such as iostat than a scalable, distributed monitoring system. Lack of central collection means you cannot get a cluster-wide view of a given metric, nor can you easily build alerting into this.

I'm also disappointed that it reimplements a lot of collectors that already exist in mature projects like Collectd and Diamond (and, more recently, Influx's Telegraf). I understand that fewer external dependencies can be useful, but still, does every monitoring tool really need to write its own collector for reading CPU usage? You'd think there would be some standardization by now.

For comparison, we use Prometheus + Grafana + a lot of custom metrics collectors. Grafana is less than stellar, though. I'd love to have this UI on top of Prometheus.
At the moment (like, literally right now; I just took a break and saw this), I am configuring Graphite + collectd + Grafana (and probably Cabot on top for alerts), using Ansible to set up collectd and sync the configuration across the nodes.

After some time using Graphite + StatsD and friends, I have come to really appreciate the benefits of widely adopted open source components and the flexibility they give over all-in-one solutions such as this. On the other hand, solutions like this are much easier to configure, especially the first time, when you are not yet familiar with the tools.
It's great that they've got all that explanatory prose for the metrics. That would help when reviewing data with other team members who aren't familiar with the context of each of these.

I have less of a realtime system review need than a post-mortem need. Today, I'll use kSar to do that, but this tool looks much more capable.

It's too bad that it doesn't provide an init script or other startup feature. The installer, while it doesn't seem to follow typical distribution patterns, is otherwise fairly complete.
Did some spot checking. Found a race condition in the dictionary code in less than five minutes of poking around. Ugh.

Edit: the code that adds an entry to the dictionary releases its lock, whereupon you can wind up with duplicate NV pairs.
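To illustrate the class of bug (a minimal Python sketch of the pattern, not netdata's actual C code): if the lock is released between the lookup and the insert, two threads can both miss the same name and both append it.

    # Sketch of a check-then-insert race on a name/value store guarded by a lock.
    # Illustrative only; it just shows how dropping the lock between the lookup
    # and the insert allows duplicate NV pairs.
    import threading

    lock = threading.Lock()
    entries = []                      # list of (name, value) pairs

    def add_racy(name, value):
        with lock:                    # lookup under the lock...
            exists = any(n == name for n, _ in entries)
        if not exists:                # ...but the lock is gone here
            with lock:                # another thread may have added it meanwhile
                entries.append((name, value))

    def add_safe(name, value):
        with lock:                    # hold the lock across check *and* insert
            if not any(n == name for n, _ in entries):
                entries.append((name, value))

The usual fixes are holding the lock across both steps (as in add_safe above) or re-checking for the name after re-acquiring the lock, before inserting.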
It would be nice if it could show the processes that were running at the time of a peak in the graph.

Also, it would be nice if this could be run over multiple machines and show combined results.

Further, it appears that this tool shows information that other tools currently do not. It would also be nice if it allowed scripting and/or offered a CLI.
netdata is perfect for single-server monitoring, so it's perfectly suited to integration into my Centmin Mod LEMP stack installer: https://community.centminmod.com/threads/addons-netdata-sh-new-system-monitor-addon.7022/

For folks wanting multiple servers, the wiki does mention that, I believe, at https://github.com/firehol/netdata/wiki#how-it-works
Impressive. The dashboard could be condensed a bit, though; putting all the details on one page is a little overwhelming. Maybe have some tabs (CPU, memory, disk, network, etc.)?
nmon gives you console beauty without external dependencies. You can watch it in console mode and cron-schedule it in batch mode for long-term data collection.