Netdata – Linux performance monitoring, done right

477 points by cujanovic about 9 years ago

25 comments

brendangregg about 9 years ago
Looks like another faster horse. A pretty GUI on /proc is not the most burning issue to solve in Linux performance monitoring. I wish anyone making these tools would spend 30 minutes watching my Monitorama talk about instance monitoring requirements at Netflix: http://www.brendangregg.com/blog/2015-06-23/netflix-instance-analysis-requirements.html. I still hate gauges.

Where is the PMC support? At Facebook a few days ago, they said their number one issue was memory bandwidth. Try analyzing that without PMCs. You can't. And that's their number one issue. And it shouldn't be a surprise that you need PMC access to have a decent Linux monitoring/analysis tool. If that's a surprise to you, you're creating a tool without actual performance expertise.

Should front BPF tracing as well... Maybe it will in the future and I can check again.

sputr about 9 years ago
Don't use the red-green combination in charts, as it makes them really hard to read for those of us with a degree of red-green color blindness (which is the most common type in the ~5% of the male and ~1% of the female population that has it).

Other than that it looks AWESOME.

dsr_ about 9 years ago
Well, it's pretty. It's probably great if you have one to five machines you care about, or you really want a pretty dashboard.

Notable features that I would need all relate to multi-server usage:

- central config across hosts
- alerting when values go over or under thresholds
- a mode for automatically selecting and viewing the machines which are working hardest, or not working
- a mode for viewing of a few stats across all machines
- a mode for slide-show viewing of a few stats across all machines
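
Two of the items in that list, threshold alerting and picking out the hardest-working machines, can be sketched in a few lines once per-host values are available in one place. This is only an illustration: the host names, the sample values, and the helper names `check_thresholds` and `busiest` are hypothetical and are not part of netdata.

```python
def check_thresholds(metrics, low, high):
    """Return (host, kind, value) for every host outside the [low, high] band."""
    alerts = []
    for host, value in sorted(metrics.items()):
        if value > high:
            alerts.append((host, "high", value))
        elif value < low:
            alerts.append((host, "low", value))
    return alerts


def busiest(metrics, n):
    """Return the names of the n hosts with the highest values, busiest first."""
    return sorted(metrics, key=metrics.get, reverse=True)[:n]


# Hypothetical CPU utilization (percent) collected from four hosts.
cpu = {"web1": 97.0, "web2": 41.5, "db1": 88.0, "batch1": 0.5}

alerts = check_thresholds(cpu, low=1.0, high=90.0)
top = busiest(cpu, 2)
print(alerts)
print(top)
```

The hard part the comment points at is not this logic but getting the values from every host into one place, which a per-box dashboard does not provide by itself.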

sagichmal about 9 years ago
https://github.com/firehol/netdata/wiki/Installation#nodejs

> I believe the future of data collectors is node.js

:(

clarkevans about 9 years ago
Is there such a thing as 95% threshold CPU monitoring?

Consider an application that spikes (close to 100% on a core) for 2-3s on some web requests -- let's assume this is normal (nothing can be done about it). Now, let's consider that the average user of the system is idle for 2 minutes per web request. So, users won't see performance degradation unless $(active-users) > $(cores) during a 2-3 minute window.

For most monitoring systems, CPU is reported as an average over a minute, and, even if it's pinned only 2-3s per 60s, that's only 5% usage. Presume a 2 CPU system with 5 users, who all happen to be in a conference call... and hitting the system at exactly the same time (but are otherwise mostly idle). The CPU graph might show 10-15% usage (no flag). Yet, those 5 users will report significant application performance issues (one of the users will have to wait 6-9s).

What I'd like to monitor, as a system administrator, is the 95% utilization of the CPUs -- that is, over the minute, throw away the bottom 94% (mostly idle cycles) and report to me the CPU utilization of the next highest percentile. This should show me those pesky CPU spikes. Does anything do that?
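
The difference between a one-minute average and a high percentile can be shown with a small sketch. It assumes per-second utilization samples are available and uses the nearest-rank percentile definition. The sample values and the `percentile` helper are made up for illustration and do not describe how any particular monitoring tool computes its numbers.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical minute of per-second CPU utilization on one core:
# pinned at 100% for 4 seconds, nearly idle for the other 56.
minute = [100.0] * 4 + [2.0] * 56

mean = sum(minute) / len(minute)   # about 8.5, looks harmless
p95 = percentile(minute, 95)       # 100.0, the spike is visible
print(round(mean, 1), p95)
```

One caveat under this definition: a burst lasting 3 seconds or less out of 60 covers no more than 5% of the samples and still falls below the 95th percentile, so catching the 2-3s spikes described above would need a higher percentile (or the per-minute maximum).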

oxplot about 9 years ago
Gave it a try. Definitely not useful for running the daemon and viewing the UI on the same machine. Chrome, at least, eats 50% of one of the cores to show the realtime data.

On my RPi B, the daemon eats 4% average on all four cores, with almost all the time spent in the kernel. I assume polling the various entries under /proc/ is costly.
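
For context on what one such poll involves: a collector reads a /proc file, parses the counters, and takes the difference from the previous read. The sketch below does this for the aggregate `cpu` line of /proc/stat, using the field order documented in proc(5). The two snapshot strings are made up, and this is not netdata's collector code.

```python
def cpu_times(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into (busy, total) ticks."""
    fields = stat_line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Field order: user nice system idle iowait irq softirq steal.
    # The trailing guest fields are skipped because guest time is
    # already counted inside user and nice.
    values = [int(v) for v in fields[1:9]]
    idle = values[3] + values[4]          # idle + iowait
    total = sum(values)
    return total - idle, total


def busy_percent(before, after):
    """CPU utilization between two snapshots of the 'cpu' line."""
    busy0, total0 = cpu_times(before)
    busy1, total1 = cpu_times(after)
    elapsed = total1 - total0
    return 100.0 * (busy1 - busy0) / elapsed if elapsed else 0.0


# Made-up snapshots taken one polling interval apart.
before = "cpu 1000 0 500 8000 100 0 0 0 0 0"
after = "cpu 1030 0 510 8350 110 0 0 0 0 0"
print(busy_percent(before, after))
```

On a live Linux system the two strings would come from reading the first line of /proc/stat twice. Each read is a system call that makes the kernel format the counters as text, which is consistent with the time showing up as kernel time.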

Wilya about 9 years ago
The dashboard is gorgeous, one of the prettiest I've ever seen.

But I wish it were a Riemann/Graphite/whatever dashboard instead of reimplementing its own data collection system.

There is a need for great dashboards, but I don't feel any need for yet another format of data collection plugins.

lobster_johnson about 9 years ago
Interesting! Really gorgeously rendered dashboards.

But also weird. The fact that the collectors, the storage and the UI all run on each box makes this more like a small-scale replacement for top and assorted command-line tools such as iostat than for a scalable, distributed monitoring system. Lack of central collection means you cannot get a cluster-wide view of a given metric, nor can you easily build alerting into this.

I'm also disappointed that it reimplements a lot of collectors that already exist in mature projects like Collectd and Diamond (and, more recently, Influx's Telegraf). I understand that fewer external dependencies can be useful, but still, does every monitoring tool really need to write its own collector for reading CPU usage? You'd think there would be some standardization by now.

For comparison, we use Prometheus + Grafana + a lot of custom metrics collectors. Grafana is less than stellar, though. I'd love to have this UI on top of Prometheus.

sleepyhead about 9 years ago
Can we please stop with the "done right"?

glittershark about 9 years ago
Having a custom plugin architecture for this is a total dealbreaker. We already have statsd, why not just use that?
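
For readers who have not used it, statsd accepts metrics as plain-text datagrams of the form `name:value|type`, conventionally over UDP on port 8125, which is why it is easy to emit from almost any collector. A minimal sketch follows. The metric name is hypothetical, and whether netdata could emit or consume this format is not something the thread establishes.

```python
import socket


def statsd_gauge(name, value):
    """Format a gauge in the statsd line protocol: name:value|g"""
    return f"{name}:{value}|g".encode("ascii")


def send_metric(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; statsd does not acknowledge datagrams."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# Hypothetical metric name; only the formatting is shown here,
# nothing is sent until send_metric(packet) is called.
packet = statsd_gauge("system.cpu.user", 12.5)
print(packet)
```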

thesorrow about 9 years ago
Monitoring without alerting is kinda useless. How can I aggregate multiple servers?

gedrap about 9 years ago
At the moment (like, literally now, just took a break and saw this), I am configuring graphite + collectd + grafana (and probably cabot on top for alerts), using ansible to set up collectd and sync the configuration across the nodes.

After some time of using graphite + statsd and friends, I came to really appreciate the benefits of using widely adopted open source components and the flexibility it gives over all-in-one solutions such as this. On the other hand, solutions like this are much easier to configure, especially the first time when you are not familiar with the tools yet.

wyldfire about 9 years ago
It's great that they've got all that explanatory prose for the metrics. That would help when reviewing data with other team members who aren't familiar with the context of each of these.

I have less of a realtime system review need than a post-mortem need. Today, I'll use kSar to do that, but this tool looks much more capable.

It's too bad that it doesn't provide an init script or other startup feature. The installer, while it doesn't seem to follow typical distribution patterns, is otherwise fairly complete.

guiye about 9 years ago
Very nice look and feel, but it's doing HTTP polling each second; maybe using WebSockets or SSE could perform better. Great work!
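
To make the SSE suggestion concrete: with Server-Sent Events the browser opens one long-lived HTTP response (content type `text/event-stream`) and the server writes small text frames to it, instead of the page issuing a new request every second. The sketch below only shows the framing of one event. The `sse_event` helper and the sample payload are hypothetical, and nothing here reflects how netdata's dashboard is actually wired.

```python
import json


def sse_event(data, event=None, event_id=None):
    """Serialize one Server-Sent Events frame.

    Each field is a 'name: value' line and a blank line ends the event.
    """
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A payload containing newlines must be split across several data: lines.
    for line in json.dumps(data).splitlines():
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"


# Hypothetical one-second sample pushed to the browser.
frame = sse_event({"cpu": 12.5}, event="sample", event_id=1)
print(frame, end="")
```

Whether this would actually perform better than one-second polling depends on the server and on how many dashboards are open; the comment itself only says "maybe".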

kabdib about 9 years ago
Did some spot checking. Found a race condition in the dictionary code in less than five minutes of poking around. Ugh.

Edit: code to add an entry to the dictionary releases its lock, whereupon you can wind up with duplicate NV pairs.
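
The general pattern behind this kind of bug is a check-then-insert where the lock is dropped between the lookup and the insert. The sketch below is a Python illustration of that pattern and of the usual fix. It is not netdata's dictionary code, and the class and method names are invented for the example.

```python
import threading


class NameValueList:
    """Toy name-value store backed by a list, so duplicate names are possible."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pairs = []  # each entry is [name, value]

    def _find(self, name):
        for pair in self._pairs:
            if pair[0] == name:
                return pair
        return None

    def set_racy(self, name, value):
        # Lookup and insert are separate critical sections. Another thread
        # can insert the same name in the gap, leaving two pairs for it.
        with self._lock:
            pair = self._find(name)
        if pair is None:
            with self._lock:
                self._pairs.append([name, value])
        else:
            with self._lock:
                pair[1] = value

    def set_atomic(self, name, value):
        # One critical section covers both the lookup and the insert.
        with self._lock:
            pair = self._find(name)
            if pair is None:
                self._pairs.append([name, value])
            else:
                pair[1] = value

    def names(self):
        with self._lock:
            return [pair[0] for pair in self._pairs]


store = NameValueList()
threads = [threading.Thread(target=store.set_atomic, args=("cpu", i))
           for i in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.names())
```

With `set_atomic` the name appears once regardless of thread interleaving. With `set_racy` a duplicate is possible but depends on timing, so it may not show up in a short run.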

amelius about 9 years ago
It would be nice if it could show the processes that were running at the time of a peak in the graph.

Also, it would be nice if this could be run over multiple machines and show combined results.

Further, it appears that this tool shows information that other tools currently do not show. Perhaps it would be nice if this tool allowed scripting and/or a CLI.

vbtechguy about 9 years ago
netdata is perfect for single-server monitoring; it's perfectly suited to integration into my Centmin Mod LEMP stack installer: https://community.centminmod.com/threads/addons-netdata-sh-new-system-monitor-addon.7022/

For folks wanting multiple servers, the wiki does mention that, I believe, at https://github.com/firehol/netdata/wiki#how-it-works

ausjke about 9 years ago
Impressive. The dashboard could be a bit more condensed, though; putting all the details on one page is a little overwhelming. Maybe have some tabs (CPU, memory, disk, network, etc.)?

rodionos about 9 years ago
nmon gives you console beauty without external dependencies. You can watch it in console mode and cron schedule it in batch mode for long-term data collection.

notinventedhear about 9 years ago
This looks really useful, although it doesn't seem to have a dashboard for showing the aggregated results from multiple running daemons.

brndn about 9 years ago
Would implementing something like this on a server have any noticeable performance impact?

romanovcode about 9 years ago
Pretty cool, does it also auto-update itself? I also think it's a bit cluttered.

igama about 9 years ago
Looks pretty cool, going to test it soon.

jjuhl about 9 years ago
I'd recommend people also check out SysOrb: http://sysorb.com/

crudbug about 9 years ago
+1, great work. Would love to see a React port.