Show HN: Homelab Monitoring Setup with Grafana

155 points by conor_f, almost 2 years ago

17 comments

I have self-hosted about 30 services for years; 3 of them are vital (Bitwarden, Home Assistant and Pi-hole). I work in IT and I am a geek, so I tried a few monitoring systems and wrote two myself. Then I realized that I have self-sustaining, 24/7 monitoring agents: wife and children. I gave up trying to have the right stack and just wait for them to yell. Seriously: it works great, and it made me wonder WHY I am trying to monitor. Turns out this is more for the fun and the discovery of tools than a real need at home.


sjsdaiuasgdia, almost 2 years ago

This confirms what I suspected when I was trying to decide whether to host my own Grafana stack or use the Grafana Cloud free tier: that I'd end up spending a ton of time fiddling with a constellation of services I didn't actually care about, time I could instead spend on the projects and services I do care about. I've not found it too hard to stay within the limits of the free tier. The 10-dashboard limit is the main one that actually constrains me, but I just put more stuff on each dashboard and live with the scrolling. The free retention is not great, but it's good enough for my purposes.


bovermyer, almost 2 years ago

I'm in the process of building out a Grafana stack (Prometheus, Loki, Tempo, Mimir, Grafana) for my day job right now... and also for one of my side projects, OSRBeyond. It's easy to get overwhelmed by all the moving pieces, but it's also a lot of _fun_ to set up.


adql, almost 2 years ago

I've found the VictoriaMetrics all-in-one binary to be the perfect size for home, at the very least for metrics gathering. It supports Prometheus querying and a few other formats for ingestion, so any knowledge about "how to get data into Prometheus" applies pretty much 1:1, and their own vmagent is pretty advanced. Not related to the company in any way, just a happy user. https://victoriametrics.com/
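To illustrate the "Prometheus querying" point, here is a minimal sketch of talking to the Prometheus-compatible instant-query endpoint. It assumes a single-node instance on `localhost:8428`; the helper names `build_query_url` and `extract_values` are made up for this example, and the actual HTTP call is left out so the snippet stays runnable offline.

```python
import json
from urllib.parse import urlencode

# Assumption: a single-node VictoriaMetrics instance listening locally on port 8428.
BASE_URL = "http://localhost:8428"


def build_query_url(promql: str, base_url: str = BASE_URL) -> str:
    """Build a URL for the Prometheus-compatible instant-query endpoint."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def extract_values(response_body: str) -> dict[str, float]:
    """Map each series' `instance` label to its current value.

    Expects the JSON body of a Prometheus-style instant-query response.
    """
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload}")
    return {
        # Each vector sample is [unix_timestamp, "value-as-string"].
        series["metric"].get("instance", "unknown"): float(series["value"][1])
        for series in payload["data"]["result"]
    }
```

Fetching `build_query_url("up")` with any HTTP client and passing the body to `extract_values` should give one entry per scraped target; the same code should work unchanged against a plain Prometheus server if `base_url` points at it.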


conor_f, almost 2 years ago

Hey everyone, this is a post I've been working on for the past few months about setting up my own monitoring stack with Grafana for my home server. I'd love your feedback on how this process could be easier for me, some resources on learning the Grafana query languages, and general comments. Thanks for taking the time to read + engage!


tacker2000, almost 2 years ago

I have been using Zabbix to monitor my servers for the last few years, since I wanted something simple and this Grafana/Prometheus stack always scared me because, as the OP says, of the number of "moving parts". Zabbix has been quite solid and has lots of templates for different servers (Linux, Windows, etc.) and triggers, and it can also monitor Docker containers (although I never tried that). The only thing Zabbix can't do well is log file monitoring, so I am considering something like an ELK stack as an addition.


shrx, almost 2 years ago

Mildly related: can anyone recommend a time series database that supports easy aggregation by week (with the ability to configure the start of the week) and month? I'm looking for something to switch from InfluxDB which I'm currently using. The linked article is using Prometheus which also doesn't appear to support this functionality.
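To make the requirement concrete, here is a minimal sketch of week bucketing with a configurable first day of the week, done client-side in plain Python rather than in any particular database. The function names `week_start` and `sum_by_week` are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def week_start(day: date, first_weekday: int = 0) -> date:
    """Return the first day of the week that contains `day`.

    `first_weekday` follows date.weekday(): 0 = Monday ... 6 = Sunday.
    """
    offset = (day.weekday() - first_weekday) % 7
    return day - timedelta(days=offset)


def sum_by_week(samples, first_weekday: int = 0) -> dict:
    """Sum (datetime, value) samples into buckets keyed by week start date."""
    buckets = defaultdict(float)
    for timestamp, value in samples:
        buckets[week_start(timestamp.date(), first_weekday)] += value
    return dict(buckets)


samples = [
    (datetime(2023, 6, 4, 12, 0), 1.0),   # a Sunday
    (datetime(2023, 6, 5, 12, 0), 2.0),   # a Monday
    (datetime(2023, 6, 10, 12, 0), 4.0),  # a Saturday
]
monday_weeks = sum_by_week(samples, first_weekday=0)
sunday_weeks = sum_by_week(samples, first_weekday=6)
```

With Monday as the first day, the Sunday sample falls into the previous week; with Sunday as the first day, all three samples land in the same bucket. Monthly aggregation would follow the same pattern with `day.replace(day=1)` as the bucket key.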


majkinetor, almost 2 years ago

Is there anything easier for logs? Basically glorified ripgrep?


whalesalad, almost 2 years ago

check out netdata if y'all haven't already - incredible software


codetrotter, almost 2 years ago

I recently set up packet loss monitoring on a Raspberry Pi, using Prometheus for logging and graphing. https://video.nstr.no/w/hjTH3Vggn2fvpTrQitMmVP I would like to set up Grafana and more monitoring as well, on some of my other machines. But for now this is what I have :D

czzzzz, almost 2 years ago

Shameless plug for AppScope (https://github.com/criblio/appscope), which is designed for exactly this: capturing observability data from processes in your environment without code modification, and shipping the data off to tools like Grafana for monitoring.


hardwaresofton, almost 2 years ago

Has anyone had lots of trouble configuring Grafana via YAML from the documentation? A lot of it is kind of hard to follow. I've found that (pre)configuring Grafana without clicking around in it is pretty difficult.
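For reference, the smallest piece of file-based configuration is usually a datasource provisioning file. Below is a minimal sketch that renders one for a Prometheus-type datasource; the helper `render_datasource_yaml`, the datasource name, and the URL are assumptions for illustration, and values are written unquoted, so names containing YAML special characters would need quoting.

```python
def render_datasource_yaml(name: str, url: str, is_default: bool = True) -> str:
    """Render a minimal Grafana datasource provisioning file (Prometheus type)."""
    return (
        "apiVersion: 1\n"
        "datasources:\n"
        f"  - name: {name}\n"
        "    type: prometheus\n"
        "    access: proxy\n"
        f"    url: {url}\n"
        f"    isDefault: {'true' if is_default else 'false'}\n"
    )


# Hypothetical values: adjust the name and URL to match your own setup.
config_text = render_datasource_yaml("Prometheus", "http://localhost:9090")
print(config_text)
```

The output would typically be saved as a `.yaml` file under Grafana's `provisioning/datasources/` directory and picked up at startup; dashboards use a separate provisioning file, which is where most of the complexity tends to be.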


guybedo, almost 2 years ago

Shameless plug for uptimeFunk (https://uptimefunk.com), which I soft launched some time ago. I wanted some uptime monitoring with a nice UI and a few advanced features that I didn't find anywhere:
- monitoring MongoDB/replica set status
- monitoring SQL databases with basic SQL queries
- monitoring host CPU, RAM and disk usage
- monitoring Docker containers
- and being able to monitor all of this through SSH tunnels, because not all my services are on the internet


shashasha2, almost 2 years ago

We've been using Nagios and Munin for years; this stack is rock solid. We recently added ELK. That feels like overkill: heavyweight and fragile.


artisin, almost 2 years ago

I went down the Grafana rabbit hole, and without a doubt, it's a fantastic tool. It can handle just about any kind of data you throw at it, and when it comes to visualizing time series data, it's second to none. That said, it's a slog to set up and configure, but once finished, I had a beautiful dashboard for my home media server, and life was good. Unfortunately, a few months later, I was forced to upgrade and lacked the time to reconfigure Grafana. So, as a stopgap, I installed Netdata... fast-forward two years, and today I still haven't reconfigured Grafana, nor do I plan to.

For my use case, a home media server, Netdata turned out to be way simpler to set up, and, most importantly, way less of a hassle/dink-around. It's a basic plug-and-play operation with auto-discovery. While the dashboard isn't nearly as beautiful or configurable, it gets the job done and provides pretty much everything I need or want. It offers a quick overview, historical metrics (over a year of data) to analyze trends or spot potential issues, and push/email notifications if something goes awry.

If you decide to go down this route, there are two major items:

1. You'll need to configure the dbengine[1] database to save and store historical metric data. However, I found the dbengine configuration documentation to be a bit confusing, so I'll spare you the trouble: just use this Jupyter Notebook[2]. If needed, adjust the input, run it, scroll down, and you'll see a summary of the number of days, the maximum dbengine size, and the YAML config, which you can copy, paste, and voila.

2. If you're hoarding data, you'll probably want to set up smartmontools/smartd[3] in a separate Docker container for better disk monitoring metrics. However, I think you can enable hddtemp[4] with Netdata through the config if you don't want or need the extra hassle. You can have Netdata query this smartd container, but with a handful of disks, it ends up timing out frequently, so I found it's best to simply set up smartd/smartd.conf to log out the smartd data independently. Then all you need to do is tell Netdata where to find the smartd_log[5], and Netdata handles the rest.

Boom, home media server metrics with historical data, done. It still takes a bit of time to set up, but way less than Grafana. Anywho, hopefully, this saves you from wasting as much time as I did. And if you're looking for a smartd reference, shoot me a reply, and I'll tidy up and share my Docker config/scripts and notes.

[1] https://learn.netdata.cloud/docs/typical-netdata-agent-configurations/optimizing-metrics-database/database-modes-for-parent-child-setups#choose-your-database-mode
[2] https://colab.research.google.com/github/andrewm4894/netdata-storage-calculator/blob/main/calculator.ipynb#scrollTo=XNOCVoIMBR8xe
[3] https://www.smartmontools.org/wiki
[4] https://github.com/vitlav/hddtemp
[5] https://learn.netdata.cloud/docs/data-collection/storage,-mount-points-and-filesystems/hardware-storage/s.m.a.r.t.-attributes#configuration
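The kind of sizing arithmetic such a calculator performs can be sketched as below. This is not the linked notebook's code: the function `retention_days` is hypothetical, and the bytes-per-sample figure in the example is a placeholder assumption that should be replaced with a number measured on your own install.

```python
SECONDS_PER_DAY = 86_400


def retention_days(
    disk_bytes: float,
    metrics: int,
    bytes_per_sample: float,
    interval_seconds: float = 1.0,
) -> float:
    """Estimate how many days of history fit in a fixed on-disk budget.

    Assumes every metric stores one sample per `interval_seconds` and that
    each stored sample costs `bytes_per_sample` on disk after compression.
    """
    if min(disk_bytes, metrics, bytes_per_sample, interval_seconds) <= 0:
        raise ValueError("all inputs must be positive")
    bytes_per_day = metrics * (SECONDS_PER_DAY / interval_seconds) * bytes_per_sample
    return disk_bytes / bytes_per_day


# Placeholder inputs: 2,000 metrics at 1-second resolution, 1 byte per sample.
days = retention_days(disk_bytes=1_728_000_000, metrics=2_000, bytes_per_sample=1.0)
print(f"{days:.1f} days")
```

Running the estimate the other way (fix the number of days, solve for disk size) is what tells you how large to make the database before copying the resulting value into the config.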


revskill, almost 2 years ago

Just push to github and people will contribute the rest for you. Easy!

Demmme, almost 2 years ago

With 40 containers I would go with Kubernetes, and with the Kube stack you basically have this up and running in 5 minutes. Align the metric endpoints for fine-tuning. Add tracing to it in a few more clicks.