
Systemd: Enable indefinite service restarts

126 points by secure over 1 year ago

11 comments

deathanatos over 1 year ago
> *Why does systemd give up by default?*

> *I'm not sure. If I had to speculate, I would guess the developers wanted to prevent laptops running out of battery too quickly because one CPU core is permanently busy just restarting some service that's crashing in a tight loop.*

*sigh* … bounded randomized exponential backoff retry.

(Exponential: double the maximum time you might wait each iteration. Randomized: the time you wait is a random amount between [0, current maximum] (yes, zero). Bounded: you stop doubling at a certain point, like 5 minutes, so that we'll never wait longer than 5 minutes; otherwise, at some point you're waiting ∞s, which I guess is like giving up.)

(The concern about logs filling up is a worse one. This won't directly solve it, but a high enough max wait usually slows the rate of log generation enough that it becomes small enough not to matter. Also, do your log rotations on size.)
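A minimal shell sketch of that retry loop, assuming a bash-like shell ($RANDOM is bash-specific) and a placeholder unit name:

    #!/usr/bin/env bash
    # Bounded randomized exponential backoff: wait a random time in
    # [0, max_delay] seconds, then double max_delay, capped at 5 minutes.
    max_delay=1
    cap=300
    until systemctl start my-app.service; do      # hypothetical unit
        sleep "$(( RANDOM % (max_delay + 1) ))"   # random wait in [0, max_delay]
        max_delay=$(( max_delay * 2 ))
        (( max_delay > cap )) && max_delay=$cap
    done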
ElectricSpoon over 1 year ago
> I would guess the developers wanted to prevent laptops running out of battery too quickly

And I would guess sysadmins also don't like their logging facilities filling the disks just because a service is stuck in a start loop. There are many reasons to think a service failing to start multiple times in a row won't start. Misconfiguration is probably the most frequent reason for that.
tadfisher over 1 year ago
This must be a different philosophy. When I see something like this happening, I investigate to find out *why* the service is failing to start, which usually uncovers some dependency that can be encoded in the service unit, or some bug in the service.
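For example, a dependency uncovered that way might be encoded in the unit roughly like this sketch (the unit names are hypothetical):

    # my-app.service: make the database an explicit dependency
    [Unit]
    After=postgresql.service      # ordering: start only once postgres is up
    Requires=postgresql.service   # hard dependency: stop if postgres stops

    [Service]
    ExecStart=/usr/local/bin/my-app
    Restart=on-failure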
PhilipRoman over 1 year ago
I can understand avoiding infinite restarts when there is something clearly wrong with the configuration, but I can't figure out why they made the "systemctl restart" command also subject to this limit. For services which don't support dynamic reloading, restarting them is a substitute for that. This makes "systemctl restart" extremely brittle when used from scripts.

Nobody accidentally runs "systemctl restart" too fast; when such a command is issued it is clearly intentional and should always be respected by systemd.
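One workaround for scripts, assuming the unit has already tripped the start-rate limit (the unit name is a placeholder), is to clear the counter first:

    # reset-failed clears the failed state and the start-rate counter,
    # so the following restart is not refused with
    # "start request repeated too quickly".
    systemctl reset-failed my-app.service
    systemctl restart my-app.service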
twinpeak over 1 year ago
While writing a monitoring script, I recently discovered that systemd exposes a few properties that can be used to alert on a service that is continuously failing to start, if it's set to restart indefinitely.

    # Get the number of restarts for a service, to see if it exceeds an arbitrary threshold.
    systemctl show -p NRestarts "${SYSTEMD_UNIT}" | cut -d= -f2

    # Get when the service started, to work out how long it's been running,
    # as the restart counter isn't reset once the service does start successfully.
    systemctl show -p ActiveEnterTimestamp "${SYSTEMD_UNIT}" | cut -d= -f2

    # Clear the restart counter if the service has been running for long enough,
    # based on the timestamp above.
    systemctl reset-failed "${SYSTEMD_UNIT}"
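Putting those properties together, a minimal alert check might look like this sketch (the unit name and threshold are arbitrary placeholders):

    #!/usr/bin/env bash
    unit="my-app.service"   # placeholder unit
    threshold=5             # arbitrary restart threshold

    # Alert if the unit has restarted more times than we tolerate.
    restarts=$(systemctl show -p NRestarts "$unit" | cut -d= -f2)
    if [ "${restarts:-0}" -gt "$threshold" ]; then
        echo "ALERT: $unit has restarted $restarts times" >&2
    fi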
o11c over 1 year ago
It would be nice if `RestartSec` weren't constant.

Then you could have the default be 100ms for one-time blips, but (after a burst of failures) fall back gradually to 10s to avoid spinning during longer outages.

That said, beware of failure *chains* causing the interval to add up. AFAIK there's no way to have the kernel notify you when a different process starts listening on a port.
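For what it's worth, newer systemd (v254 and later, so likely newer than this thread) added roughly this: RestartSteps= and RestartMaxDelaySec= grow the delay from RestartSec= toward a cap. A sketch:

    [Service]
    Restart=on-failure
    RestartSec=100ms         # initial delay for one-time blips
    RestartSteps=7           # number of increments toward the maximum
    RestartMaxDelaySec=10    # back off to at most 10 seconds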
akira2501 over 1 year ago
I've always preferred daemontools and runit's ideology here. If a service dies, wait one second, then try starting it. Do this forever.

The last thing I need is emergent behavior out of my service manager.
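For reference, a runit service is essentially a directory containing a run script; the supervisor re-executes it whenever it exits, pausing about a second between attempts. A minimal sketch with a placeholder binary and path:

    #!/bin/sh
    # /etc/sv/my-app/run (hypothetical path)
    # runsv restarts this script every time it exits, forever.
    exec /usr/local/bin/my-app 2>&1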
franknord23 over 1 year ago
I believe this allows you to have cascading restart strategies, similar to what can be done in Erlang/OTP: only after the StartLimit= has been reached does systemd consider the service failed. Then services that have Requires= set on the failed service will be restarted/marked failed as well.

I think you can even have systemd reboot or move the system into a recovery mode (target) if an essential unit does not come up. That way, you can get pretty robust systems that are highly tolerant to failures.

(Now, after reading `man systemd.unit`, I am not fully sure how exactly restarts are cascaded to requiring units.)
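The reboot-on-persistent-failure part maps onto StartLimitAction=; a sketch for a hypothetical essential unit:

    [Unit]
    Description=Essential service
    StartLimitIntervalSec=60   # count start attempts within a 60s window
    StartLimitBurst=5          # allow at most 5 attempts in that window
    StartLimitAction=reboot    # reboot the machine once the limit is hit

    [Service]
    ExecStart=/usr/local/bin/essential-daemon
    Restart=on-failure

Alternatively, OnFailure=rescue.target on such a unit can drop the system into recovery mode instead of rebooting it.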
mise_en_place over 1 year ago
I've been bitten by the restart limit many times. Our application server (backend) was crash-looping; the newest build fixed the crash, but systemd refused to restart the service due to the limit. A subtle but very annoying default behavior.
halyconWays over 1 year ago
Seems reasonable if the service is failing due to a transient network issue, which takes many minutes to resolve.
bravetraveler over 1 year ago
> And then you need to remember to restart the dependent services later, which is easy to forget.

You missed the other direction of the relationship.

I posted elsewhere in the thread on this: don't rely on entropy. Define your dependencies (well).

After=/Requires= are obvious. People forget PartOf=.