Systemd: Enable indefinite service restarts

126 points by secure, over 1 year ago

11 comments

deathanatos, over 1 year ago
> Why does systemd give up by default?

> I’m not sure. If I had to speculate, I would guess the developers wanted to prevent laptops running out of battery too quickly because one CPU core is permanently busy just restarting some service that’s crashing in a tight loop.

*sigh* … bounded randomized exponential backoff retry.

(Exponential: double the maximum time you might wait each iteration. Randomized: the time you wait is a random amount in [0, current maximum] (yes, zero). Bounded: you stop doubling at a certain point, like 5 minutes, so that we'll never wait longer than 5 minutes; otherwise, at some point you're waiting for ∞ s, which I guess is like giving up.)

(The concern about logs filling up is a worse one. Backoff won't directly solve this, but a high enough max wait usually slows the rate of log generation enough that it no longer matters. Also do your log rotations on size.)
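
The three properties named above (exponential, randomized, bounded) can be sketched in POSIX shell as follows. This is only an illustration under stated assumptions: the function names `backoff_cap` and `backoff_delay`, the 1-second base, and the 300-second ceiling are hypothetical, and nothing here is part of systemd.

```shell
# Hypothetical helper: upper bound in seconds for the Nth attempt (1-based).
# The bound doubles each attempt (exponential) and is clamped to max (bounded).
# Note: POSIX sh has no local variables, so these names are global.
backoff_cap() {
    attempt=$1
    base=$2
    max=$3
    cap=$base
    i=1
    while [ "$i" -lt "$attempt" ] && [ "$cap" -lt "$max" ]; do
        cap=$((cap * 2))
        i=$((i + 1))
    done
    if [ "$cap" -gt "$max" ]; then
        cap=$max
    fi
    echo "$cap"
}

# Hypothetical helper: random whole number of seconds in [0, cap] (randomized).
# Reads 4 random bytes from /dev/urandom because $RANDOM is not POSIX.
backoff_delay() {
    cap=$(backoff_cap "$1" "$2" "$3")
    rand=$(od -An -N4 -tu4 /dev/urandom | tr -d ' \n')
    echo $((rand % (cap + 1)))
}

# Usage (assumed values): sleep "$(backoff_delay "$attempt" 1 300)"
```

With a base of 1 and a ceiling of 300, the bound grows 1, 2, 4, ... 256 and then stays at 300 from the tenth attempt on. Drawing the actual wait from the whole range, zero included, keeps many failing clients from retrying in lockstep.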

ElectricSpoon, over 1 year ago
> I would guess the developers wanted to prevent laptops running out of battery too quickly

And I would guess sysadmins also don't like their logging facilities filling the disks just because a service is stuck in a start loop. There are many reasons to think a service that fails to start multiple times in a row won't start at all. Misconfiguration is probably the most frequent one.

tadfisher, over 1 year ago
This must be a different philosophy. When I see something like this happening, I investigate to find out *why* the service is failing to start, which usually uncovers some dependency that can be encoded in the service unit, or some bug in the service.

PhilipRoman, over 1 year ago
I can understand avoiding infinite restarts when there is something clearly wrong with the configuration, but I can't figure out why they made the "systemctl restart" command subject to this limit too. For services that don't support dynamic reloading, restarting them is the substitute. This makes "systemctl restart" extremely brittle when used from scripts.

Nobody accidentally runs "systemctl restart" too fast. When such a command is issued it is clearly intentional and should always be respected by systemd.

twinpeak, over 1 year ago
Recently discovered while making a monitoring script that systemd exposes a few properties that can be used to alert on a service that is continuously failing to start if it's set to restart indefinitely.

```shell
# Get the number of restarts for a service to see if it exceeds an arbitrary threshold.
systemctl show -p NRestarts "${SYSTEMD_UNIT}" | cut -d= -f2

# Get when the service started, to work out how long it's been running,
# as the restart counter isn't reset once the service does start successfully.
systemctl show -p ActiveEnterTimestamp "${SYSTEMD_UNIT}" | cut -d= -f2

# Clear the restart counter if the service has been running for long enough,
# based on the timestamp above.
systemctl reset-failed "${SYSTEMD_UNIT}"
```

o11c, over 1 year ago
It would be nice if `RestartSec` weren't constant.

Then you could have the default be 100ms for one-time blips, but (after a burst of failures) fall back gradually to 10s to avoid spinning during longer outages.

That said, beware of failure *chains* causing the interval to add up. AFAIK there's no way to have the kernel notify you when a different process starts listening on a port.
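
For readers on newer releases: systemd 254 added `RestartSteps=` and `RestartMaxDelaySec=`, which grow the restart delay from `RestartSec=` up to a maximum in a configurable number of steps. The sketch below prints a drop-in using the 100ms and 10s figures from the comment. The helper name `print_backoff_dropin`, the step count of 5, and the choice of `Restart=on-failure` are assumptions made for illustration.

```shell
# Hypothetical helper: print a [Service] drop-in whose restart delay grows
# from RestartSec= up to RestartMaxDelaySec= over RestartSteps= steps.
# Assumes systemd 254 or newer, where these two settings were introduced.
print_backoff_dropin() {
    initial=$1   # first restart delay, e.g. 100ms
    max=$2       # largest restart delay, e.g. 10s
    steps=$3     # number of increases between the two (assumed value)
    cat <<EOF
[Service]
Restart=on-failure
RestartSec=$initial
RestartMaxDelaySec=$max
RestartSteps=$steps
EOF
}

# Example: print_backoff_dropin 100ms 10s 5
```

The output can be redirected into a drop-in file for the unit in question, followed by `systemctl daemon-reload`.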

akira2501, over 1 year ago
I've always preferred daemontools' and runit's ideology here. If a service dies, wait one second, then try starting it. Do this forever.

The last thing I need is emergent behavior out of my service manager.
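
In systemd terms, "wait one second, restart, forever" roughly corresponds to disabling the start rate limit with `StartLimitIntervalSec=0` and setting `Restart=always` with `RestartSec=1s`. The sketch below writes such a drop-in. The helper name `write_restart_dropin`, the file name `restart.conf`, and the unit name in the usage comment are hypothetical.

```shell
# Hypothetical helper: write a drop-in that makes systemd restart a unit
# indefinitely, one second after each exit.
#   $1 = unit name, $2 = systemd config root (defaults to /etc/systemd/system)
# Prints the path of the file it wrote.
write_restart_dropin() {
    unit=$1
    root=${2:-/etc/systemd/system}
    dropin_dir="$root/$unit.d"
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/restart.conf" <<'EOF'
[Unit]
# 0 disables start rate limiting, so systemd never gives up on the unit.
StartLimitIntervalSec=0

[Service]
Restart=always
RestartSec=1s
EOF
    echo "$dropin_dir/restart.conf"
}

# Usage (as root, unit name is an assumption):
#   write_restart_dropin example.service && systemctl daemon-reload
```

Using a drop-in instead of editing the packaged unit file keeps the override separate from the vendor's configuration.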

franknord23, over 1 year ago
I believe this allows you to have cascading restart strategies, similar to what can be done in Erlang/OTP: only after the start limit (`StartLimitBurst=` within `StartLimitIntervalSec=`) has been reached does systemd consider the service failed. Then services that have `Requires=` set on the failed service will be restarted or marked failed as well.

I think you can even have systemd reboot or move the system into a recovery mode (target) if an essential unit does not come up. That way, you can get pretty robust systems that are highly tolerant of failures.

(Now, after reading `man systemd.unit`, I am not fully sure how exactly restarts are cascaded to requiring units.)

mise_en_place, over 1 year ago
I’ve been bitten by the restart limit many times. Our application server (backend) was crash looping; the newest build fixed the crash, but systemd refused to restart the service because of the limit. A subtle but very annoying default behavior.

halyconWays, over 1 year ago
Seems reasonable if the service is failing due to a transient network issue, which takes many minutes to resolve.

bravetraveler, over 1 year ago
> And then you need to remember to restart the dependent services later, which is easy to forget.

You missed the other direction of the relationship.

I posted elsewhere in the thread on this: don't rely on entropy. Define your dependencies (well).

`After=`/`Requires=` are obvious. People forget `PartOf=`.