PSI: Pressure stall information for CPU, memory, and IO

123 points by henridf, almost 7 years ago

7 comments

Scaevolus, almost 7 years ago
Handling OOM livelocks is exciting, and they have a good explanation for why the current OOM-killer fails:

> One usecase is avoiding OOM hangs/livelocks. The reason these happen is because the OOM killer is triggered by reclaim not being able to free pages, but with fast flash devices there is *always* some clean and uptodate cache to reclaim; the OOM killer never kicks in, even as tasks spend 90% of the time thrashing the cache pages of their own executables. There is no situation where this ever makes sense in practice.
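As a concrete illustration, here is a minimal parsing sketch in Python, assuming the interface lands at /proc/pressure/{cpu,memory,io} with "some"/"full" lines carrying avg10/avg60/avg300/total fields as described in the series:

    import pathlib

    def read_pressure(resource):
        """Parse /proc/pressure/<resource> into {'some': {...}, 'full': {...}}."""
        stats = {}
        for line in pathlib.Path(f"/proc/pressure/{resource}").read_text().splitlines():
            kind, *fields = line.split()
            stats[kind] = {key: float(val) for key, val in (f.split("=") for f in fields)}
        return stats

    if __name__ == "__main__":
        mem = read_pressure("memory")
        # "full" is the share of time in which *all* non-idle tasks were stalled
        # on memory -- the thrashing signal described in the quoted paragraph.
        print("memory full avg10:", mem.get("full", {}).get("avg10"))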
SomeHacker44, almost 7 years ago
Interesting!

After reading it, I realized I was actually hoping for information at a lower level than the VM layer for memory pressure: live, actionable information about DRAM bandwidth usage, delays caused by the hardware cache hierarchy (TLBs, L1/2/3 caches), main memory contention, etc. I have not found existing tools insufficient for monitoring or dealing with VM swapping - OTOH I usually seek to keep swapping at zero and leave a little swap just to allow for some chance of alerting and recovery before the OOM killer kicks in.
zokier, almost 7 years ago
Digging through the LKML thread, this appears to be the corresponding userland component for the OOM use case:

https://github.com/facebookincubator/oomd

There was also a more minimal proof-of-concept example posted by the Endless OS guys:

https://gist.github.com/dsd/a8988bf0b81a6163475988120fe8d9cd
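To illustrate the concept only (this is not oomd's actual policy): poll the PSI memory "full" average and, when it stays above a threshold for several consecutive intervals, SIGKILL the largest-RSS process. The /proc/pressure path, the threshold, and the victim-selection rule below are illustrative assumptions.

    import os, re, signal, time

    THRESHOLD = 40.0   # "full avg10" percent; made-up policy knob
    SUSTAINED = 3      # consecutive polls above THRESHOLD before acting

    def full_avg10():
        with open("/proc/pressure/memory") as f:
            for line in f:
                if line.startswith("full"):
                    return float(re.search(r"avg10=([\d.]+)", line).group(1))
        return 0.0

    def biggest_rss_pid():
        best_pid, best_rss = -1, -1
        for pid in filter(str.isdigit, os.listdir("/proc")):
            try:
                with open(f"/proc/{pid}/statm") as f:
                    rss = int(f.read().split()[1])   # resident set size, in pages
            except (OSError, ValueError, IndexError):
                continue
            if rss > best_rss:
                best_pid, best_rss = int(pid), rss
        return best_pid

    strikes = 0
    while True:
        strikes = strikes + 1 if full_avg10() > THRESHOLD else 0
        if strikes >= SUSTAINED:
            victim = biggest_rss_pid()
            if victim > 0:
                print(f"sustained memory pressure, killing pid {victim}")
                os.kill(victim, signal.SIGKILL)
            strikes = 0
        time.sleep(1)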
teddyh, almost 7 years ago
Sounds good. The next step would be to start using this instead of load average in all the appropriate places, like batch(1), etc.
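A hedged sketch of that idea: gate job start on the PSI CPU "some avg60" figure instead of the load average that batch(1) consults. The /proc/pressure/cpu path follows the series; the 20% threshold and 30-second poll interval are made-up knobs.

    import subprocess, sys, time

    def cpu_some_avg60():
        with open("/proc/pressure/cpu") as f:
            for line in f:
                if line.startswith("some"):
                    for field in line.split()[1:]:
                        key, value = field.split("=")
                        if key == "avg60":
                            return float(value)
        return 0.0

    def run_when_quiet(cmd, threshold=20.0, poll=30):
        # Wait until less than `threshold` percent of the last minute was spent
        # with runnable tasks stalled on CPU, then launch the job.
        while cpu_some_avg60() > threshold:
            time.sleep(poll)
        subprocess.run(cmd)

    if __name__ == "__main__":
        run_when_quiet(sys.argv[1:])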
everybodyknows, almost 7 years ago
Curious that there is no mention of the existing "memory" cgroup. On some desktop Linux systems, you'll find it here:

    ls -l /sys/fs/cgroup/memory/

The 000-permission 'pressure_level' file controls asynchronous notifications to apps, advising prompt shedding of load. This is apparently the mechanism alluded to in a Googler's recent blog post, written from the point of view of Go server coding: https://news.ycombinator.com/item?id=17551012
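For reference, a sketch of registering for those notifications, assuming a cgroup-v1 memory controller mounted at /sys/fs/cgroup/memory and Python 3.10+ for os.eventfd; the kernel is expected to signal the eventfd when the requested level ("low", "medium", or "critical") is reached:

    import os

    CGROUP = "/sys/fs/cgroup/memory"   # adjust to the cgroup you care about
    LEVEL = "medium"                   # low | medium | critical

    efd = os.eventfd(0)                                # notification channel
    plfd = os.open(f"{CGROUP}/memory.pressure_level", os.O_RDONLY)
    with open(f"{CGROUP}/cgroup.event_control", "w") as ctl:
        ctl.write(f"{efd} {plfd} {LEVEL}")             # "<eventfd> <target fd> <level>"

    while True:
        os.eventfd_read(efd)                           # blocks until the kernel reports pressure
        print(f"memory pressure notification: {LEVEL} -- shed load here")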
politician, almost 7 years ago
I'm happy to see a new take on trying to produce a meaningful load metric.
cjhanks, almost 7 years ago
I have long looked for an efficient metric for measuring VM pressure. I hope to see this, or something like it, merged.