Linux /proc/pid/stat parsing bugs

88 points by ototot, over 2 years ago

15 comments

xeeeeeeeeeeenu, over 2 years ago
In my opinion, the fact that procfs is the only API for so many things is one of the biggest problems with Linux. BSDs have sysctl(), macOS has mach_* functions and, of course, Windows has a real API too.

Plain text interfaces lead to complicated, potentially insecure code (especially in C!), and they're slow and prone to race conditions.

I wish it were possible to retrieve that information using real syscalls. I think it's a better approach than, for example, inventing a faster way to read procfs: https://lwn.net/Articles/813827/

ilyt, over 2 years ago
I wish /proc and /sys would just agree on a serialization format and serialize the data into it, instead of having a bunch of files that each need their own parser.

mzs, over 2 years ago
> sudo was bitten by this back in the day (CVE-2017-1000367):
> https://www.openwall.com/lists/oss-security/2017/05/30/16

https://www.openwall.com/lists/oss-security/2022/12/22/5

woodruffw, over 2 years ago
The /proc/<pid>/* hierarchy has always been a bit of a mess to parse.

/proc/<pid>/maps is similarly frustrating: there's no clear distinction between "special" maps (like the stack) and a file that might just happen to be named `[stack]`. Similarly, the handling for a mapped region on a deleted file is simply to append " (deleted)"[1].

[1]: https://github.com/woodruffw/procmaps.rs/blob/79bd474104e9b3c853e49765e6ee9945fdf833ed/src/lib.rs#L206-L225
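
To make the ambiguity concrete, here is a minimal sketch of a maps-line parser in Python. It is not the approach taken by procmaps.rs; the `MapsEntry` type and `parse_maps_line` function are hypothetical names, and both flags are heuristics. It assumes that kernel-generated names such as `[stack]` come with inode 0, and it can only report that a path *ends with* " (deleted)", not that the file really was deleted.

```python
from __future__ import annotations

from dataclasses import dataclass

DELETED_SUFFIX = " (deleted)"


@dataclass
class MapsEntry:
    start: int
    end: int
    perms: str
    offset: int
    dev: str
    inode: int
    pathname: str
    is_special: bool     # heuristic: bracketed name with inode 0
    maybe_deleted: bool  # heuristic: a file could really be named "x (deleted)"


def parse_maps_line(line: str) -> MapsEntry:
    # The first five fields contain no spaces; whatever follows is the
    # pathname, which may itself contain spaces.
    parts = line.rstrip("\n").split(None, 5)
    addr, perms, offset, dev, inode = parts[:5]
    pathname = parts[5] if len(parts) > 5 else ""
    start, end = (int(x, 16) for x in addr.split("-"))
    inode_num = int(inode)
    is_special = (
        inode_num == 0 and pathname.startswith("[") and pathname.endswith("]")
    )
    return MapsEntry(
        start=start,
        end=end,
        perms=perms,
        offset=int(offset, 16),
        dev=dev,
        inode=inode_num,
        pathname=pathname,
        is_special=is_special,
        maybe_deleted=pathname.endswith(DELETED_SUFFIX),
    )
```

The flags are deliberately named as guesses: the text format gives a parser no way to be sure in either case.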

esprehn, over 2 years ago
The system level fix is to create a structured record format. That could mean quoting all the records or maybe Linux should finally adopt a standardized format like JSON.

jbverschoor, over 2 years ago
Why do you have to parse this kind of stuff at all?

Time to let go of the idea that everything is a stream of unorganized characters.

kbrazil, over 2 years ago
Fortunately `jc`[0] does parse `/proc/<pid>/stat` correctly. I, of course, originally implemented it the naive/incorrect way until a contributor fixed it. :)

```
$ cat /proc/2001/stat | jc --proc
{"pid":2001,"comm":"my program with\nsp","state":"S","ppid":1888,"pgrp":2001,"session":1888,"tty_nr":34816,"tpg_id":2001,"flags":4202496,"minflt":428,"cminflt":0,"majflt":0,"cmajflt":0,"utime":0,"stime":0,"cutime":0,"cstime":0,"priority":20,"nice":0,"num_threads":1,"itrealvalue":0,"starttime":75513,"vsize":115900416,"rss":297,"rsslim":18446744073709551615,"startcode":4194304,"endcode":5100612,"startstack":140737020052256,"kstkeep":140737020050904,"kstkeip":140096699233308,"signal":0,"blocked":65536,"sigignore":4,"sigcatch":65538,"wchan":18446744072034584486,"nswap":0,"cnswap":0,"exit_signal":17,"processor":0,"rt_priority":0,"policy":0,"delayacct_blkio_ticks":0,"guest_time":0,"cguest_time":0,"start_data":7200240,"end_data":7236240,"start_brk":35389440,"arg_start":140737020057179,"arg_end":140737020057223,"env_start":140737020057223,"env_end":140737020059606,"exit_code":0,"state_pretty":"Sleeping in an interruptible wait"}
```

[0] https://kellyjonbrazil.github.io/jc/docs/parsers/proc_pid_stat

smasher164, over 2 years ago
Makes you wonder if it's really that valuable to have all our infrastructure built on parsing text.

avar, over 2 years ago
I noticed this around a year ago when writing a /proc/<pid>/stat parser for git (for logging the chain of parent processes).

Here's that commit; it has a comment with an overview of the kernel limits and caveats involved: https://github.com/git/git/commit/2d3491b117c6dd08e431acc3904a546c4304d276

bigcat12345678, over 2 years ago
We at pixie.io ran into this exact problem. We fixed it by parsing the parentheses; ugly, but it seems to work.

https://github.com/pixie-io/pixie/blob/bd82bb48ef4da7d6b05f27fe7728dfff6687a5c6/src/common/system/proc_parser.cc#L227

cryptonector, over 2 years ago
The process name should have been last. Now parsers have to split on space and then take the first token and the last N-2 tokens to leave behind the tokens that make up the second field, then join those with spaces to reconstruct the second field (or use the length of the first and the offset of the third fields to re-parse the second).
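
The token-counting variant described above could be sketched roughly as follows. `split_stat_by_count` is a hypothetical helper, and the caller has to supply the total field count `n_fields`, which is the weak point of this approach: the number of fields has grown across kernel versions. The value 52 used in the usage line is an assumption matching the field list documented in proc(5) for current kernels.

```python
from __future__ import annotations


def split_stat_by_count(line: str, n_fields: int) -> list[str]:
    """Split a /proc/<pid>/stat line given the expected number of fields.

    Field 1 is the first token and the last n_fields - 2 tokens are fields
    3..N; whatever tokens remain in between are rejoined as field 2 (comm).
    """
    # Split on single spaces only, so rejoining reproduces comm exactly,
    # including repeated spaces or embedded newlines.
    tokens = line.rstrip("\n").split(" ")
    n_tail = n_fields - 2
    if len(tokens) < n_fields:
        raise ValueError("fewer tokens than expected fields")
    pid = tokens[0]
    tail = tokens[-n_tail:]
    comm = " ".join(tokens[1:len(tokens) - n_tail])
    if not (comm.startswith("(") and comm.endswith(")")):
        raise ValueError("comm field is not wrapped in parentheses")
    return [pid, comm[1:-1], *tail]


# Usage (assumes a 52-field kernel):
# fields = split_stat_by_count(open("/proc/self/stat").read(), n_fields=52)
```

If the assumed count is wrong for the running kernel, tokens are silently misattributed to comm or to the tail, so this sketch is only as reliable as that number.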

inetknght, over 2 years ago
It's almost as if there should be an API for procfs instead of having everyone write their own reader and parser...

YesThatTom2, over 2 years ago
If there is exactly one field with the “may contain spaces” problem there’s a better solution: parse the line forwards for the fields up to that one, parse the line backwards for the remaining.
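
A minimal sketch of that forwards-then-backwards idea, in Python. `parse_stat` is a hypothetical name. It relies on the assumption that every field after the process name is either a single state letter or an integer, so none of them can contain ")" and the last ")" on the line must close the name.

```python
from __future__ import annotations


def parse_stat(line: str) -> tuple[int, str, list[str]]:
    """Return (pid, comm, remaining fields) from a /proc/<pid>/stat line."""
    open_idx = line.index("(")    # forwards: the pid ends before the first "("
    close_idx = line.rindex(")")  # backwards: the name ends at the last ")"
    pid = int(line[:open_idx])
    comm = line[open_idx + 1:close_idx]
    rest = line[close_idx + 1:].split()
    return pid, comm, rest


# Usage:
# pid, comm, rest = parse_stat(open("/proc/self/stat").read())
# state, ppid = rest[0], int(rest[1])
```

Unlike a token-counting split, this does not need to know how many fields the running kernel emits.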

mort96, over 2 years ago
I've been bitten by this and tried to work around it as well. From what I can tell, the best you can really do is to parse by matching up parens, but someone could totally make a program with parentheses in its name. If I make a binary called "foo) R 10 20 30", the /proc/<pid>/stat entry will contain "1715376 (foo) R 10 20 30) 1544883 1715376 1544883...". It's terribly non-obvious how to deal with this correctly.
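
To illustrate why balancing parentheses is not enough, here is a small Python sketch. `comm_by_matching_parens` is a hypothetical helper, and the sample line is made up for illustration: it uses the binary name from the comment above, with a state letter "S" added after the name.

```python
from __future__ import annotations


def comm_by_matching_parens(line: str) -> tuple[str, str]:
    """Return (comm, remainder) by balancing parentheses from the first "("."""
    start = line.index("(")
    depth = 0
    for i in range(start, len(line)):
        if line[i] == "(":
            depth += 1
        elif line[i] == ")":
            depth -= 1
            if depth == 0:
                # Stops at the first point where the parentheses balance.
                return line[start + 1:i], line[i + 1:]
    raise ValueError("unbalanced parentheses")


# Hypothetical stat line for a binary named "foo) R 10 20 30".
sample = "1715376 (foo) R 10 20 30) S 1544883 1715376 1544883"
comm, remainder = comm_by_matching_parens(sample)
# comm is cut short to "foo", and the remainder begins with text the
# process name itself supplied, so a parser would read state "R", ppid 10.
```

Because the name is under the process's control, the misparsed fields are attacker-chosen values, not just garbage. Searching from the end of the line for the last ")" avoids this as long as no later field can contain one.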

graymatters, over 2 years ago
Aside from bashing a paradigm one is not used to/doesn’t like/didn’t grow up with, what are the real cases where dealing with the textual output of procfs creates serious, realistic performance issues? The argument about the insecurity of a hand-rolled C parser for it is utterly unconvincing.