File Descriptor Transfer over Unix Domain Sockets

117 points by talonx, over 4 years ago

17 comments

rwmj, over 4 years ago
The article doesn't mention that file descriptors "in flight" over sockets are garbage collected if the listening process doesn't pick them up. This has been the subject of several bugs/security issues: https://nvd.nist.gov/vuln/detail/CVE-2008-5029 https://lwn.net/Articles/779472/

Al Viro's description sums up one of the recent problems (which was fixed):

> Among the features provided by io_uring is the ability to "register" one or more files with an open ring; that speeds I/O operations by eliminating the need to acquire and release references to the registered files every time. When a file is registered with an io_uring, the kernel will create and hold a reference for the duration of that registration. This is a useful feature but it contained a problem that, seemingly, only somebody with a Viro-level understanding of the VFS could spot, describe, and fix; it is a new variant on the cycle problem described above. In short: a process could create a Unix-domain socket and register both ends with an io_uring. If it were then to pass the file descriptor corresponding to the io_uring itself over that socket, then close all of the file descriptors, a cycle would be created. The io_uring code was unprepared for that eventuality.
jmgao, over 4 years ago
This doesn't list the biggest gotcha of all, the fact that the cmsg API is incredibly sharp and there are at least 7 ways you can screw it up. Nearly every single use of cmsg in Android's source tree was buggy in at least one of these ways:

    - not aligning the cmsg buffer
    - leaking fds if more fds are received than expected
    - blindly dereferencing CMSG_DATA without checking the header
    - using CMSG_SPACE(fd_count) instead of CMSG_SPACE(fd_count * sizeof(int))
    - using CMSG_SPACE instead of CMSG_LEN for .cmsg_len
    - using CMSG_LEN instead of CMSG_SPACE for .msg_controllen
    - using a length specified in number of fds instead of bytes

It's possible that Android is uniquely bad at this, but I'm skeptical.
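
Several of the pitfalls listed above can be illustrated with a minimal sketch in Python, which wraps the same sendmsg/recvmsg and SCM_RIGHTS machinery. This is not code from the article or from Android. The helper names `send_fds_raw` and `recv_fds_checked` are hypothetical, and the sketch assumes a Unix platform where `socket.CMSG_SPACE` is available. It sizes the control buffer in bytes, checks the control message header before reading descriptors out of it, and closes surplus descriptors instead of leaking them.

```python
import array
import os
import socket

INT_SIZE = array.array("i").itemsize  # size of one descriptor in bytes


def send_fds_raw(sock, payload, fds):
    """Send payload plus fds as a single SCM_RIGHTS control message."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", fds))]
    return sock.sendmsg([payload], ancillary)


def recv_fds_checked(sock, max_payload, expected_fds):
    """Receive at most expected_fds descriptors and close any surplus ones."""
    # The buffer size is in bytes: room for expected_fds ints, not expected_fds bytes.
    ctrl_size = socket.CMSG_SPACE(expected_fds * INT_SIZE)
    data, ancdata, flags, _addr = sock.recvmsg(max_payload, ctrl_size)

    fds = array.array("i")
    for level, ctype, cdata in ancdata:
        # Check the header before treating the payload as descriptors.
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            usable = len(cdata) - (len(cdata) % INT_SIZE)
            fds.frombytes(cdata[:usable])

    received = list(fds)
    keep, surplus = received[:expected_fds], received[expected_fds:]
    for fd in surplus:
        os.close(fd)  # alignment padding can let extra fds through; do not leak them
    truncated = bool(flags & socket.MSG_CTRUNC)
    return data, keep, truncated
```

The third return value reports whether the kernel set MSG_CTRUNC, which signals that the sender passed more control data than the buffer could hold.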
notaplumber, over 4 years ago
File descriptor passing has been a common technique in privilege-separated designs and is used quite extensively in OpenBSD software. Other notable examples include Google's Chrome browser. It's quite telling how it's not mentioned once in this article.

Combined with OS security features like pledge(2) on OpenBSD, which has separate sendfd/recvfd promises, and unveil(2), an unprivileged process can have its access to the filesystem and other system attack surfaces (system calls, ioctls) removed completely or restricted, so that it can only act on file descriptors passed by a privileged parent.

https://man.openbsd.org/pledge.2

https://man.openbsd.org/unveil.2

A skeleton example of a common style for OpenBSD privsep daemons, which uses 3 processes:

https://github.com/krwesterback/newdctl

https://github.com/krwesterback/newd

This uses OpenBSD's imsg(3) API, an abstraction around the underlying Unix sendmsg/SCM_RIGHTS functionality, along with other IPC abstractions.

https://man.openbsd.org/imsg_init.3

https://github.com/tmux/tmux/blob/master/compat/imsg.c

https://github.com/tmux/tmux/blob/master/compat/imsg.h
apankrat, over 4 years ago
FD transfer is also used when a process needs to work with files (or devices) that are out of its reach due to account restrictions.

In this case, the process talks to another process that *does* have the required access; the latter opens the file of interest and passes the handle back.

This is needed very rarely, but in cases where it's a good fit, it provides a very elegant and simple solution for an otherwise hairy problem.

One such case is when the program uses an Engine + UI model, whereby the engine runs under a system account and the UI runs under an interactive user. As the engine runs, it writes logs and the UI needs to display them. One solution is to tweak permissions on the log files to make them universally readable. It's not hard to do, but it makes the whole thing more fragile: these permissions may get inadvertently stripped off, the UI process may be sandboxed by an antivirus, etc. That is, the program may end up in a state where the UI cannot access the logs, but the engine can.

The alternative here is for the UI process to ask the engine to open the logs and pass the handles back. Very simple to do and resistant to accidental breakage.

Another case was when we had to ship a pre-built Linux binary (a VPN client) that needed to open a TAP device. The latter normally requires root access, but the client was closed-source, so it had to be able to run under restricted user accounts, because asking people to run it as root was not an option. The solution was to make a small open-source daemon that listened on a domain socket for requests to open /dev/tapx, did that, and passed the tap FDs back to the requesting process.
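
The "ask a more privileged process to open the file and pass the handle back" pattern described above might look roughly like the following sketch. It assumes Python 3.9+ (for `socket.send_fds` and `socket.recv_fds`) on a Unix system. The function names, the one-request-per-connection protocol, and the allow-list are all hypothetical simplifications, not what the engine or the VPN daemon actually did.

```python
import os
import socket


def serve_one_open_request(conn, allowed_paths):
    """Helper side: open a requested path and pass the descriptor back."""
    path = conn.recv(4096).decode()
    if path not in allowed_paths:
        conn.sendall(b"denied")  # never open paths outside the allow-list
        return
    fd = os.open(path, os.O_RDONLY)
    try:
        # The payload carries the status; the descriptor rides along as SCM_RIGHTS.
        socket.send_fds(conn, [b"ok"], [fd])
    finally:
        os.close(fd)  # the in-flight copy keeps the file open for the receiver


def request_open(conn, path):
    """Client side: ask the helper to open path; return an fd or None."""
    conn.sendall(path.encode())
    msg, fds, _flags, _addr = socket.recv_fds(conn, 16, 1)
    return fds[0] if msg == b"ok" and fds else None
```

The client never needs read permission on the file itself; it only needs to be able to connect to the helper's socket, so access control moves to the helper's allow-list.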
secure, over 4 years ago
I'm using this technique in manpages.debian.org, which uses mandocd, a daemon which allows converting many manpages (using mandoc) without the exec overhead: https://github.com/Debian/debiman/commit/3715b1eaf9c1793b9a8c7b1787e2d6511ca2b004

I'm transferring the stdin, stdout and stderr file descriptors instead of starting new processes :)
wallstprog, over 4 years ago
FWIW this technique is described in "Unix Network Programming" by W. Richard Stevens (http://www.kohala.com/start/unp.html) in section 6.1: "Passing File Descriptors"
kortilla, over 4 years ago
Unless you have really long-lived connections you cannot drop, let your load balancer layer above the machine drain old connections and then just restart after some threshold. I've seen too many bugs in socket handover scenarios to make it worth it in basically every normal use case.

Remember, just handing over your sockets means you get connections in all kinds of different phases of your protocol's state machine. So now you need to bolt on some more context transfer mechanisms as well...
bhawks, over 4 years ago
Android makes liberal use of exchanging file descriptors between processes in its IPC mechanisms. It's a slightly different use case than what the article discusses, but it's an interesting pattern available for multiprocess, same-host IPC.
kevinoid, over 4 years ago
Another real-world use case is zero-copy interprocess communication, as in the Wayland protocol <https://wayland-book.com/surfaces/shared-memory.html>. It can also be combined with sealed files <https://lwn.net/Articles/593918/> to avoid some of the pitfalls of shared memory.
peter_d_sherman, over 4 years ago
>"Socket Takeover enables Zero Downtime Restarts for Proxygen by spinning up an updated instance in parallel that takes over the listening sockets, whereas the old instance goes into graceful draining phase. The new instance assumes the responsibility of serving the new connections and responding to health-check probes from the L4LB Katran. Old connections are served by the older instance until the end of draining period, after which other mechanism (e.g.,Downstream Connection Reuse) kicks in."

Great idea!

Now, that being said, this idea, or rather, this specific solution to this specific problem -- is actually a subset of a much broader problem in Computer Science, and that is:

*How to move any OS component (up to and including running programs that may have many files, locks, sockets and other shared OS objects open) to another OS on another machine, without causing any problems!*

That is, how to move such entities *robustly*.

There are various ideas in this field (of which the above paper/article is one) -- but due to the complexities involved, there are no easy answers (at least, not as far as moving whole running programs with many shared OS objects goes).

At least one, and probably several, experimental OSes have been created in the past which attempt to do this -- but they aren't mainstream, and without doing more research, I'm not sure how robust (which is always a subjective term!) they were...

But, it's a fascinating area of Computer Science, to be sure.

Anyway, great idea and great article in this area!
neomantra, over 4 years ago
Using UDS to seamlessly move Proxygen workloads like that is so slick!

Here's how we've used this file descriptor transfer feature:

We made a transport which accepts local connections on a Unix Domain Socket and then "upgrades" that connection to two pipes (read and write). Those pipes are passed over the UDS and the client/server communicate over them.

We use a kernel-bypass library (OpenOnload) that implements pipes as shared memory in user-space. Very low latency and high throughput.

We made a `boost::asio` implementation of this available on GitHub. It is old and I'm not sure if it works with the latest Boost, but it is quite readable for people to play with. We once bolted it onto Redis' Unix Socket transport for fun, but abandoned it as it was a hassle to maintain.

https://github.com/neomantra/asio-pipe-transport
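
The "upgrade a UDS connection to two pipes" step could be sketched as below. This uses ordinary kernel pipes and Python 3.9+; it is not the boost::asio implementation linked above and does not involve OpenOnload. The function names and the b"upgrade" marker are assumptions made for the example.

```python
import os
import socket


def offer_pipe_pair(conn):
    """Server side: create two pipes and hand the client its two ends."""
    c2s_read, c2s_write = os.pipe()  # client writes, server reads
    s2c_read, s2c_write = os.pipe()  # server writes, client reads
    # The client receives [read end of s2c, write end of c2s], in that order.
    socket.send_fds(conn, [b"upgrade"], [s2c_read, c2s_write])
    os.close(s2c_read)   # the client now owns these two ends
    os.close(c2s_write)
    return c2s_read, s2c_write  # the server's (read_fd, write_fd)


def accept_pipe_pair(conn):
    """Client side: receive the two pipe ends as (read_fd, write_fd)."""
    msg, fds, _flags, _addr = socket.recv_fds(conn, 16, 2)
    if msg != b"upgrade" or len(fds) != 2:
        for fd in fds:
            os.close(fd)  # do not leak descriptors from a malformed offer
        raise ConnectionError("pipe upgrade failed")
    return fds[0], fds[1]
```

After the exchange, the Unix socket is only needed for control traffic (or can be closed), and the data path runs over the two pipes.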
jfrunyon, over 4 years ago
Also useful for running one process that provides access to certain files to other processes.

You can further restrict it using the fact that you can easily verify the PID/UID/GID on the other end of the UNIX socket (see SO_PEERCRED in https://man7.org/linux/man-pages/man7/unix.7.html). You can also manually send your PID/UID/GID (which allows you to specify any of your real, effective, or saved set IDs; or, if you're root, anything).
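
Checking the peer's credentials before handing out descriptors might look like this minimal, Linux-only sketch. The function names are hypothetical; the layout of `struct ucred` as a `pid_t` followed by a `uid_t` and a `gid_t` is taken from the Linux unix(7) man page.

```python
import socket
import struct

# struct ucred on Linux: pid_t pid; uid_t uid; gid_t gid;
UCRED_FORMAT = "iII"


def peer_credentials(sock):
    """Return (pid, uid, gid) for the peer of a connected Unix socket."""
    size = struct.calcsize(UCRED_FORMAT)
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, size)
    return struct.unpack(UCRED_FORMAT, raw)


def peer_is_allowed(sock, allowed_uids):
    """Decide whether to serve this peer before passing it any descriptors."""
    _pid, uid, _gid = peer_credentials(sock)
    return uid in allowed_uids
```

A server would call `peer_is_allowed(conn, {1000})` (with whatever UIDs it trusts) right after `accept()` and close the connection on a mismatch.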
jitl, over 4 years ago
This can also be a handy technique when dealing with CLI tools that take a long time to boot/warm. Leave a zygote/server process running in the background that's warmed, serving a UDS. When you want to invoke the tool, have a lightweight client connect to the server UDS and send argv, the env vars, cwd, and open file descriptors across the UDS to the zygote. The zygote forks, and the fork sets the env vars, argv, etc. on itself and then runs the job.

The only problem with this is cancellation: you'll need your client to propagate signals to the forked runner process as well.
Canada, over 4 years ago
This technique is also used to implement privilege separation in OpenSSH.
bogomipz, over 4 years ago
Can someone say what method Nginx uses to pass file descriptors from the listening or master socket to the workers? Does the Nginx master process accept() new connections and use sendmsg() with SCM_RIGHTS to send those new connections to worker processes?
bla3, over 4 years ago
Plan9 had the dedicated sendfd()/recvfd() functions, which seems like a friendlier API.
touisteur, over 4 years ago
I have one question. Could this be done better with CRIU? I mean transferring the socket from one process to another, or from one netns to another? libsoccr is just a kind of socket serialisation library, right? :-)