This also caused a lot of trouble for time libraries in Rust. The two foundational libraries, chrono and time, rely on localtime_r to get the local time instead of the clock value in UTC. localtime_r reads the TZ environment variable (and optionally others like TZDIR). Rust declares it safe to modify the environment, while POSIX declares it unsafe.<p>CVE-2020-26235, RUSTSEC-2020-0071 and RUSTSEC-2020-0159 were opened against the crates. That left the Rust ecosystem with a pretty much unsolvable issue for many months. Chrono went with the solution to parse the timezone database of the OS natively and read the environment using the Rust locks. Time tries to detect if the libc version has thread-safety guarantees to access the environment, and otherwise panics if there are multiple threads.<p>More reading: <a href="https://docs.rs/chrono/latest/chrono/#security-advisories" rel="nofollow noreferrer">https://docs.rs/chrono/latest/chrono/#security-advisories</a>
> C Doesn't Want to Fix It<p>Or: C knows that <i>it doesn't need fixing.</i><p>How often do I need to `setenv()` anything? The answer is "Never" in the vast majority of programs, because envvars are usually read rather than set, so this issue is nonexistent for them.<p>For the vast majority of the small number of programs that actually need to use `setenv()`, the answer is: "Maybe once or twice during the entire lifetime of the process, and then only at the very start, probably even before running any threads", meaning this issue is nonexistent for them as well.<p>So, is there a potential issue with thread safety? Yes. Does it matter given where and under what circumstances it occurs? Not really.<p>> such as Go's os.Setenv (Go issue)<p>Here is the link to the "issue":<p><a href="https://github.com/golang/go/issues/63567">https://github.com/golang/go/issues/63567</a><p>What kind of actual real life production code would continuously set envvars while simultaneously calling a function that tries to read the environment?<p>Yes, this is a footgun. But even the issue's author acknowledges, in the issue thread:<p><pre><code> Realistically: this is a pretty rare problem, and documenting
it is probably a fine solution. This is probably going to cost
someone else a couple of days of debugging every couple of
years
</code></pre>
> It has wasted thousands of hours of people's time, either debugging the problems, or debating what to do about it.<p>Source?
This is silly, setenv() isn't reentrant for the same reason that getopt() isn't reentrant: there's no valid reason to use it except at the very beginning of the program.<p>The most common misuse I see is changing env before forking a child: nobody has to do that, execve() lets you pass arbitrary envp to the new process without changing yours.<p>If you need to change env in threaded tests... frankly I think there was probably a better way to do whatever you're doing, but you can just declare a global lock and use it. I bet you could even LD_PRELOAD a custom setenv() that uses your lock.<p>Nobody is pointing at concrete problems outside of Rust. Rust is just wrong here, sorry, the manpage has said this for a long time:<p>> POSIX.1 does not require setenv() or unsetenv() to be reentrant.<p>I think a more intellectually honest version of this article would have been "POSIX should have made setenv() reentrant", not "C is buggy": it's not buggy, it obviously complies with the standard. There's nothing to "fix", he wants to change the standard.
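To make the execve() point concrete, here is a minimal sketch of spawning a child with its own environment while leaving the parent's untouched. The helper name `run_with_env` and the `DEMO_FLAG` variable are made up for illustration, and it assumes a POSIX system with `/bin/sh`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `sh -c script` with exactly the environment
 * given in child_env (a NULL-terminated array of "NAME=value" strings).
 * The parent's own environment is never modified.
 * Returns the child's exit status, or -1 on failure. */
int run_with_env(const char *script, char *const child_env[])
{
    assert(script != NULL && child_env != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: only async-signal-safe calls between fork and exec. */
        char *const argv[] = { "sh", "-c", (char *)script, NULL };
        execve("/bin/sh", argv, child_env);
        _exit(127); /* reached only if execve failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called as `run_with_env("test \"$DEMO_FLAG\" = 1", env)` with `env = { "DEMO_FLAG=1", NULL }`, the child sees the variable and the parent never calls setenv(). posix_spawn() takes an envp argument in the same way.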
> This is a list of some uses of environment variables from fairly widely used libraries and services. This shows that environment variables are pretty widely used.<p>Widely used, yes. Used as in read. Why do any of these need to change at runtime? And if they do - why are they environment variables?<p>(NB: starting a new process is not "at runtime")
The essential problem is that there is no thread-safe way to implement this while maintaining backwards compatibility -- applications can alter the environment block by changing the environ global pointer, applications can also alter the environment block by replacing individual pointers in the environ array, applications can also alter the environment block by altering the strings pointed to by the individual members of the environ array, applications can also alter the environment block by using setenv/putenv/etc.<p>Inserting a mutex into the setenv/getenv/etc. functions is pointless because applications are explicitly allowed to modify the environ pointer and array directly without any locking.
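A tiny sketch of why a mutex inside setenv()/getenv() cannot cover everything: as far as I know, POSIX lets a program replace the whole environment by assigning to `environ`, and that is a plain pointer store that goes through no libc function at all. The function name `replace_environment` is made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

extern char **environ;

/* Hypothetical helper: swap in a whole new environment block with a single
 * pointer assignment. No libc function is called, so no lock hidden inside
 * setenv()/getenv() can ever observe or serialise this write. */
void replace_environment(char **new_env)
{
    assert(new_env != NULL);
    environ = new_env;
}
```

After `replace_environment(demo_env)` with `demo_env = { "DEMO_DIRECT=1", NULL }`, getenv("DEMO_DIRECT") returns "1" and every previously set variable is gone, all without touching the setenv() family.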
Eyra is a new implementation of libc in Rust that addresses this in its default configuration:<p><i>"Eyra solves this by having setenv etc. just leak the old memory. That ensures that it stays valid for as long as any thread needs it. Granted, leaking isn't great, and Eyra makes it configurable with the threadsafe-setenv cargo feature, so it can be disabled in favor of the thread-unsafe implementation. "</i><p><a href="https://blog.sunfishcode.online/eyra-does-the-impossible/" rel="nofollow noreferrer">https://blog.sunfishcode.online/eyra-does-the-impossible/</a>
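For anyone wondering what "just leak the old memory" buys you, here is a toy sketch in C. This is not Eyra's code (Eyra is written in Rust) and it manages its own private table rather than the real process environment; the names `leaky_setenv` and `leaky_getenv` are invented. The point is only that a pointer handed out by the getter stays valid forever, because nothing is ever freed or overwritten.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* One immutable entry; newer entries shadow older ones with the same name. */
struct env_node {
    const char *name;
    const char *value;
    struct env_node *next; /* never changes after the node is published */
};

static _Atomic(struct env_node *) env_head = NULL;
static pthread_mutex_t env_write_lock = PTHREAD_MUTEX_INITIALIZER;

static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Publish a new value by prepending a node. The old node and its strings
 * are deliberately leaked so that readers holding them stay safe. */
int leaky_setenv(const char *name, const char *value)
{
    assert(name != NULL && value != NULL);
    struct env_node *node = malloc(sizeof *node);
    char *n = dup_string(name);
    char *v = dup_string(value);
    if (node == NULL || n == NULL || v == NULL) {
        free(node); free(n); free(v);
        return -1;
    }
    node->name = n;
    node->value = v;

    pthread_mutex_lock(&env_write_lock); /* serialise writers only */
    node->next = atomic_load_explicit(&env_head, memory_order_relaxed);
    atomic_store_explicit(&env_head, node, memory_order_release);
    pthread_mutex_unlock(&env_write_lock);
    return 0;
}

/* Lock-free lookup; the returned pointer is never invalidated. */
const char *leaky_getenv(const char *name)
{
    assert(name != NULL);
    struct env_node *n = atomic_load_explicit(&env_head, memory_order_acquire);
    for (; n != NULL; n = n->next)
        if (strcmp(n->name, name) == 0)
            return n->value;
    return NULL;
}
```

The cost is exactly what the quote says: every update leaks the previous node and strings, which is tolerable only if updates are rare.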
<i>Should</i> apps orchestrate a super-global lock of a foreign namespace?<p>An environment variable's value, for a running process, is just what it is: an initial value from outside.<p>Adding complexity around it smells like an attempt to control a distributed mutex, like checking an API for real-time value changes in a while loop across several instances of the same app.<p>I thought there would be alternatives to this, like pubsub, Kafka, or other asynchronous event handling.<p>Imagine having to test an app for its ability to handle safe read-write of OS-level state. It's definitionally bankrupt: not really a unit, not easy to set up quickly, and not isolated.
Cool, ever since that rachelbythebay article [1] I was wondering how different libcs handle the issue! Nice to see someone else confirming the behavior of Apple's libc. It's not mentioned in the article, but while Apple's libc seems to suffer from the use-after-free issue, if I'm reading it right it does seem to have locking for setenv/getenv [3]<p>[1] <a href="https://news.ycombinator.com/item?id=37908655">https://news.ycombinator.com/item?id=37908655</a>
[2] <a href="https://news.ycombinator.com/item?id=37952916">https://news.ycombinator.com/item?id=37952916</a>
[3] <a href="https://github.com/apple-open-source-mirror/Libc/blob/master/stdlib/FreeBSD/getenv.c">https://github.com/apple-open-source-mirror/Libc/blob/master...</a>
What is the use case for a mutable environment past initialization?<p>It seems like a complicated and error prone thing to be using no matter if it is thread safe or not. You can set up your own environment before you launch threads, and you can launch child processes with a different environment from the current process without modifying your own. If you fork, you can modify the environment in the child without affecting the parent until you exec.<p>And even if setenv() were reentrant and couldn't cause crashes, it still wouldn't be thread safe, since threads share the environment and could get their environment changed under their feet.
The described problem (thread safety) for a global configuration seems mostly a misunderstanding by the author.<p>The usual case for modifying a global state is: modify once, then proceed (e.g. start new threads). Even if all the calls become thread safe, the behavior would be inconsistent, still.
And to re-iterate my point from another thread on setenv, no, "just don't call setenv() after creating threads" is not a solution because even if the code <i>you</i> wrote may be single-threaded, your application as a whole is not composed entirely of code written by you: the moment you link against <i>any</i> 3rd-party library, your program can have arbitrarily many threads before it even reaches main().
Come on! Of all C/Posix thread-safety issues in circulation this can plausibly be considered the most moot one.<p>Environment variables are not meant to be an inter-thread communication channel and the documentation that points out setenv() is not thread-safe is very much a fair shot.<p>You rarely, if ever, need to setenv() anything maybe unless you're a shell. For spawning children execve() already takes an envp parameter. For debugging I think I've mostly set in-process environment variables manually from gdb.<p>Further, because environment variables are an interface between the process and its environment you typically read environment variables at start and cache the parsed values in some internal location. If you need to change that global state on the go you should do it using your own internal variables instead of recycling it through the environment and having the program threads repeatedly getenv() the updated values.
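A minimal sketch of the "read at start and cache the parsed values" pattern described above. The variable names `APP_VERBOSE` and `APP_LOG_DIR`, the `/tmp` default and the struct layout are all invented for illustration; the assumption is that `app_config_init()` is called once from main() before any thread is created.

```c
/* POSIX feature-test macro; harmless here, needed if callers use setenv(). */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parsed snapshot of the settings the program cares about. */
struct app_config {
    int verbose;
    char log_dir[256];
};

static struct app_config g_config;
static int g_config_loaded;

/* Call once at startup, before creating threads. After this returns, the
 * program never needs to call getenv() for these settings again. */
void app_config_init(void)
{
    const char *verbose = getenv("APP_VERBOSE");
    const char *log_dir = getenv("APP_LOG_DIR");

    g_config.verbose = (verbose != NULL && strcmp(verbose, "1") == 0);
    /* Copy the string: the pointer returned by getenv() is not ours to keep. */
    snprintf(g_config.log_dir, sizeof g_config.log_dir, "%s",
             log_dir != NULL ? log_dir : "/tmp");
    g_config_loaded = 1;
}

/* Read-only accessor; safe from any thread once init has completed. */
const struct app_config *app_config(void)
{
    assert(g_config_loaded);
    return &g_config;
}
```

Any later change to the environment no longer affects the cached values, which is the whole point: runtime changes go through your own variables, not back through the environment.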
Using environment variables for global mutable state isn't exactly good practice, is it?<p>I can't think of any time I had wanted to do that.<p>What exactly are the programs that break if this changes?
I may misunderstand what “Extension to the ISO C standard” can mean, but <i>getenv</i> isn’t thread-safe, either.<p><a href="https://pubs.opengroup.org/onlinepubs/9699919799/" rel="nofollow noreferrer">https://pubs.opengroup.org/onlinepubs/9699919799/</a>:<p><i>“The getenv() function need not be thread-safe”</i><p>I expect most if not all implementations are more robust.
So you make it thread-safe. Now what? Just because you don't get a data-race or undefined behavior, it doesn't make setenv/getenv usable across threads without any synchronization anyway.<p>My take on it is that global mutable state is owned by the application, library code should never ever mutate it. Applies to the environment variables, stdout/stderr, locale.<p>The application must ensure that when these are mutated they are not read concurrently by another thread. As external libraries rarely document the exact conditions when they read environment variables, the best is to only update the environment when no other thread is running. The absolute best is to avoid mutating it altogether.
Interestingly, Windows developers made much better choices back in the 1990s.<p>GetEnvironmentStrings() API comes with the FreeEnvironmentStrings() counterpart. Whoever calls GetEnvironmentStrings() to get the entire environment is then responsible for calling FreeEnvironmentStrings(), which allows a thread-safe GetEnvironmentStrings() API.<p>GetEnvironmentVariable() API is even simpler, it doesn’t return a pointer, instead it fills a caller-provided buffer.
> Since many libraries are configured through environment variables, a program may need to change these variables to configure the libraries it uses. This is common at application startup. This causes programs to need to call setenv(). Given this issue, it seems like libraries should also provide a way to explicitly configure any settings, and avoid using environment variables.<p>This is the only correct solution. Even threadsafe, setenv doesn't guarantee anything about when the variable will take effect. There is no way for consumers to be notified of a changed variable. For that you need guarantees from the library and at that point the library can just as well provide a better configuration interface. Keep environment variables static for the process lifetime and there are no issues.
Lots of C isn't thread safe, and that's the point. It's supposed to be close to the metal, and it's supposed to be small. If you want to add additional functionality then that's on the user/library.
If we add thread safe/memory safe versions of every function, then we're moving away from C and into Rust territory.<p>C is a powerful language, but it's not a language designed to hold your hand.<p>Edit: Especially in this case where the environment is maintained by the OS, that puts the onus of safety on the OS to ensure that different processes can't modify and read the env simultaneously.<p>If you're worried about reading and writing the env in different threads within the same process, then you need to reconsider your design.
WG14 also doesn't want to provide safer string and array manipulation libraries for decades, even Dennis Ritchie failed to get fat pointers into ISO C, why should fixing this be any different?
Every time I hear someone lament about something not being thread safe, what I actually hear is - I want this shared invariant global state to be modifiable but it isn't. Which makes me ask the question: why would you want that?<p>The process environment should not be the mechanism for threads to communicate with each other.
If you're doing setenv in multiple threads in parallel and can't afford the overhead of wrapping it in a mutex or whatever, then you're clearly doing something wrong...
>The argument is that the specification clearly documents that setenv() cannot be used with threads. Therefore, if someone does this, the crashes are their fault.<p>Oh, C is taking the php approach :)
I wonder if you could work around this by using LD_PRELOAD to load in a shim around getenv and setenv. You'd still have the problem of environ potentially getting mutated, but it very well may solve the problem if it's limited to those two functions.
On Windows you can use GetEnvironmentVariable/SetEnvironmentVariable (on XP and later), which do implement some locking and don't run into this issue because GetEnvironmentVariable copies the data out into a caller-supplied buffer. getenv_s was a nice effort, but it failed.<p>I don't really understand why other languages such as Go and Rust decided to call the weird POSIX API rather than implementing their own API, which matches the semantics they expect. In cross platform C you'll be stuck with the outdated POSIX API design, but there's no reason why other languages should accept those same limitations.<p>We're not running on PDP-11s anymore. You can afford a thread-safe hash map in your standard library. Ignore the limitations of the old C library. Twenty years ago, Microsoft released a better API, keep the crashy old API with tons of deprecation warnings (hell, add a compiler flag --enable-broken-c-api-designs) and just provide new APIs that are actually usable in modern programming environments.
I'm glad TFA mentions Solaris/Illumos' implementation, so I don't have to.<p>Java doesn't allow one to set environment variables for the running process, but it does allow setting env vars for processes being spawned. It would be better if all C libraries did what Illumos' does.
Changing any kind of global state is fundamentally not thread safe.<p>Sure you could use locks, stop the world, etc, but there is no way you can ensure that all the data and information you had derived from the old state is going to be valid.<p>A better solution is to not rely on global state like this.
> My understanding is the people responsible for the Unix POSIX standards did not like the design of these functions, so they refused to implement them.<p>And herein lies the actual issue: C has a sh*tton of API issues in its standard library, and people <i>really</i> want to fix as many of them as possible whenever possible, but doing so will destabilize the standard so most of them won't ever be accepted. In the case of Annex K many clearly felt that the size-restricted API alone is not enough, because it is still easy to desynchronize the buffer and the allocated size, and it's a good point if we ignore an obvious counterpoint of the lack of safe `getenv` alternatives in the standard library at all... I wonder about the alternative universe where we have two distinct standards for the C language and C standard library so that the library standard is much easier to fix and adapt.
The problem isn't setenv() so much as that getenv() returns raw pointers to the data structure that is potentially being manipulated.<p>But this is/was strictly a glibc/Linux bug because maintainers in the past didn't want to improve the situation (read: add locks, which weren't 100% reliable) nor add thread-safe versions of the calls, e.g. getenv_s/getenv_r, as pretty much every other POSIX compliant system has done.<p>And so the situation I hit many years ago (a proprietary library doing setenv()s before fork(), with me calling those routines from multiple threads) has now been fixed, and setenv() on Linux/glibc now works if it's built with locking:<p><a href="https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/setenv.c;h=cc71287fcc2726701ac13cbfc402979e06a0d29d;hb=HEAD#l133" rel="nofollow noreferrer">https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/sete...</a><p>So the remaining issue is making the environment thread safe, which means adding a getenv_r/s call and ensuring it's being used everywhere, which is probably a more complex problem than tossing in the setenv() lock. But then in the setenv/fork case it's a crapshoot whether the forked process gets the "right" environment. In my case above it didn't really matter because the library was doing the equivalent of `export YOURACHILD=1` so the value wasn't being changed from invocation to invocation.<p>But there are dozens and dozens of other similar gotchas in the spec, where error conditions or threading races exist and aren't noticeable until one understands how it is being implemented. There isn't a way to fix it with the library itself because many of the calls need more stored context space (ala win32 handles). So one ends up doing things like building serialization locks into the application to assure certain subsets of the posix/c/etc libraries aren't being called in parallel. (in this case getenv/setenv/fork).
> First, we can make a thread-safe implementation, like Illumos/Solaris. This has some limitations: it leaks memory in setenv(), and is still unsafe if a program uses putenv() or the environ variable.<p>No, `putenv()` in Solaris/Illumos is thread-safe too. And the program should NOT write to `environ`, but as it happens Solaris/Illumos allows that too (but it can cause a lock to be taken in `getenv()`).<p>> Add a function to copy one single environment variable to a user-specified buffer, similar to getenv_s().<p>Because Windows has one? Fine, but adding a version that allocates the copy would be more convenient.<p>BTW, HN is having problems that are leading to dup comments. I get "We're having some trouble serving your request. Sorry!", but the action happens anyways.
Is anyone aware of a language that simply does not have globals like this? Or globals at all? The more I deal with other people's code, the more I want one.<p>Obviously there are some semantic-globals (ports, env, main thread, etc) that are unavoidable, but we have a way to deal with that: dependency injection. Allow it in main, everything else has zero access unless it is given the instance representing it.<p>Obviously that would be pretty painful in practice without some boilerplate-reducing tools, but... would it be worse <i>in aggregate</i>? Or would having <i>real</i> control over all this at last pay off? I'm quite curious.
I am trying to make sense of the argument of pushing configuration into a library:<p><pre><code> * if the library is just a dependency, the Linux loader will set it up. It will have the same environment as the other libraries and as the main program.
* if the library is set up by dlopen(), there is no way to provide an environment pointer
</code></pre>
Altering the global environment for child processes makes no sense, since execve() accepts a char* envp[] argument. So I guess we need to talk about issues with a specific use case of dlopen().
getenv/setenv is also part of the library, not the language. If it presents a problem for a particular program, it's easy enough to implement your own variety that behaves as you need, or to wrap the library call in something that provides the needed thread safety.<p>This seems like a bit of a tempest in a teapot to me.
>We should apparently read every function's specification carefully, not use software written by others, and not use threads. These are unrealistic assumptions in modern software.<p>The first of the three listed items you should certainly do. I hope this author is not writing medical software, or anything important.
Doesn’t the essay contradict itself?<p>It states that glibc “never free[s] environment variables”, but then goes on to state<p>> [in glibc] if a thread calling setenv() needs to resize the array of pointers, it copies the values to a new array and frees the previous one<p>Since envvars cause crashes under glibc, I assume the initial assertion is incorrect.
We need to fork libc/posix, name it something like `libc2`, fix versioning and start making changes, while allowing legacy software to deal with the old cruft.<p>It's never going to happen if we're waiting for some committee to deal with it, because they will be too afraid of breaking backward compatibility.
This is the difference between Rust and C like languages, one optimizes for the usual case and has a simple implementation and the other goes to great lengths to be correct in corner cases at the cost of a lot of complexity.
Yeah, it’s lame. Linters and static analyzers should probably warn in addition to updating as much living documentation as possible.<p>And, idk, suggest something like a method with a singleton mutex for getting/setting?
A lot of comments in here seem to be saying that "setenv is rarely used, so we can make setenv threadsafe even if it is costly", but I think that's missing the point.<p>The thread-safety is in the operators getenv/setenv (and putenv and unsetenv). Thread safety has to apply to all operators, and the functionality of getenv() (which is by far the most commonly used of these operators) is what has to be fixed.<p>You simply cannot make setenv() thread-safe so long as getenv() has its current interface. You need to make getenv() safe first; ideally with a getenv_r() call that fills in a user-supplied buffer. From that point making the rest of the calls thread safe is trivial.
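Roughly what a getenv_r()-style call that fills a caller-supplied buffer could look like when built in application code on top of today's libc. The names `env_get_r` and `env_set` and the choice of error codes are my own, not from any standard. The big assumption, stated loudly: this is only safe if every environment access in the whole process goes through these two wrappers, which you cannot guarantee for third-party libraries.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

/* Copy the value of `name` into buf while holding the lock, so the caller
 * never keeps a pointer into the live environment.
 * Returns 0 on success, ENOENT if unset, ERANGE if buf is too small. */
int env_get_r(const char *name, char *buf, size_t buflen)
{
    assert(name != NULL && buf != NULL);
    int rc = 0;

    pthread_mutex_lock(&env_lock);
    const char *value = getenv(name);
    if (value == NULL) {
        rc = ENOENT;
    } else {
        size_t len = strlen(value);
        if (len + 1 > buflen)
            rc = ERANGE;
        else
            memcpy(buf, value, len + 1); /* includes the terminating NUL */
    }
    pthread_mutex_unlock(&env_lock);
    return rc;
}

/* setenv() under the same lock. Returns 0 on success, -1 on error. */
int env_set(const char *name, const char *value)
{
    assert(name != NULL && value != NULL);
    pthread_mutex_lock(&env_lock);
    int rc = setenv(name, value, 1);
    pthread_mutex_unlock(&env_lock);
    return rc;
}
```

The copy-out is what makes the difference: once the reader owns its bytes, the writer is free to reallocate, which is why the getter's interface has to change before the setter can be made safe.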
I don't see the problem here. I don't see why you need setenv. Please tell me why it's used in conjunction with pthreads. I think whoever is doing this is designing their software under wrong assumptions.<p>Can you read/write to same fd socket across threads? No? So what's the issue then?
Hot take: if a crash can be made to disappear just by adding a mutex inside setenv(), this means code reading the environment is racing with code writing to it, and in this situation adding a mutex inside setenv() will generally make things worse instead of better: it may hide the immediate symptom, but the underlying race between reading and writing to the environment remains, your program will behave differently each run depending on who wins the race (with potentially catastrophic results depending on what the writes are doing), and the cause will be much harder to debug from cold due to the lack of a smoking gun pointing at environment manipulation.<p>The multithreaded program needs to be restructured so that the parts that communicate via the environment are properly serialised with respect to each other, just as would be needed for any other communication via global state and/or access to a shared resource.<p>This has to happen at a higher level than the individual getenv/setenv calls: entire blocks of logic containing the calls need to be made atomic (or otherwise refactored; perhaps you could do all the environment writes before spawning any threads) so that no other thread can blow away the environment contents in between the code that sets it up for some purpose and the code that implements that purpose; and once this is properly done, the individual calls themselves do not need further protection.
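A sketch of what "entire blocks of logic need to be made atomic" might look like, using the TZ/localtime_r case as the example. The function name `localtime_in_zone` is made up, and the approach only works under the assumption that every other piece of code that reads or writes the environment takes the same lock, which is exactly the restructuring being asked for.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* One application-level lock around "set up the environment, then use it". */
static pthread_mutex_t env_section_lock = PTHREAD_MUTEX_INITIALIZER;

/* Convert t to broken-down time in zone tz (e.g. "EST5") by temporarily
 * setting TZ. The set, the use and the restore form one critical section,
 * so no cooperating thread can change TZ in between.
 * Returns 0 on success, -1 on failure. */
int localtime_in_zone(const char *tz, time_t t, struct tm *out)
{
    assert(tz != NULL && out != NULL);
    int rc = -1;

    pthread_mutex_lock(&env_section_lock);

    /* Save a private copy; the getenv() pointer dies on the next setenv(). */
    const char *old = getenv("TZ");
    char *saved = (old != NULL) ? strdup(old) : NULL;

    if (old == NULL || saved != NULL) {
        if (setenv("TZ", tz, 1) == 0) {
            tzset();
            rc = (localtime_r(&t, out) != NULL) ? 0 : -1;
        }
        /* Restore the previous state before anyone else can look. */
        if (saved != NULL)
            setenv("TZ", saved, 1);
        else
            unsetenv("TZ");
        tzset();
    }

    free(saved);
    pthread_mutex_unlock(&env_section_lock);
    return rc;
}
```

Note that the individual getenv/setenv calls inside carry no protection of their own; the serialisation lives one level up, around the whole purpose the environment is being set for.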
the env is unix. It's not C's job to fix. Keep studying it till you understand that.<p>and don't forget, you are using unix because it defeated all the other options, because it was better and they were worse, so also keep studying till you understand why that is too.<p>then this problem with env will fix itself.<p>unix gives you tools to handle threads. C gives you tools to handle threads. Learn them, use them.
I don't quite get why this is a problem per se.<p>I mean, the environment is just a chunk of memory made available to the process by the OS. It is no more, no less thread safe than any other chunk of memory.<p>Why would the libc need to protect it more than any other memory location?
Much ado about nothing. If you have to set environment, do it at the beginning of a program, before threading. When you execute an application, pass a new environment, without setting it. There's really no need to set environment from threads.<p>The only use case where this bug happens seems to be threaded programs that load libraries <i>after threading has initialized</i>, and want to configure those libraries with environment variables, that the user/parent program hasn't specified, rather than calling their APIs with specific arguments. If a library provides no way to set some option other than environment variables, their API is simply incomplete and needs fixing. An incomplete library is not a good enough reason to amend the C standard.