This can be accomplished more simply and reliably by marking modulus as const. Today the compiler has to reason about the whole compilation unit to determine that modulus is never modified, which happens to work. But if future code modifies modulus (on purpose or accidentally), or something changes that prevents the compiler from doing that global reasoning, the optimization is silently lost. By stating the actual intention, any modification of modulus becomes a compile error. Plus, if it later becomes important to expose modulus to another compilation unit, that's now possible.<p>This is a common issue with C code (including lots of code I've written). It's really easy to forget to const something, which forces the compiler to do global reasoning or to generate worse code. I've gotten into the habit of making things const unless I know I plan on mutating them, but I wish there were tooling that encouraged it. (BTW, this is something Rust does well by making bindings immutable by default and requiring "mut" when you want mutability.)
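A minimal sketch of that fix, reusing the article's variable name (the wrap() helper is hypothetical, just to show a use site):<p><pre><code> /* static: private to this compilation unit; const: any write is a compile error */
 static const unsigned modulus = 8;

 unsigned wrap(unsigned x) {
     return x % modulus; /* the compiler is free to emit x & 7 instead of a div */
 }
</code></pre>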
Back in 2012 I observed exactly the same thing. I actually managed to get a warning for this added to Clang, called -Wmissing-variable-declarations. When set, a warning is generated whenever a non-static global variable is defined without a preceding extern declaration.<p>I worked on this as part of FreeBSD, which is why their base system is nowadays built with that flag enabled.
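For illustration, a sketch of what the flag catches (assuming clang -Wmissing-variable-declarations; the variable names are made up):<p><pre><code> int bad = 1; /* warns: non-static definition with no prior declaration */

 extern int good; /* the preceding declaration documents the export... */
 int good = 2;    /* ...so this definition is accepted silently */

 static int local = 3; /* fine: never exported in the first place */
</code></pre>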
Without `static`, the compiler exports a symbol.<p><pre><code> $ cat value.c
int value = 42;
int get_value() {
return value;
}
$ make value.o
gcc -c -o value.o value.c
$ nm value.o
0000000000000000 T get_value
U _GLOBAL_OFFSET_TABLE_
0000000000000000 D value
</code></pre>
This symbol, not being `const`, can be modified by any other compilation unit.<p><pre><code> $ cat main.c
#include <stdio.h>
extern int value;
int get_value();
int main() {
value = 123456789;
printf("%d\n", get_value());
}
$ make main.o
gcc -c -o main.o main.c
$ cc value.o main.o
$ ./a.out
123456789
</code></pre>
When generating an object file, the compiler has to assume that the value of an exported non-const symbol can change. It's necessary to tell the compiler that the value cannot change, either by not exporting the symbol (using `static`) or by making it `const`. In the example in your article, `static` makes sense (or even `static const`), as I don't think there is a reason to export this global.
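A sketch of value.c reworked under that advice (keeping the names from the transcript above):<p><pre><code> /* static keeps the symbol out of the export table; const promises it never changes */
 static const int value = 42;

 int get_value() {
     return value; /* may now compile to simply `return 42` */
 }
</code></pre>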
I agree with many of the sibling comments: static vs. non-static is almost a complete red herring.<p>Static only means that the variable is local to the translation unit (the C file). The relevant difference in the example is actually the const-ness of the variable, which you may state explicitly, but which a powerful compiler can also infer here in the static case.<p>Other than this optimization possibility, const and static are orthogonal concepts. I'm not sure to what extent the article author is aware of this.<p>So the lesson should be: use const (or #define) if you mean to have a constant. It's still a good idea to also make things static, but the real reason for that is to avoid name collisions with variables in other C files.
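To make the orthogonality concrete, a quick sketch of the four combinations:<p><pre><code> int a = 1;              /* exported, mutable */
 static int b = 2;       /* file-local, mutable */
 const int c = 3;        /* exported, immutable */
 static const int d = 4; /* file-local, immutable: the right default for private constants */
</code></pre>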
I think a simple “const” would also have done the trick. Sometimes -O3 is clever enough to figure out on its own that the value is never written, and then treats it as a constant in the generated assembly.
Author here -- just to be clear, I agree that marking the variable as const is the "right" thing to do here. I reported the investigation as-is because removing a static declaration made the code slower, which I then narrowed down to the isolated bit of code.
We have learned so much over the decades of using C. Lowest-visibility-by-default is just one of the many good choices of Rust and other C successors. The example here is for codegen benefits but reducing visibility benefits encapsulation too.
As an embedded programmer working on small micros, I make every single function static (including third-party code, which I modify, bring into the source tree, and curate myself). This gives you global/link-time optimization and dead-code elimination for free, leads to better code even at -O0, -Og, and -O1, lets static const configuration structs/values get optimized away, and so on.<p>Really wish it were the default.
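A small sketch of the config-struct case (names are illustrative):<p><pre><code> struct uart_config {
     unsigned baud_rate;
     unsigned retries;
 };

 static const struct uart_config cfg = { 115200, 3 };

 unsigned uart_divisor(unsigned clock_hz) {
     /* because cfg is static const, the compiler can fold 115200 in
        here directly; no struct needs to exist at run time */
     return clock_hz / cfg.baud_rate;
 }
</code></pre>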
Const / static allows for "immediate" ASM instruction generation: the value is known at compile time, so the compiler can encode it directly in the instruction instead of paying the overhead of loading it from a labeled memory address. It's generally good practice whenever possible.
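For example (a sketch; the exact instructions depend on the target and compiler):<p><pre><code> static const int threshold = 8;

 int above(int x) {
     /* the compare can use an immediate operand, e.g. `cmp edi, 8` on
        x86-64, rather than first loading threshold from a labeled address */
     return x > threshold;
 }
</code></pre>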
Wouldn’t making that const or using #define be a bit cleaner?<p>Honestly, if I as the programmer knew I was really trying to select bits from a number, I’d just use a bitwise AND directly. In that specific situation I think the intent is clearer that way. Like:<p><pre><code> /* select the bottom 8 bits */
 unsigned bottom8 = val & 0xff;
</code></pre>
Title rephrase:<p>When the compiler can assume your values don't change magically, it can optimize their use.<p>This is true for restrict-qualified pointers, for global-scope variables which can only be accessed in the same translation unit, for stuff in inlined functions (often), etc.<p>--------------------------------------------<p>const is a bit shifty. const makes the compiler restrict what it allows you to write, but it still can't assume other functions don't break constness via casting:<p><pre><code> void i_can_change_x_yeah_i_can_just_watch_me(const int* x)
{
*(int*) x = *x + 1;
}
</code></pre>
now, if the compiler sees the code, then fine (maybe), but when all you see is:<p><pre><code> void sly(const int* x);
</code></pre>
You can't assume the value pointed to by x won't change. See this on GodBolt: <a href="https://godbolt.org/z/fGEMj9Meo" rel="nofollow">https://godbolt.org/z/fGEMj9Meo</a><p>You might expect the same pessimism for const-defined objects, but somehow it isn't so: writing to an object that was defined const is undefined behavior, so the compiler is allowed to assume it never happens:<p><a href="https://godbolt.org/z/fqGzh7o8z" rel="nofollow">https://godbolt.org/z/fqGzh7o8z</a>
static can also make your code 10 times smaller: note that in the linked godbolt, there are actually two copies of both loop functions: one regular, and one inlined. This is because the compiler wants to inline the function but is required to generate an additional standalone copy in case someone else calls it. What's more, at least on Linux, this copy cannot be removed from the final executable even if nobody calls it, unless a) the compilation is done with -ffunction-sections and the linking is done with --gc-sections, or b) LTO is enabled. Adding static to the function declaration resolves this issue.<p>The situation is even worse with ELF dynamic libraries, due to the interaction of two rules: a) by default, all functions are exported, and b) by default, all functions can be interposed, e.g. by LD_PRELOAD. Here, if you specify -fPIC in the compilation arguments (as is required to produce a modern dynamic library), inlining of exported functions is effectively disabled. For small functions, the call overhead can be substantial.
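A build sketch of option (a), with made-up file names:<p><pre><code> $ gcc -O2 -ffunction-sections -c lib.c
 $ gcc -O2 -c main.c
 $ gcc -Wl,--gc-sections -o prog main.o lib.o
</code></pre>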
Great write-up: a precise problem that digs pointedly into the internals to teach a simple concept. This is exactly how I tell the junior devs where I work to structure lunch talks: give people some concrete, memorable piece of learning that makes them better in their practical work.
Link-time optimization would enable a similar change even without a code edit. Using static is good, but it's also worth figuring out how to make other people's code run fast too.
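For instance, a minimal sketch of enabling LTO with gcc (file names borrowed from the transcripts above):<p><pre><code> $ gcc -O2 -flto -c value.c
 $ gcc -O2 -flto -c main.c
 $ gcc -O2 -flto -o a.out value.o main.o
</code></pre>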
In an ideal world, const, constexpr, explicit (for constructors), and no implicit conversions (and others that I've missed) would have been the default...
I think this goes beyond simply making the variable static/constant. It is the specific value of the constant that allows the division to be substituted with a bitwise AND, which is what makes it so much faster. I wonder how much of the speedup would remain if some other, near-random value were used for the constant (which is likely beyond the purpose at hand).
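For what it's worth, a sketch of the non-power-of-two case: with a typical optimizing compiler the div still disappears, just into a longer sequence.<p><pre><code> static const unsigned m = 7;

 unsigned mod7(unsigned x) {
     /* compilers usually lower modulo-by-constant to a multiply,
        shift, and subtract; only a power of two yields a single AND */
     return x % m;
 }
</code></pre>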
If I could travel back in time I'd tell Dennis to make "static" the implicit default, and have a special keyword like "public" or "export" for items that are meant to be accessible from outside the compilation unit.<p>I'd also ask him to make "switch" break by default.<p>Then I'd go kill Hitler or something.
More programmers really should have a look at the assembly the compiler generates for their code. Compilers aren't magic, and seeing what they actually emit gives authors more insight into how efficient their code truly is.
I think every programmer should know that bitwise AND is many times faster than modulus (which is at least as expensive as a division), and should use & instead of % directly in their code for powers of two (rather than expecting the compiler to do it).
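A sketch of the trick (valid for unsigned values and a power-of-two N; for negative signed values, % and & differ):<p><pre><code> #define N 8u

 unsigned mod_n(unsigned x) {
     /* for unsigned x and power-of-two N, x % N == x & (N - 1) */
     return x & (N - 1u);
 }
</code></pre>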
>"When modulus is static, gcc / clang know that it is private to the current compilation unit, and therefore they can inline the value itself. Then, they turn the expensive <i>div</i> into a much cheaper <i>and</i> – since<p><i>mod’ing by a power of two -- is equal to bitwise and of that number minus one!</i><p>All you need to do is keep the bits lower than that power of two, which is what the and will do."
the biggest win here is informing the compiler sufficiently that it can swap out the div for an and.<p>the use of static is just one tool for informing the compiler that the value is a constant (which const /might/ also accomplish)
really? people mentioned const?<p>you can tell how much low-level optimisation they have done if they think it's gonna change codegen reliably, or at all.
Division is slow, which is something most programmers don't know. If you can use a bitwise AND instead of MOD, it can be a huge win.<p>Multiplication, by contrast, is fast: usually only a few cycles on larger chips.