This can be accomplished more simply and reliably by marking modulus as const. Today the compiler has to reason about the whole compilation unit to determine that modulus is never modified, which happens to work. But if future code modifies modulus (on purpose or accidentally), or something changes that prevents the compiler from doing that global reasoning, the optimization is silently lost. By stating the actual intention, any modification of modulus becomes a compile error. Plus, if it later becomes important to expose modulus to another compilation unit, that's now possible.<p>This is a common issue with C code (including lots of code I've written). It's really easy to forget to const something, which forces the compiler to do global reasoning or to generate worse code. I've gotten into the habit of making things const unless I know I plan on mutating them, but I wish there were tooling that encouraged it. (BTW, this is something Rust does well by making bindings immutable by default and requiring "mut" when you want mutability.)
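A minimal sketch of that fix, reusing the article's variable name (the wrap() helper is hypothetical, just to show a use site):<p><pre><code> /* static: private to this compilation unit; const: any write is a compile error */
 static const unsigned modulus = 8;

 unsigned wrap(unsigned x) {
     return x % modulus; /* the compiler is free to emit x & 7 instead of a div */
 }
</code></pre>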
Back in 2012 I observed exactly the same thing. I actually managed to get a warning for this added to Clang, called -Wmissing-variable-declarations. When set, a warning is generated whenever a non-static global variable is defined without a preceding extern declaration.<p>I worked on this as part of FreeBSD, which is why their base system is nowadays built with that flag enabled.
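For illustration, a sketch of what the flag catches (assuming clang -Wmissing-variable-declarations; the variable names are made up):<p><pre><code> int bad = 1; /* warns: non-static definition with no prior declaration */

 extern int good; /* the preceding declaration documents the export... */
 int good = 2;    /* ...so this definition is accepted silently */

 static int local = 3; /* fine: never exported in the first place */
</code></pre>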
Without `static`, the compiler exports a symbol.<p><pre><code> $ cat value.c
int value = 42;
int get_value() {
return value;
}
$ make value.o
gcc -c -o value.o value.c
$ nm value.o
0000000000000000 T get_value
U _GLOBAL_OFFSET_TABLE_
0000000000000000 D value
</code></pre>
This symbol, not being `const`, can be modified by any other compilation unit.<p><pre><code> $ cat main.c
#include <stdio.h>
extern int value;
int get_value();
int main() {
value = 123456789;
printf("%d\n", get_value());
}
$ make main.o
gcc -c -o main.o main.c
$ cc value.o main.o
$ ./a.out
123456789
</code></pre>
When generating an object file, the compiler has to assume that the value of an exported non-const symbol can change. It's necessary to tell the compiler that the value cannot change, either by not exporting the symbol (using `static`) or by making it `const`. In the example in your article, `static` makes sense (or even `static const`), as I don't think there is a reason to export this global.
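A sketch of value.c reworked under that advice (keeping the names from the transcript above):<p><pre><code> /* static keeps the symbol out of the export table; const promises it never changes */
 static const int value = 42;

 int get_value() {
     return value; /* may now compile to simply `return 42` */
 }
</code></pre>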
I agree with many of the sibling comments: static vs. non-static is almost a complete red herring.<p>Static only means that the variable is local to the translation unit (the C file). The relevant difference in the example is actually the const-ness of the variable, which you may state explicitly, but which a powerful compiler can also infer here in the static case.<p>Other than this optimization possibility, const and static are orthogonal concepts. I'm not sure to what extent the article author is aware of this.<p>So the lesson should be: use const (or #define) if you mean to have a constant. It's still a good idea to also make things static, but the real reason for that is to avoid name collisions with variables in other C files.
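To make the orthogonality concrete, a quick sketch of the four combinations:<p><pre><code> int a = 1;              /* exported, mutable */
 static int b = 2;       /* file-local, mutable */
 const int c = 3;        /* exported, immutable */
 static const int d = 4; /* file-local, immutable: the right default for private constants */
</code></pre>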
I think a simple “const” would also have done the trick. Sometimes -O3 is clever enough to figure out on its own that the value is never written, and then treats it as a constant in the generated assembly.
Author here -- just to be clear, I agree that marking the variable as const is the "right" thing to do here. I reported the investigation as-is because removing a static declaration made the code slower, which I then narrowed down to the isolated bit of code.
We have learned so much over the decades of using C. Lowest-visibility-by-default is just one of the many good choices of Rust and other C successors. The example here is for codegen benefits but reducing visibility benefits encapsulation too.
As an embedded programmer working on small micros, I make every single function static (including third-party code, which I modify, bring into the source tree, and curate myself). This gives you global/link-time optimization and dead-code elimination for free, leads to better code even at -O0, -Og, and -O1, lets static const configuration structs/values get optimized away, and so on.<p>Really wish it were the default.
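A small sketch of the config-struct case (names are illustrative):<p><pre><code> struct uart_config {
     unsigned baud_rate;
     unsigned retries;
 };

 static const struct uart_config cfg = { 115200, 3 };

 unsigned uart_divisor(unsigned clock_hz) {
     /* because cfg is static const, the compiler can fold 115200 in
        here directly; no struct needs to exist at run time */
     return clock_hz / cfg.baud_rate;
 }
</code></pre>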
Const / static allows for "immediate" ASM instruction generation: the value is known at compile time, so the compiler can encode it directly in the instruction instead of paying the overhead of loading it from a labeled memory address. It's generally good practice whenever possible.
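For example (a sketch; the exact instructions depend on the target and compiler):<p><pre><code> static const int threshold = 8;

 int above(int x) {
     /* the compare can use an immediate operand, e.g. `cmp edi, 8` on
        x86-64, rather than first loading threshold from a labeled address */
     return x > threshold;
 }
</code></pre>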
Wouldn’t making that const or using #define be a bit cleaner?<p>Honestly, if I as the programmer knew I was really trying to select bits from a number, I’d just use a bitwise AND directly. In that specific situation I think the intent is clearer that way. Like:<p><pre><code> /* select the bottom 8 bits */
 unsigned bottom8 = val & 0xff;
</code></pre>
Title rephrase:<p>When the compiler can assume your values don't change magically, it can optimize their use.<p>This is true for restrict-qualified pointers, for global-scope variables which can only be accessed in the same translation unit, for stuff in inlined functions (often), etc.<p>--------------------------------------------<p>const is a bit shifty. const makes the compiler restrict what it allows you to write, but it still can't assume other functions don't break constness via casting:<p><pre><code> void i_can_change_x_yeah_i_can_just_watch_me(const int* x)
{
*(int*) x = *x + 1;
}
</code></pre>
now, if the compiler sees the code, then fine (maybe), but when all you see is:<p><pre><code> void sly(const int* x);
</code></pre>
You can't assume the value pointed to by x won't change. See this on GodBolt: <a href="https://godbolt.org/z/fGEMj9Meo" rel="nofollow">https://godbolt.org/z/fGEMj9Meo</a><p>You might expect the same pessimism for const-defined objects, but somehow it isn't so: writing to an object that was defined const is undefined behavior, so the compiler is allowed to assume it never happens:<p><a href="https://godbolt.org/z/fqGzh7o8z" rel="nofollow">https://godbolt.org/z/fqGzh7o8z</a>
static can also make your code 10 times smaller: note that in the linked godbolt, there are actually two copies of both loop functions: one regular, and one inlined. This is because the compiler wants to inline the function but is required to generate an additional standalone copy in case someone else calls it. What's more, at least on Linux, this copy cannot be removed from the final executable even if nobody calls it, unless a) the compilation is done with -ffunction-sections and the linking is done with --gc-sections, or b) LTO is enabled. Adding static to the function declaration resolves this issue.<p>The situation is even worse with ELF dynamic libraries, due to the interaction of two rules: a) by default, all functions are exported, and b) by default, all functions can be interposed, e.g. by LD_PRELOAD. Here, if you specify -fPIC in the compilation arguments (as is required to produce a modern dynamic library), inlining of exported functions is effectively disabled. For small functions, the call overhead can be substantial.
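A build sketch of option (a), with made-up file names:<p><pre><code> $ gcc -O2 -ffunction-sections -c lib.c
 $ gcc -O2 -c main.c
 $ gcc -Wl,--gc-sections -o prog main.o lib.o
</code></pre>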
Great write-up: a precise problem that digs pointedly into the internals to teach a simple concept. This is exactly how I tell the junior devs where I work to structure lunch talks: give people some concrete, memorable piece of learning that makes them better in their practical work.
Link-time optimization would enable a similar change even without a code edit. Using static is good, but it's also worth figuring out how to make other people's code run fast too.
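For instance, a minimal sketch of enabling LTO with gcc (file names borrowed from the transcripts above):<p><pre><code> $ gcc -O2 -flto -c value.c
 $ gcc -O2 -flto -c main.c
 $ gcc -O2 -flto -o a.out value.o main.o
</code></pre>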
In an ideal world, const, constexpr, explicit (for constructors), and no implicit conversions (and others that I've missed) would have been the default...
I think this goes beyond simply making the variable static/constant. It is the specific value of the constant that allows the division to be substituted with a bitwise AND, which is what makes it so much faster. I wonder how much of the speedup would remain if some other, near-random value were used for the constant (which is likely beyond the purpose at hand).
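For what it's worth, a sketch of the non-power-of-two case: with a typical optimizing compiler the div still disappears, just into a longer sequence.<p><pre><code> static const unsigned m = 7;

 unsigned mod7(unsigned x) {
     /* compilers usually lower modulo-by-constant to a multiply,
        shift, and subtract; only a power of two yields a single AND */
     return x % m;
 }
</code></pre>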
If I could travel back in time I'd tell Dennis to make "static" the implicit default, and have a special keyword like "public" or "export" for items that are meant to be accessible from outside the compilation unit.<p>I'd also ask him to make "switch" break by default.<p>Then I'd go kill Hitler or something.
More programmers really should have a look at the assembly the compiler generates for their code. Compilers aren't magic, and seeing what they actually emit gives authors more insight into how efficient their code truly is.
I think every programmer should know that bitwise AND is many times faster than modulus (which is at least as expensive as a division), and should use & instead of % directly in their code for powers of two (rather than expecting the compiler to do it).
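A sketch of the trick (valid for unsigned values and a power-of-two N; for negative signed values, % and & differ):<p><pre><code> #define N 8u

 unsigned mod_n(unsigned x) {
     /* for unsigned x and power-of-two N, x % N == x & (N - 1) */
     return x & (N - 1u);
 }
</code></pre>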
>"When modulus is static, gcc / clang know that it is private to the current compilation unit, and therefore they can inline the value itself. Then, they turn the expensive <i>div</i> into a much cheaper <i>and</i> – since<p><i>mod’ing by a power of two -- is equal to bitwise and of that number minus one!</i><p>All you need to do is keep the bits lower than that power of two, which is what the and will do."
the biggest win here is informing the compiler sufficiently that it can swap out the div for an and.<p>the use of static is just one tool for informing the compiler that the value is a constant (which const /might/ also accomplish)
really? people mentioned const?<p>you can tell how much low-level optimisation they have done if they think it's gonna change codegen reliably, or at all.
Division is slow, which is something most programmers don't know. If you can use a bitwise AND instead of MOD, it can be a huge win.<p>Multiplication, by contrast, is fast: usually only a few cycles on larger chips.