TechEcho

7 comments

DannyBeeabout 12 years ago

Err, "GCC does this optimization, because strlen is a "built-in function:"No. I wrote the optimization pass that does this (GVN PRE with SCC based value numbering).It does it to any pure/const function. Const ones no matter what, and pure ones if it can prove the global memory state does not otherwise change between calls in a way that impacts that pure call (IE there are times it can prove the calls that happen in between don't matter, and will still do the elimination). You don't usually have to mark the functions const/pure if they are visible to GCC, it does interprocedural analysis to prove they are pure/const and mark them for you.It will even do it through function pointers if the value numbering or alias analysis can prove what they point to, or that they don't change in between the calls.<pre><code> double cos (double) __attribute__ ((const)); double sin (double) __attribute__ ((const)); double f(double a) { double b; double c,d; double (*fp) (double) __attribute__ ((const)); /* Partially redundant call */ if (a < 2.0) { fp = sin; c = fp (a); } else { c = 1.0; fp = cos; } d = fp (a); return d + c; } </code></pre> In this example, it will eliminate the unconditional fp(a) call at the end by reusing the value of c in the first if block, and storing the result of calling cos(a) in the else block. IE:<pre><code> double cos (double) __attribute__ ((const)); double sin (double) __attribute__ ((const)); double f(double a) { double b; double c,d; double (*fp) (double) __attribute__ ((const)); /* Partially redundant call */ if (a < 2.0) { fp = sin; c = fp (a); temp = c } else { c = 1.0; fp = cos; temp = cos(a) } d = temp; return d + c; } </code></pre> (attribute pure is ignored on function pointers, so you can't just s/const/pure/ and expect it to work)Loop code hoisting is actually a special case of this partial redundancy elimination.<pre><code> double sin (double) __attribute__ ((const)); double f (double a) { int i; double c, d; double (*fp) (double) __attribute__ ((const)); fp = sin; for (i = 0; i < 50; i++) { c = fp (a); } d = fp (a); return d + c; } </code></pre> The call to fp(a) in the loop will be hoisted above the loop through redundancy elimination, the call d = fp(a) will be eliminated in favor of that hoisted value.Basically, the optimization is a lot more powerful than he describes.

评论 #5508184 未加载

LeafStormabout 12 years ago

This reminds me of something my assembly instructor said in class once (paraphrased):"Once, I was interested in finding out how a C compiler would handle the div instruction. So, I opened up my editor, and wrote this C program:<pre><code> void main () { printf("%f", 14.0 / 3.0); } </code></pre> I compiled it, and took a look at the generated assembly. How many div instructions do you think it used?[Various guesses, mostly "one."]The correct answer is "zero." The compiler figured it out at compile time, and put a constant in instead. So, I decided to put it in a function:<pre><code> float divide (float a, float b) { return a / b; } void main () { printf("%f", divide(a, b)); } </code></pre> Guess what happened next.[Various guesses.]The f--king compiler optimized it out again!"

评论 #5507482 未加载

评论 #5507740 未加载

评论 #5507046 未加载

评论 #5508277 未加载

praptakabout 12 years ago

I did some similar experiments myself a long time ago. Something mildly surprising for me at that time was that this got optimized to a single roll instruction:<pre><code> uint32_t roll(uint32_t const x) { return (x << 15) | (x >> 17); } </code></pre> I haven't managed to make GCC generate a non-constant roll (i.e. (x << c) | (x >> (32-c)) ), probably because it couldn't infer that c is in the [0,31] range.

评论 #5507196 未加载

评论 #5507351 未加载

ISLabout 12 years ago

Question 4 points at a huge surprise. With a scientific computing project, I got (if I recall correctly), correct and expected output at no optimization and -O1, and patently incorrect output at -O2 and -O3.Cost: one sleepless night in college."What do you mean, optimization doesn't just make it faster?!?"

评论 #5509206 未加载

评论 #5509126 未加载

octo_tabout 12 years ago

I am surprised at number 3, I would have thought that it would be optimised to x << 1, rather than the addition?Edit: No. 5 explains why.

评论 #5507159 未加载

simarpreet007about 12 years ago

What an interesting read. Thanks OP. I got stumbled on the floating point question. I remember a few weeks ago in class our instructor taught us about floating point precision loss and I thought the compiler might take care into that and not optimize it, but I guess I was wrong.Would you explain a bit more on that? I'm really interested! Thanks!

gus_massaabout 12 years ago

Previous submission: <a href="https://news.ycombinator.com/item?id=1540567" rel="nofollow">https://news.ycombinator.com/item?id=1540567</a> (335 points, 989 days ago, 61 comments)

7 comments

DannyBeeabout 12 years ago

评论 #5508184 未加载

LeafStormabout 12 years ago

评论 #5507482 未加载

评论 #5507740 未加载

评论 #5507046 未加载

评论 #5508277 未加载

praptakabout 12 years ago

评论 #5507196 未加载

评论 #5507351 未加载

ISLabout 12 years ago

评论 #5509206 未加载

评论 #5509126 未加载

octo_tabout 12 years ago

I am surprised at number 3, I would have thought that it would be optimised to x << 1, rather than the addition?Edit: No. 5 explains why.

评论 #5507159 未加载

simarpreet007about 12 years ago

gus_massaabout 12 years ago

Previous submission: <a href="https://news.ycombinator.com/item?id=1540567" rel="nofollow">https://news.ycombinator.com/item?id=1540567</a> (335 points, 989 days ago, 61 comments)

Will It Optimize? See how well you can anticipate gcc's optimizer (2010)

7 comments

Will It Optimize? See how well you can anticipate gcc's optimizer (2010)

7 comments