Pointers Are More Abstract Than You Might Expect in C

269 points by ognyankulev almost 7 years ago

15 comments

gilgoomesh almost 7 years ago

This scratches the surface of why I hope C slowly fades away as the default low-level language. C sounds simple when you look through K&R C. C lets you feel like you understand the stack, ALU and memory. A pointer is just an integer and I can manipulate it like an integer.

But the reality is filled with a staggering number of weird special cases that exist because memory doesn't work like a simple flat address space; or the compiler needs to optimise field layouts, loops, functions, allocations, register assignments and local variables; or your CPU doesn't use the lower 4 bits or upper 24 bits when addressing.

C has no way to shield common language constructs from these problems. So everything in the language is a little bit compromised. Program in C for long enough and you'll hit a lot of special cases, usually in the worst way: runtime misbehavior. Type punning corruption, pointers with equal bit representations that are not equal, values that change mysteriously between lines of code, field offsets that are different at debug and release time.

When using fundamental language constructs, we shouldn't need to worry about these ugly special cases, but C is built around these ideas. The need to specify memory layouts, direct memory access and other low-level access should be gated by barriers that ensure the language's representation and the machine representation don't tread on each other's toes.

Rust has a long way to go but is so much more robust at runtime. I think there's room for other languages to occupy a similar space but they'll need to focus on no-std-lib, no-runtime operation (not always the sexiest target).

tzs almost 7 years ago

Note that if the two pointers are passed to a function, and the comparison is done in the function, the results are different:
<pre><code>#include <stdio.h>

/* Same comparison as in main, but done inside a separate function. */
void pcmp(int *p, int *q)
{
    printf("%p %p %d\n", (void *)p, (void *)q, p == q);
}

int main(void)
{
    int a, b;
    int *p = &a;
    int *q = &b + 1;   /* one past b; may or may not alias &a */
    printf("%p %p %d\n", (void *)p, (void *)q, p == q);
    pcmp(p, q);
    return 0;
}
</code></pre>
That is giving me:
<pre><code>0x7ffebac1483c 0x7ffebac1483c 0
0x7ffebac1483c 0x7ffebac1483c 1
</code></pre>
That's compiled with '-std=c11 -O1' as in the article. The result is the same if pcmp is moved into a separate file, so that when compiling it the compiler has no knowledge of the origins of the two pointers.

I don't like this at all. It bugs me that I can get different results comparing two pointers depending on where I happen to do the comparison.

bluecalm almost 7 years ago

I like the behavior of the compiler here. There is no guarantee that a and b are next to each other in memory. That's why the comparison fails; the alternative makes it runtime/compiler/optimization-level dependent, which would be a total mess.

As usual with those C-bashing articles, you won't run into trouble if you don't try very hard to write contrived code. I mean, the moment you see:
<pre><code>int *q = &b + 1;
</code></pre>
on your screen, alarm bells should go off. Doing pointer arithmetic on something that is not an array is asking for trouble. If the standard should be amended in any way, it should make pointer arithmetic on non-array objects undefined behavior right away.
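For context on the current rule: as far as I read C11 6.5.6, a pointer to a single object behaves, for pointer arithmetic, like a pointer to the first element of an array of length one. So forming `&x + 1` and comparing against it is defined today; only dereferencing it is not. A minimal sketch of that, with hypothetical function names chosen for illustration:

```c
#include <stddef.h>

/* Distance between &x and the one-past-the-end pointer &x + 1. */
ptrdiff_t span_of_single_object(void)
{
    int x = 0;
    int *begin = &x;
    int *end = &x + 1;   /* valid to form and compare, not to dereference */
    return end - begin;
}

/* Counts the elements visited when iterating over [&x, &x + 1). */
int visits_single_object(void)
{
    int x = 42;
    int count = 0;
    for (int *it = &x; it != &x + 1; ++it)
        count += (*it == 42);   /* *it is only read while it == &x */
    return count;
}
```

Both functions stay within one object, so nothing here depends on how the compiler lays out neighbouring variables.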

foxhill almost 7 years ago

the comparison at the start is nonsense - there is no specification for the ordering or location of stack variables. by taking the address of these variables, you could see that they actually are the same value, and so intuitively you’d think they might be the same, but a different compiler might put them in different locations. or they may be elided entirely through optimisation. it’s far safer to fail the equality test in this case - this is what the model specifies.

this is not even the first example of this counter-intuitive behaviour. imagine two floating point values with exactly the same bit-representation. it is possible, without any trickery, for them to fail an equality check - i.e., they are both NaN.

this is what IEEE 754 demands of a compliant floating point implementation. and indeed, it’s a sane choice when you understand why it was made.

similarly, it’s perfectly reasonable for this example to fail.
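The NaN case can be sketched in a few lines of C. This assumes an IEEE 754 platform where `<math.h>` defines `NAN`, and that the code is not built with flags such as `-ffast-math` that relax NaN handling; `same_bits` and `same_value` are hypothetical helper names.

```c
#include <math.h>
#include <string.h>

/* Returns 1 if x and y have byte-for-byte identical representations. */
int same_bits(double x, double y)
{
    return memcmp(&x, &y, sizeof x) == 0;
}

/* Returns 1 if x and y compare equal as floating-point values. */
int same_value(double x, double y)
{
    return x == y;
}
```

With `double n = NAN;`, `same_bits(n, n)` is 1 while `same_value(n, n)` is 0. The mismatch also runs the other way: `0.0` and `-0.0` differ in their sign bit yet compare equal.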

mpweiher almost 7 years ago

clang is a bit more sane:
<pre><code>> cc pointereq.c
> ./a.out
0x7ffee83163b8 0x7ffee83163b8 1
> cc -O pointereq.c
> ./a.out
0x7ffeeeb8b3b8 0x7ffeeeb8b3c0 0
</code></pre>
So without optimization, the pointers are the same and compare as equal. With optimization, the pointers compare as not equal. At first that seemed horrible, until I saw the pointers actually are not the same. Since I don't recall any guarantees about stack layout, that seems perfectly fine.
<pre><code>> cat pointereq.c
#include <stdio.h>

int main(void)
{
    int a, b;
    int *p = &a;
    int *q = &b + 1;
    printf("%p %p %d\n", (void *)p, (void *)q, p == q);
    return 0;
}
</code></pre>

tzahola almost 7 years ago

There's nothing surprising in the first example. Comparing the addresses of stack variables is undefined behaviour.

The second one is more interesting:
<pre><code>extern int _start[];
extern int _end[];

void foo(void)
{
    for (int *i = _start; i != _end; ++i) { /* ... */ }
}
</code></pre>
GCC optimized "i != _end" into "true". The kernel guys fixed this by turning "_start" and "_end" into "extern int*". I always thought [] was just syntactic sugar over a regular pointer, but seems like I was wrong.
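On the array-versus-pointer point, a small sketch may help: an array declaration names the storage itself, while a pointer declaration names a separate object that holds an address. The names `table` and `alias` below are made up for illustration and have nothing to do with the kernel symbols.

```c
#include <stddef.h>

static int table[8];         /* an array object: storage for 8 ints */
static int *alias = table;   /* a pointer object holding table's address */

/* sizeof sees the whole array, not a pointer. */
size_t array_size(void) { return sizeof table; }

/* sizeof sees only the pointer variable. */
size_t pointer_size(void) { return sizeof alias; }

/* &table and the decayed table refer to the same address. */
int array_address_is_its_storage(void)
{
    return (void *)&table == (void *)table;
}

/* &alias is where the pointer lives, not where it points. */
int pointer_has_own_storage(void)
{
    return (void *)&alias != (void *)alias;
}
```

The `[]` form is only interchangeable with a pointer in function parameter lists; in an `extern` declaration the two spellings describe different objects.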

nuriaion almost 7 years ago

Basically you have no guarantee where the pointer q is pointing to. Some compilers and static code analyzers will yell at you for this code.

crehn almost 7 years ago

> Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.

Can someone explain to me the rationale behind this? Why not just "two pointers compare equal if they point to the same address"?
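To make the first three clauses of that rule concrete, here is a minimal sketch of comparisons that, as I read the quoted text, must yield 1 on any conforming compiler. The `struct packet` type and the function names are hypothetical.

```c
struct packet {
    int header;    /* first member: a subobject at the struct's beginning */
    int body[4];
};

/* Both operands are null pointers. */
int null_pointers_equal(void)
{
    int *p = 0;
    char *q = 0;
    return (void *)p == (void *)q;
}

/* A pointer to an object and a pointer to the subobject at its beginning. */
int struct_equals_first_member(void)
{
    static struct packet pkt;
    return (void *)&pkt == (void *)&pkt.header;
}

/* Two pointers one past the last element of the same array. */
int one_past_end_matches(void)
{
    static int values[4];
    int *end_a = values + 4;
    int *end_b = &values[4];   /* & and [] cancel, so nothing is dereferenced */
    return end_a == end_b;
}
```

None of these depend on where the compiler places unrelated variables, which is the situation the last clause of the rule is about.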

joosteto almost 7 years ago

When I compile the example, I get:
<pre><code>0x7ffd0b57ebd0 0x7ffd0b57ebd8 0
</code></pre>
OK, so gcc reordered a & b; I'll fix this by changing the initialisation of p and q to:
<pre><code>int *p = &a + 1;
int *q = &b;
</code></pre>
But when I now run the example, I get:
<pre><code>gcc -o c c.c && ./c
0x7ffe55eb0aa4 0x7ffe55eb0aa4 1
gcc -O -o c c.c && ./c
0x7ffcfeffd914 0x7ffcfeffd914 0
</code></pre>
So, p==q only evaluates to 1 if optimisation is disabled.
<pre><code>$ gcc --version
gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0
</code></pre>

oconnor663 almost 7 years ago

> If we step back from the standard and ask our self does it make sense to compare two pointers which are derived from two completely unrelated objects? The answer is probably always no.

The one big counterexample I can think of is the difference between memcpy and memmove. The latter is supposed to be able to do arithmetic on memory regions, to see if they overlap. Is this article saying that the standard C implementation of memmove is relying on unspecified behavior?
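One way a memmove-like routine might pick its copy direction is to compare the addresses as integers. The sketch below is not how any particular libc does it; `sketch_memmove` is a hypothetical name, `uintptr_t` is an optional type, and the code assumes a flat address space where the converted integers preserve address order, which the standard leaves implementation-defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical overlap-safe copy; assumes integer order matches address order. */
void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination starts first: copy front to back. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else if ((uintptr_t)d > (uintptr_t)s) {
        /* Destination starts later: copy back to front. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    /* Same address: nothing to do. */
    return dst;
}
```

Going through `uintptr_t` sidesteps the relational comparison of unrelated pointers, but it trades undefined behaviour for an implementation-defined mapping, so it is still not fully portable C.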

Sharlin almost 7 years ago

> ...or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.

I was not aware of this special case. What's the rationale? Is there even a way in standard C to guarantee that two array objects are laid out in memory like that, with no padding?
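One layout where I believe adjacency is guaranteed is the rows of a two-dimensional array: each row is itself an array object, and array elements are contiguous with no padding between them. A minimal sketch, with `grid` and the function names chosen only for illustration:

```c
#include <stddef.h>

static int grid[2][3];

/* One past the end of row 0 versus the start of row 1. */
int rows_are_adjacent(void)
{
    int *end_of_row0 = grid[0] + 3;   /* one past the last element of grid[0] */
    int *start_of_row1 = grid[1];     /* same as &grid[1][0] */
    return end_of_row0 == start_of_row1;
}

/* The whole array is exactly two rows, so there is no room for padding. */
int no_padding_between_rows(void)
{
    return sizeof grid == 2 * sizeof grid[0];
}
```

Two separately declared arrays get no such guarantee, which is why the quoted clause says "happens to".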

bsder almost 7 years ago

Ummm, when I run this on gcc 7.3.0 on OS X I actually get:

0x7fff5dbd89fc 0x7fff5dbd89fc 1

Which kind of shoots the whole article in the foot ...

mabynogy almost 7 years ago

Not related to the content but I find the colored links in the nav a simple and very good idea.

andyjohnson0 almost 7 years ago

Just a datapoint: VS2017 on Win10 x64 gives me 00AFF89C 00AFF894 0, which is what I naively expect.

_RPM almost 7 years ago

if you're adding 1 to a pointer, why would you expect it to be equal? I would not expect it to be equal.
