Proposal for a Friendly Dialect of C

218 points by 0x09, over 10 years ago

17 comments

Verdex, over 10 years ago

I think it's interesting that the famous tech companies are all developing programming languages that are less managed than java/c#, but more predictable (read: less undefined behavior) than c/c++. Facebook seems interested in D, Apple has Swift, Microsoft has some sort of secret project they are working on, Google has Go, and Mozilla has Rust. Even c++ seems to be attempting to modernize with the new additions to its spec. And now we see a desire for c itself to change. I wonder if our industry is at a turning point where managed languages aren't quite cutting it, but no one is comfortable going back to the 'good old days'.

On a personal note, I like the idea of friendly C so much that I finally made an HN account. One of my favorite things to do is to take things apart and understand them. I was mortified when I learned the real meaning of undefined behavior in c/c++. It seems like the only way to be sure you understand a C program is to check the generated machine code. Even worse, when I try to talk to other developers about undefined behavior, I tend to get the impression that they don't actually understand what undefined behavior means. I can't think of a way to verify what they think it means without insulting their intelligence, but hopefully the existence of something like friendly C will make it an easier discussion to have.

userbinator, over 10 years ago

I really like these suggestions since they can be summed up in one sentence: they are what C programmers who write code with UB would already expect any reasonably sane platform to do. I think it's definitely a very positive change in attitude from the "undefined behaviour, therefore anything can happen" that resulted in compilers' optimisations becoming very surprising and unpredictable.

> Rather, we are trying to rescue the predictable little language that we all know is hiding within the C standard.

Well said. I think the practice of UB-exploiting optimisation was completely against the spirit of the language, and that the majority of optimisation benefits happen in the compiler backend (instruction selection, register allocation, etc.). At least as an Asm programmer, I can attest that IS/RA can make a huge difference in speed/size.

The other nice point about this friendly C dialect is that it still allows for much optimisation, but with a significant difference: instead of basing it on assumptions of UB defined by the standard, it can still be done based on proof; e.g. code that can be proved to be unneeded can be eliminated, instead of code that may invoke UB. I think this sort of optimisation is what most C programmers intuitively agree with.

simias, over 10 years ago

I can fit the proposed changes in two categories:

* Changes that replace undefined behaviours with undefined values. This makes it easier to catch certain types of coding errors at the cost of certain kinds of optimizations.
* Changes that remove undefined behaviours (wrapping arithmetic, memcpy/memmove, aliasing rules).

I'm comfortable with the first kind, although you can already achieve something very similar to that with most compilers (as far as I know) by building with optimizations disabled. Also, stuff like missing return values generates a warning in any compiler worth using; if you ignore that kind of warning you can only blame yourself.

The 2nd kind bothers me more, because it makes otherwise invalid C code valid in this dialect. I'm worried this makes things even more difficult to explain to beginners (and not-so-beginners; I still have to check the aliasing rules from time to time to make sure the code I'm writing is valid).

Even if you're very optimistic, this friendly C is not going to replace daddy anytime soon. There'll be plenty of C code out there, plenty of C toolchains, plenty of C environments where the definition of friendliness is having a dump of the registers and stack on the UART in case of an error. Plenty of environments where memcpy is actually memcpy, not memmove.

For that reason I'd be much more in favour of advocating the use of more modern alternatives to C (and there are a bunch of those) rather than risking blurring the lines some more about what is and isn't undefined behaviour in C.

twoodfin, over 10 years ago

Can someone give a rationale for why a "friendly" dialect of C should return unspecified values from reading uninitialized storage? Is the idea that all implementations will choose "0" for that unspecified value and allow programmers to be lazy?

I'd much rather my "friendly" implementation immediately trap. Code built in Visual C++'s debug mode is pretty reliable (and useful) in this regard.

EDIT: It occurs to me that this is probably a performance issue. Without pretty amazing (non-computable in the general case?) static analysis, it would be impossible to tell whether code initializes storage in all possible executions, and using VM tricks to detect all uninitialized access at runtime is likely prohibitively expensive for non-debug code.

cousin_it, over 10 years ago

Some time ago I came up with a simpler proposal: emit a warning if UB exploitation makes a line of code unreachable. That refers to actual lines in the source, not lines after macroexpansion and inlining. Most "gotcha" examples with UB that I've seen so far contain unreachable lines in the source, while most legitimate examples of UB-based optimization contain unreachable lines only after macroexpansion and inlining.

Such a warning would be useful in any case, because in legitimate cases it would tell the programmer that some lines can be safely deleted, which is always good to know.
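
For readers who have not seen one, here is a minimal sketch of the kind of "gotcha" the proposal targets: a null check written in the source that an optimizer is allowed to treat as unreachable. The function names are hypothetical, and whether a given compiler actually deletes the check depends on its version and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Gotcha pattern: the dereference comes before the null check. Because
 * dereferencing a null pointer is undefined, an optimizer may assume p is
 * non-null and treat the check below as dead code. */
int first_element_unchecked(const int *p)
{
    int value = *p;          /* undefined behaviour if p is NULL */
    if (p == NULL)           /* a source line that UB can make unreachable */
        return -1;
    return value;
}

/* Same function with the check ahead of the dereference; no source line
 * is unreachable for any input. */
int first_element_checked(const int *p)
{
    if (p == NULL)
        return -1;
    return *p;
}
```

Under the proposal, the first function is where the warning would fire, since the `return -1;` line exists in the source itself rather than appearing after macro expansion or inlining.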

DSMan195276, over 10 years ago

The kernel uses -fno-strict-aliasing because they can't do everything they need to do by adhering strictly to the standard; it has nothing to do with it being too hard. (The biggest issue is probably treating pointers as integers and masking them, which is basically illegal to do in C.)

IMO, this idea would make sense if it was targeted at regular software development in C (and making it easier to not shoot yourself in the foot). It's not as useful to the OS/hardware people though, because they're already not writing standards-compliant code nor using the standard library. There's only so much it can really do in that case without making writing OS or hardware code more annoying to do than it already is.
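
As an illustration of "treating pointers as integers and masking them", here is a minimal sketch of low-bit pointer tagging. It is not taken from the kernel; `TAG_MASK` and the function names are hypothetical. The pointer-to-integer mapping is implementation-defined, so the code relies on the flat address model of mainstream platforms.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_MASK ((uintptr_t)0x3)   /* hypothetical: two low bits hold a tag */

/* Pack a small tag into the low bits of a suitably aligned pointer. */
uintptr_t tag_pointer(void *p, unsigned tag)
{
    uintptr_t bits = (uintptr_t)p;       /* implementation-defined mapping */
    assert((bits & TAG_MASK) == 0);      /* needs at least 4-byte alignment */
    assert(tag <= TAG_MASK);
    return bits | tag;
}

/* Extract the tag stored in the low bits. */
unsigned pointer_tag(uintptr_t tagged)
{
    return (unsigned)(tagged & TAG_MASK);
}

/* Recover the original pointer by clearing the tag bits. */
void *untag_pointer(uintptr_t tagged)
{
    return (void *)(tagged & ~TAG_MASK);
}
```

The standard only promises that a pointer survives an unmodified round trip through `uintptr_t`; everything the masking does beyond that is a platform assumption, which is the commenter's point.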

otikik, over 10 years ago

I haven't done C in at least a decade, so bear with me.

> 1. The value of a pointer to an object whose lifetime has ended remains the same as it was when the object was alive
> 8. A read from uninitialized storage returns an unspecified value.

Isn't that already the case in C?

> 3. Shift by negative or shift-past-bitwidth produces an unspecified result.

What is the meaning of "an unspecified result" there?

> 4. Reading from an invalid pointer either traps or produces an unspecified value.

What is the meaning of "traps" here? Is it the same as later on ("math- and memory-related traps")?

revelation, over 10 years ago

Plenty of architectures do not trap at null-pointer dereferencing (they don't have traps). Some (like AVR) are not arcane; they are one of the best excuses for still writing C nowadays.

jhallenworld, over 10 years ago

I have strong feelings that the C standard (and, by extension, C compilers) should directly support non-portable code. It means many behaviors are not "undefined"; instead they are "machine dependent". Thus overflow is not undefined: it is _defined_ to depend on the underlying architecture in a specific way.

C is a more useful language if you can make machine specific code this way.

I'm surprised that some of the pointer math issues come up. Why would the compiler assume that a pointer's value is invalid just because the referenced object is out of scope? That's crazy.

Weird results from uninitialized variables can sometimes be OK. I would kind of accept strange things happening when an uninitialized bool (which is really an int) is not exactly 1 or 0.

Perhaps a better way to deal with the memcpy issue is this: make memcpy() safe (handles 0 size, allows overlapping regions), but create a fast_memcpy() for speed.
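
The memcpy split suggested above could be sketched roughly as follows. `safe_memcpy` is a hypothetical name for the "safe" variant (a real dialect would presumably change `memcpy` itself), and `fast_memcpy` is the name from the comment; both are thin wrappers over the standard library, not a new implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "Safe" copy: tolerates n == 0 and overlapping regions. */
void *safe_memcpy(void *dst, const void *src, size_t n)
{
    if (n == 0)
        return dst;                 /* never touch the pointers at all */
    return memmove(dst, src, n);    /* memmove is defined for overlap */
}

/* "Fast" copy: the caller promises the regions do not overlap;
 * restrict records that promise in the signature. */
void *fast_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
    return memcpy(dst, src, n);
}
```

For example, shifting a buffer's contents two bytes to the right within the same array is fine with `safe_memcpy` but would break the contract of `fast_memcpy`.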

Roboprog, over 10 years ago

Back in 1990, it didn't take long to figure out that (Borland) Turbo Pascal was much less insane than C. Unfortunately, it only ran on MS-DOS & Windows, whereas C was everywhere.

Employers demanded C programmers, so I became a C programmer. (Now I'm a Java programmer, for the same reason, and think it's also a compromised language in many ways.)

For anybody who is willing to run a few percent slower so that array bounds get checked, there is now an open source FreePascal environment available, so as not to be dependent on the scraps of Borland that Embarcadero is providing at some cost. Of course, nobody is going to hire you to use Pascal (or any other freaky language that gets the job at hand done better than the current mainstream Java and C# languages).

sjolsen, over 10 years ago

I don't really see what's to be accomplished by most of the points of this. A program that invokes undefined behaviour isn't just invalid; it's almost certainly _wrong_. Shifting common mistakes from undefined behaviour to unspecified behaviour just makes such programs less likely to blow up spectacularly. That doesn't make them correct; it makes it harder to notice that they're incorrect.

Granted, not everything listed stops at unspecified behaviour. I'm not convinced that that's a good thing, though. Even something like giving signed integer overflow unsigned semantics is pretty effectively useless. Sure, you can reasonably rely on two's-complement representation, but that doesn't change the fact that you can't represent the number six billion in a thirty-two bit integer, and it doesn't make 32-bit code that happens to depend on the arithmetic properties of the number six billion correct just because the result of multiplying two billion by three is well-defined.

Then there's portability. Strict aliasing is a good example of this. Sure, you can access an "int" aligned however you like on x86. It'll be slow, but it'll work. On MIPS, though? Well, the compiler could generate code to automate the scatter-gather process of accessing unaligned memory locations. This is C, though. It's supposed to be close to the metal; it's supposed to force-- I mean, let you do everything yourself, without hiding complexity from the programmer. How far should the language semantics be stretched to compensate for programmers' implicit, flawed mental model of the machine, and at what point do we realize that we already have much better tools for that level of abstraction?
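
The six-billion arithmetic can be checked with a small sketch. It uses unsigned 32-bit values so that the wrapped result can be computed without any undefined behaviour; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Product reduced modulo 2^32: what wrapping 32-bit arithmetic yields. */
uint32_t wrapped_product(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * b);
}

/* Mathematically exact product, computed in 64 bits. */
uint64_t exact_product(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
```

Two billion times three is exactly 6000000000, but the wrapped 32-bit result is 6000000000 - 4294967296 = 1705032704: well-defined, and still not six billion.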

Someone, over 10 years ago

"Reading from an invalid pointer either traps or produces an unspecified value."

That still leaves room for obscure behavior:

    if( p[i] == 0) { foo();}
    if( p[i] != 0) { bar();}

Calling foo might change the memory p points at (p might point into the stack or it might point to memory in which foo() temporarily allocates stuff, or the runtime might choose to run parts of free() asynchronously in a separate thread), so one might see cases where both foo and bar get called. And yes, optimization passes in the compiler might or might not remove this problem.

Apart from truly performance-killing runtime checks I do not see a way to fix this issue. That probably is the reason it isn't in the list.

(Feel free to replace p[i] by a pointer dereference. I did not do that above because I fear HN might set stuff in italics instead of showing asterisks.)

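The double-read effect described in the comment above can be reproduced with valid memory, which keeps the demonstration free of undefined behaviour. This is only a sketch of the mechanism: `watched`, the call counters, and both dispatch functions are hypothetical, and reading the value once does nothing to make an invalid pointer safe.

```c
#include <assert.h>
#include <stddef.h>

static int *watched;              /* hypothetical: memory that foo() modifies */
static int foo_calls, bar_calls;

static void foo(void) { foo_calls++; *watched = 1; }
static void bar(void) { bar_calls++; }

/* Reads p[i] twice, as in the comment: both branches can run. */
void dispatch_reread(int *p, size_t i)
{
    if (p[i] == 0) { foo(); }
    if (p[i] != 0) { bar(); }
}

/* Reads p[i] once into a local: exactly one branch runs. */
void dispatch_snapshot(int *p, size_t i)
{
    int seen = p[i];
    if (seen == 0) { foo(); } else { bar(); }
}
```

With `watched` pointing at the same cell as `p[i]`, `dispatch_reread` calls both `foo` and `bar`, while `dispatch_snapshot` calls only `foo`.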

rwmj, over 10 years ago

I would think also something like "if I write a piece of code, the compiler should compile it", perhaps "or else tell me with a warning that it isn't going to compile it".

Ono-Sendai, over 10 years ago

Not sure this is a good idea. Since a lot of behaviour becomes implementation-defined, code written in this dialect will not be portable.

kazinator, over 10 years ago

> 1. The value of a pointer to an object whose lifetime has ended remains the same as it was when the object was alive.

This does not help anyone; making this behavior defined is stupid, because it prevents debugging tools from identifying uses of these pointers as early as possible. In practice, existing C compilers do behave like this anyway: though any use of the pointer (not merely dereferencing use) is undefined behavior, in practice, copying the value around does work.

> 2. Signed integer overflow results in two’s complement wrapping behavior at the bitwidth of the promoted type.

This seems like a reasonable request since only museum machines do not use two's complement. However, by making this programming error defined, you interfere with the ability to diagnose it. C becomes friendly in the sense that assembly language is friendly: things that are not necessarily correct have a defined behavior. The problem is that then people write code which depends on this. Then when they do want overflow trapping, they will have to deal with reams of false positives.

The solution is to have a declarative mechanism in the language whereby you can say "in this block of code, please trap overflows at run time (or even compile time if possible); in this other block, give me two's comp wraparound semantics".

> 3. Shift by negative or shift-past-bitwidth produces an unspecified result.

This is just word semantics. Undefined behavior, unspecified: it spells nonportable. Unspecified behavior may seem better because it must not fail. But, by the same token, it won't be diagnosed either.

A friendly C should remove all gratuitous undefined behaviors, like ambiguous evaluation orders. And diagnose as many of the remaining ones as possible, especially those which are errors.

Not all undefined behaviors are errors. Undefined behavior is required so that implementations can extend the language locally (in a conforming way).

One interpretation of ISO C is that calling a nonstandard function is undefined behavior. The standard doesn't describe what happens, no diagnostic is required, and the range of possibilities is very broad. If you put "extern int foo()" into a program and call it, you may get a diagnostic like "unresolved symbol foo". Or a run-time crash (because there is an external foo in the platform, but it's actually a character string!). Or you may get the expected behavior.

> 4. Reading from an invalid pointer either traps or produces an unspecified value. In particular, all but the most arcane hardware platforms can produce a trap when dereferencing a null pointer, and the compiler should preserve this behavior.

The claim here is false. Firstly, even common platforms like Linux do not actually trap null pointers. They trap accesses to an unmapped page at address zero. That page is often as small as 4096 bytes. So a null dereference like ptr[i] or ptr->memb where the displacement goes beyond the page may not actually be trapped.

Reading from invalid pointers already has the de facto behavior of reading an unspecified value or else trapping. The standard makes it formally undefined, though, and this only helps: it allows advanced debugging tools to diagnose invalid pointers. We can run our program under Valgrind, for instance, while the execution model of that program remains conforming to C. We cannot valgrind the program if invalid pointers dereference to an unspecified value, and programs depend on that; we then have reams of false positives and have to deal with generating tedious suppressions.

> 5. Division-related overflows either produce an unspecified result or else a machine-specific trap occurs.

Same problem again, and this is already the actual behavior: possibilities like "demons fly out of your nose" do not happen in practice.

The friendly thing is to diagnose this, always. Carrying on with a garbage result is anything but friendly.

> It is permissible to compute out-of-bounds pointer values including performing pointer arithmetic on the null pointer.

Arithmetic on null works on numerous compilers already, which use it to implement the offsetof macro.

> memcpy() is implemented by memmove().

This is reasonable. The danger in memcpy not supporting overlapped copies is not worth the microoptimization. Any program whose performance is tied to that of memcpy is badly designed anyway. For instance, if a TCP stack were to double in performance due to using a faster memcpy, we would strongly suspect that it does too much copying.

> The compiler is granted no additional optimization power when it is able to infer that a pointer is invalid.

That's not really how it works. The compiler assumes that your pointers are valid and proceeds accordingly. For instance, aliasing rules tell it that an "int *" pointer cannot be aimed at an object of type "double", so when that pointer is used to write a value, objects of type double can be assumed to be unaffected.

C compilers do not look for rule violations as an excuse to optimize more deeply; they generally look for opportunities based on the rules having been followed.

> When a non-void function returns without returning a value, an unspecified result is returned to the caller.

This just brings us back to K&R C before there was an ANSI standard. If functions can fall off the end without returning a value, and this is not undefined, then again, the language implementation is robbed of the power to diagnose it (while remaining conforming). Come on, C++ has fixed this problem, just look at how it's done! For this kind of undefined behavior which is erroneous, it is better to require diagnosis, rather than to sweep it under the carpet by turning it into unspecified behavior. Again, silently carrying on with an unspecified value is not friendly. Even if the behavior is not classified as "undefined", the value is nonportable garbage.

It would be better to specify it as zero than leave it unspecified: falling out of a function that returns a pointer causes it to return null, out of a function that returns a number causes it to return 0 or 0.0 in that type, out of a function that returns a struct, a zero-initialized struct, and so on.

Predictable and portable results are more friendly than nonportable, unspecified, garbage results.
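
The "declarative mechanism" for overflow asked for under point 2 does not exist in standard C, but the two semantics can be approximated at the library level. The sketch below uses hypothetical helpers: one that wraps, and one that reports overflow so the caller can trap or diagnose. It assumes a two's complement `int` with `UINT_MAX == 2 * INT_MAX + 1`, which holds on mainstream platforms.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Two's complement wraparound, computed in unsigned arithmetic so that
 * no signed overflow ever occurs. */
int wrapping_add(int a, int b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    if (sum <= (unsigned)INT_MAX)
        return (int)sum;
    /* Map the upper half of the unsigned range onto negative values. */
    return -(int)(UINT_MAX - sum) - 1;
}

/* Overflow-detecting add: returns false and leaves *out untouched
 * when a + b is not representable in int. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```

A caller picks the semantics per call site: `wrapping_add(INT_MAX, 1)` yields `INT_MIN`, while `checked_add(INT_MAX, 1, &r)` returns false and leaves the decision to abort, log, or recover to the caller.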

dschiptsov, over 10 years ago

Show us, please, how the dialect of C used to develop Plan 9 is unfriendly and not good enough.

allegory, over 10 years ago

It is friendly, if you're not an idiot. Not becoming an idiot is the best solution (practice, lots).

Edit: perhaps I made my point badly, but don't assume that you'll ever be good enough to not be an idiot. Try to converge on it if possible, though. I'm still an idiot with C and I've been doing it since 1997.
