"falsehoods 'falsehoods programmers believe about X' authors believe about X"...<p>All you need to know about null pointers in C or C++ is that dereferencing them gives undefined behaviour. That's it. The buck stops there. Anything else is you trying to be smart about it.
These articles are annoying because they try to sound smart by going through generally useless technicalities the average programmer shouldn't even be considering in the first place.
> asking for forgiveness (dereferencing a null pointer and then recovering) instead of permission (checking if the pointer is null before dereferencing it) is an optimization. Comparing all pointers with null would slow down execution when the pointer isn’t null, i.e. in the majority of cases. In contrast, signal handling is zero-cost until the signal is generated, which happens exceedingly rarely in well-written programs.<p>Is this actually a real optimization? I understand the principle, that you can bypass explicit checks by using exception handlers and then massage the stack/registers back to a running state, but does this actually improve speed? A null pointer check is literally a single TEST on a register, followed by a conditional jump that the branch predictor is going to get right 99.9% of the time. How much processing time is handling an exception actually going to save? Or is there a better example?
> In both cases, asking for forgiveness (dereferencing a null pointer and then recovering) instead of permission (checking if the pointer is null before dereferencing it) is an optimization. Comparing all pointers with null would slow down execution when the pointer isn’t null, i.e. in the majority of cases. In contrast, signal handling is zero-cost until the signal is generated, which happens exceedingly rarely in well-written programs.<p>At least from a C/C++ perspective, I can't help but feel like this isn't great advice. There isn't a "null dereference" signal that gets sent--it's just a standard SIGSEGV that cannot easily be distinguished from other memory access violations (mprotect'd pages, buffer overflows, etc). In principle I suppose you could write a fairly sophisticated signal handler that accounts for this--but at the end of the day it <i>must</i> replace the pointer with a non-null one, because the faulting memory read will be immediately retried when the handler returns. You'll get stuck in an infinite loop (READ, raise SIGSEGV, handler doesn't resolve the issue, READ, raise SIGSEGV, &c.) unless you change the value of that pointer or transfer control elsewhere.<p>All this to avoid the cost of an if-statement that almost always has the same result (not null), which is perfect conditions for the CPU branch predictor.<p>I'm not saying that it is definitely better to just do the check. But without any data to suggest that it is actually more performant, I don't really buy this.<p>EDIT: Actually, this is made a bit worse by the fact that dereferencing nullptr is undefined behavior. Most implementations represent nullptr as address 0 and leave that page unmapped, but that isn't a sure thing. The author says as much later in this article, which makes the above point even weirder.
I would add one more: the address you are dereferencing could be non-zero. It could be an offset from 0, because the code is accessing a field in a structure or a method in a class. That offset can be quite large, so if you see an error accessing an address like 0x420, it's probably because you do have a null pointer and are trying to access a field at offset 0x420. As a bonus, the offending offset may give you a hint as to which field, and therefore where in your code, the bad dereference is happening.
The article wasn't terrible. I give it a C+ (no pun intended).<p>Too general, too much trivia without explaining the underlying concepts. Questionable recommendations (without covering potential pitfalls).<p>I have to say that the discourse here is refreshing. I got a headache reading the 190+ comments on the /r/prog post of this article. They are a lively bunch though.
> Instead of translating what you’d like the hardware to perform to C literally, treat C as a higher-level language, because it is one.<p>Alternately, <i>stop writing code in C</i>.
That's not how you're supposed to write a "falsehoods programmers believe about X" article.<p>The articles that started this genre are about oversimplifications that make your program worse because <i>real people</i> will not fit into your neat little boxes, and their user experience will degrade if you assume they do. It's about developers assuming "Oh, everyone has an X" and then someone who doesn't have an X tries to use their program and gets stuck for no reason.<p>Writing a bunch of trivia about how null pointers work in theory, which will almost never matter in practice (just assume that dereferencing them is always UB and you'll be fine), isn't in the spirit of the "falsehoods" genre, <i>especially</i> if every bit of trivia needs a full paragraph to explain it.
> <i>In ye olden times, the C standard was considered guidelines rather than a ruleset, undefined behavior was closer to implementation-defined behavior than dark magic, and optimizers were stupid enough to make that distinction irrelevant. On a majority of platforms, dereferencing a null pointer compiled and behaved exactly like dereferencing a value at address 0.</i><p>> <i>For all intents and purposes, UB as we understand it today with spooky action at a distance didn’t exist.</i><p>The first official C standard was from 1989, the next real change was in 1995, and the infamous “nasal demons” quote was from 1992. So evidently the first C standard was already being interpreted that way: compilers were genuinely allowed to do anything in the face of undefined behavior.
As far as I know
Nowadays, UB is used pretty much as a license to make optimizer go brrrr. But back in the day, I think it was used to allow implementations wiggle room on whether a particular construct was erroneous or not -- in contrast to other specifications like "it is an error" (always erroneous) or "implementation-defined behavior" (always legitimate; compiler must emit <i>something</i> sensible, exactly what is not specified). In the null pointer case, it makes sense for kernel-mode code to potentially indirect to address 0 (or 0xffffffff, or whatever your architecture designates as null), while user-space code can be reasonably considered never to legitimately access that address because the virtual memory never maps it as a valid address. So accessing null is an error in one case and perfectly cromulent in the other. So the standard shrugs its shoulders and says "it's undefined".
Dereferencing a null pointer is how I boot half of my systems. :D On Rockchip platforms address 0 is start of DRAM, and a location where [U-Boot] SPL is loaded after DRAM is initialized. :)
> In ye olden times, the C standard was considered guidelines rather than a ruleset, undefined behavior was closer to implementation-defined behavior than dark magic, and optimizers were stupid enough to make that distinction irrelevant. On a majority of platforms, dereferencing a null pointer compiled and behaved exactly like dereferencing a value at address 0.<p>Let me unpack that for you. Old compilers didn't recognise undefined behaviour, and so compiled code that triggered undefined behaviour in exactly the same way they compiled all other code. The result was implementation defined, as the article says.<p>Modern compilers can recognise undefined behaviour. When they recognise it they don't warn the programmer "hey, you are doing something non-portable here". Instead they may take advantage of it in any way they damned well please. Most of those ways will be contrary to what the programmer is expecting, consequently yielding a buggy program.<p>But not in all circumstances. The icing on the cake is that some undefined behaviour (like dereferencing null pointers) is tolerated (i.e. treated in the old way), and some is not. In fact most large C programs rely on undefined behaviour of some sort, such as what happens when integers overflow or signed is converted to unsigned.<p>Despite that, what is acceptable undefined behaviour and what is not isn't defined by the standard, or anywhere else really. So the behaviour of most large C programs is legally allowed to change if you use a different compiler, a different version of the same compiler, or just different optimisation flags. Consequently most C programs depend on the compiler writers doing the same thing with some undefined behaviour, despite there being no guarantee that will happen.<p>This state of affairs, which is to say having a language standard that doesn't standardise major features of the language, is apparently considered perfectly acceptable by the C standards committee.