"falsehoods 'falsehoods programmers believe about X' authors believe about X"...<p>All you need to know about null pointers in C or C++ is that dereferencing them gives undefined behaviour. That's it. The buck stops there. Anything else is you trying to be smart about it.
These articles are annoying because they try to sound smart by going through generally useless technicalities the average programmer shouldn't even be considering in the first place.
> asking for forgiveness (dereferencing a null pointer and then recovering) instead of permission (checking if the pointer is null before dereferencing it) is an optimization. Comparing all pointers with null would slow down execution when the pointer isn’t null, i.e. in the majority of cases. In contrast, signal handling is zero-cost until the signal is generated, which happens exceedingly rarely in well-written programs.<p>Is this actually a real optimization? I understand the principle, that you can bypass explicit checks by using exception handlers and then massage the stack/registers back to a running state, but does this actually improve speed? A null pointer check is literally a single TEST on a register, followed by a conditional jump that the branch predictor is going to get right 99.9% of the time. How much processing time is handling an exception actually going to save? Or is there a better example?
> In both cases, asking for forgiveness (dereferencing a null pointer and then recovering) instead of permission (checking if the pointer is null before dereferencing it) is an optimization. Comparing all pointers with null would slow down execution when the pointer isn’t null, i.e. in the majority of cases. In contrast, signal handling is zero-cost until the signal is generated, which happens exceedingly rarely in well-written programs.<p>At least from a C/C++ perspective, I can't help but feel like this isn't great advice. There isn't a "null dereference" signal that gets sent--it's just a standard SIGSEGV that cannot easily be distinguished from other memory access violations (mprotect'd pages, buffer overflows, etc). In principle I suppose you could write a fairly sophisticated signal handler that accounts for this--but at the end of the day it <i>must</i> replace the pointer with a non-null one, because the faulting memory read will be immediately retried when the handler returns. You'll get stuck in an infinite loop (READ, raise SIGSEGV, handler doesn't resolve the issue, READ, raise SIGSEGV, &c.) unless you change the value of that pointer or transfer control elsewhere.<p>All this to avoid the cost of an if-statement that almost always has the same result (not null), which is perfect conditions for the CPU branch predictor.<p>I'm not saying that it is definitely better to just do the check. But without any data to suggest that it is actually more performant, I don't really buy this.<p>EDIT: Actually, this is made a bit worse by the fact that dereferencing nullptr is undefined behavior. Most implementations represent nullptr as address 0 and leave that page unmapped, but that isn't a sure thing. The author says as much later in this article, which makes the above point even weirder.
I would add one more: the address you are dereferencing could be non-zero. It could be an offset from 0, because the code is accessing a field in a structure or a method in a class. That offset can be quite large, so if you see an error accessing an address like 0x420, it's probably because you do have a null pointer and are trying to access a field at offset 0x420. As a bonus, the offending offset may give you a hint as to which field, and therefore where in your code, the bad dereference is happening.
The article wasn't terrible. I give it a C+ (no pun intended).<p>Too general, too much trivia without explaining the underlying concepts. Questionable recommendations (without covering potential pitfalls).<p>I have to say that the discourse here is refreshing. I got a headache reading the 190+ comments on the /r/prog post of this article. They are a lively bunch though.
> Instead of translating what you’d like the hardware to perform to C literally, treat C as a higher-level language, because it is one.<p>Alternately, <i>stop writing code in C</i>.
That's not how you're supposed to write a "falsehoods programmers believe about X" article.<p>The articles that started this genre are about oversimplifications that make your program worse because <i>real people</i> will not fit into your neat little boxes, and their user experience will degrade if you assume they do. It's about developers assuming "Oh, everyone has an X" and then someone who doesn't have an X tries to use their program and gets stuck for no reason.<p>Writing a bunch of trivia about how null pointers work in theory, which will almost never matter in practice (just assume that dereferencing them is always UB and you'll be fine), isn't in the spirit of the "falsehoods" genre, <i>especially</i> if every bit of trivia needs a full paragraph to explain it.
> <i>In ye olden times, the C standard was considered guidelines rather than a ruleset, undefined behavior was closer to implementation-defined behavior than dark magic, and optimizers were stupid enough to make that distinction irrelevant. On a majority of platforms, dereferencing a null pointer compiled and behaved exactly like dereferencing a value at address 0.</i><p>> <i>For all intents and purposes, UB as we understand it today with spooky action at a distance didn’t exist.</i><p>The first official C standard was from 1989, the next real change was in 1995, and the infamous “nasal demons” quote was from 1992. So evidently the first C standard was already being interpreted that way: compilers were genuinely allowed to do anything in the face of undefined behavior.
As far as I know
Nowadays, UB is used pretty much as a license to make optimizer go brrrr. But back in the day, I think it was used to allow implementations wiggle room on whether a particular construct was erroneous or not -- in contrast to other specifications like "it is an error" (always erroneous) or "implementation-defined behavior" (always legitimate; compiler must emit <i>something</i> sensible, exactly what is not specified). In the null pointer case, it makes sense for kernel-mode code to potentially indirect to address 0 (or 0xffffffff, or whatever your architecture designates as null), while user-space code can be reasonably considered never to legitimately access that address because the virtual memory never maps it as a valid address. So accessing null is an error in one case and perfectly cromulent in the other. So the standard shrugs its shoulders and says "it's undefined".
Dereferencing a null pointer is how I boot half of my systems. :D On Rockchip platforms address 0 is start of DRAM, and a location where [U-Boot] SPL is loaded after DRAM is initialized. :)
> In ye olden times, the C standard was considered guidelines rather than a ruleset, undefined behavior was closer to implementation-defined behavior than dark magic, and optimizers were stupid enough to make that distinction irrelevant. On a majority of platforms, dereferencing a null pointer compiled and behaved exactly like dereferencing a value at address 0.<p>Let me unpack that for you. Old compilers didn't recognise undefined behaviour, and so compiled code that triggered undefined behaviour in exactly the same way they compiled all other code. The result was implementation defined, as the article says.<p>Modern compilers can recognise undefined behaviour. When they recognise it they don't warn the programmer "hey, you are doing something non-portable here". Instead they may take advantage of it in any way they damned well please. Most of those ways will be contrary to what the programmer is expecting, consequently yielding a buggy program.<p>But not in all circumstances. The icing on the cake is that some undefined behaviour (like dereferencing null pointers) is tolerated (i.e. treated in the old way), and some is not. In fact most large C programs rely on undefined behaviour of some sort, such as what happens when integers overflow or signed is converted to unsigned.<p>Despite that, what is acceptable undefined behaviour and what is not isn't defined by the standard, or anywhere else really. So the behaviour of most large C programs is legally allowed to change if you use a different compiler, a different version of the same compiler, or just different optimisation flags. Consequently most C programs depend on the compiler writers doing the same thing with some undefined behaviour, despite there being no guarantee that will happen.<p>This state of affairs, which is to say having a language standard that doesn't standardise major features of the language, is apparently considered perfectly acceptable by the C standards committee.