There's a big mistake here:<p><pre><code> // Bogus function, just to see how arguments are passed.
void bogus();
// Invoke bogus using ptr.
void do_ptr() { bogus(&ptr, ptr); }
// Invoke bogus using arr.
void do_arr() { bogus(&arr, arr); }
</code></pre>
"&ptr" and "&arr" are not the same. &ptr gives you a pointer to a pointer (char * *), but &arr gives you a pointer to the array itself, which holds the same address as the array data. That's why the generated code is different.<p>This would normally show up as a compilation error, since &ptr and &arr have different types. It's disguised here by bogus() being declared with an empty parameter list, which turns off argument checking.<p>(I didn't know that a C function declared with empty parentheses accepts any arguments! That's bizarre. I would have expected modern compilers to disable that by default, but I can't get Clang to warn me about it even with -Wall.)
Interestingly, we get<p><pre><code> .LC0:
.string "Lorem ipsum"
[...]
movl $.LC0, %esi
movl $ptr2, %edi
xorl %eax, %eax
jmp bogus
</code></pre>
for the same experiment done with a const pointer to const data,<p><pre><code> const char * const ptr2 = "Lorem ipsum";
</code></pre>
(Incidentally, this kind of thing is why I prefer to write 'char const STAR const' rather than 'const char STAR const', and 'char const STAR' rather than 'const char STAR'; it systematically places the 'const's.)<p>(PS. HN really doesn't work well for trying to write either unicode or ascii star characters.)
All this talk about do_ptr being slower because it copies from main memory is misleading. Yes, it's slower, because it's <i>doing something different</i> than what do_arr does. It's not just a slower version of doing the same thing. So trying to compare do_ptr and do_arr makes no sense. And the whole thesis of this piece, that declaring string constants as arrays is better, has no supporting evidence. You know what's better? Don't dereference a pointer if you don't need to. That's it.
As others have noted, this is a little bizarre as the two functions are doing different things; one being less performant than the other is not really surprising. That said, there can be advantages to using array syntax rather than pointers for strings (even when also declaring the pointer to be constant), e.g. the compiler "knows" the array "pointer" is non-null:<p><pre><code> extern const char arr[];
void do_arr() {
if (arr) {
dummy();
}
}
</code></pre>
allows the compiler to optimize out the conditional: <a href="https://godbolt.org/g/FwBeWx" rel="nofollow">https://godbolt.org/g/FwBeWx</a>.
Other comments have already deconstructed much of the article, but I'll add this. The primary factor will be your CPU architecture. Can you put the string in the same cache line as the code, or in an adjacent one? If so, awesome. However, if you have a microcontroller with a Harvard architecture, such as a Cortex-M3, it may be better to put your strings in data memory rather than code memory. The processor can then load your string and the next instructions simultaneously via the two memory ports.
The author took the time to write a blog post, but didn't try the code with anything other than that bogus() function. Had they done that, they would have realized their mistake.
Arrays and pointers are not the same in C, nor have they ever been. The difference is more than that of mere syntax choices for equivalent concepts.<p>In the pointer version of the code, the pointer is reassignable. Only the data it references can't be modified, the pointer can. To prevent the pointer from being modified, the declaration should have been:<p>const char * const ptr = "Lorem ipsum";<p>With the array version, there is no pointer to be modified and so no need or ability to have a second const.
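A rough sketch of the three declarations side by side, in case it helps. The function name repoint and the replacement string are invented for the example; the commented-out lines are the ones a compiler should reject.

```c
#include <assert.h>
#include <string.h>

const char *ptr = "Lorem ipsum";         /* data is const, pointer is not */
const char *const ptr2 = "Lorem ipsum";  /* neither can be modified */
const char arr[] = "Lorem ipsum";        /* no pointer object at all */

/* Hypothetical helper: re-point ptr and return its new target. */
const char *repoint(void) {
    /* All three start out with the same text. */
    assert(strcmp(ptr, arr) == 0);
    assert(strcmp(ptr2, arr) == 0);

    ptr = "dolor sit amet";      /* fine: only the pointed-to data is const */
    /* ptr2 = "dolor sit amet";     error: ptr2 itself is const */
    /* arr = "dolor sit amet";      error: arrays are not assignable */
    return ptr;
}
```

Because ptr can change at any time, the compiler has to reload it from memory in code like do_ptr(); with ptr2 or arr it can use the address directly.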
Another difference: if you use sizeof, the pointer version gives you the size of a pointer, but the array version gives you the actual number of bytes in the string.
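A quick way to check this, assuming the same two declarations as in the article (the function names are just for illustration):

```c
#include <assert.h>
#include <stddef.h>

const char *ptr = "Lorem ipsum";
const char arr[] = "Lorem ipsum";

/* The array's size is the string's: 11 characters plus the NUL. */
static_assert(sizeof arr == 12, "array carries its length");

/* The pointer's size depends on the platform, not on the string. */
static_assert(sizeof ptr == sizeof(const char *), "just a pointer");

size_t size_via_ptr(void) { return sizeof ptr; }
size_t size_via_arr(void) { return sizeof arr; }
```

On a typical 64-bit target size_via_ptr() would return 8, but that number is platform-dependent; size_via_arr() is always 12 for this string.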
While I do enjoy this sort of analysis, I feel the decision to choose one method over another should be based on benchmarks (preferably with more than one size of string). After benchmarking, then do an analysis of the assembly code.<p>Guessing about pipelining, memory access times, and the effect of generated code size is much less valuable than real measurements.
What I find funny is that through all this he never actually mentioned the fact that sizeof() a string constant <i>includes the NUL character at the end</i>.<p>This has bitten me more times than I care to admit.<p>Nowadays, I almost always use the array form and actually declare my strings character by character so that I <i>see</i> the NUL if I actually meant to use it.
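For anyone who hasn't been bitten yet, a minimal illustration of the off-by-one. The variable and function names here are invented; "abc" stands in for any string constant.

```c
#include <assert.h>
#include <string.h>

/* Initialized from a literal: the NUL comes along silently. */
const char with_nul[] = "abc";

/* Spelled out character by character: no NUL unless you write one. */
const char no_nul[] = { 'a', 'b', 'c' };
const char explicit_nul[] = { 'a', 'b', 'c', '\0' };

static_assert(sizeof "abc" == 4, "literal includes the NUL");
static_assert(sizeof with_nul == 4, "so does an array initialized from it");
static_assert(sizeof no_nul == 3, "no hidden terminator here");
static_assert(sizeof explicit_nul == 4, "terminator is visible in the source");

/* strlen counts characters up to, but not including, the NUL. */
size_t text_len(void) { return strlen(with_nul); }
```

Note that no_nul is not a valid C string, so it must not be passed to strlen() or printf("%s").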
I wonder how the results would have differed if those were automatic variables instead of static? That's a much more real-world situation. You wouldn't be able to ignore the overhead of re-initializing the array, since it would be occurring at run time instead of compile time.
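A small sketch of what changes with automatic variables, under the assumption of a non-const local array (an optimizer may still elide the copy when it can prove nothing writes to it). The function names are made up.

```c
#include <assert.h>

/* Automatic array: the 12 bytes are conceptually copied into the
   stack frame every time the function is entered. */
char first_char_then_clobber(void) {
    char arr[] = "Lorem ipsum";
    static_assert(sizeof arr == 12, "a full copy, NUL included");
    char first = arr[0];
    arr[0] = 'X';   /* only touches this call's private copy */
    return first;
}

/* Automatic pointer: only the pointer is initialized at run time;
   the text itself stays in static storage. */
const char *text_via_ptr(void) {
    const char *ptr = "Lorem ipsum";
    return ptr;     /* still valid after the function returns */
}
```

Calling first_char_then_clobber() repeatedly keeps returning 'L', which shows the array is re-initialized on each call. Whether that copy costs anything measurable is exactly the kind of thing that needs a benchmark.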
Any programming language that needs an entire blog post about how to declare string constants "the right way" is a programming language that needs to disappear.