Perhaps this is a digression, but this blog post misunderstands the
memory model in its C++ example. Since the author goes to great lengths
to be precise in both terminology and explanation, it is worth
clarifying a few things.<p>For the C++ example code given:<p><pre><code> string some_guy = "Fred";
// ...
some_guy = "George";
</code></pre>
Here is the author's explanation:<p><pre><code> In the above, the variable some_guy refers to a location in
memory, and the value 'Fred' is inserted in that location (indeed,
we can take the address of some_guy to determine the portion of
memory to which it refers). Later, the contents of the memory
location referred to by some_guy are changed to 'George'. The
previous value no longer exists; it was overwritten. This likely
matches your intuitive understanding (even if you don't program in
C++).
</code></pre>
There are several incorrect statements here.<p>A. "the variable some_guy refers to a location in memory, and the
value 'Fred' is inserted in that location"
Not quite. The variable some_guy refers to a location in memory, but
with this implementation that location contains a <i>pointer</i> to a
separate region of memory holding the characters "Fred". More on this in a bit.<p>B. "indeed, we can take the address of some_guy to determine the portion
of memory to which it refers".
Again, not true in the way the author probably intended. The
address of 'some_guy' will be the address on the stack where the
variable is located. The value at that address on the stack will be
a pointer to a separate region of memory containing "Fred".<p>C. "Later, the contents of the memory location referred to by some_guy
are changed to 'George'."
To be precise, the contents of the memory location referred to by
some_guy are changed to a new <i>pointer</i> to a different region of
memory containing "George".<p>D. "the previous value no longer exists; it was overwritten."
The string literals "Fred" and "George" exist through the entire
lifetime of the program. They are constants, compiled into the
read-only data (.rodata) section of the program binary. The author is
correct on one point: the value of 'some_guy' is overwritten during the
reassignment. However, the value in this case is simply a pointer,
not the characters themselves.<p>We can verify my claims experimentally. On my system (Ubuntu 11.04
32-bit, g++ 4.5.2), I wrote the following test program:<p><pre><code> //a.cpp
#include <string>
using namespace std;
int main()
{
    string some_guy = "Fred";
    some_guy = "George";
    return 0;
}
</code></pre>
Compile like this:
$ g++ -O0 a.cpp -o a<p>Now, we can disassemble the "main" function to see exactly what is
going on in the executable:<p><pre><code> $ objdump -d a
a: file format elf32-i386
...
08048614 <main>:
8048614: 55 push %ebp
8048615: 89 e5 mov %esp,%ebp
8048617: 83 e4 f0 and $0xfffffff0,%esp
804861a: 53 push %ebx
804861b: 83 ec 2c sub $0x2c,%esp
804861e: 8d 44 24 1f lea 0x1f(%esp),%eax
8048622: 89 04 24 mov %eax,(%esp)
8048625: e8 06 ff ff ff call 8048530 <_ZNSaIcEC1Ev@plt>
804862a: 8d 44 24 1f lea 0x1f(%esp),%eax
804862e: 89 44 24 08 mov %eax,0x8(%esp)
8048632: c7 44 24 04 80 87 04 movl $0x8048780,0x4(%esp)
8048639: 08
804863a: 8d 44 24 18 lea 0x18(%esp),%eax
804863e: 89 04 24 mov %eax,(%esp)
8048641: e8 ca fe ff ff call 8048510 <_ZNSsC1EPKcRKSaIcE@plt>
8048646: 8d 44 24 1f lea 0x1f(%esp),%eax
804864a: 89 04 24 mov %eax,(%esp)
804864d: e8 ce fe ff ff call 8048520 <_ZNSaIcED1Ev@plt>
8048652: c7 44 24 04 85 87 04 movl $0x8048785,0x4(%esp)
...
</code></pre>
In this case, we are interested in the following two lines:<p><pre><code> 8048632: c7 44 24 04 80 87 04 movl $0x8048780,0x4(%esp)
</code></pre>
and<p><pre><code> 8048652: c7 44 24 04 85 87 04 movl $0x8048785,0x4(%esp)
</code></pre>
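0x4(%esp) is the stack slot for the second argument of the call that
follows. (The 'some_guy' object itself lives at 0x18(%esp); its address
is loaded into (%esp) as the first argument just before the constructor
call.) We are moving an "immediate" (constant) value into that argument
slot. What does that immediate refer to? Again, let's find out experimentally:<p><pre><code> $ objdump -s -j .rodata a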
a: file format elf32-i386
Contents of section .rodata:
8048778 03000000 01000200 46726564 0047656f ........Fred.Geo
8048788 72676500 00000000 rge.....
</code></pre>
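Notice that the address of "Fred" is 0x8048778 + 8 = 0x8048780, which
is exactly the first immediate value from the disassembled main function.
"Fred" plus its terminating NUL occupies 5 bytes, so "George" starts at
0x8048785, which is the second immediate.<p>This means that each literal is handed to the string by address. The
literals "Fred" and "George" are statically allocated in the read-only
data section of the binary, meaning they stick around for the lifetime
of the program. The call that receives the first address,
_ZNSsC1EPKcRKSaIcE, is the std::string constructor taking a const char*
and an allocator; it copies the characters into a buffer, and the
'some_guy' object holds a pointer to that buffer rather than the
characters themselves.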