Making it hard for the compiler to optimize your memset-zero away is not a long-term solution. At some point in the future the compiler might be able to analyze this and optimize it away anyway. As a cryptographer you should not rely on the compiler being bad at optimizing.<p>Actually, using his memzero solution would work, though not for the reasons he gives. Putting memzero into another compilation unit (.c file) means it has to be compiled separately. memzero itself cannot be compiled down to a no-op, since the compiler does not know how it will be called, and a call to memzero cannot be optimized away, since the calling compilation unit does not know what it does.<p>Nevertheless, link-time optimization could in theory still optimize across compilation units. The only solution which comes to my mind is to use 'volatile' for the memory access, but that will never be fast.
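For reference, a minimal sketch of that separate-compilation-unit approach (the file split is my own illustration, and it assumes LTO is not in use):<p><pre><code> /* memzero.h */
 #include <stddef.h>
 void memzero(void *block, size_t size);

 /* memzero.c -- built as its own compilation unit */
 #include <string.h>
 #include "memzero.h"

 void memzero(void *block, size_t size)
 {
     memset(block, 0, size);
 }
</code></pre>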
Personally, I'd just write a wrapper function for memset().<p>The following one works in at least gcc 4.5.3 and clang 3.1, but is actually not guaranteed to work by C language semantics:<p><pre><code> static inline void memzero(void *volatile block, size_t size)
 {
     memset(block, 0, size);
 }
</code></pre>
A safe alternative is<p><pre><code> static inline void memzero(void *block, size_t size)
 {
     static void *(*volatile const memset_)(void *, int, size_t) = memset;
     memset_(block, 0, size);
 }
</code></pre>
but it has the downside that the indirect call cannot be optimized away, whereas gcc and clang actually inline the call to memset() if the first version is used.<p>If this is a concern, there's probably no alternative to an explicit loop:<p><pre><code> static inline void memzero(void *block, size_t size)
 {
     volatile unsigned char *bp = block;
     while (bp < (unsigned char *)block + size)
         *bp++ = 0;
 }</code></pre>
That implementation will die with a bus error for seven out of every eight possible values of mem on most non-x86 architectures, zero out the wrong memory regions for these values on some more exotic architectures, and be horribly slow everywhere else.
<a href="http://stackoverflow.com/questions/1496848/does-unaligned-memory-access-always-cause-bus-errors" rel="nofollow">http://stackoverflow.com/questions/1496848/does-unaligned-me...</a>
Whether memset(), or any other function, gets optimized away by GCC should depend on its function attributes (1) -- most notably `pure', possibly some others. However, GCC (tested with 4.7.1) somehow considers memset() pure regardless of its declaration. The default declaration is:<p><pre><code> $ echo '#include <string.h>' | gcc -E - | grep memset
extern void *memset (void *__s, int __c, size_t __n) __attribute__ ((__nothrow__ , __leaf__)) __attribute__ ((__nonnull__ (1)));
</code></pre>
When replaced by hand with a declaration lacking any attributes, it still gets optimized away.<p><pre><code> /* will be optimized away for unclear reasons */
extern void *memset (void *__s, int __c, size_t __n);
</code></pre>
Contrast that with the behavior of any user-defined function:<p><pre><code> /* may be optimized away */
extern void * memxxx(void *__s, int __c, size_t __n) __attribute__ ((pure));
/* should not be optimized away */
extern void * memyyy(void *__s, int __c, size_t __n);
</code></pre>
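For concreteness, the kind of caller being tested is roughly the following (a sketch, not my exact code). Compiled with plain gcc -O2 the memset() call disappears; with -fno-builtin-memset, which stops GCC from treating memset() as a built-in, it should survive:<p><pre><code> /* zap.c */
 #include <string.h>

 void handle_secret(void)
 {
     char key[32];
     /* ... pretend key holds sensitive material and gets used here ... */
     memset(key, 0, sizeof key);   /* dead store: key is never read again */
 }
</code></pre>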
IMHO GCC's special handling of memset() is broken...<p>(1) <a href="http://www.cs.auckland.ac.nz/references/c/gcc4.7/Function-Attributes.html" rel="nofollow">http://www.cs.auckland.ac.nz/references/c/gcc4.7/Function-At...</a>
CERT's Secure Coding wiki has more to say on the subject, including portable code for a memset_s() function that can still potentially be optimized away:<p><a href="https://www.securecoding.cert.org/confluence/display/seccode/MSC06-C.+Be+aware+of+compiler+optimization+when+dealing+with+sensitive+data" rel="nofollow">https://www.securecoding.cert.org/confluence/display/seccode...</a><p><a href="https://www.securecoding.cert.org/confluence/pages/worddav/preview.action?pageId=3524&fileName=protecting-sensitive-data.pdf" rel="nofollow">https://www.securecoding.cert.org/confluence/pages/worddav/p...</a>
The assumption that the trivial solution won't be optimized out is, I think, wrong. From your experiment GCC is indeed not smart enough to do so, but I would bet that a compiler like ICC would be. In this case the best option is probably to use some pragmas to keep a statement from being optimized out.
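Something along these lines, for GCC at least (only a sketch -- the optimize attribute is GCC-specific, and relying on it is no more guaranteed by the standard than the other tricks in this thread):<p><pre><code> #include <string.h>

 /* Ask GCC not to optimize this particular function; other compilers
    have their own pragmas/attributes for the same purpose. */
 __attribute__((optimize("O0")))
 void memzero(void *block, size_t size)
 {
     memset(block, 0, size);
 }
</code></pre>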
OpenBSD has a function called secure_bzero. All it currently does is call regular bzero, but if external compilation isn't enough to do the job, we'll come up with something else.<p>Regarding the article, I don't at all understand why the three arguments are necessary. Why would the following patch not work?<p><pre><code> - memset(x, 0, n);
+ memzero(x, n);</code></pre>
<i>Note that the type punning is only actually useful on systems where memory addresses are 64 bits wide, hence we include that code conditionally for environments with the LP64 data model, which includes most Unix-like systems.</i><p>The first statement seems false. I was not previously aware of any association between the number of address lines and the width of the data bus on computing systems. I know I've had 32-bit processors with at least 64-bit memory buses, and the SheevaPlug has a 32-bit processor with a 16-bit memory bus.<p>Also, the code above this paragraph will only use wide accesses on 64-bit architectures ("#ifdef __LP64__"), even though there are benefits available on 32-bit systems.
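A word-size-agnostic variant might look something like this (only a sketch -- it aligns by hand to sidestep the bus-error complaint elsewhere in the thread, writes through volatile pointers, and its wide stores are technically just as dubious under strict aliasing as the article's punning):<p><pre><code> #include <stddef.h>
 #include <stdint.h>

 void memzero(void *block, size_t size)
 {
     volatile unsigned char *bp = block;

     /* Zero byte-by-byte until the pointer is word-aligned. */
     while (size > 0 && ((uintptr_t)bp % sizeof(uintptr_t)) != 0) {
         *bp++ = 0;
         size--;
     }

     /* One machine word at a time; uintptr_t is 32 bits on ILP32 and
        64 bits on LP64, so 32-bit systems get wide stores too. */
     volatile uintptr_t *wp = (volatile uintptr_t *)bp;
     while (size >= sizeof(uintptr_t)) {
         *wp++ = 0;
         size -= sizeof(uintptr_t);
     }

     /* Remaining tail bytes. */
     bp = (volatile unsigned char *)wp;
     while (size > 0) {
         *bp++ = 0;
         size--;
     }
 }
</code></pre>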
Why isn't there a keyword stating "don't optimize this!" in the C Standard? If there is, please correct me. There are a bunch of similar problems introduced by optimizing compilers that could be solved with such a keyword.
I've only ever done a short compiler class in college, but wouldn't a sufficiently sophisticated optimizer unroll the loop and propagate the constant, thus eliminating the read and allowing the entire block to go away?
Relevant for those thinking that "volatile" will save the day: <a href="http://www.cs.utah.edu/~regehr/papers/emsoft08-preprint.pdf" rel="nofollow">http://www.cs.utah.edu/~regehr/papers/emsoft08-preprint.pdf</a><p>It shows that a) volatile does not have to do _anything_ (as long as the compiler documents that) and b) you cannot trust your compiler to be bug-free. The latter is one reason to follow cert.org's advice to read your compiler's disassembly output.
Could someone please explain why this is an issue in the first place?<p>If the memory of the process is available to other processes after it finishes (or while it's running), isn't this already a lost game? I.e., how can you be sure that this particular chunk of memory wasn't paged out to disk at some point? How can you be sure that someone didn't access it before your memset() call?
A good solution in C is to use calloc(), which is malloc() plus zeroing of the memory.<p><a href="http://www.cplusplus.com/reference/clibrary/cstdlib/calloc/" rel="nofollow">http://www.cplusplus.com/reference/clibrary/cstdlib/calloc/</a><p>Also, with a standards-compliant compiler, variables with static storage duration are automatically initialized to zero unless explicitly initialized otherwise.
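A minimal illustration of both points (a sketch):<p><pre><code> #include <stdlib.h>

 static unsigned char key_cache[64];   /* static storage duration: starts out all-zero */

 int main(void)
 {
     /* calloc() hands back memory that is already zeroed, unlike malloc(). */
     unsigned char *buf = calloc(64, sizeof *buf);
     if (buf == NULL)
         return 1;
     buf[0] = key_cache[0];   /* both are zero here */
     free(buf);
     return 0;
 }
</code></pre>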