Beware of strncpy and strncat

108 points by eklitzke almost 7 years ago

27 comments

nneonneo almost 7 years ago

The advice given in this article is bad. Reducing the length of the copy by one will still fail to null-terminate the string if the source exceeds the destination length. You need to add
<pre><code>dst[sizeof(dst)-1] = 0;
</code></pre>
or memset the array to zero beforehand. Otherwise you are not guaranteed to have a zero at the end of the destination buffer (neither local variable arrays nor malloc'd arrays are guaranteed to be zeroed).

strncpy sucks for a second reason: it writes n bytes no matter what the source is. That means if you write
<pre><code>char buffer[16384];
strncpy(buffer, "hello", sizeof(buffer)-1);
buffer[sizeof(buffer)-1] = 0;
</code></pre>
this will fill 16K of memory with zeroes despite only needing to copy a 6-byte string.

strlcpy/strlcat do the right thing (copy up to n-1 bytes and null-terminate); shame they aren't standardized. In their absence, I suggest snprintf instead:
<pre><code>snprintf(buffer, sizeof(buffer), "%s", src);
</code></pre>
Because snprintf returns the number of bytes that would be written (not counting the terminator), it can be used to detect overlong input strings, reallocate the buffer as necessary, and also to implement efficient concatenation. It's also surprisingly fast in most implementations.
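
The truncation check described above could be wrapped up roughly as follows. This is only a sketch: `copy_checked` is a hypothetical helper name, not something from the article or the comment, and it assumes a C99-conforming snprintf.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: copy src into dst (capacity dstsize, terminator
 * included). Returns 0 if the whole string fit, -1 if it was truncated
 * or snprintf reported an error. On truncation dst still holds a
 * null-terminated prefix of src, provided dstsize > 0. */
int copy_checked(char *dst, size_t dstsize, const char *src)
{
    int needed = snprintf(dst, dstsize, "%s", src);

    if (needed < 0)
        return -1;                  /* output error */
    if ((size_t)needed >= dstsize)
        return -1;                  /* truncated: needed + 1 bytes were required */
    return 0;
}
```

A caller that wants to grow the buffer rather than fail can use the same return value from snprintf (plus one for the terminator) as the size to allocate.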

tlb almost 7 years ago

Beware of the "correct" solutions proposed here.
<pre><code>// OK: correctly copy src
char dest[8];
strncpy(dest, src, sizeof(dest) - 1);
</code></pre>
With a long src, it fails to null-terminate dest. dest[7] will be whatever the contents of uninitialized memory were, so reading dest as a string is likely to run past the end.

strlcpy has a better API, though sadly it's not standard on Linux.
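
For comparison, a minimal sketch of a truncating copy that does terminate the destination. The name `copy_truncating` is made up for illustration, and the function assumes the caller passes a capacity of at least 1.

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical helper: truncating copy that always terminates dst.
 * Assumes dstsize >= 1. strncpy fills at most dstsize - 1 bytes, and
 * the last byte is then set explicitly, because strncpy leaves it
 * untouched when src is too long. */
void copy_truncating(char *dst, size_t dstsize, const char *src)
{
    strncpy(dst, src, dstsize - 1);
    dst[dstsize - 1] = '\0';
}
```

Silent truncation may still be the wrong behaviour for a given caller; this only removes the missing-terminator problem.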

TimJYoung almost 7 years ago

I just don't understand why C hasn't been blessed with a proper string type yet. Object Pascal has had one (actually, several) for decades now and it doesn't hinder the language's ability to handle low-level memory manipulation (you can still manually copy string memory, convert strings back and forth into raw Char pointers, etc.), and generally serves to make string handling much, much safer for most applications. It does, however, result in slightly more memory consumption for the length/reference count tags and does incur some overhead in the form of compiler-generated reference count checks at the end of functions. But, IMO, the advantages outweigh the disadvantages for general-purpose C programming and you're always free to fall back to the more manual methods of handling character arrays.

So, am I missing something, and is there some concrete reason why this can't be implemented?

loeg almost 7 years ago

Use strlcpy/strlcat instead.[0] strlcpy takes the full size of the destination buffer, limits the copy to N-1 bytes, and nul-terminates the result for you. It's like the "correct" example in TFA, but with less annoying boilerplate.

Some more verbose design/rationale for the really curious.[1]

Another thing to keep in mind is the sometimes surprising behavior of strncpy(large_buffer, short_string, sizeof(large_buffer))[2]:
<pre><code>If the length of src is less than n, strncpy() writes additional null bytes to dest to ensure that a total of n bytes are written.
</code></pre>
strlcpy doesn't do that. Just use strlcpy. On Linux, it can be found in the libbsd package.

[0]: <a href="https://www.freebsd.org/cgi/man.cgi?query=strlcpy&sektion=3" rel="nofollow">https://www.freebsd.org/cgi/man.cgi?query=strlcpy&sektion=3</a>
[1]: <a href="https://www.sudo.ws/todd/papers/strlcpy.html" rel="nofollow">https://www.sudo.ws/todd/papers/strlcpy.html</a>
[2]: <a href="https://linux.die.net/man/3/strncpy" rel="nofollow">https://linux.die.net/man/3/strncpy</a>
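
Where neither the C library nor libbsd provides strlcpy, a stand-in with the same contract is short to write. The sketch below is not the BSD source; `my_strlcpy` is a hypothetical name chosen to avoid clashing with a real strlcpy, and it follows the behaviour described in the comment: copy at most size-1 bytes, always terminate, return the length of src.

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical stand-in following the strlcpy contract. Returns
 * strlen(src); a return value >= dstsize means the copy was truncated.
 * Unlike strncpy, the rest of dst is not padded with zeroes. */
size_t my_strlcpy(char *dst, const char *src, size_t dstsize)
{
    size_t srclen = strlen(src);

    if (dstsize > 0) {
        size_t n = srclen < dstsize - 1 ? srclen : dstsize - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}
```

Truncation is then a single comparison at the call site: `if (my_strlcpy(buf, src, sizeof buf) >= sizeof buf) { /* too long */ }`.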

nine_k almost 7 years ago

Probably back in the 1970s `(characters, \000)` looked "elegant", and `(counter, characters)` looked wasteful.

This decision, if not the single most prolific, is likely one of the top 3 sources of security exploits in C code. All for the want of saving a few bytes.

jhallenworld almost 7 years ago

There are many worse string formats than C's NUL-terminated one:

Last byte of string has bit 7 set.

First byte of string has the length, so strings are limited to 255 bytes in length.

First two bytes of string have the length, so strings are limited to 65535 bytes in length.

Strings are stored in fixed-length buffers with space padding to the end.

Length-prefixed strings are stored in a fixed-length buffer, so you are limited to the buffer length. I think this was the case for PL/I "varying" strings.

Back in the day, C was better than PASCAL because it had strdup, meaning it had a heap and you could put strings in it.

C++ string is mediocre. It solves some problems, but what if:

You have very long strings and you are worried about heap fragmentation. So you are better off with something like a linked list of segments, each in its own malloc block. But can you extend std::string? Nope, oh well.

You want strings to be semipredicates. I mean that strings should be able to have a NULL value, as I can do in C (return NULL for 'char *'). Can std::string do this? Nope. Can it be extended? Nope.

bio_end_io_t almost 7 years ago

The author should've been wary of strncpy, because his solution is wrong.

"If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated."

So copying sizeof(dest)-1 bytes will not append a null byte as the author implies. You'll have to do that manually.

GuB-42 almost 7 years ago

I usually avoid strncpy() and strncat() altogether:
<pre><code>char buf[256];
size_t sz = strlen(str);
if (sz < sizeof (buf)) {
    memcpy(buf, str, sz + 1);
} else {
    /* error processing */;
}
</code></pre>
Truncation is not as bad as a buffer overflow. However, it is still not correct. You have to properly handle the case. And if truncating is the correct answer, make that explicit.

In practice, I almost never use fixed size buffers for strings unless I know the size at compile time.

Edit (for strcat):
<pre><code>char buf[256] = "blah";
size_t sz1 = strlen(buf);
size_t sz2 = strlen(str);
if (sz1 + sz2 < sizeof (buf)) {
    memcpy(buf + sz1, str, sz2 + 1);
} else {
    /* error processing */;
}
</code></pre>
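
The two snippets above share one shape, so they can be folded into a single helper. This is a sketch under assumptions: `append_checked` is a hypothetical name, and the buffer is assumed to already hold a terminated string (an empty string turns the append into a plain copy).

```c
#include <stdbool.h>
#include <string.h>
#include <stddef.h>

/* Hypothetical helper: append str to the terminated string in buf
 * (capacity bufsize). Returns false and leaves buf unchanged when the
 * result would not fit, so the caller has to handle the error
 * explicitly instead of getting a silently truncated string. */
bool append_checked(char *buf, size_t bufsize, const char *str)
{
    size_t used = strlen(buf);
    size_t add = strlen(str);

    /* Same test as used + add < bufsize, written so it cannot wrap. */
    if (used >= bufsize || add >= bufsize - used)
        return false;

    memcpy(buf + used, str, add + 1);   /* + 1 copies the terminator too */
    return true;
}
```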

Someone almost 7 years ago

"C programmers should use the newer strncpy()"

Newer? I thought strncpy dates back to the time Unix filenames were 14 characters, max, adding padding zeroes when needed in some fixed-length kernel structures.

That's also the reason strncpy always writes len bytes; not keeping garbage content in those 14-byte buffers allows the system to use memcmp to compare file names.
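
The padding behaviour described above can be illustrated with a fixed-width name field. This is a sketch of the idea only, not the historical kernel code; `NAME_LEN`, `name_set` and `name_equal` are hypothetical names.

```c
#include <stdbool.h>
#include <string.h>

#define NAME_LEN 14   /* fixed field width, as in the old 14-character file names */

/* Store name in a fixed-width field. strncpy zero-pads short names and
 * writes no terminator when the name is exactly NAME_LEN bytes long,
 * which is the one job it was designed for. */
void name_set(char field[NAME_LEN], const char *name)
{
    strncpy(field, name, NAME_LEN);
}

/* Because unused bytes are always zero, two fields can be compared
 * byte for byte without looking for a terminator. */
bool name_equal(const char a[NAME_LEN], const char b[NAME_LEN])
{
    return memcmp(a, b, NAME_LEN) == 0;
}
```

Such a field is not a C string: passing it to strlen or printf("%s") reads past the end whenever the name uses all 14 bytes.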

hawski almost 7 years ago

There is no idiomatic way to use strncpy, unless you're running the 7th edition of Unix [0].

[0] <a href="https://stackoverflow.com/a/1454071/6561829#6561829" rel="nofollow">https://stackoverflow.com/a/1454071/6561829#6561829</a>

ebikelaw almost 7 years ago

C++ basic_string::c_str always returns a null-terminated string. C++ is the solution to numerous C pitfalls.

MrBingley almost 7 years ago

Looking at the comments in this post, I'm resigning myself to the idea that there simply is no correct solution for copying or concatenating strings in C. Null-terminated strings are a fundamentally broken concept. I think the long-term solution is simply to move to a different language (Rust, C++, D, Go, whatever) where we have the benefit of hindsight and have (pointer, length) string types, which solve all the problems null-terminated strings introduce.

Analemma_ almost 7 years ago

I'm just shouting into the void here, but why does anyone find it acceptable that C is almost fifty years old, a half-century, and we still have new articles published about the correct way to copy memory? And then, immediately following them, comments in response to those articles saying the article is wrong and that you should actually do it this other way. Nobody has figured this out in 50 years?

vortico almost 7 years ago

Don't use `strncpy()` and `strncat()` at all, use
<pre><code>snprintf(buffer, buffer_len, "%s Stuff %s", str1, str2);
</code></pre>
It's safe (C99, C++11) and easily extensible. Format strings are fun! Not the fastest, but if the bottleneck of your program is concatenating strings, just do it manually.

GlitchMr almost 7 years ago

Worth noting that strncpy doesn't stand for secure string copy or anything like that. Using strncpy for copying strings would be a mistake, even if technically you can do that.

Rather, it's a fixed-size string copy function. Fixed-size string fields are very rare in regular environments, but they can show up in embedded environments. For instance, if you want to have a string in a binary file which is at most 10 bytes, you may want to avoid storing the termination byte when the string is exactly 10 bytes long. Such a structure was used in UNIX to store file names, as they used to be limited to 14 bytes, and storing a terminator would be a waste of space.
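
Reading such a field back is where the missing terminator matters. A minimal sketch of turning a 10-byte field into an ordinary C string follows; `FIELD_LEN` and `field_to_string` are hypothetical names, and the 10-byte width simply mirrors the example in the comment.

```c
#include <string.h>
#include <stddef.h>

#define FIELD_LEN 10   /* width of the fixed-size field in the example above */

/* Convert a fixed-width, possibly unterminated field into a C string.
 * out must have room for FIELD_LEN + 1 bytes. The length scan stops at
 * the first zero byte or at the end of the field, whichever comes first. */
void field_to_string(const char field[FIELD_LEN], char out[FIELD_LEN + 1])
{
    size_t len = 0;

    while (len < FIELD_LEN && field[len] != '\0')
        len++;

    memcpy(out, field, len);
    out[len] = '\0';
}
```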

mlthoughts2018 almost 7 years ago

I like learning about these caveats, but I have been asked tricky stuff like this in interviews before with gets() and the like.

As a person who interviews other people, I find that it's waaay more valuable that someone is generally aware that they should watch out for this class of pitfalls than that they know any specifics about a given function.

I've met people who basically had memorized the description of this phenomenon for gets(), but then their preferred solution was just to replace it with fgets(), and they didn't know about checking for newlines or have any thoughts on what to do when individual lines are too long.

I'd much rather hire someone who says to herself, "Oh, I need to read some characters from an input source using C. RED ALERT! Let me really research the specifics here."

Instead of someone who thinks, "Oh, I need to read some characters from an input source using C. Good thing I memorized that trivia about gets() and can totally solve this in the best way immediately with the highest upvoted Stack Overflow solution of fgets() that I didn't bother to deeply grok."

I find that when interviews are geared towards puzzle solving or esoteric trivia, the people who do well are mostly of the second type (the ones I wouldn't want to hire).

Whereas someone of the first type might flounder around and struggle in a 20-minute programming task to process strings in C, directly because that person cares more about having a bigger picture point of view of what's actually going on rather than esoteric memorization of specific function signatures and usage mechanics.

In other words, if I gave some kind of C string processing question in an interview for 20-30 minutes, one very excellent answer should be, "sorry man, not gonna try to do this in 20 minutes because in reality I know there are string handling landmines I would need to research and slowly process, and I would never believe this is worth committing to memory for a short interview."
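
The two fgets() details mentioned above, the trailing newline and lines longer than the buffer, can be handled along these lines. This is one possible policy (discard the rest of an overlong line and report it), not the only reasonable one; `read_line` and the `line_status` values are hypothetical names, and the sketch assumes a buffer size between 2 and INT_MAX and input without embedded zero bytes.

```c
#include <stdio.h>
#include <string.h>

enum line_status { LINE_OK, LINE_TOO_LONG, LINE_EOF };

/* Hypothetical helper: read one line into buf and strip the newline.
 * LINE_EOF covers both end of input and read errors. When a line does
 * not fit, buf holds its first size - 1 characters, the remainder of
 * the line is discarded, and LINE_TOO_LONG is returned. */
enum line_status read_line(FILE *in, char *buf, size_t size)
{
    if (fgets(buf, (int)size, in) == NULL)
        return LINE_EOF;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return LINE_OK;
    }

    /* No newline in buf: the line either ended exactly at the buffer
     * limit, is the last line of a file without a final newline, or
     * is too long. Peek at the next character to tell them apart. */
    int c = fgetc(in);
    if (c == EOF || c == '\n')
        return LINE_OK;

    while (c != '\n' && c != EOF)
        c = fgetc(in);
    return LINE_TOO_LONG;
}
```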

eboyjr almost 7 years ago

Interesting how the "solutions" to the buffer overflow problems don't provide for all of the modern assumptions of programming with strings. I would love to know the history of the development of strncpy.

rurban almost 7 years ago

Interestingly, even the Annex K strncpy_s and strncat_s are unsafe by design; that's why I only added them to safeclib via --enable-unsafe.

But recently I got fed up with all this unsafety nonsense with the truncating variants and changed the implementation to always terminate the asciiz string properly. <a href="https://github.com/rurban/safeclib/blob/master/src/str/strncat_s.c" rel="nofollow">https://github.com/rurban/safeclib/blob/master/src/str/strnc...</a>

docker_up almost 7 years ago

Not even this works. They forgot to force the last byte as a NULL, which is a classic bug in C. Either that or memset the char array before using it. But what the blog poster did is a pure bug.

lkjalksdjfasdf almost 7 years ago

I don't use pure C for string handling anymore. I use C++ behind an extern "C" ABI. Rust can also work. C++ can infer the destination size via templates.

kazinator almost 7 years ago

Never memcpy structs, except if you need to ensure that the padding bytes are copied. C has had a = b assignment for structs since long before ANSI C.

chris_va almost 7 years ago

For those similarly curious:
<pre><code>char *
STRNCPY (char *s1, const char *s2, size_t n)
{
  size_t size = __strnlen (s2, n);
  if (size != n)
    memset (s1 + size, '\0', n - size);
  return memcpy (s1, s2, size);
}
libc_hidden_builtin_def (strncpy)
</code></pre>
... which is actually different from how I thought it would be implemented (it ends up with an extra loop to figure out the size of the string).

(side note, how does one format code in HN?)

rini17 almost 7 years ago

1. Avoid sizeof, lest someone comes and changes the array into a pointer. Use a constant parameter instead:

#define BUFLEN 8
char buffer[BUFLEN];
strncpy(...etc... BUFLEN-1);

2. Check the length of the string to be copied BEFORE copying and, if it is too long, fail (using assert, exit or so) instead of silently truncating or worse.

Why is it so hard, and why must new "safe" string functions be invented instead?
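
The sizeof pitfall from point 1 is easy to reproduce. The sketch below only demonstrates the difference; both function names are made up for illustration, and `BUFLEN` is the constant from the comment.

```c
#include <stddef.h>

#define BUFLEN 8

/* With a real array, sizeof yields the capacity of the buffer. */
size_t size_seen_through_array(void)
{
    char buffer[BUFLEN];
    return sizeof buffer;       /* BUFLEN */
}

/* Once the same storage is reached through a pointer, sizeof yields
 * the size of the pointer itself, so a "sizeof(dest) - 1" bound
 * quietly becomes wrong after such a refactoring. */
size_t size_seen_through_pointer(void)
{
    char buffer[BUFLEN];
    char *p = buffer;
    return sizeof p;            /* sizeof(char *), unrelated to BUFLEN */
}
```

On platforms where a pointer happens to be 8 bytes the two values coincide for this particular BUFLEN, which makes the mistake even harder to notice.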

professorTuring almost 7 years ago

This link has nothing different to offer than "man 3 strncpy"... Worse, some examples are invalid.

Hello71 almost 7 years ago

In addition to tlb's point, the article's description of strncat is not correct.

> As with strcat(), the resulting string in dest is always null-terminated.
> Therefore, the size of dest must be at least strlen(dest)+n+1.
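
In other words, the n passed to strncat is the number of characters that may still be appended, not the size of the destination. A minimal sketch of a call that respects the quoted rule; `append_truncating` is a hypothetical name, and dest is assumed to already hold a terminated string.

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical helper: append as much of src as fits into dest
 * (capacity destsize). strncat writes up to n characters plus a
 * terminator, so n has to be the remaining space minus one. */
void append_truncating(char *dest, size_t destsize, const char *src)
{
    size_t used = strlen(dest);

    if (used + 1 < destsize)
        strncat(dest, src, destsize - used - 1);
}
```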

jacquesm almost 7 years ago

Bad advice is worse than no advice at all.

beeforpork almost 7 years ago

Well, this is not very helpful advice, because strncpy(a, b, sizeof(a)-1) is in no way safer than strncpy(a, b, sizeof(a)): the result is not 0-terminated either. And malloc(), as used in the examples, does not return a 0-terminated buffer, but random garbage memory. What would be safer is to always 0-terminate the buffer after copying, using the simplest copy possible:
<pre><code>strncpy(a, b, sizeof(a));
a[sizeof(a)-1] = 0;
</code></pre>
But this is more boilerplate and hence more error-prone.

Even safer, use strlcpy() (if available) or snprintf(), which both 0-terminate (except under Windows, maybe). (But beware when preparing something for copying from trusted to untrusted: strncpy() clears the rest of the buffer while strlcpy() and snprintf() do not, so you might leak info via uninitialised memory behind the end of the string if you copy out that buffer across a trust boundary. Actually, the author's 'sizeof()-1' solution is less secure in this context.) So, use:
<pre><code>snprintf(a, sizeof(a), "%s", b);
</code></pre>
And don't tell me anything about speed, please. Your main concern with C is not micro optimisations but robustness and avoiding undefined behaviour (and snprintf() is not too slow).

And for multiple concats, use multiple snprintf() calls, like so:
<pre><code>char *i = a, *e = a + sizeof(a);
i += snprintf(i, e-i, "%s", b1);
i += snprintf(i, e-i, "%s", b2);
i += snprintf(i, e-i, "%s", b3);
</code></pre>
This is the most concise way I know to write this that works without buffer overflow (your main enemy, even more vile than missing 0-termination), without thinking too much, without writing too much boilerplate, and that is relatively robust against breaking in code restructuring (like appending more stuff in the middle). The idiom also somewhat resembles old-style C++ iterators ('i' and 'e').

Oh, and a truncated string is usually not good anyway, be it 0-terminated or not. So you do need to check for that after all that stringing stuff:
<pre><code>if (strnlen(a, sizeof(a)) >= sizeof(a)-1) {
    /* ... error ... */
}
</code></pre>
Don't miss that '-1' there. Off-by-one is another enemy to know well. And despite that check handling missing 0-termination, do not be tempted to fall back to strncpy(), because missing 0-termination is bad(tm).

Phew!

C is bad with strings. The idiom above works fine with any good snprintf implementation (i.e., probably not under Windows).

And do not copy structs with memcpy, just assign them! memcpy() is for arrays only. This is not going to go away, is it?
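
One detail of the chained idiom deserves care: snprintf returns the length the output would have had without truncation, so after a truncated call `i` can move past `e`, and the next `e-i` is negative before it is converted to a size. A sketch of the same idiom with the cursor checked at each step follows; `join3` is a hypothetical name, and the choice to stop at the first truncation is an assumption of this example rather than part of the original comment.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: concatenate b1, b2 and b3 into a (capacity
 * asize) with the 'i'/'e' cursor idiom. Returns 0 on success and -1
 * on error or truncation. The cursor only advances when the whole
 * piece fit, so it never passes e. */
int join3(char *a, size_t asize,
          const char *b1, const char *b2, const char *b3)
{
    const char *parts[] = { b1, b2, b3 };
    char *i = a, *e = a + asize;

    for (size_t k = 0; k < 3; k++) {
        size_t room = (size_t)(e - i);
        int n = snprintf(i, room, "%s", parts[k]);

        if (n < 0 || (size_t)n >= room)
            return -1;          /* error, or this piece was truncated */
        i += n;
    }
    return 0;
}
```

With this shape the separate strnlen() check after the fact is no longer needed, since truncation is reported by the return value.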
