Fixing C Strings

32 points by ushakov, 5 months ago

14 comments

<pre><code> struct str { char *dat; sz len; }; </code></pre> It's the same solution D uses, except that it's a builtin type, and works for all arrays. I proposed this solution for C: <a href="https://www.digitalmars.com/articles/C-biggest-mistake.html" rel="nofollow">https://www.digitalmars.com/articles/C-biggest-mistake.html</a> It's hard to overstate what a huge win this is. D has had 23 years of experience with it, and the virtual elimination of array overflow bugs is just win, win, win. I will never understand why C keeps adding extensions consisting of marginal features, and ignores this foundational fix. I guess they still aren't tired of buffer overflow bugs always being the #1 security vulnerability of shipped C code (and C++, too!).
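As a rough illustration of why carrying the length helps, here is a minimal sketch of a bounds-checked read on that struct. The `sz` typedef and the `str_at` helper are assumptions made for this example; they are not taken from the article or from D.

```c
#include <stdbool.h>
#include <stddef.h>

typedef ptrdiff_t sz;   /* assumption: sz is a signed size type */

struct str { char *dat; sz len; };

/* Hypothetical helper: read byte i only if it lies inside the string. */
static bool str_at(struct str s, sz i, char *out)
{
    if (i < 0 || i >= s.len)
        return false;   /* out of bounds: refuse instead of reading */
    *out = s.dat[i];
    return true;
}
```

Because the length travels with the pointer, the check needs no scan for a terminator and no extra argument.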

kevin_thibedeau, 5 months ago

> Current compilers warn you if the format string doesn’t match its arguments. But this only works on functions that have the same signature as printf so it doesn’t work on my implementation.

GCC has the format attribute that lets you have printf type checking on your own variadic functions: <a href="https://gcc.gnu.org/onlinedocs/gcc-14.2.0/gcc/Common-Function-Attributes.html#index-Wformat-3" rel="nofollow">https://gcc.gnu.org/onlinedocs/gcc-14.2.0/gcc/Common-Functio...</a>
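A minimal sketch of that attribute on a user-defined variadic function. `log_msg` is a hypothetical wrapper invented for this example; the attribute syntax itself is GCC's, and Clang accepts it as well.

```c
#include <stdarg.h>
#include <stdio.h>

/* Argument 1 is a printf-style format string; the variadic
   arguments to check against it start at position 2. */
static int log_msg(const char *fmt, ...)
    __attribute__((format(printf, 1, 2)));

/* Hypothetical logging wrapper that forwards to vprintf. */
static int log_msg(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vprintf(fmt, ap);   /* returns the number of characters printed */
    va_end(ap);
    return n;
}
```

With this declaration, a call such as `log_msg("%s", 42)` should draw a `-Wformat` warning (enabled by `-Wall`), just as the same mistake would with `printf`.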

simscitizen, 5 months ago

There are quite a few of these "better C string" idioms floating around. Another one to consider is e.g. <a href="https://github.com/antirez/sds">https://github.com/antirez/sds</a> (used by Redis), which instead stores the string contents in-line with the metadata.

ropejumper, 5 months ago

Two people have already mentioned things like storing the length inline or including a null-terminator to be backwards-compatible. What's described there is basically the same as std::string_view or &str, and to me one of the biggest reasons to use these structures is that your particular view of the string doesn't interfere with someone else's. You can slice your string in the middle and just look at it piecewise without bothering anyone else.

Choosing between these trade-offs just depends on what you're doing. I'd definitely choose this pattern if I were to write a parser for instance.
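To make the slicing point concrete, here is a small sketch of a non-owning view. The `str` layout and the `str_slice` helper are assumptions for illustration only; the helper returns a sub-view without copying or writing to the underlying bytes.

```c
#include <stddef.h>

typedef struct { const char *dat; ptrdiff_t len; } str;

/* Hypothetical helper: view of s[lo..hi), clamped to the bounds of s.
   Nothing is copied and no terminator is written into the buffer. */
static str str_slice(str s, ptrdiff_t lo, ptrdiff_t hi)
{
    if (lo < 0) lo = 0;
    if (lo > s.len) lo = s.len;
    if (hi > s.len) hi = s.len;
    if (hi < lo) hi = lo;
    return (str){ s.dat + lo, hi - lo };
}
```

A parser could hold several such views into one input buffer at once, which a null-terminated representation cannot do without copying or overwriting bytes.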

jdblair, 5 months ago

I've done something similar, but unlike the author, I always reserved one extra byte and I always null terminated the string. This was so I could use existing string output functions.

cozzyd, 5 months ago

Why not have the null terminator so you can pass to normal printf? You could even do something crazy with packing a null byte with sz on 64-bit systems (since you will never have a string that long anyway...)
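For comparison, a length-delimited string can still go through the standard formatting functions by passing the length as a precision. This is a sketch under assumptions: the `str` layout and the `str_format` helper are hypothetical, and only `snprintf` with `%.*s` is standard C.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct { const char *dat; ptrdiff_t len; } str;

/* Hypothetical helper: format a length-delimited string into out.
   "%.*s" reads at most the given number of bytes, so the source needs
   no terminator. The precision argument must be an int, hence the cast. */
static int str_format(char *out, size_t cap, str s)
{
    return snprintf(out, cap, "[%.*s]", (int)s.len, s.dat);
}
```

Two caveats: output still stops early at an embedded null byte, and lengths above `INT_MAX` do not fit in the precision argument.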

up2isomorphism, 5 months ago

For all the complaints, all you need to do is include another .h file from some string lib and that’s it. But I would say that in 95% of cases, using a fixed-length char array with strncpy will work just fine.
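One detail worth keeping in mind with that approach: `strncpy` does not write a terminator when the source has as many characters as the limit or more. A minimal sketch of the usual workaround follows; `copy_fixed` is a hypothetical helper name.

```c
#include <string.h>

/* Hypothetical helper: copy src into a fixed-size buffer, truncating
   if needed. The terminator is stored explicitly because strncpy
   leaves the buffer unterminated when src does not fit. */
static void copy_fixed(char *dst, size_t cap, const char *src)
{
    if (cap == 0)
        return;
    strncpy(dst, src, cap - 1);
    dst[cap - 1] = '\0';
}
```

Truncation here is silent, so a caller that needs to detect it would have to compare lengths separately.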

superjared, 5 months ago

The bstring library[0] has been around a _long_ time.

[0]: <a href="https://bstring.sourceforge.net/" rel="nofollow">https://bstring.sourceforge.net/</a>

codr7, 5 months ago

I would consider putting the buffer last in the structure and making it flexible to allow skipping one allocation.
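A sketch of what that could look like with a C99 flexible array member. The `fstr` type and `fstr_new` function are hypothetical names for this example; the header and the bytes come from a single `malloc`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: header and string bytes share one heap block. */
struct fstr {
    size_t len;
    char   dat[];   /* flexible array member, must be the last field */
};

/* Allocate header plus len bytes in one call and copy the contents. */
static struct fstr *fstr_new(const char *src, size_t len)
{
    struct fstr *s = malloc(sizeof *s + len);
    if (!s)
        return NULL;
    s->len = len;
    if (len)
        memcpy(s->dat, src, len);
    return s;
}
```

A single `free(s)` releases everything. The trade-off is that such an object owns its bytes, so it cannot act as a view into the middle of another string.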

Levitating, 5 months ago

> I liked this kind of pattern at the bottom of OpenAI's site :)

Where on OpenAI's site do I find a footer like that?

Quis_sum, 5 months ago

Sorry, but there is a significant misunderstanding: There is no such thing as a string in C. What you call a string is a pointer to char (typically "int8"), nothing more, nothing less. The \0 termination is just a convention/convenience to avoid passing the bounds of the memory segment, or to signal when to stop processing earlier.

Once you go down the route proposed by many of the comments here, why not enhance it to deal with UTF8... Or rather implement a proper "array" type? What about the lack of multidimensional arrays instead of the pointer to pointer to ... approach? Idiosyncrasies such as "int a[2][3];" being of type "int *" and not "int **"?

C was never intended to shield you from mistakes, but rather replace a macro assembler. ANSI C addressed some of the issues in the original K&R C, but that is about it.

If your use case would benefit from all of these protections, there are plenty of higher level language alternatives...

teo_zero, 5 months ago

Good attempt at a topic that annoys many programmers.

I see a problem with the separation between str and str_buf, though: you create new strings with the latter, but most functions take the former as arguments. Do you convert them every time? Isn't your code littered with str_from_buf()?

Put another way, it's like the mess with const that you mention in your article. If str is the type you use for a const read-only string, and str_buf for a non-const mutable string, you would like to pass a non-const even to those functions that "only" require a const. (I say "only" because being const is a weaker requirement than being mutable; the fact that it's more wordy is another thing that C's syntax makes confusing, but this is an entirely different topic!)

It would be nice if the compiler could be instructed to automatically cast str_buf into str and not vice versa, just like it does for non-const to const.

The only way out I can think of would be to get rid of the two types and only use the one with the cap field, with the convention that if cap is zero, then the string is read-only. The drawback is that certain mistakes are only detected at run-time and not enforced by the compiler. For example, a function that takes a string s and replaces every substring s1 with s2 could have the following prototype in the two-type system:<pre><code> replace(str_buf s, str s1, str s2); </code></pre> And it would be immediate to recognize that you cannot pass a read-only string as the first argument. With a one-type system you lose this ability.

Oh well, I guess if a perfect solution existed, it would have been adopted by the C committee, wouldn't it? /s
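A sketch of the two-type arrangement under discussion. The field layouts and the body of `str_from_buf` are assumptions (the thread only names the types, the `cap` field, and the conversion function), and `str_count` is a hypothetical read-only function added for the example.

```c
#include <stddef.h>

/* Assumed layouts: a read-only view and a mutable buffer with capacity. */
typedef struct { const char *dat; ptrdiff_t len; } str;
typedef struct { char *dat; ptrdiff_t len; ptrdiff_t cap; } str_buf;

/* Assumed shape of str_from_buf: keep pointer and length, drop cap. */
static str str_from_buf(str_buf b)
{
    return (str){ b.dat, b.len };
}

/* Hypothetical reader that only needs a view of the bytes. */
static ptrdiff_t str_count(str s, char c)
{
    ptrdiff_t n = 0;
    for (ptrdiff_t i = 0; i < s.len; i++)
        n += (s.dat[i] == c);
    return n;
}
```

Calling `str_count(b, ',')` with a `str_buf b` fails to compile because the struct types are distinct, so every such call needs `str_count(str_from_buf(b), ',')`. That is the explicit conversion being complained about, and it is also what lets the compiler reject a `str` passed where a `str_buf` is required.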

zwnow, 5 months ago

Never had a string related bug in any programming language in 4 years. I sincerely don't know what people talk about when they claim strings are buggy? What kinda tasks do these happen in?

zabzonk, 5 months ago

I have been using null terminated strings since the mid 1970s, before using C, and have never had any problems with them.

I have never seen an explanation, from someone who has, that makes any sense.
