Writing C for Curl

124 points by TangerineDream about 2 months ago

12 comments

kpcyrd about 2 months ago

> We count about 40% of our security vulnerabilities to date to have been the direct result of us using C instead of a memory-safe language alternative. This is however a much lower number than the 60-70% that are commonly repeated, originating from a few big companies and projects.

There has been discussion in an Arch Linux internal channel about the accuracy of these classifications. We noticed many advisories contain a "This bug is not considered a C mistake. It is not likely to have been avoided had we not been using C." disclaimer, but it was unclear what the agenda was and how "C mistake" is defined.

It was brought up because this disclaimer was also present in the CVE-2025-0665 advisory[0], which is essentially a double-free, but at the file descriptor level. The impact is extremely low (it's more "libcurl causing unsoundness in your process" than "can be exploited into RCE"), but it's a direct result of how C manages resources. This kind of bug can also occur in Python, but you're unlikely to find it in Rust.

Could this bug have occurred with a programming language that isn't C? Yes. Could this bug have been avoided by using a programming language that isn't C? Also yes.

[0]: https://curl.se/docs/CVE-2025-0665.html
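A minimal sketch of the bug class described above, assuming a POSIX system. It is not curl's code and not how the CVE-2025-0665 fix was written; the helper `close_once()` is hypothetical. It shows one common C idiom for avoiding a file-descriptor double close: reset the variable to -1 as soon as the descriptor has been closed.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not from curl): close the descriptor stored in
 * *fd at most once.
 *
 * After close() the caller's variable is reset to -1, so a second call,
 * for example from a shared error path that also "cleans up", does
 * nothing. Without the reset, the second close() would hit whatever
 * descriptor the process has since been given with the same number
 * (open() and dup() hand out the lowest free number), which is the
 * file-descriptor flavour of a double free. */
static int close_once(int *fd)
{
  int rc = 0;

  assert(fd != NULL);  /* passing NULL is a caller bug */
  if(*fd >= 0) {
    rc = close(*fd);
    *fd = -1;          /* never retry: the number may be reused at once */
  }
  return rc;
}
```

After `close_once(&fd)`, `fd` is -1 and a later `close_once(&fd)` returns 0 without touching any descriptor. The sketch assumes Linux-like semantics, where the descriptor is released even if `close()` reports an error; POSIX leaves the descriptor's state unspecified when `close()` is interrupted by a signal.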

sebstefan about 2 months ago

The guidelines feel out of sync with the directions I've seen people push coding styles over the years.

"Identifiers should be short", when I've mostly seen people decry how annoying it is to find yourself in a codebase where everything is abbreviated C-style (htons, strstr, printf, wchar_t, _wfopen, fgetws, wcslen). There's a case for more verbosity, and if you look at modern Curl code it reflects that as well; new identifiers aren't short: https://github.com/curl/curl/blob/master/lib/vquic/vquic.c

"Functions should be short", where I've mostly seen very negative feedback on codebases written following the trend of Uncle Bob's short functions: complaints that hiding code in 10 levels of function calls isn't helpful, and that following rabbit holes is tedious even with modern editors.

"Code should be narrow", "we enforce a strict 80 column maximum line length": I don't think I've seen that take lately. I remember seeing a few posts fly by about the number 80 specifically.

You want to prevent dragging your eyes. With my IDE on default settings on a 1080p monitor, half of a 15" screen fits 100 characters. If you take away 20 columns to fit your text on less of the screen, do you really get any benefit? And what about the cascading effects on the code, like worse names, split lines, ...?

In the end it's semi-interesting, but we're all building sheds and these are mostly debates on what color the shed should be.

timhh about 2 months ago

johnisgood about 2 months ago

> Code should be easy to read. It should be clear. No hiding code under clever constructs, fancy macros or overloading.

I highly agree with this. I do not always want highly abstracted code, and some programming languages aiming to replace C are much more difficult to read. That said, Rust is supposed to replace C++, not C, right?

Thank you for the article!
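To illustrate the "fancy macros" point quoted above: a function-like macro can silently evaluate an argument twice, which the call site does not reveal. This is a generic C example with made-up names (`MAX_MACRO`, `max_int`, `show_double_evaluation`), not code from curl or from the article.

```c
#include <assert.h>

/* Looks like a function call, but each argument is pasted into the
 * expansion, so an argument with a side effect may run twice. */
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

/* A plain function evaluates each argument exactly once, and a reader
 * (or a debugger) can step into it like any other function. */
static inline int max_int(int a, int b)
{
  return a > b ? a : b;
}

/* Shows the hidden double evaluation side by side with the function. */
static void show_double_evaluation(void)
{
  int i = 5;
  int m = MAX_MACRO(i++, 0); /* expands to ((i++) > (0) ? (i++) : (0)) */
  assert(m == 6 && i == 7);  /* i was incremented twice */

  int j = 5;
  int n = max_int(j++, 0);
  assert(n == 5 && j == 6);  /* j was incremented once */
}
```

Nothing at the `MAX_MACRO(i++, 0)` call site hints that `i` is bumped twice; with `max_int()` the behavior is exactly what the call looks like.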

veltas about 2 months ago

> So many people will now joke and say something about wide screens being available

And this is a silly point, because I want to be able to put 2-3 files side by side on that big monitor. Who are all these people asking for long code that means I don't get more than one file on screen at a time?

kwon-young about 2 months ago

Curl is one of the very few projects I managed to contribute to, with a very simple PR.

At the time, I was a bit lost with their custom testing framework, but I was very impressed by the ease of contributing to one of the most successful open-source projects out there.

I now understand why: it is because of their rules around testing and readability (and the friendly attitude of Daniel Stenberg) that a novice like me managed to do it.

kobzol about 1 month ago

Great post!

I have some random guesses as to why the share of memory-safety issues is 40% in curl rather than 60-70%:

- 180k is not that much code. The 60-70% number comes from Google and Microsoft, and they are dealing with way larger codebases. Of course, the size of the codebase in theory shouldn't affect the percentage, but I suspect in practice it does: the larger the codebase is, the harder it is to enforce invariants and watch for all possible edge cases.
- A related aspect is that curl is primarily maintained by one person (you), or at most a handful of contributors. Of course many more people contribute to it, but there is a single maintainer who knows the whole codebase perfectly and can see behind all (or most) corners. For larger codebases with hundreds of people working on them, that is probably not the case.
- Curl is used by clients a lot (probably more by clients than by servers, for whatever definition of these words), over which you have no control and no monitoring. That means that some UB or vulnerabilities that were triggered "in the wild", on the client side, might never be found. For Google/Microsoft, if we're talking about Chrome, Windows, web services etc., which are much more controlled and monitored by their companies, I suspect that they are able to detect a larger fraction of vulnerabilities and issues than we are able to detect in curl.
- You write great code, love what you're doing and take pride in a job done well (again, if we scale this to a large codebase with hundreds of developers, it's quite hard to achieve the same level of quality and dedication there).

(I sent this as a comment directly on the post, but it seems like it wasn't approved.)

janoelze about 2 months ago

This is remarkably clear writing — you sense how it was formed by thousands upon thousands of hours spent communicating, really cool.

bitwize about 2 months ago

> how do we write C in curl to make it safe and secure for billions of installations?

"That's the neat thing -- you don't."

Curl should do what fish did: bite the bullet and rewrite the damn thing in Rust.

dcminter about 2 months ago

> "Wider code is harder to read. Period."

That's stated as if it were proven, and I can believe that it has enough basis in fact that one might choose to enforce it, but I don't believe it's universally true.

I do often see code subject to a line-length linting enforcement that I think would have been clearer not broken up across multiple lines.

Personally, I prefer a linter with escape hatches so that you can declare "this line exempt from such and such a rule" if you have enough reason for it and are willing to take the fight to the pull request :D

acmj about 2 months ago

Some parts of this article are opinionated. Curl may be well written, but this is more likely the result of its overall structure than of the number of characters per line. Actually, I don't know whether curl is well written; popularity doesn't always equate to code quality. I have used curl APIs before, and I don't like them.

MrMcCall about 2 months ago

All his ideas are fantastic, and are obviously the result of long experience in a seasoned and highly successful project. He is sharing techniques that simply work for large, complex codebases. Ignore them at your peril!

Specifically, though, these sections are related, in my experience:

> Avoid "bad" functions
> Buffer functions
> Parsing functions
> Monitor memory function use

These related aspects are why I tend to wrap many library functions that I use (in any language environment) with my own wrapper function, even if it's just to localize their use into one single entry/use point. That gives me one way of using each function: a single place to put all best practices for its use, and a single place to update those practices for the entire codebase. It is especially helpful if I want to rewrite the code itself to, for example, never use scanf, which I determined was a necessary strategy many, many moons ago.

Now, when a single function needs to accommodate different use cases, and handling those separate kinds of logic in one wrapper would incur too much logical or runtime cost, a separate wrapper can be added. If feasible, the additional wrappers should build on the cornerstone wrapper. Of course, all these wrappers should be located in the same chunk of code.

For C especially, wrapper functions also let me put my own naming convention on top of the standard library's terse names (without using macros, because they're to be avoided). That makes the names easier for me to remember, further reducing cognitive load.
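A minimal sketch of the wrapper idea applied to parsing, assuming the goal is to never call `scanf()`: one cornerstone wrapper around `strtol()` keeps all the error handling in one place, and a more specific wrapper reuses it. The names `parse_long()` and `parse_port()` are hypothetical, not curl's API or the commenter's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Cornerstone wrapper (hypothetical): the only place in the codebase
 * that calls strtol(). As with strtol(), leading whitespace and an
 * optional sign are accepted; everything after that must be base-10
 * digits up to the end of the string.
 * Returns true and stores the value in *out on success. */
static bool parse_long(const char *str, long min, long max, long *out)
{
  char *end;
  long val;

  assert(str && out);
  errno = 0;
  val = strtol(str, &end, 10);
  if(end == str)           /* no digits at all, e.g. "" or "abc" */
    return false;
  if(*end != '\0')         /* trailing garbage, e.g. "80abc" */
    return false;
  if(errno == ERANGE)      /* did not fit in a long */
    return false;
  if(val < min || val > max)
    return false;
  *out = val;
  return true;
}

/* Specialized wrapper built on the cornerstone: a TCP port number.
 * It only adds the range; the parsing rules stay in parse_long(). */
static bool parse_port(const char *str, unsigned short *port)
{
  long val;

  if(!parse_long(str, 1, 65535, &val))
    return false;
  *port = (unsigned short)val;
  return true;
}
```

Because every caller goes through `parse_long()`, tightening the rules later (say, rejecting leading whitespace) is a one-place change, which is the benefit the comment describes.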