Nice. If this can be reasonably retrofitted to existing libraries and projects so that the safety properties compose from local to global, then this could actually be a meaningful improvement to the safety of real-world C code.<p>There would be many more steps required "toward" memory safety, such as eliminating all forms of UB including uninitialized memory, out of bounds pointers, data races, etc. but if this direction is to be pursued it has to start somewhere.
I might be missing something, but this seems to require ownership annotations on all functions, e.g. a compatible and correct prototype for `fclose` to correctly note that the owned `FILE *` is moved into the call.<p>If that's correct, then this is somewhat practically limited: either pre-existing codebases will need to be retrofitted with an essentially bespoke set of macros, or the compiler will need to be "fail open" by default. The tradeoffs between these two are hard (substantial developer pain versus being ineffective against the bulk of a compiled program's API surface).<p>(Also, this design appears to be for temporal safety only, not spatial safety. But again I might have missed something.)
> new methods of communication with the compiler have been established.<p>From what I understand, this appears to be a separate binary from GCC/Clang that does static analysis and outputs C99.<p>Can this be a GCC plugin? I know we can write plugins that are activated when a specific macro is provided, and the GCC plugin event list allows intercepting the AST at every function declaration/definition. Unless you're rewriting the AST substantially, I feel this could be a compiler plugin. I'd like to know a bit more about what kinds of AST transformations/checks are run as part of Cake.
Agreed, this is awesome. Obviously sanitizers fill some of this gap currently, but they aren't great with things like reference counting, which RAII makes a doddle. Fwiw, here is an implementation of runtime RAII-style checking on top of leak sanitizer:
<a href="https://perf.wiki.kernel.org/index.php/Reference_Count_Checking" rel="nofollow">https://perf.wiki.kernel.org/index.php/Reference_Count_Check...</a>
There's an interesting overlap with the cleanup attribute that is now appearing in the Linux kernel (by way of systemd):
<a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/cleanup.h" rel="nofollow">https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...</a>
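For anyone who hasn't used it, here is a minimal sketch of the underlying mechanism, assuming GCC or Clang (the cleanup attribute is a compiler extension, not standard C). The names AUTO_FREE, free_charp, copied_length and cleanup_calls are made up for illustration and are not taken from the kernel or systemd headers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Counts handler invocations so the effect is observable (illustration only). */
int cleanup_calls = 0;

/* Cleanup handler: the compiler passes a pointer to the annotated variable. */
static void free_charp(char **p)
{
    free(*p);      /* free(NULL) is a no-op, so early exits are safe */
    *p = NULL;
    cleanup_calls++;
}

/* Hypothetical convenience macro wrapping the GCC/Clang extension. */
#define AUTO_FREE __attribute__((cleanup(free_charp)))

/* Returns the length of a heap copy of s, or -1 if allocation fails.
   The copy is released on every return path without an explicit free(). */
int copied_length(const char *s)
{
    assert(s != NULL);
    AUTO_FREE char *copy = malloc(strlen(s) + 1);
    if (copy == NULL)
        return -1;               /* handler still runs on this path */
    strcpy(copy, s);
    return (int)strlen(copy);    /* value is computed before the handler runs */
}
```

The handler runs whenever the variable goes out of scope, which is what makes it feel like RAII. Unlike an ownership checker, nothing here stops you from stashing the pointer somewhere else before the scope ends.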
This project is amazing because it also seems to include #embed; IIRC no other compiler has it yet.<p>Just for that #embed directive I would already use cake for the moment (although it seems like it is only doing the file->array conversion)
This is awesome. Could they reconcile this with [[gsl::Owner]] or gsl::owner<T> somehow so we don't end up with multiple syntaxes in C++?<p><a href="https://reviews.llvm.org/D64448" rel="nofollow">https://reviews.llvm.org/D64448</a><p><a href="https://github.com/microsoft/GSL/blob/main/docs/headers.md#gslowner">https://github.com/microsoft/GSL/blob/main/docs/headers.md#g...</a>
The problem with C safety addons like this (there have been many) is that they don't prevent extracting raw pointers from controlled pointers. Optional memory safety isn't.<p>> If this can be reasonably retrofitted to existing libraries and projects<p>That's the problem.<p>If you want to fool around in this space, consider revisiting C++ to Rust conversion.
There's something called Corrode, which compiles C to a weird subset of Rust full of objects that implement C raw pointers. The output is verbose and unmaintainable.
What's needed is something that can figure out how big things are and who owns what, possibly guessing, and generate appropriate idiomatic Rust. Now that LLMs are sort of working, that might be possible.<p>Can you ask GitHub Copilot to look at C code and answer the question "What is the length of the array 'buf' passed to this function"? That tells you how to express the array in a language where arrays have enforced lengths, which includes both C++ and Rust. With hints like that, idiomatic translation becomes possible. Bad guesses will result in programs that subscript out of range, which is caught at run time. But guesses should be correct most of the time, because C programmers tend to use the same idioms for arrays with lengths. Forms such as "int read(int fd, char* buf, size_t buf_l)" show up often.<p>Using LLMs to help with tightening up existing code might work.
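To make the (pointer, length) idiom concrete, here is a rough sketch in C of the pairing a translator would have to infer. The byte_span type and the span_set and fill_raw functions are hypothetical names for illustration, not from any existing tool. The idea is that once the pointer and its length travel together, an out-of-range index can be rejected at run time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "slice" type: the pointer and its inferred length travel together. */
typedef struct {
    char  *data;
    size_t len;
} byte_span;

/* Bounds-checked store: returns false instead of writing out of range. */
bool span_set(byte_span s, size_t i, char value)
{
    if (i >= s.len)
        return false;
    s.data[i] = value;
    return true;
}

/* Original-style signature using the common (buf, buf_l) idiom. */
size_t fill_raw(char *buf, size_t buf_l, char value)
{
    assert(buf != NULL || buf_l == 0);
    byte_span s = { buf, buf_l };   /* the pairing a translator must guess */
    size_t written = 0;
    while (span_set(s, written, value))   /* stops at the first rejected index */
        written++;
    return written;
}
```

If the guessed length is wrong, the check fires on the first bad index instead of silently corrupting memory, which is the "caught at run time" behavior described above.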
I think this is a really interesting direction.<p>That it can translate C23 to C89 means it has most of the work in place to translate C23 to C23, or C99 to C99 etc. If that is done in a (mostly) reversible fashion - successfully re-encode back to the original, where you `preprocess -> parse -> unparse -> re-preprocess` which is a nuisance but possible, then it opens the door to much more aggressive type systems.<p>In particular, the input can be C with the ownership annotations, and if they're valid, the output can be C with those annotations dropped to be fed into some other compiler. Or whatever other invariant systems the compiler dev is interested in.<p>Or the input could be C extended with namespace {} syntax, C++ style lambdas, contract checking - whatever you wish really, and the output can be the extensions desugared into C. Templates (possibly the D style ones) can be implemented as instantiating normal functions from said template.<p>That the output is C means this is usable in all the pipelines that already work with C.<p>Good stuff, thanks for posting.
I've been dabbling in embedded programming. Everything is written in C. I just don't understand why. C++ solves pretty much all problems if you want it to (RAII, smart pointers, move semantics) and the framework writers wouldn't need to implement their bespoke OOP system on top of opaque pointers and callbacks.<p>Maybe it was bad luck on my part, and other embedded frameworks are better; but I got into both ESP32 and STM32, and both frameworks are the worst spaghetti code I have ever seen. You need to jump through at least one, often two layers of indirection to understand what a particular function call will do. Here's an example of what I mean:<p><pre><code> // peripheral_conf.h
#define USE_FOOBAR_PERIPHERAL 1
// obj_t.h
#define USE_OBJ_PARAM2
// In the library header
#ifdef USE_FOOBAR_PERIPHERAL
#define DoSomethingCallback FoobarCallback
#endif
// foobar.h
status_t FoobarCallback(int32_t data, int32_t param);
// obj_t.c
status_t Init(Obj_t* obj) {
obj->param1 = obj->init.initparam & 0xFF;
#ifdef USE_OBJ_PARAM2
obj->param2 = (obj->init.initparam >> 16) & 0xFF;
#endif
obj->callback = DoSomethingCallback;
return OK;
}
status_t DoSomething(Obj_t *obj, int32_t data) {
#ifdef USE_OBJ_PARAM2
return obj->callback(data, obj->param2);
#else
return obj->callback(data, obj->param1);
#endif
}
// main.c
Obj_t obj = {0};
obj.init.initparam = 0x12345678;
Init(&obj);
DoSomething(&obj, 0x42);
</code></pre>
And that's an <i>easy</i> example. Macros everywhere, you need to grok what's happening in four different files to understand what the hell a single function call will <i>actually</i> do. Sure, the code is super efficient, because once it's compiled all the extraneous information is pre-processed away if you don't use such and such peripheral or configuration option. But all this could be replaced by an abstract class, perhaps some templates... And if you disable stuff you may not need (RTTI, exceptions) then you'd get just as efficient compiled code. It would be much easier to understand what's going on, and <i>you wouldn't be able to call DoSomething on uninitialized data</i>... Because you'd have to call the constructor first to even have access to the method.<p>Anyway, thank god for debuggers, step-by-step execution, and IDEs.
lowkey smart pointers are often just used to deflect the responsibility of thinking about memory layouts<p><a href="https://floooh.github.io/2018/06/17/handles-vs-pointers.html" rel="nofollow">https://floooh.github.io/2018/06/17/handles-vs-pointers.html</a><p>and most issues can be caught by using a static analyser or a memory leak checker (getting ppl to consistently use them is another issue, but still)
Do I need to use the Cake frontend to use the ownership library or is it actually macros (or an extension?) I could use in code compiled with gcc or clang?
I think that ownership for C is gross. It's hard to convert code to something like this.<p>But you could get most of the benefit by just isoheaping (strictly allocate different types in different heaps).
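A toy sketch of what I mean by isoheaping, assuming fixed-size static pools and two made-up types (point_t and label_t). Real isoheap allocators are far more elaborate, and every name here is hypothetical. The property to notice is that a freed slot is only ever handed back out as the same type.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SLOTS 8

typedef struct { int x, y; } point_t;
typedef struct { char name[16]; } label_t;

/* One fixed pool per type; slots never migrate between pools. */
static point_t       point_heap[POOL_SLOTS];
static unsigned char point_used[POOL_SLOTS];
static label_t       label_heap[POOL_SLOTS];
static unsigned char label_used[POOL_SLOTS];

/* Returns the first free point slot, or NULL if the pool is exhausted. */
point_t *point_alloc(void)
{
    for (size_t i = 0; i < POOL_SLOTS; i++)
        if (!point_used[i]) { point_used[i] = 1; return &point_heap[i]; }
    return NULL;
}

/* p must have come from point_alloc(); the slot stays in the point pool. */
void point_free(point_t *p)
{
    assert(p != NULL);
    point_used[p - point_heap] = 0;
}

/* Same scheme for labels, backed by completely separate storage. */
label_t *label_alloc(void)
{
    for (size_t i = 0; i < POOL_SLOTS; i++)
        if (!label_used[i]) { label_used[i] = 1; return &label_heap[i]; }
    return NULL;
}

void label_free(label_t *l)
{
    assert(l != NULL);
    label_used[l - label_heap] = 0;
}
```

A dangling point_t pointer can still be used after free, but whatever it aliases will be another point_t, never a label_t, so the bug can't turn into type confusion.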
This is overly complicated; there is no need to bring Rust semantics to C to ensure memory safety.<p>A good mempool implementation is all you need (i.e. one that keeps track of every request, and zeros out the memory on release)
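Roughly what I have in mind, as a minimal sketch rather than a production allocator. It assumes a small fixed tracking table, and the names mempool, pool_alloc and pool_release are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TRACKED 64

/* One record per live allocation. */
typedef struct { void *ptr; size_t size; } pool_entry;

typedef struct {
    pool_entry entries[MAX_TRACKED];
    size_t     live;                 /* number of tracked allocations */
} mempool;

/* Allocates zeroed memory and records the request; NULL on failure. */
void *pool_alloc(mempool *mp, size_t size)
{
    assert(mp != NULL);
    if (mp->live == MAX_TRACKED)
        return NULL;
    void *p = calloc(1, size);
    if (p == NULL)
        return NULL;
    mp->entries[mp->live++] = (pool_entry){ p, size };
    return p;
}

/* Zeros and frees a tracked block. Returns 0 on success, -1 if ptr was
   never handed out by this pool or has already been released. */
int pool_release(mempool *mp, void *ptr)
{
    assert(mp != NULL);
    for (size_t i = 0; i < mp->live; i++) {
        if (mp->entries[i].ptr == ptr) {
            /* A compiler may elide this memset as a dead store; a hardened
               version would need a zeroing call that cannot be optimized away. */
            memset(ptr, 0, mp->entries[i].size);
            free(ptr);
            mp->entries[i] = mp->entries[--mp->live];   /* swap-remove */
            return 0;
        }
    }
    return -1;
}
```

Releasing an untracked pointer is reported instead of corrupting the heap, and anything still tracked at exit is a leak you can list. It does nothing about code that keeps using a pointer after release, though.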