"Because Pnut can be distributed as a human-readable shell script (`pnut.sh`), it can serve as the basis for a reproducible build system. With a POSIX compliant shell, `pnut.sh` is sufficiently powerful to compile itself and, with some effort, [TCC](<a href="https://bellard.org/tcc/" rel="nofollow">https://bellard.org/tcc/</a>). Because TCC can be used to bootstrap GCC, this makes it possible to bootstrap a fully featured build toolchain from only human-readable source files and a POSIX shell.<p>Because Pnut doesn't support certain C features used in TCC, Pnut features a native code backend that supports a larger subset of C99. We call this compiler `pnut-exe`, and it can be compiled using `pnut.sh`. This makes it possible to compile `pnut-exe.c` using `pnut.sh`, and then compile TCC, all from a POSIX shell."<p>Is there anywhere we can see a step-by-step demo of this process?<p>I'm curious whether the authors tried NetBSD or OpenBSD, or another small C compiler such as pcc.<p>Historically, tcc was problematic for NetBSD and its forks. Not sure about today, but tcc is <i>still</i> in NetBSD pkgsrc WIP, which suggests problems remain.
If you're wondering how it handles C-only functions... it doesn't.<p>open(..., O_RDWR | O_EXCL) -> runtime error, "echo "Unknow file mode" ; exit 1"<p>lseek(fd, 1, SEEK_HOLE); -> invalid code (uses undefined _lseek)<p>socket(AF_UNIX, SOCK_STREAM, 0); -> same (uses undefined _socket)<p>Looking closer at the "cp" and "cat" examples, the write() call does not handle errors at all. Forget about partial writes; it does not even return -1 on failure.<p>"Compiler you can Trust", indeed... maybe you can trust it to get all the details wrong?
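For contrast, here is a minimal sketch of the error handling the comment says is missing. `write_all` is a hypothetical helper, not something pnut provides: it loops over partial writes, retries when interrupted by a signal (EINTR), and returns -1 with errno set on a real failure.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes from buf to fd.
   Returns len on success, or -1 with errno set on the first real error. */
ssize_t write_all(int fd, const char *buf, size_t len) {
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR) continue;  /* interrupted by a signal: retry */
            return -1;                     /* real error: report it to the caller */
        }
        done += (size_t)n;                 /* partial write: advance and loop */
    }
    return (ssize_t)done;
}
```

A caller can then treat any return value other than the requested length as an error, which is what a `cp` or `cat` port would need to detect a full disk or a closed pipe.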
I love things like these because they shake our perception of normal loose. And who said our perception of normal doesn't deserve a good shake?<p>A C-to-shell compiler might seem impractical, but you know what is even more impractical? Having a separate language for a build system. And yet, here we are. Using Shell, Make or CMake to build a C program is only acceptable because it has always been so. It's a "perceived normality" in the C world.<p>There is no good reason, however, why CMake isn't a C library. With the build system being a library, we could write, read, and, most importantly, debug build scripts just like any other part of the codebase. We already have includeOS; why not includeMake?
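A rough sketch of what a build step could look like if it were just a C function, assuming a hypothetical `needs_rebuild` API (nothing like this is claimed to exist in CMake or pnut). It performs make's core timestamp check with POSIX `stat()`, so it can be stepped through in an ordinary debugger like the rest of the program.

```c
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical build-library primitive: the check behind a single make rule.
   Returns 1 if target is missing or older than some dependency,
   0 if it is up to date, and -1 if any dependency does not exist. */
int needs_rebuild(const char *target, const char **deps, int ndeps) {
    struct stat t;
    struct stat d;
    int i;
    int newer = 0;
    int missing_target = (stat(target, &t) != 0);
    for (i = 0; i < ndeps; i++) {
        if (stat(deps[i], &d) != 0) return -1;  /* missing source: no rule can help */
        if (!missing_target && d.st_mtime > t.st_mtime) newer = 1;
    }
    return (missing_target || newer) ? 1 : 0;
}
```

A build "script" would then be ordinary C: call `needs_rebuild` for each target and run the compile step only when it returns 1.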
This is very cool, regardless of how serious it was intended to be taken. Before base-64 encoders/decoders became more common as preinstalled commands in the environments I found myself on, I wrote a base64 utility in mostly pure POSIX shell:<p><pre><code> https://25thandClement.com/~william/2023/base64.sh
</code></pre>
If this project had existed, I might have opted to compile my C-based base-64 encoder and decoder routines, suitably tweaked for pnut's limitations.<p>I say base64.sh is mostly pure not because it relies on shell extensions, but because the only non-builtins it depends on are od(1) or, alternatively, dd(1) to assist with binary I/O. And preferably od(1), as reading certain control characters, like NUL, into a shell variable is especially dubious. The encoder is designed to operate on a stream of decimal-encoded bytes. (See decimals_fast for using od to encode stdin to decimals, and decimals_slow for using dd for the same.)<p>It looks like pnut uses `read -r` for reading input. In addition to NULs and related raw-byte issues, I was worried about chunking issues (e.g. truncation or errors) on binary data, e.g. no newlines within LINE_BUF bytes. Have you tested binary I/O much? Relatedly, how many different shell implementations have you tested your core scheme with? In addition to bash, dash, and various incarnations of /bin/sh on the BSDs, I also tested base64.sh with Solaris' system shells (ksh88 and ksh93 derivatives), as well as AIX's (ksh88 derivative). AIX had some odd quirks with pipelines even with plain text I/O. (Unfortunately, Polar Home is gone now, so I have no easy way to play with AIX; maybe that's for the better.)
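A minimal sketch of a C base64 encoder written in the restricted style described elsewhere in this thread (only `int` and `char *`, no `unsigned`, no `static`, no array declarations, no ternaries). `b64_encode` is a hypothetical name, and it has not been confirmed that pnut accepts this code. The `& 255` masks ensure that a signed `char` holding a high byte still contributes a value in 0..255.

```c
/* Hypothetical encoder: writes base64 of in[0..len) into out, NUL-terminated.
   out must have room for 4 * ((len + 2) / 3) + 1 chars. Returns output length. */
int b64_encode(char *in, int len, char *out) {
    char *alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    int i = 0;
    int o = 0;
    int b0;
    int b1;
    int b2;
    int group;
    while (i < len) {
        /* & 255 maps a possibly negative char to 0..255 without unsigned types */
        b0 = in[i] & 255;
        b1 = 0;
        b2 = 0;
        if (i + 1 < len) b1 = in[i + 1] & 255;
        if (i + 2 < len) b2 = in[i + 2] & 255;
        group = (b0 << 16) | (b1 << 8) | b2;   /* 24 bits, always fits in int */
        out[o] = alphabet[(group >> 18) & 63];
        out[o + 1] = alphabet[(group >> 12) & 63];
        out[o + 2] = '=';                      /* padding unless a 2nd byte exists */
        out[o + 3] = '=';                      /* padding unless a 3rd byte exists */
        if (i + 1 < len) out[o + 2] = alphabet[(group >> 6) & 63];
        if (i + 2 < len) out[o + 3] = alphabet[group & 63];
        i = i + 3;
        o = o + 4;
    }
    out[o] = 0;
    return o;
}
```

For example, encoding the three bytes of "Man" produces "TWFu", and the single byte 0xFF produces "/w==".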
I was puzzled by the example C function containing pointers. Do I understand correctly that you implement pointers in shell by having a shell variable _0 for the first "byte" of "memory", a shell variable _1 for the second, etc.?
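A toy C model of the scheme the question describes, assuming (unconfirmed) that each memory cell maps to one shell variable `_0`, `_1`, and so on. Memory is a flat array of whole `int` cells rather than bytes, and a pointer is just an index into it, so pointer arithmetic becomes plain integer arithmetic. `toy_alloc`, `toy_load`, and `toy_store` are hypothetical names.

```c
#define MEM_CELLS 1024

int mem[MEM_CELLS];   /* cell k stands in for shell variable _k */
int next_free = 0;    /* bump allocator: index of the first unused cell */

/* Reserve n consecutive cells; return the "address" of the first, or -1. */
int toy_alloc(int n) {
    if (n < 0 || next_free + n > MEM_CELLS) return -1;
    int base = next_free;
    next_free += n;
    return base;
}

/* *(p + i) = v and *(p + i) reduce to index arithmetic (no bounds checks). */
void toy_store(int p, int i, int v) { mem[p + i] = v; }
int toy_load(int p, int i) { return mem[p + i]; }
```

In this model, two allocations are simply adjacent ranges of cells, so an out-of-range index from one "pointer" silently reads a neighbouring allocation, much as it would with real memory.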
Also see this related submission from May 2024:<p><i>Amber: Programming language compiled to Bash</i> <a href="https://news.ycombinator.com/item?id=40431835">https://news.ycombinator.com/item?id=40431835</a> (318 comments)<p>---<p>Pnut doesn't seem to differentiate between `int` and `int*` function parameters. That's weird, and doesn't come across as trustworthy at all! Shouldn't the use of pointers be disallowed instead?<p><pre><code> int test1(int a, int len) {
return a;
}
int test2(int* a, int len) {
return a;
}
</code></pre>
Both compile to the exact same thing:<p><pre><code> : $((len = a = 0))
_test1() { let a $2; let len $3
: $(($1 = a))
endlet $1 len a
}
: $((len = a = 0))
_test2() { let a $2; let len $3
: $(($1 = a))
endlet $1 len a
}
</code></pre>
The "runtime library" portion at the bottom of every script is nigh unreadable.<p>Even still, it's a cool concept.
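For comparison, a sketch of the distinction a type-checking C compiler enforces for the two functions above (renamed here to the hypothetical `first_value` and `first_pointee`): an `int*` parameter has to be dereferenced to get a value. Returning the pointer itself from an `int` function, as `test2` does, is a constraint violation for which standard C requires a diagnostic, and GCC 14 and later treat it as an error by default.

```c
/* int parameter: already a value, returned as-is (mirrors test1). */
int first_value(int a, int len) {
    (void)len;           /* unused, kept only to mirror test1's signature */
    return a;
}

/* int* parameter: an address; the value must be read through it.
   Writing `return a;` here would be an int-from-pointer conversion. */
int first_pointee(int *a, int len) {
    if (len <= 0) return 0;
    return *a;
}
```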
Just to be clear, the input must be written in a subset of C, because many constructs are not recognized, like unsigned types, static variables, [] arrays, etc.<p>Is there a plan to remove such limitations?
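A sketch of rewrites that stay inside the subset described above, assuming globals, `malloc`, `sizeof`, and pointer indexing are supported (not verified against pnut). `bump_counter` and `make_squares` are hypothetical examples: a file-scope global replaces a function-local `static`, and heap storage replaces an `int name[n]` declaration. The empty parameter list in `bump_counter` follows another comment's report that `(void)` is a syntax error.

```c
#include <stdlib.h>

/* Instead of `static int count;` inside the function: a file-scope global
   with the same lifetime. */
int counter_state = 0;

int bump_counter() {
    counter_state = counter_state + 1;
    return counter_state;
}

/* Instead of `int squares[n];`: allocate and index through a pointer.
   Returns NULL on bad size or allocation failure; caller frees the result. */
int *make_squares(int n) {
    int *squares;
    int i = 0;
    if (n < 0) return NULL;
    squares = malloc(n * sizeof(int));
    if (squares == NULL) return NULL;
    while (i < n) {
        squares[i] = i * i;
        i = i + 1;
    }
    return squares;
}
```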
Looking forward to the point where this can build autoconf. It's great that the generated ./configure script is portable, but if I want to make substantial changes to the project, I need to find a working autoconf for my machine (and version differences can be quite substantial).
This is not useful if it doesn't call external libraries.<p>Even POSIX standard ones. Chokes on:<p><pre><code> #include <glob.h>
int main() // must be (); (void) results in syntax error.
{
glob_t gb; // syntax error here
glob("abc", 0, NULL, &gb);
return 0;
}
</code></pre>
Nobody needs entirely self-contained C programs with no libraries to be turned into shell scripts; Unix people switch to C when there is a library function they need to call for which there is no command in /bin or /usr/bin.<p>If I reduce it to:<p><pre><code> #include <glob.h>
int main()
{
glob("abc", 0, NULL, 0);
return 0;
}
</code></pre>
it "compiles" into something with a main function like:<p><pre><code> _main() {
defstr __str_0 "abc"
_glob __ $__str_0 0 $_NULL 0
: $(($1 = 0))
}
</code></pre>
But what good is that without a definition of _glob?
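For reference, a sketch of what the original example relies on under an ordinary C toolchain: the `glob_t` struct, its `gl_pathc` result field, and `globfree()`, all from the POSIX `<glob.h>` interface. A C-to-shell compiler would need equivalents for all three. `count_matches` is a hypothetical wrapper.

```c
#include <glob.h>
#include <stddef.h>

/* Hypothetical wrapper: number of paths matching pattern, 0 if none, -1 on error. */
int count_matches(const char *pattern) {
    glob_t gb;
    int n = 0;
    int rc = glob(pattern, 0, NULL, &gb);
    if (rc == 0) n = (int)gb.gl_pathc;
    else if (rc != GLOB_NOMATCH) n = -1;   /* GLOB_NOSPACE or GLOB_ABORTED */
    globfree(&gb);   /* POSIX still sets gl_pathc/gl_pathv when glob() fails */
    return n;
}
```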
Hrmmm. But why?<p>Quite frankly, I think Bash scripting is awful and frequently wish shell scripts were written in a real and debuggable language. For anything non-trivial, that is.<p>I feel like I’d rather write C and compile it with Cosmopolitan C to give me a cross-platform binary than this.<p>Neat project. Definitely clever. But it’s headed in the opposite direction from what I’d prefer...
I am sorry if this comes off as negative, but every example provided on the site, when compiled and then fed into ShellCheck¹, generates warnings about non-portable and ambiguous constructs in the script. What exactly are we supposed to trust?<p>¹ <a href="https://www.shellcheck.net" rel="nofollow">https://www.shellcheck.net</a>
I'm writing something similar, but it's based on its own scripting language. The idea of transpiling C sounds appealing but impractical: how do they plan to compile, say, things using mmap, setjmp, pthreads, ...? It would be better to clearly promise only a restricted subset of C.
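A small sketch of why `setjmp` is one of the hard cases: `setjmp` returns twice (once directly, once via `longjmp`), and a single `longjmp` discards every frame in between, which a shell translation would have to emulate across nested shell functions. `fail_deep` and `run_with_recovery` are hypothetical names.

```c
#include <setjmp.h>

jmp_buf recover_point;
int recovered_code = 0;   /* payload carried across the jump (see below) */

/* Recurse depth levels, then abandon all of them with one longjmp. */
void fail_deep(int depth, int code) {
    if (depth == 0) {
        recovered_code = code;
        longjmp(recover_point, 1);
    }
    fail_deep(depth - 1, code);
}

int run_with_recovery(int depth, int code) {
    /* The C standard only allows setjmp in restricted contexts, such as a
       comparison with a constant, so the payload travels via a global. */
    if (setjmp(recover_point) != 0) {
        return recovered_code;   /* second return of setjmp, via longjmp */
    }
    fail_deep(depth, code);
    return -1;                   /* not reached: fail_deep always jumps */
}
```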
This is quite interesting! Without having dug deeper into it, judging by the human-readable output, I assume the semantics are quite different from C?<p>The C-to-shell transpiler I'm aware of (elvm, using 8cc with the sh backend) outputs unreadable code.
I use linux-vt-setcolors in my startup, which would be a bit more convenient if it were a shell script instead of C, but it uses ioctl.<p>Trying to compile it with this tool fails with "comp_glo_decl: unexpected declaration"
Can it do wrapping arithmetic?<p>The `sum` example doesn't seem to do wrapping, but signed int overflow is technically UB, so I guess they're fine not to.<p>Switching it to `unsigned int` gives me:<p>code.c:1:1 syntax error: unsupported type
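Since `unsigned int` is rejected, here is one way to get 32-bit two's-complement wrapping using only signed `int`. It is a sketch that assumes `int` is 32 bits and has not been verified against pnut or its `sum` example: split the operands into 16-bit halves, add with an explicit carry, and rebuild the result so that no intermediate value overflows.

```c
/* Hypothetical helper: a + b with 32-bit wraparound, no signed overflow (UB).
   Assumes 32-bit two's-complement int. */
int wrap_add32(int a, int b) {
    /* low halves: each 0..0xFFFF, so the sum is at most 0x1FFFE */
    int lo = (a & 0xFFFF) + (b & 0xFFFF);
    /* high halves plus carry; >> on a negative int is implementation-defined,
       but the & 0xFFFF mask keeps only the bits we want either way */
    int hi = ((a >> 16) & 0xFFFF) + ((b >> 16) & 0xFFFF) + (lo >> 16);
    lo = lo & 0xFFFF;
    hi = hi & 0xFFFF;
    if (hi >= 0x8000) hi = hi - 0x10000;   /* reinterpret top half as signed */
    return hi * 0x10000 + lo;              /* always within int range */
}
```

With this, `wrap_add32(2147483647, 1)` yields -2147483648 instead of invoking undefined behaviour.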
It seems to have practically no error checking. Try compiling<p><pre><code> int why(int unused) {
wat_why_does_this_compile;
no_error_checking();
}</code></pre>