Pnut: A C to POSIX shell compiler you can trust

193 points by feeley 10 months ago

26 comments

"Because Pnut can be distributed as a human-readable shell script (`pnut.sh`), it can serve as the basis for a reproducible build system. With a POSIX compliant shell, `pnut.sh` is sufficiently powerful to compile itself and, with some effort, [TCC](https://bellard.org/tcc/). Because TCC can be used to bootstrap GCC, this makes it possible to bootstrap a fully featured build toolchain from only human-readable source files and a POSIX shell.

Because Pnut doesn't support certain C features used in TCC, Pnut features a native code backend that supports a larger subset of C99. We call this compiler `pnut-exe`, and it can be compiled using `pnut.sh`. This makes it possible to compile `pnut-exe.c` using `pnut.sh`, and then compile TCC, all from a POSIX shell."

Is there anywhere we can see a step-by-step demo of this process?

Curious if the authors tried NetBSD or OpenBSD, or using another small C compiler, e.g., pcc.

Historically, tcc was problematic for NetBSD and its forks. Not sure about today, but tcc is still in NetBSD pkgsrc WIP, which suggests problems remain.

theamk 10 months ago

If you are wondering how it handles C-only functions... it does not.

open(..., O_RDWR | O_EXCL) -> runtime error, "echo "Unknow file mode" ; exit 1"

lseek(fd, 1, SEEK_HOLE); -> invalid code (uses undefined _lseek)

socket(AF_UNIX, SOCK_STREAM, 0); -> same (uses undefined _socket)

Looking closer at the "cp" and "cat" examples, the write() call does not handle errors at all. Forget about partial writes, it does not even return -1 on failures.

"Compiler you can Trust", indeed... maybe you can trust it to get all the details wrong?
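For readers unfamiliar with the partial-write issue raised here, the following is a minimal sketch of what careful handling of write() usually looks like in C. The helper name `write_all` is hypothetical; it is not part of Pnut or POSIX, and nothing here describes how Pnut's runtime is actually implemented.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* write_all is a hypothetical helper, not a Pnut or POSIX function.
 * It keeps calling write() until all len bytes are accepted.
 * Returns 0 on success, or -1 with errno set by the failing write(). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any byte was written: retry */
            return -1;          /* real failure: report it to the caller */
        }
        p += n;                 /* partial write: skip the bytes already accepted */
        len -= (size_t)n;
    }
    return 0;
}
```

A caller such as a `cp` or `cat` clone would check the return value and exit with a non-zero status on -1, which is the behavior the comment says is missing.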

cozzyd 10 months ago

Can finally port systemd to shell to quell the rebellion.

okaleniuk 10 months ago

I love things like these because they shake our perception of normal loose. And who said our perception of normal doesn't deserve a good shake?

A C to shell compiler might seem impractical, but you know what is even more impractical? Having a separate language for a build system. And yet, here we are. Using Shell, Make or CMake to build a C program is only acceptable because it has always been so. It's a "perceived normality" in the C world.

There is no good reason, however, why CMake isn't a C library. With the build system being a library, we could write, read, and, most importantly, debug build scripts just like any other part of the buildable. We already have includeOS, why not includeMake?

wahern 10 months ago

This is very cool, regardless of how serious it was intended to be taken. Before base-64 encoders/decoders became more common as preinstalled commands in the environments I found myself on, I wrote a base64 utility in mostly pure POSIX shell:

<pre><code>
https://25thandClement.com/~william/2023/base64.sh
</code></pre>

If this project had existed I might have opted to compile my C-based base-64 encoder and decoder routines, suitably tweaked for pnut's limitations.

I say base64.sh is mostly pure not because it relies on shell extensions, but because the only non-builtins it depends on are od(1) or, alternatively, dd(1) to assist with binary I/O. And preferably od(1), as reading certain control characters, like NUL, into a shell variable is especially dubious. The encoder is designed to operate on a stream of decimal encoded bytes. (See decimals_fast for using od to encode stdin to decimals, and decimals_slow for using dd for the same.)

It looks like pnut uses `read -r` for reading input. In addition to NULs and related raw byte issues, I was worried about chunking issues (e.g. truncation or errors) on binary data, e.g. no newlines within LINE_BUF bytes. Have you tested binary I/O much? Relatedly, how many different shell implementations have you tested your core scheme with? In addition to bash, dash, and various incarnations of /bin/sh on the BSDs, I also tested base64.sh with Solaris' system shells (ksh88 and ksh93 derivatives), as well as AIX's (ksh88 derivative). AIX had some odd quirks with pipelines even with plain text I/O. (Unfortunately Polar Home is gone, now, so I have no easy way to play with AIX; maybe that's for the better.)
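As an illustration of the kind of C routine this comment has in mind, here is a minimal sketch of a standard base-64 encoder. It is not taken from base64.sh or from Pnut; the name `b64_encode` is hypothetical, and the code uses unsigned types and arrays, so it would likely need the "suitable tweaking" mentioned above before Pnut could accept it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* b64_encode is a hypothetical name for this sketch.
 * Encodes len raw bytes (NUL bytes included) from in into out.
 * out must have room for 4 * ((len + 2) / 3) + 1 bytes.
 * Returns the number of characters written, excluding the terminator. */
size_t b64_encode(const unsigned char *in, size_t len, char *out)
{
    size_t o = 0;

    assert(out != NULL && (in != NULL || len == 0));
    for (size_t i = 0; i < len; i += 3) {
        size_t rem = len - i;                    /* bytes left in this group */
        unsigned v = (unsigned)in[i] << 16;      /* pack up to 3 bytes into 24 bits */
        if (rem > 1) v |= (unsigned)in[i + 1] << 8;
        if (rem > 2) v |= in[i + 2];

        out[o++] = B64[(v >> 18) & 63];
        out[o++] = B64[(v >> 12) & 63];
        out[o++] = rem > 1 ? B64[(v >> 6) & 63] : '=';   /* pad short groups */
        out[o++] = rem > 2 ? B64[v & 63] : '=';
    }
    out[o] = '\0';
    return o;
}
```

Because the routine works on byte values rather than on text lines, it sidesteps the NUL and chunking concerns on the C side; those concerns reappear in whatever shell mechanism feeds it the bytes.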

voidUpdate 10 months ago

When I'm told that "I can trust" something that I feel like I had no reason to distrust, it makes me feel even more suspicious of it

akoboldfrying 10 months ago

I was puzzled by the example C function containing pointers. Do I understand correctly that you implement pointers in shell by having a shell variable _0 for the first "byte" of "memory", a shell variable _1 for the second, etc.?
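The reply to this question is not shown here, so the following is only a sketch of the model the comment guesses at, written in C for concreteness. It assumes memory is a flat sequence of numbered cells (standing in for shell variables `_0`, `_1`, ...) and that a pointer is just a cell number. All names (`cell`, `cell_alloc`, `load`, `store`, `sum_cells`) are hypothetical and are not Pnut's.

```c
#include <assert.h>

#define HEAP_CELLS 1024          /* arbitrary size chosen for this sketch */

static int cell[HEAP_CELLS];     /* cell[i] plays the role of shell variable _i */
static int next_free = 1;        /* address 0 is reserved so it can act as NULL */

/* Bump allocator: returns the address (an index) of n fresh cells, or 0 if full. */
int cell_alloc(int n)
{
    assert(n > 0);
    if (n > HEAP_CELLS - next_free)
        return 0;
    int addr = next_free;
    next_free += n;
    return addr;
}

/* Dereference: the equivalent of reading *p when p is a cell number. */
int load(int addr)
{
    assert(addr > 0 && addr < HEAP_CELLS);
    return cell[addr];
}

/* Assignment through a pointer: the equivalent of *p = value. */
void store(int addr, int value)
{
    assert(addr > 0 && addr < HEAP_CELLS);
    cell[addr] = value;
}

/* Models int sum(int *a, int len): the pointer parameter is a plain int. */
int sum_cells(int a, int len)
{
    int total = 0;
    for (int i = 0; i < len; i++)
        total += load(a + i);    /* pointer arithmetic is integer addition */
    return total;
}
```

Under this model an `int` and an `int*` are both ordinary integers at run time; only the operations applied to them differ.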

rubicks 10 months ago

I can't wait to see the shell equivalents for ptrace, setjmp, and dlopen.

metadat 10 months ago

Also see this related submission from May, 2024:

Amber: Programming language compiled to Bash https://news.ycombinator.com/item?id=40431835 (318 comments)

---

Pnut doesn't seem to differentiate between `int` and `int*` function parameters. That's weird, and doesn't come across as trustworthy at all! Shouldn't the use of pointers be disallowed instead?

<pre><code>
int test1(int a, int len) {
  return a;
}

int test2(int* a, int len) {
  return a;
}
</code></pre>

Both compile to the exact same thing:

<pre><code>
: $((len = a = 0))
_test1() {
  let a $2; let len $3
  : $(($1 = a))
  endlet $1 len a
}

: $((len = a = 0))
_test2() {
  let a $2; let len $3
  : $(($1 = a))
  endlet $1 len a
}
</code></pre>

The "runtime library" portion at the bottom of every script is nigh unreadable.

Even still, it's a cool concept.

teo_zero 10 months ago

Just to be clear, the input must be written in a subset of C, because many constructs are not recognized, like unsigned types, static variables, [] arrays, etc.

Is there a plan to remove such limitations?

itvision 10 months ago

Instantly make your C code 200 times slower without any effort!

andrewf 10 months ago

Looking forward to the point where this can build autoconf. It's great that the generated ./configure script is portable, but if I want to make substantial changes to the project I need to find an autoconf binary for my machine (and version differences can be quite substantial).

kazinator 10 months ago

This is not useful if it doesn't call external libraries.

Even POSIX standard ones. Chokes on:

<pre><code>
#include <glob.h>

int main() // must be (); (void) results in syntax error.
{
  glob_t gb; // syntax error here
  glob("abc", 0, NULL, &gb);
  return 0;
}
</code></pre>

Nobody needs entirely self-contained C programs with no libraries to be turned into shell scripts; Unix people switch to C when there is a library function they need to call for which there is no command in /bin or /usr/bin.

If I reduce it to:

<pre><code>
#include <glob.h>

int main()
{
  glob("abc", 0, NULL, 0);
  return 0;
}
</code></pre>

it "compiles" into something with a main function like:

<pre><code>
_main() {
  defstr __str_0 "abc"
  _glob __ $__str_0 0 $_NULL 0
  : $(($1 = 0))
}
</code></pre>

but what good is that without a definition of _glob?

forrestthewoods 10 months ago

Hrmmm. But why?

Quite frankly I think Bash scripting is awful and frequently wish shell scripts were written in a real and debuggable language. For anything non-trivial that is.

I feel like I’d rather write C and compile it with Cosmopolitan C to give me a cross-platform binary than this.

Neat project. Definitely clever. But it’s headed in the opposite direction from what I’d prefer...

vermon 10 months ago

If the end goal is portability for C, would Cosmopolitan Libc be a better choice because it supports a lot more features and probably runs faster?

iod 10 months ago

I am sorry if this comes off as negative, but every example provided on the site, when compiled and then fed into ShellCheck¹, generates warnings about non-portable and ambiguous problems with the script. What exactly are we supposed to trust?

¹ https://www.shellcheck.net

osmsucks 10 months ago

I'm writing something similar, but it's based on its own scripting language. The idea of transpiling C sounds appealing but impractical: how do they plan to compile, say, things using mmap, setjmp, pthreads, ...? It would be better to clearly promise only a restricted subset of C.

kxndnenfn 10 months ago

This is quite interesting! Without having dug deeper into it, seeing the human-readable output, I assume the semantics are quite different from C?

The C to shell transpiler I'm aware of will output unreadable code (elvm using 8cc with sh backend).

dsp_person 10 months ago

I use linux-vt-setcolors in my startup, which would be a bit more convenient if it was a shell script instead of C, but it uses ioctl.

Trying to compile with this tool fails with "comp_glo_decl: unexpected declaration"

Retr0id 10 months ago

Can it do wrapping arithmetic?

The `sum` example doesn't seem to do wrapping, but signed int overflow is technically UB so I guess they're fine not to.

Switching it to `unsigned int` gives me:

code.c:1:1 syntax error: unsupported type
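For context on what wrapping would involve: POSIX only requires shell arithmetic to be signed long, so a compiler targeting the shell would have to emulate unsigned wraparound explicitly. One common technique is to compute in a wider signed type and mask the result. The sketch below shows that technique in C for 32-bit values; the names `u32_add` and `u32_sub` are hypothetical, and this is not a description of what Pnut does.

```c
#include <assert.h>
#include <stdint.h>

#define U32_MASK INT64_C(0xFFFFFFFF)

/* Hypothetical helpers for this sketch. Operands must already be in
 * the range [0, 2^32); the result is reduced modulo 2^32 by masking. */

int64_t u32_add(int64_t a, int64_t b)
{
    assert(a >= 0 && a <= U32_MASK && b >= 0 && b <= U32_MASK);
    return (a + b) & U32_MASK;   /* sum fits in 64 bits, mask drops the carry */
}

int64_t u32_sub(int64_t a, int64_t b)
{
    assert(a >= 0 && a <= U32_MASK && b >= 0 && b <= U32_MASK);
    return (a - b) & U32_MASK;   /* negative results wrap around to 2^32 - |x| */
}
```

Multiplication is left out on purpose: the product of two 32-bit values can exceed the signed 64-bit range, so it would need to be split into smaller partial products before masking.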

yencabulator 10 months ago

It seems to have practically no error checking. Try compiling

<pre><code>
int why(int unused) {
  wat_why_does_this_compile;
  no_error_checking();
}
</code></pre>

atilaneves 10 months ago

I'm still figuring out why anyone would want to write a shell script in C. That sounds like torture to me.

JoshTriplett 10 months ago

Several times I've found myself wishing for the reverse: a shell-to-binary compiler or JIT.

layer8 10 months ago

Can you trust that it faithfully reproduces undefined behavior? ;)

gojomybeloved 10 months ago

Love this!

o11c 10 months ago

It's a bad sign when I immediately look at the screenshot and see quoting bugs.
