SuperC: Parsing All of C by Taming the Preprocessor [pdf] (2012)

95 points by g0xA52A2A about 1 year ago

9 comments

evanjrowley about 1 year ago
This is way over my head, but I was reminded of *The C language is purely functional* by Conal Elliott: http://conal.net/blog/posts/the-c-language-is-purely-functional
ksherlock about 1 year ago
The source code: https://github.com/appleseedlab/superc/
DriftRegion about 1 year ago
Figure 1 spoke to me. It's an expanded syntax tree that branches depending on the value of a preprocessor definition "CONFIG...X". I've often found myself doing the kind of code archeology that this paper seems to be trying to automate: exploring all the configuration possibilities implied by the codebase / build system. A C program that makes heavy use of the preprocessor is generally harder to grok, for humans and static analysis alike, because 1. the C preprocessor syntax is different from C, 2. the inputs are not necessarily bounded by what appears in the source files alone ("-DCONFIG...X=foo" passed in from the build system), and 3. the resulting program and its control flow may be quite different depending on preprocessor options. As a simple example, embedded systems often define an "ASSERT(X)" macro as a no-op, an infinite loop, a print statement, or the like.

This is definitely a niche space, but I see a clear use for large, portable, and configurable C codebases (e.g. the Linux kernel, FreeRTOS) in providing better visibility into the configuration system.
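A minimal sketch of the configurable ASSERT(X) pattern described above, assuming a hypothetical ASSERT_MODE switch that a build system would pass with -D. The macro values, the assert_failures counter, and checked_div are illustrative names, not taken from any particular codebase. The same call site expands to a no-op, a diagnostic print, or an infinite loop, which is why a tool has to consider every configuration to know what the program actually does.

```c
#include <stdio.h>

/* Hypothetical build switch, normally set by the build system,
 * e.g. cc -DASSERT_MODE=0 ...
 *   0 = no-op, 1 = print a diagnostic and continue, 2 = spin forever. */
#ifndef ASSERT_MODE
#define ASSERT_MODE 1
#endif

/* Number of failed checks reported so far (only updated in mode 1). */
static int assert_failures = 0;

#if ASSERT_MODE == 0
/* No-op: the condition is not even evaluated. */
#define ASSERT(x) ((void)0)
#elif ASSERT_MODE == 1
/* Print where the check failed, then keep running. */
#define ASSERT(x)                                              \
    do {                                                       \
        if (!(x)) {                                            \
            assert_failures++;                                 \
            fprintf(stderr, "%s:%d: ASSERT(%s) failed\n",      \
                    __FILE__, __LINE__, #x);                   \
        }                                                      \
    } while (0)
#else
/* Halt in place so a debugger or watchdog can take over. */
#define ASSERT(x)                                              \
    do {                                                       \
        if (!(x)) {                                            \
            for (;;) {                                         \
            }                                                  \
        }                                                      \
    } while (0)
#endif

/* One source line, three different programs depending on ASSERT_MODE. */
int checked_div(int a, int b)
{
    ASSERT(b != 0);
    return b != 0 ? a / b : 0;
}
```

Built with the default ASSERT_MODE=1, checked_div(7, 0) prints a diagnostic and returns 0; built with -DASSERT_MODE=2, the same call never returns, and with -DASSERT_MODE=0 the check disappears entirely.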
mncharity about 1 year ago
Fwiw, ~20 years ago my experience was that preprocessor use in open-source C code was *very* idiomatic, and iirc, a simple backtracking parser that knew those idioms was sufficient to parse everything I tried it against, including the Linux kernel.
kazinator about 1 year ago
By the way, GNU Bison implements generalized LR (GLR) parsing by something that could be called "fork-merge LR". The documentation states that Bison's GLR algorithm resolves ambiguities by forking parallel parses, which later merge. It's not the same as forking on a preprocessor conditional, but worth mentioning.
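A toy sketch of the fork-on-conditional, merge-on-equal-state idea, assuming a drastically simplified token model. The token kinds, the SubParser type, and the `pending` counter (a made-up stand-in for a real LR parser state) are all hypothetical. This is neither Bison's GLR implementation nor SuperC's algorithm; it handles only a single, non-nested `#ifdef X ... #endif` without `#else`, and it exists to show why subparsers cannot always merge right at the `#endif`.

```c
/* Toy token kinds (hypothetical): TOK_IF stands for "if (...)", TOK_ELSE
 * for "else", TOK_SIMPLE for a complete "expr;" statement, and
 * TOK_IFDEF_X / TOK_ENDIF for "#ifdef X" / "#endif". */
typedef enum { TOK_IF, TOK_ELSE, TOK_SIMPLE, TOK_IFDEF_X, TOK_ENDIF } Tok;

/* A subparser: its presence condition plus a stand-in parser state. */
typedef struct {
    int cond;    /* 0 = all configurations, 1 = X defined, -1 = X undefined */
    int pending; /* toy state: open if/else constructs awaiting a statement */
} SubParser;

/* Advance one subparser over one ordinary C token. */
static void step(SubParser *p, Tok t)
{
    if (t == TOK_SIMPLE)
        p->pending = 0; /* a statement closes every open if/else */
    else
        p->pending++;   /* TOK_IF or TOK_ELSE now needs a statement */
}

/* Runs the toy fork-merge pass over toks[0..n). Returns the number of
 * live subparsers at the end and stores in *merged_at the index of the
 * token after which the two subparsers merged (-1 if they never did). */
int fork_merge(const Tok *toks, int n, int *merged_at)
{
    SubParser ps[2] = {{0, 0}, {0, 0}};
    int live = 1, in_x = 0;
    *merged_at = -1;
    for (int i = 0; i < n; i++) {
        if (toks[i] == TOK_IFDEF_X) {
            /* Fork: one subparser per branch of the conditional. */
            ps[1] = ps[0];
            ps[0].cond = 1;
            ps[1].cond = -1;
            live = 2;
            in_x = 1;
            continue;
        }
        if (toks[i] == TOK_ENDIF) {
            in_x = 0;
        } else {
            for (int k = 0; k < live; k++)
                /* Inside the #ifdef, only the "X defined" subparser sees tokens. */
                if (!in_x || ps[k].cond == 1)
                    step(&ps[k], toks[i]);
        }
        /* Merge: outside the conditional, equal states continue as one. */
        if (live == 2 && !in_x && ps[0].pending == ps[1].pending) {
            ps[0].cond = 0;
            live = 1;
            *merged_at = i;
        }
    }
    return live;
}
```

Abstracting the body of the mousedev_open snippet quoted in mdaniel's comment as SIMPLE, IFDEF_X, IF, SIMPLE, ELSE, ENDIF, SIMPLE, SIMPLE, the two subparsers disagree at the `#endif` (the X branch ends on a dangling `else`) and merge only after the shared `i = iminor(inode) - 32;` statement, at index 6. If the conditional instead contained a complete statement, they would merge immediately at the `#endif`.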
mdaniel about 1 year ago
I am obviously not able to understand what specific problem this is solving, given the title "parsing all of C", when the preprocessor is apparently left intact by design:

```c
static int mousedev_open(struct inode *inode, struct file *file)
{
    int i;
#ifdef CONFIG_INPUT_MOUSEDEV_PSAUX
    if (imajor(inode) == 10)
        i = 31;
    else
#endif
        i = iminor(inode) - 32;
    return 0;
}
```
(b) The preprocessed source preserving all configurations

My experience with C is that there are an untold number of "unbound" tokens designed to be injected by -D or by auto-generated config.h files, so presumably this works closer to the "ready for compilation" phase than as something one could use to make tree-sitter better (as an example).
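One way to see what "preserving all configurations" buys is to turn the compile-time choice in the snippet above into a runtime one, a transformation sometimes called variability encoding. SuperC itself keeps the alternatives as branches of its syntax tree rather than rewriting code like this; the sketch below only illustrates that both variants of the quoted function can coexist in one program. The struct layout, the imajor/iminor stubs, and the name mousedev_index are hypothetical stand-ins so the code compiles outside the kernel.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's struct inode and the
 * imajor()/iminor() helpers, so the sketch compiles on its own. */
struct inode {
    int major;
    int minor;
};
static int imajor(const struct inode *inode) { return inode->major; }
static int iminor(const struct inode *inode) { return inode->minor; }

/* mousedev_index is a hypothetical name: the quoted function with the
 * #ifdef turned into an ordinary parameter, so one compiled function
 * covers both builds. It returns i (the original returns 0) so the
 * effect of the configuration can be observed. */
static int mousedev_index(const struct inode *inode, bool config_psaux)
{
    int i;
    if (config_psaux && imajor(inode) == 10)
        i = 31;                 /* only reachable when the option is on */
    else
        i = iminor(inode) - 32; /* the path both configurations share */
    return i;
}
```

With major 10 and minor 40, mousedev_index returns 31 when config_psaux is true and 8 when it is false; for any other major number both configurations agree, which mirrors how the `#ifdef` branch and the shared `else` arm fit together in the original.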
lacraig2 about 1 year ago
This looks really useful, but even reproducing it seems like an uphill battle, given the lack of updates for almost a decade.
kazinator about 1 year ago
> *In exploring configuration-preserving parsing, we focus on performance.*

Why, because this goose is so thoroughly cooked that all that is left is optimizing for speed?

There is a lot of misplaced focus on performance in CS academia, and also in software.

Suppose we have some accurate tool that does something useful with a C program, but it takes 5 minutes to run instead of 5 seconds. So what? Someone still wants to use it. Suppose the program is used by millions of people, and that 5-minute run only has to be repeated half a dozen times during development.

"Get it right" and "get it into people's hands" should be the priorities, and not necessarily in that order.
dzdt about 1 year ago
This is from 2012. I don't see that it has been discussed here before, though. I guess it didn't make much of a splash.