Ask HN: LLVM vs. C?

38 points by danielEM almost 2 years ago

What does LLVM have that can not be achieved 1 to 1 in C? And what does C have that can not be reproduced 1 to 1 in LLVM?

And when I say C I mean tcc, sdcc, gcc, clang and other C compilers.

It feels to me like C should be superior to LLVM and allow to do anything that is possible to do with LLVM, but maybe I'm wrong? (Asking because of all the "fuzz-buzz" about zig going for C as its main target.)

I would appreciate real, down-to-the-ground answers, not a general sense of how things work or generalizations, as those would add unnecessary noise to the comparison.

19 comments

jcranmer almost 2 years ago

> What does LLVM have that can not be achieved 1 to 1 in C?

Do you want to have signed integer overflow not be undefined behavior? Sorry, you can't do that in C. Do you want to support SIMD vector types? Oops, no support in C. Underaligned memory accesses? Packed structures? Support for handling unwind structures? Coroutines?

> It feels to me like C should be superior to LLVM and allow to do anything that is possible to do with LLVM, but maybe I'm wrong?

Very much the inverse: as LLVM is used to implement a C compiler, LLVM can faithfully reproduce all of the C semantics, but C cannot be used to implement all of the LLVM features. Even if you pretend C doesn't have undefined behavior, and even if you include compiler extensions as part of C, there are still a few LLVM instructions and intrinsics that just don't exist in C (chief among them is invoke, which is used to implement C++ exception handling).
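
To make the signed-overflow point concrete, here is a minimal sketch of the kind of workaround a C-emitting backend might use when the source language wants wrapping arithmetic: do the addition in unsigned types, where wraparound is defined. The function name `wrapping_add_i32` is hypothetical, and the sketch assumes a compiler that converts out-of-range values to signed types by modular reduction (GCC and Clang document this; ISO C leaves it implementation-defined).

```c
#include <stdint.h>

/* Wrapping 32-bit signed addition that never performs a signed overflow.
 * Unsigned arithmetic wraps modulo 2^32 by definition in C. */
int32_t wrapping_add_i32(int32_t a, int32_t b)
{
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    /* Assumption: converting an out-of-range uint32_t to int32_t reduces
     * modulo 2^32. ISO C calls this implementation-defined. */
    return (int32_t)sum;
}
```

In LLVM IR the same intent is a plain `add` with no `nsw` flag, so no workaround is needed there.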

coreyp_1 almost 2 years ago

Well, clang actually uses LLVM.

As I see it: LLVM is like a universal remote. It is a quasi-assembly representation that can then be further compiled to multiple architectures. If you target LLVM, then you can compile to anything that LLVM can compile to. To support an additional architecture, all that needs to happen is for that architecture to work with LLVM, and then every project that uses LLVM will support it.

C, on the other hand, relies on a compiler. Because a compiler may be specifically designed for one architecture, it may (although it is not guaranteed) generate better binaries than LLVM.

Because C can be compiled using clang, which uses LLVM, there is nothing that C can do that LLVM cannot.

It may, however, be possible to produce opcode sequences using LLVM that are not possible in C, since C imposes semantics and structure on a program (syntax).

Lastly, it may be a simple question of resources. Is it easier to find people who know C, or people who know LLVM? Is it easier to set up a toolchain that compiles C, or one that compiles LLVM? Historically speaking, which one has been around longer? (C, of course.) Do the semantics of the source language closely match the available C paradigms? Which is more stable?

tibordp almost 2 years ago

You can go surprisingly far with C, though LLVM is probably a better long-term option for a serious compiler, because it's a tool made for the job (unless you target exotic and/or embedded platforms that don't have LLVM support - but that's fairly unlikely).

C is very easy to get started with if you don't already know LLVM. You don't have to flatten everything to SSA + basic blocks and can keep expressions as trees. The downside is that once your compiler is reasonably complete, you may spend quite a bit of time working around quirks of C (e.g. int promotion is very annoying when you already have full type information, so your compiler either has to understand C semantics fairly well or defensively cast every subexpression).

I have a C backend in my compiler (https://github.com/alumina-lang/alumina) and it works really well, though the generated C is really ugly and assembly-like. With #line directives, you can also get source-level debugging (gdb/lldb) that just works out of the box.

There are a few goodies that LLVM gives you that you don't get with C, like coverage (https://clang.llvm.org/docs/SourceBasedCodeCoverage.html). It works when done through clang, but cannot easily be made to track the original sources.
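
The int-promotion quirk can be shown with a small sketch. It assumes a hypothetical source language in which adding two 8-bit unsigned values wraps to 8 bits; the function names are made up for illustration and are not taken from the Alumina backend.

```c
#include <stdint.h>

/* u8 + u8 in the assumed source language wraps to u8. In C, both operands
 * are promoted to int first, so the cast is what restores the wrap. */
uint8_t add_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Without a cast on the subexpression, the comparison sees the promoted
 * int value: 200 + 56 is 256, which is not 0. */
int sum_is_zero_naive(uint8_t a, uint8_t b)
{
    return (a + b) == 0;
}

/* With the defensive cast, 256 is truncated to 0 before comparing. */
int sum_is_zero_cast(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b) == 0;
}
```

A backend that does not track C's promotion rules can simply emit the cast around every narrow subexpression, at the cost of noisier output.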

KerrAvon almost 2 years ago

Your questions don't make sense. LLVM is a framework for building compilers and tools. C is a language specification.

pizlonator almost 2 years ago

The main downside of targeting C instead of llvm is that C is slower to parse and validate than llvm IR.

Another downside of C is that to use C as a backend well, you’ll have to rely on nonstandard compiler flags like -fwrapv and -fno-strict-aliasing (among others). I don’t think this is actually a downside of C, since support for those flags (or their equivalents) is widespread.

Another downside of C is that there will be a few intrinsics or other weird features you won’t get. But it’s not clear to me that those make enough of a difference to matter.

You could argue that a downside of C is that you won’t be able to easily add debugging support that way. But llvm’s support for debug info is so bad (it’s hella inaccurate) that I’m not sure this is a real downside of C.

A major upside of C is that there are more diverse backends for C than for llvm IR.

Another major advantage of C is that with a modicum of care you can make your backend produce readable and hackable C code. This speeds up the debugging process greatly. (And by debugging I mean debugging your language implementation, not users of your implementation debugging their programs.)

So, if I was implementing an AOT language today, my first cut would be a C backend and my second cut would be a custom backend. So, I would do what Zig is doing.
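
As a sketch of what -fno-strict-aliasing buys a backend: reinterpreting the bytes of one type as another through a pointer cast is undefined under C's strict aliasing rules, while copying the bytes with memcpy is defined without any flag. The helper name `float_bits` is hypothetical, and the example assumes `float` is a 32-bit IEEE 754 type.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size on this target. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Return the bit pattern of f. Writing *(uint32_t *)&f instead would
 * violate strict aliasing unless built with -fno-strict-aliasing. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Optimizing compilers generally turn this memcpy into a single register move, so a backend can emit it freely or rely on the flag instead.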

erichocean almost 2 years ago

There's almost no reason in 2023 [0] to target LLVM IR directly anymore for a new programming language. I would highly recommend targeting MLIR instead of LLVM. [1]

Eventually, your language will benefit from adding a "middle end" and MLIR is the best choice for building that today (and it's likely to remain so for the foreseeable future, since there's very little competition and it has a ton of momentum). It handles infrastructure-level middle-end stuff that is difficult to improve upon, so you'd just waste dev hours reimplementing boilerplate yourself.

Before you build a middle end for your language, you can just target MLIR's LLVM IR dialect and you'll be around a dozen lines of code away from compiling with LLVM and invoking LLVM's JIT. So it's basically cost-free to target MLIR today and have an easy path to incrementally defining a new dialect for your middle end when it becomes profitable to do so.

[0] A good reason not to target MLIR is if you are targeting an existing runtime like the JVM or CLR, or writing a transpiler to some existing language (e.g. JavaScript). Since you mentioned C as an option, this wouldn't apply to you.

[1] MLIR is part of the larger LLVM project.

dleslie almost 2 years ago

So let's assume we mean C as a target, in the way one might target LLVM IR. What LLVM IR offers over C11 or C17 is a great deal more tools to describe to the compiler how one might engage in micro-optimizations.

I.e., linkage[0], lifetime[1], and ordering[2] information can be critical to delivering performance.

Now that said, GCC is _pretty damned amazing_ at squeezing out this information from C, and it's available for use just about everywhere. But it's good at figuring it out for _hand-written_ C. The sort of C that's generated as a compilation output from other languages, like Chicken Scheme and Nim, may not be written in a way that allows the C compiler to fully take advantage of its optimization abilities.

What I find with generated C is that it is often prone to blowing the stack or missing cache far more than hand-written C would. There's often something funky happening (Cheney-on-the-MTA) or it's a soup of function pointers and object references scattered haphazardly across memory.

0: https://llvm.org/docs/LangRef.html#linkage-types

1: https://llvm.org/docs/LangRef.html#object-lifetime

2: https://llvm.org/docs/LangRef.html#memory-model-for-concurrent-operations

zokier almost 2 years ago

For this sort of discussion it is important to distinguish between GCC/Clang-C and portable ISO/ANSI C; they are quite different in their range. Thinking of it, looking at the compiler extensions available in clang is probably a good way to see what you might be missing out on if you restrict yourself to portable C only.

As for specific features, function and parameter attributes[1][2], for example, are generally not directly accessible in ISO C. You also have a bit easier time controlling FP ops[3]; in ISO C they are annoyingly underdefined imho. For integers you have arbitrary width integer types (i42 is a valid type in llvm, not expressible in C). For both you also have first-class vector types[4], while ISO C is afaik completely blind to vectorization.

That is just scratching the surface here. C no longer maps as nicely to all the low-level intricacies we have these days as it did in the K&R days.

[1] https://llvm.org/docs/LangRef.html#function-attributes

[2] https://llvm.org/docs/LangRef.html#parameter-attributes

[3] https://llvm.org/docs/LangRef.html#floating-point-environment

[4] https://llvm.org/docs/LangRef.html#vector-type

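
For the i42 example, a rough sketch of how a C backend could emulate a 42-bit signed integer inside an int64_t: do the arithmetic in 64 bits, then mask and sign-extend after each operation. The names `i42_sext` and `i42_add` are hypothetical, and the final unsigned-to-signed conversion assumes modular behavior (as documented by GCC and Clang). C23's `_BitInt(N)` is deliberately not used here, since the point is what older portable C requires.

```c
#include <stdint.h>

#define I42_BITS 42
#define I42_MASK ((UINT64_C(1) << I42_BITS) - 1)

/* Sign-extend the low 42 bits of x into a full int64_t. */
int64_t i42_sext(uint64_t x)
{
    uint64_t low = x & I42_MASK;
    uint64_t sign = UINT64_C(1) << (I42_BITS - 1);
    /* (low ^ sign) - sign maps values with bit 41 set to negatives.
     * The subtraction wraps in unsigned arithmetic, which is defined. */
    return (int64_t)((low ^ sign) - sign);
}

/* Wrapping 42-bit addition on values stored sign-extended in int64_t. */
int64_t i42_add(int64_t a, int64_t b)
{
    return i42_sext((uint64_t)a + (uint64_t)b);
}
```

In LLVM IR the same operation is a single `add i42`, and legalization to the machine's register width is left to the backend.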

distcs almost 2 years ago

Great question. I'd like to know too! I always thought C was a better target for compiler makers because every target architecture has a C compiler for it. The number of targets that LLVM supports is far less impressive. Also, a ton of work has gone into optimizing C compilers for any architecture you can imagine. So why do compiler makers generally target LLVM instead of C?

duped almost 2 years ago

I mean, if you didn't have something like LLVM you couldn't achieve anything in C, so it's a bit of a nonsense question.

But if you're asking about the advantage of targeting LLVM directly as opposed to targeting C, then the answer is that you don't have to implement everything in terms of C's semantics, compilation model, and memory model. While you probably can, the question is whether you should, and what that costs in terms of optimizations left on the table.

Some examples are the strict aliasing model, support for generators out of the box with blockaddress, guaranteed tail call optimization via specialized calling conventions, zero cost exception handling, and so on.

That's not even getting into some of the annoying stuff that you as a language author will probably care about at some point, like relying on glibc.
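
On the tail-call point, ISO C gives no guarantee that a call in tail position reuses the caller's stack frame, so backends that need it often fall back to a trampoline. The following is a minimal sketch of that technique with hypothetical names (`struct step`, `sum_step`, `run`, `sum_to`); it is one common workaround, not the only one.

```c
#include <stddef.h>

/* A step says which function to call next and with what state.
 * next == NULL means the computation is finished and acc is the result. */
struct step;
typedef struct step (*step_fn)(long long n, long long acc);
struct step { step_fn next; long long n; long long acc; };

/* One iteration of "sum 1..n", written as if it tail-called itself.
 * Instead of calling, it returns a description of the next call. */
struct step sum_step(long long n, long long acc)
{
    struct step s;
    if (n == 0) {
        s.next = NULL; s.n = 0; s.acc = acc;
    } else {
        s.next = sum_step; s.n = n - 1; s.acc = acc + n;
    }
    return s;
}

/* The trampoline: a loop that performs the calls, so stack depth
 * stays constant no matter how long the chain is. */
long long run(struct step s)
{
    while (s.next != NULL)
        s = s.next(s.n, s.acc);
    return s.acc;
}

long long sum_to(long long n)
{
    struct step start = { sum_step, n, 0 };
    return run(start);
}
```

The cost is an indirect call and a struct return per step, which is part of the "optimizations left on the table" trade-off described above.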

pornel almost 2 years ago

You don't control the stack directly, and can't implement custom unwinding (you can piggyback on C++ exceptions).

You don't control aliasing info beyond C's minimum. You don't get true immutability, since C's const allows its target to be mutated from elsewhere.

You don't control what happens before main(). You don't get static constructors, unless you use C++.

You have almost no control over debug information.

You have very little control over inlining.

You don't control calling conventions. You can't control alignment.

There's a bit more you can do if you use non-standard C extensions, but generally you'll always be an abstraction layer away from the code you want to generate, and have your language's semantics tainted by C's semantics.
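
The remark about const can be illustrated with a short sketch. A pointer to const only forbids writes through that pointer; the object may still change through another alias, so the compiler has to reload it. The function name `read_twice` is made up for illustration.

```c
/* Reads *p before and after a store through q and returns the difference.
 * Because p and q may point to the same int, the second read cannot be
 * assumed equal to the first, even though p is a pointer to const. */
int read_twice(const int *p, int *q)
{
    int first = *p;
    *q = first + 1;   /* may modify the object p points to */
    int second = *p;
    return second - first;
}
```

Calling `read_twice(&x, &x)` returns 1, while calling it with two distinct objects returns 0. A language with real immutability guarantees has no direct way to tell a C compiler that the first case cannot happen, short of extensions or `restrict`.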

Findecanor almost 2 years ago

LLVM-IR has low-level features not offered by C, used for compiling other languages.

For instance: exception handling (used for C++), operations with checks for overflow (used by both Swift and Rust, I suspect), and arithmetic with integers smaller than an "int" (also Swift and Rust).

I think LLVM also has explicit support for coroutines and multi-entry functions.

If your language needs any of those features, implementing them on top of C could be difficult to get right and would probably not perform as well.
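
As an illustration of the overflow-check point, here is a sketch of a checked signed addition in portable C. The check has to happen before the addition, because the overflow itself is undefined behavior. The name `checked_add_int` is hypothetical; GCC and Clang also offer `__builtin_add_overflow` as a non-standard extension, and LLVM IR has the `llvm.sadd.with.overflow` intrinsics for the same job.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *out and returns true if the sum fits in an int.
 * Returns false, leaving *out untouched, if it would overflow. */
bool checked_add_int(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```

Whether a C compiler recognizes this pattern and lowers it to a single add plus a flag test is not guaranteed, which is where the performance concern above comes from.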

syntheweave almost 2 years ago

C isn't "just C", it's also stdlib and the assorted headers for platform code.

This little distinction makes all the difference in terms of immediate utility, because in practical usage, languages tend to be libraries with the language attached, rather than the converse.

To be a serious replacement for C, Zig had to resolve this question about what a standard library should look like going forward, and across more exotic targets like WebAssembly. They made some thoughtful choices, more than I can summarize (especially since I haven't followed the project in a while).

When your language is "compile to C" you get stdlib for free... and you have to deal with cross-platform behavior in a C-like way forevermore. And this is a huge hindrance because it means your build system is now in the 1970s, and anything you accrete upon it eventually gets dragged back there because a user wants to get some Big Important Library compiling and nobody is on the same page about how to do that.

If you target LLVM you get compiler infrastructure, but also options about what you want the overall experience to look like, which can include C header compatibility, or something else. It's more feasible to achieve a thoughtful design like what Zig does.

patmorgan23 almost 2 years ago

TypeError: Cannot compare Types COMPILER and LANGUAGE

LLVM is an open source modular compiler system. There are three main pieces:

The front end. This is the part that reads in your source code files and reports any syntax errors. Clang is an example of a front end. The front end produces IR, or "Intermediate Representation", which is consumed by the other parts of the compiler.

The optimizer. Pretty self-explanatory. The optimizer takes in IR and (ideally) moves stuff around so that the final executable runs faster.

The backend. This is where the magic really happens. The backend takes in IR and produces that sweet sweet machine code that we're really after.

The benefit to zig of using LLVM is that they only have to write/maintain a front end. I don't think gcc and other C/C++ compilers were quite designed for modularity like LLVM was, so building a new front end for them is difficult compared to LLVM.

slavapestov almost 2 years ago

LLVM is not a language, it’s a C++ library. There also happens to be a printer and parser for a textual representation of the IR, but it’s far from complete and doesn’t let you do many things the C++ API allows.

The downside is that your frontend pretty much has to be written in C++ (or you maintain your own bridging wrapper library like Rust does).

For a hobby project, go with whatever is easiest. If you’re a C++ programmer and don’t mind writing your frontend in C++, take a look at LLVM. Writing an interpreter is also a valid choice. You can still learn a lot about compiler design by targeting a suitable bytecode format instead of a real CPU.

Production compilers these days either target the LLVM C++ API or have their own code generation backend.

DamonHD almost 2 years ago

https://stackoverflow.com/questions/10264635/compiler-output-language-llvm-ir-vs-c

Gives some good reasons. I just searched for LLVM vs C.

viraptor almost 2 years ago

This is a very underspecified question. Achieved in what way / for what purpose? Do you want to target either one as a compilation target? Are you thinking of writing either one directly? What's the aspect you care about? (readability, performance, portability, ...)

Otherwise it's hard to really compare them in any reasonable way. Maybe you have something in mind, but currently the question reads like you don't understand very well what LLVM is. (I'd recommend searching / reading on that topic first if that's the case.)

anon291 almost 2 years ago

Off the top of my head... LLVM allows much more complex branching, since it's done at the basic block level, whereas C enforces particular structures.
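
For comparison, a backend that emits C usually approximates basic-block branching with labels and goto, which sidesteps C's structured loops at the cost of readability. A minimal sketch, with a hypothetical function `collatz_steps` in which each label stands for one basic block and every block ends in a goto or a return:

```c
/* Counts Collatz steps from n down to 1. Precondition: n >= 1
 * (n == 0 would loop forever). */
unsigned collatz_steps(unsigned long n)
{
    unsigned steps = 0;
    goto check;
check:
    if (n == 1) goto done;
    if (n % 2 == 0) goto even;
    goto odd;
even:
    n /= 2;
    steps++;
    goto check;
odd:
    n = 3 * n + 1;
    steps++;
    goto check;
done:
    return steps;
}
```

Standard C cannot take the address of a label or jump across functions, so features such as indirect branches still need compiler extensions.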

thrownay2341 almost 2 years ago

Use LLVM if it supports the target platform you require.

Targeting C in the current year doesn't make sense when Zig and C++ are available.

Don't forget JVM or BEAM.