My first fifteen compilers

387 pointsby dhanushover 7 years ago

15 comments

kornishover 7 years ago

Favorite quote:> There’s a wealth of tutorials, courses, books, and the like about how to write compilers. But if somebody believes that writing a transpiler isn’t fundamentally the same thing as writing a compiler, it may not occur to them to look at any of that material.The basic argument is this: "compiler" isn't a term that needs to be limited from transforming a high-level input to a low-level output. Any program which is structured to map an AST to another AST, no matter the relative levels of abstraction, can borrow from many of the principles of modern compiler theory, including using many small passes rather than monolithic rewrites.At a company I worked for, we compiled a high-level declarative expression language of our own devising into SQL and other back-end representations, including English. Is that a "compiler" as many see it? No. However, thinking about the problem like a compiler problem gave us lots of insights into how to architect our product; it let us draw on an existing wealth of knowledge to make improvements quickly and reliably.

评论 #15154381 未加载

评论 #15154668 未加载

评论 #15155331 未加载

评论 #15154743 未加载

评论 #15159943 未加载

TheAceOfHeartsover 7 years ago

I'd strongly suggest diving into compilers if you've never studied the subject. Learning a bit on the subject unlocks a ton of incredibly useful skills. That knowledge helps you implement stuff like autocomplete, linters, syntax highlighting, etc.The Super Tiny Compiler [0] is a very gentle introduction to the subject. It's great because it helps you quickly develop an initial mental model.To give an everyday usage example: I've used jscodeshift [1] many times to safely refactor large amounts of code. In one case, I quickly migrated a project's test assertion library to an alternative which the team agreed was superior. This tool is also typically used by the react team in order to provide a smooth migration path whenever they make changes to the public API.[0] <a href="https://github.com/thejameskyle/the-super-tiny-compiler" rel="nofollow">https://github.com/thejameskyle/the-super-tiny-compiler</a>[1] <a href="https://github.com/facebook/jscodeshift" rel="nofollow">https://github.com/facebook/jscodeshift</a>

评论 #15157003 未加载

评论 #15154618 未加载

评论 #15155377 未加载

SatvikBeriover 7 years ago

As a favor for a friend, I wrote a mini-"compiler" that translated detailed specs for story game scenes into code. I had never done anything similar (my background is more on the Math/Stats side) but I figured hey, what the hell.There were only a few types of possible scenes, so my first approach was to create a data structure for each type of scene that had a method converting it to code. However, this broke badly with conditional/branching paths that could potentially have arbitrary levels of nesting.So my next approach was to create multiple passes using some simple recursive data structures "Parseables" where conditional branches could contain other Parseables (including other conditional branches) as well as multiple passes (Text -> Parseable -> Printable -> Code instead of Text -> Screen -> Code.) This worked quite nicely.Had I realized this was a compiler, I could have probably read some tutorials and not had to do everything from scratch. This would probably have resulted in better engineering, but been a lot less fun.My amateurish code, if anyone is curious: <a href="https://github.com/Satvik/spec-compiler" rel="nofollow">https://github.com/Satvik/spec-compiler</a>

评论 #15157269 未加载

kristianpover 7 years ago

I wonder if scheme-based compiler courses are still run at Indiana University?Abdulaziz Ghuloum's 'Incremental compiler construction' [1] also has a working compiler at the end of each stage. For example after the first week you have a compiler that outputs a program that prints a single integer, the 2nd week immediates. The tutorial is at [2]. It doesn't use a nanopass framework, just builds the complexity of the language. It's for a scheme compiler written in scheme.Apparently Ghuloum was a phd student under Dyvbig who also wrote his own scheme compiler to x86 [3],[4].[1] <a href="http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf" rel="nofollow">http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf</a>[2] <a href="https://raw.githubusercontent.com/namin/inc/master/docs/tutorial.pdf" rel="nofollow">https://raw.githubusercontent.com/namin/inc/master/docs/tuto...</a>[3] <a href="https://en.wikipedia.org/wiki/Ikarus_(Scheme_implementation)" rel="nofollow">https://en.wikipedia.org/wiki/Ikarus_(Scheme_implementation)</a>[4] <a href="https://web.archive.org/web/20101210085823/http://www.cs.indiana.edu/~aghuloum/" rel="nofollow">https://web.archive.org/web/20101210085823/http://www.cs.ind...</a>

评论 #15156786 未加载

评论 #15155178 未加载

mhh__over 7 years ago

I think the nanopass concept - e.g. lots of well defined IRs passing through a pipeline - is a very good idea for teaching, but I'm interested to see how well it performs and also whether, from the programmers perspective, whether this simplifies code or adds unneeded complexity (Specifically, whether they can be optimized quite as well as a big monolithic compiler).I'm currently writing a compiler framework - nothing big, as a learning project, and I think I shall try integrate this idea in some way. I feel it would be particularly useful for compiler backends like GCC or LLVM because having well defined pipelines etc. means that one can hook into the framework cleanly (Potentially a huge saving in compile times, e.g. not having to cart around unneeded libraries/symbols).

评论 #15156027 未加载

评论 #15157117 未加载

userbinatorover 7 years ago

I suppose if you look at it that way, then anyone who has completed Crenshaw's excellent tutorial series[1] could also claim to have written (approximately) the same number of "compilers".With a parser combinator library, you write a parser by starting with a bunch of primitive parsers (say, that parse numbers or characters) and combining them, eventually building up the ability to parse a sophisticated language.That sounds like recursive descent, also a highly recommended method of writing a parser for its simplicity, speed, and ease of error reporting.But if somebody believes that writing a transpiler isn’t fundamentally the same thing as writing a compiler, it may not occur to them to look at any of that material.From what I understand, the definition of a transpiler is one which almost exclusively performs syntax-syntax transforms, and doesn't delve into the semantics with e.g. dataflow or control flow. Thus the lack of material about writing "transpilers" --- or rather, someone looking to write one should instead be seeking out information on "search and replace" algorithms.[1] <a href="https://compilers.iecc.com/crenshaw/" rel="nofollow">https://compilers.iecc.com/crenshaw/</a> --- highly recommended.

评论 #15154199 未加载

评论 #15154998 未加载

评论 #15160222 未加载

gsgover 7 years ago

There's a decent textbook written in the incremental style argued for by the author: Essentials of Compilation. It features compilers for seven languages, written in Racket, each with successively more advanced features.<a href="https://jeapostrophe.github.io/courses/2017/spring/406/notes/book.pdf" rel="nofollow">https://jeapostrophe.github.io/courses/2017/spring/406/notes...</a>

GarvielLokenover 7 years ago

<a href="http://www.craftinginterpreters.com/a-tree-walk-interpreter.html" rel="nofollow">http://www.craftinginterpreters.com/a-tree-walk-interpreter....</a> This link has been up a long time ago on hackernews, or reddit, can't remember which. But i am trying to work through it, it has good basic explanation.And then you have SICP which introduces compilers without really talking about it :p.

Athasover 7 years ago

I am very fascinated by this notion of teaching compilers by starting at code generation and working backwards from there. I've read about nanopass compilers before, but do they also work well if the compiler is written in a statically typed language? They clearly will for passes that do not change the program representation, but my experience is that it is awkward to define large amount of similar-but-slightly-different representations in a statically typed language.

评论 #15155795 未加载

mickronomeover 7 years ago

I've always thought of compilers as usually lossy graph rewriters with input and output data structures usually being 'flatter' in some sense. Maybe a both simplistic and vague model, but it has served me well enough the few times I needed to build one.This isn't meant to validate or invalidate any other view or definition, but I'm curious if there are any good counter examples or theoretical reasons for characterising compilers differently ?Quite naturally, all compilers might not be implemented with explicit graphs and their subsequent rewrites, but implicitly both input and output represent a set of statements encoding a specific set of truths and actions, all of them which is intrinsically related to their various contexts.In this light there is really no distinction between high level and low level targets, but only between various levels of information loss and how explicit the actions and truths are expressed in the input and output.It might be worth noting that most of the perceived information loss, except for names, is only due to human perception and limitations. State of the art decompilers can in many cases recover a surprising amount of the original types and code structures, albeit at considerable computational costs.

评论 #15156589 未加载

评论 #15155757 未加载

ckokover 7 years ago

Most compilers sort of work this way, except that they don't have an intermediate format. They take source, turn it into tokens, turn it into a tree of sorts, run several passes over it till the end result is reached (whatever the target is).My own compilers turn 4 different input languages in the same parser tree (Hand written tokenizers/parsers), do several passes over it until it has a very base set of instructions left, at which point it generates code for whatever backend was picked.The only big difference is that the process outlined in the article has an actual textual intermediate format, if I understand it correctly. Sounds like a lot of work of extra work with little gain, a simpler approach might be to have "ToString" method working on all nodes on all levels that' outputs info clear enough to understand from within a debugger.

评论 #15156046 未加载

steenreemover 7 years ago

Blender is a toolset for creating nanopass compilers: <a href="https://github.com/keyboardDrummer/Blender" rel="nofollow">https://github.com/keyboardDrummer/Blender</a>

pgorczakover 7 years ago

This is an interesting instance of the sorites paradox :)

stephen422over 7 years ago

Anyone got stories about attempts to combine compiler construction with deep learning techniques? As AI related technologies now become realistically implementable, wouldn't the compiler theory be one of the most greatly affected research fields?

评论 #15165119 未加载

austincheneyover 7 years ago

I find many people tend to use the terms parser and compiler almost interchangeably, which is a horrid mistake. These terms are not the same and are not related. So, lets get clear on terminology:* lexer: A scanner. It runs code, typically as a string, through an evaluator and builds pieces based upon known language syntax rules.* parser: A rule evaluator. Parsers typically use lexers to reason about code and then evaluate the pieces to determine context, relationships, and categories that describes the code structure.* compiler: A transformer. Compilers change code from one format (syntax) to another different format.---> With a parser combinator library, you write a parser by starting with a bunch of primitive parsers (say, that parse numbers or characters) and combining them, eventually building up the ability to parse a sophisticated language.I do like that part of the article. I am working on a universal language parser right now. To be truly universal you have to accomplish two big goals:1. parse all the languages2. seamless interchange between the various different parsers (it is a single parser with interchange between the various lexers)Seamless interchange is necessary for languages like JSX, which starts as JavaScript, but can contain XML code units that then escape back to JavaScript syntax. Another example is code blocks in markdown documents where the code block can specific a name of the language described by the code block.

评论 #15155735 未加载