Greetings,<p>I am in my late 30s without a formal CS education. I write software for a living and have this itch to learn to write a compiler / interpreter.<p>Do I have to take any courses (e.g. Discrete Maths, Automata Theory) in order to do it? If so, do you have recommendations for online versions of those courses?<p>I really want to do this in 2019, but the prospect of having to study Maths, then Automata, maybe OS, and then Compilers sounds like a long path.<p>Has anyone here learned to write a compiler / interpreter without a CS background? How did you do it?
I would highly recommend you start with this book first <a href="https://interpreterbook.com" rel="nofollow">https://interpreterbook.com</a> and then feel free to read Nystrom's online book as ablerman suggested.<p>Then you can read more formal CS books for the sake of covering the necessary theory.<p>Another suggestion would be to check the source code of various SQL databases, such as SQLite3 and PostgreSQL.<p>Also, I would suggest reading lots and lots of source code on GitHub.<p>Here is a short list of suggestions:<p><pre><code> * https://github.com/lemon-lang/lemon
* https://github.com/d5/tengo
* https://github.com/wren-lang/wren (by Robert Nystrom)
* https://github.com/tj/luna (by TJ Holowaychuk, creator of Express.js)</code></pre>
I cannot recommend Thorsten Ball's books highly enough. They're really fantastic, step-by-step tutorials, and you certainly don't need a CS degree to understand what he's doing.<p><i>Writing An Interpreter In Go</i>
<a href="https://amzn.to/2FYwwiQ" rel="nofollow">https://amzn.to/2FYwwiQ</a><p><i>Writing A Compiler In Go</i>
<a href="https://amzn.to/2sLilph" rel="nofollow">https://amzn.to/2sLilph</a>
Robert Nystrom has been putting an online book together. I’ve found it really clear so far.<p><a href="http://craftinginterpreters.com/" rel="nofollow">http://craftinginterpreters.com/</a>
Graydon Hoare's "One-Day Compilers" presentation: <a href="http://www.venge.net/graydon/talks/mkc/html/mgp00001.html" rel="nofollow">http://www.venge.net/graydon/talks/mkc/html/mgp00001.html</a><p><pre><code> > goal: translate subset of makefile syntax into native executables
> result: pipeline to compile subset of makefile syntax into native executables, in less than 400 lines of code
> including
> * lexing and parsing
> * variable binding
> * type checking
> * error reporting
> * native code generation
- use a custom camlp4 preprocessor
 - define OCaml types and functions to model the things found in an input makefile (variables, rules, dependencies); leverage OCaml semantics
 - parse the input's simple makefile language
 - map the parsed input onto the AST for the OCaml functions defined earlier
 - use OCaml's quotation machinery to define OCaml AST values from standard OCaml syntax, without building AST values by hand
 - bundle everything into an OCaml program
 - the OCaml program produces a single OCaml value that models the input program described by the makefile
 - translate the OCaml value into C code:
 - generate C code from templated quotations, again using OCaml quotation support, but for the C language, not OCaml
 - compile the generated C code with gcc
 - glue everything together into one pipeline</code></pre>
If you're familiar with recursion and basic data structures, that's probably a sufficient background. Toy compilers aren't that difficult when you break them down into parts. (Real-world languages end up with warts and special cases.)<p>You have a text file. You break it up into tokens (numbers, strings, keywords, operators, etc.). You assemble the tokens into a structured tree (statements, expressions, etc.). Then you walk the tree and generate code (for a compiler) or execute it (for an interpreter). That's it. Really.<p>Every other week, somebody posts a tutorial to HN about writing a Lisp interpreter in JavaScript. Or Go. Or Python. Or Rust. Or Lisp. If you can't wait, Let's Build a Compiler by Jack Crenshaw is an oldie but a goodie.<p><a href="https://compilers.iecc.com/crenshaw/" rel="nofollow">https://compilers.iecc.com/crenshaw/</a>
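<p>To make that concrete, here is roughly what those three steps look like in Python for the tiniest possible grammar (just numbers and '+'); every name in it is made up for illustration, not taken from any of the tutorials above:<p><pre><code>import re

def tokenize(src):
    # Step 1: break the text up into tokens.
    return re.findall(r'\d+|\+', src)

def parse(tokens):
    # Step 2: assemble tokens into a tree; here a nested tuple ('+', left, right).
    node = int(tokens[0])
    for i in range(1, len(tokens), 2):
        node = ('+', node, int(tokens[i + 1]))
    return node

def evaluate(node):
    # Step 3 (interpreter): walk the tree and compute a value.
    if isinstance(node, int):
        return node
    _, left, right = node
    return evaluate(left) + evaluate(right)

print(evaluate(parse(tokenize("1+2+3"))))  # 6</code></pre>
Swap evaluate() for a function that emits instructions instead of computing values and the same skeleton becomes the compiler version of the pipeline.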
I designed a programming language and wrote an interpreter for it [1] for fun without a formal CS education. I started a series of articles [2] to teach people everything they need to do the same. Give it a try :).<p>[1] <a href="https://github.com/ftchirou/blink" rel="nofollow">https://github.com/ftchirou/blink</a><p>[2] <a href="https://hackernoon.com/lets-build-a-programming-language-2612349105c6" rel="nofollow">https://hackernoon.com/lets-build-a-programming-language-261...</a><p>(P.S. Feel free to reach out to me [github_username] at gmail.com)
You may be interested in this <a href="https://beautifulracket.com/stacker/intro.html" rel="nofollow">https://beautifulracket.com/stacker/intro.html</a>. At the end you have a small interpreter that can handle math. There are further tutorials on <a href="https://beautifulracket.com" rel="nofollow">https://beautifulracket.com</a> to expand it. This is probably the most approachable option.
You can get started on the compiler/interpreter-specific learning as long as you already know a programming language and basic data structures. You do not need any math skills.<p>1) Lexical Analysis (Tokeniser)<p>The first step of your compiler takes in free-form text and turns it into a stream of tokens that makes the rest of the process much easier. To understand this you should learn about Regular Expressions and Finite State Machines.<p>2) Syntax Analysis (Parsing)<p>Processing the above tokens and ensuring the input has the valid syntax of a program is your next step. You need to learn about Context Free Grammars, Backus Naur Form and Abstract Syntax Trees.<p>3) Semantic Analysis<p>Just because the input has valid syntax does not mean it makes actual sense. Here you need to learn about Symbol Tables and Type Checking. The specifics are very dependent on the actual language you are dealing with.<p>4) Code Generation<p>A compiler will generate code, but an interpreter will perform actual execution of the Abstract Syntax Tree operations. Again, the specifics are very dependent on the actual language you are dealing with and the target. Start by looking into Code Generation, Code Optimizations and Assembly Code.<p>I recommend an online, free course, like the following from Stanford, which is self-paced and results in a compiler that generates assembly-level instructions for test running in an emulator.<p><a href="https://lagunita.stanford.edu/courses/Engineering/Compilers/Fall2014/about" rel="nofollow">https://lagunita.stanford.edu/courses/Engineering/Compilers/...</a>
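<p>To give a feel for step 1, here is one common way to write a regular-expression tokeniser in Python; the token names and the little language they describe are my own invention, not from the Stanford course:<p><pre><code>import re

# Hypothetical token set for a tiny language; adjust to taste.
TOKEN_SPEC = [
    ('NUMBER',   r'\d+'),
    ('IDENT',    r'[A-Za-z_]\w*'),
    ('OP',       r'[+\-*/=]'),
    ('SKIP',     r'[ \t]+'),
    ('MISMATCH', r'.'),
]
TOKEN_RE = re.compile('|'.join(f'(?P<{name}>{pattern})' for name, pattern in TOKEN_SPEC))

def tokenize(source):
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == 'SKIP':
            continue
        if kind == 'MISMATCH':
            raise SyntaxError(f'unexpected character {text!r}')
        yield (kind, text)

print(list(tokenize('x = 40 + 2')))
# [('IDENT', 'x'), ('OP', '='), ('NUMBER', '40'), ('OP', '+'), ('NUMBER', '2')]</code></pre>
Regular expressions like these are the practical face of the finite state machines the course formalises.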
@stephen82 has great references. In particular, wren and luna are references I used when I created my language como-lang-ng: <a href="https://github.com/rmccullagh/como-lang-ng" rel="nofollow">https://github.com/rmccullagh/como-lang-ng</a>. You do not need Discrete Math or Automata Theory to learn how to write a compiler and interpreter.<p>I would recommend first figuring out what you want your language to look like (the syntax). Then you can define a grammar for your language and research parsers. From parsing, you will be led to Abstract Syntax Trees (the most common modern way to get from a text file to bytecode). After that you'll need to compile the abstract syntax tree (which you would have built from the parser). Compiling the AST is typically done by targeting your own simplified instruction set.<p>For an example instruction set, check out Python's header file at <a href="https://github.com/python/cpython/blob/master/Include/opcode.h" rel="nofollow">https://github.com/python/cpython/blob/master/Include/opcode...</a>
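<p>For a feel of what "your own simplified instruction set" can mean, here is a toy stack machine in Python; the opcodes are invented for illustration and are far cruder than CPython's:<p><pre><code># Invented opcodes for a toy stack machine (not CPython's).
LOAD_CONST, ADD, MUL, PRINT = range(4)

def run(code, constants):
    stack = []
    for op, arg in code:
        if op == LOAD_CONST:
            stack.append(constants[arg])
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == PRINT:
            print(stack.pop())

# Bytecode a compiler might emit for "print 2 + 3 * 4":
run(
    [(LOAD_CONST, 0), (LOAD_CONST, 1), (LOAD_CONST, 2), (MUL, None), (ADD, None), (PRINT, None)],
    constants=[2, 3, 4],
)  # prints 14</code></pre>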
Some books on compilers go into the automata theory for the following reason: they teach the workings of <i>tools that generate</i> lexical analyzers and parsers. A compiler writer is usually just a consumer of those tools. The skill that comes into play is being good at program design: structuring code and data.<p>How much CS comes into play depends on how advanced you make the compiler and what design choices you make. E.g. if you make a stack-based VM, or one with unlimited registers, instead of targeting a real machine with a fixed number of registers, you don't have to study the algorithms for register allocation.<p>Some languages avoid the complexities of parsing, such as those in the Forth family, and to a large extent the Lisp family. To make a language, you don't even need to be a consumer of scanner and parser generating tools, let alone understand how they work.
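<p>As an illustration of that last point, a Lisp-style reader barely needs parsing machinery at all; splitting on parentheses and whitespace gets you straight to nested lists. A purely illustrative sketch in Python:<p><pre><code>def read(src):
    # Tokenise by padding parentheses with spaces, then recursively build nested lists.
    tokens = src.replace('(', ' ( ').replace(')', ' ) ').split()
    def read_form(pos):
        token = tokens[pos]
        if token == '(':
            form, pos = [], pos + 1
            while tokens[pos] != ')':
                item, pos = read_form(pos)
                form.append(item)
            return form, pos + 1
        return (int(token) if token.lstrip('-').isdigit() else token), pos + 1
    form, _ = read_form(0)
    return form

print(read('(+ 1 (* 2 3))'))  # ['+', 1, ['*', 2, 3]]</code></pre>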
I think this book suits your situation; I've been in it myself!<p>Compiler Construction Using Java, JavaCC, and Yacc, IEEE/Wiley, 2012<p>Get this book. Please. <i>I promise</i>: you'll be able to grok compilers after reading it and answering its exercises.<p>We can talk further if you send me an email (see my profile).<p>Here's my previous comment somewhere in HN:<p>I bet this is what you are looking for: <a href="https://www.amazon.com/Compiler-Construction-Using-Java-Java.." rel="nofollow">https://www.amazon.com/Compiler-Construction-Using-Java-Java...</a>.<p>This book taught me how to write a compiler and build a programming language of my own.<p>To answer your question "where and how did you start?": this is where I started learning about compilers and programming language implementation.<p>Here is its description from its website:<p>* Comprehensive treatment of compiler construction.<p>* JavaCC and Yacc coverage optional.<p>* Entire book is Java oriented.<p>* Powerful software package available to students that tests and evaluates their compilers.<p>* Fully defines many projects so students can learn how to put the theory into practice.<p>* Includes supplements on theory so that the book can be used in a course that combines compiler construction with formal languages, automata theory, and computability theory.<p>What I promise you with this book: you'll learn how to write your own programming language. You'll learn not only how compilers work and how to implement the language, but also about computer architecture. This book is very easy to read and understand. It starts with very basic topics, then slowly builds on them until you grok implementing the language given its specification. You'll have the chance to build a compiler from scratch or by using JavaCC and Yacc.
If you don’t want to read up front, don’t; start with a calculator. If you know regular expressions (and even if you don’t), getting something that handles<p><pre><code> a=3
b=4
a=a+b
print a
</code></pre>
shouldn’t be too hard. In iteration 1, forget about expressions with more than one operator such as<p><pre><code> a=3*b+c
</code></pre>
If you think it’s easier, also forget about multi-character identifiers (a…z should be enough for everybody) and types (all variables can be int or, if desired, float).<p>Also, initially forget about making a product that doesn’t crash on invalid input. If a…z is enough for everybody, your toy language may crash on<p><pre><code> A=3
</code></pre>
or<p><pre><code> aa=0
</code></pre>
Then, add features as you see fit. You will run into problems, such as questioning whether<p><pre><code> print=3
print print
</code></pre>
should be a valid program, or whether it’s a good idea if<p><pre><code> a=2+3*4
print a
</code></pre>
prints ‘20’, but that’s part of the fun (if you have a puzzle-solving mind, figuring out how to do that ‘properly’ without reading about it isn’t out of reach). And those features you add shouldn’t be large. For example, looping:<p><pre><code> a=1
@again
print a
a=a+1
>again
</code></pre>
add simple if statements:<p><pre><code> a=1
@again
print a
a=a+1
if a<20 >again
</code></pre>
The end result will likely be buggy in places. For example, if you decide to support<p><pre><code> a=3;b=4
</code></pre>
and parse by splitting each line on semicolons, adding strings<p><pre><code> a=“strings may have semicolons, too;”
</code></pre>
may result in an interpreter/compiler that is broken for a while. Who cares? You aren’t shipping a production compiler or interpreter; you’re learning.<p>Also: start with an interpreter. A compiler isn’t that much harder, but requires you to know about your system’s assembly, object code format, etc.
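<p>For what it’s worth, iteration 1 as described above fits in a screenful of Python. This sketch (single-letter variables, at most one operator per line, ints only, no error handling) is just one guess at it, not a spec:<p><pre><code>import operator

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def run(program):
    variables = {}

    def value(token):
        # A bare letter is a variable; anything else is an integer literal.
        return variables[token] if token.isalpha() else int(token)

    for line in program.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('print '):
            print(value(line[6:].strip()))
            continue
        name, expr = line.split('=', 1)
        for symbol, fn in OPS.items():
            if symbol in expr:
                left, right = expr.split(symbol, 1)
                variables[name.strip()] = fn(value(left.strip()), value(right.strip()))
                break
        else:
            variables[name.strip()] = value(expr.strip())

run("a=3\nb=4\na=a+b\nprint a")  # prints 7</code></pre>
Labels, if, operator precedence, and the semicolon/string problem above are exactly the kind of features you would bolt on next.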
I found tutorials like this [0] one to be pretty useful, since they show how to actually use frameworks like LLVM, and lex/yacc/etc. practically. Note that this one is from 2009, and uses an outdated version of LLVM. Actually following along requires some debugging and digging through documentation, which is also helpful in its own way.<p>[0] <a href="https://gnuu.org/2009/09/18/writing-your-own-toy-compiler/" rel="nofollow">https://gnuu.org/2009/09/18/writing-your-own-toy-compiler/</a>
Practical steps:<p>1. Create a grammar for your language. Start with Lex/Yacc and model your grammar with them. Use them to generate the lexer/parser.<p>2. Write sample code in your new language. Run the generated parser on that sample file to produce a list of syntax trees.<p>3. With those syntax trees, either generate target code (compiler) or evaluate the trees and print the results (interpreter).<p>Iterate on these steps until your language is complete.
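<p>If you want to prototype those steps before reaching for the C tools, PLY (a Python take on Lex/Yacc) follows the same structure; assuming the ply package is installed, a minimal grammar might look like this (the tokens and tree shape are my own illustration):<p><pre><code>import ply.lex as lex
import ply.yacc as yacc

# Step 1: lexer specification.
tokens = ('NUMBER', 'PLUS', 'TIMES')
t_PLUS = r'\+'
t_TIMES = r'\*'
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    raise SyntaxError(f"unexpected character {t.value[0]!r}")

# Step 2: grammar rules that build a tiny tuple-based syntax tree.
precedence = (('left', 'PLUS'), ('left', 'TIMES'))

def p_expr_binop(p):
    '''expr : expr PLUS expr
            | expr TIMES expr'''
    p[0] = (p[2], p[1], p[3])

def p_expr_number(p):
    'expr : NUMBER'
    p[0] = p[1]

def p_error(p):
    raise SyntaxError("parse error")

lexer = lex.lex()
parser = yacc.yacc()

# Step 3 (interpreter): evaluate the tree and print the result.
def evaluate(node):
    if isinstance(node, int):
        return node
    op, left, right = node
    return evaluate(left) + evaluate(right) if op == '+' else evaluate(left) * evaluate(right)

print(evaluate(parser.parse('2+3*4')))  # 14</code></pre>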
Well, the way you ask your question in the abstract sort of biases it towards a CS-heavy approach, doesn't it?<p>What's your goal?<p>I've written non-trivial macros that could each be understood to be a tiny compiler, but I wasn't ever interested in compiler design and I can't claim to know much about it.