Ask HN: What's the process of writing a new programming language?

14 points by audace about 9 years ago
I know the process at a very high level. But what would the first 5-10 steps look like for writing a language that compiles into C (like Go, I believe)?

7 comments

panic about 9 years ago
1. Sketch out what you want the first iteration of the language to look like. Write some example programs in your language.

2. Write a lexer: a program which turns a string of source code in your language into a list of values like "open parenthesis", "plus sign", "if keyword", "identifier". These values are called tokens. To test your lexer, output each token and see if your example programs tokenize properly.

3. Write a parser: a program which turns a list of tokens into a structured representation of your program's source code. For example, "if keyword" "identifier" "equals sign" "number" "then keyword" "print keyword" "identifier" might turn into an IfStatement with a predicate EqualsExpression (that itself has a left IdentifierExpression and a right LiteralNumberExpression) and a list of Statements for the code to run. You can write the parser yourself (look up recursive descent parsing) or use a parser generator tool to do it.

4. Write a code generator: a program which goes through your structured representation and outputs lower-level code (in this case, C) for each expression and statement.
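The lexer, parser, and code generator steps above can be sketched end to end for a toy arithmetic language. This is only a minimal illustration, not a recommended design: the token names, the grammar, the tuple-based tree, and the function names (`lex`, `Parser`, `gen_c`, `compile_to_c`) are all assumptions made up for the example.

```python
import re

# Hypothetical token set for a toy expression language (not from the thread).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/()]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def lex(source):
    """Step 2: turn a source string into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":  # whitespace is matched but not kept
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


class Parser:
    """Step 3: recursive descent parser; the tree is built from nested tuples."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("EOF", "")

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        tree = self.expr()
        if self.peek()[0] != "EOF":
            raise SyntaxError(f"unexpected token {self.peek()[1]!r}")
        return tree

    def expr(self):  # expr := term (('+' | '-') term)*
        node = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            op = self.advance()[1]
            node = ("binop", op, node, self.term())
        return node

    def term(self):  # term := factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            op = self.advance()[1]
            node = ("binop", op, node, self.factor())
        return node

    def factor(self):  # factor := NUMBER | IDENT | '(' expr ')'
        kind, text = self.advance()
        if kind == "NUMBER":
            return ("num", text)
        if kind == "IDENT":
            return ("var", text)
        if (kind, text) == ("OP", "("):
            node = self.expr()
            if self.advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")


def gen_c(node):
    """Step 4: walk the tree and emit a fully parenthesized C expression."""
    if node[0] in ("num", "var"):
        return node[1]
    _, op, left, right = node
    return f"({gen_c(left)} {op} {gen_c(right)})"


def compile_to_c(source):
    """Run all three stages and wrap the result in a C declaration."""
    return f"int result = {gen_c(Parser(lex(source)).parse())};"
```

Calling `compile_to_c("1 + 2 * x")` produces the C line `int result = (1 + (2 * x));`. Printing the output of `lex` on your example programs is the token-level test suggested in step 2.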
kayamon about 9 years ago
Go read the Jack Crenshaw series of articles. It's not the most relevant nowadays, but you can't beat it for simplicity.
david927 about 9 years ago
Here is a place to start:

https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools#Second_edition
GregBuchholz about 9 years ago
If you are interested in new languages, then you might want to think about writing an interpreter first, and then writing a compiler later when you have more experience with your new language. Languages like Prolog are good for making interpreters, along with Lisp/Scheme and ML/Haskell.
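To make the "interpreter first" suggestion concrete, here is a minimal sketch of a tree-walking interpreter for a toy Lisp-style prefix language. It is written in Python rather than one of the languages named above, and the syntax, the `let` form, and the names `tokenize`, `read`, `evaluate`, and `run` are all assumptions invented for the illustration.

```python
import operator

# Hypothetical built-ins for a toy prefix language (not from the thread).
BUILTINS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def tokenize(source):
    """Pad parentheses with spaces so a plain split() yields the tokens."""
    return source.replace("(", " ( ").replace(")", " ) ").split()


def read(tokens):
    """Consume tokens from the front of the list and build nested Python lists."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(read(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)  # discard the closing ')'
        return items
    if token == ")":
        raise SyntaxError("unexpected ')'")
    try:
        return int(token)
    except ValueError:
        return token  # anything else is a symbol such as '+' or 'x'


def evaluate(expr, env):
    """Evaluate the tree directly; there is no code generation step."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]  # look up a symbol
    if not expr:
        raise SyntaxError("empty application")
    head, *args = expr
    if head == "let":  # (let name value body) binds name while evaluating body
        name, value, body = args
        return evaluate(body, {**env, name: evaluate(value, env)})
    function = evaluate(head, env)
    return function(*(evaluate(arg, env) for arg in args))


def run(source):
    return evaluate(read(tokenize(source)), dict(BUILTINS))
```

With this sketch, `run("(+ 1 (* 2 3))")` evaluates to 7 and `run("(let x 4 (* x x))")` evaluates to 16. Adding a language feature means adding one more case to `evaluate`, which is part of why an interpreter is a cheap way to experiment with a design.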
robertelder about 9 years ago
I haven't made my own programming language yet, but I am working on a from-scratch C compiler (http://recc.robertelder.org/), so I'll give you a few ideas:

1) You'll probably want to start by thinking about what the programming language will do, and what the grammar (https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_Form) of the language will look like. I would recommend starting by writing an LL grammar (https://en.wikipedia.org/wiki/LL_grammar), so you can write a recursive descent parser for it (https://en.wikipedia.org/wiki/Recursive_descent_parser). You will also need to be careful not to introduce direct or indirect left recursion into your grammar (https://en.wikipedia.org/wiki/Left_recursion).

2) The first step will naturally lead you to consider tokenization: https://en.wikipedia.org/wiki/Tokenization_(lexical_analysis)

3) Some caveats of step 1 include the dangling else problem (https://en.wikipedia.org/wiki/Dangling_else) and other grammar ambiguities (https://en.wikipedia.org/wiki/Ambiguous_grammar). A recursive descent parser will likely need to do backtracking, so you'll want to think about how you can also backtrack any internal state that gets built up as the parser does its thing.

4) Once you can create a full parse tree and can traverse it, you can consider code generation. For un-optimized code, this is probably the easiest part, but once you start considering possible optimizations, you'll probably want to write a 'back-end', and you could probably spend the rest of your life creating new optimizations.

5) Of course, this all gets more complicated if you want to do it differently with an LR grammar or if you want an interpreted language. You can also think about things like just-in-time compilation, etc.
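Two of the points above, avoiding left recursion and backtracking internal state, can be sketched in a few lines. This is a hedged illustration only: the two-rule grammar, the `assigned` list standing in for "internal state", and the `attempt` helper are assumptions made up for the example, and the input is assumed to be pre-split into string tokens.

```python
class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.assigned = []  # hypothetical internal state built up while parsing

    def expect(self, predicate, description):
        """Consume one token if it satisfies predicate, else raise ParseError."""
        if self.pos < len(self.tokens) and predicate(self.tokens[self.pos]):
            token = self.tokens[self.pos]
            self.pos += 1
            return token
        raise ParseError(f"expected {description} at token {self.pos}")

    def attempt(self, rule):
        """Try a rule; on failure restore both position and state, return None."""
        saved_pos, saved_assigned = self.pos, list(self.assigned)
        try:
            return rule()
        except ParseError:
            self.pos, self.assigned = saved_pos, saved_assigned
            return None

    def statement(self):  # statement := assignment | subtraction
        result = self.attempt(self.assignment)
        if result is None:
            result = self.subtraction()
        if self.pos != len(self.tokens):
            raise ParseError(f"unexpected token at {self.pos}")
        return result

    def assignment(self):  # assignment := NAME '=' NUMBER
        name = self.expect(str.isidentifier, "a name")
        self.assigned.append(name)  # side effect happens before the rule is confirmed
        self.expect(lambda token: token == "=", "'='")
        value = self.expect(str.isdigit, "a number")
        return ("assign", name, int(value))

    def subtraction(self):  # subtraction := operand ('-' operand)*
        # The left-recursive form `subtraction := subtraction '-' operand` would
        # make this method call itself forever, so the repetition is a loop.
        node = self.operand()
        while self.pos < len(self.tokens) and self.tokens[self.pos] == "-":
            self.pos += 1
            node = ("sub", node, self.operand())
        return node

    def operand(self):
        token = self.expect(lambda t: t.isidentifier() or t.isdigit(), "a name or number")
        return int(token) if token.isdigit() else ("var", token)
```

For the tokens `"x - 1 - 2".split()`, the parser first tries `assignment`, records `x`, fails on the missing `=`, and rolls back both the position and the `assigned` list before parsing the input as a subtraction. The loop still builds a left-associative tree, `("sub", ("sub", ("var", "x"), 1), 2)`.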
montyedwards about 9 years ago
The Go programming language doesn't compile to C. Compiling Go is faster than compiling C.

If Go transpiled into C code first, and then had to compile the resulting C code, then that entire process would be slower.

The Nim programming language compiles to C, so you may want to reach out and ask their community. It used to be called Nimrod, but is now Nim.

The Rust programming language leverages LLVM instead of transpiling to C, so you may want to take a look at how that is done. A recent post about Rust MIR is well-written and is an enjoyable read for anyone interested in compilers.
bjourne about 9 years ago
First you write an RPN calculator. Make a program that takes the input: "3 4 * 4 + 2 /" and figures out that the answer is 8.
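A minimal sketch of such an RPN calculator, assuming whitespace-separated tokens and the four basic operators (the name `eval_rpn` is made up for the example):

```python
import operator

OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_rpn(source):
    """Evaluate a whitespace-separated RPN expression using a stack."""
    stack = []
    for token in source.split():
        if token in OPERATORS:
            if len(stack) < 2:
                raise ValueError(f"not enough operands for {token!r}")
            right = stack.pop()  # the most recent value is the right operand
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]
```

`eval_rpn("3 4 * 4 + 2 /")` returns 8.0: the stack goes 3, 4, then 12, then 16 after adding 4, then 8 after dividing by 2. No parser is needed because RPN has no precedence or parentheses, which is what makes it a gentle first exercise.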