What skills, technologies etc.<p>Inspired by https://www.reddit.com/r/cscareerquestions/comments/d1fqwd/people_of_this_sub_who_have_jobs_as_data/
Maybe you're different than me, but I have a tough time really learning something without a concrete task at hand. I also find that I have a tough time learning if I don't have someone to ask questions of. And I find that online developer communities are much more welcoming of newb questions after you've contributed something concrete.<p>If it was me (and it really <i>was</i> me just a few years ago), I'd git clone llvm, join #llvm on freenode, and ask if there's any simple refactoring or cleanup people are doing. Give me a chance to absorb the code and also build some goodwill in the community, which I might then leverage into getting help on more complex parts of the compiler that I want to hack on.<p>If you're looking for a project, the nvidia GPU ("NVPTX") backend in LLVM has a bunch of horrible global variables protected by locks, and it all really should go away. And it's not hard to find other simpler refactorings to do in there, it's pretty yucky. No compilers experience needed, just C++ skills.
JSC compiler architect here.<p>First of all make sure you are very comfortable with these concepts:<p>- SSA<p>- graph coloring, linear scan, and priority coloring approaches to register allocation. You really have to know all of them, otherwise you’ll have incorrect ideas about which is best and when.<p>- sea of nodes. Not because you will necessarily implement it (it’s not that great IMO) but because you will definitely use some ideas from it. It’s a very inspiring concept.<p>- abstract interpretation<p>- types, points-to sets, abstract heaps, and the ways that these things are the same<p>- instruction selection. This one is tricky because the literature doesn’t say smart things about it. Gotta read code or talk to people. I learned how to do it by word of mouth.<p>You need to strike a balance between these two activities to become good:<p>- Write your own compiler and get it to beat other compilers on some benchmark. It can be a simple benchmark or a simple language. You need to be comfortable with overall compiler architecture and there is no substitute for seeing the whole thing fall together. Then, after you do this, do it again because if you’re like me then your first attempt will be shit.<p>- Learn a major, mature compiler architecture like JSC, LLVM, V8, GCC, or whatever. Write some measurable improvement to such a compiler. Every major compiler has brilliant nuggets of awesomeness that you will only come to understand if you jump in there and try to make it better.<p>Hope this helps and good luck! You picked a fun profession.
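For anyone who wants a concrete feel for the linear scan approach named in the list above, here is a minimal sketch in Python. It assumes live intervals have already been computed and are passed in as (name, start, end) tuples. The function name `linear_scan` and the "spill whichever interval ends last" heuristic are illustrative choices in the spirit of the textbook algorithm, not how JSC or any other production compiler does it.

```python
def linear_scan(intervals, num_regs):
    """Assign registers to live intervals; return (assignment, spilled).

    intervals: list of (name, start, end) tuples with start <= end.
    This is a hypothetical helper for illustration only.
    """
    free = list(range(num_regs))   # registers not currently in use
    active = []                    # (end, name) for intervals holding a register
    assignment = {}                # name -> register number
    spilled = []                   # names that ended up in memory

    for name, start, end in sorted(intervals, key=lambda iv: iv[1]):
        # Expire intervals that ended before this one starts.
        still_active = []
        for a_end, a_name in active:
            if a_end < start:
                free.append(assignment[a_name])
            else:
                still_active.append((a_end, a_name))
        active = still_active

        if free:
            assignment[name] = free.pop()
            active.append((end, name))
        else:
            # No register is free: spill whichever live interval ends last.
            active.sort()
            last_end, last_name = active[-1]
            if last_end > end:
                # Steal the register from the longer-lived interval.
                assignment[name] = assignment.pop(last_name)
                spilled.append(last_name)
                active[-1] = (end, name)
            else:
                spilled.append(name)
    return assignment, spilled


assignment, spilled = linear_scan(
    [("a", 0, 10), ("b", 1, 3), ("c", 2, 8), ("d", 4, 6)], num_regs=2
)
```

With two registers, the long-lived `a` is the one that gets spilled, and `b` and `d` can share a register because their intervals do not overlap. Graph coloring would look at the whole interference graph instead of walking intervals in order, which is part of why it pays to know both.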
I'm not exactly a compiler engineer (I do program analysis research, frequently with LLVM as my base), but here are some things that come to mind:<p>* The LLVM ecosystem is an absolute godsend for most instrumentation, analysis, and optimization tasks. The project provides an excellent tutorial on writing LLVM passes[1] that I refer to daily.<p>* The compiler blogosphere is full of excellent resources, including Eli Bendersky[2], Trail of Bits[3] (fd: my employer), and John Regehr[4].<p>[1]: <a href="https://llvm.org/docs/WritingAnLLVMPass.html" rel="nofollow">https://llvm.org/docs/WritingAnLLVMPass.html</a><p>[2]: <a href="https://eli.thegreenplace.net/tag/llvm-clang" rel="nofollow">https://eli.thegreenplace.net/tag/llvm-clang</a><p>[3]: <a href="https://blog.trailofbits.com/category/compilers/" rel="nofollow">https://blog.trailofbits.com/category/compilers/</a><p>[4]: <a href="https://blog.regehr.org/" rel="nofollow">https://blog.regehr.org/</a>
I have a PhD in programming languages; and I hire people with these backgrounds (compilers, PL, verification, synthesis). As others have said, if you've looked at LLVM, I'd find that interesting, but for practical use, there are more useful + interesting + fun things out there.<p>E.g. SMT solvers (CVC, Z3, or the like) -- infinitely more fun; and require experience to truly understand what works and what doesn't. Or if you've done something really novel with meta-programming or designed a custom DSL for a domain.
I would start looking at MLIR:<p><a href="https://github.com/tensorflow/mlir" rel="nofollow">https://github.com/tensorflow/mlir</a><p>This is a new compiler framework that attempts to bake machine learning computation into the compiler stack. It is spearheaded by Chris Lattner, who started LLVM.<p>I think this project is both early in its development phase and has a good chance of turning into important compiler infrastructure.<p>(I worked in compilers at NVIDIA for a few years)
For a new grad, I reckon you should check out the LLVM introduction and its pointers by Adrian Sampson [0] for a primer and then look at the LLVM tutorials [1] about building a toy language. This will give you a feel for how 'modern' compiler development ties in with a programming language using the LLVM infrastructure.<p>If you feel like contributing to a real-world language, there are a few such as Go (very advanced), Rust, Swift, etc. For a beginner, my recommendation would be to check out the Zig Programming Language [2] as a start and then look at the others.<p>If you can't choose, look at Awesome Compilers: [3]<p>This is the sort of question that I am very pleased to see on HN and we need more of.<p>[0] - <a href="https://www.cs.cornell.edu/~asampson/blog/llvm.html" rel="nofollow">https://www.cs.cornell.edu/~asampson/blog/llvm.html</a><p>[1] - <a href="https://llvm.org/docs/tutorial/" rel="nofollow">https://llvm.org/docs/tutorial/</a><p>[2] - <a href="https://ziglang.org" rel="nofollow">https://ziglang.org</a><p>[3] - <a href="https://github.com/aalhour/awesome-compilers" rel="nofollow">https://github.com/aalhour/awesome-compilers</a>
Something else in addition to compiler engineering.<p>Pure compiler jobs are few and far between. You might get a good job but your career movement will be limited.<p>I believe the future of compiler design will go hand in hand with the demands of the industry. The traditional “by the dragon book” compiler is a solved problem. The growth is in the application of compiler technologies to machine learning, distributed systems, and specialized hardware.
Just adjacent to "compiler engineers" are lots of engineers who work on what I would call "plumbing." e.g. the tools in your toolchain which aren't compiler optimization passes but instead implement a particular backend or language/library/OS feature (preprocessor, linker, assembler, C library, etc). For those of us plumbers it's great to get experience with the specific tools in the toolchain on various backends to see how they work. The most critical things are (1) dumping output at various stages (2) your specific toolchain's debug functions, and (3) running strictly-analysis tools like readelf/objdump.<p>Learning how linkers and loaders [1][2] work really helps put the pieces together.<p>Exercises like isolating reproducible failures to a particular tool or compiler pass, C-reduce [3], etc. -- these are valuable.<p>Of course, like everyone else here says: LLVM is a great place to kick the tires on some of this stuff.<p>[1] <a href="https://en.wikipedia.org/wiki/Special:BookSources?isbn=978-1558604964" rel="nofollow">https://en.wikipedia.org/wiki/Special:BookSources?isbn=978-1...</a><p>[2] <a href="https://eli.thegreenplace.net/tag/linkers-and-loaders" rel="nofollow">https://eli.thegreenplace.net/tag/linkers-and-loaders</a><p>[3] <a href="https://embed.cs.utah.edu/creduce/" rel="nofollow">https://embed.cs.utah.edu/creduce/</a>
I wouldn't ask them to read any existing stuff, because I'd want them to learn how to write some of that themselves.<p>Things like: Lexer, Parser, Expression Creator, Optimizer, Evaluator, Expression -> Machine Code Template Matcher, and Machine Code Generator.<p>Where commercial compilers do better than most grad student projects is the modularity, the number of optimizer passes and options, and the run-time tooling.<p>Unless students implement these themselves, they'll spend more time understanding the discrete implementations, with their flaws/features, than the concepts themselves or the big picture.<p>Hence, in order to create better engineers overall, I'd recommend they do it all themselves initially.
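To give a sense of scale for the "do it all yourself" route, here is a rough sketch of a few of the stages listed above (lexer, parser, code generator, evaluator) for integer arithmetic, in Python. Everything in it is an assumption made for illustration: the grammar, the tuple-based AST, and the tiny stack machine standing in for real machine code are not from the comment above or from any particular compiler.

```python
import operator
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def lex(src):
    """Lexer: turn source text into a list of number and operator tokens."""
    src = src.rstrip()
    pos, tokens = 0, []
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"bad character at {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse(tokens):
    """Parser: recursive descent, producing nested tuples as the AST."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():   # expr := term (('+' | '-') term)*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():   # term := atom (('*' | '/') atom)*
        node = atom()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, atom())
        return node

    def atom():   # atom := NUMBER | '(' expr ')'
        tok = advance()
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return ("num", int(tok))

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

def codegen(node):
    """Code generator: flatten the AST into stack-machine instructions."""
    if node[0] == "num":
        return [("push", node[1])]
    op, left, right = node
    return codegen(left) + codegen(right) + [(op,)]

def run(program):
    """Evaluator: execute the instructions ('/' is integer division here)."""
    ops = {"+": operator.add, "-": operator.sub,
           "*": operator.mul, "/": operator.floordiv}
    stack = []
    for instr in program:
        if instr[0] == "push":
            stack.append(instr[1])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(ops[instr[0]](left, right))
    return stack.pop()

result = run(codegen(parse(lex("2 + 3 * (4 - 1)"))))   # 11
```

An optimizer would slot in between `parse` and `codegen` as a function from AST to AST. Keeping each stage a separate function is the small-scale version of the modularity point made above.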
I bet there are a few good businesses around building type hinting for large Rails codebases, for use in profiling, bug detection, and as an editor-agnostic backend. Ditto for large React codebases. I don't think these are large businesses, but look at e.g. Sidekiq. There's probably a very nice living for a handful of people who get to do interesting work.<p>NB: people on HN are oddly cheap. From the perspective of someone who makes payroll for 10 engineers, I can spend $1k on something like that basically because a senior engineer asked nicely or thought it might be useful. $5-$10k is definitely in scope for useful tooling. (Prices per year because I understand that if the authors can't make a working business, they stop offering me the X that I'm buying). Also, please make it work with vim. Pretty pretty please.<p>And there are probably good industry jobs e.g. at Dropbox for Python or Rust, etc. Basically, find a large company with a big investment in a slightly-off-the-beaten-path programming language, and there will be very interesting work.
I worked on compilers at Microsoft and NVIDIA, and there are few jobs for traditional hardware compilers. Some new hardware is being designed though; the most fun was new optimizations specific to GPUs' different caches, such as the loop optimizer and rematerialization in the register allocator.<p>Then I’ve worked on query optimizers for databases, and they’ve got completely different technology and papers. Here one can extend Apache Spark, which offers lots of interesting extension points.<p>Also, these days my startup is building parsers in Scala for DSLs, and if performance is not critical, I’m loving Packrat parsing in Scala (parser combinators); it is way easier and more fun. Interesting tooling can be built in Scala: you can also use Scala macros and get access to the Scala compiler AST. This kind of work around data might have applications for more engineers.
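For readers who have not met packrat parsing with combinators, here is a small sketch of the idea. The comment above is about Scala's parser combinators; this version is in Python purely for illustration, and every name in it (`packrat`, `token`, `seq`, `alt`, `many`, `fmap`, `parse_sum`) is made up for the example rather than taken from any library. A parser here is a function from (text, position) to either (value, new position) or None, and the "packrat" part is simply memoizing each parser's result per position.

```python
import re
from functools import lru_cache

def packrat(parser):
    """Memoize a parser's result per (text, position)."""
    return lru_cache(maxsize=None)(parser)

def token(pattern):
    """Match a regex after skipping leading whitespace."""
    rx = re.compile(r"\s*(" + pattern + ")")
    @packrat
    def p(text, pos):
        m = rx.match(text, pos)
        return (m.group(1), m.end()) if m else None
    return p

def seq(*parsers):
    """Run parsers one after another; fail if any of them fails."""
    @packrat
    def p(text, pos):
        values = []
        for parser in parsers:
            result = parser(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return tuple(values), pos
    return p

def alt(*parsers):
    """Ordered choice: the first parser that succeeds wins."""
    @packrat
    def p(text, pos):
        for parser in parsers:
            result = parser(text, pos)
            if result is not None:
                return result
        return None
    return p

def many(parser):
    """Apply a parser zero or more times (it must consume input)."""
    @packrat
    def p(text, pos):
        values = []
        while True:
            result = parser(text, pos)
            if result is None:
                return tuple(values), pos
            value, pos = result
            values.append(value)
    return p

def fmap(parser, fn):
    """Transform the value produced by a successful parse."""
    @packrat
    def p(text, pos):
        result = parser(text, pos)
        return None if result is None else (fn(result[0]), result[1])
    return p

# A toy DSL: integers joined by '+' or '-', evaluated while parsing.
number = fmap(token(r"\d+"), int)
signed = alt(
    fmap(seq(token(r"\+"), number), lambda pair: pair[1]),
    fmap(seq(token(r"-"), number), lambda pair: -pair[1]),
)
total = fmap(seq(number, many(signed)), lambda parts: parts[0] + sum(parts[1]))

def parse_sum(text):
    result = total(text, 0)
    if result is None or text[result[1]:].strip():
        raise SyntaxError("could not parse input")
    return result[0]

value = parse_sum("10 - 3 + 35")   # 42
```

The grammar reads almost like the BNF it came from, which is the "easier and more fun" part. The memo tables cost memory, and this sketch never clears them, which fits the caveat above about using this style when performance is not critical.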
Data Analytics.<p>There is so much data being generated, and enterprises worldwide are lagging so far behind in discovering and leveraging insights in that data, that there is lots of fun work here.<p>Add ML and AI here.<p>Applying it to the above gives you even more powers.<p>And then - you can apply these to any other discipline.
For the front end, Haskell or similar functional languages. For the backend, Cranelift. It is currently in the works, so contributing to the project will be a good learning experience.
I'm self-taught on compilers. I focus more on "how" to do stuff.<p>1- Focus on AST transformations. There is a lot written about parsing, but that is the "easiest" part (using a parser generator, Pratt parsing or combinators).<p>The AST is where the "action" is. I even made my toy langs without parsing at all (I built a small internal DSL).<p>2- Don't expect much information about the <i>real neat</i> stuff.<p>How to make REPLs for compilers??? How to enable debugging?? How to represent AGDTs?? How to test them??? How to do FFI??? Which data structures to base the rest on??? How to profile them??? How to do type inference?? Which GC to use?? How to implement a GC?? How to implement macros and generics (i.e. without Lisp)?? How to implement generators?? etc.<p>A LOT of this you will find in papers. But real examples??? Never.<p>So I think if you wanna get serious, learn how to read papers. I don't get the weird math they use, and my ignorant impression is that VERY few have real information even once understood. Having the abstract math is small potatoes when it comes time to implement.<p>So many times I get answers like "it's easy, dude" and, when pressing for how, "just read how LLVM is made!".<p>3- That is why I'm very glad of<p><a href="http://journal.stuffwithstuff.com/category/language/" rel="nofollow">http://journal.stuffwithstuff.com/category/language/</a>
<a href="http://craftinginterpreters.com" rel="nofollow">http://craftinginterpreters.com</a><p><i>Real gems</i> here.<p>4- You need to read Lisp, OCaml & Haskell if you wanna get some good ideas. I'm using Rust, and the little that is there (i.e. toy langs) is good!<p>5- I don't know what to do with LLVM and other larger codebases. There are too many complications, and when they are written in Java, .NET (except F#), C or worse, C++, the noise is big. The samples in OCaml, sometimes Haskell, sometimes Lisp, are much clearer.<p>Or in other words, small/medium compilers are better for getting stuff.<p>6- Semantics & features. This is the meat. The toy math calculator is too easy. The moment you wanna do OO, laziness, AGDTs, streaming, a structural type system, etc. is when you will see how sparse the actual info is. So narrow the kind of semantics/features you look for.<p>Just adding this or that could lead to MASSIVE changes in how you build the language.<p>For example, I'm doing a relational language (<a href="http://tablam.org" rel="nofollow">http://tablam.org</a>).<p>It is not that conventional, and a lot of the info comes from the RDBMS guys, which means a lot of detours into STORAGE/ACID and not actual languages!<p>7- Finally, pick your host language with care. With compilers that transpile it probably doesn't matter much, but your host will define the boundaries of how and what you can do "easily".
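As a small illustration of point 1 above (working on the AST directly and skipping the parser by using an internal DSL), here is a sketch in Python. The node classes `Num`, `Var`, `BinOp`, the `Expr` operator-overloading base, and the `fold` pass are all hypothetical names invented for this example. The simplifications assume integer values and expressions without side effects.

```python
from dataclasses import dataclass

class Expr:
    """Base class: Python operators build AST nodes, so no parser is needed."""
    def __add__(self, other):
        return BinOp("+", self, lift(other))
    def __radd__(self, other):
        return BinOp("+", lift(other), self)
    def __mul__(self, other):
        return BinOp("*", self, lift(other))
    def __rmul__(self, other):
        return BinOp("*", lift(other), self)

@dataclass(frozen=True)
class Num(Expr):
    value: int

@dataclass(frozen=True)
class Var(Expr):
    name: str

@dataclass(frozen=True)
class BinOp(Expr):
    op: str
    left: object
    right: object

def lift(value):
    """Wrap plain Python ints as Num nodes."""
    return value if isinstance(value, Expr) else Num(value)

def fold(node):
    """Bottom-up constant folding plus a few algebraic identities."""
    if not isinstance(node, BinOp):
        return node
    left, right = fold(node.left), fold(node.right)
    if isinstance(left, Num) and isinstance(right, Num):
        if node.op == "+":
            return Num(left.value + right.value)
        if node.op == "*":
            return Num(left.value * right.value)
    if node.op == "*" and Num(0) in (left, right):
        return Num(0)                      # x * 0 == 0 (no side effects assumed)
    if node.op == "+" and left == Num(0):
        return right                       # 0 + x == x
    if node.op == "+" and right == Num(0):
        return left                        # x + 0 == x
    return BinOp(node.op, left, right)

x = Var("x")
tree = (x + 0) * (Num(2) + 3)     # built with operators, no parsing involved
simplified = fold(tree)           # BinOp("*", Var("x"), Num(5))
```

Note that `Num(2) + 3` is written that way on purpose: a bare `2 + 3` would be folded by Python itself before the DSL ever saw it. That is the kind of boundary the host language draws, as in the last point above.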
I second the recommendation to spend time on LLVM, as there are decent books and tutorials, and it is in active use. It can also be helpful to study JIT compilers - see the .NET runtime code, or the JDK source code - since dynamic code generation has some different tradeoffs.
I know nothing about this topic. But I saw this book once and it made me excited about compilers. Maybe one day I'll go for it: <a href="https://compilerbook.com" rel="nofollow">https://compilerbook.com</a>
It's not hard, but it's tricky.<p>The reason it's tricky is that there are so many features in various file formats that it's very possible to implement an entire project, then come to generating that one bit in that one field that you forgot about, and then need to go all the way back up to the parser to attach it to the right place in the AST so you can pass it all the way down.<p>I'd be looking at compiling ML code, but otherwise compilers are a solved problem.