R – Writing compilers … what’s right and what’s wrong?

bisoncompiler-constructionflex-lexerresources

Okay, in my quest to figure out the necessary stuff to write a compiler, I've reached a bit of a roadblock. It seems that every technology or tool that I find has some opposition somewhere.

I use Bison and Flex right now but I'm getting the feeling that this method is outdated. Is this true? Is this a good forward-compatible way to proceed with writing a full fledged programming language?

In a sea of different concepts and tools (ANTLR, LL(k), GLR, LALR, LLVM, Flex, Bison) What's the current trend and best practices for writing compilers? Is the dragon book out of date?

Best Answer

Unless you want to write a truly simple compiler, your focus is wrong.

Writing compilers is only a tiny bit about writing parsers. Having a parser is like climbing the foothills of the Himalayas when the problem is climbing Everest. You get to the top of the foothill and look up ... only 20,000 feet to go and you've only done the truly easy part. And you'll note the technology required to get to the top of the foothills is radically easier than the technology you need to go the rest of the way.

(FYI: the best present parsing technology is GLR, which easily accepts ambiguous grammars without hacking the grammar. GLR even easily parses C++, which violates the folk theorem that C++ is hard to parse. The folk theorem came from people trying to use YACC and ANTLR to parse it).

To build a compiler you need lots of machinery:

  • AST building
  • Symbol table construction
  • Control flow analysis
  • Data flow analysis
  • Representation of program code essentially as a data flow computation (SSA or triples)
  • A model of the target machine
  • A means to map program code to machine instructions
  • Register allocation
  • Optimizations: constant propagation, loop unrolling, ...

We haven't even gotten near global flow analysis, global optimizations, or special handling for modern day instruction sets involving SIMD instructions or cache optimizations. ... The list goes on and on. The Dragon book gives a nice introduction to the basic topics, but doesn't address any of the advanced ones. You'll want Cooper's "Engineering a Compiler" and Muchnick's "Advanced Compiler Design" as references and it would be good if you had skimmed them well before you start.

Building a modern compiler is quite a feat of engineering.