Writing a Superset of a Programming Language as a Transcompiler

compiler, syntax

My idea is to write a superset of C# (though the question is not language-specific) that compiles source-to-source (transcompiles) to C# itself. The additions would be things like fall-through switch clauses and default method parameters; nothing that is impossible to express in C#.

My first idea was to parse it and build syntax trees, abstract syntax trees, and so on, but that seems like overkill to me, mostly because large portions of the code will remain unchanged.

My question: Is there a simpler way to do this?

One of my ideas was to search for the tokens that need modifying (e.g. switch, in the case of fall-through) and then rewrite the code around them (adding goto case NEXT_CASE where needed). Is there a better, cleaner way to do this?

Best Answer

If you want this to be maintainable, then not really. I've seen a compiler that was literally an overgrown sed script. It worked, of course, but then we decided we wanted to add something to the language...
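To illustrate how a purely textual rewrite tends to go wrong, here is a minimal Python sketch. The `fallthrough;` keyword is a hypothetical superset construct invented for this example (it is not from the question), and `naive_rewrite` is likewise just an illustrative name. The replacement text reuses the NEXT_CASE placeholder from the question.

```python
import re


def naive_rewrite(source: str) -> str:
    """Textually replace a hypothetical `fallthrough;` keyword.

    This is the sed-style approach: it has no idea whether the match
    sits in real code, a string literal, or a comment.
    """
    return re.sub(r"\bfallthrough\s*;", "goto case NEXT_CASE;", source)


# The keyword appears once inside a string literal and once as real code.
snippet = 'Console.WriteLine("use fallthrough; to continue"); fallthrough;'
rewritten = naive_rewrite(snippet)
print(rewritten)
```

Both occurrences are rewritten, so the string literal is silently corrupted. Patching around cases like this (strings, comments, nested constructs) is roughly how a small script grows into something nobody wants to extend.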

However, if you take the more or less standard route of

  1. Lex
  2. Parse
  3. Compile superset to vanilla C# AST
  4. Pretty print AST

you can almost certainly use an existing library for step 4, and if you decide to grow your compiler you'll have a far easier time. If you want to do anything vaguely serious with this compiler, then the initial overhead is well worth it.
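The four steps above can be sketched on a toy input. This is a minimal Python illustration under heavy assumptions, not a real C# front end: the "grammar" covers only a single switch whose sections contain simple call statements and `break`, and every name here (`lex`, `parse`, `lower`, `pretty`, `transpile`, `Section`, `Switch`) is made up for the example. A real implementation would use a proper C# grammar and an existing pretty printer.

```python
import re
from dataclasses import dataclass, field

# 1. Lex: string literals, words/numbers, and single punctuation characters.
TOKEN_RE = re.compile(r'\s*("(?:\\.|[^"\\])*"|\w+|[^\s\w])')


def lex(source):
    return TOKEN_RE.findall(source)


@dataclass
class Section:
    label: str                      # "case 1" or "default"
    statements: list = field(default_factory=list)


@dataclass
class Switch:
    subject: str
    sections: list


def parse(tokens):
    """2. Parse: build a tiny tree for `switch (x) { case N: ...; default: ...; }`."""
    pos = 0

    def expect(tok):
        nonlocal pos
        if tokens[pos] != tok:
            raise SyntaxError(f"expected {tok!r}, got {tokens[pos]!r}")
        pos += 1

    expect("switch"); expect("(")
    subject = tokens[pos]; pos += 1
    expect(")"); expect("{")
    sections = []
    while tokens[pos] != "}":
        if tokens[pos] == "case":
            label = f"case {tokens[pos + 1]}"
            pos += 2
        else:
            expect("default")
            label = "default"
        expect(":")
        section = Section(label)
        while tokens[pos] not in ("case", "default", "}"):
            end = tokens.index(";", pos)   # a ";" inside a string is not a ";" token
            section.statements.append("".join(tokens[pos:end]))
            pos = end + 1
        sections.append(section)
    return Switch(subject, sections)


def lower(switch):
    """3. Rewrite the superset tree into plain C#: make fall-through explicit."""
    lowered = []
    for i, sec in enumerate(switch.sections):
        stmts = list(sec.statements)
        if stmts and stmts[-1] != "break":
            if i + 1 < len(switch.sections):
                stmts.append(f"goto {switch.sections[i + 1].label}")
            else:
                stmts.append("break")      # nothing left to fall into
        lowered.append(Section(sec.label, stmts))
    return Switch(switch.subject, lowered)


def pretty(switch):
    """4. Pretty print the lowered tree back to source text."""
    lines = [f"switch ({switch.subject})", "{"]
    for sec in switch.sections:
        lines.append(f"    {sec.label}:")
        lines += [f"        {stmt};" for stmt in sec.statements]
    lines.append("}")
    return "\n".join(lines)


def transpile(source):
    return pretty(lower(parse(lex(source))))


example = 'switch (x) { case 1: Log("one; case"); case 2: Log("two"); break; default: Log("other"); break; }'
print(transpile(example))
```

Because the lexer emits the string literal as a single token, the `;` and `case` inside `"one; case"` are never mistaken for syntax, and adding another construct later means adding a node type and a rule in `lower` rather than another layer of text patching.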

It might be worth your time to look into some nicer tools for lexing and parsing. I don't think it would be impossible to find and modify an existing C# grammar to handle steps 1 and 2.