ANTLR vs Parsing Libraries – When to Use Each

compiler java language-design lexer parsing

I've always wanted to learn how to write a compiler – I've decided to use ANTLR, and am currently reading through the book (it's very good, by the way).

I'm pretty new to this, so go easy, but the gist seems to be that you write your grammar, use it to turn your input into a data structure (usually an AST), and then walk that structure one or more times, actually executing the 'meat' of whatever you want your program to do.
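To make the "build a tree, then walk it" idea concrete, here is a minimal hand-written sketch for simple arithmetic. The names `Expr`, `Num`, `BinOp` and `Evaluator` are made up for illustration and are not what ANTLR generates; it only shows the general shape of an AST plus one walk over it (Java 17 assumed).

```java
// Hypothetical mini-AST for arithmetic; illustrative names, not ANTLR's API.
interface Expr {}
record Num(double value) implements Expr {}
record BinOp(char op, Expr left, Expr right) implements Expr {}

final class Evaluator {
    // One "walk" over the tree: evaluate children first, then combine them.
    static double eval(Expr e) {
        if (e instanceof Num n) {
            return n.value();
        }
        if (e instanceof BinOp b) {
            double l = eval(b.left());
            double r = eval(b.right());
            switch (b.op()) {
                case '+': return l + r;
                case '-': return l - r;
                case '*': return l * r;
                case '/': return l / r;
                default:
                    throw new IllegalArgumentException("Unknown operator: " + b.op());
            }
        }
        throw new IllegalArgumentException("Unknown node: " + e);
    }
}
```

For example, the tree for `1 + 2 * 3` would be `new BinOp('+', new Num(1), new BinOp('*', new Num(2), new Num(3)))`, and `Evaluator.eval` on it gives `7.0`. A second walk (say, a pretty-printer or type checker) would be another method over the same node types.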

If my input "language" was something like JSON or XML, i.e. something that probably already has a library that can turn it into a graph of POJOs, does this negate the need to do the lexing and parsing with a compiler compiler like ANTLR? Clearly, if my input is very bespoke then I need to write my own lexer/parser, but could I, and should I, short-cut this if my input language is already broadly used?

Would it be fair to say you could parse, say, JSON with Jackson into POJOs and then drive your code off the resulting POJOs? Or, in this case, does a 'proper' compiler compiler offer some advantage?

Edited (based on the answers) to add

I probably should have pointed out that my question was slightly hypothetical – I wouldn't ever try and build a programming language in XML!

So I guess the deal is that AST != POJO, and the tree walkers that ANTLR gives you are more useful in the case where you need to 'walk' the data structure and execute code.

Best Answer

You could use either to perform either task. The difference is what each is meant for. You can draw pictures or diagrams in Excel if you want to, but you can also draw a picture in something built for that purpose.

JSON and XML libraries are designed around general purpose loading of documents, pulling parts out of them, or transforming their structures into different structures, etc.

ANTLR, on the other hand, is a tool designed for generating parsers for compilers. It is tailored specifically to suit the needs of that task.

If you use an XML or JSON parser to parse either of those, you'll ultimately end up writing a bunch of code that transforms the parsed document into an AST of some sort for you to process. So whether you want to write and debug all of that, or use something that gives it to you up front, is up to you.
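As a rough idea of what that transformation code looks like, here is a sketch that converts a generic document tree into typed AST nodes. Nested `Map`s and `Number`s stand in for whatever your JSON library hands back; the `"op"`/`"left"`/`"right"` keys and the `Node`, `Literal`, `Operation` and `AstBuilder` names are all assumptions made up for this example (Java 17 assumed).

```java
import java.util.Map;

// Hypothetical AST node types; a JSON/XML library won't provide these for you.
interface Node {}
record Literal(double value) implements Node {}
record Operation(String op, Node left, Node right) implements Node {}

final class AstBuilder {
    // Converts a generic tree (nested Maps and Numbers, standing in for a
    // JSON library's output) into typed AST nodes, validating as it goes.
    static Node fromGeneric(Object raw) {
        if (raw instanceof Number n) {
            return new Literal(n.doubleValue());
        }
        if (raw instanceof Map<?, ?> m) {
            if (!(m.get("op") instanceof String op)) {
                throw new IllegalArgumentException("Missing operator in: " + m);
            }
            // Recurse into both operands; a missing operand fails below.
            return new Operation(op, fromGeneric(m.get("left")), fromGeneric(m.get("right")));
        }
        throw new IllegalArgumentException("Unexpected value: " + raw);
    }
}
```

Even for this toy case you are already hand-writing the validation and error reporting that a grammar-driven tool would otherwise derive from the grammar, and that code grows with every construct you add to the language.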
