Are separate parsing and lexing passes good practice with parser combinators?

Tags: lexer, parser-combinator, parsing

When I began to use parser combinators, my first reaction was a sense of liberation from what felt like an artificial distinction between parsing and lexing. All of a sudden, everything was just parsing!

However, I recently came across a posting on codereview.stackexchange illustrating someone reinstating this distinction. At first I thought this was very silly of them, but the fact that Parsec provides functions to support this behavior led me to question myself.
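For concreteness, here is a minimal sketch of the kind of support meant: Parsec's Text.Parsec.Token module derives a whole lexeme layer (whitespace skipping, identifiers, reserved words, literals) from a language definition. The reserved-word list below is just an example:

    import Text.Parsec.String (Parser)
    import qualified Text.Parsec.Token as Tok
    import Text.Parsec.Language (emptyDef)

    -- makeTokenParser derives a full lexeme layer from a language definition.
    lexer :: Tok.TokenParser ()
    lexer = Tok.makeTokenParser emptyDef
      { Tok.reservedNames = ["if", "then", "else"] }

    -- Lexeme parsers: each one skips trailing whitespace automatically.
    identifier :: Parser String
    identifier = Tok.identifier lexer

    reserved :: String -> Parser ()
    reserved = Tok.reserved lexer

    integer :: Parser Integer
    integer = Tok.integer lexer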

What are the advantages/disadvantages to parsing over an already lexed stream in parser combinators?

Best Answer

By parsing, we usually mean the analysis of context-free languages. A context-free language is more powerful than a regular one, so the parser can (most often) take over the job of the lexical analyser right away.

But this is (a) quite unnatural and (b) often inefficient.

For (a): if I think about how, for example, an if expression looks, I think IF expr THEN expr ELSE expr, and not 'i', 'f', maybe some spaces, then any character an expression can start with, and so on. You get the idea.
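As a sketch of that contrast (assuming the Expr type below, plus lexeme parsers like reserved, identifier and integer as in the Text.Parsec.Token example from the question), the token-level parser reads almost exactly like the grammar:

    import Text.Parsec ((<|>))
    import Text.Parsec.String (Parser)

    data Expr = IfE Expr Expr Expr | Var String | Lit Integer
      deriving Show

    -- Reads like the grammar: IF expr THEN expr ELSE expr.
    ifExpr :: Parser Expr
    ifExpr = IfE <$> (reserved "if"   *> expr)
                 <*> (reserved "then" *> expr)
                 <*> (reserved "else" *> expr)

    expr :: Parser Expr
    expr = ifExpr
       <|> Var <$> identifier
       <|> Lit <$> integer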

For (b): there are powerful tools that do an excellent job of recognizing lexical entities such as identifiers, literals, and brackets of all kinds. They do their work in practically no time and give you a nice interface: a list of tokens. No more worrying about skipping spaces in the parser; your parser becomes much more abstract when it deals with tokens instead of characters.
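Parsec itself makes that interface easy to consume, because its parsers work over any Stream, not just characters. A minimal sketch over a plain token list (the Token type and the helpers are invented for illustration; real code would also carry source positions on the tokens):

    import Text.Parsec
    import Text.Parsec.Pos (incSourceColumn)

    -- Tokens as produced by a separate lexer pass.
    data Token = TIf | TThen | TElse | TIdent String | TInt Integer
      deriving (Show, Eq)

    -- A parser over a list of tokens instead of characters.
    type TokParser = Parsec [Token] ()

    -- Match a single token via tokenPrim; position tracking is simplified.
    satisfyTok :: (Token -> Bool) -> TokParser Token
    satisfyTok p = tokenPrim show (\pos _ _ -> incSourceColumn pos 1) test
      where test t = if p t then Just t else Nothing

    tok :: Token -> TokParser ()
    tok t = () <$ satisfyTok (== t)

    -- No whitespace handling anywhere: the lexer already dealt with it.
    -- e.g. parse (tok TIf) "" [TIf, TIdent "x"]  ==  Right ()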

After all, if you think a parser should be busy with low-level stuff, why process characters at all? One could just as well write it at the level of bits! A parser that works on the bit level would be almost incomprehensible; it's the same with characters versus tokens.

Just my 2 cents.
