Lexer Design – Should a Lexer Un-Escape Strings?

lexer

Is it a lexer's job to undo any escaping done to a string literal? For example:

"Me: \"Hello World!\""

Becomes:

Me: "Hello World!"

Should this conversion be done inside the lexer? I am guessing it should, because it would allow for a more abstract and modular design: you could add new ways to represent strings without having to update every other component.

Best Answer

If you are implementing something close to string literals in C, then yes. At the level of the parser, you only care that a token is a string literal, not how that literal was written in the source.
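To make that concrete, here is a minimal sketch in Python of a lexer routine that un-escapes while it scans. The function name lex_string and the ESCAPES table are made up for illustration, and the escape set is a small C-like assumption rather than any particular language's rules.

```python
# Hypothetical escape table; a real language would define its own set.
ESCAPES = {'"': '"', '\\': '\\', 'n': '\n', 't': '\t'}


def lex_string(src, pos=0):
    """Scan a double-quoted literal starting at src[pos].

    Returns (value, next_pos), where value is the un-escaped text and
    next_pos is the offset just past the closing quote.
    """
    if pos >= len(src) or src[pos] != '"':
        raise SyntaxError(f"expected opening quote at offset {pos}")
    out = []
    i = pos + 1
    while i < len(src):
        ch = src[i]
        if ch == '"':
            # Unescaped quote: the literal ends here.
            return "".join(out), i + 1
        if ch == '\\':
            # Escape sequence: translate the character after the backslash.
            i += 1
            if i >= len(src) or src[i] not in ESCAPES:
                raise SyntaxError(f"bad escape at offset {i}")
            out.append(ESCAPES[src[i]])
        else:
            out.append(ch)
        i += 1
    raise SyntaxError("unterminated string literal")
```

Calling lex_string(r'"Me: \"Hello World!\""') yields the value Me: "Hello World!", so the parser receives a string token that already carries its final text and never needs to know about the backslash syntax.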

But if you have an additional requirement, such as that double quotes appearing inside the string literal must be matched (e.g., "\"" is invalid), then this can only be captured using a context-free grammar, so it has to be handled by the parser.