C++ – Writing a Lexer

c++ compiler lexer

What are good resources on how to write a lexer in C++ (books, tutorials, documents)? What are some good techniques and practices?

I have looked on the internet and everyone says to use a lexer generator like lex. I don't want to do that; I want to write a lexer by hand.

Best Answer

Keep in mind that every finite state machine corresponds to a regular expression, which corresponds to a structured program using if and while statements.

So, for example, to recognize integers you could have the state machine:

0: digit -> 1
1: digit -> 1   (state 1 is the accepting state)

or the regular expression:

digit digit*

or the structured code:

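// Assumes pc is a const char* into a NUL-terminated buffer; isdigit comes from <cctype>.
// The cast avoids undefined behaviour when *pc is a negative char value.
if (isdigit(static_cast<unsigned char>(*pc))){       // digit
  while(isdigit(static_cast<unsigned char>(*pc))){   // digit*
    pc++;
  }
  // pc now points just past the integer
}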

Personally, I always write lexers in the last form (structured code), because IMHO it is no less clear than the other two, and nothing is faster.
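
To show how the same if/while pattern might scale beyond integers, here is a minimal sketch of a hand-written tokenizer. It is only an illustration under stated assumptions, not a complete lexer: the names TokKind, Token, nextToken and tokenize are hypothetical, the input is assumed to be a NUL-terminated buffer, and anything that is not whitespace, an integer, or an identifier is treated as a one-character punctuation token.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch; a real language would need more.
enum class TokKind { Integer, Identifier, Punct, End };

struct Token {
    TokKind kind;
    std::string text;
};

// The <cctype> functions have undefined behaviour for negative char values,
// so each helper converts to unsigned char first.
static bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
static bool isSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
static bool isIdentStart(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) != 0 || c == '_';
}
static bool isIdentPart(char c) { return isIdentStart(c) || isDigit(c); }

// Scans one token starting at pc and advances pc past it.
// Assumes pc points into a NUL-terminated buffer.
Token nextToken(const char*& pc) {
    while (isSpace(*pc)) pc++;                 // whitespace*
    const char* start = pc;
    if (*pc == '\0') return {TokKind::End, ""};
    if (isDigit(*pc)) {                        // digit digit*
        while (isDigit(*pc)) pc++;
        return {TokKind::Integer, std::string(start, pc)};
    }
    if (isIdentStart(*pc)) {                   // (letter | '_') (letter | digit | '_')*
        while (isIdentPart(*pc)) pc++;
        return {TokKind::Identifier, std::string(start, pc)};
    }
    pc++;                                      // any other single character
    return {TokKind::Punct, std::string(start, pc)};
}

// Collects every token up to (but not including) the End marker.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> out;
    const char* pc = src.c_str();
    for (;;) {
        Token t = nextToken(pc);
        if (t.kind == TokKind::End) break;
        out.push_back(t);
    }
    return out;
}
```

With these assumptions, tokenize("x1 = 42+y") yields five tokens: the identifier x1, the punctuation =, the integer 42, the punctuation +, and the identifier y. Each branch in nextToken is one regular expression written out as an if followed by a while, so adding a token class (say, string literals or comments) means adding one more branch.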