Lexical Analysis without regular expressions

lexer, regular expressions, theory

I've been looking at a few lexers in various higher-level languages (Python, PHP, JavaScript, among others), and they all seem to use regular expressions in one form or another. While I'm sure regexes are probably the best way to do this, I was wondering whether there is any way to achieve basic lexing without regular expressions, maybe some sort of direct string parsing.

So yeah, is it possible to implement some sort of basic lexing in a higher level language* without using regular expressions in any form?

*Higher-level languages being things like Perl/PHP/Python/JavaScript etc. I'm sure there is a way to do it in C.

Best Answer

First of all, there have been regular expression libraries for C since before your "higher-level" languages were invented. Just saying, C programs aren't as podunk as some people seem to think.

For most grammars, lexing is a matter of searching for whitespace and a few other characters like ()[]{}; to split the words, and then matching against a list of keywords to see if any match.
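As a rough illustration of that approach, here is a minimal Python sketch that uses no regular expressions at all. The keyword list, the token names (`KEYWORD`, `WORD`, `PUNCT`), and the `tokenize` function are all made up for this example and are not taken from any particular language or library. It also ignores string literals, comments, and multi-character operators, which a real lexer would need to handle.

```python
# Hypothetical keyword list and delimiter set, chosen only for illustration.
KEYWORDS = {"if", "else", "while", "return"}
PUNCTUATION = set("()[]{};")


def tokenize(source):
    """Split source into (kind, text) tuples by scanning one character at a time."""
    tokens = []
    word = []  # characters of the word currently being collected

    def flush():
        # Emit the pending word, if any, classifying it against the keyword list.
        if word:
            text = "".join(word)
            kind = "KEYWORD" if text in KEYWORDS else "WORD"
            tokens.append((kind, text))
            word.clear()

    for ch in source:
        if ch.isspace():
            flush()                       # whitespace ends the current word
        elif ch in PUNCTUATION:
            flush()                       # a delimiter also ends the current word
            tokens.append(("PUNCT", ch))  # and is a token in its own right
        else:
            word.append(ch)

    flush()  # emit a trailing word at end of input
    return tokens
```

Calling `tokenize("if (x) { return y; }")` yields a flat list of tuples such as `("KEYWORD", "if")` and `("PUNCT", "(")`. The same loop translates almost line for line to PHP, Perl, or JavaScript, since it relies only on character comparison and set membership.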
