I am writing a parser for a fairly complicated language in C++. The Parser
class is given a list of tokens and it builds the AST. Though only a part of the parser is completed, the Parser.cpp
file is already more than 1.5k lines and the class has around 25 functions. So, I plan to break the large Parser
class into smaller classes such that I can have separate classes for parsing different language constructs.
For example, I wish to have an ExprParser class that parses expressions and a TypeParser class that parses types. That seems much cleaner. The problem is that the parsing functions must have access to shared state, which includes the position of the current token, and to several parsing helper functions. In C#, it is possible to implement related functions in different files using partial classes. Is there any specific design pattern or recommended way to do this in C++?
Best Answer
Create a Scanner or Tokenizer class that takes the input data (the text to be parsed) and holds the position of the current token or similar state. It can also provide some shared helper functions. Then give each of your individual xyzParser objects a reference (or a shared pointer) to the Scanner object, so they all access the same scanner. The scanner is responsible only for accessing the data through basic tokenizing functions; the individual parsers are responsible for the actual parsing logic.
This works most easily as long as your scanner does not need to know which individual parsers exist. If the scanner does need to know this, you might consider resolving the cyclic dependency by introducing abstract "interface" base classes, or by implementing some kind of callback or event mechanism through which the scanner can notify any observers.
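The shared-scanner idea could be sketched roughly as follows. This is only an illustration under some assumptions: since the question already starts from a token list, the Scanner here holds tokens rather than raw text, and the Token type, the helper names (peek, advance, match, expect) and the toy parse functions (parseType, parseSum) are all hypothetical, not taken from the asker's code.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token type; a real parser's tokens would carry more detail.
struct Token {
    enum class Kind { Identifier, Number, Plus, End };
    Kind kind;
    std::string text;
};

// Shared state: owns the token list and the current position,
// and offers the helper functions every sub-parser needs.
class Scanner {
public:
    explicit Scanner(std::vector<Token> tokens) : tokens_(std::move(tokens)) {
        // Guarantee a trailing End token so peek() is always valid.
        if (tokens_.empty() || tokens_.back().kind != Token::Kind::End) {
            tokens_.push_back(Token{Token::Kind::End, ""});
        }
    }

    const Token& peek() const { return tokens_[pos_]; }

    // Returns the current token and moves on, but never past End.
    Token advance() {
        Token current = tokens_[pos_];
        if (current.kind != Token::Kind::End) ++pos_;
        return current;
    }

    // Consumes the current token only if it has the given kind.
    bool match(Token::Kind kind) {
        if (peek().kind != kind) return false;
        advance();
        return true;
    }

    // Like match(), but a mismatch is a parse error.
    Token expect(Token::Kind kind) {
        if (peek().kind != kind) {
            throw std::runtime_error("unexpected token: '" + peek().text + "'");
        }
        return advance();
    }

    std::size_t position() const { return pos_; }

private:
    std::vector<Token> tokens_;
    std::size_t pos_ = 0;
};

// Each sub-parser keeps only a reference to the shared scanner.
class TypeParser {
public:
    explicit TypeParser(Scanner& scanner) : scanner_(scanner) {}

    // Toy rule: a type is a single identifier.
    std::string parseType() {
        return scanner_.expect(Token::Kind::Identifier).text;
    }

private:
    Scanner& scanner_;
};

class ExprParser {
public:
    explicit ExprParser(Scanner& scanner) : scanner_(scanner) {}

    // Toy rule: Number ('+' Number)*. Returns the sum as a stand-in
    // for building an AST node.
    int parseSum() {
        int total = std::stoi(scanner_.expect(Token::Kind::Number).text);
        while (scanner_.match(Token::Kind::Plus)) {
            total += std::stoi(scanner_.expect(Token::Kind::Number).text);
        }
        return total;
    }

private:
    Scanner& scanner_;
};
```

In use, one Scanner would be constructed from the token list, and both a TypeParser and an ExprParser would be constructed from that same object. Because they share the position, whatever one sub-parser consumes is already consumed when the other continues. The Scanner must outlive the parsers that reference it; if that lifetime is hard to guarantee, a std::shared_ptr can replace the plain reference, as the answer suggests.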