So I have a program whose purpose is to take text files and parse them into a binary format that an embedded system understands. However, the text format I've inherited that I need to parse is sufficiently complex enough that after refactoring the main parse
routine I'm left with a class with more than 50 methods that almost all look something like parseChannel
, parseWCommand
, parseVCommand
, parsePCommand
, parseLoop
, parseHex
, parseInt
, etc. etc. etc.
Needless to say, the class, from its declaration, looks huge and daunting.
However, the methods to interface with the class are extremely simple, just parse
(compile the text, figure out its compiled size) and link
(fix the pointers once the location in memory is known, return the finalized raw binary data). Really, to the user the class has virtually no other reason for existence besides those two functions, but there's so much seemingly useless fluff in the class declaration that it's hard to even see what it's supposed to do. There's a similar situation going on with the rather massive number of data members that, again, are useless to everything but the class's internal methods that need them to talk to each other, though I don't know if that's as much of an issue.
I've considered making a separate class solely for parsing that's used within the parse
method, but it seems strange to me to build an entirely separate class that's only used in a single method in a single class. It seems a bit…superfluous, I guess? And I don't even know if that's attacking the right problem.
I guess in the end, here's what I'm asking:
- Is this seemingly-huge class actually a problem? I don't think it's strictly a "god object", but it superficially looks like one.
- If it is a problem, what are the best method(s) to fix it?
Best Answer
Ditch the OOP
The member functions probably don’t need to be. You can have the
Parser
class provide a few primitive operations (accept character, backtrack, &c.) and implement the rest of the parser as free functions taking aParser
reference. They can live in the source file (under an anonymous namespace) thus keeping the header nice and minimal.In this sense,
Parser
is really a minimal wrapper around a stream state.Also, it helps to factor out combinators such as
many
,choice
, and so on, to avoid writing error-prone and verbose manual loops.Sketch
In
Parse.h
, your entire public API.Your parser state can be entirely private to
Parse.cpp
.The public API implementation just forwards to the start production of a grammar with a new parser.
Output iterators are a fairly convenient way to produce multiple results.
Pointers have the advantage of a conversion to bool.
And so on. The real strengths of C++ to leverage when writing parsers are:
Objects, not so much.