I'm trying to parse readable data from PDFs and I keep ending up writing code like the following:
if (IsDob(line))
{
dob = ParseDob(line);
}
else if (IsGender(line))
{
gender = ParseGender(line);
}
...
...
...
else if (IsRefNumber(line))
{
refNumber = ParseRefNumber(line);
}
else
{
unknownLines.Add(line);
}
Then I use all this data to build up relevant objects, e.g. Customer using all of their personal data.
This tends to get kind of ugly when there's lots to parse.
I've split them up into functions like TryParsePersonalInfo(line), TryParseHistory(line) etc. But that feels like it's just moving the problem as I still have these endless else ifs everywhere.
Best Answer
Here's what I would start with given the information you've provided.
Create an interface like so:
Let's assume Record is a propery bag for now. Don't get hung up on this, I'm keeping it simple for demonstration purposes.
Sorry if the syntax is not right for C#. If you aren't sure what I'm getting at, I'll clarify.
Then create a list of instances of LineParser: one for type of line. Then, you write a loop (python/pseudocode):
Now the implementation of one of these might look like so:
This should make the code more manageable. Instead of a large if statement, you will have a number of classes where each one is specific to a case. This not only organizes the code, it means you can add or remove cases without touching any of the code around other cases.
One other thing is that now that the processes interface is down to a single method, you can actually start thinking of this as more of a function pointer/lambda so you can dispense with the interface if you so desire.