C# – Refactoring Many Else If Statements

cclean codecoding-stylerefactoring

I'm trying to parse readable data from PDFs and I keep ending up writing code like the following:

if (IsDob(line))
{
    dob = ParseDob(line);
}
else if (IsGender(line))
{
    gender = ParseGender(line);
}
...
...
...
else if (IsRefNumber(line))
{
    refNumber = ParseRefNumber(line);
}
else
{
    unknownLines.Add(line);
}

Then I use all this data to build up relevant objects, e.g. Customer using all of their personal data.

This tends to get kind of ugly when there's lots to parse.

I've split them up into functions like TryParsePersonalInfo(line), TryParseHistory(line) etc. But that feels like it's just moving the problem as I still have these endless else ifs everywhere.

Best Answer

Here's what I would start with given the information you've provided.

Create an interface like so:

interface LineProcessor<E> {
  boolean Process(Record record, Line line); 
}

Let's assume Record is a propery bag for now. Don't get hung up on this, I'm keeping it simple for demonstration purposes.

class Record {
  public Date dob { get; set; }
  public String gender { get; set; }
  public String refNumber { get; set; }
  // ...
}

Sorry if the syntax is not right for C#. If you aren't sure what I'm getting at, I'll clarify.

Then create a list of instances of LineParser: one for type of line. Then, you write a loop (python/pseudocode):

for line in pdf:
  obj = process(record, line)

def process(record, line):
  for handler in processors:
    if handler.process(record, line): return
  else:
    unknown.add(line)

Now the implementation of one of these might look like so:

class DobProcessor implements LineProcessor {
  boolean process(Record record, Line line) {
    if (IsDob(line)) {
      record.dob = ParseDob(line);
      return true;
    } else {
      return false;
    }
  }

  Date ParseDob(Line line) {
    //...
  }

  boolean IsDob(Line line) {
    //...
  }
}

This should make the code more manageable. Instead of a large if statement, you will have a number of classes where each one is specific to a case. This not only organizes the code, it means you can add or remove cases without touching any of the code around other cases.

One other thing is that now that the processes interface is down to a single method, you can actually start thinking of this as more of a function pointer/lambda so you can dispense with the interface if you so desire.

Related Topic