Abstract Class Design – How to Retrieve Data Properly

abstract classcobject-oriented

I come from a self taught background and am trying to shore up my weaknesses in OOP, specifically C# and class design. I've been reading Code Complete 2 and it became apparent that I'm not following good class design principals. I currently have a ReportHandler class that generates a report based on a ReportType enum that is set when the handler is instantiated.

So now I'm refactoring the code, and am starting by considering the reports and how they differ:

They all share the following properties which will all be the same type:

  • Report ID (Guid)
  • Start Date (DateTime)
  • End Date (DateTime)
  • Total Results (int)

They also all have to retrieve the data from a database (currently the same database, could change in the future) and parse the data into an object, and be able to convert that object into a JSON. The data object and parsing will be different for each report.

So I'm wondering what the best design for this would be. If I created an abstract class I could do it like this:

public abstract class Report
{
    public abstract void GetData();
    public abstract void ParseData();
    public abstract string ToJson();

    public Guid ReportId { get; set; }
    public DateTime StartDate { get; set; }
    public DateTime EndDate { get; set; }
    public int TotalResults { get; set; }

    //Not sure if this is correct, I want to make sure we can't instantiate a derived class without these values.
    protected Report(Guid reportId, DateTime startDate, DateTime endDate)
    {
        if (startDate > endDate)
            throw new ArgumentException("Start date cannot be after end date.");
        ReportId = reportId;
        StartDate = startDate;
        StartDate = endDate;
    }
}

I believe this would help keep the classes organized. I could create a something like this:

public class SpecificReport : Report
{
    //The raw data type can change between reports.
    private List<string> _rawData;
    public SpecificReportObject ParsedReport { get; private set; }

    public SpecificReport(Guid reportId, DateTime startDate, DateTime endDate)
        : base(reportId, startDate, endDate)
    {

    }

    public override void GetData()
    {
        var rawData = new List<string>();
        //Connect to a database and assign the results to rawData
        _rawData = rawData;
    }

    public override void ParseData()
    {
        var parsedReport = new SpecificReportObject();
        foreach (var result in _rawData)
        {
            //Parse RawData into a specific report object
        }
        ParsedReport = parsedReport;
    }

    public override string ToJson()
    {
        return JsonConvert.SerializeObject(ParsedReport);
    }

    public class SpecificReportObject
    {
        //Specific specific properties and logic
    }
}

Now that seems more organized to me, and logically seems to make more sense. You can instantiate a report, call GetData(), ParseData(), and finally ToJson() to get the report in a portable format.

However I am wondering if the derived reports should contain this functionality at all. I don't like that I have to remember to call GetData() and ParseData(), but putting those functions in the constructor seems like a bad idea. I wouldn't want to put something in a constructor that can fail, especially a potentially lengthy operation like querying a database.

Would it be better to separate GetData from the class? Then I could retrieve the data first and load it into the ParseData function as an argument

Best Answer

I'm going to make a different suggestion more to broaden your perspective. While I do recommend the concepts behind what I'll say generally, I would need more context to see if it is most appropriate for your situation (and likely it would suggest changes to other code).

You absolutely hit the nail on the head when you mention your concerns about having to remember to call methods. The situation is potentially worse. Even if I remember to call the methods, what happens if I call GetData again later? As your code is currently written, you end up with an object with inconsistent data. (As an aside, you should probably look at Dependency Injection [the concept formerly known as "parameterization"] to move details like where to fetch the data out of your classes. I won't focus on this below.)

One solution to this problem is to use the type system to help enforce your invariants. That's what it is there for. To make the example a little more impactful, I'm going to assume there is some pressing need to have explicit control over when a report is parsed. (In your actual code, it's not clear why you expose GetData at all. If you don't want it done in the constructor, then have it done on-demand by ParseData.) The key idea is:

Make a different type for each stage of processing.

Consider something like:

public interface IReportDataProvider {
    IUnparsedReport Fetch();
}

public interface IUnparsedReport {
    // ...
    IParsedReport Parse();
}

public interface IParsedReport {
    // ...
    string ToJsonString();
}

With this approach, you are now just unable to do things out of order, or forget to do a step, or produce inconsistent data by duplicating a step. Applying this technique doesn't actually require a type system. It's just data abstraction. One place where a type system can be valuable in this is one possible implementation of Parse in some contexts may be return this;. In this scenario, we are using the type system to enforce invariants because if you just ask for an IUnparsedReport, you don't know that the specific instance you get also implements IParsedReport (unless you do something grotty like a downcast or reflection). In fact, you can apply this approach to your code almost as-is. This can avoid needing wrappers or copying data around or duplicating code. I don't recommend doing this by default though as it implies mutation and can lead to hidden sharing that produces confusing results.

To cast this approach in OOD terms, you can view each stage as a Factory that produces the next stage. I used terminology similar to that in BobDalgleish's answer as the DataProvider discussed there can be viewed as a stage. (Though depending on how general it is, it may be better not to think of that way and just provide it as a parameter to a specific implementation of IUnparsedReport.) I don't agree with the description of the class though, as it just recapitulates the problem in the DataProvider. What if you forget to "connect"? You can use the same techniques in this answer to resolve the staging issues that the description of DataProvider brings up.

There's a broader mantra, of which this is an instance, popular in the typed functional programming community, though it's not at all limited to the FP or even the typed programming communities. Namely:

Make invalid states unrepresentable.