I come from a self taught background and am trying to shore up my weaknesses in OOP, specifically C# and class design. I've been reading Code Complete 2 and it became apparent that I'm not following good class design principals. I currently have a ReportHandler class that generates a report based on a ReportType enum that is set when the handler is instantiated.
So now I'm refactoring the code, and am starting by considering the reports and how they differ:
They all share the following properties which will all be the same type:
- Report ID (Guid)
- Start Date (DateTime)
- End Date (DateTime)
- Total Results (int)
They also all have to retrieve the data from a database (currently the same database, could change in the future) and parse the data into an object, and be able to convert that object into a JSON. The data object and parsing will be different for each report.
So I'm wondering what the best design for this would be. If I created an abstract class I could do it like this:
public abstract class Report
{
public abstract void GetData();
public abstract void ParseData();
public abstract string ToJson();
public Guid ReportId { get; set; }
public DateTime StartDate { get; set; }
public DateTime EndDate { get; set; }
public int TotalResults { get; set; }
//Not sure if this is correct, I want to make sure we can't instantiate a derived class without these values.
protected Report(Guid reportId, DateTime startDate, DateTime endDate)
{
if (startDate > endDate)
throw new ArgumentException("Start date cannot be after end date.");
ReportId = reportId;
StartDate = startDate;
StartDate = endDate;
}
}
I believe this would help keep the classes organized. I could create a something like this:
public class SpecificReport : Report
{
//The raw data type can change between reports.
private List<string> _rawData;
public SpecificReportObject ParsedReport { get; private set; }
public SpecificReport(Guid reportId, DateTime startDate, DateTime endDate)
: base(reportId, startDate, endDate)
{
}
public override void GetData()
{
var rawData = new List<string>();
//Connect to a database and assign the results to rawData
_rawData = rawData;
}
public override void ParseData()
{
var parsedReport = new SpecificReportObject();
foreach (var result in _rawData)
{
//Parse RawData into a specific report object
}
ParsedReport = parsedReport;
}
public override string ToJson()
{
return JsonConvert.SerializeObject(ParsedReport);
}
public class SpecificReportObject
{
//Specific specific properties and logic
}
}
Now that seems more organized to me, and logically seems to make more sense. You can instantiate a report, call GetData(), ParseData(), and finally ToJson() to get the report in a portable format.
However I am wondering if the derived reports should contain this functionality at all. I don't like that I have to remember to call GetData() and ParseData(), but putting those functions in the constructor seems like a bad idea. I wouldn't want to put something in a constructor that can fail, especially a potentially lengthy operation like querying a database.
Would it be better to separate GetData from the class? Then I could retrieve the data first and load it into the ParseData function as an argument
Best Answer
I'm going to make a different suggestion more to broaden your perspective. While I do recommend the concepts behind what I'll say generally, I would need more context to see if it is most appropriate for your situation (and likely it would suggest changes to other code).
You absolutely hit the nail on the head when you mention your concerns about having to remember to call methods. The situation is potentially worse. Even if I remember to call the methods, what happens if I call
GetData
again later? As your code is currently written, you end up with an object with inconsistent data. (As an aside, you should probably look at Dependency Injection [the concept formerly known as "parameterization"] to move details like where to fetch the data out of your classes. I won't focus on this below.)One solution to this problem is to use the type system to help enforce your invariants. That's what it is there for. To make the example a little more impactful, I'm going to assume there is some pressing need to have explicit control over when a report is parsed. (In your actual code, it's not clear why you expose
GetData
at all. If you don't want it done in the constructor, then have it done on-demand byParseData
.) The key idea is:Consider something like:
With this approach, you are now just unable to do things out of order, or forget to do a step, or produce inconsistent data by duplicating a step. Applying this technique doesn't actually require a type system. It's just data abstraction. One place where a type system can be valuable in this is one possible implementation of
Parse
in some contexts may bereturn this;
. In this scenario, we are using the type system to enforce invariants because if you just ask for anIUnparsedReport
, you don't know that the specific instance you get also implementsIParsedReport
(unless you do something grotty like a downcast or reflection). In fact, you can apply this approach to your code almost as-is. This can avoid needing wrappers or copying data around or duplicating code. I don't recommend doing this by default though as it implies mutation and can lead to hidden sharing that produces confusing results.To cast this approach in OOD terms, you can view each stage as a Factory that produces the next stage. I used terminology similar to that in BobDalgleish's answer as the
DataProvider
discussed there can be viewed as a stage. (Though depending on how general it is, it may be better not to think of that way and just provide it as a parameter to a specific implementation ofIUnparsedReport
.) I don't agree with the description of the class though, as it just recapitulates the problem in theDataProvider
. What if you forget to "connect"? You can use the same techniques in this answer to resolve the staging issues that the description ofDataProvider
brings up.There's a broader mantra, of which this is an instance, popular in the typed functional programming community, though it's not at all limited to the FP or even the typed programming communities. Namely: