Object-Oriented Design – Identical Behavior in Semantically Different Classes

c# clean-code design object-oriented object-oriented-design

I am writing a program similar to Ruby's Active Record Migrations, in which every migration has both an "Up" and a "Down" part. "Up" brings the database forward in time, and "Down" reverses that change.

My program has two main functions to achieve its use cases:

Get a list of scripts off the filesystem
Execute the appropriate migrations on the database

The "Up" and "Down" migrations are stored in separate files, but they are matched by a shared naming scheme:

2014000000000001_CreateTable_up.sql
2014000000000001_CreateTable_down.sql

There is a 1:1 correspondence between up and down scripts. Having an up script without the corresponding down script is not valid (and vice versa).
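The pairing rule above could be checked with something like the following sketch. It is not taken from the project: the class name ScriptPairing, the method FindUnpairedVersions, and the regular expression are all assumptions, based only on the two file names shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ScriptPairing
{
    // Assumed file name pattern: <version>_<name>_<up|down>.sql
    private static readonly Regex FileNamePattern =
        new Regex(@"^(?<version>\d+)_(?<name>.+)_(?<direction>up|down)\.sql$");

    // Returns the versions for which only one of the two scripts exists.
    // File names that do not match the assumed pattern are ignored.
    public static IList<long> FindUnpairedVersions(IEnumerable<string> fileNames)
    {
        return fileNames
            .Select(f => FileNamePattern.Match(f))
            .Where(m => m.Success)
            .GroupBy(m => long.Parse(m.Groups["version"].Value))
            .Where(g => g.Select(m => m.Groups["direction"].Value)
                         .Distinct()
                         .Count() != 2)
            .Select(g => g.Key)
            .OrderBy(v => v)
            .ToList();
    }
}
```

A non-empty result would mean the script directory is in an invalid state and the migration should probably stop before touching the database.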

I have two classes in my project to represent these filesystem files.

public sealed class UpScript // (and DownScript, which looks identical)
{
    public UpScript(long version, string name, string content)
    {
        Content = content;
        Version = version;
        Name = name;
    }

    public long Version { get; private set; }
    public string Name { get; private set; }
    public string Content { get; private set; }
}

The reason I did not make a single class called Script is that I didn't want to create ambiguity about the purpose of a specific Script instance. UpScript and DownScript instances should not be used interchangeably, as substituting one for the other would violate the Liskov Substitution Principle.

The problem is that the few parts of the code that deal with either UpScript or DownScript instances look almost identical. How do I reduce the duplication without losing the expressiveness of having different classes?

My current ideas with concerns are:

Abstract base class

public abstract class Script
{
    protected Script(long version, string name, string content)
    {
        // code
    }

    // properties
}

public sealed class UpScript : Script
{
    public UpScript(long version, string name, string content)
        : base(version, name, content)
    {}
}

public sealed class DownScript : Script
{
    public DownScript(long version, string name, string content)
        : base(version, name, content)
    {}
}

This reduces the redundant code in the two classes. However, the code that creates and consumes these instances is still heavily duplicated.

Single class with Direction property

public enum Direction { Up, Down }

public sealed class Script
{
    public Script(Direction direction, long version, string name, string content)
    {
        // ...
    }

    public Direction Direction { get; private set; }

    // ...
}

This reduces the duplication. However, it puts the onus on consumers to ensure that the proper script is passed to them, and it creates ambiguity if the Direction property is ignored.

Single class with two Content properties

public sealed class Script
{
    public Script(long version, string name, string upContent, string downContent)
    {
        // ...
    }

    public string UpContent { get; private set; }
    public string DownContent { get; private set; }
}

This reduces duplication and is not ambiguous. However, the program is used to run either the up scripts or the down scripts, and with this scheme the code that gathers and creates these Script instances does twice as much work. I don't consider avoiding that to be premature optimization: if the user wants to do an up migration, there's no point in looking at down scripts.

With all that said I may be looking at the wrong problem to solve entirely.

If more information would be helpful in forming a suggestion, you can view ScriptMigrations on GitHub.

Best Answer

My C# is rusty, so hopefully I haven't glossed over any language constraints.

I would implement UpScript and DownScript as interfaces that have the same members:

// Interface members are implicitly public, and setters are left to the
// implementing class, so only getters are declared here.
public interface IUpScript {
    long Version { get; }
    string Name { get; }
    string Content { get; }
}
// ...
public interface IDownScript {
    long Version { get; }
    string Name { get; }
    string Content { get; }
}

This approach attempts to provide the minimal code needed to create a distinction between up and down, nothing more. Thoughts:

Classes implementing an interface will have the common properties you've defined, as in your other approaches, but won't inherit implementation details such as a constructor.
Your implementing classes would be free to extend other classes if they needed.
It looks as though you don't need to support multiple implementations of up and down scripts but you'd have no trouble doing so.
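One way the interface approach might also cut the duplicated consuming code is sketched below. This goes beyond what the answer states: the shared IScript base interface, the ScriptSorting helper, and its OrderByVersion method are hypothetical names introduced only for illustration. The idea is that shared logic is written once against the common members, while the generic type parameter keeps up and down scripts from being mixed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical common interface holding the shared members.
public interface IScript
{
    long Version { get; }
    string Name { get; }
    string Content { get; }
}

// The direction-specific interfaces add no members, only a distinct type.
public interface IUpScript : IScript { }
public interface IDownScript : IScript { }

public sealed class UpScript : IUpScript
{
    public UpScript(long version, string name, string content)
    {
        Version = version;
        Name = name;
        Content = content;
    }

    public long Version { get; private set; }
    public string Name { get; private set; }
    public string Content { get; private set; }
}

public static class ScriptSorting
{
    // Written once for both directions; the result keeps the caller's type,
    // so a list of IUpScript never silently becomes a list of IDownScript.
    public static IList<T> OrderByVersion<T>(IEnumerable<T> scripts)
        where T : IScript
    {
        return scripts.OrderBy(s => s.Version).ToList();
    }
}
```

A method that declares a parameter of type IEnumerable<IUpScript> will still reject down scripts at compile time, so the distinction the question cares about is preserved. A DownScript class would mirror UpScript and is omitted here for brevity.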

Related Solutions

Object-Oriented Design – Are Trivial Protected Getters Overkill?

Is this just blatant overkill?

Oftentimes it is; sometimes it's not.

Keeping m_obj in some known good state helps protect the Liskov Substitution Principle, keeping your code more resilient and higher quality. Sometimes you can trust inheritors to respect that behavior, sometimes you can't (either due to usage pattern or the subtlety of the contract it implements). Though this code/question also stumbles towards some of the reasons for "Why is Clean Code suggesting avoiding protected variables?"

C# Class Design – Arguing Against a ‘Completely Public’ Mindset

Completely public classes are justified in certain situations, as is the other extreme: classes with only one public method (and probably many private methods). The same goes for classes with a mix of public and private methods.

It all depends on the kind of abstraction you are going to model with them, which layers you have in your system, the degree of encapsulation you need in the different layers, and (of course) what school of thought the author of the class comes from. You can find all of these types in SOLID code.

There are entire books written about when to prefer which kind of design, so I am not going to list any rules about it here; the space in this section would not be sufficient. However, if you have a real-world example of an abstraction you would like to model with a class, I am sure the community here will happily help you improve the design.

To address your other points:

"private backing fields with no logic in the properties": Yes, you are right, for trivial getters and setters this is just unnecessary "noise". To avoid this kind of "bloat", C# has a shortcut syntax for property get/set methods:

So instead of

   private string field1;
   public string Prop1
   {
       get { return field1; }
       set { field1 = value; }
   }

write

   public string Prop1 { get; set; }

or

   public string Prop1 { get; private set; }

"Multiple constructors": that is not a problem in itself. It becomes a problem when there is unnecessary code duplication, as shown in your example, or when the calling hierarchy is convoluted. This can easily be solved by refactoring common parts into a separate function and by organizing the constructor chain in a unidirectional manner.
"Potentially no properties would be assigned a value due to the empty constructor": in C#, every data type has a clearly defined default value. If properties are not initialized explicitly in a constructor, they get this default value. If this is used intentionally, it is perfectly OK, so an empty constructor might be fine if the author knows what they are doing.
"It's too many properties! (in the 30 case)": yes, if you are free to design such a class in a greenfield manner, 30 is too many, I agree. However, not every one of us has this luxury (did you not write in the comment below that it is a legacy system?). Sometimes you have to map records from an existing database, a file, or data from a third-party API to your system. In these cases, 30 attributes might be something one has to live with.
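As a rough illustration of the "unidirectional constructor chain" mentioned under "Multiple constructors", here is a minimal sketch. The Customer class, its properties, and the default values are invented for this example and do not come from the original question. Each shorter constructor delegates to the next longer one, so the assignments live in exactly one place.

```csharp
using System;

public sealed class Customer
{
    // Each constructor forwards to the next, more specific one.
    public Customer() : this("unknown") { }

    public Customer(string name) : this(name, 0) { }

    // The only constructor that actually assigns state.
    public Customer(string name, int age)
    {
        Name = name;
        Age = age;
    }

    public string Name { get; private set; }
    public int Age { get; private set; }
}
```

With this layout, changing how a property is initialized means touching only the last constructor.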
