Java – Class design for writing multiple versions of multiple files

class-design, java, object-oriented, object-oriented-design

I am writing a web service in Java which reads some information from a DB and generates multiple JSON files which are written to S3. For each type of file, I have a POJO which is serialized to JSON using Jackson.

The schema of the files can change over time: new fields can be added, existing fields can be removed, and so on. Any schema change requires a new version of the file to be created, so at any point in time there might be multiple versions of a file in S3.

The current architecture is pretty simple: there is a Controller class which calls a couple of classes to get data from the DB. It then passes that data to a FileGenerator class, which creates multiple POJOs and writes them to S3 through a Repository class.

I need to change this design to make it more modular and handle multiple versions of files.

So far, this is what I have come up with:

A generator class for each type of file (or POJO): File1Generator, File2Generator, etc.
A FileManager class for each type of file, which calls the generator, serializes the POJO and then writes it to the data store: File1Manager, File2Manager, etc.
The Controller class will now call all the file managers one by one.

However, I am not sure how to design the system so that multiple versions of the files can be handled. For the POJOs, I need a separate POJO for each version: File1POJO_V1, File1POJO_V2. But then I will also need additional generators and additional managers. The information passed to the generators might vary from one version to another, and the same goes for the managers. I am struggling to create a clean class hierarchy that keeps the code easily extensible.
The following is a skeleton of the code as I have imagined it:

class File1POJOV1 {
    private String prop1;
    private String prop2;
    // ... further properties

    public void setProp1(String prop1) { this.prop1 = prop1; }
    public void setProp2(String prop2) { this.prop2 = prop2; }
}

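class File1Generator_V1 {
    public File1POJOV1 createFile(String prop1, String prop2 /* , ... */) {
        // Perform manipulations and create File1POJOV1
        File1POJOV1 pojo = new File1POJOV1();
        pojo.setProp1(prop1);
        pojo.setProp2(prop2);
        return pojo;
    }
}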

class File1Manager_V1 {
    private File1Generator_V1 fileGenerator;
    private Serializer serializer; // JSON serializer

    public void createFile(String prop1, String prop2 /* , ... */, String directory) {
        File1POJOV1 file = fileGenerator.createFile(prop1, prop2 /* , ... */);
        byte[] data = serializer.serialize(file); // serialize the POJO, not the output buffer
        String path = generateFilePath();          // dynamic, based on properties
        writeToS3(data, path, directory);          // write data to S3 bucket
    }
}


class Controller {
    File1Manager_V1 manager1_V1;
    File2Manager_V1 manager2_V1;

    void generateFile(String prop1, String prop2, String prop3, String prop4 /* , ... */, String directory) {
        manager1_V1.createFile(prop1, prop2 /* , ... */, directory);
        manager2_V1.createFile(prop3, prop4 /* , ... */, directory);
    }
}

The problem is that when there is a new version of a file, there will be a separate POJO for it, with more (or fewer) fields than the previous version. This change propagates to the generators and managers, because the input parameters of their methods change. I can't figure out an elegant design for this system that keeps maintenance easy.

EDIT:
I have received some good suggestions about not removing fields in successive versions of a file. But my question remains: assuming that I don't remove fields and that each new version is a subclass of the older one, I will still have to write a separate generator and a separate manager for each subclass. Can this be avoided? What would be a better design?

Best Answer

Keep the need for versioning low

I assume here that the JSON files are generated for some other component/service/client to consume. So try not to remove fields. If you only add new fields, there is no need to create a new version, as long as consumers of the JSON files ignore fields they don't know.

An interface where consumers ignore features they don't know about is more robust. Imagine, for example, that I consume your data at V0, and for a year you add new fields every two weeks, arriving at V26 after a year. Then in V27 you add a field that I also want to consume. Should I have to update my code to handle the fields added in V1 through V26 even though I don't use them? I don't think so.
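A minimal sketch of a consumer that ignores fields it doesn't know, assuming the JSON has already been parsed into a `Map<String, Object>` so the example needs no third-party library. `ConsumerViewV0` and the field names `id` and `name` are hypothetical. With Jackson itself, the usual switches for this behaviour are `@JsonIgnoreProperties(ignoreUnknown = true)` on the consumer's POJO, or `configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)` on the `ObjectMapper`.

```java
import java.util.Map;

// Hypothetical V0 consumer view: it only knows about "id" and "name".
// It is built from an already-parsed JSON object (a Map) and never looks at
// any other keys, so fields added in later versions are ignored, not rejected.
final class ConsumerViewV0 {
    final String id;
    final String name;

    private ConsumerViewV0(String id, String name) {
        this.id = id;
        this.name = name;
    }

    static ConsumerViewV0 from(Map<String, Object> json) {
        // Read only the known fields; unknown keys are tolerated.
        Object id = json.get("id");
        Object name = json.get("name");
        if (id == null || name == null) {
            // A missing field the consumer relies on is a genuinely breaking change.
            throw new IllegalArgumentException("required field missing");
        }
        return new ConsumerViewV0(id.toString(), name.toString());
    }
}
```

Fed a V26 document containing extra keys, `ConsumerViewV0.from(...)` still succeeds; only removing `id` or `name` would break it, which is exactly the asymmetry between adding and removing fields.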

Removing fields, on the other hand, is a different beast. You should do that rarely, and in bulk. That is the whole idea behind major and minor versions in semantic versioning: if you add something, it is a minor version and it should not affect users; if you remove things, it is a major update that can break dependent code.
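The major/minor rule above can be expressed as a small check over the field names of two schema versions. This is a hypothetical helper (`SchemaVersioning`, `Bump`), not part of any library, and it only compares field names, not their types.

```java
import java.util.Set;

// Which part of a MAJOR.MINOR schema version needs to change.
enum Bump { NONE, MINOR, MAJOR }

final class SchemaVersioning {
    private SchemaVersioning() {}

    static Bump requiredBump(Set<String> oldFields, Set<String> newFields) {
        // A field that existed before but is gone now can break consumers.
        if (!newFields.containsAll(oldFields)) {
            return Bump.MAJOR;
        }
        // Only additions: consumers that ignore unknown fields are unaffected.
        if (newFields.size() > oldFields.size()) {
            return Bump.MINOR;
        }
        // Identical set of fields: nothing to version.
        return Bump.NONE;
    }
}
```

Note that renaming a field counts as a removal plus an addition, so it lands in `MAJOR`.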

This also coincides with the notion of subtypes, polymorphism and substitutability. Essentially, to add a new field to a FirstPOJO, you could modify it, or you could subclass it as SubPOJO, which extends FirstPOJO by adding someField. Code written against FirstPOJO will be able to handle SubPOJO transparently. Of course, if you start removing things, code can break.
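A sketch of the subclassing idea, reusing the answer's `FirstPOJO` and `SubPOJO` names; the fields `name` and `someField` and the `OldConsumer` class are assumptions for illustration.

```java
// Older version of the payload.
class FirstPOJO {
    private final String name;

    FirstPOJO(String name) { this.name = name; }

    String getName() { return name; }
}

// Newer version only adds a field; everything FirstPOJO had is still there.
class SubPOJO extends FirstPOJO {
    private final int someField;

    SubPOJO(String name, int someField) {
        super(name);
        this.someField = someField;
    }

    int getSomeField() { return someField; }
}

// Code written against the older type keeps working with the newer one.
final class OldConsumer {
    private OldConsumer() {}

    static String describe(FirstPOJO pojo) {
        return "name=" + pojo.getName();
    }
}
```

`OldConsumer.describe(new SubPOJO("report", 3))` returns `"name=report"`: the consumer neither knows nor cares about `someField`.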

I know this doesn't exactly answer your question. But your basic problem is that you have a code architecture that doesn't scale. Reducing the need to scale in the first place circumvents the problem.

Related Solutions

Java – Object-Oriented Class Design

Example 2 is quite bad for testing... and I don't mean that you can't test the internals. You also can't replace your XmlReader object with a mock object, because you have no object at all.

Example 1 is needlessly hard to use. What about

XmlReader reader = new XmlReader(url);
Document result = reader.getDocument();

which is not any harder to use than your static method.

Things like opening the URL, reading XML, converting bytes to strings, parsing, closing sockets and so on are uninteresting to the caller. Creating an object and using it is what matters.

So IMHO the proper OO design is to make only those two things (the constructor and getDocument()) public, unless you really need the intermediate steps for some reason. Static is evil.
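To illustrate the testing point: if client code depends on an object (here behind an interface) rather than on a static method, a test can pass in a substitute. `DocumentSource`, `StringDocumentSource` and `RootElementReporter` are hypothetical names; only the `javax.xml.parsers` and `org.w3c.dom` calls are real JDK APIs. A production implementation of `DocumentSource` would fetch and parse the URL instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical abstraction over "something that yields a parsed XML document".
interface DocumentSource {
    Document getDocument();
}

// Test double: parses a fixed XML string instead of touching the network.
final class StringDocumentSource implements DocumentSource {
    private final String xml;

    StringDocumentSource(String xml) { this.xml = xml; }

    @Override
    public Document getDocument() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            // Parser configuration, SAX and I/O failures surface as one unchecked error.
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}

// Code under test receives its source through the constructor, so it never
// needs to know whether the XML came from a URL or from a string.
final class RootElementReporter {
    private final DocumentSource source;

    RootElementReporter(DocumentSource source) { this.source = source; }

    String rootElementName() {
        return source.getDocument().getDocumentElement().getTagName();
    }
}
```

A test can construct `new RootElementReporter(new StringDocumentSource("<feed/>"))` and check that `rootElementName()` returns `"feed"` without any network access, which is not possible when the reading logic is locked inside a static method.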

C# Class Design – Arguing Against a ‘Completely Public’ Mindset

Completely public classes are justified in certain situations, as is the other extreme: classes with only one public method (and probably lots of private methods). So are classes with a mix of public and private methods.

It all depends on the kind of abstraction you are going to model with them, which layers you have in your system, the degree of encapsulation you need in the different layers, and (of course) what school of thought the author of the class comes from. You can find all of these types in SOLID code.

There are entire books written about when to prefer which kind of design, so I am not going to list any rules about it here; the space in this section would not be sufficient. However, if you have a real-world example of an abstraction you would like to model with a class, I am sure the community here will happily help you improve the design.

To address your other points:

"private backing fields with no logic in the properties": Yes, you are right, for trivial getters and setters this is just unnecessary "noise". To avoid this kind of "bloat", C# has a shortcut syntax for property get/set methods:

So instead of

   private string field1;
   public string Prop1
   {
       get { return field1; }
       set { field1 = value; }
   }

write

   public string Prop1 { get; set; }

or, for a property that is read-only from outside the class,

   public string Prop1 { get; private set; }

"Multiple constructors": that is not a problem in itself. It becomes a problem when there is unnecessary code duplication in them, as shown in your example, or when the calling hierarchy is convoluted. This is easily solved by refactoring the common parts into a separate function and by organizing the constructor chain in one direction, so that each constructor delegates to a more complete one.
"Potentially no properties would be assigned a value due to the empty constructor": in C#, every data type has a clearly defined default value. If properties are not initialized explicitly in a constructor, they get this default value. If this is done intentionally, it is perfectly fine, so an empty constructor can be acceptable if the author knows what they are doing.
"It's too many properties! (in the 30 case)": yes, if you are free to design such a class from scratch, 30 is too many, I agree. However, not every one of us has this luxury (didn't you mention in a comment that this is a legacy system?). Sometimes you have to map records from an existing database, a file, or a third-party API into your system. In those cases, 30 attributes might be something you have to live with.
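The advice about a one-directional constructor chain is language-neutral. In C# the delegation is written `: this(...)`; in Java, the language of the main question on this page, it is `this(...)` as the first statement of a constructor. A minimal sketch with a hypothetical `ServerConfig` class and made-up defaults:

```java
// Each shorter constructor supplies one default and delegates to a longer one,
// so validation and field assignment live in exactly one place.
final class ServerConfig {
    private final String host;
    private final int port;
    private final int timeoutMillis;

    ServerConfig(String host) {
        this(host, 80);                 // hypothetical default port
    }

    ServerConfig(String host, int port) {
        this(host, port, 5_000);        // hypothetical default timeout
    }

    ServerConfig(String host, int port, int timeoutMillis) {
        if (port <= 0 || timeoutMillis < 0) {
            throw new IllegalArgumentException("invalid port or timeout");
        }
        this.host = host;
        this.port = port;
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    public String toString() {
        return host + ":" + port + " (timeout " + timeoutMillis + " ms)";
    }
}
```

Because every path ends in the three-argument constructor, adding a check or a new field means touching one constructor instead of several duplicated ones.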
