Java – Class design for writing multiple versions of multiple files

Tags: class-design, java, object-oriented, object-oriented-design

I am writing a web service in Java which reads some information from a DB and generates multiple JSON files, which are written to S3. For each type of file, I have a POJO which is serialized to JSON using Jackson.

The schema of the files can change over time – new fields can be added, existing fields can be removed etc. Any change in schema will require a new version of the file to be created. Therefore at any point in time, there might be multiple versions of a file in S3.

The current architecture is pretty simple – there is a Controller class which calls a couple of classes to get data from the DB. It then passes that data to a FileGenerator class, which creates multiple POJOs and writes them to S3 through a Repository class.

I need to change this design to make it more modular and handle multiple versions of files.

Till now this is what I have come up with:

  • A generator class for generating each type of file (or POJO) (File1Generator, File2Generator etc)

  • A FileManager class for each type of file, which calls the generator, serializes the POJO and then writes it to the data store (File1Manager, File2Manager etc)

  • The Controller class will now call all the file managers one by one.

However, I am not sure how to design the system in a way that multiple versions of the files can be handled. For the POJOs, I have to have a separate POJO for each version – File1POJO_V1, File1POJO_V2. But then I will also have additional generators and additional managers. The information passed to generators might vary from one version to another. Same with managers. I am struggling to create a nice class hierarchy such that the code is easily extensible.
The following is a skeleton of the code I have in mind:

class File1POJO_V1 {
    private String prop1;
    private String prop2;
    // ... further fields

    public void setProp1(String prop1) { this.prop1 = prop1; }
    public void setProp2(String prop2) { this.prop2 = prop2; }
}

class File1Generator_V1 {
    public File1POJO_V1 createFile(String prop1, String prop2 /* , ... */) {
        // Perform manipulations and create File1POJO_V1
        File1POJO_V1 file = new File1POJO_V1();
        file.setProp1(prop1);
        file.setProp2(prop2);
        return file;
    }
}

class File1Manager_V1 {
    private File1Generator_V1 fileGenerator;
    private Serializer serializer; // JSON serializer

    public void createFile(String prop1, String prop2, /* ..., */ String directory) {
        File1POJO_V1 file = fileGenerator.createFile(prop1, prop2 /* , ... */);
        byte[] data = serializer.serialize(file); // serialize the POJO, not the byte array
        String path = generateFilePath(); // dynamic, based on properties
        writeToS3(data, path, directory); // write data to the S3 bucket
    }
}

class Controller {
    File1Manager_V1 manager1_V1;
    File2Manager_V1 manager2_V1;

    void generateFile(String prop1, String prop2, /* ..., */ String directory) {
        manager1_V1.createFile(prop1, prop2, /* ..., */ directory);
        manager2_V1.createFile(prop3, prop4, /* ..., */ directory);
    }
}

The problem is that when there is a new version of a file, there will be a separate POJO for it, which will have more (or fewer) fields than the previous version. This change will propagate to the generators and managers, as the input parameters of their methods will change. I am not able to figure out an elegant design for this system that makes maintenance easy.

EDIT:
I have received some good suggestions about not removing fields in successive versions of a file. But my question remains: assuming that I don't remove fields and that each new version is a subclass of the older one, I will still have to write a separate generator and manager for each subclass. Can this be avoided? What would be a better design?

Best Answer

Keep the need for versioning low

I assume here that the JSON files are generated for some other component, service or client to consume. So try not to remove fields. If you only add new fields, then there is no need to create a new version, so long as consumers of the JSON files ignore fields they don't know.

An interface where consumers ignore fields they don't know about is more robust. Imagine, for example, that I consume your data at V0, and over the next year you add new fields on a biweekly basis, arriving at V26. Then in V27 you add a field that I also want to consume. Should I have to update my code to handle the fields added in V1 through V26 even though I don't use them? I don't think so.
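
To make the idea concrete, here is a minimal sketch of such a tolerant consumer. It assumes the JSON document has already been parsed into a Map; the names TolerantConsumer, describe, id, name and addedLater are hypothetical and not taken from the question.

```java
import java.util.Map;

// Hypothetical consumer that only knows the two fields it actually needs.
final class TolerantConsumer {
    // Reads "id" and "name"; every other key in the document is ignored.
    static String describe(Map<String, Object> document) {
        Object id = document.get("id");
        Object name = document.get("name");
        return id + ":" + name;
    }
}

class TolerantConsumerDemo {
    public static void main(String[] args) {
        // The same consumer code handles an old document and a newer one
        // that carries an extra field it has never heard of.
        Map<String, Object> oldDoc = Map.of("id", 7, "name", "report");
        Map<String, Object> newDoc = Map.of("id", 7, "name", "report", "addedLater", true);
        System.out.println(TolerantConsumer.describe(oldDoc));
        System.out.println(TolerantConsumer.describe(newDoc));
    }
}
```

If a consumer happens to deserialize with Jackson, the same behaviour can be requested with `@JsonIgnoreProperties(ignoreUnknown = true)` on the target class, or by disabling `DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES` on the ObjectMapper.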

Removing fields, on the other hand, is a different beast. You should do that rarely, and in bulk. The whole idea of major and minor versions in semantic versioning is about that: if you add something, it is a minor version and should not affect users; if you remove things, it is a major update that can break dependent code.
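
As a rough illustration of that rule, a schema version could carry a major and a minor number and answer whether a given consumer can still read it. This is only a sketch; SchemaVersion and isReadableBy are made-up names, and the rule assumes that minor versions only ever add fields.

```java
// Hypothetical schema version following the major/minor idea of semantic versioning.
final class SchemaVersion {
    final int major;
    final int minor;

    SchemaVersion(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    // A consumer written against `expected` can read a file of this version if the
    // major number matches (nothing it relies on was removed) and this minor number
    // is at least as high (every field it expects has already been added).
    boolean isReadableBy(SchemaVersion expected) {
        return major == expected.major && minor >= expected.minor;
    }
}
```

Under this rule a file at 1.26 is readable by a consumer written against 1.0, while a file at 2.0 is not readable by a consumer written against 1.5.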

This also coincides with the notion of subtypes, polymorphism and substitutability. Essentially, to add a new field to a FirstPOJO, you could modify it, or you could subclass it to SubPOJO, that extends FirstPOJO by adding someField. Code written against FirstPOJO will be able to handle SubPOJO transparently. Of course if you start removing things, then code can break.
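
The substitutability argument can be sketched in a few lines. FirstPOJO, SubPOJO and someField are the names used above; prop1, FirstPojoPrinter and summarize are assumptions added for illustration.

```java
// Original version of the file's POJO.
class FirstPOJO {
    private final String prop1;

    FirstPOJO(String prop1) {
        this.prop1 = prop1;
    }

    String getProp1() {
        return prop1;
    }
}

// Newer version: adds someField and removes nothing.
class SubPOJO extends FirstPOJO {
    private final String someField;

    SubPOJO(String prop1, String someField) {
        super(prop1);
        this.someField = someField;
    }

    String getSomeField() {
        return someField;
    }
}

// Hypothetical code written against FirstPOJO only; it never mentions SubPOJO.
final class FirstPojoPrinter {
    static String summarize(FirstPOJO pojo) {
        return "prop1=" + pojo.getProp1();
    }
}
```

Calling `FirstPojoPrinter.summarize(new SubPOJO("a", "b"))` compiles and behaves exactly as it would for a plain FirstPOJO, because the subclass only adds to the parent's contract.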

I know this doesn't exactly answer your question. But your basic problem is that you have a code architecture that doesn't scale. Reducing the need to scale in the first place circumvents the problem.
