Java – Design Pattern for Processing a Huge CSV File

Tags: design-patterns, java, object-oriented, object-oriented-design, programming-practices

I am learning design patterns in Java and am also working on a problem where I need to handle a huge number of requests streaming into my program from a huge CSV file on disk. Each CSV line is one request, and the first field in each line indicates the message type. There are 7 types of messages, each of which should be handled differently.
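Every design discussed below starts by reading that first field. Here is a minimal sketch of extracting it. It assumes the type field is never quoted or escaped, so a plain `indexOf(',')` is enough. `MessageType` and `of` are hypothetical names, not part of any library.

```java
// Hypothetical helper: returns the message type, i.e. the first CSV field.
// Assumes the first field is never quoted, so splitting at the first comma is safe.
final class MessageType {
    private MessageType() {}

    static String of(String line) {
        int comma = line.indexOf(',');
        // A line with no comma is treated as consisting of the type only.
        String first = (comma < 0) ? line : line.substring(0, comma);
        return first.trim();
    }
}
```

For example, `MessageType.of("3,alice,42")` yields `"3"`. A real CSV parser would be needed if the first field could contain quoted commas.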

The code is something like the following:

import java.util.HashMap;
import java.util.Map;

class Handler {
    private static class CustomClass { /* … */ }
    private final Map<String, CustomClass> map = new HashMap<String, CustomClass>();

    public void runFile() {
        // Read the huge CSV file with millions of records, line by line
        // for each line, get the string before the first comma (say x)
        switch (x) {
            case "1": myMethod1(/* … */); break;
            case "2": myMethod2(/* … */); break;
            case "3": myMethod3(/* … */); break;
            // ...
            default: // ...
        }
    }
    // Methods 1, 2, 3 declarations
}

Note 1: Some of the methods affect the map and others don't.
Note 2: Each request (method) uses different variables from the CSV line and executes a different logic.
Note 3: Requests/methods are NOT connected, i.e., myMethod2() does not logically follow myMethod1().

Now my question: what is an appropriate design pattern for this problem? Is it fine to keep the whole logic in one class, as in the code above?

Best Answer

I assume the code sample you show is a simplified example and that the real problem is complex enough to warrant a pattern.

  • Make CustomClass an external class (*).
  • Have several processors that implement the same interface.
  • Keep a map of processors keyed by the integer that identifies the format of the CSV line (you call it x).
  • Retrieve the processor for that key from the map and let it process the line.
  • This is similar to the Strategy pattern: it defines a family of algorithms, encapsulates each one, and makes them interchangeable within that family.
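Here is a compact illustration of the bullets above, kept in a single file. As an assumption beyond the answer, it uses Java 8 lambdas instead of named processor classes, and it uses a `Map<String, String>` in place of `Map<String, CustomClass>`. `LineProcessor`, `StrategyDemo`, and the sample lines are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One strategy per message type. The shared state map is passed in,
// so each processor may update it, read it, or ignore it.
@FunctionalInterface
interface LineProcessor {
    void process(Map<String, String> state, String line);
}

final class StrategyDemo {
    private StrategyDemo() {}

    // Assumes every line contains a comma after the numeric type field.
    static List<String> run(List<String> lines) {
        Map<String, String> state = new HashMap<>();
        List<String> log = new ArrayList<>();

        // Key: message type (first CSV field); value: the strategy for that type.
        Map<Integer, LineProcessor> processors = new HashMap<>();
        processors.put(1, (st, ln) -> st.put(ln.split(",")[1], ln)); // updates the map
        processors.put(2, (st, ln) -> log.add("type 2: " + ln));     // leaves the map alone
        processors.put(3, (st, ln) -> log.add("size=" + st.size())); // only reads the map

        for (String line : lines) {
            int type = Integer.parseInt(line.substring(0, line.indexOf(',')));
            processors.get(type).process(state, line);
        }
        return log;
    }
}
```

With one named class per type, as in the answer's files, the dispatch loop stays the same. Lambdas only make sense when each handler is short.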

Advantage: flexibility. If you create the map of processors outside Handler and pass it in through the constructor, more processors can be added later without changing Handler (for example, there is no new case to add to a switch statement).
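Here is a minimal sketch of that constructor injection. It mirrors the answer's `IMessageProcessor` shape but declares a simplified stand-in locally so the block compiles on its own. `MessageProcessor`, `InjectedHandler`, `handle`, `shared()`, and the `String` value type of the shared map are hypothetical simplifications.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the answer's IMessageProcessor.
interface MessageProcessor {
    void processLine(Map<String, String> shared, String line);
}

// The handler receives its processors from outside, so supporting a new
// message type means registering one more map entry, not editing this class.
final class InjectedHandler {
    private final Map<String, String> shared = new HashMap<>();
    private final Map<Integer, MessageProcessor> processors;

    InjectedHandler(Map<Integer, MessageProcessor> processors) {
        // Defensive copy: later changes to the caller's map do not affect this handler.
        this.processors = Collections.unmodifiableMap(new HashMap<>(processors));
    }

    // Assumes the line has a numeric type before the first comma.
    void handle(String line) {
        int type = Integer.parseInt(line.substring(0, line.indexOf(',')));
        processors.get(type).processLine(shared, line);
    }

    Map<String, String> shared() {
        return shared;
    }
}
```

A caller could register `ps.put(4, (m, l) -> m.put("last4", l))` before constructing the handler. Type 4 then works without touching `InjectedHandler`.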


(*) You could achieve the same result by declaring the interface, the processors, and CustomClass as nested classes/interfaces inside Handler, but that would clutter the solution considerably.

==> CustomClass.java <==

public class CustomClass {}

==> IMessageProcessor.java <==

import java.util.Map;

public interface IMessageProcessor {
    public void processLine(Map<String, CustomClass> map, String line);     
}

==> ProcessorA.java <==

import java.util.Map;

public class ProcessorA implements IMessageProcessor {
    @Override
    public void processLine(Map<String, CustomClass> map, String line) {
        // Parse the fields that this message type uses and apply its logic (may read or update map)
    }
}

==> ProcessorB.java <==

import java.util.Map;

public class ProcessorB implements IMessageProcessor {
    @Override
    public void processLine(Map<String, CustomClass> map, String line) {
        // Parse the fields that this message type uses and apply its logic (may read or update map)
    }
}

==> ProcessorC.java <==

import java.util.Map;

public class ProcessorC implements IMessageProcessor {
    @Override
    public void processLine(Map<String, CustomClass> map, String line) {
        // Parse the fields that this message type uses and apply its logic (may read or update map)
    }
}

==> Handler.java <==

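import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class Handler {
    private final Map<String, CustomClass> map = new HashMap<String, CustomClass>();
    private final Map<Integer, IMessageProcessor> processors = new HashMap<Integer, IMessageProcessor>();

    public void processFile(String path) throws IOException {
        // Store the processors in their map with the appropriate keys
        processors.put(1, new ProcessorA());
        processors.put(2, new ProcessorB());
        processors.put(3, new ProcessorC());

        // Read the huge CSV file line by line instead of loading it all into memory
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // x is the integer before the first comma (the message type)
                int comma = line.indexOf(',');
                int x = Integer.parseInt((comma < 0 ? line : line.substring(0, comma)).trim());
                processors.get(x).processLine(map, line);
            }
        }
    }
}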

Note: You might want to check first whether a processor exists for key x. If it doesn't, fall back to a default processor, say one stored under key -1 or any other value guaranteed not to appear in the CSV file.
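One way to implement that fallback uses `Map.getOrDefault` (Java 8+). The -1 key follows the note above. `FallbackDispatcher` and `dispatch` are hypothetical, and the processors are plain `Function<String, String>` values so the result can be inspected. This simplification does not use the answer's `IMessageProcessor`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical dispatcher: unknown message types go to the processor stored
// under DEFAULT_KEY instead of causing a NullPointerException on get(x).
final class FallbackDispatcher {
    static final int DEFAULT_KEY = -1; // must never appear as a real type in the CSV

    private final Map<Integer, Function<String, String>> processors = new HashMap<>();

    FallbackDispatcher() {
        processors.put(1, line -> "A handled " + line);
        processors.put(2, line -> "B handled " + line);
        processors.put(DEFAULT_KEY, line -> "unknown type: " + line);
    }

    String dispatch(String line) {
        int comma = line.indexOf(',');
        String first = (comma < 0) ? line : line.substring(0, comma);
        int type;
        try {
            type = Integer.parseInt(first.trim());
        } catch (NumberFormatException e) {
            type = DEFAULT_KEY; // a non-numeric type field is treated as unknown too
        }
        // Look up the processor for this type, falling back to the default one.
        Function<String, String> p = processors.getOrDefault(type, processors.get(DEFAULT_KEY));
        return p.apply(line);
    }
}
```

Routing both missing keys and unparsable type fields to the same default keeps the main loop free of special cases. The default processor could log the line or collect it for later review.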
