Java – Quickest Way to Split a Delimited String

efficiency, java, performance, sorting

I am building a Comparator that provides multi-column sort capability on a delimited String.
I currently use the split method of the String class to break the raw String into tokens.

Is this the best-performing way to convert the raw String into a String array? I will be sorting millions of rows, so I think the approach matters.

It seems to run fine and is very easy, but I'm unsure whether there's a faster way in Java.

Here is how the sort works in my Comparator:

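public int compare(String a, String b) {

    // _delimiter is treated as a regex by String#split; the limit caps the token count at the column count.
    String[] aValues = a.split(_delimiter, _columnComparators.length);
    String[] bValues = b.split(_delimiter, _columnComparators.length);
    int result = 0;

    // Compare column by column in sort-priority order and stop at the first difference.
    for (int index : _sortColumnIndices) {
        result = _columnComparators[index].compare(aValues[index], bValues[index]);
        if (result != 0) {
            break;
        }
    }
    return result;
}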
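
The method above relies on three fields that the snippet does not show. For anyone who wants to run it, here is a minimal, self-contained sketch of how the surrounding class might look. The class name, the constructor, and the use of a List instead of an array for the column comparators are assumptions made for illustration; they are not taken from the downloadable comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical wrapper class; only the body of compare() follows the question.
final class DelimitedRowComparator implements Comparator<String> {
    private final String _delimiter;                            // regex handed to String#split
    private final List<Comparator<String>> _columnComparators;  // one comparator per column
    private final int[] _sortColumnIndices;                     // columns in sort-priority order

    DelimitedRowComparator(String delimiterRegex,
                           List<Comparator<String>> columnComparators,
                           int[] sortColumnIndices) {
        _delimiter = delimiterRegex;
        _columnComparators = columnComparators;
        _sortColumnIndices = sortColumnIndices;
    }

    @Override
    public int compare(String a, String b) {
        String[] aValues = a.split(_delimiter, _columnComparators.size());
        String[] bValues = b.split(_delimiter, _columnComparators.size());
        for (int index : _sortColumnIndices) {
            int result = _columnComparators.get(index).compare(aValues[index], bValues[index]);
            if (result != 0) {
                return result; // first differing column decides the order
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        Comparator<String> byText = Comparator.naturalOrder();
        Comparator<String> byNumber = Comparator.comparingInt(Integer::parseInt);
        List<String> rows = new ArrayList<>(Arrays.asList("b,2", "a,10", "a,9"));
        // Sort by column 0 as text, then by column 1 as a number.
        rows.sort(new DelimitedRowComparator(",", Arrays.asList(byText, byNumber), new int[] {0, 1}));
        rows.forEach(System.out::println); // prints a,9 then a,10 then b,2
    }
}
```

Note that every call to compare splits both rows again, so each row is split many times over the course of one sort.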

After benchmarking the various approaches, I found that, believe it or not, the split method was the quickest on the latest version of Java. You can download my completed comparator here: https://sourceforge.net/projects/multicolumnrowcomparator/

Best Answer

I've written a quick and dirty benchmark test for this. It compares 7 different methods, some of which require specific knowledge of the data being split.

For basic general-purpose splitting, Guava Splitter is 3.5x faster than String#split() and I'd recommend using that. StringTokenizer is slightly faster than that, and splitting yourself with indexOf is twice as fast again.
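
To make "splitting yourself with indexOf" concrete, here is a rough sketch of what such a splitter could look like. It is not the code from the benchmark; the class and method names are made up for illustration, and it assumes a single-character delimiter with no quoting or escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not taken from the linked benchmark.
final class IndexOfSplitter {

    // Splits on a single-character delimiter and keeps empty tokens,
    // including trailing ones.
    static String[] split(String line, char delimiter) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        int end;
        while ((end = line.indexOf(delimiter, start)) >= 0) {
            tokens.add(line.substring(start, end));
            start = end + 1; // continue just past the delimiter
        }
        tokens.add(line.substring(start)); // last token, possibly empty
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String token : split("alpha|beta||delta", '|')) {
            System.out.println("[" + token + "]");
        }
    }
}
```

Two behavioural differences are worth keeping in mind when comparing it with the alternatives: this sketch does no regex matching at all, and it keeps trailing empty columns, whereas "a,b,,".split(",") drops them.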

For the code and more info see http://demeranville.com/battle-of-the-tokenizers-delimited-text-parser-performance/
