Java – Simple Java algorithm to encode/decode the following string

compression, decode, encode, java

Suppose I have
String input = "1,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,2,3,0,4,0,0,0,4,0,3";
I want to encode it into a string with fewer characters that also hides the actual information by representing it in Roman (Latin) characters, i.e. the above encodes to something like "Adqwqkjlhs". It must be possible to decode the encoded string back to the original string.

The string input is actually something I parse from the hash of a URL, but the original format is lengthy and open to manipulation.

Any ideas?

Thanks

Edit #1
Each number can be from 0 to 99, and the numbers are separated by commas so that String.split(",") can retrieve the String[].

Edit #2 (Purpose of encoded string)
Suppose the above string encodes to bmtwva1131gpefvb1xv; then I can have a URL like www.shortstring.com/input#bmtwva1131gpefvb1xv. From there I would decode bmtwva1131gpefvb1xv back into comma-separated numbers.

Best Answer

This isn't really much of an improvement over Nathan Hughes' solution, but the longer the strings are, the more savings you get.

Encoding: create a String starting with "1", making each of the numbers in the source string 2 digits, thus "0" becomes "00", "5" becomes "05", "99" becomes "99", etc. Represent the resulting number in base 36.

Decoding: Take the base 36 number/string, change it back to base 10, skip the first "1", then turn every 2 digits into an int and rebuild the original string.

Example Code:

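    import java.math.BigInteger;
    import java.util.StringTokenizer;

    // The statements below belong inside a method such as main().
    String s = "1,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,2,3,0,4,0,0,0,4,0,3";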

    // ENCODE the string
    StringTokenizer tokenizer = new StringTokenizer(s,",");
    StringBuilder b = new StringBuilder();
    b.append("1");  // This is a primer character, in case we end up with a bunch of zeroes at the beginning
    while(tokenizer.hasMoreTokens()) {
        String token = tokenizer.nextToken().trim();
        if(token.length()==1) {
            b.append("0");
            b.append(token);
        }
        else {
            b.append(token);
        }
    }

    System.out.println(b);
    // We get this String: 101020000000000000000000000000000000000010202030004000000040003

    String encoded = (new BigInteger(b.toString())).toString(36);
    System.out.println(encoded);
    // We get this String: kcocwisb8v46v8lbqjw0n3oaad49dkfdbc5zl9vn


    // DECODE the string

    String decoded = (new BigInteger(encoded, 36)).toString();
    System.out.println(decoded);
    // We should get this String: 101020000000000000000000000000000000000010202030004000000040003

    StringBuilder p = new StringBuilder();
    int index = 1;   // we skip the first "1", it was our primer
    while(index<decoded.length()) {
        if(index>1) {
            p.append(",");
        }
        p.append(Integer.parseInt(decoded.substring(index,index+2)));
        index = index+2;
    }

    System.out.println(p);
    // We should get this String: 1,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,2,3,0,4,0,0,0,4,0,3
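
To see why the primer "1" matters, here is a minimal sketch (the class and method names are made up for illustration) that round-trips two digit strings through base 36, one without the primer and one with it. Without the primer, an input that starts with 0 loses its leading zeros, because BigInteger keeps only the numeric value.

```java
import java.math.BigInteger;

public class PrimerDemo {
    // Round-trips a decimal digit string through base 36, as the answer's code does.
    static String roundTrip(String digits) {
        String encoded = new BigInteger(digits).toString(36);
        return new BigInteger(encoded, 36).toString();
    }

    public static void main(String[] args) {
        // The input "0,5" padded to two digits per number is "0005".
        System.out.println(roundTrip("0005"));   // prints 5: the leading zeros are gone
        System.out.println(roundTrip("10005"));  // prints 10005: the primer preserves them
    }
}
```

After decoding, dropping the first character of "10005" gives back "0005", which splits cleanly into "00" and "05".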

I don't know of an easy way to turn a large number into base 64. Carefully chosen symbols (like +, _, -) are OK to use in a URL, so 0-9, a-z, A-Z, plus "_" and "-" makes 64. The BigInteger.toString(int radix) method only accepts a radix up to Character.MAX_RADIX, which is 36 (no uppercase letters). If you can find a way to convert a large number to base 64, the resulting encoded string will be even shorter.
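
One possible way to do that conversion by hand is repeated division by 64, picking one symbol per remainder. The sketch below is only an illustration: the class name `Radix64`, the symbol order in `ALPHABET`, and the choice of "_" and "-" as the two extra symbols are assumptions, not something the answer specifies.

```java
import java.math.BigInteger;

public class Radix64 {
    // Assumed 64-symbol alphabet: 0-9, a-z, A-Z, then "_" and "-".
    static final String ALPHABET =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-";
    static final BigInteger BASE = BigInteger.valueOf(64);

    // Converts a non-negative number to a radix-64 string, most significant symbol first.
    static String encode(BigInteger n) {
        if (n.signum() < 0) throw new IllegalArgumentException("negative input");
        if (n.signum() == 0) return "0";
        StringBuilder sb = new StringBuilder();
        while (n.signum() > 0) {
            BigInteger[] qr = n.divideAndRemainder(BASE); // qr[0] = quotient, qr[1] = remainder
            sb.append(ALPHABET.charAt(qr[1].intValue()));
            n = qr[0];
        }
        return sb.reverse().toString();
    }

    // Rebuilds the number from a radix-64 string produced by encode().
    static BigInteger decode(String s) {
        BigInteger n = BigInteger.ZERO;
        for (char c : s.toCharArray()) {
            int digit = ALPHABET.indexOf(c);
            if (digit < 0) throw new IllegalArgumentException("bad symbol: " + c);
            n = n.multiply(BASE).add(BigInteger.valueOf(digit));
        }
        return n;
    }

    public static void main(String[] args) {
        // The primed, zero-padded digit string from the example code.
        BigInteger n = new BigInteger(
            "101020000000000000000000000000000000000010202030004000000040003");
        String encoded = encode(n);
        System.out.println(encoded);
        System.out.println(decode(encoded).equals(n)); // prints true
    }
}
```

The decoded BigInteger can then go through the same "skip the primer, read two digits at a time" loop as the base 36 version.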

EDIT: looks like this does it for you: http://commons.apache.org/codec/apidocs/org/apache/commons/codec/binary/Base64.html
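
If Java 8 or later is available (an assumption; the answer itself only mentions Commons Codec), the standard library's java.util.Base64 offers a URL-safe variant that can do the same job on the bytes of the BigInteger. A rough sketch, with a made-up class name:

```java
import java.math.BigInteger;
import java.util.Base64;

public class UrlSafeBase64 {
    // Encodes the number's two's-complement bytes with the URL-safe alphabet, no "=" padding.
    static String encode(BigInteger n) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(n.toByteArray());
    }

    // Decodes the URL-safe Base64 text back into the original number.
    static BigInteger decode(String s) {
        return new BigInteger(Base64.getUrlDecoder().decode(s));
    }

    public static void main(String[] args) {
        // The primed, zero-padded digit string from the example code.
        BigInteger n = new BigInteger(
            "101020000000000000000000000000000000000010202030004000000040003");
        String encoded = encode(n);
        System.out.println(encoded);
        System.out.println(decode(encoded).equals(n)); // prints true
    }
}
```

The URL-safe encoder uses "-" and "_" instead of "+" and "/", so the result can be placed after the # in a URL as it is.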