C++ – Efficient algorithm to count number of substrings divisible by 3

algorithmscdynamic programming

Given a string of decimal digits, I have to find the number of all substrings divisible by 3 in the range L to R [both inclusive], where L & R are index[1-based] of the specified string
string length <= 100000

I've tried the Naive approach of iterating over all possible substrings & obtaining the answer, but that is not fast enough, especially for multiple pairs L & R.

Then I tried a DP approach. I can obtain all possible substrings divisible by 3 in the whole string, i.e. I'm unable to code the DP solution that gives result for given range.
DP Solution that I tried (not completely sure about It):

for(i=0 ; i<n ; i++) {
    for(j=0 ; j<3 ; j++) {
        dp[i][j]=0 ;
    }
    int curr = (input[i]-'0')%3 ;
    dp[i][curr]++ ;
    if(i) {
        for(j=0 ; j<3 ; j++) {
            if(curr % 3 == 0) { dp[i][j] += dp[i-1][j]; }
            if(curr % 3 == 1) { dp[i][j] += dp[i-1][(j+2)%3]; }
            if(curr % 3 == 2) { dp[i][j] += dp[i-1][(j+1)%3]; }
        }
    }
}
long long sum = 0;
for(i=0 ; i<n ; i++) { sum += dp[i][0] ; }

Can this solution be modified to give results for given range [L,R] ?

After searching alot, I learnt that range queries problems are solved by Segment Tree, but I'm unsure as how Segment Tree + Lazy Propagation can help in this question.

So what is a performant way to count all substrings divisible by 3 in the range L to R [both inclusive]?

EDIT:
Input: First Line contains the given string. Lines after that contain two integers denoting L and R respectively, where both L and R are index(1-based) of string.

Input:

301524
1 2
4 6
3 5

Output:

3
1
1

Explanation:
When L=1 & R=2 we get 3 possible substrings, {3}, {0}, {30} & all these when considered as a decimal number are divisible by 3. Hence the output.
When L=4 & R=6 we get 6 possible substrings, {5} , {52}, {524}, {2}, {24}, {4} & out of these only {24} is divisible by 3.

Repetitions in substrings like 3333 count multiple times, so for L=1 to R=4 the answer would be 10, since we have four times 3, three times 33, two times 333 and one time 3333 (all divisible by 3).

Best Answer

Cute dynamic programming problem. Here is a Java solution and quick explanation.

The basic question to ask yourself is if I knew subproblem X, I could solve solve problem Y easily.

In this case, problem Y is the number of substrings divisible by 3, the subproblem X is the number of substrings modulo 3 that terminate at the previous character for each possible mod (that is remained 0, 1, and 2).

If you know that at the previous position, there were 2 substrings that terminated there that had a residue of zero, 3 with a residue of one, and 1 with a residue of two, then given the current number and its residue, it is trivial to determine the residues of all the strings that terminate at the current character.

If the current number's residue is one (e.g., the number is 1, 4, or 7), then the substrings terminating on the previous number with a residue of one now have a residue of two, those with a residue of two now have a residue of zero, and those with a residue of zero now have a residue of one plus 1 more for the current digit since you added a new possible substring of length one.

For example, if you had the string 521438 and you knew the number of strings terminating at the 3 for each residue (2, 2, and 1 respectively for residues 0, 1, and 2), and since 8 mod 3 is 2, you know that all those with residue zero now have residue 2, those with residue two now have residue one, and those with residue one now have residue zero, so (2, 1, and 2 repectively), plus you have a new string of residue 2 so you have 2, 1, and 3 now including the current number.

After you process this in linear time, for all the substrings, you add up the all those with residue zero terminating in all locations.

Here is the code:

// Takes constant space and linear time.
public static void main(String[] args) {

    // You really only need these numbers mod 3.
    int[] s = new int [] { 5,8,1,4,6,2 };
    int left = 3;
    int right = 4;

    int[] found = new int[3];
    int[] last_found = new int[3];
    int sum_found = 0;

    for(int i = left; i <= right; ++i) {

        int res = s[i-1] % 3;

        // leaving the +0 just to show the symmetry.
        // Basically, rotate by the residue and +1 for the new substring.
        // This can be done as a single array, but this is clearer I think.
        // (Also, a proper mathematical modulus would be easier too.)
        found[(res+0) % 3] = last_found[0] + 1;
        found[(res+1) % 3] = last_found[1];
        found[(res+2) % 3] = last_found[2];

        sum_found += found[0];

        // Swap the current and last arrays to make top of the loop simpler.
        int[] swap = last_found;
        last_found = found; 
        found = swap;
    }

    System.out.println( sum_found );
}

Code Edit

The code above removed the table and just keeps track of the last position. It does this with two arrays of length three and swapping between them. It could be done with a single array, but it makes the code more complex (and probably doesn't even perform better in the micro-optimization sense either).

It is now linear time and constant space while obeying the Left and Right requests. It is like a number of other DP algorithms, if you notice each iteration you are only looking back the the Ith-1 iterations, then you can usually elide the full table.

It also keeps track of the sum along with way (now required since the final array doesn't exist anymore either). I didn't fully understand the problem at first, and it appears to have gone through some edits along the way.

Related Topic