Fast indexing of k-combinations

algorithmscombinatorics

I'm revisiting an old problem that I was working on some time ago.

A typical scenario is "3 bits are set within an 8 bit integer", i.e. 00000111.

All unique combinations with 3 set bits can easily be generated (in order) by nested loops. What I'm interested in is the mapping index <-> combination, i.e. "00001011" would be the second combination (or value "1" in a zero-based index).

So far, I ran through all combinations and stored them in a table, making the look up index -> conversation a O(1) operation. The other direction is O(ln(n)) with bisect search.

The downside, however, is that this obviously is heavy on the memory if we increase the domain, up to a point where it's not feasible.

What would be a simple way to calculate the n.th combination or the index for a given combination? Order of combinations would be nice, but is not mandatory.

Best Answer

Generating the n-th combination is called a "unranking" algorithm. Note that permutations and combinations can often equated by the way the problem is parameterized. Without knowing exactly what the problem is, it is difficult to recommend the exact right approach, and in fact, for most combinatoric problems there are usually several different ranking/unranking algorithms possible.

One good resource is "Combinatorial Algorithms" by Kreher and Stinson. This book has many good ranking and unranking algorithms clearly explained. There are more advanced resources, but I would recommend Kreher as a starting point. As an example of an unranking algorithm consider the following:

/** PKSUL : permutation given its rank, the slots and the total number of items
 *  A combinatorial ranking is number of the permutation when sorted in lexicographical order
 *  Example:  given the set { 1, 2, 3, 4 } the ctItems is 4, if the slot count is 3 we have:
 *     1: 123    7: 213   13: 312   19: 412
 *     2: 124    8: 214   14: 314   20: 413
 *     3: 132    9: 231   15: 321   21: 421
 *     4: 134   10: 234   16: 324   22: 423
 *     5: 142   11: 241   17: 341   23: 431
 *     6: 143   12: 243   18: 342   24: 432
 *  From this we can see that the rank of { 2, 4, 1 } is 11, for example. To unrank the value of 11:
 *       unrank( 11 ) = { 11 % (slots - digit_place)!, unrank( remainder ) }
 * @param rank           the one-based rank of the permutation
 * @param ctItems        the total number of items in the set
 * @param ctSlots        the number of slots into which the permuations are generated
 * @param zOneBased      whether the permutation array is one-based or zero-based
 * @return               zero- or one-based array containing the permutation out of the set { ctItems, 1,...,ctItems }
 */
public static int[] pksul( final int rank, final int ctItems, final int ctSlots, boolean zOneBased ){
    if( ctSlots <= 0 || ctItems <= 0 || rank <= 0 ) return null;
    long iFactorial = factorial_long( ctItems - 1 ) / factorial_long( ctItems - ctSlots );
    int lenPermutation = zOneBased ? ctSlots + 1 : ctSlots;
    int[] permutation = new int[ lenPermutation ];
    int[] listItemsRemaining = new int[ ctItems + 1 ];
    for( int xItem = 1; xItem <= ctItems; xItem++ ) listItemsRemaining[xItem] = xItem; 
    int iRemainder = rank - 1;
    int xSlot = 1;
    while( true ){
        int iOrder = (int)( iRemainder / iFactorial ) + 1;
        iRemainder = (int)( iRemainder % iFactorial );
        int iPlaceValue = listItemsRemaining[ iOrder ];
        if( zOneBased ){
            permutation[xSlot] = iPlaceValue;
        } else {
            permutation[xSlot - 1] = iPlaceValue;
        }
        for( int xItem = iOrder; xItem < ctItems; xItem++ ) listItemsRemaining[xItem] = listItemsRemaining[xItem + 1]; // shift remaining items to the left
        if( xSlot == ctSlots ) break;
        iFactorial /= ( ctItems - xSlot );
        xSlot++;
    }
    if( zOneBased ) permutation[0] = ctSlots;
    return permutation;
}

This is permutation unranking, but as mentioned above, in many cases you can convert a combination unranking into an equivalent permutation problem.

Related Solutions

Algorithm – Create Single Integer Index from Two Integers

As your least-significant index (Y) is bounded, to get the right sort order, you can just do index = X*10 + Y and order index on its numerical value.

If the overall index is too large to fit into an integer value, you can simulate numerical ordering by padding the shorter numbers with leading zeros until all values in the comparison are the same length.

Algorithms – Compute k-Permutation Without Repeating and No Duplicates

Unless I'm mistaken:

There are 20^5 = 3,200,000 ways to pick 5 numbers out of 20 (with repeating, with duplicates)
If you don't allow for "duplicates" (picking the same number more than once), you're left with n! / (n - k)!, in this case 20! / (20 - 5)! = 1,860,480. These are k-permutations.
If, then, you don't allow for "repeating" (having collections with the same elements only in a different order), you're left with n! / ((n - k)! * k!), in this case 20! / ((20 - 5)! * 5!) = 15,504. These are called k-combinations. These are what you seem to be looking for.

If you are only looking for combinations of size 5 and want access to the entire set of combinations immediately, then I'd say then 5 nested loop approach is fine, and I don't think you'll be able to do anything faster. You just need to modify your idea a bit so you wont need to check for uniqueness or duplicate numbers and don't do unnecessary iterations. You can produce (only) ordered combinations by doing something like:

public Set<Set<Integer>> all5CombinationsOf20(){
    Set<Set<Integer>> result = new HashSet<Set<Integer>>();
    for(int i = 1; i <= 16; i++){
        for(int j = i+1; j <= 17; j++){
            for(int k = j+1; k <= 18; k++){
                for(int l = k+1; l <= 19; l++){
                    for(int m = l+1; m <= 20; m++){
                        result.add(new HashSet<Integer>(Arrays.asList(i,j,k,l,m)));
                    }
                }
            }
        }
    }
    return result;
}

Since this gives you the combinations in a ordered way [1, 2, 3, 4, 5] is produced, but [3, 2, 1, 5, 4] and [1, 1, 2, 3, 4] are not.

However this approach is not extendable to combinations of arbitrary size, then you would need recursion. I once wrote this:

public static <T> Set<Set<T>> combinations(Set<T> alphabet, int k) {
    if (alphabet.size() == 0) {
        throw new IllegalArgumentException("alphabet.size must be > 0");
    }
    if (k < 0) {
        throw new IllegalArgumentException("k must be >= 0");
    }
    if (k == 0) {
        Set<Set<T>> result = new HashSet<Set<T>>();
        result.add(new HashSet<T>());
        return result;
    } else {
        Set<Set<T>> result = new HashSet<Set<T>>();
        for (T e : alphabet) {
            Set<T> newAlphabet = new HashSet<T>(alphabet);
            newAlphabet.remove(e);
            Set<Set<T>> smallerCombinations = combinations(newAlphabet, k - 1);
            for (Set<T> smallerCombination : smallerCombinations) {
                Set<T> combination = new HashSet<T>(smallerCombination);
                combination.add(e);
                result.add(combination);
            }
        }

        return result;
    }
}

This is not necessarily fast since you are messing around the alphabet a lot, but the idea should be clear: to make a combination of size n over a certain alphabet (in your case 1..20), remove an element e from the alphabet, make a combination of size n-1 over the alphabet minus e and return the combination with e added. Reusing the same alphabet for smaller combinations an letting the HashSet get rid of duplicates would be faster.

Also these sets of combinations get large pretty fast, so you might want to be lazy and not hold them all in memory (depending one how you want to use them), if so, check https://stackoverflow.com/questions/127704/algorithm-to-return-all-combinations-of-k-elements-from-n

Best Answer

Related Solutions

Algorithm – Create Single Integer Index from Two Integers

Algorithms – Compute k-Permutation Without Repeating and No Duplicates

Related Topic