The code in the question (a linear search), as you rightly point out, is going to get slow for large float arrays. Technically it's O(n) where n is the number of float values in your array.
In general, the best that you can do to find a value in an ordered array is a recursive tree search of some kind (e.g. binary search), in which case you can achieve an O(log n) lookup time in the number of elements in your array. O(log n) is much better than O(n) for large values of n.
My suggested approach would therefore be a simple binary search of the array, i.e.:
- Set min/max integer indexes to cover your whole float array
- test the value in the middle of the range, at index mid = (min+max)/2 (or mid = min + (max-min)/2 if the indexes might overflow), against the search value x
- if x is lower than this value, set max to mid; if it is higher, set min to mid+1; if it is equal, you have found it
- repeat the previous two steps until you find the value or the range becomes empty
This is an O(log n) algorithm which should be fast enough for nearly all situations. Intuitively, it works by halving the range to be searched at each step until you find the correct value.
It's really hard to beat the simple binary search, so if you have already implemented it correctly then you may be pretty close to optimal already. However, if you know the distribution of the data and/or have a limited range of lookup values (x), there are some more advanced tricks you can try:
- Bucketing - create buckets (e.g. one for each interval between two integers), each containing a smaller sorted list of the float values between the two bounding integers, plus the values immediately below and immediately above each range. You can then start your search at the bucket around (trunc(x)+0.5). This should give you a good speedup if you choose appropriately sized buckets (it effectively increases the branching factor of the tree). If integers don't work for you, try buckets at some other fixed-point precision (e.g. multiples of 1/16).
- Bit-mapping - if the range of possible lookup values is small enough, you could try creating a big lookup table indexed by the bit pattern of x. This would be O(1), but it may need a lot of memory, which will be very unfriendly to your cache, so use with caution. It is especially nasty here because you are looking up float values: covering all of the less significant bits could easily take several GB.
- Rounding and hashing - hash tables are probably not the best data structure for this problem, but if you can survive losing a little accuracy they can work: simply round off the lowest bits of your lookup values and use a hashmap to look up the correct value directly. You will have to experiment to find the right trade-off between hashmap size and precision, and also ensure that every possible hash value is populated, so this can be a bit tricky.
- Tree-balancing - your ideal tree has a 50% chance of going left or right at each node. So if you build the tree based on the distribution of lookup values (x), you can optimise it to produce answers with the minimal number of tests. This is likely to be a good solution if a lot of values in your float array are very close together, since it will let you avoid searching those branches too often.
- Crit-bit trees - these are still trees (so still O(log n)...), but they can beat a plain binary search in some cases; you would, however, need to convert your floats into some fixed-point format in order to make the bitwise comparisons work
However, unless you are in a very special situation I'd probably recommend sticking with the simple binary search. Reasons:
- it's much easier to implement
- it's very fast for most common cases
- the extra overhead of the more complex approaches (e.g. higher memory usage / cache pressure) often outweighs the minor theoretical gains
- it will be more robust to future changes in the data distribution
I created a partitioning program with space usage O(1) but running time O(N^2). You can find the source code here. In the comments there is a good explanation of the shuffling algorithm used.
The key part of this program is the shuffling step, which is the step that takes O(N^2) time. Doc Brown asked "how can you shuffle N elements in less than O(N) space"? I extracted the shuffling logic and created a separate program which is listed below.
To get the full explanation, please refer to the source code linked above. The following is a brief explanation:
The shuffling function simulates a Fisher-Yates shuffle, where you swap the array[0] with array[r], where r is a random number in the range [0..N-1]. Then you swap array[1] with array[r], where r is a new random number in the range [1..N-1]. You keep moving down the array, swapping random elements, until you reach the end of the array.
To use O(1) space, there is no array. Instead, for each new random element that we select, we need to replay the previous swaps in backwards order in order to figure out where the array element really came from. In essence, we pick a random element, and then we undo the swaps that came before it to determine where the original position of the element was. We can replay the previous swaps by simply reseeding the random number generator back to a previously saved seed.
Edit: After posting this solution, I found this stackoverflow question which lists some better ways to create a permutation of N numbers in constant space. So if you substitute one of those solutions in for my shuffling function, you can do better than O(N^2) time and still use O(1) space.
/* Given a number N, shuffle the elements from 0..N-1 and print them. */
/* This algorithm uses O(1) space but uses O(N^2) time. */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>
static void shuffle(int n);
int main(int argc, char *argv[])
{
    if (argc < 2) {
        printf("Usage: shuffle N\n");
        exit(0);
    }
    shuffle(atoi(argv[1]));
    return 0;
}

static void shuffle(int n)
{
    uint32_t seedOriginal = time(NULL);
    uint32_t seed = 0;
    int i = 0;
    int j = 0;
    int slot = 0;

    for (i = 0; i < n; i++) {
        seed = seedOriginal;
        srand(seed);
        // Skip n-i-1 random numbers.
        for (j = n - i - 1; j > 0; j--)
            rand();
        // Select an array slot from [i..n-1].
        slot = i + (rand() % (n - i));
        // Find out what that slot corresponds to in the original order.
        // We do this by backtracking through all the previous steps.
        for (j = i - 1; j >= 0; j--) {
            int r = j + (rand() % (n - j));
            // Every time we see the slot we are looking for, we switch
            // to looking for slot j instead, because at this previous step
            // we swapped array[j] with array[slot].
            if (r == slot)
                slot = j;
        }
        // Slot is now the correct element we are looking for.
        printf("%d\n", slot);
    }
}
To search for a "sweet spot" such as this, you generally use a priority queue. The idea is that you keep your candidates ordered by a specific criterion: the more promising a candidate is, the sooner it is taken from the priority queue and expanded to search for more solutions.
In this case, a solution would be a list of indices, each referring to a number in your array; so for the solution 2*7 you'd have the array [1,2]. You could also use a binary number with n bits if you prefer. To decide which candidate is best, the priority queue multiplies together all the numbers referred to by the indices. The sweet spot is the product around which you are likely to find the most solutions, so in this case it is the average of the maximum and the minimum, 1750.
The algorithm goes as follows:
Add each number in its own solution array and add it to the priority queue.
Then, starting with the first (the one closest to (maximum+minimum)/2), remove it from the priority queue, and for each index not already included in the solution, add that index to a copy of the solution and reinsert the copy into the priority queue.
If the current product times a new number exceeds the maximum, don't even add it to the queue. If the product is less than the minimum, don't record it; just reinsert it so it can be extended further.
If it is between the minimum and the maximum, record it as a solution.
Continue until the queue is empty and you have them all.
In this way, you concentrate your search on values near your target. You should get most of your solutions early this way, but to guarantee you have found them all, you still have to cover the entire solution space of 2^n - 1 non-empty subsets.
Let's do a quick test on a small set of values [1,2,3,4], looking for all solutions whose products are between 10 and 12. I'd start by adding each number in its own solution and putting them all in the priority queue. The priority queue would then look like: [[4],[3],[2],[1]].
I take the first, [4], append 3, 2, and 1 to it in three new solutions, and add those to the priority queue. The priority queue then becomes: [[4,3],[4,2],[4,1],[3],[2],[1]].
I see that I've already got my first solution, [4, 3]! Let's continue and see what happens. Notice that [4, 3, 2] was too large (4*3*2 = 24 > 12) and was not added to the queue, but I found another solution, [4, 3, 1]. If something isn't clear, ask and I'll try to clarify.
Hope that helps.