Php – random unique pair numbers from two range

algorithmsPHPrandom

I want to write a function that return random unique pair numbers every time call it from a range until resetting it.
Something like this:

function randomUniquePairs($ranges, $reset = false){
   if ($reset === false){
      // some code for reseting
   }
   /*
     some code for creating random unique pair numbers
   */
   return $result;  
} 

randomUniquePairs(range(1,10), range(1,20));
/*
  this function returns for example:
   array(2,9)
*/
randomUniquePairs(range(1,10), range(1,20));
/*
  this function returns for example:
   array(5,19)
*/

randomUniquePairs(range(1,10), range(1,20));
/*
  this function returns for example:
   array(5,9)
*/
 //this function returns random unique pairs until we pass reset paramer true

I tried two approaches:

Approach 1
My first attempt was to make an array of all of possible pairs then select from them randomly, but it is very inefficient, because if ranges are so wide, it consumes a lot of memory.
the code:

  class a {
    private $asqar;

    function __construct() ($range) {

        // cycle for ranges
        foreach ($range[0] as  $v1) {
            foreach ($range[1] as $v2) {
                $asqar[] =  array(
                    $v1,
                    $v2,
                );
            }
        }
    }

    function randomUniquePairs($reset = false){

        if ($reset === true){
           $this->asgar = array();
        }

        $rndKey = array_rand($this->asgar);
        $result = $this->asqar[$rndkey];
        unset($this->asqar[$rndkey]);
        return $result;
    }
}

$c = new a(range(1,10), range(1,20));
$c->randomUniquePairs();

Approach 2
My second attempt was to write a function that produce a pair from these ranges, then store it in a variable, every time that function calls after producing a pair , it checks if this pair produced before call function recursively, it continues until it produce a unique pair.
Here's the code:

class a{ 

    private $__uniqueVariables = array();

    public function randomUniquePairs($range, $reset = false) {

        if ($reset === true){
       $this->__uniqueVariables = array();
    }

    // cycle for each value
    foreach ($range as $value) {

        // shuffle
        shuffle($value);

        // selected id
        $selectedId[] = $value[0];
    }

    // check for selected variable
    if (in_array($rndUnqNum, $this->__uniqueVariables)) {

        // return for try again
        return $this->uniqueVariable($range);
    }

    // added to current unique variables
    $this->__uniqueVariables[] = $rndUnqNum;

    // return
    return $rndUnqNum;
}

}

but this second approach has an issue that sometimes throw Fatal error: Maximum function nesting level of '100' reached.

I'm looking for a better approach to my algorithm.

Best Answer

Something you need to determine is how 'fair' you want this to be and the performance as the unused space becomes exhausted.

In essence, you want a random box from a N dimensional array. Its not the box itself that is important, but the location of the box that is important.

Under the covers, a 10x20 array is often represented as a 1x200 array with some math behind it to access the right spot. If you were to access [5][13], you are actually accessing location [5*20 + 13]. You can use the same approach to go from a number back to the position. Location 113 goes to the integer devision by 20 and remainder giving 5 r13.

So now, you don't need to actually store 200 pairs (though that isn't a lot), you just need a bitfield of 200 bits long. Generate a random number within the proper range and mark it as used in the bitfield.

Now, the question of how do you handle it when you've got a collision? This goes to the various hash collision techniques used in hash tables. Some won't work for this application, but its a good read nonetheless.

A simple approach would be once you have one collision, just start incrementing where you are looking at until you find one that wasn't used. The increment could be 1, or any number that is relativity prime to the size of the space.


Ok, I'm at a place I can sit down and write something. Its perl. Shouldn't be too far off of php and is rather straight forward.

#!/usr/bin/perl

my $x = shift @ARGV;
my $y = shift @ARGV;
my $f = shift @ARGV; # fill factor
my $t = $x * $y;    # total space
my $v = '';

foreach (1 .. int($t * $f)) {
    my $r = int(rand($t));  # random number from 0 .. $t-1
    my $yp = int($r / $x); # y' (y prime)
    my $xp = int($r % $x); # x' (x prime)

    print "Trying $r: $xp $yp...\n";
    while(vec($vec, $r, 1)) {
        $yp = int($r / $x);
        $xp = int($r % $x);
        print "\tcollision at $r: $xp $yp\n";
        $r += 1;
        $r %= $t;   # scale $r to within $t
    }
    $yp = int($r / $x);
    $xp = int($r % $x);
    vec($vec, $r, 1) = 1; # set the $r th bit
    print "\tsettled at $r: $xp $yp\n";
}

Ultimately, the 'settled' values are the ones that you want.

You read two numbers from the command line and assign them to x and y. The total search space is t - this is how big all the possible numbers are. Additionally, read a fill factor as $f, this should be a value less than 1 and is used to limit the list iteration. Setting a value greater than 1 will present an infinite loop.

I'm filling up the space to the specified value (the foreach (1 .. ($t * $f))). No, there isn't any error checking on $f to make sure it is less than or equal to 1, but there should be.

So pick a random number from 0 to $t - 1. The spot that this represents is $yp and $xp - y prime and x prime.

There is some perlish things here, vec works with a bit vector of arbitrary size. There are a number of ways of doing this in a given language, its just rather easy with perl. With Java, one could use a BitSet (this is big enough to hold maxint bits, which could represent a pair of 46340 numbers).

You then test the bit at the $rth location to see if it has been used. If it has, increment $r and roll it over if it becomes larger than the total space (so if you have a 10 and 20 (0 .. 199) and you hit 200, it becomes 0).

Once you find an unused bit, set it and output your values.

This is what the output looks like though:

Trying 96: 6 9...
        settled at 96: 6 9
Trying 117: 7 11...
        collision at 117: 7 11
        settled at 118: 8 11
Trying 115: 5 11...
        settled at 115: 5 11
Trying 153: 3 15...
        settled at 153: 3 15
Trying 90: 0 9...
        settled at 90: 0 9
Trying 140: 0 14...
        collision at 140: 0 14
        collision at 141: 1 14
        collision at 142: 2 14
        settled at 143: 3 14
Trying 73: 3 7...
        settled at 73: 3 7

This is tested on a unix system thus:

% rndpair.pl 10 20 0.5 | grep settled | sort | uniq | wc -l
     100

This shows that with 100 numbers there are 100 unique pairs printed out (just look at the 'settled' lines).

There's a fair bit of debugging information in there (you don't really need to keep resetting the value of $yp and $xp until the end.

And then there's the question of how fast is it? This is to generate 50% of the available pairs for the available space. Realize that there is some time tied up in the sort and uniq (of possibly some not small bits of text):

% time ./rndpair.pl 500 500 0.5 | grep settled | sort| uniq | wc -l
  125000

real    0m3.528s
user    0m3.901s
sys     0m0.045s

Lets kick it up a notch and remove the other applications from the chain.

% time ./rndpair.pl 5000 5000 0.5 > /dev/null

real    0m34.668s
user    0m34.482s
sys     0m0.180s

How much room did that 5k x 5k storage space take? 25,000,000 bits, or about 3 megabytes.

Note that the performance of this drops as the fill factor goes up. For a search space of 6, 80610 (a previous comment if I read it right) this runs quite quickly (note the increasing times as the fill factor goes up):

% time ./rndpair.pl 6 80610 0.5 | grep settled | wc -l
  241830

real    0m0.827s
user    0m1.562s
sys     0m0.029s
% time ./rndpair.pl 6 80610 0.75 | grep settled | wc -l
  362745

real    0m1.721s
user    0m3.151s
sys     0m0.043s
% time ./rndpair.pl 6 80610 0.9 | grep settled | wc -l
  435294

real    0m3.993s
user    0m7.031s
sys     0m0.079s
Related Topic