Something you need to determine is how 'fair' you want this to be and the performance as the unused space becomes exhausted.
In essence, you want a random box from a N dimensional array. Its not the box itself that is important, but the location of the box that is important.
Under the covers, a 10x20 array is often represented as a 1x200 array with some math behind it to access the right spot. If you were to access [5][13]
, you are actually accessing location [5*20 + 13]
. You can use the same approach to go from a number back to the position. Location 113 goes to the integer devision by 20 and remainder giving 5 r13
.
So now, you don't need to actually store 200 pairs (though that isn't a lot), you just need a bitfield of 200 bits long. Generate a random number within the proper range and mark it as used in the bitfield.
Now, the question of how do you handle it when you've got a collision? This goes to the various hash collision techniques used in hash tables. Some won't work for this application, but its a good read nonetheless.
A simple approach would be once you have one collision, just start incrementing where you are looking at until you find one that wasn't used. The increment could be 1, or any number that is relativity prime to the size of the space.
Ok, I'm at a place I can sit down and write something. Its perl. Shouldn't be too far off of php and is rather straight forward.
#!/usr/bin/perl
my $x = shift @ARGV;
my $y = shift @ARGV;
my $f = shift @ARGV; # fill factor
my $t = $x * $y; # total space
my $v = '';
foreach (1 .. int($t * $f)) {
my $r = int(rand($t)); # random number from 0 .. $t-1
my $yp = int($r / $x); # y' (y prime)
my $xp = int($r % $x); # x' (x prime)
print "Trying $r: $xp $yp...\n";
while(vec($vec, $r, 1)) {
$yp = int($r / $x);
$xp = int($r % $x);
print "\tcollision at $r: $xp $yp\n";
$r += 1;
$r %= $t; # scale $r to within $t
}
$yp = int($r / $x);
$xp = int($r % $x);
vec($vec, $r, 1) = 1; # set the $r th bit
print "\tsettled at $r: $xp $yp\n";
}
Ultimately, the 'settled' values are the ones that you want.
You read two numbers from the command line and assign them to x
and y
. The total search space is t
- this is how big all the possible numbers are. Additionally, read a fill factor as $f
, this should be a value less than 1 and is used to limit the list iteration. Setting a value greater than 1 will present an infinite loop.
I'm filling up the space to the specified value (the foreach (1 .. ($t * $f))
). No, there isn't any error checking on $f
to make sure it is less than or equal to 1, but there should be.
So pick a random number from 0 to $t
- 1. The spot that this represents is $yp
and $xp
- y prime and x prime.
There is some perlish things here, vec works with a bit vector of arbitrary size. There are a number of ways of doing this in a given language, its just rather easy with perl. With Java, one could use a BitSet (this is big enough to hold maxint bits, which could represent a pair of 46340 numbers).
You then test the bit at the $r
th location to see if it has been used. If it has, increment $r
and roll it over if it becomes larger than the total space (so if you have a 10 and 20 (0 .. 199) and you hit 200, it becomes 0).
Once you find an unused bit, set it and output your values.
This is what the output looks like though:
Trying 96: 6 9...
settled at 96: 6 9
Trying 117: 7 11...
collision at 117: 7 11
settled at 118: 8 11
Trying 115: 5 11...
settled at 115: 5 11
Trying 153: 3 15...
settled at 153: 3 15
Trying 90: 0 9...
settled at 90: 0 9
Trying 140: 0 14...
collision at 140: 0 14
collision at 141: 1 14
collision at 142: 2 14
settled at 143: 3 14
Trying 73: 3 7...
settled at 73: 3 7
This is tested on a unix system thus:
% rndpair.pl 10 20 0.5 | grep settled | sort | uniq | wc -l
100
This shows that with 100 numbers there are 100 unique pairs printed out (just look at the 'settled' lines).
There's a fair bit of debugging information in there (you don't really need to keep resetting the value of $yp
and $xp
until the end.
And then there's the question of how fast is it? This is to generate 50% of the available pairs for the available space. Realize that there is some time tied up in the sort and uniq (of possibly some not small bits of text):
% time ./rndpair.pl 500 500 0.5 | grep settled | sort| uniq | wc -l
125000
real 0m3.528s
user 0m3.901s
sys 0m0.045s
Lets kick it up a notch and remove the other applications from the chain.
% time ./rndpair.pl 5000 5000 0.5 > /dev/null
real 0m34.668s
user 0m34.482s
sys 0m0.180s
How much room did that 5k x 5k storage space take? 25,000,000 bits, or about 3 megabytes.
Note that the performance of this drops as the fill factor goes up. For a search space of 6, 80610 (a previous comment if I read it right) this runs quite quickly (note the increasing times as the fill factor goes up):
% time ./rndpair.pl 6 80610 0.5 | grep settled | wc -l
241830
real 0m0.827s
user 0m1.562s
sys 0m0.029s
% time ./rndpair.pl 6 80610 0.75 | grep settled | wc -l
362745
real 0m1.721s
user 0m3.151s
sys 0m0.043s
% time ./rndpair.pl 6 80610 0.9 | grep settled | wc -l
435294
real 0m3.993s
user 0m7.031s
sys 0m0.079s
I created a partitioning program with space usage O(1) but running time O(N^2). You can find the source code here. In the comments there is a good explanation of the shuffling algorithm used.
The key part of this program is the shuffling step, which is the step that takes O(N^2) time. Doc Brown asked "how can you shuffle N elements in less than O(N) space"? I extracted the shuffling logic and created a separate program which is listed below.
To get the full explanation, please refer to the source code linked above. The following is a brief explanation:
The shuffling function simulates a Fisher-Yates shuffle, where you swap the array[0] with array[r], where r is a random number in the range [0..N-1]. Then you swap array[1] with array[r], where r is a new random number in the range [1..N-1]. You keep moving down the array, swapping random elements, until you reach the end of the array.
To use O(1) space, there is no array. Instead, for each new random element that we select, we need to replay the previous swaps in backwards order in order to figure out where the array element really came from. In essence, we pick a random element, and then we undo the swaps that came before it to determine where the original position of the element was. We can replay the previous swaps by simply reseeding the random number generator back to a previously saved seed.
Edit: After posting this solution, I found this stackoverflow question which lists some better ways to create a permutation of N numbers in constant space. So if you substitute one of those solutions in for my shuffling function, you can do better than O(N^2) time and still use O(1) space.
/* Given a number N, shuffle the elements from 0..N-1 and print them. */
/* This algorithm uses O(1) space but uses O(N^2) time. */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>
static void shuffle(int n);
int main(int argc, char *argv[])
{
if (argc < 2) {
printf("Usage: shuffle N\n");
exit(0);
}
shuffle(atoi(argv[1]));
return 0;
}
static void shuffle(int n)
{
uint32_t seedOriginal = time(NULL);
uint32_t seed = 0;
int i = 0;
int j = 0;
int slot = 0;
for (i=0;i<n;i++) {
seed = seedOriginal;
srand(seed);
// Skip n-i-1 random numbers.
for (j=n-i-1;j>0;j--)
rand();
// Select an array slot from [i..n-1].
slot = i + (rand() % (n - i));
// Find out what that slot corresponds to in the original order.
// We do this by backtracking through all the previous steps.
for (j=i-1;j>=0;j--) {
int r = j + (rand() % (n - j));
// Every time we see the slot we are looking for, we switch
// to looking for slot j instead, because at this previous step
// we swapped array[j] with array[slot].
if (r == slot)
slot = j;
}
// Slot is now the correct element we are looking for.
printf("%d\n", slot);
}
}
Best Answer
While the users who complain about
rand()
and recommend better RNGs are right about the quality of the random numbers, they are also missing the bigger picture. Duplicates in streams of random numbers cannot be avoided, they are a fact of life. This is the lesson of the birthday problem.On a grid of 20 * 20 = 400 possible spawn positions, a duplicate spawn point is to be expected (50% probability) even when spawning only 24 entities. With 50 entities (still only 12.5% of the whole grid), the probability of a duplicate is over 95%. You have to deal with collisions.
Sometimes you can draw all samples at once, then you can use a shuffle algorithm to draw
n
guaranteed-distinct items. You just need to generate the list of all possibilities. If the full list of possibilities is too large to store, you can generate spawn positions one at a time as you do now (just with a better RNG) and simply re-generate when a collision occurs. Even though having some collisions is likely, many collisions in a row are exponentially unlikely even if most of the grid is populated.