Pulling the binary expansion of pi from http://www.befria.nu/elias/pi/binpi.html (binary digits are easier to convert into bytes than decimal digits) and running it through ent, I get the following analysis of the random distribution of the bytes:
Entropy = 7.954093 bits per byte.
Optimum compression would reduce the size of this 4096 byte file by 0 percent.
Chi square distribution for 4096 samples is 253.00, and randomly would exceed this value 52.36 percent of the times.
Arithmetic mean value of data bytes is 126.6736 (127.5 = random).
Monte Carlo value for Pi is 3.120234604 (error 0.68 percent).
Serial correlation coefficient is 0.028195 (totally uncorrelated = 0.0).
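The entropy figure ent reports can be reproduced independently. Here is a small Python sketch (my addition, not part of ent) that computes the Shannon entropy of a byte string in bits per byte:

```python
from collections import Counter
from math import log2

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (maximum 8.0)."""
    counts = Counter(data)
    n = len(data)
    # sum of p * log2(1/p) over each byte value's observed probability p
    return sum((c / n) * log2(n / c) for c in counts.values())

# A perfectly uniform byte distribution scores the maximum 8 bits per byte,
# while a constant string scores 0.
print(byte_entropy(bytes(range(256)) * 16))  # 8.0
print(byte_entropy(b"\x00" * 4096))          # 0.0
```

Random data approaches 8.0 without quite reaching it on a finite sample, which is why ent reports 7.954093 rather than 8 for the 4096 pi bytes.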
So yes, using pi for random data would give you fairly random data... bearing in mind that it is well-known random data.
From a comment above...
Depending on what you are doing, but I think you can use the decimals
of the square root of any prime number as a random number generator.
These should at least have evenly distributed digits. – Paxinum
So, I computed the square root of 2 in binary to run the same set of tests. Using Wolfram's Iteration I wrote a simple Perl script:
#!/usr/bin/perl
use strict;
use warnings;
use Math::BigInt;

# Wolfram's Iteration for the binary digits of sqrt(2):
# v accumulates the binary digits produced so far.
# Note: Math::BigInt's bmul/bsub/badd mutate in place, so no
# temporary variables are needed.
my $u = Math::BigInt->new("2");
my $v = Math::BigInt->new("0");
my $i = 0;
while (1) {
    if ($u->bcmp($v) != 1) {   # u <= v
        $u->bmul(4);
        $v->bmul(2);
    } else {                   # u > v
        $u->bsub($v)->bsub(1)->bmul(4);
        $v->badd(2)->bmul(2);
    }
    # print $i, " ", $v, "\n";
    last if $i++ > 10000;
}

open(my $bits, '>', 'bits.txt') or die "cannot open bits.txt: $!";
print $bits $v->as_bin();
close($bits);
Running this for the first 10 iterations matched A095804, so I was confident I had the right sequence. The value v_n, when written in binary with the binary point placed after the first digit, gives an approximation of the square root of 2.
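As a cross-check (my addition, not part of the original test), the terms of A095804 are just floor(2^n * sqrt(2)), which can be computed exactly with integer square roots:

```python
from math import isqrt

# v_n = floor(2^n * sqrt(2)) = isqrt(2 * 4^n), computed in exact
# integer arithmetic, so there is no floating-point error.
terms = [isqrt(2 * 4**n) for n in range(10)]
print(terms)  # [1, 2, 5, 11, 22, 45, 90, 181, 362, 724]
```

These are the first ten terms of A095804, matching the iteration's output.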
Using ent against this binary data produces:
Entropy = 7.840501 bits per byte.
Optimum compression would reduce the size of this 1251 byte file by 1 percent.
Chi square distribution for 1251 samples is 277.84, and randomly would exceed this value 15.58 percent of the times.
Arithmetic mean value of data bytes is 130.0616 (127.5 = random).
Monte Carlo value for Pi is 3.153846154 (error 0.39 percent).
Serial correlation coefficient is -0.045767 (totally uncorrelated = 0.0).
If by "best" you actually mean "fastest", then by far the fastest way (although not nearly the most efficient way) is to choose a multiplier that makes all of the weights integers, at least to whatever precision you care about, and then store that many copies of each in one large array.
For example, if you assign "score multiplier" a weight of 80%, and "extra life" a weight of 20%, then create a 5-element array with 4 references to the "score multiplier" object/variable and 1 reference to the "extra life". Then generate a random number between 1 and 5 and pick from the corresponding index. Distribution doesn't matter.
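A minimal Python sketch of that layout (the item names are from the example above, not a real API):

```python
import random

# 80% score multiplier, 20% extra life -> integer weights 4 and 1
weighted_items = [("score multiplier", 4), ("extra life", 1)]

# Build the expanded pool: each item appears once per unit of weight.
pool = [item for item, weight in weighted_items for _ in range(weight)]

# A uniform pick from the pool now reproduces the weighted distribution.
drop = random.choice(pool)
```

Generating the random index is all the work done per pick, which is what makes this the fastest option despite the memory cost.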
In practice, unless you've got tons of memory and a very slow CPU, this is a waste of memory. It's far, far simpler and more efficient to write a bunch of if statements comparing a random number to a low/high bound, and if the number of possible items isn't that big (let's say fewer than 10?) then it shouldn't be that hard to maintain.
If you've got hundreds or thousands of possible items/power-ups/whatever then this is no longer a very maintainable solution. In that case the best option I know of is to maintain a sorted list of tuples with the key as the minimum weight; you don't need to store the maximum as that's implied by the next item. For example: [ [0, scoreMultiplier], [80, extraLife], [95, invincibility] ]. Then you can do a binary search on this list to find the appropriate item corresponding to a random number in O(log N) time.
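A sketch of that sorted-list lookup in Python, using the standard library's bisect for the O(log N) search (item names follow the example above):

```python
from bisect import bisect_right

# Sorted (minimum weight, item) tuples; each entry's maximum is implied
# by the next entry's minimum.
table = [(0, "scoreMultiplier"), (80, "extraLife"), (95, "invincibility")]
thresholds = [minimum for minimum, _ in table]

def item_for(n):
    """Return the item whose [min, next-min) range contains n, 0 <= n < 100."""
    # bisect_right gives the insertion point after equal keys, so
    # subtracting 1 yields the last threshold <= n.
    return table[bisect_right(thresholds, n) - 1][1]

print(item_for(42))  # scoreMultiplier
print(item_for(80))  # extraLife
print(item_for(97))  # invincibility
```

The table itself is the only per-item storage, so this stays compact no matter how many items there are.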
Best Answer
The conceptually simplest solution would be to create a list where each element occurs as many times as its weight, so for an apple with weight 4, an orange with weight 2 and a lemon with weight 1:

[apple, apple, apple, apple, orange, orange, lemon]
Then use whatever functions you have at your disposal to pick a random element from that list (e.g. generate a random index within the proper range). This is of course not very memory efficient and requires integer weights.
Another, slightly more complicated approach would look like this:
1. Calculate the cumulative sums of the weights: [4, 6, 7], where an index of below 4 represents an apple, 4 to below 6 an orange and 6 to below 7 a lemon.
2. Generate a random number n in the range of 0 to sum(weights).
3. Find the first cumulative sum that is greater than n. The corresponding fruit is your result.

This approach requires more complicated code than the first, but less memory and computation, and supports floating-point weights.
For either algorithm, the setup-step can be done once for an arbitrary number of random selections.
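The cumulative-sum approach can be sketched in Python as follows (fruit names and weights taken from the example above):

```python
import random
from bisect import bisect_right
from itertools import accumulate

fruits = ["apple", "orange", "lemon"]
weights = [4, 2, 1]

# Setup step: done once, reused for every selection.
cumulative = list(accumulate(weights))  # [4, 6, 7]

def pick(n):
    """Map a number 0 <= n < sum(weights) to its fruit."""
    # bisect_right finds the first cumulative sum greater than n.
    return fruits[bisect_right(cumulative, n)]

def random_fruit():
    # random.random() is strictly below 1, so n stays below sum(weights).
    return pick(random.random() * cumulative[-1])

print(pick(3.2))  # apple   (below 4)
print(pick(4))    # orange  (4 to below 6)
print(pick(6.5))  # lemon   (6 to below 7)
```

Because the weights only need to sum correctly relative to each other, this version works unchanged with floating-point weights.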