Generate Unique Random 4-Digit Number from Another 4-Digit Number


I have a database that has two columns. The first column is an index, the second is the path to a data file. There are two types of data files, X and Y. These data files are then processed and graphs are created from them. So some examples of the rows look like this:

ID   | FilePath
0001 | /X/datafile1wfre.dat
0023 | /X/datafile89_jncd.dat
2349 | /Y/datafile983jew_un.dat
3984 | /Y/datafileindj389.dat

I then take this table, choose a random row from it, and show the graph of that data file to the user. After they have spent some time looking at the graph, I ask them: do you think this data file is X or Y?

Let's say someone looks at a graph and would like to view it again later. I would then give them the ID of the row. Note: there are ~4000 entries in the table.

Here's the issue: because of the way the file paths were added to the table, the first half of the table holds the paths of X (IDs 0001–2000) and the second half holds the paths of Y (IDs 2001–4000). Someone could easily figure this out, and once they see the ID they could predict X or Y just from whether the ID is above or below 2000.

Here's my goal: I would like an algorithm that takes a 4-digit number A and produces another (different) 4-digit number B. I want B to be unique to A: no 4-digit number other than A can produce B. Here's an example:

0239 would create 9834
7783 would create 3892

9834 is unique to 0239: no matter which 4-digit number you start from, the only way to get 9834 is from 0239. Likewise, the only way to get 3892 is from 7783.

This way, I can give the user the 4-digit number generated by the algorithm without them ever seeing the real ID from the table.

Best Answer

There are fundamentally different solutions to this problem as you've presented it.

A truly random mapping from number A (private ID) to number B (public ID) can be created if you have an independent source of randomness. Every time you create a row and are assigned number A, read from the random source to create number B. To ensure B is unique you will have to search all existing Bs before assigning it. This would be the hardest for anyone to reverse engineer; it's basically what cryptography calls a one-time pad. It also becomes increasingly expensive as you get closer to fully populating the space you've allowed. Eventually there is only 1 number left to assign as a B: you have to wait to hit it randomly, and you have to search to prove uniqueness on every attempt.
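A minimal Python sketch of this first approach, assuming an in-memory set stands in for the database search of existing Bs (`assign_public_id` and `ID_SPACE` are hypothetical names). It draws from `secrets.randbelow` and rejects any value already taken:

```python
import secrets

ID_SPACE = 10_000  # every 4-digit value, 0000-9999 (hypothetical constant)


def assign_public_id(used: set) -> int:
    """Pick a random, not-yet-used 4-digit public ID B.

    `used` is a hypothetical stand-in for searching the Bs already in the
    database; the new B is added to it before returning.
    """
    if len(used) >= ID_SPACE:
        raise RuntimeError("all 4-digit IDs are taken")
    while True:
        candidate = secrets.randbelow(ID_SPACE)  # uniform over 0..9999
        if candidate not in used:                # uniqueness check on every attempt
            used.add(candidate)
            return candidate


# Usage: give each private ID A a random public ID B and remember the mapping.
used_bs: set = set()
private_to_public = {a: assign_public_id(used_bs) for a in range(1, 4001)}
print(f"{private_to_public[239]:04d}")  # zero-padded 4-digit public ID
```

With n Bs already taken, each draw succeeds with probability (10000 - n)/10000, so the expected number of draws is 10000/(10000 - n): about 1.7 when 4000 are used, but about 10000 for the very last free value, which is the slowdown described above.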

A fixed transformation of number A into B by a function. This avoids becoming prohibitive even when fully populating the space, but it risks the user guessing the algorithm. This can be mitigated if, instead of simply using a hash (which, squeezed into 4 digits, also can't guarantee that no two As share a B), you encrypt number A. There are encryption algorithms that produce ciphertext the same size as the plaintext and take a cryptovariable (key). Done this way, it wouldn't matter if they guessed how you created B so long as the cryptovariable (key) is still secret. You would want to use a format-preserving encryption. This also gives you the ability to recover A from B by decrypting, but if you index B in the database this shouldn't be needed.
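As a rough illustration only, here is a toy keyed permutation of 0000-9999 in Python, built as a Feistel network over two base-100 halves with HMAC-SHA256 as the round function. This is an assumption-laden sketch of the idea, not a vetted format-preserving encryption scheme such as NIST's FF1; the round count, key, and function names are hypothetical. Because every Feistel round is invertible, the whole map is a bijection, so each B comes from exactly one A and decryption recovers A:

```python
import hashlib
import hmac

ROUNDS = 10  # hypothetical round count
HALF = 100   # a 4-digit number is split into two base-100 halves


def _round_value(key: bytes, rnd: int, half: int) -> int:
    """Pseudorandom round function: HMAC-SHA256 of (round, half), reduced mod 100."""
    msg = bytes([rnd]) + half.to_bytes(2, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % HALF


def encrypt4(key: bytes, a: int) -> int:
    """Map 0..9999 onto 0..9999 bijectively under `key` (toy Feistel network)."""
    left, right = divmod(a, HALF)
    for rnd in range(ROUNDS):
        left, right = right, (left + _round_value(key, rnd, right)) % HALF
    return left * HALF + right


def decrypt4(key: bytes, b: int) -> int:
    """Invert encrypt4 by undoing the rounds in reverse order."""
    left, right = divmod(b, HALF)
    for rnd in reversed(range(ROUNDS)):
        left, right = (right - _round_value(key, rnd, left)) % HALF, left
    return left * HALF + right


KEY = b"replace-with-a-real-secret"  # hypothetical key; keep it out of source control
b = encrypt4(KEY, 239)
print(f"{b:04d}", decrypt4(KEY, b))
```

Note that 10,000 values is a very small domain; if you use a standardized FPE mode instead of a toy like this, check its minimum domain-size requirements.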

If you feel that is overkill, you could look into a shuffle that simply obfuscates A. This risks them guessing the shuffle unless it also uses a cryptovariable.
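One very simple shuffle, sketched in Python under the assumption that weak obfuscation is acceptable, is the affine map B = (a*A + c) mod 10000 with a coprime to 10000. The constants 7919 and 4242 are arbitrary examples, and `make_shuffle` is a hypothetical helper. It is a bijection, but it also shows the risk: consecutive As map to Bs that differ by exactly a (mod 10000), so anyone who learns a few (A, B) pairs can solve for a and c.

```python
import math

MOD = 10_000  # size of the 4-digit space


def make_shuffle(a: int, c: int):
    """Return (forward, inverse) for the affine map B = (a*A + c) mod 10000.

    a must be coprime with 10000 (odd and not a multiple of 5), otherwise
    two different As would collide on the same B.
    """
    if math.gcd(a, MOD) != 1:
        raise ValueError("a must be coprime with 10000")
    a_inv = pow(a, -1, MOD)  # modular inverse; needs Python 3.8+

    def forward(x: int) -> int:
        return (a * x + c) % MOD

    def inverse(y: int) -> int:
        return (a_inv * (y - c)) % MOD

    return forward, inverse


forward, inverse = make_shuffle(7919, 4242)  # arbitrary example constants
print(f"{forward(239):04d}")                # 0239 -> 6883
```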

It's also worth considering whether A is even still needed. If the only thing A provides is a unique identifier, then there is no point in being able to convert back to A and no reason to store A in the database when B is all that is needed. All you have to do is randomize the auto-increment ID while keeping it unique, because that gives you B to start with. Some DBs already provide this. This way you have unique IDs that don't predict X or Y, and you avoid an unneeded level of indirection.
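A minimal sketch of that idea in Python with SQLite (an assumption; the answer doesn't name a database): skip the sequential key entirely and insert each row under a distinct random 4-digit ID drawn up front. The table name, function name, and example paths are hypothetical.

```python
import random
import sqlite3


def build_table(paths, id_space=10_000):
    """Store each file path under a distinct random ID in 0..id_space-1.

    Drawing all IDs with sample() guarantees uniqueness without retries, and
    the ID order carries no information about whether a path is X or Y.
    """
    if len(paths) > id_space:
        raise ValueError("more rows than available 4-digit IDs")
    rng = random.SystemRandom()                    # OS entropy, not a seeded PRNG
    ids = rng.sample(range(id_space), len(paths))  # distinct, in random order
    conn = sqlite3.connect(":memory:")             # hypothetical in-memory DB
    conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT NOT NULL)")
    conn.executemany("INSERT INTO files (id, path) VALUES (?, ?)", zip(ids, paths))
    conn.commit()
    return conn


paths = [f"/X/datafile{i}.dat" for i in range(2000)] + \
        [f"/Y/datafile{i}.dat" for i in range(2000)]
conn = build_table(paths)
for row_id, path in conn.execute("SELECT id, path FROM files LIMIT 3"):
    print(f"{row_id:04d} | {path}")
```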

Related Solutions

Algorithms – Better Approaches to Shortest Path Finding in Traffic Networks

Dijkstra's algorithm finds the shortest path between a given node and all other nodes, so for a single source-to-destination query I expect it would be more expensive than A*. However, it looks like you're trying to precompute the cost and path from every node to every other node? Then Dijkstra's is the way to go.
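For reference, a minimal Dijkstra sketch in Python, assuming the road network is a dict mapping each node to `{neighbor: non-negative cost}` (the function names and the toy graph are hypothetical). Precomputing every pair is then one run per source node:

```python
import heapq


def dijkstra(graph, source):
    """Cheapest cost from `source` to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def all_pairs(graph):
    """Precompute every source/destination cost by running Dijkstra per source."""
    return {u: dijkstra(graph, u) for u in graph}


roads = {"A": {"B": 1, "C": 4}, "B": {"C": 2, "D": 5}, "C": {"D": 1}, "D": {}}
table = all_pairs(roads)
print(table["A"]["D"])  # 4
```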

As for a simpler representation, a few things come to mind:

At many intersections, you can enter and leave any way you want; only a subset have restrictions like "no left turn." So you could use the "laths" only at intersections where you actually need them. That alone should greatly reduce the size.

You could do this automatically by looking for "equivalent laths" and combining them. Two laths are equivalent if all the links coming out are the same. E.g. if "Intersection X coming from the West" and "Intersection X coming from the South" both lead to the same set of other nodes, with the same cost, then just merge them into a single node.
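A rough Python sketch of that merge, assuming each lath is keyed as an `(intersection, approach)` tuple and the graph maps a lath to `{successor_lath: cost}` (this layout and the helper name are hypothetical). It only merges laths of the same intersection, so distinct destinations are never collapsed together:

```python
from collections import defaultdict


def merge_equivalent_laths(graph):
    """Merge laths of the same intersection whose outgoing links are identical.

    Returns (merged_graph, rep), where rep maps every lath to its representative.
    """
    groups = defaultdict(list)
    for lath, links in graph.items():
        intersection = lath[0]
        groups[(intersection, frozenset(links.items()))].append(lath)

    rep = {lath: min(members) for members in groups.values() for lath in members}

    merged = {}
    for lath, links in graph.items():
        r = rep[lath]
        if r in merged:
            continue  # already emitted via an equivalent lath
        out = {}
        for succ, cost in links.items():
            target = rep.get(succ, succ)  # redirect links into merged laths
            out[target] = min(cost, out.get(target, cost))
        merged[r] = out
    return merged, rep


graph = {
    ("X", "W"): {("Y", "N"): 4, ("Z", "E"): 2},
    ("X", "S"): {("Y", "N"): 4, ("Z", "E"): 2},  # same links as ("X", "W")
    ("X", "E"): {("Y", "N"): 5},
    ("A", "N"): {("X", "W"): 1, ("X", "S"): 3},
}
merged, rep = merge_equivalent_laths(graph)
print(len(graph), "->", len(merged))  # 4 -> 3
```

Merging can make further laths equivalent, so you may want to repeat the pass until the graph stops shrinking.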

Are you sure you need/want to precompute the best path, instead of computing it online? Video games typically compute these things online.

Also, how are you representing the paths? In your matrix, you only need to store the first link of each path. For example, to get from Bob's house to Bob's work, you only need to know the first link: once you reach the end of that link, you can look in your matrix for how to get from there to Bob's work, which gives you the second link, and so on.
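A small Python sketch of walking such a first-link table, assuming a hypothetical layout where `next_hop[u][d]` holds the node to move to first when going from `u` to `d` (the node names are made up):

```python
def follow_next_hops(next_hop, source, dest):
    """Rebuild a full path from a table that stores only the first hop."""
    path = [source]
    node = source
    while node != dest:
        node = next_hop[node][dest]  # only the first link is stored per (node, dest)
        path.append(node)
        if len(path) > len(next_hop) + 1:
            raise ValueError("next-hop table contains a loop")
    return path


next_hop = {
    "bobs_house": {"bobs_work": "main_st"},
    "main_st": {"bobs_work": "bridge"},
    "bridge": {"bobs_work": "bobs_work"},
}
print(follow_next_hops(next_hop, "bobs_house", "bobs_work"))
# ['bobs_house', 'main_st', 'bridge', 'bobs_work']
```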

PHP Algorithms – Generating Random Unique Pair Numbers from Two Ranges

Something you need to determine is how 'fair' you want this to be, and how it should perform as the unused space becomes exhausted.

In essence, you want a random box from an N-dimensional array. It's not the box itself that is important, but its location.

Under the covers, a 10x20 array is often represented as a 1x200 array, with some math to access the right spot. If you access [5][13], you are actually accessing location [5*20 + 13]. You can use the same approach to go from a number back to the position: for location 113, integer division by 20 gives 5 with remainder 13.
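The same arithmetic in Python (the helper names are hypothetical), using `divmod` for the integer division and remainder:

```python
from typing import Tuple


def to_index(row: int, col: int, n_cols: int) -> int:
    """[row][col] of a row-major array with n_cols columns -> flat position."""
    return row * n_cols + col


def from_index(index: int, n_cols: int) -> Tuple[int, int]:
    """Flat position -> (row, col): integer division and remainder."""
    return divmod(index, n_cols)


print(to_index(5, 13, 20))  # 113
print(from_index(113, 20))  # (5, 13)
```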

So you don't need to actually store 200 pairs (though that isn't a lot); you just need a bitfield 200 bits long. Generate a random number within the proper range and mark it as used in the bitfield.
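In Python, one way to sketch such a bitfield (a hypothetical `BitField` class, not a library type) is a `bytearray` holding 8 positions per byte, so 200 bits fit in 25 bytes:

```python
import random


class BitField:
    """Fixed-size set of flags, one bit per position, backed by a bytearray."""

    def __init__(self, size: int):
        self.size = size
        self.bits = bytearray((size + 7) // 8)  # 200 bits -> 25 bytes

    def is_set(self, i: int) -> bool:
        return bool(self.bits[i // 8] & (1 << (i % 8)))

    def set(self, i: int) -> None:
        self.bits[i // 8] |= 1 << (i % 8)


used = BitField(10 * 20)
r = random.randrange(used.size)  # random position 0..199
if not used.is_set(r):
    used.set(r)                  # mark the pair at position r as taken
```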

Next, how do you handle a collision? This goes to the various collision-resolution techniques used in hash tables. Some won't work for this application, but it's a good read nonetheless.

A simple approach: once you have a collision, just keep incrementing the position you are looking at until you find one that hasn't been used. The increment could be 1, or any number that is relatively prime to the size of the space (so that the probing eventually reaches every slot).
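A Python sketch of this probing, assuming `used` is a `bytearray` with one byte per slot (simpler than packing bits) and that `claim_slot` is a hypothetical helper. A step coprime to the space size matters because stepping by s modulo t visits every slot exactly when gcd(s, t) = 1:

```python
import math
import random


def claim_slot(used: bytearray, start: int, step: int = 1) -> int:
    """Claim the first free slot at start, start+step, start+2*step, ... (mod size).

    `used` holds one byte per slot (0 = free, 1 = taken). Wrapping with the
    modulus is the same roll-over the Perl version does with `$r %= $t`.
    """
    total = len(used)
    if math.gcd(step, total) != 1:
        raise ValueError("step must be relatively prime to the size of the space")
    r = start % total
    for _ in range(total):  # a coprime step visits every slot once in `total` probes
        if not used[r]:
            used[r] = 1
            return r
        r = (r + step) % total
    raise RuntimeError("space is full")


used = bytearray(10 * 20)  # the 10 x 20 example space
picks = [claim_slot(used, random.randrange(len(used)), step=7) for _ in range(100)]
```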

OK, I'm at a place where I can sit down and write something. It's Perl; it shouldn't be too far off from PHP and is rather straightforward.

#!/usr/bin/perl
use strict;
use warnings;

my $x = shift @ARGV;
my $y = shift @ARGV;
my $f = shift @ARGV; # fill factor
my $t = $x * $y;    # total space
my $vec = '';       # bit vector: bit $r is set once position $r is used

foreach (1 .. int($t * $f)) {
    my $r = int(rand($t));  # random number from 0 .. $t-1
    my $yp = int($r / $x); # y' (y prime)
    my $xp = int($r % $x); # x' (x prime)

    print "Trying $r: $xp $yp...\n";
    while(vec($vec, $r, 1)) {
        $yp = int($r / $x);
        $xp = int($r % $x);
        print "\tcollision at $r: $xp $yp\n";
        $r += 1;
        $r %= $t;   # scale $r to within $t
    }
    $yp = int($r / $x);
    $xp = int($r % $x);
    vec($vec, $r, 1) = 1; # set the $r th bit
    print "\tsettled at $r: $xp $yp\n";
}

Ultimately, the 'settled' values are the ones that you want.

The script reads two numbers from the command line and assigns them to $x and $y. The total search space is $t, the number of possible pairs. Additionally, it reads a fill factor as $f; this should be a value of at most 1 and is used to limit the number of iterations. Setting a value greater than 1 will cause an infinite loop, because once every position is used the collision loop never finds a free bit.

I'm filling up the space to the specified fraction (the foreach (1 .. int($t * $f))). No, there isn't any error checking on $f to make sure it is less than or equal to 1, but there should be.

So pick a random number from 0 to $t - 1. The spot this represents is $yp and $xp (y prime and x prime).

There are some Perl-specific things here: vec works with a bit vector of arbitrary size. There are a number of ways of doing this in any given language; it's just rather easy in Perl. In Java, one could use a BitSet (it can hold up to Integer.MAX_VALUE bits, enough for pairs drawn from two ranges of 46340 numbers each, since 46340 x 46340 fits under that limit).

You then test the bit at position $r to see if it has been used. If it has, increment $r and wrap it around when it reaches the total space (so with 10 and 20, positions run 0 .. 199, and if you hit 200 it becomes 0).

Once you find an unused bit, set it and output your values.

This is what the output looks like:

Trying 96: 6 9...
        settled at 96: 6 9
Trying 117: 7 11...
        collision at 117: 7 11
        settled at 118: 8 11
Trying 115: 5 11...
        settled at 115: 5 11
Trying 153: 3 15...
        settled at 153: 3 15
Trying 90: 0 9...
        settled at 90: 0 9
Trying 140: 0 14...
        collision at 140: 0 14
        collision at 141: 1 14
        collision at 142: 2 14
        settled at 143: 3 14
Trying 73: 3 7...
        settled at 73: 3 7

This was tested on a Unix system like so:

% rndpair.pl 10 20 0.5 | grep settled | sort | uniq | wc -l
     100

This shows that with 100 numbers there are 100 unique pairs printed out (just look at the 'settled' lines).

There's a fair bit of debugging information in there (you don't really need to keep recomputing $yp and $xp until the end).

Then there's the question of how fast it is. This run generates 50% of the available pairs for the available space. Note that some of the time is spent in sort and uniq (on a fair amount of text):

% time ./rndpair.pl 500 500 0.5 | grep settled | sort| uniq | wc -l
  125000

real    0m3.528s
user    0m3.901s
sys     0m0.045s

Let's kick it up a notch and remove the other programs from the pipeline.

% time ./rndpair.pl 5000 5000 0.5 > /dev/null

real    0m34.668s
user    0m34.482s
sys     0m0.180s

How much room did that 5k x 5k storage space take? 25,000,000 bits, or about 3 megabytes.

Note that performance drops as the fill factor goes up. For a 6 x 80610 search space (from a previous comment, if I read it right), this runs quite quickly (note the increasing times as the fill factor goes up):

% time ./rndpair.pl 6 80610 0.5 | grep settled | wc -l
  241830

real    0m0.827s
user    0m1.562s
sys     0m0.029s
% time ./rndpair.pl 6 80610 0.75 | grep settled | wc -l
  362745

real    0m1.721s
user    0m3.151s
sys     0m0.043s
% time ./rndpair.pl 6 80610 0.9 | grep settled | wc -l
  435294

real    0m3.993s
user    0m7.031s
sys     0m0.079s
