Generate Unique Random 4-Digit Number from Another 4-Digit Number

algorithmsrandom

I have a database that has two columns. The first column is an index, the second is the path to a data file. There are two types of data files, X and Y. These data files are then processed and graphs are created from them. So some examples of the rows look like this:

ID___| FilePath
0001 | /X/datafile1wfre.dat
0023 | /X/datafile89_jncd.dat
2349 | /Y/datafile983jew_un.dat
3984 | /Y/datafileindj389.dat

I am then taking this table, choosing a random row from it, and showing the graph of the data file to the user. After they have spent time looking at the graph, I am then going to ask them, do you think this data file is X or Y?

Let's say someone looks at a graph and this person would like to view that graph at a later point in time. I would then give them an ID of the row. Note: There are ~4000 entries in the table.

Here's the issue, the way to file paths are added to the table, the first half of the table are the paths of X (ID's 0001 – 2000), and the second half of the database are the paths of Y (ID's 2001 – 4000). Someone could easily figure this out and once they see the ID, they would be able to make a prediction of it being X or Y just based on if the ID is above or below 2000.

Here's my goal. I would like to have an algorithm that can take a 4 digit number A and make another (different) 4 digit number B. I want B to be unique to A, no other 4 digit number could make B except A. Here's an example:

0239 would create 9834
7783 would create 3892

9834 is unique to 0239. No matter what 4 digit number you have, the only way to get 9834 is from 0239. Same with 3892, the only way to get 3892 is from 7783.

This way, I can give the generated 4 digit from the algorithm to the user without having them see the Real ID from the table.

Best Answer

There are fundamentally different solutions to this problem as you've presented it.

A truly random mapping from number A (private ID) to number B (public ID) can be created if you have an independent source of random. Every time you create another row and are assigned number A read from random and create number B. To ensure B is unique you will have to search all existing B's before assigning it. This would be the hardest for anyone to reverse engineer. It's basicly what cryptography calls a one time pad. It's also increasingly prohibitive as you get closer and closer to fully populating the space you've allowed. You eventually get to where there is only 1 number left to assign as a B. You have to wait to find it randomly and you have to search to prove uniqueness on every attempt.

A fixed transformation of number A into B by a function. This avoids becoming prohibitive even when fully populating the space. It also risks the user guessing the algorithm. This can be mitigated if instead of simply using a hash to do this, you encrypt the number A. There are encryption algorithms that produce the same size crypto text as plain text and take a cryptovariable (key). Done this way it wouldn't matter if they guessed how you created B so long as the cryptovariable (key) was still a secret. You would want to use a format preserving encryption. This gives you the ability to predict A from a B but if you index B on the database this shouldn't be needed.

If you feel that is overkill you could look into a shuffle that simply obfuscates A. This risks them guessing the shuffle unless it also uses a crypto variable.

It's also worth considering if A is even still needed. If the only thing A provides is a unique identifier then there is no point in being able to convert back to A and no reason to store A in the database when B is all that is needed. This means all you have to do is uniquely randomize the auto increment ID because this will give you B to start with. Some DB's already provide this. This way you have unique id's that don't predict x or y and avoid an unneeded level of indirection.

Related Topic