How to generate “language-safe” UUIDs

randomuuid

I always wanted to use randomly generated strings for my resources' IDs, so I could have shorter URLs like this: /user/4jz0k1

But I never did, because I was worried about the random string generation creating actual words, eg: /user/f*cker. This brings two problems: it might be confusing or even offensive for users, and it could mess with the SEO too.

Then I thought all I had to do was to set up a fixed pattern like adding a number every 2 letters. I was very happy with my 'generate_safe_uuid' method, but then I realized it was only better for SEO, and worse for users, because it increased the ratio of actual words being generated, eg: /user/g4yd1ck5

Now I'm thinking I could create a method 'replace_numbers_with_letters', and check that it haven't formed any words against a dictionary or something.

Any other ideas?

ps. As I write this, I also realized that checking for words in more than one language (eg: english and french, spanish, etc) would be a mess, and I'm starting to love numbers-only IDs again.

UPDATE

Some links everyone should read:

http://thedailywtf.com/Articles/The-Automated-Curse-Generator.aspx

http://blogs.msdn.com/b/oldnewthing/archive/2008/06/27/8659071.aspx

Best Answer

A couple of tips that will lower the chances of inadvertently creating meaningful words:

  • Add some non-alpha, non-numerical characters to the mix, such as "-", "!" or "_".
  • Compose your UUIDs by accumulating sequences of characters (rather than single characters) that are unlikely to occur in real words, such as "zx" or "aa".

This is some C# sample code (using .NET 4):

private string MakeRandomString()  
{  
    var bits = new List<string>()  
    {  
            "a",  
            "b",  
            "c",  
            "d",  
            "e",  
            //keep going with letters.  
            "0",  
            "1",  
            "2",  
            "3",  
            //keep going with numbers.  
            "-",  
            "!",  
            "_",  
            //add some more non-alpha, non-numeric characters.  
            "zx",  
            "aa",  
            "kq",  
            "jr",  
            "yq",  
            //add some more odd combinations to the mix.  
    };  

    StringBuilder sb = new StringBuilder();  
    Random r = new Random();  
    for (int i = 0; i < 8; i++)  
    {  
        sb.Append(bits[r.Next(bits.Count)]);  
    }  

    return sb.ToString();  
}  

This doesn't guarantee that you won't offend anyone, but I agree with @DeadMG that you cannot aim so high.

Related Topic