Unique Identifiers – Generating Secure IDs for Offline Web Apps

javascript, mongodb, offline, uuid

I have a web-based project that allows users to work both online and offline, and I'm looking for a way to generate unique ids for records on the client side. I'd like an approach that works while a user is offline (i.e. unable to talk to a server), is guaranteed to be unique, and is secure. By "secure", I'm specifically worried about clients submitting duplicate ids (maliciously or otherwise) and thereby wreaking havoc on data integrity.

I've been doing some googling, hoping this was already a solved problem. I haven't found anything that's very definitive, especially in terms of approaches that are in use in production systems. I found some examples for systems where users will only access the data that they've created (e.g. a Todo list that's accessed on multiple devices, but only by the user who created it). Unfortunately, I need something a bit more sophisticated. I did find some really good ideas here, which are in line with how I was thinking things might work.

Below is my proposed solution.

Some Requirements

IDs should be globally unique (or at least unique within the system)
Generated on the client (i.e. via javascript in the browser)
Secure (as outlined above and otherwise)
Data can be viewed/edited by multiple users, including users who didn't author it
Doesn't cause significant performance issues for backend db's (such as MongoDB or CouchDB)

Proposed Solution

When users create an account, they would be given a uuid that was generated by the server and is known to be unique within the system. This id must NOT be the same as the user's authentication token. Let's call this id the user's "id token".

When a user creates a new record, the client generates a new uuid in JavaScript (using window.crypto when available; see examples here). This id is concatenated with the "id token" the user received when they created their account. This new composite id (server-side id token + client-side uuid) is now the unique identifier for the record. When the user is online and submits this new record to the backend server, the server would:

Identify this as an "insert" action (i.e. not an update or a delete)
Validate both parts of the composite key are valid uuids
Validate that the provided "id token" part of the composite id is correct for the current user (i.e. it matches the id token the server assigned to the user when they created their account)
If everything is copacetic, insert the data into the db (being careful to do an insert and not an "upsert", so that if the id already exists the server doesn't update an existing record by mistake)
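The client and server steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names (uuidV4, makeCompositeId, handleInsert), the record shape, and the Map standing in for the database are all hypothetical, and UUIDs are assumed to be lowercase. The fallback to Node's crypto module is there only so the sketch runs outside a browser.

```javascript
// Sketch only: all names below are hypothetical, not part of any library.
// Use the Web Crypto API in the browser; fall back to Node's copy elsewhere.
const webcrypto = globalThis.crypto ?? require('crypto').webcrypto;

// Lowercase RFC 4122 UUID (versions 1-5).
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/;
const isUuid = (v) => typeof v === 'string' && UUID_RE.test(v);

// Client side: build a version 4 UUID from 16 cryptographically random bytes.
function uuidV4() {
  const b = webcrypto.getRandomValues(new Uint8Array(16));
  b[6] = (b[6] & 0x0f) | 0x40; // set version to 4
  b[8] = (b[8] & 0x3f) | 0x80; // set RFC 4122 variant
  const hex = Array.from(b, (x) => x.toString(16).padStart(2, '0')).join('');
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
          hex.slice(16, 20), hex.slice(20)].join('-');
}

// Client side: composite id = server-issued id token + client-generated uuid.
function makeCompositeId(idToken) {
  return { idToken, clientId: uuidV4() };
}

// Server side: insert-only handler. `store` is a Map standing in for the db.
function handleInsert(store, currentUser, record) {
  const { idToken, clientId } = record.id ?? {};
  if (!isUuid(idToken) || !isUuid(clientId)) {
    return { ok: false, reason: 'malformed id' };
  }
  if (idToken !== currentUser.idToken) {
    return { ok: false, reason: 'id token does not belong to the current user' };
  }
  const key = `${idToken}:${clientId}`;
  if (store.has(key)) {
    return { ok: false, reason: 'duplicate id' }; // insert, never upsert
  }
  store.set(key, record);
  return { ok: true, key };
}
```

With a real database, the duplicate check would normally be left to a unique index and a plain insert, so that the check and the write cannot race each other.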

Queries, updates, and deletes wouldn't require any special logic. They would simply use the id for the record in the same manner as traditional applications.

What are the advantages of this approach?

Client code can create new data while offline and know the id for that record immediately. I considered alternate approaches where a temporary id would be generated on the client and later swapped out for a "final" id when the system was online. However, this felt very brittle, especially once you start thinking about child data with foreign keys that would also need to be updated, not to mention urls that would change when the id changed.
By making ids a composite of a client generated value AND a server generated value, each user is effectively creating ids in a sandbox. This is intended to limit the damage that can be done by a malicious/rogue client. Also, any id collisions are on a per user basis, not global to the entire system.
Since a user's id token is tied to their account, ids can only be generated in a user's sandbox by clients that are authenticated (i.e. where the user successfully logged in). This is intended to keep malicious clients from creating bad ids for a user. Of course, if a user's auth token were stolen by a malicious client, that client could do bad things. But once an auth token has been stolen, the account is compromised anyhow. If this did happen, the damage would be limited to the compromised account (not the entire system).

Concerns

Here are some of my concerns with this approach

Will this generate sufficiently unique ids for a large scale application? Is there any reason to think this will result in id collisions? Can javascript generate a sufficiently random uuid for this to work? It looks like window.crypto is fairly widely available and this project already requires reasonably modern browsers. (this question now has a separate SO question of its own)
Are there any loopholes that I'm missing which could allow a malicious user to compromise the system?
Is there reason to worry about DB performance when querying for a composite key made up of 2 uuids? How should this id be stored for best performance: two separate fields or a single object field? Would there be a different "best" approach for Mongo vs Couch? I know that having a non-sequential primary key can cause notable performance issues when doing inserts. Would it be smarter to have an auto-generated value for the primary key and store this id as a separate field? (this question now has a separate SO question of its own)
With this strategy, it would be easy to determine that a particular set of records was created by the same user (since they'd all share the same publicly visible id token). While I don't see any immediate issues with this, it's always better to not leak more info about internal details than is needed. Another possibility would be to hash the composite key, but that seems like it may be more trouble than it's worth.
In the event that there is an id collision for a user, there's not a simple way to recover. I suppose the client could generate a new id, but this seems like a lot of work for an edge case that really shouldn't ever happen. I'm intending to leave this unaddressed.
Only authenticated users can view and/or edit data. This is an acceptable limitation for my system.

Conclusion

Is the above a reasonable plan? I realize some of this comes down to a judgement call based on a fuller understanding of the application in question.

Best Answer

Your approach will work. A lot of document management systems use this type of approach.

One thing to consider is that you don't need to use both the user uuid and the random item id as part of the string. You can instead hash the concatenation of both. This will give you a shorter identifier, and possibly some other benefits, because the resulting ids will be more evenly distributed (better balanced for indexing, and for file storage if you are storing files based on their uuid).
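A minimal sketch of the hashing idea, assuming SHA-256 and Node's built-in crypto module (a browser would use crypto.subtle.digest instead, which is asynchronous). The function names and the ':' separator are assumptions for illustration. Note that because a hash is one-way, the client would still need to send its client-generated uuid alongside the hashed id so the server can recompute the hash and confirm the id belongs to the current user.

```javascript
const { createHash } = require('crypto'); // Node.js built-in

// Derive a fixed-length record id from the id token and the client uuid.
// The ':' separator keeps the two parts unambiguous before hashing.
function hashedRecordId(idToken, clientId) {
  return createHash('sha256').update(`${idToken}:${clientId}`).digest('hex');
}

// Server side: recompute the hash from the current user's id token and the
// submitted client uuid, and compare it with the id the client claims.
function isHashedIdValid(submittedId, currentUserIdToken, clientId) {
  return hashedRecordId(currentUserIdToken, clientId) === submittedId;
}
```

The hex digest is 64 characters, compared with 72 for two uuids written back to back; it could be truncated further at the cost of a higher collision probability.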

Another option is to generate just a temporary uuid for each item. Then, when you do connect and post the items to the server, the server generates guaranteed-unique uuids for each item and returns them to you. You then update your local copy.
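The "update your local copy" step might look something like the sketch below. The record shape ({ id, parentId, ... }), the function name applyServerIds, and the idea that the server returns a temporary-to-final id mapping are all assumptions for illustration. The point is that every reference to a temporary id, including foreign keys, has to be rewritten, not just the record's own id.

```javascript
// Hypothetical: `idMap` is a Map of temporary id -> final id from the server.
function applyServerIds(records, idMap) {
  // Swap an id if the server assigned a replacement; otherwise keep it.
  const swap = (id) => (idMap.has(id) ? idMap.get(id) : id);
  return records.map((r) => ({
    ...r,
    id: swap(r.id),
    parentId: swap(r.parentId), // foreign keys must be rewritten too
  }));
}

// Example: a parent and a child created offline with temporary ids.
const localRecords = [
  { id: 'tmp-1', parentId: null, title: 'List' },
  { id: 'tmp-2', parentId: 'tmp-1', title: 'Item' },
];
const serverIds = new Map([['tmp-1', 'srv-100'], ['tmp-2', 'srv-101']]);
const syncedRecords = applyServerIds(localRecords, serverIds);
```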

Related Solutions

How to generate “language-safe” UUIDs

A couple of tips that will lower the chances of inadvertently creating meaningful words:

Add some non-alpha, non-numerical characters to the mix, such as "-", "!" or "_".
Compose your UUIDs by accumulating sequences of characters (rather than single characters) that are unlikely to occur in real words, such as "zx" or "aa".

This is some C# sample code (using .NET 4):

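// Requires: using System; using System.Collections.Generic; using System.Text;

// One shared generator: creating a new Random on every call can repeat seeds
// (and therefore strings) when calls happen in quick succession.
private static readonly Random Rng = new Random();

private string MakeRandomString()
{
    var bits = new List<string>
    {
        "a",
        "b",
        "c",
        "d",
        "e",
        //keep going with letters.
        "0",
        "1",
        "2",
        "3",
        //keep going with numbers.
        "-",
        "!",
        "_",
        //add some more non-alpha, non-numeric characters.
        "zx",
        "aa",
        "kq",
        "jr",
        "yq",
        //add some more odd combinations to the mix.
    };

    var sb = new StringBuilder();
    for (int i = 0; i < 8; i++)
    {
        // Append one random fragment; the result is 8 fragments, not 8 characters.
        sb.Append(bits[Rng.Next(bits.Count)]);
    }

    return sb.ToString();
}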

This doesn't guarantee that you won't offend anyone, but I agree with @DeadMG that you cannot aim so high.

Javascript – Client side authentication through signatures instead of passwords

The DB backend is "insecure". It is possible to tamper the data (many persons have access to the machine, perhaps it's "in the cloud") [...]

What I want to be able to do is make very difficult for third parties that have read/write access to the DB machine [...]

Your goal shouldn't be to make it very difficult for third parties to modify the data in your database. Your goal should be to prevent anyone but the administrators from accessing both the server and the database.

What you are doing right now is like giving your bank account number and password to complete strangers and then asking how to make it difficult for those strangers to take your money. Well, you might think of not sharing the account information in the first place.

It's also unclear what “in the cloud” has to do with security. Most cloud providers such as Amazon or Microsoft do a great job of hosting virtual machines in a very secure manner.

Therefore:

Get rid of those “third parties” who can SSH into your server with su access. This is the wrong way of hosting web applications. Don't do that. And if you do, please inform your users on the website that all personal information they give you can and will be shared with complete strangers.
Use TLS. There is no reason not to, especially since free certificates are available.
Don't trust the client side. No, you can't sign JavaScript, and in general, on the client side, you can protect your users neither from themselves nor from any crap they install on their machines. Viruses would be able to inject themselves into a browser and modify JavaScript; if you build a mechanism which checks whether the JavaScript has been tampered with, they would be able to tamper with the checking mechanism itself.
While you can encrypt information on the client side, there are few good reasons to do so, and they apply only in a few cases. The reason is that, once again, the client machine shouldn't be trusted, because it's usually difficult for users to secure their devices. The server, on the other hand, can be secured very easily, and with TLS, man-in-the-middle attacks can effectively be prevented.

Thus, ensure both your servers and the communication between the servers and the clients are secure.

A few notes:

Sadly to make the software compatible with cell phones I will need to keep the iterations of PBKDF2 low [...] (perhaps I could use Scrypt instead of PBKDF2... It should be more resistant to GPU attacks)

This is why you need to compute the hash on the server. If you reduce the number of iterations, you're making it easier for the attacker to brute-force the hashes.

Searching for a different algorithm is the wrong solution here. Remember that an attacker will have powerful CPUs and GPUs. Your goal is to slow the attacker down, and you do that by increasing the iterations.
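As a rough illustration of computing the hash on the server, where the iteration count is not limited by what a phone can handle, here is a minimal Node.js sketch using the built-in crypto module. The function names, the default iteration count, the key length, and the digest are assumptions for illustration, not values from the question; the count should be tuned to the server's hardware.

```javascript
const { pbkdf2Sync, randomBytes, timingSafeEqual } = require('crypto');

const DEFAULT_ITERATIONS = 600000; // assumed value; tune to your server
const KEY_LENGTH = 32;             // bytes of derived key
const DIGEST = 'sha256';

// Hash a password with a fresh random salt. The salt and the iteration
// count are stored next to the hash so the count can be raised later.
function hashPassword(password, iterations = DEFAULT_ITERATIONS) {
  const salt = randomBytes(16);
  const hash = pbkdf2Sync(password, salt, iterations, KEY_LENGTH, DIGEST);
  return { salt: salt.toString('hex'), iterations, hash: hash.toString('hex') };
}

// Recompute the hash with the stored parameters and compare in constant time.
function verifyPassword(password, stored) {
  const expected = Buffer.from(stored.hash, 'hex');
  const actual = pbkdf2Sync(
    password,
    Buffer.from(stored.salt, 'hex'),
    stored.iterations,
    expected.length,
    DIGEST
  );
  return timingSafeEqual(actual, expected);
}
```

In a real request handler the asynchronous crypto.pbkdf2 would usually be preferable, since the synchronous variant blocks the event loop while it runs.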

After this I could add a two step verification (using the algorithm used by Google Authenticator, for example). The key of this would be shared between user and server.

Which means that your “third parties” will have full access to those keys. In this context, two-step verification makes things much worse: it gives you a sense of security, while providing absolutely no benefit.