Unique Identifiers – Generating Secure IDs for Offline Web Apps

Tags: javascript, mongodb, offline, uuid

I have a web-based project that allows users to work both online and offline, and I'm looking for a way to generate unique ids for records on the client side. I'd like an approach that works while a user is offline (i.e. unable to talk to a server), is guaranteed to be unique, and is secure. By "secure", I'm specifically worried about clients submitting duplicate ids (maliciously or otherwise) and thereby wreaking havoc on data integrity.

I've been doing some googling, hoping this was already a solved problem. I haven't found anything that's very definitive, especially in terms of approaches that are in use in production systems. I found some examples for systems where users will only access the data that they've created (e.g. a Todo list that's accessed on multiple devices, but only by the user who created it). Unfortunately, I need something a bit more sophisticated. I did find some really good ideas here, which are in line with how I was thinking things might work.

Below is my proposed solution.

Some Requirements

  1. IDs should be globally unique (or at least unique within the system)
  2. Generated on the client (i.e. via JavaScript in the browser)
  3. Secure (as outlined above and otherwise)
  4. Data can be viewed/edited by multiple users, including users who didn't author it
  5. Doesn't cause significant performance issues for backend databases (such as MongoDB or CouchDB)

Proposed Solution

When users create an account, they would be given a uuid that is generated by the server and known to be unique within the system. This id must NOT be the same as the user's authentication token. Let's call this id the user's "id token".
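A minimal server-side sketch of issuing the id token, assuming Node.js and an in-memory set in place of a real accounts collection. `createAccount` and `issuedIdTokens` are hypothetical names; the point is only that the id token and the authentication secret are generated independently.

```javascript
// Server side (Node.js). Assumption: an in-memory Set stands in for the
// accounts collection; a real system would rely on a unique index instead.
const { randomUUID, randomBytes } = require("node:crypto");

const issuedIdTokens = new Set(); // hypothetical registry of issued id tokens

function createAccount(username) {
  // Draw a fresh v4 uuid for the id token, retrying in the (astronomically
  // unlikely) case that it has already been issued to someone else.
  let idToken;
  do {
    idToken = randomUUID();
  } while (issuedIdTokens.has(idToken));
  issuedIdTokens.add(idToken);

  // The authentication secret is generated separately, so a user's publicly
  // visible id token reveals nothing about their credentials.
  const authToken = randomBytes(32).toString("hex");

  return { username, idToken, authToken };
}
```

For example, `createAccount("alice")` returns an object whose `idToken` is a lowercase v4 uuid and whose `authToken` is an unrelated 64-character hex string.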

When a user creates a new record, they generate a new uuid in JavaScript (using window.crypto when available; see examples here). This uuid is concatenated with the "id token" the user received when they created their account. This new composite id (server-side id token + client-side uuid) is now the unique identifier for the record. When the user is online and submits this new record to the backend server, the server would:

  1. Identify this as an "insert" action (i.e. not an update or a delete)
  2. Validate both parts of the composite key are valid uuids
  3. Validate that the provided "id token" part of the composite id is correct for the current user (i.e. it matches the id token the server assigned to the user when they created their account)
  4. If everything is copacetic, insert the data into the db (being careful to do an insert and not an "upsert", so that if the id already exists it doesn't update an existing record by mistake)
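Both halves of this flow (generating the composite id on the client, then the four server checks) can be sketched as follows. This is a hypothetical illustration, not a specific framework's API: `uuidv4`, `newRecordId`, and `handleInsert` are made-up names, a `Map` stands in for the database collection, and the `token:uuid` format with a `:` separator is an assumption (plain concatenation would also be parseable, since a uuid string is always 36 characters).

```javascript
// ---- Client side ----
// In a browser this resolves to window.crypto; the require() fallback only
// exists so the sketch also runs under Node.js.
const cryptoImpl = globalThis.crypto ?? require("node:crypto").webcrypto;

// RFC 4122 version-4 uuid built from cryptographically secure random bytes.
function uuidv4() {
  if (typeof cryptoImpl.randomUUID === "function") {
    return cryptoImpl.randomUUID(); // available in secure contexts in modern browsers
  }
  const b = cryptoImpl.getRandomValues(new Uint8Array(16));
  b[6] = (b[6] & 0x0f) | 0x40; // set version nibble to 4
  b[8] = (b[8] & 0x3f) | 0x80; // set RFC 4122 variant bits (10xx)
  const hex = Array.from(b, (x) => x.toString(16).padStart(2, "0")).join("");
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
          hex.slice(16, 20), hex.slice(20)].join("-");
}

// Composite record id: server-issued id token + client-generated uuid.
function newRecordId(idToken) {
  return `${idToken}:${uuidv4()}`;
}

// ---- Server side ----
// Matches any canonical uuid string (any version).
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// `store` stands in for the collection; `currentUser` is the authenticated
// user. Step 1 (routing the request as an insert) happens before this call.
function handleInsert(store, currentUser, record) {
  const parts = String(record.id).split(":");
  if (parts.length !== 2) return { ok: false, error: "malformed id" };
  const [idToken, clientUuid] = parts;

  // Step 2: both halves must be well-formed uuids.
  if (!UUID_RE.test(idToken) || !UUID_RE.test(clientUuid)) {
    return { ok: false, error: "invalid uuid" };
  }
  // Step 3: the id-token half must be the one issued to this user.
  if (idToken !== currentUser.idToken) {
    return { ok: false, error: "id token does not belong to this user" };
  }
  // Step 4: insert only, never upsert. MongoDB's insertOne reports a
  // duplicate key error (code 11000) for an existing _id rather than
  // overwriting it; the has() check mimics that behaviour here.
  if (store.has(record.id)) return { ok: false, error: "duplicate id" };
  store.set(record.id, record);
  return { ok: true };
}
```

In a real deployment the duplicate check in step 4 would be left to the database's unique `_id` index rather than a separate lookup, since a check-then-insert sequence can race with a concurrent request.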

Queries, updates, and deletes wouldn't require any special logic. They would simply use the id for the record in the same manner as traditional applications.

What are the advantages of this approach?

  1. Client code can create new data while offline and know the id for that record immediately. I considered alternate approaches where a temporary id would be generated on the client and later swapped out for a "final" id once the system was online. However, this felt very brittle, especially when you start thinking about creating child data with foreign keys that would also need to be updated, not to mention URLs that would change when the id changed.

  2. By making ids a composite of a client-generated value AND a server-generated value, each user is effectively creating ids in a sandbox. This is intended to limit the damage that can be done by a malicious/rogue client. Also, any id collisions are on a per-user basis, not global to the entire system.

  3. Since a user's id token is tied to their account, ids can only be generated in a user's sandbox by clients that are authenticated (i.e. where the user successfully logged in). This is intended to keep malicious clients from creating bad ids for a user. Of course, if a user's auth token were stolen by a malicious client, they could do bad things. But once an auth token has been stolen, the account is compromised anyhow. In the event that this did happen, the damage would be limited to the compromised account (not the entire system).
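To illustrate advantage 1, here is a sketch of offline creation in which a child record stores its parent's final id right away, so nothing has to be rewritten at sync time. The `outbox` array, the `createOffline` helper, and the record fields are all hypothetical, and the sketch assumes `crypto.randomUUID()` is available (modern browsers in a secure context, or Node.js).

```javascript
// Browser: window.crypto; the require() fallback lets the sketch run in Node.js.
const cryptoImpl = globalThis.crypto ?? require("node:crypto").webcrypto;

// Hypothetical local outbox: records created offline wait here until sync.
const outbox = [];

// Create a record whose composite id is final from the moment it exists.
function createOffline(idToken, fields) {
  const record = { id: `${idToken}:${cryptoImpl.randomUUID()}`, ...fields };
  outbox.push(record);
  return record;
}

// Offline usage: the task's foreign key points at the project's real id,
// so no remapping is needed when the outbox is later sent to the server.
const idToken = "11111111-1111-4111-8111-111111111111"; // example id token
const project = createOffline(idToken, { type: "project", name: "Q3 plan" });
const task = createOffline(idToken, { type: "task", projectId: project.id, title: "Draft" });
```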

Concerns

Here are some of my concerns with this approach:

  1. Will this generate sufficiently unique ids for a large-scale application? Is there any reason to think this will result in id collisions? Can JavaScript generate a sufficiently random uuid for this to work? It looks like window.crypto is fairly widely available, and this project already requires reasonably modern browsers. (this question now has a separate SO question of its own)

  2. Are there any loopholes that I'm missing which could allow a malicious user to compromise the system?

  3. Is there reason to worry about DB performance when querying for a composite key made up of 2 uuids? How should this id be stored for best performance: two separate fields or a single object field? Would there be a different "best" approach for Mongo vs Couch? I know that having a non-sequential primary key can cause notable performance issues when doing inserts. Would it be smarter to have an auto-generated value for the primary key and store this id as a separate field? (this question now has a separate SO question of its own)

  4. With this strategy, it would be easy to determine that a particular set of records was created by the same user (since they'd all share the same publicly visible id token). While I don't see any immediate issues with this, it's always better to not leak more info about internal details than is needed. Another possibility would be to hash the composite key, but that seems like it may be more trouble than it's worth.

  5. In the event that there is an id collision for a user, there's not a simple way to recover. I suppose the client could generate a new id, but this seems like a lot of work for an edge case that really shouldn't ever happen. I'm intending to leave this unaddressed.

  6. Only authenticated users can view and/or edit data. This is an acceptable limitation for my system.
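On concern 1, a rough scale check using the standard birthday approximation. It assumes version-4 uuids from a correctly working cryptographically secure random number generator (122 of the 128 bits are random; the other 6 are fixed version and variant bits). Because a composite id can only collide when the id token also matches, n here is the number of records created by one user, not the system-wide total; n = 10^9 is purely an illustrative figure.

```latex
p \approx 1 - e^{-n(n-1)/(2N)} \approx \frac{n^2}{2N}, \qquad N = 2^{122}

n = 10^9: \quad p \approx \frac{10^{18}}{2^{123}} \approx \frac{10^{18}}{1.06 \times 10^{37}} \approx 9.4 \times 10^{-20}
```

A flawed or predictable random number generator would invalidate this estimate, and it says nothing about a client deliberately reusing an id, which is why the insert-only check on the server still matters.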

Conclusion

Is the above a reasonable plan? I realize some of this comes down to a judgement call based on a fuller understanding of the application in question.

Best Answer

Your approach will work. A lot of document management systems use this type of approach.

One thing to consider is that you don't need to use both the user uuid and the random item id as part of the string. You can instead hash the concatenation of both. This will give you a shorter identifier, and possibly some other benefits because the resulting ids will be more evenly distributed (better balanced for indexing, and for file storage if you are storing files based on their uuid).
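A sketch of the hashing suggestion, assuming SHA-256 via the Web Crypto API's `crypto.subtle.digest` and an unpadded base64url encoding (43 characters, versus 73 for two uuids joined by a separator). `hashCompositeId` is a hypothetical name. One consequence worth noting: a hash cannot be split back into its parts, so for the server to still verify the id token (step 3 in the question), the client would presumably have to send the id token and client uuid along with the record so the server can recompute the hash itself.

```javascript
// Browser: window.crypto; the require() fallback lets the sketch run in Node.js.
const cryptoImpl = globalThis.crypto ?? require("node:crypto").webcrypto;

// Hash "idToken:clientUuid" with SHA-256 and encode the 32-byte digest as
// unpadded base64url, which is safe to use in URLs and file names.
async function hashCompositeId(idToken, clientUuid) {
  const data = new TextEncoder().encode(`${idToken}:${clientUuid}`);
  const digest = new Uint8Array(await cryptoImpl.subtle.digest("SHA-256", data));
  const base64 = btoa(String.fromCharCode(...digest));
  return base64.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}
```

The same inputs always produce the same id, so a server that receives the two parts can recompute and compare it.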

Another option is to generate just a temporary uuid for each item. Then, when you connect and post them to the server, the server generates (guaranteed unique) uuids for each item and returns them to you. You then update your local copy.
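A sketch of the reconciliation step this alternative implies, assuming the server replies with a map from each temporary id to the id it assigned. `applyServerIds`, the `tmp-`/`srv-` id formats, and the `projectId` field are hypothetical. Every field that holds a foreign key has to be listed and rewritten, which is the bookkeeping the question describes as brittle; URLs that embedded a temporary id are not handled here.

```javascript
// Rewrite temporary ids (and any foreign keys pointing at them) to the
// server-assigned ids. Returns new objects; the local input is not mutated.
function applyServerIds(records, idMap, foreignKeyFields) {
  const remap = (value) => (idMap.has(value) ? idMap.get(value) : value);
  return records.map((record) => {
    const updated = { ...record, id: remap(record.id) };
    for (const field of foreignKeyFields) {
      if (field in updated) updated[field] = remap(updated[field]);
    }
    return updated;
  });
}

// Usage: a task created offline referenced its project's temporary id.
const local = [
  { id: "tmp-1", type: "project" },
  { id: "tmp-2", type: "task", projectId: "tmp-1" },
];
const idMap = new Map([["tmp-1", "srv-100"], ["tmp-2", "srv-101"]]);
const synced = applyServerIds(local, idMap, ["projectId"]);
// synced[1] is { id: "srv-101", type: "task", projectId: "srv-100" }
```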