C# – Using a GUID as a Primary Key

c#, domain-driven-design, sql-server

I generally use auto increment IDs as Primary Keys in databases. I am trying to learn the benefits of using GUIDs. I have read this article: https://betterexplained.com/articles/the-quick-guide-to-guids/

I realise that these GUIDs are used to identify objects at the application level. Are they also stored as the primary key at the database level? For example, say I had the following class:

public class Person
{
    public Guid ID;
    public string Name;
    // ...

    // Person methods follow
}

Say I wanted to create a new person in memory and then insert the Person into a database. Can I just do this:

Person p1 = new Person();
p1.ID = Guid.NewGuid();
PersonRepository.Insert(p1);

Say I had a database containing millions and millions of rows with a GUID as the Primary Key. Will this always be unique? Am I even understanding GUIDs correctly?

I read this article earlier: http://enterprisecraftsmanship.com/2014/11/15/cqs-with-database-generated-ids/. It confuses me a little as it appears to recommend a happy medium between GUIDs and integers as Primary Keys.

Edit 11/06/18

I have come to believe that GUIDs are more suitable than ints for my requirements. I am using CQRS more these days and GUIDs fit in more nicely.

I have noticed that some developers model GUIDs as strings in the domain model, e.g. here: https://github.com/dotnet-architecture/eShopOnContainers/blob/dev/src/Services/Ordering/Ordering.Domain/AggregatesModel/BuyerAggregate/Buyer.cs. In this case, IdentityGuid is a GUID modelled as a string. Is there any reason to do this other than what is stated in "Use a custom value object or a Guid as an entity identifier in a distributed system?" Is it "normal" to model the GUID as a string, or should I be modelling it as a GUID in the model and database?

Best Answer

GUIDs are by definition "Globally Unique IDentifiers". UUIDs ("Universally Unique IDentifiers") are the same concept under the name used by the RFC 4122 standard and by Java, among others. The names are interchangeable for all practical purposes.
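As a minimal sketch of that interchangeability, the snippet below prints a .NET Guid in the standard 36-character UUID text form and parses it back. The class and method names (GuidTextSketch, ToCanonical, RoundTrips, VersionDigit) are made up for this illustration; only Guid.NewGuid, Guid.Parse and ToString("D") are framework calls.

```csharp
using System;

public static class GuidTextSketch
{
    // Standard UUID text form: 32 hex digits in 8-4-4-4-12 groups.
    public static string ToCanonical(Guid id) => id.ToString("D");

    // A Guid survives a trip through its text form unchanged.
    public static bool RoundTrips(Guid id) => Guid.Parse(ToCanonical(id)) == id;

    // The first digit of the third group carries the UUID version.
    public static char VersionDigit(Guid id) => ToCanonical(id)[14];
}
```

For a value from Guid.NewGuid(), VersionDigit should return '4', which marks a random (version 4) UUID.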

GUIDs are central to how Microsoft envisioned database clustering working, and if you need to incorporate data from occasionally connected sources, they really help prevent data collisions.

Some Pro-GUID Facts:

  • GUIDs make key collisions vanishingly unlikely, even when keys are generated independently on different machines
  • GUIDs help with merging data between networks, machines, etc.
  • SQL Server supports semi-sequential GUIDs (NEWSEQUENTIALID()) to help minimize index fragmentation, with some caveats

Some Ugliness with GUIDs

  • They are big, 16 bytes each
  • They are out of order, so you can't sort on ID and expect to get the insertion order as you can with auto-increment IDs
  • They are more cumbersome to work with, particularly on small data sets (like lookup tables)
  • GUID generation is more flexible in SQL Server than in the C# library: SQL Server can generate sequential GUIDs, whereas Guid.NewGuid() in C# produces random ones
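The size and ordering points above can be checked with a short sketch. GuidProperties and its methods are hypothetical names chosen for this example, and the "insertion order" is simply the order in which the values were generated in memory, not a real table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GuidProperties
{
    // Size of one GUID in bytes (the same 16 bytes a uniqueidentifier column stores).
    public static int SizeInBytes() => Guid.NewGuid().ToByteArray().Length;

    // Generates `count` random GUIDs; list order stands in for insertion order.
    public static List<Guid> Generate(int count) =>
        Enumerable.Range(0, count).Select(_ => Guid.NewGuid()).ToList();

    // True only if sorting by ID would reproduce the insertion order.
    public static bool SortPreservesInsertionOrder(IReadOnlyList<Guid> inserted) =>
        inserted.OrderBy(id => id).SequenceEqual(inserted);
}
```

With random GUIDs, SortPreservesInsertionOrder(Generate(100)) is false in practice, whereas the same check on auto-increment integers would always be true.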

GUIDs will make your indexes bigger, so the disk space cost of indexing a column will be higher. Random GUIDs will fragment your indexes.

If you know you aren't going to synchronize data from different networks, GUIDs can carry more overhead than they are worth.

If you need to ingest data from occasionally connected clients, GUIDs can be a lot more robust for preventing key collisions than relying on assigning sequence ranges to those clients.
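To make the occasionally connected case concrete, here is a rough sketch that compares two offline clients numbering their rows locally with two clients assigning GUIDs. OfflineMergeSketch and its methods are invented for this example, and plain in-memory lists stand in for each client's table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OfflineMergeSketch
{
    // Each offline client numbers its rows 1..rows locally, as an identity column would.
    public static List<int> LocalIntKeys(int rows) => Enumerable.Range(1, rows).ToList();

    // Each offline client assigns a GUID per row without contacting the server.
    public static List<Guid> LocalGuidKeys(int rows) =>
        Enumerable.Range(0, rows).Select(_ => Guid.NewGuid()).ToList();

    // Counts keys present in both clients' data, i.e. rows that would clash on merge.
    public static int CountCollisions<TKey>(IEnumerable<TKey> clientA, IEnumerable<TKey> clientB) =>
        clientA.Intersect(clientB).Count();
}
```

Two clients with 1000 locally numbered rows each collide on every key, while the GUID-keyed rows can be merged as they are. Avoiding the integer collisions would require handing each client its own key range in advance.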