SQL Guid Primary Key Join Performance

azure-sql-databasedatabase-designsqlsql server

I'm currently using GUIDs as a NONCLUSTERED PRIMARY KEY alongside an INT IDENTITY column.

The GUIDs are required to allow offline creation of data and synchronisation – which is how the entire database is populated.

I'm aware of the implications of using a GUID as a clustered primary key, hence the integer clustered index but does using a GUID as a primary key and therefore foreign keys on other tables have significant performance implications?

Would a better option to use an integer primary/foreign key, and use the GUID as a client ID which has a UNIQUE INDEX on each table? – My concern there is that entity framework would require loading the navigation properties in order to get the GUID of the related entity without significant alteration to the existing code.

The database/hardware in question is SQL Azure.

Best Answer

You can also create foreign keys against unique key constraints, which then gives you the option to foreign key to the ID identity as an alternative to the Guid.

i.e.

Create Table SomeTable
(
    UUID UNIQUEIDENTIFIER NOT NULL,
    ID INT IDENTITY(1,1) NOT NULL,

    CONSTRAINT PK PRIMARY KEY NONCLUSTERED (UUID),
    CONSTRAINT UQ UNIQUE (ID)
)
GO

Create Table AnotherTable
(
    SomeTableID INT,

    FOREIGN KEY (SomeTableID) REFERENCES SomeTable(ID)
)
GO

Edit

Assuming that your centralized database is a Mart, and that only batch ETL is done from the source databases, if you do your ETL directly to the central database (i.e. not via Entity Framework), given that all your tables have UUID FK's after re-population from the distributed databases, you'll need to either map the INT UKCs during ETL or fix them up after the import (which would require a temporary NOCHECK constraint step on the INT FK's).

Once ETL is loaded and INT keys are mapped, I would suggest you ignore / remove the UUID's from your ORM model - you would need to regenerate your EF navigation on the INT keys.

A different solution would be required if you update the central database directly or do continual ETL and do use EF for the ETL itself. In this case, it might be less total I/O just to leave the PK GUID as FKs for RI, drop the INT FK's altogether, and choose other suitable columns for clustering (minimizing page reads).