Should Lookup Tables Enumerating Strings Have an Integer Primary Key?

database-designrelational-database

When I learned relational databases, the prof said that one would "almost always" want an artificial int as the primary key in a table, but did not specify what the exceptions are. At some time I stopped using them for junction tables, and never had a problem.

Now I am making a database with a lot of lookup tables, and wonder whether this is a case where leaving artificial keys out wouldn't make for a cleaner design and simple programming.

A toy example: assume that this is a mockup of the UI I want to achieve.
example for a UI view

The design option with artificial IDs would be (Type is a foreign key):

LiteraryWork
Title                           Type
Winnie The Pooh                 1 
The Nightingale and the Rose    2 
Snowwhite                       2

LiteraryWorkType 
ID TypeName 
1  Novel 
2  Fairy Tale

And the option without them uses the Type name itself as the key (again, the column Type is properly declared as a foreign key):

LiteraryWork
Title                           Type
Winnie The Pooh                 Novel 
The Nightingale and the Rose    Fairy Tale 
Snowwhite                       Fairy Tale 

LiteraryWorkType 
TypeName 
Novel 
Fairy Tale

I tend towards using the second option, because I would need one less join when showing data on the screen. (I don't want to get rid of the lookup table entirely because I want to be able to restrict the values users may enter, for example by giving them a drop-down list bound to the lookup table). The only disadvantage I can think of is that, when a stakeholder says "but I want my UI to say 'story', not 'fairy tale'", I would have to update all data rows in the LiteraryWork table. I can live with this, as I don't expect it to happen often in my case.

Does the first design have any other advantages I am missing? Which of the two options is considered best practice, and why?

Edit2 As I understand it, the existing answers are afraid that I am trying to break normalization, as in

LiteraryWork
Title                           Type        LiteraryWorkTypeIsFiction
Winnie The Pooh                 Novel       Yes
The Nightingale and the Rose    Fairy Tale  Yes
Snowwhite                       Fairy Tale  Yes

To make it clear: the above is not what I am trying to do. Instead, if there really was more information pertaining to LiteraryWorkType, and I was using string IDs, I would record it this way:

LiteraryWork
Title                           Type
Winnie The Pooh                 Novel 
The Nightingale and the Rose    Fairy Tale 
Snowwhite                       Fairy Tale 

LiteraryWorkType 
TypeName         IsFiction
Novel            Yes
Fairy Tale       Yes
Conference paper No

The only difference to the "typical" database design would be that the ID is a nvarchar, not an integer. Which certainly has its drawbacks in storage needed, as pointed out, but I don't see which normalization rule it is supposed to hurt.

But this example aside, I am not trying to use string IDs when there actually is more information to be recorded about a LiteraryWorkType (so that LiteraryWorkType should be considered an entity in its own right). I am speaking about cases as simple as the toy example I gave at the beginning: the whole second table exists only because SQL has no "enum" type, and each data record in it consists of nothing but a single word, unique between records.

Best Answer

Repeating a variable length string over many rows (both data and indexes) is far less efficient than storing a tinyint value.

  • The string "Fairy Tale" takes 12 bytes at least on most systems including 2 for length.
  • You have denormalised and added data modification anomaly risk
  • You then have case sensitivity and collation to take into account for comparisons

Edit:

Your main problem is database size and bloating because rows are longer then necessary. This means less rows per page and more memory use for queries. See these for why

I've seen huge databases that don't use lookup tables (designed by Hibernate ORM on MySQL) and have long strings repeated. By my estimate, the database could have been 60% smaller at least.

Normalisation isn't an issue if you are using lookup table on the natural key. Which you have clarified

Related Topic