When I learned relational databases, the prof said that one would "almost always" want an artificial int as the primary key in a table, but did not specify what the exceptions are. At some time I stopped using them for junction tables, and never had a problem.
Now I am making a database with a lot of lookup tables, and wonder whether this is a case where leaving artificial keys out wouldn't make for a cleaner design and simple programming.
A toy example: assume that this is a mockup of the UI I want to achieve.
The design option with artificial IDs would be (Type is a foreign key):
LiteraryWork
Title Type
Winnie The Pooh 1
The Nightingale and the Rose 2
Snowwhite 2
LiteraryWorkType
ID TypeName
1 Novel
2 Fairy Tale
And the option without them uses the Type name itself as the key (again, the column Type is properly declared as a foreign key):
LiteraryWork
Title Type
Winnie The Pooh Novel
The Nightingale and the Rose Fairy Tale
Snowwhite Fairy Tale
LiteraryWorkType
TypeName
Novel
Fairy Tale
I tend towards using the second option, because I would need one less join when showing data on the screen. (I don't want to get rid of the lookup table entirely because I want to be able to restrict the values users may enter, for example by giving them a drop-down list bound to the lookup table). The only disadvantage I can think of is that, when a stakeholder says "but I want my UI to say 'story', not 'fairy tale'", I would have to update all data rows in the LiteraryWork
table. I can live with this, as I don't expect it to happen often in my case.
Does the first design have any other advantages I am missing? Which of the two options is considered best practice, and why?
Edit2 As I understand it, the existing answers are afraid that I am trying to break normalization, as in
LiteraryWork
Title Type LiteraryWorkTypeIsFiction
Winnie The Pooh Novel Yes
The Nightingale and the Rose Fairy Tale Yes
Snowwhite Fairy Tale Yes
To make it clear: the above is not what I am trying to do. Instead, if there really was more information pertaining to LiteraryWorkType, and I was using string IDs, I would record it this way:
LiteraryWork
Title Type
Winnie The Pooh Novel
The Nightingale and the Rose Fairy Tale
Snowwhite Fairy Tale
LiteraryWorkType
TypeName IsFiction
Novel Yes
Fairy Tale Yes
Conference paper No
The only difference to the "typical" database design would be that the ID is a nvarchar, not an integer. Which certainly has its drawbacks in storage needed, as pointed out, but I don't see which normalization rule it is supposed to hurt.
But this example aside, I am not trying to use string IDs when there actually is more information to be recorded about a LiteraryWorkType (so that LiteraryWorkType should be considered an entity in its own right). I am speaking about cases as simple as the toy example I gave at the beginning: the whole second table exists only because SQL has no "enum" type, and each data record in it consists of nothing but a single word, unique between records.
Best Answer
Repeating a variable length string over many rows (both data and indexes) is far less efficient than storing a tinyint value.
Edit:
Your main problem is database size and bloating because rows are longer then necessary. This means less rows per page and more memory use for queries. See these for why
I've seen huge databases that don't use lookup tables (designed by Hibernate ORM on MySQL) and have long strings repeated. By my estimate, the database could have been 60% smaller at least.
Normalisation isn't an issue if you are using lookup table on the natural key. Which you have clarified