So, I've been reading up on identifying vs. non-identifying relationships in my database design, and a number of the answers on SO seem to contradict each other. Here are the two questions I am looking at:
- What's the Difference Between Identifying and Non-Identifying Relationships
- Trouble Deciding on Identifying or Non-Identifying Relationship
Looking at the top answers from each question, I appear to get two different ideas of what an identifying relationship is.
The first question's response says that an identifying relationship "describes a situation in which the existence of a row in the child table depends on a row in the parent table." An example of this that is given is, "An author can write many books (1-to-n relationship), but a book cannot exist without an author." That makes sense to me.
However, when I read the response to question two, I get confused, as it says, "if a child identifies its parent, it is an identifying relationship." The answer then goes on to give examples: a Social Security Number identifies a Person, but an address does not (because many people can live at one address). To me, this sounds more like the decision between a primary key and a non-primary-key attribute.
My own gut feeling (and additional research on other sites) points to the first question and its response being correct. However, I wanted to verify this before moving on, since I don't want to learn something wrong while I am working to understand database design. Thanks in advance.
Best Answer
The technical definition of an identifying relationship is that a child's foreign key is part of its primary key.
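As a minimal sketch of that definition, assume a hypothetical link table AuthoredBook between Authors and Books. The table name AuthoredBook, the author_id column, and the use of SQLite through Python's sqlite3 module are assumptions made for illustration. The script asks SQLite which columns form the primary key and which are foreign keys, so the overlap can be checked directly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Authors (author_id INTEGER PRIMARY KEY);
    CREATE TABLE Books   (book_id   INTEGER PRIMARY KEY);

    -- Hypothetical link table: both foreign keys are also primary key columns.
    CREATE TABLE AuthoredBook (
        author_id INTEGER NOT NULL REFERENCES Authors(author_id),
        book_id   INTEGER NOT NULL REFERENCES Books(book_id),
        PRIMARY KEY (author_id, book_id)
    );
""")

# table_info rows are (cid, name, type, notnull, dflt_value, pk);
# pk > 0 gives the column's position in the primary key, 0 means "not in the key".
pk_columns = [row[1] for row in conn.execute("PRAGMA table_info(AuthoredBook)")
              if row[5] > 0]

# foreign_key_list rows are (id, seq, table, from, to, ...); index 3 is the child column.
fk_columns = [row[3] for row in conn.execute("PRAGMA foreign_key_list(AuthoredBook)")]

# Identifying relationship: every foreign key column is part of the primary key.
is_identifying = set(fk_columns) <= set(pk_columns)
print(pk_columns, fk_columns, is_identifying)
```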
See? book_id is a foreign key, but it's also one of the columns in the primary key. So this table has an identifying relationship with the referenced table Books. Likewise it has an identifying relationship with Authors.
A comment on a YouTube video has an identifying relationship with the respective video. The video_id should be part of the primary key of the Comments table.
It may be hard to understand this because it's such common practice these days to use only a serial surrogate key instead of a compound primary key.
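A rough sketch of that surrogate-key style, again using SQLite through Python's sqlite3 module as an assumed stand-in for whatever database you use (the comment_id and message columns are hypothetical): video_id is a NOT NULL foreign key, but it is no longer part of the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE Videos (video_id INTEGER PRIMARY KEY);

    CREATE TABLE Comments (
        comment_id INTEGER PRIMARY KEY,  -- surrogate key, assigned automatically
        video_id   INTEGER NOT NULL REFERENCES Videos(video_id),
        message    TEXT
    );

    INSERT INTO Videos (video_id) VALUES (1);
    INSERT INTO Comments (video_id, message) VALUES (1, 'first');
""")

# Only the surrogate column shows up in the primary key.
pk_columns = [row[1] for row in conn.execute("PRAGMA table_info(Comments)")
              if row[5] > 0]

# The logical dependency is still there: a comment cannot reference a missing video.
try:
    conn.execute("INSERT INTO Comments (video_id, message) VALUES (999, 'orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(pk_columns, orphan_rejected)
```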
This can obscure cases where the tables have an identifying relationship.
I would not consider SSN to represent an identifying relationship. Some people exist but do not have an SSN. Other people may file to get a new SSN. So the SSN is really just an attribute, not part of the person's primary key.
Re comment from @Niels:
I suppose so. I hesitate to say yes, because we haven't changed the logical relationship between the tables by using a surrogate key. That is, you still can't make a Comment without referencing an existing Video. But that just means video_id must be NOT NULL. And the logical aspect is, to me, really the point about identifying relationships.
But there's a physical aspect of identifying relationships as well: the foreign key column is part of the primary key. The primary key is not necessarily a composite key. It could be a single column that is both the primary key of Comments and the foreign key to the Videos table, but that would mean you can store only one comment per video.
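To illustrate the single-column case, here is a small sketch under the same assumptions as before (SQLite via Python's sqlite3 module, a hypothetical message column). Because video_id alone is the primary key of Comments, a second comment on the same video is rejected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE Videos (video_id INTEGER PRIMARY KEY);

    -- video_id is both the whole primary key and the foreign key.
    CREATE TABLE Comments (
        video_id INTEGER PRIMARY KEY REFERENCES Videos(video_id),
        message  TEXT
    );

    INSERT INTO Videos (video_id) VALUES (1);
    INSERT INTO Comments (video_id, message) VALUES (1, 'first');
""")

# A second comment on video 1 would duplicate the primary key value.
try:
    conn.execute("INSERT INTO Comments (video_id, message) VALUES (1, 'second')")
    second_comment_allowed = True
except sqlite3.IntegrityError:
    second_comment_allowed = False

comment_count = conn.execute("SELECT COUNT(*) FROM Comments").fetchone()[0]
print(second_comment_allowed, comment_count)
```

Adding a second column to the key, for example a per-video sequence number, would keep the relationship identifying while allowing many comments per video.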
Identifying relationships seem to be important only for the sake of entity-relationship diagramming, and this comes up in GUI data modeling tools.