Database design for polymorphic data

database-designinheritancepolymorphism

I have an application that needs to log communications with users over several different mediums: Email, SMS, Voice, Website Announcements, etc.. in a traditional database.

I have considered 3 approaches to modeling these different types of data:

  1. Single Table Inheritance:
    Store them all together in a single table (i.e., comm_message) with discriminator field of some sort to indicate the type of communications (e.g. message_type). This means that some fields in the table won't be used for each type–and it means that the same message may be duplicated in several different rows in the table (if the message is sent via more than one medium).

  2. Message "has" transports
    Have a message table (comm_message) and then transports table (comm_transports) with the various different mediums of communication. A many-to-many relationship between messages and transports would mean one row for each message in the message table–but that row might have several different transports. If additional specific information is needed for a particular transport, it could be in it's own table (i.e. comm_sms, comm_email, etc..) that is linked off the many-to-many table. I.e., a "has-a" approach.

  3. Class Table Inheritance:
    Create a base message table (comm_message) and then another table for each medium with fields specific to it (inheritance). My ORM (LLBLGen) would facilitate this approach by using shared PKs for the different tables. In this approach there would be a row in the base table (comm_message), plus rows in each of the related tables for each transport (comm_email, comm_sms, etc.) but there would be no many-to-many relationship. Rather the records across different tables would share the same PK (1-1). This would be more of an "is-a" approach.

Context: This is a medium sized application (around 100 tables) that I'll be maintaining for many years–so I'd like to get this "right". I'll often need to present all the communications info together in the UI in a grid, reports, etc..

Which should I use? Why?

Best Answer

Follow composition over inheritance as composition lends itself well to relational databases.

Say you want to get all short messages:

SELECT * FROM Message INNER JOIN ShortMessage ON ShortMessage.message = Message.id

Say you want to get all short messages and emails:

SELECT * FROM Message 
   LEFT OUTER JOIN ShortMessage ON ShortMessage.message = Message.id 
   LEFT OUTER JOIN Email ON Email.message = Message.id

This will effectively create a result set very much like what your first option is, with a lot of null fields.

So the basic idea here is that a message potentially has a email. Depending on whether or not you define Email.message to be UNIQUE, you can make sure there's at most one email corresponding to a message. This setup (like all alternatives you proposed) allows a single message to have multiple different transports, which is actually conceivable in the real world.

Advantages of this approach:

  • (over 1 & 2) Your database is normalized, which usually gives you stuff for free
    • if you want to add new transport types without altering any existing tables (not something you want to do in a huge database)
    • You don't have to store loads of nulls
  • (over 3) You can query all messages in one query, so you can get all messages to a specific recipient like so:

    SELECT IFNULL(Email.body, ShortMessage.body) as text FROM Message 
      LEFT OUTER JOIN ShortMessage ON ShortMessage.message = Message.id 
      LEFT OUTER JOIN Email ON Email.message = Message.id
      WHERE 
        Email.recepient = "john.doe@example.com" 
        OR ShortMessage.recipient = "01189998819991197253"
    

    And voilĂ , you've got yourself a list of all text ever sent to john doe.

However, if you're using an ORM, you might actually use the power to perform queries such as that. I am have no practical experience with C#, but from my understanding as an outsider, Linq lends itself well to accessing relational databases for what they are, instead of trying to shoehorn objects semantics onto records (which always makes you hit a very thick wall at some point down the road).

Related Topic