Is a one-to-none relationship one-to-one or one-to-many?

database-design sql

I am working on an existing database design. One table has rows that are created with only half the columns populated (the remainder are initially all NULL, except for an is_populated column); a single UPDATE later populates the remaining columns, usually weeks to a month afterwards. None of the data in any column is intended to change once it has been populated (i.e. once it is no longer NULL). Even though the system is quite old, there seem to be many more non-updated rows than updated rows.

Let's say that initially there are 5 populated columns and 5 NULL columns. After the UPDATE all 10 columns are populated. It's not allowed for the data to become unpopulated after the first UPDATE.

Is this truly a one-to-one relationship? Should both halves of the data have been stored in a single table from the start, or should they have been split into two? Are there any negative performance implications of doing a LEFT JOIN on the first table's primary key (which would end up returning exactly the same structure) instead of querying a single table? If I were to create a similar structure in the future, should I follow this design or separate the two concerns?

Best Answer

This is an optional 1 to 1 relationship, also written as a 1 to (0,1) relationship. A true identity relationship would be 1 to (1,1).

I find this notation useful in understanding the scale of the relationship. A one to many relationship could be 1 to (0,10), 1 to (1,5), or 1 to (1,*). The first digit is always 0 (optional) or 1 (mandatory), while the second specifies an upper bound, with * meaning unlimited or unspecified.
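As a rough illustration of what 1 to (0,1) looks like if the two halves were split, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (item, item_detail, early_a, late_a) are hypothetical and not from the original schema. Making the child's primary key also its foreign key is one common way to cap the relationship at a single child row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE item (
    item_id INTEGER PRIMARY KEY,
    early_a TEXT NOT NULL          -- columns known at creation time
);
-- The primary key doubles as the foreign key, so each item can
-- have zero or one item_detail row: 1 to (0,1).
CREATE TABLE item_detail (
    item_id INTEGER PRIMARY KEY REFERENCES item(item_id),
    late_a  TEXT NOT NULL          -- columns filled in weeks later
);
""")
conn.execute("INSERT INTO item VALUES (1, 'created')")
conn.execute("INSERT INTO item VALUES (2, 'created')")
conn.execute("INSERT INTO item_detail VALUES (1, 'completed')")

# The LEFT JOIN returns the same shape as the single wide table:
# rows without a detail row simply carry NULL in the late columns.
rows = conn.execute("""
    SELECT i.item_id, i.early_a, d.late_a
    FROM item AS i
    LEFT JOIN item_detail AS d ON d.item_id = i.item_id
    ORDER BY i.item_id
""").fetchall()
```

Here item 2 has no detail row yet, so its late_a comes back as NULL, just as it would in the original single-table design.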

A many to many relationship resolves to two one to many relationships, each from one of the original tables to the required join table.

I generally find it is not useful to move optional columns out of the table. In this case, all 10 columns are required, but it is possible to create a record without having the full set of data.
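If the columns stay in one table, the "all or nothing" rule from the question can be pushed into the schema. The following is a sketch only, again with hypothetical names (record, early_a, late_a, late_b) and with two late columns standing in for the five in the question. It assumes the database enforces CHECK constraints, as SQLite does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE record (
    record_id    INTEGER PRIMARY KEY,
    early_a      TEXT NOT NULL,
    late_a       TEXT,
    late_b       TEXT,
    is_populated INTEGER NOT NULL DEFAULT 0,
    -- Either every late column is NULL, or every late column is set.
    CHECK (
        (is_populated = 0 AND late_a IS NULL AND late_b IS NULL)
        OR
        (is_populated = 1 AND late_a IS NOT NULL AND late_b IS NOT NULL)
    )
)
""")
conn.execute("INSERT INTO record (record_id, early_a) VALUES (1, 'created')")
# The single later UPDATE that fills in the second half of the row.
conn.execute(
    "UPDATE record SET late_a = 'x', late_b = 'y', is_populated = 1 "
    "WHERE record_id = 1"
)

def half_populated_insert_rejected():
    """Return True if a row with only some late columns set is refused."""
    try:
        conn.execute(
            "INSERT INTO record (record_id, early_a, late_a) "
            "VALUES (2, 'created', 'x')"
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

A CHECK constraint only sees the new state of a row, so it cannot stop a populated row from being set back to NULL; that rule would need a trigger or application logic.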

The one case where I have seen identity relationships make sense is inventory-like tables, where there is tombstone data which doesn't change much alongside frequently changing counts. The access rights to the tombstone data are often different as well. As the system scales out, this often becomes a one to many relationship, with inventory counts for each location. When the required data is in only one of the two tables sharing the identity, queries join to just that table.

Related Solutions

Database – How would you design a user database with custom fields

Please consider this as an alternative. The previous two examples will both require that you make changes to the schema as the application's scope grows. In addition, the "custom_column" solution is difficult to extend and maintain: eventually you'll end up with Custom_510, and then just imagine how awful this table will be to work with.

First let's use your Companies schema.

[Companies] CompanyId, COMPANY_NAME, CREATED_ON

Next we'll also use your Users schema for top level required attributes that will be used/shared by all companies.

[Users] UserId, COMPANY_ID, FIRST_NAME, LAST_NAME, EMAIL, CREATED_ON

Next we build a table that defines the dynamic attributes specific to each company's custom user attributes. Here, an example value of the Attribute column would be "LikeMusic":

[UserAttributeDefinition] UserAttributeDefinitionId, CompanyId, Attribute

Next we define a UserAttributes table that will hold the user attribute values:

[UserAttributes] UserAttributeDefinitionId, UserId, Value

This can be modified in many ways to improve performance. You can use multiple UserAttributes tables, each one specific to the data type being stored in Value, or just leave Value as a VarChar and work with it as a key-value store.

You may also want to move CompanyId off the UserAttributeDefinition table and into a cross-reference table for future-proofing.
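To make the layout concrete, here is a small sketch of the four tables as listed (with CompanyId still on UserAttributeDefinition), using Python's sqlite3 module. Some columns are trimmed for brevity. The sample rows, the UNIQUE constraint, the composite primary key on UserAttributes, and the helper user_attributes are assumptions added for illustration, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Companies (
    CompanyId    INTEGER PRIMARY KEY,
    COMPANY_NAME TEXT NOT NULL
);
CREATE TABLE Users (
    UserId     INTEGER PRIMARY KEY,
    COMPANY_ID INTEGER NOT NULL REFERENCES Companies(CompanyId),
    FIRST_NAME TEXT NOT NULL
);
CREATE TABLE UserAttributeDefinition (
    UserAttributeDefinitionId INTEGER PRIMARY KEY,
    CompanyId INTEGER NOT NULL REFERENCES Companies(CompanyId),
    Attribute TEXT NOT NULL,
    UNIQUE (CompanyId, Attribute)   -- assumption: one definition per name
);
CREATE TABLE UserAttributes (
    UserAttributeDefinitionId INTEGER NOT NULL
        REFERENCES UserAttributeDefinition(UserAttributeDefinitionId),
    UserId INTEGER NOT NULL REFERENCES Users(UserId),
    Value  TEXT,                    -- stored as text, key-value style
    PRIMARY KEY (UserAttributeDefinitionId, UserId)
);
INSERT INTO Companies VALUES (1, 'Acme');
INSERT INTO Users VALUES (10, 1, 'Ann');
INSERT INTO UserAttributeDefinition VALUES (100, 1, 'LikeMusic');
INSERT INTO UserAttributes VALUES (100, 10, 'yes');
""")

def user_attributes(user_id):
    """Return one user's custom attributes as a {name: value} dict."""
    rows = conn.execute("""
        SELECT d.Attribute, a.Value
        FROM UserAttributes AS a
        JOIN UserAttributeDefinition AS d
          ON d.UserAttributeDefinitionId = a.UserAttributeDefinitionId
        WHERE a.UserId = ?
    """, (user_id,)).fetchall()
    return dict(rows)
```

Adding a new custom attribute for a company is then an INSERT into UserAttributeDefinition rather than a schema change.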

Database – DB Schema design : single table with more columns vs multiple tables with fewer columns

Anything (an "Entity") that can exist on its own, independently of anything else, should have its own table.

User: id, name, hashed_password, join_date, birth_date

Group: id, name

Relationships between things generally require "linking" tables.

Post: id, user_id, group_id, post_date, post_title, post_content

The key to success is proper indexing of any field on which you join between tables or filter results.
Also, consider using a dummy (non-NULL) Group value for posts to a user's own "wall". Some database engines (Oracle's B-tree indexes, for example) leave out entries whose key is entirely NULL, which can make queries for these posts run far slower.
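The indexing advice and the dummy group can be sketched as follows with Python's sqlite3 module. The table names app_user and app_group (chosen to avoid SQL reserved words), the index names, and the sentinel id 0 are all assumptions made for this example. SQLite itself does index NULLs, so the sentinel here only illustrates the pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_user  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE app_group (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE post (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES app_user(id),
    group_id   INTEGER NOT NULL REFERENCES app_group(id),
    post_title TEXT NOT NULL
);
-- Index the columns used for joins and filters.
CREATE INDEX post_user_idx  ON post (user_id);
CREATE INDEX post_group_idx ON post (group_id);

INSERT INTO app_group VALUES (0, 'own wall');   -- sentinel instead of NULL
INSERT INTO app_group VALUES (1, 'Chess club');
INSERT INTO app_user  VALUES (1, 'ann');
INSERT INTO post VALUES (1, 1, 0, 'Hello wall');
INSERT INTO post VALUES (2, 1, 1, 'Opening theory');
""")

WALL_GROUP_ID = 0  # hypothetical sentinel id for "posted to own wall"

# Wall posts are found with an ordinary equality filter, not IS NULL.
wall_titles = [
    title
    for (title,) in conn.execute(
        "SELECT post_title FROM post "
        "WHERE user_id = ? AND group_id = ? ORDER BY id",
        (1, WALL_GROUP_ID),
    )
]
```

Because group_id is NOT NULL, every post has an index entry regardless of how a given engine treats NULL keys.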
