An efficient design to store variables for different product lines in ER database

database-designMySQL

This question is about refactoring existing database design.

My data flow is

  1. User generates some data for product lines A, B, C
  2. Data is saved into the database once
  3. Data is later retrieved multiple times

Current design has 3 tables: data_a, data_b, data_c, where each table shares some columns that are identical (in name) and some that are unique to that product line.

For example, same-name columns in each table are weight, unit_system and a few others. The differently-named columns have values that represent physical quantities of the particular product line. Those are named using various alphanumeric identifiers, like a, b5, e2, and there is a different set of them for different product line. Those sets can share elements, i.e. b5 can be in more than one table, but then something like t1 can be in one table but not the others.

Problem

Currently when there is a need to add some value say x9 to product line a, I would update the database schema for data_a to have column x9. I make the values of x9 as 0 for existing column rows, and new records will begin to populate with the actual x9 values. Then I update the code in relevant places to insert x9 into the table or retrieve it from the table.

Existing design

data_a(id, item_id, shared, different_a)
data_b(id, item_id, shared, different_b)
data_c(id, item_id, shared, different_c)

enter image description here

where shared columns is a group of columns that is identical in each table, while different are columns that are disjointed in theory, as they represent 3 different product lines, but actually may share some similarly-named elements, as some variable names are the same for different product lines.

Proposed design

This is where I'm struggling. Because I don't see a good clean design that is also efficient. I wanted to get rid of the need to alter database schema every time there is a new variable added to a product line. And I believe I can do that, but I also want to make an efficient design, and I don't see one.

But this is my try:

Keep primary key, foreign key and shared column names in a single table:

data(id, item_id, shared)

Create a single table for variables only (variables are ones found in different sets):

data_variables(id, item_id, data_id, variable, value)

enter image description here

I am not sure if this design will be worth the trouble, because … I will actually be storing more data – all the extra data_id or all the extra item_id values for each variable name. There are 15 to 30 variable names for each product line. I will be storing 15 to 30 item_id (or data_id) fields in the new design data_variables table, where in the old design there was only one item_id value per table row.

Question:

Is there a more efficient design that also does not require changes in schema design for every addition/deletion/modification of variable name in a product line? Might it be best to stick with existing design despite the trouble of altering schema when needing to add new variables?

Using JSON for variable "different" fields

one_data_table(id, item_id, product_line, shared, json_encoded_value_pairs);

Decision to not use EAV (Entity–attribute–value) Model

In my case Entities change very rarely if at all (on the order of years), and attributes change rarely as well, on the order of months or more. As such, reworking the database design to use EAV is probably not a good fit for my case.

That aside, I am still debating on my JSON Design.

Best Answer

So I understand you don't want to have fields from data_* in item because they're not really the same thing. So how about something like this schema below? It's similar to your original design, but it adds a new Common_data table between the item table and the data_* tables.

Item
----
 - item_id
 - (other item-focused fields)

Common_Data
-----------
 - common_data_id
 - item_id - FK to item.item_id
 - shared_field_1
 - shared_field_2
 - (many fields that are already shared in data_a, data_b, and data_c)
 - data_type (can be "data a", "data b", "data c")

data_a
------
 - data_a_id
 - common_data_id  - FK to common_data.common_data_id
 - different_a

data_b
------
 - data_b_id
 - common_data_id  - FK to common_data.common_data_id
 - different_b

data_c
------
 - data_c_id
 - common_data_id  - FK to common_data.common_data_id
 - different_c

Pros:

  • simplifies your shared data, moving it all up to a common data table.
  • similar to existing design - maybe some of your existing code can be salvaged
  • new shared data only needs to be added to one place.
  • simpler to implement.

Cons:

  • might not be flexible enough if you think you will soon have data_d, data_e, etc... and then remove older ones.
  • still requires schema (and possibly code) changes when new data_*-specific fields are added.

I'd avoid going the EAV route unless you really need the flexibility of it.

Related Topic