An efficient design to store variables for different product lines in ER database

database-design MySQL

This question is about refactoring existing database design.

My data flow is

User generates some data for product lines A, B, C
Data is saved into the database once
Data is later retrieved multiple times

Current design has 3 tables: data_a, data_b, data_c, where each table shares some columns that are identical (in name) and some that are unique to that product line.

For example, the same-name columns in each table are weight, unit_system and a few others. The differently named columns hold values that represent physical quantities of the particular product line. They are named using various alphanumeric identifiers, like a, b5, e2, and each product line has its own set of them. The sets can overlap: b5 can be in more than one table, whereas something like t1 can be in one table but not the others.

Problem

Currently, when I need to add a value, say x9, to product line A, I update the schema of data_a to add a column x9. I set x9 to 0 for existing rows, and new records are populated with the actual x9 values. Then I update the code in the relevant places to insert x9 into the table or retrieve it from the table.
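The workflow described above can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, and the column types are assumptions, since the question does not give them.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_a (id INTEGER PRIMARY KEY, item_id INTEGER, weight REAL, b5 REAL)"
)
conn.execute("INSERT INTO data_a (item_id, weight, b5) VALUES (1, 2.5, 0.7)")

# Schema change: add x9; existing rows read back the default of 0.
conn.execute("ALTER TABLE data_a ADD COLUMN x9 REAL NOT NULL DEFAULT 0")

# New records carry an actual x9 value.
conn.execute("INSERT INTO data_a (item_id, weight, b5, x9) VALUES (2, 3.0, 0.9, 4.2)")

rows = conn.execute("SELECT item_id, x9 FROM data_a ORDER BY item_id").fetchall()
print(rows)
```

Every new variable repeats this cycle: one schema change plus matching changes in the insert and select code.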

Existing design

data_a(id, item_id, shared, different_a)
data_b(id, item_id, shared, different_b)
data_c(id, item_id, shared, different_c)

where shared is a group of columns that is identical in each table, while the different_* columns are disjoint in theory, since they represent 3 different product lines, but may in practice share some identically named elements, because some variable names are the same across product lines.

Proposed design

This is where I'm struggling, because I don't see a good, clean design that is also efficient. I want to get rid of the need to alter the database schema every time a new variable is added to a product line. I believe I can do that, but I also want the design to be efficient, and I don't see how.

But this is my try:

Keep primary key, foreign key and shared column names in a single table:

data(id, item_id, shared)

Create a single table for variables only (variables are ones found in different sets):

data_variables(id, item_id, data_id, variable, value)

I am not sure if this design will be worth the trouble, because … I will actually be storing more data – all the extra data_id or all the extra item_id values for each variable name. There are 15 to 30 variable names for each product line. I will be storing 15 to 30 item_id (or data_id) fields in the new design data_variables table, where in the old design there was only one item_id value per table row.
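To make the trade-off concrete, here is a minimal sketch of the proposed layout, again using sqlite3 as a stand-in for MySQL. The column types, the UNIQUE constraint, and the helper names save and load_variables are assumptions for illustration. item_id is left out of data_variables in this sketch because it can be reached through data_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (
    id INTEGER PRIMARY KEY,
    item_id INTEGER NOT NULL,
    weight REAL,
    unit_system TEXT
);
CREATE TABLE data_variables (
    id INTEGER PRIMARY KEY,
    data_id INTEGER NOT NULL REFERENCES data(id),
    variable TEXT NOT NULL,
    value REAL,
    UNIQUE (data_id, variable)
);
""")

def save(conn, item_id, shared, variables):
    """Insert one shared row plus one data_variables row per variable."""
    cur = conn.execute(
        "INSERT INTO data (item_id, weight, unit_system) VALUES (?, ?, ?)",
        (item_id, shared["weight"], shared["unit_system"]),
    )
    data_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO data_variables (data_id, variable, value) VALUES (?, ?, ?)",
        [(data_id, name, value) for name, value in variables.items()],
    )
    return data_id

def load_variables(conn, data_id):
    """Return the variables of one record as a dict."""
    cur = conn.execute(
        "SELECT variable, value FROM data_variables WHERE data_id = ?", (data_id,)
    )
    return dict(cur.fetchall())

# x9 is stored without any schema change.
data_id = save(conn, 1, {"weight": 2.5, "unit_system": "metric"},
               {"a": 1.0, "b5": 0.7, "x9": 4.2})
stored_rows = conn.execute("SELECT COUNT(*) FROM data_variables").fetchone()[0]
variables = load_variables(conn, data_id)
```

The row count shows the overhead the question worries about: one record with three variables already produces three rows in data_variables, each repeating data_id.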

Question:

Is there a more efficient design that also does not require changes in schema design for every addition/deletion/modification of variable name in a product line? Might it be best to stick with existing design despite the trouble of altering schema when needing to add new variables?

Using JSON for variable "different" fields

one_data_table(id, item_id, product_line, shared, json_encoded_value_pairs);
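A minimal sketch of this single-table idea, assuming the variables are encoded and decoded in application code with Python's json module. sqlite3 stands in for MySQL, the JSON is stored in a TEXT column, and the helper names save and load are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE one_data_table (
    id INTEGER PRIMARY KEY,
    item_id INTEGER NOT NULL,
    product_line TEXT NOT NULL,
    weight REAL,
    unit_system TEXT,
    json_encoded_value_pairs TEXT NOT NULL
)
""")

def save(conn, item_id, product_line, shared, variables):
    """Store shared columns as real columns and the rest as one JSON string."""
    cur = conn.execute(
        "INSERT INTO one_data_table "
        "(item_id, product_line, weight, unit_system, json_encoded_value_pairs) "
        "VALUES (?, ?, ?, ?, ?)",
        (item_id, product_line, shared["weight"], shared["unit_system"],
         json.dumps(variables)),
    )
    return cur.lastrowid

def load(conn, row_id):
    """Return (product_line, variables dict) for one record."""
    product_line, encoded = conn.execute(
        "SELECT product_line, json_encoded_value_pairs FROM one_data_table WHERE id = ?",
        (row_id,),
    ).fetchone()
    return product_line, json.loads(encoded)

row_id = save(conn, 1, "a", {"weight": 2.5, "unit_system": "metric"},
              {"b5": 0.7, "x9": 4.2})
product_line, variables = load(conn, row_id)
```

Each record stays a single row, and adding a variable such as x9 only changes the dictionary the application passes in.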

Decision to not use EAV (Entity–attribute–value) Model

In my case Entities change very rarely if at all (on the order of years), and attributes change rarely as well, on the order of months or more. As such, reworking the database design to use EAV is probably not a good fit for my case.

That aside, I am still debating on my JSON Design.

Best Answer

So I understand you don't want to have fields from data_* in item because they're not really the same thing. So how about something like this schema below? It's similar to your original design, but it adds a new Common_data table between the item table and the data_* tables.

Item
----
 - item_id
 - (other item-focused fields)

Common_Data
-----------
 - common_data_id
 - item_id - FK to item.item_id
 - shared_field_1
 - shared_field_2
 - (many fields that are already shared in data_a, data_b, and data_c)
 - data_type (can be "data a", "data b", "data c")

data_a
------
 - data_a_id
 - common_data_id  - FK to common_data.common_data_id
 - different_a

data_b
------
 - data_b_id
 - common_data_id  - FK to common_data.common_data_id
 - different_b

data_c
------
 - data_c_id
 - common_data_id  - FK to common_data.common_data_id
 - different_c
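The schema above could be sketched like this, using sqlite3 as a stand-in for MySQL. The concrete columns (weight and unit_system as the shared fields, b5 and t1 as the data_a-specific ones) are taken from the question as examples, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (item_id INTEGER PRIMARY KEY);
CREATE TABLE common_data (
    common_data_id INTEGER PRIMARY KEY,
    item_id INTEGER NOT NULL REFERENCES item(item_id),
    weight REAL,
    unit_system TEXT,
    data_type TEXT NOT NULL
);
CREATE TABLE data_a (
    data_a_id INTEGER PRIMARY KEY,
    common_data_id INTEGER NOT NULL REFERENCES common_data(common_data_id),
    b5 REAL,
    t1 REAL
);
INSERT INTO item (item_id) VALUES (1);
INSERT INTO common_data (common_data_id, item_id, weight, unit_system, data_type)
    VALUES (10, 1, 2.5, 'metric', 'data a');
INSERT INTO data_a (common_data_id, b5, t1) VALUES (10, 0.7, 3.1);
""")

# Shared and product-line-specific fields are read back with one join.
row = conn.execute("""
    SELECT c.item_id, c.weight, c.unit_system, a.b5, a.t1
    FROM common_data AS c
    JOIN data_a AS a ON a.common_data_id = c.common_data_id
    WHERE c.item_id = ?
""", (1,)).fetchone()
print(row)
```

The data_type column tells the application which data_* table to join for a given common_data row.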

Pros:

 - simplifies your shared data, moving it all up to a common data table.
 - similar to existing design - maybe some of your existing code can be salvaged
 - new shared data only needs to be added to one place.
 - simpler to implement.

Cons:

 - might not be flexible enough if you think you will soon have data_d, data_e, etc... and then remove older ones.
 - still requires schema (and possibly code) changes when new data_*-specific fields are added.

I'd avoid going the EAV route unless you really need the flexibility of it.

Related Solutions

Database Design for Inventory

Do not make a table for every product.

This problem has been solved many ways. Try this:

Make a product (or products) table, put your common product attributes in the product table, then make an attribute table and a productattribute table, something like:

attribute
---------
attributeId
attributeName
attributeDescription


productattribute
----------------
productId
attributeId
attributeValue

Then you can assign arbitrary attributes to your products.
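As a rough illustration of the attribute and productattribute tables, the sketch below uses sqlite3 as a stand-in for MySQL. The product table's name column, the column types, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (productId INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE attribute (
    attributeId INTEGER PRIMARY KEY,
    attributeName TEXT NOT NULL UNIQUE,
    attributeDescription TEXT
);
CREATE TABLE productattribute (
    productId INTEGER NOT NULL REFERENCES product(productId),
    attributeId INTEGER NOT NULL REFERENCES attribute(attributeId),
    attributeValue TEXT,
    PRIMARY KEY (productId, attributeId)
);
INSERT INTO product VALUES (1, 'shirt');
INSERT INTO attribute VALUES (1, 'color', 'Main color'), (2, 'size', 'Label size');
INSERT INTO productattribute VALUES (1, 1, 'red'), (1, 2, 'M');
""")

# Read all attributes of one product as name/value pairs.
attributes = conn.execute("""
    SELECT a.attributeName, pa.attributeValue
    FROM productattribute AS pa
    JOIN attribute AS a ON a.attributeId = pa.attributeId
    WHERE pa.productId = ?
    ORDER BY a.attributeName
""", (1,)).fetchall()
print(attributes)
```

A new kind of attribute becomes a new row in attribute rather than a new column.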

You can take the schema a little further if you like, using:

attribute
---------
attributeId
attributeName
attributeDescription
attributeType // [float, int, varchar, etc.]


productAttribute
----------------
productId
attributeId
attributeValueFloat
attributeValueInt
attributeValueVarChar

This technique complicates your schema a bit, so be aware of what you are getting into if you decide to use the extra columns to store values by data type.
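The extra complexity shows up in the access code, which has to pick the value column from attributeType. A minimal sketch, assuming sqlite3 as a stand-in for MySQL; the helpers set_attribute and get_attribute and the VALUE_COLUMN mapping are hypothetical, and attributeDescription is omitted for brevity.

```python
import sqlite3

# Maps attributeType to the column that holds values of that type.
VALUE_COLUMN = {
    "float": "attributeValueFloat",
    "int": "attributeValueInt",
    "varchar": "attributeValueVarChar",
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attribute (
    attributeId INTEGER PRIMARY KEY,
    attributeName TEXT NOT NULL,
    attributeType TEXT NOT NULL
);
CREATE TABLE productAttribute (
    productId INTEGER NOT NULL,
    attributeId INTEGER NOT NULL REFERENCES attribute(attributeId),
    attributeValueFloat REAL,
    attributeValueInt INTEGER,
    attributeValueVarChar TEXT,
    PRIMARY KEY (productId, attributeId)
);
INSERT INTO attribute VALUES (1, 'weight', 'float'), (2, 'color', 'varchar');
""")

def set_attribute(conn, product_id, attribute_id, value):
    """Write the value into the column matching the attribute's declared type."""
    (attr_type,) = conn.execute(
        "SELECT attributeType FROM attribute WHERE attributeId = ?", (attribute_id,)
    ).fetchone()
    column = VALUE_COLUMN[attr_type]  # taken from a fixed mapping, never from user input
    conn.execute(
        f"INSERT INTO productAttribute (productId, attributeId, {column}) VALUES (?, ?, ?)",
        (product_id, attribute_id, value),
    )

def get_attribute(conn, product_id, attribute_id):
    """Read the value back from the column matching the attribute's declared type."""
    attr_type, as_float, as_int, as_varchar = conn.execute("""
        SELECT a.attributeType, pa.attributeValueFloat,
               pa.attributeValueInt, pa.attributeValueVarChar
        FROM productAttribute AS pa
        JOIN attribute AS a ON a.attributeId = pa.attributeId
        WHERE pa.productId = ? AND pa.attributeId = ?
    """, (product_id, attribute_id)).fetchone()
    return {"float": as_float, "int": as_int, "varchar": as_varchar}[attr_type]

set_attribute(conn, 7, 1, 2.5)
set_attribute(conn, 7, 2, "red")
weight = get_attribute(conn, 7, 1)
color = get_attribute(conn, 7, 2)
```

Only one of the three value columns is filled per row; the other two stay NULL.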

An alternative that might just suit your needs a little better would be to use a NoSQL Store like MongoDB for your products. Then you can store the attributes you need for each type of product without resorting to the complexity of a relational database schema.

Using MySQL 5.7 JSON columns for EAV

Neither EAV nor a JSON column is a bad approach in your case; which one is better for you depends on what you want to do with the data once it's stored in the database.

If all you want is a product with user-defined attributes that you read as a whole, the JSON approach will perform better: the whole product is located in one table, and you can simply decode the JSON retrieved from the database and do with it as you please on the frontend.

If, however, you want not only to read the product as a whole but may later add the ability to filter products by certain attributes (say, color), the EAV approach would make that operation faster, since you can select the products whose attribute names directly match the one you're searching for.

SELECT
    pa.ProductId
FROM
    product_attributes pa
WHERE
    pa.`Name` = 'color';

If you built this feature on the JSON column, traversing the JSON document for each row would take more resources than a direct string comparison.
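The difference can be illustrated with a small sketch that runs the same filter both ways. It uses sqlite3 as a stand-in for MySQL, the product_json table and the sample rows are hypothetical, and the JSON side is decoded in application code. MySQL 5.7 also offers server-side JSON functions such as JSON_EXTRACT, which this sketch does not use.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_attributes (ProductId INTEGER, Name TEXT, Value TEXT);
CREATE TABLE product_json (ProductId INTEGER PRIMARY KEY, attributes TEXT);
INSERT INTO product_attributes VALUES
    (1, 'color', 'red'), (1, 'size', 'M'), (2, 'size', 'L');
INSERT INTO product_json VALUES
    (1, '{"color": "red", "size": "M"}'), (2, '{"size": "L"}');
""")

# EAV: the database filters on a plain string column.
eav_ids = [
    product_id
    for (product_id,) in conn.execute(
        "SELECT DISTINCT ProductId FROM product_attributes "
        "WHERE Name = ? ORDER BY ProductId",
        ("color",),
    )
]

# JSON: every document is fetched and decoded before it can be tested.
json_ids = [
    product_id
    for product_id, document in conn.execute(
        "SELECT ProductId, attributes FROM product_json ORDER BY ProductId"
    )
    if "color" in json.loads(document)
]
```

Both lists contain the same products, but the JSON variant has to touch and parse every row to get there.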

As a developer of a REST API backend for mobile applications, an example I often work with is providing users with an overview of the push notifications they have received, through a notification centre in the mobile client.

Because I am not planning to query the data heavily or frequently, the JSON column is completely fine. I just want to return the data, which exists in different formats, to the user when they ask for it, so I take it out of the database and pass it along. It's even better because the REST API's surface is JSON, so I don't need any of the additional formatting that an EAV model would require.
