Sql – Manufacturing Products Database Design

asp.netdatabasedatabase-designsql server

I am designing a database from scratch from manufacturing products. The key design is as follows:
(Currently I am working on MS Access 2007 but my final implementation is SQL Server i.e. SQL Server 2008 being the latest version. Also I would be using only Express edition at this movement. The application will involve a .NET storefront interfaced with this database)

[1].Each product may be divided into various categories.
(I have decided to assume that if a product does not have a category, I will forcefully insert a category called by the name 'BLANK', rather than calling it NULL, in this way I can add categories to the product in future if required)
1 Product can have many categories.
1 Category can belong to more than 1 products.
Categories may also be changed to belong to a different product in future.

[2].Each product category may be divided into various subcategories.
(Again, I have decided to assume that if a product category does not have a subcategory, I will forcefully insert a subcategory called by the name 'BLANK', rather than calling it NULL, in this way I can add subcategories to a product category in future if required)
1 Product Category can have many subcategories.
1 SubCategory can belong to more than 1 product Category.
SubCategories may also be changed to belong to a different product category in future.

[3].Each product will have different number of attributes.
Each attribute will have different attribute values
Values of some attributes will be dependent on the values of other attributes
Values of some attributes which may not be directly dependent on the value of other attributes will still be indirectly dependent based on the availability of the values of other attributes.

The design I have thereby developed is as follows:

0000_Product_Line
ProductLineID(PK)
ProductLineName

0001_Product_Line_Category
ProductLineCategoryID(PK)
ProductLineCategoryName

0002_Product_Line_Category_Association
[PK]ProductLineID(FK Referring ProductLineID from table 0000_Product_Line)
[PK]ProductLineCategoryID(FK Referring ProductLineCategoryID from table 0001_Product_Line_Category)

0003_Product_Line_Sub_Category
ProductLineSubCategoryID(PK)
ProductLineSubCategoryName

0004_Product_Line_Category_And_Sub_Category_Association
[PK]ProductLineCategoryID(FK Referring ProductLineCategoryID from table 0001_Product_Line_Category)
[PK]ProductLineSubCategoryID(FK Referring ProductLineSubCategoryID from table 0003_Product_Line_Sub_Category)

0005_Product_Line_Attributes
AttributeID(PK)
AttributeName

0006_Product_Line_Attribute_Values
[PK]AttributeID(FK Referring AttributeID from table 0005_Product_Line_Attributes)
[PK]AttributeValues

0007_Product_Line_Attributes_Association
[PK]ProductLineSubCategoryID(FK Referring ProductLineSubCategoryID from table 0003_Product_Line_Sub_Category)
[PK]AttributeID(FK Referring AttributeID from table 0005_Product_Line_Attributes)

Now the biggest issues I have are how do I manage the attributes for various products (finally product subcategories, as I am mapping everything to product subcategories because even if a product has or does not have a category or subcategory I will still forcefully insert a BLANK category and subcategory)

Suppose I have the following values:

Product Line:
A
B

Product Line Categories
A A1
A A2
A A3
B B1
B B2

Product Line Sub Categories
A A1 a
A A1 b
A A1 c
A A2 x
A A2 y
B B1 m
B B1 n
B B2 p
B B2 q
B B2 r

Attributes
a att1
a att2
a att3
b att2
b att4
c att1
c att3
c att4
x att2
x att3
y att4
y att5
m att3
m att7
n att1
n att7
p att2
p att3
p att5
q att4
q att6
r att1
r att6

Attribute Values
att1 Value1
att1 Value2
att1 Value3
att2 Value1
att2 Value2
att2 Value3
att2 Value4
att3 Value1
att3 Value2
att4 Value1
att4 Value2
att4 Value3
att4 Value4
att4 Value5
……

Now For Example: att1 and att2 are related
So if user selects a Value for att1, it determines what all values of att2 should be made available to the user
Similarly if user selects Value for att2, it determines what all values of att3 should be made available to the user

There is no hierarchy that, if att1 and att2 are related, user should select att1 first and att2 next OR viceversa
User can select the value for either att1 OR att2 and it should filter down the other

Another example: att2 and att3 and att4 are related
So if user selects a Value for att2, it will determine what all values for att3 and att4 should be made available to the user.
Similarly if user selects a Value for att3, it will determine what all values for att2 and att4 should be made available to the user.
Similarly if user selects a Value for att4, it will determine what all values for att2 and att3 should be made available to the user.

Here also, there is no hierarchy that, if att2 and att3 and att4 are related, user should select att2 first and att3 next and att4 next OR viceversa User can select the value for either att2 OR att3 OR att4 and it should filter down the other related attributes

How do I go about establishing relationship between various attributes so that my filter using where clause would work appropriately? If they all were columns I could easily use group by, but these are rows.

I have done a lot of research on internet and came up with the following options:
1. Use one large table for all the products and their attributes. But the size of this table will be immensely huge with innumerable null values in various columns and it doesnt server the purpose as reading using selects would affect performance hugely.

  1. Use Sparse columns in SQL Server 2008 to avoid the effects of null values. However even though sparse columns save space for values which are null, they require more than normal storage space for values which are not null. also the table size is still huge with innumerable columns affecting the read performance.

  2. I should use XML for storing data. I am not sure about how I would implement it and what would be the advantages or disadvantages.

  3. I should break it down into separate small tables for each subcategory representing only its own attributes as columns rather than that of the entire product or its category. Then I would pass table name as parameters in my application code. However this will have the issue of SQL injection. The resolution would be to use Stored Procedure with parameterized queries. However the performance would still be affected as the database engine cannot optimize the query plans for such queries where joins are done only to reach a subcategory but after that a tablename is called by a parameter passing the subcategory.

  4. I should write SQL queries to prepare a view of each subcategories representing its own attributes (rather than having separate table for each subcategory). However to create views I would need to have relationships stored somewhere because some attributes are dependent on each other directly based on the values and some attributes are dependent on each other indirectly based on the avilability of values. So I will have this table of attribute and values (as I have already designed) and then I will need a many to many recursive relationship table. But this many to many recursive relationship table will be immensely huge and the number of self-joins that will take place will drastically affect the performance. Also again the viewname will need to be passed as a parameter to a dynamic sql or parameterized stored procedure.

  5. The last option I came across was EAV database design which has its own pros and cons including complex query writing as well as monitoring the performance on my own because database engine cannot optimize the query plans for such a database. Also such an EAV database loses most of its relational essence and the constraints on various values are far dificult to enforce which will involve a lot of triggers.

I am not sure what would be the best option to design my database.
The database would be mostly accessed on SELECT queries through the application. Any insertion for a product or its category or subcategory or its attributes or values will be done by data entry. (The direct inserts will be done for registering customer and product vendors/manufacturers which i am presuming at this moment will be a separate database. Then i would need to develop some kind of mapping to map the vendors/manufacturers to the products they provide or manufacture)
Users will be able to filter down to a subcategory, select its attribute values and buy the product online.
There is also a need for a filter (by filter i mean queries using where clause) required where the user will not know what kind of product they want, so they will select the attributes they want to filter down through and it should map to one of the subcategories and finally the user should be able to select the values for the attributes.

I am really confused with how do I implement my database and facing lot of challenges being alone in the IT team in a small company.

Any help from you all will be greatly appreciated and considered a huge asset.

Best Answer

Have you though about having one table with a parent/child relationship. (Self referencing table). Initally looking at your tables, this would work up to the Attributes table.

e.g. Table structure (3 Columns)

ID Name ParentID
0 A NULL
1 B NULL
2 A1 0
3 A2 0
4 B1 1
5 B2 1
6 a 2
7 b 2
8 x 4
9 att1 6
10 att2 6
11 att2 8
12 att3 8

You can work out the hierarchy structure by self joining to the table using the parentID to get the parent group.

The first two entries in the example are from the Product Line table.They have no ParentID
The next 4 entries (A1,A2,B1,B2) are from the Product Line Categories table, which have a parentID to reference back to the Product Line Data (A,B)
After that the next three entries are from the Product Line Sub Categories table. Again notice how there ParentID's reference the ID for (A1, A2, B1, B2) The final 4 Attributes refernce the data in the Product Line Sub Categories table.

With this method you can stick all the required information in the one table, also means you wount have any null columns as you will only store the required data

NOTE: The key part to understand is the 3rd column ParentID which stores the reference to the parent table in your current design