Relational Database Design – How to Allow Users to Create Custom Fields

database-designrelational-database

Ive been looking around for a way to design a database for large amounts of leads (users) with custom fields.

Ive check this out (How would you design a user database with custom fields), but this solution would limit the amount of custom fields.

So far ive been designing it in three tables:

leads                    (ID, phone, email)
leads_fields             (ID, name, type, required)
leads_fields_content     (ID, fields_id, lead_id, content)

Users can create all the leads_fields they need, fx. 20 fields.

So when I have a lead, I would go trough and check in leads_fields_content for lead_id, get that collection and get the corresponding leads_fields that is referenced.

I see this working – However, one client wants to upload 300.000 leads from day one. So thats 300.000 rows in leads. Then lets say that there are 20 fields for each lead. That would then create 300.000*20 rows in leads_fields_content which is 6.000.000 rows. Thats only for one client.

My question: is this at all the correct way of designing this, taking into account the amount of time it would take to go trough 300.000 rows, and then 6.000.000 afterwards? And this would only get exponentially bigger.

Best Answer

If your customers want to be completely free to define their own custom fields, your approach in general is fine. However, for the given numbers, you need to take some precautions to keep the system performant and scalable.

  • I guess when there are 300.000 leads with 20 fields, there is a high chance of having lots of those content entries empty for the non-required fields. In the suggested design, you can simply leave out those records from the leads_fields_content table, no need to create a record with an empty content for those. So if there is only a small percentage of the fields filled with content, you get only that small percentage of records in the leads_fields_content, not necessarily 6 millions. Your design already allows to store sparse tables efficiently!

  • Proper indexing should be obligatory. However, this is way easier for fields which are known at design time. So if you know some standard fields which are typically required for any customer and don't really need to be customizable (like a lead's name), it would probably be a good idea to make it a fixed part of the lead table instead. Then you can create a specific index for those fields.

  • If each of your customers has his own "custom schema", it is clear they want their data 100% separated from other customers. For example, you will never get any requirements for searching over the content of different customers. So it will probably be best to separate the data of each customer physically. How you accomplish this depends on the DBMS you are using: separate tables per customer, tables in different schemas, different table spaces, same table in different "logical" databases on one server, or completely different database instances per customer, maybe on different machines - it all depends what kind of DB system you are using, how many customers you have and how scalable you need the system to be.

Related Topic