Web Applications – Is It Bad Practice to Allow User Defined Fields?

database-designpatterns-and-practicesweb-applications

Generally speaking, is it considered bad practice to allow for user created fields in a database for a webapp?

For example, I am making a home inventory webapp for my wife, and she is going to want to define her own fields for different items. I was planning to allow her to create item categories, and add "features" to those categories. Features would just be key/value stored as strings. That way if she had a category called "Audio CDs" for example, she could add features for stuff like "artist", "tracks", etc. But in another category like "furniture", she could add features for stuff like "material" (wood, plastic, etc). Then any item could belong to one (or many) categories, adding those features to the item.

I can see issues where searching by these features requires string comparisons, there's no validation of data, etc. Following agile methodology, maybe it would be better to just have her come up with new categories and attributes and I would just have to create new tables as we go. In my example, it's a small userbase (2 of us) and the amount of records created would be small, so not too bad.

Generally speaking though, how do people handle something like this in "real life"?

Best Answer

When you start getting to "user defined fields" as is often found in bug trackers, customer resource management, and similar business tools is that they are not backed with a table with a bajillion fields (if they are, then thats likely a problem of its own).

Instead what you find are Entity Attribute Value table designs and the associated administration tool to manage the valid attributes.

Consider the following table:


  +--------------+
  | thing        |
  |--------------|
  | id           |
  | type         |
  | desc         |
  | attr1        |
  | attr2        |
  | attr3        |
  | attr4        |
  | attr5        |
  +--------------+

This is after you've added a few attributes. Instead of attr1 pretend it reads artist or tracks or genre or whatever attributes the thing has. And instead of 5, what if it was 50. Clearly that is unmanageable. It also requires an update of the model and redeployment of the application to handle a new field. Not ideal.

Now consider the following table structure:


  +--------------+     +---------------+    +-------------+
  | thing        |     | thing_attr    |    | attr        |
  |--------------|     |---------------|    |-------------|
  | id           |<---+|  thing_id (fk)|  +>| id          |
  | type         |     |  attr_id  (fk)|+-+ | name        |
  | desc         |     |  value        |    |             |
  +--------------+     +---------------+    +-------------+

You've got your thing with its basic fields. You have two more tables. One with the attributes. Each field is a row in the attr table. And then there is the thing_attr with a pair of foreign keys relating back to the thing table and the attr table. And this then has a value field where you store whatever the value of the field for that entity would be.

And now you've got a structure where the attr table can be updated at runtime and new fields can be added (or removed) on the fly without significant impact to the overall application.

The queries are a little bit more complex and validation becomes more complex too (either funky stored procedures or all client side). Its a trade off in design.

Consider also the situation where some day you need to do a migration and you come back to the application to find that there are now a half dozen or so more attributes than the schema you originally distributed. This makes for ugly migrations and upgrades where the Entity Attribute Value table, when used correctly, can be cleaner. (Not always, but can be.)

Are there any downsides to just modifying the schema at runtime? If the user thinks a thing needs a new attribute, just dynamically add a column to the table?

If you are working with the appropriate flavor of nosql database, you could probably do this (note that the appropriate flavor of the nosql for this would probably be a key-value store which is, well, the EAV table for relational ones described above) without too much trouble. However it comes with all the compromises for nosql which are described elsewhere in great detail.

If you are instead working on a relational database - you need to have the schema. Adding the column dynamically means some subset of the following things are true:

You are doing meta-database programming. Instead of being able to cleanly map this column to that field with a nice ORM, you are probably doing things like select * and then doing some complex code to find out what the data actually is (see Java's ResultSetMetaData) and then storing that in a map (or some other datatype - but not nice fields in the code). This then throws away a fair bit of type and typo safety that you have with the traditional approach.
You've likely abandoned the ORM. This means you're writing raw sql for all the code instead of letting the system do the work for you.
You've given up on doing clean upgrades. What happens when the customer adds a field with one name that your next version also uses? In the matchmaking site the upgrade that wants to add a hasdate field for storing a timestamp has already been defined as hasdate with a boolean for a successful match... and your upgrade breaks.
You're trusting that the customer doesn't break the system by using some reserved word that breaks your queries too... somewhere.
You've bound yourself to one database brand. The DDL of different databases is different. Database types are the easiest example of this. varchar2 vs text and the like. Your code to add the column would work on MySQL but not Postgres or Oracle or SQL Server.
Do you trust the customer to actually add the data well? Sure, the EAV is far from ideal but now you've got some horrendous obscure table names that you the developer didn't add, with the wrong type of index (if any), with no constraints added in the code where there need to be and so on.
You've given schema modification privileges to the user running the application. Little Bobby Drop Tables isn't possible when you're restricted to SQL rather than DDL (sure you can do a delete * from students instead, but you can't really mess up the database in bad ways). The number of things that can go wrong with schema access either from an accident or malicious activity skyrockets.

This really boils down to "don't do it." If you really want this, go with a known pattern of the EAV table structure or a database that is entirely dedicated to this structure. Don't let people create arbitrary fields in a table. The headaches just aren't worth it.

Best Answer

Related Solutions

Database Design – Best Practices for Tagging System

Practical thesql schema advice for eCommerce store – Products & Attributes

Related Topic