Is Storing a List of Strings in a Single Database Field a Bad Idea?

data structuresdatabasedatabase-designrelational-database

Recently, I started to work on some legacy system. People that developed it, came to an idea to store list of strings in single field of database table. Lets say that it is an identifier for object that does not have any representation nor data in database. The range of that identifiers will be relatively small in production.

On the other hand, my intuitions and "good design taste" tells me that it should be represented in separate table (similar to a table used for representing many-to-many relations).

Is their approach really bad and it would be better to start a refactoring? If yes, what bad consequences the original design can cause in future? Are there any relational design principles that explain that approach?

Edit to response for comments:

As I suppose, they didn't use this approach to solve some specific problem like hierarchical structuring in a tricky way. The most probable scenario was the case that they were simply working under time pressure and needed to implement new features as quick as possible.

I am sure that previously the field represented single value. They were going to implement feature to store more then one value and tried to avoid database migrations.

Best Answer

The datamodel isn't normalised; to be so it would need a separate table as you say. In that regard, it's not particularly good datamodelling practice.

Whether it was done for a good reason or not is difficult to determine. Conceivably, coding simplification or performance may have been motivators. As likely is that the field originally contained one identifier, requirements changed and the devs didn't have time or inclination to re-factor.

Probably more important is whether or not you should refactor yourself. In similar circumstances I would not pre-emptively refactor a case like this by default. I would consider it if one of the following applied:

  1. you have evidence that this causes problems e.g. from legacy issue logs
  2. you know for a fact that that you'll be making functional changes in that area
  3. the code that handles the data is particularly complex and difficult to reason about.

What I would do, and TBH I would recommend this whenever you take over a legacy application, is start a wiki (or equivalent) and document cases like this. For example,

  • issues you've found such as the datamodelling wrinkle
  • changes you plan to implement
  • changes you don't plan to implement but would if there were time
  • areas of code that are difficult to reason about
  • areas of code that you've found hard to maintain.

I've found that this is a useful aide memoire for me as I work in and/or come back to a codebase. It also can be very helpful to your successor when they, in turn, need to start learning the codebase.

Related Topic