Database Design – Is a ‘Record Status’ Column Bad Practice?

database-design

I have to clarify first that the status column is not intended to reflect the status of a real-world item represented by the record (row) in the table. Rather, it is intended to show the status of the record itself.

It can be as simple as Active/Inactive or complicated like Approved/Deleted/Locked/Pending/Rejected, etc. The status can be stored on a boolean/short integer column or a single-character column, with mappings like true/1 = Active or A = Approved.

The basic idea is to have a recycle bin/trash-like recovery support in the application (and simulate it in the database). If there is a front-end GUI or other interface that can supposedly let a user "delete" records, it doesn't actually delete the record in the table, but simply changes the record status to Inactive or Deleted. When the interface fetches records, it always gets the records that only match the condition that the status is Active or Approved.

If the user makes a mistake and the "deleted" record (in user's perspective) needs to be recovered, a DBA can easily patch the record back to being Active or Approved, which would be better than searching for backups and hopefully finding the original record there. Or the interface itself can let the user view deleted records in a separate view, and restore them as needed, or even permanently delete them (deleting the actual record).

My questions:

Is this a good practice, or a bad practice?
Does it affect normalization of the data?
What are the potential pitfalls?
Is there any alternative method of achieving the same goal? (see note)
How can you have the database enforce unique constraints on the data for a certain status only (but allow any number of duplicates for other statuses)?
Why don't databases provide a "recycle bin"-like feature or table-tracking/recovery natively, so we can let interfaces delete the actual records without worry?

Note: I read about maintaining a separate history table but that seems worse in terms of storage and having to generate triggers and keep the triggers up-to-date with the tracked table's schema.

Best Answer

I know this as a "Soft Delete"; just marking a record as "deleted", even though it really isn't.

Is this a good practice, or a bad practice?

It depends.
If this is something that your users need [a lot] then it's probably a good thing. In the vast majority of cases, though, I would argue that it's adding [a lot of] overhead for little benefit.

Does it affect normalization of the data?

No, but it will affect your Indexing of that data.
Ensure that you include the "deleted" column in your indexes, so that these rows get excluded as early as possible in your queries.

What are the potential pitfalls?

Your data becomes a little more complex. Everything that goes anywhere near the data needs to "know" about these extra, "not-really-there" records. Or, you have to create Views on those tables that exclude these rows and use these views in, say, your Reporting Tool of Choice.

Your database may increase in size. If you're not really deleting these rows then they're still there, taking up space. This may or may not be an issue, especially since you've included them in your indexes, so the space they consume is multiplied up.

Is there any alternative method of achieving the same goal? (see note)

Not really, no.

How can you have the database enforce unique constraints on the data for a certain status only (but allow any number of duplicates for other statuses)?

Not easily. Declarative Referential Integrity (foreign key clauses) is the cleanest way to implement this and its easy for things like Reporting tools to pick up on these rules to determine the relationships between tables. Such rules apply to all records, regardless of "status" (and there's no way around that).

The alternative is to use Triggers, snippets of procedural code that enforce the referential integrity between tables and do all the clever, conditional stuff that you need. That's good for your particular case, but most of the benefits of Declarative R.I. go out of the window - there's no [externally] detectable relationships between your tables; that's all "hidden" in the triggers.

Why don't databases provide a "recycle bin"-like feature or table-tracking/recovery natively, so we can let interfaces delete the actual records without worry?

Why would they?

These are databases, after all, not file systems or spreadsheets.

What they do, they [can] do very, very well.

What they don't do, there probably hasn't been much demand for.

Best Answer

Related Solutions

How to store record statuses (like pending, complete, draft, cancelled…)

Database Design – Designing a System to Archive Databases with Referential Integrity

Related Topic