Option #2, using reference tables, is the standard way of doing it. It has been used by millions of programmers, and is known to work. It is a pattern, so anyone else looking at your stuff will immediately know what is going on. There exist libraries and tools that work on databases, saving you from lots and lots of work, that will handle it correctly. The benefits of using it are innumerable.
Is it wasteful? Yes, but only slightly. Any half-decent database will always keep such frequently joined small tables cached, so the waste is generally imperceptible.
All other options that you described are ad hoc and hacky, including MySQL's enum
, because it is not part of the SQL standard. (Other than that, what sucks with enum
is MySQL's implementation, not the idea itself. I would not mind seeing it one day as part of the standard.)
Your final option #3 with using a plain integer is especially hacky. You get the worst of all worlds: no referential integrity, no named values, no definitive knowledge within the database of what a value stands for, just arbitrary integers thrown all over the place. By this token, you might as well quit using constants in your code, and start using hard-coded values instead. circumference = radius * 6.28318530718;
. How about that?
I think you should re-examine why you find reference tables onerous. Nobody else finds them onerous, as far as I know. Could it be that it is because you are not using the right tools for the job?
Your sentence about having to "encode things and deal with integers", or having to "create elaborate programming constructs", or "creating new object oriented entities on the programming side", tells me that perhaps you may be attempting to do object-relational mapping (ORM) on the fly dispersed throughout the code of your application, or in the best case you may be trying to roll your own object-relational mapping mechanism, instead of using an existing ORM tool for the job, such as Hibernate. All these things are a breeze with Hibernate. It takes a little while to learn it, but once you have learned it, you can really focus on developing your application and forget about the nitty gritty mechanics of how to represent stuff on the database.
Finally, if you want to make your life easier when working directly with the database, there are at least two things that you can do, that I can think of right now:
Create views that join your main tables with whatever reference tables they reference, so that each row contains not only the reference ids, but also the corresponding names.
Instead of using an integer id for the reference table, use a CHAR(4) column, with 4-letter abbreviations. So, the ids of your categories would become "TEST", "DSGN", "PROG", "OTHR". (Their descriptions would remain proper English words, of course.) It will be a bit slower, but trust me, nobody will notice.
Finally, when there are only two types, most people just use a boolean column. So, that "standard/exception" column would be implemented as a boolean and it would be called "IsException".
Enums are intended for use cases when you have literally enumerated every possible value a variable could take. Ever. Think use cases like days of the week or months of the year or config values of a hardware register. Things that are both highly stable and representable by a simple value.
Keep in mind, if you're making an anti-corruption layer, you can't avoid having a switch statement somewhere, because of the design you're wrapping, but if you do it right you can limit it to that one place and use polymorphism elsewhere.
Best Answer
I can see the redundancy in your enum, this should be two separate columns, each of which can be an enum:
enum for col1:
enum for col2:
You can even add a check constraint to ensure sane values between the two enum columns (assuming you're using a RDBMS that supports check constraints - hello MySQL... you listening?)
Let the database enforce all the rules between data. With the rules well defined at the lowest level, the rest of the system gains this assurance.