Programming Practices – Handling Indeterminate Boolean Values

database-designprogramming practices

We're building a web application for company, which administration existed only in Excel sheets so far. We're almost done by now, but recently I was assigned a task to import all their data from those sheets to our new system. The system is built in Java, but as this import is just one-time thing I decided to write the scripts in Python instead and import it directly with SQL queries. Here comes the problem. The new data models contains some new attributes, which aren't included in their existing data. In most cases, this isn't a problem, I just put a null where I can't find the information. But then I ran into a few attributes, which are booleans and cannot be NULL by default. First I tried to just allow null for those fields in our database, but my senior dev told me to not do it, as it would cause a issues in our system in the future. And now I'm not quite sure what to do. Obvious solution is to default every unknown boolean value to false, but I think that is wrong too, because I actually don't know, whether it is false.

Example: Let's say you have a entity Car which has a hasRadio parameter. Now you need to import data to this data model, but in data there are only columns "Model" and "Color", nothing about it having or not having radio. What do you put in a "hasRadio" column, if it cannot be null by design?

What is the best approach in this situation? Should we just tell the company to manually fill in the missing data? Or default it to false?

Best Answer

This is mainly a requirements analysis problem, and it has nothing to do with the fact the data in stake is "boolean". If you have to initialize tables in a database, or in any other kind of data storage, and you have incomplete input for some columns, you first need to find out what the users of the system or your customer think would be the right default value for those columns, and you need to find this out for every single attribute, there is no generally correct answer.

This will typically lead to one of the following cases:

  • there is a good default value for the specific column, users don't mind if the value is initially the same for all records, they can set the correct values easily afterwards when needed

  • there is a rule how to determine the ideal default value from other information, so you can put this rule into code

  • the users or your customer will extend the input data and provide the missing values (maybe manually), before it gets imported into the database

  • there is no good default value for the specific column and/or any record, the data should be imported either, but the users want to know for which of the records the particular value is already initialized and for which not. So they can enter the value afterwards, and track for which records the value is already correctly set and for which not.

The last case requires something like NULL to represent the uninitialized or unknown state, even for a boolean value, if your senior likes it or not. If there is some obscure technical reason which forbids the use of a NULL value for a specific column, you need to simulate the "unknown" state in a different way, either by introducing an additional boolean column (like hasRadioIsUnknown), or by using a 3-valued enumeration instead of a boolean (like HasNoRadio=0,HasRadio=1, Unknown=2). But speak to your senior again, after you made a thorough requirements analysis, to make sure such a workaround is really necessary.

Related Topic