User Input Validation – How to Handle Invalid Input


I have been thinking about this issue for a while, and I would like to hear other developers' opinions.

I tend to have a very defensive style of programming. My typical block or method looks like this:

```
T foo(par1, par2, par3, ...)
{
    // Check that all parameters are correct, return undefined (null)
    // or throw exception if this is not the case.

    // Compute and (possibly) return result.
}
```

Also, during the computation, I check all pointers before dereferencing them. My idea is that if a bug causes a NULL pointer to appear somewhere, the program should handle it gracefully and simply refuse to continue the computation. It can, of course, report the problem with an error message in the log or through some other mechanism.

To put it more abstractly, my approach is:

```
if all input is OK --> compute result
else               --> do not compute result, notify problem
```
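In C, the defensive pattern described above might be sketched as follows. This is only an illustration under assumed names. `mean_of` and the `STATUS_*` codes are hypothetical, and the notification is a plain `fprintf` to `stderr` that stands in for whatever logging mechanism a real program would use.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical status codes, used only for this sketch. */
enum status { STATUS_OK = 0, STATUS_NULL_ARG = 1, STATUS_EMPTY = 2 };

/* Defensive style: validate every parameter first. If any check fails,
   report the problem and return without computing a result. */
enum status mean_of(const double *values, size_t count, double *out)
{
    if (values == NULL || out == NULL) {
        fprintf(stderr, "mean_of: NULL argument\n");
        return STATUS_NULL_ARG;
    }
    if (count == 0) {
        fprintf(stderr, "mean_of: empty input\n");
        return STATUS_EMPTY;
    }

    /* All input is OK: compute the result. */
    double sum = 0.0;
    for (size_t i = 0; i < count; i++)
        sum += values[i];
    *out = sum / (double)count;
    return STATUS_OK;
}
```

The caller has to check the returned status. In exchange, a bad pointer produces a log line and an error code rather than a crash.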

Other developers, including some of my colleagues, use a different strategy. For example, they do not check pointers. They assume that a piece of code should be given correct input and is not responsible for what happens if the input is wrong. They also argue that if a NULL pointer crashes the program, the bug is easier to find during testing and more likely to be fixed.

My usual answer is: what if the bug is not found during testing and only appears once the customer is already using the product? How should the bug preferably manifest itself? As a program that skips a certain action but keeps working, or as a program that crashes and has to be restarted?

Summary

Which of the two approaches to handling wrong input would you recommend?

Inconsistent input --> no action + notification

Inconsistent input --> undefined behaviour or crash

Edit

Thanks for the answers and suggestions.
I am a fan of design by contract too. But even if I trust the person who wrote the code that calls my methods (possibly myself), there can still be bugs that lead to wrong input. So my approach is never to assume that a method receives correct input.

I would also use a mechanism to catch the problem and report it. On a development system, it might, for example, open a dialog to notify the user. On a production system, it would just write some information to the log. I do not think these extra checks cause performance problems. I am not sure assertions are enough if they are switched off in production, because some situation may occur in production that never occurred during testing.
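The behavior described in this edit can be approximated in C with a check macro that, unlike `assert`, is never compiled out. A failed check is loud during development and only logged in production. This is a sketch under assumptions, not an established API. `CHECK_OR_RETURN` and `first_char` are hypothetical names, `NDEBUG` serves as the debug/release switch (as it does for `assert`), and `abort()` stands in for the development-time dialog.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical check macro. Unlike assert(), it is active in every build:
   - debug builds (NDEBUG not defined): print the failed condition and abort,
     so the bug cannot go unnoticed during testing;
   - release builds (NDEBUG defined): log the failed condition and make the
     enclosing function return `retval`, so the program keeps running. */
#ifndef NDEBUG
#define CHECK_OR_RETURN(cond, retval)                              \
    do {                                                           \
        if (!(cond)) {                                             \
            fprintf(stderr, "%s:%d: check failed: %s\n",           \
                    __FILE__, __LINE__, #cond);                    \
            abort();                                               \
        }                                                          \
    } while (0)
#else
#define CHECK_OR_RETURN(cond, retval)                              \
    do {                                                           \
        if (!(cond)) {                                             \
            fprintf(stderr, "%s:%d: check failed: %s\n",           \
                    __FILE__, __LINE__, #cond);                    \
            return (retval);                                       \
        }                                                          \
    } while (0)
#endif

/* Hypothetical example: returns -1 (in release builds) instead of
   dereferencing a NULL pointer. */
int first_char(const char *s)
{
    CHECK_OR_RETURN(s != NULL, -1);
    return (unsigned char)s[0];
}
```

The design choice is the one the edit argues for. The check itself never disappears, and only the reaction to a failed check changes between builds.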

Anyway, I was really surprised that many people follow the opposite approach: they let the application crash on purpose, because they maintain that this makes it easier to find bugs during testing.

Best Answer

You've got it right. Be paranoid. Don't trust other code, even if it's your own code. You forget things, you make changes, code evolves. Don't trust outside code.

A good point was made above: what if the inputs are invalid but the program does not crash? Then you get garbage in the database and errors down the line.

When a program asks me for a number (e.g. a price in dollars or a number of units), I like to enter "1e9" and see what the code does. It happens.
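Here is a sketch of how a C program might defend against the "1e9" test. The standard `strtod` accepts "1e9" as a perfectly valid number (1000000000), so only an explicit domain check rejects it. `parse_price` and the `MAX_PRICE` limit are hypothetical. A real application would choose its own bounds.

```c
#include <errno.h>
#include <math.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical upper bound for a price field. */
#define MAX_PRICE 1000000.0

/* Parse a price typed by the user. Returns true and stores the value only
   if the whole string is a finite number in [0, MAX_PRICE]. */
bool parse_price(const char *text, double *out)
{
    if (text == NULL || out == NULL)
        return false;

    errno = 0;
    char *end = NULL;
    double value = strtod(text, &end);

    if (end == text)                          /* no number at all: "abc", "" */
        return false;
    if (*end != '\0')                         /* trailing junk: "12abc" */
        return false;
    if (errno == ERANGE || !isfinite(value))  /* over/underflow, "inf", "nan" */
        return false;
    if (value < 0.0 || value > MAX_PRICE)     /* outside the domain: "1e9", "-3" */
        return false;

    *out = value;
    return true;
}
```

The syntax checks (`end`, `errno`) and the domain check are separate on purpose. Input can be a well-formed number and still be nonsense for the field it was entered into.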

Four decades ago, when I was getting my B.S. in Computer Science at U.C. Berkeley, we were told that a good program is 50% error handling. Be paranoid.

Related Solutions

C Programming – When to Check Pointers for NULL

Invalid null pointers can be caused either by programmer error or by runtime error. Runtime errors are something a programmer can't fix, like malloc failing due to low memory, the network dropping a packet, or the user entering something stupid. Programmer errors are caused by a programmer using the function incorrectly.

The general rule of thumb I've seen is that runtime errors should always be checked, but programmer errors don't have to be checked every time. Let's say some idiot programmer directly called graph_get_current_column_color(0). It will segfault the first time it's called, but once you fix it, the fix is compiled in permanently. No need to check every single time it's run.

Sometimes, especially in third-party libraries, you'll see an assert instead of an if statement to check for programmer errors. That lets you compile the checks in during development and leave them out of production code. I've also occasionally seen gratuitous checks where the source of the potential programmer error is far removed from the symptom.
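The rule of thumb above might look like this in practice. The `struct buffer` type and the `buffer_*` functions are hypothetical. A `malloc` failure is a runtime error, so it is checked with an `if` in every build. A NULL argument is a programmer error, so it is only checked with `assert`, which compiles to nothing when `NDEBUG` is defined (e.g. with `-DNDEBUG`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical type, used only for this illustration. */
struct buffer {
    size_t length;
    char *data;
};

/* Runtime error: malloc can fail no matter what the caller does, so the
   failure is always checked and reported to the caller as NULL. */
struct buffer *buffer_create(size_t length)
{
    struct buffer *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    /* Allocate at least 1 byte so a NULL result always means failure. */
    b->data = malloc(length > 0 ? length : 1);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    b->length = length;
    return b;
}

/* Programmer error: passing NULL here is a bug in the caller. The assert
   catches it during development. With NDEBUG defined it disappears. */
size_t buffer_length(const struct buffer *b)
{
    assert(b != NULL);
    return b->length;
}

/* Like free(), accepts NULL as a no-op. */
void buffer_destroy(struct buffer *b)
{
    if (b != NULL) {
        free(b->data);
        free(b);
    }
}
```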

Obviously, you can always find someone more pedantic, but most C programmers I know favor less cluttered code over code that is marginally safer. And "safer" is a subjective term. A blatant segfault during development is preferable to a subtle corruption error in the field.

Coding Standards – Validation of Input Parameter in Caller: Code Duplication?

It depends. Deciding where to put validation should be based on the description and strength of the contract implied (or documented) by the method. Validation is a good way to bolster adherence to a specific contract. If for whatever reason the method has a very strict contract, then yes, it is up to you to check before calling.

This is an especially important concept when you create a public method, because you are basically advertising that some method performs some operation. It had better do what you say it does!

Take the following method as an example:

```
public void DeletePerson(Person p)
{
    _database.Delete(p);
}
```

What is the contract implied by DeletePerson? The programmer can only assume that if any Person is passed in, it will be deleted. However, we know that this isn't always true. What if p is a null value? What if p doesn't exist in the database? What if the database is disconnected? Therefore, DeletePerson does not appear to fulfill its contract well. Sometimes, it deletes a person, and sometimes it throws a NullReferenceException, or a DatabaseNotConnectedException, or sometimes it does nothing (such as if the person is already deleted).

APIs like this are notoriously difficult to use, because when you call this "black box" of a method, all sorts of terrible things can happen.

Here are a few ways you can improve the contract:

Add validation and add an exception to the contract. This makes the contract stronger, but requires that the caller perform validation. The difference, however, is that now they know their requirements. In this case I communicate this with a C# XML comment, but you could instead add a throws (Java), use an Assert, or use a contract tool like Code Contracts.
```
/// <exception cref="ArgumentNullException"><paramref name="p"/> is null.</exception>
/// <exception cref="ArgumentException"><paramref name="p"/> is not in the database.</exception>
public void DeletePerson(Person p)
{
    if (p == null)
        throw new ArgumentNullException(nameof(p));
    if (!_database.Contains(p))
        throw new ArgumentException("The Person specified is not in the database.", nameof(p));

    _database.Delete(p);
}
```
(Side note: The argument against this style is often that it forces excessive pre-validation on all calling code, but in my experience this is often not the case. Think of a scenario where you are trying to delete a null Person. How did that happen? Where did the null Person come from? If this is a UI, for example, why was the Delete key handled if there is no current selection? If the person was already deleted, shouldn't it have been removed from the display already? Obviously there are exceptions to this, but as a project grows you will often be thankful for code like this, because it keeps bugs from permeating deep into the system.)
Add validation and code defensively. This makes the contract looser, because the method now does more than just delete the person. I changed the method name to reflect this, but that might not be necessary if you are consistent in your API. This approach has its pros and cons. The pro is that you can now call TryDeletePerson with all sorts of invalid input and never worry about exceptions. The con, of course, is that users of your code will probably call this method too often, or that debugging becomes difficult in cases where p is null. This could be considered a mild violation of the Single Responsibility Principle, so keep that in mind if a flame war erupts.
```
public void TryDeletePerson(Person p)
{            
    if(p == null || !_database.Contains(p))
        return;

    _database.Delete(p);
}
```

Combine approaches. Sometimes you want a little of both: external callers must follow the rules closely (to force them to code responsibly), but your private code stays flexible.

```
/// <exception cref="ArgumentNullException"><paramref name="p"/> is null.</exception>
/// <exception cref="ArgumentException"><paramref name="p"/> is not in the database.</exception>
public void DeletePerson(Person p)
{
    if (p == null)
        throw new ArgumentNullException(nameof(p));
    if (!_database.Contains(p))
        throw new ArgumentException("The Person specified is not in the database.", nameof(p));

    TryDeletePerson(p);
}

internal void TryDeletePerson(Person p)
{
    if (p == null || !_database.Contains(p))
        return;

    _database.Delete(p);
}
```
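C has no exceptions, so a C version of the combined approach would typically report a violated contract through a return code. The sketch below uses a toy in-memory "database" (a fixed array of pointers). `struct person_db`, `db_delete_person` and `db_try_delete_person` are hypothetical names and are not part of the original answer. Unlike the C# `TryDeletePerson`, the tolerant variant here returns a `bool` so that internal callers can still see whether anything happened.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for the sketch. */
struct person { int id; };

#define DB_CAPACITY 8

struct person_db {
    const struct person *rows[DB_CAPACITY];
    size_t count;
};

enum db_status { DB_OK = 0, DB_NULL_PERSON, DB_NOT_FOUND };

/* Index of p in db, or -1 if it is not stored there. */
static long db_index_of(const struct person_db *db, const struct person *p)
{
    for (size_t i = 0; i < db->count; i++)
        if (db->rows[i] == p)
            return (long)i;
    return -1;
}

/* Tolerant variant for internal callers: invalid input is silently
   ignored. Returns true only if a row was actually removed. */
static bool db_try_delete_person(struct person_db *db, const struct person *p)
{
    if (p == NULL)
        return false;
    long i = db_index_of(db, p);
    if (i < 0)
        return false;
    /* Move the last row into the freed slot (row order is not kept). */
    db->rows[i] = db->rows[db->count - 1];
    db->count--;
    return true;
}

/* Strict variant for external callers: each violated precondition gets
   its own error code, the C counterpart of throwing an exception. */
enum db_status db_delete_person(struct person_db *db, const struct person *p)
{
    if (p == NULL)
        return DB_NULL_PERSON;
    if (db_index_of(db, p) < 0)
        return DB_NOT_FOUND;
    db_try_delete_person(db, p);
    return DB_OK;
}
```

As in the C# version, the strict function checks first and then delegates to the tolerant one, so the actual deletion logic lives in one place.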

In my experience, concentrating on the contracts you imply works better than following a hard rule. Defensive coding appears to work better when it is difficult for the caller to determine whether an operation is valid. Strict contracts appear to work better when you expect the caller to make method calls only when they really, really make sense.
