Best Practices for Name Validation

validation

I have a form and I was wondering about best practices for validating names, specifically rejecting characters that do not typically appear in a name (e.g. 123%^*$£), even though in theory they could, and whether it's sensible to do anything more than check for presence. I've often read that you shouldn't try to validate a name, but it got me wondering, because surely there's a lot of data we can throw out.

For context, a consultant involved in the same project as me has asked if we can validate against silly data appearing in any of the name fields in a form, that is, first name, middle names and last name. I think it's relevant to state that I am a UK-based developer, as I'm sure naming laws will weigh into this question.

An example of an issue we had was that a user accidentally entered their date of birth in a name field – e.g. 16/06/1987. This was technically a valid name on our system, but when the data reached an external API it crashed it. This was a mistake and I think it could have been prevented with stricter validation.

I found the UK deed poll guidelines which impose restrictions on changing your name or title: http://www.deedpoll.org.uk/AreThereAnyRestrictionsOnNames.html

These specify, among other things, that punctuation without phonetic significance is not allowed. However, note that these are only guidelines – I'm not sure what the actual legal limitations on a British name are.

Would it be sensible to perform name validation based on these guidelines?
Have you ever heard of anyone implementing something like this?

Best Answer

In general it is problematic to try to impose restrictions like this, not just for names but for many types of data. The robustness principle is really the best way to go, in my experience: "Be conservative in what you send, be liberal in what you accept."

I was once filing my taxes through a commonly used US tax application, and my employer for part of that year had been sold to a company based in Canada (where postal codes contain letters and numbers). When I went to enter the postal code, I was told it was invalid because it must only contain numbers. Then the application pestered me repeatedly that the postal code was missing. It was really obnoxious and unnecessary.

In your situation, if you have issues with certain types of data, you should probably have an exception process that runs after the input has been captured and asks a human to evaluate the situation. Even then, if you are unlucky enough to get someone actually named '2016/09/08', you are SOL.
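
As a rough illustration of that exception process, here is a minimal Python sketch. It rejects only empty input and otherwise accepts the name, attaching reasons for a human to review it rather than blocking the user. The names `DATE_LIKE`, `review_reasons` and `accept_name` are made up for this example, and the two checks (looks like a date, contains no letters) are assumptions about what counts as suspicious, not rules taken from the deed poll guidelines.

```python
import re
import unicodedata

# Hypothetical pattern: values shaped like dates, e.g. 16/06/1987 or 2016/09/08.
DATE_LIKE = re.compile(r"^\s*\d{1,4}[/.-]\d{1,2}[/.-]\d{1,4}\s*$")


def review_reasons(name: str) -> list[str]:
    """Return reasons a human should look at this name (empty list: none)."""
    reasons = []
    if DATE_LIKE.match(name):
        reasons.append("looks like a date")
    # Letters (L*) and combining marks (M*) in any script count as name characters.
    if not any(unicodedata.category(ch)[0] in "LM" for ch in name):
        reasons.append("contains no letters")
    return reasons


def accept_name(name: str) -> tuple[bool, list[str]]:
    """Reject only empty input; accept everything else, possibly flagged."""
    if not name.strip():
        return False, ["name is required"]
    return True, review_reasons(name)
```

With this sketch, `accept_name("16/06/1987")` is still accepted but comes back with review reasons, while names such as "O'Brien" or "José" pass with no flags. What happens to a flagged record (a review queue, a hold before it is sent to the external API) is left open here and depends on your system.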