I've always validated my user input against a list of valid/allowed characters, rather than a list of invalid/disallowed characters (or simply no validation). It's just a habit I picked up, probably on this site, and I've never really questioned it until now.
It makes sense if you wish to, say, validate a phone number or an area code. Recently, however, I've realised I'm also validating input such as Bio Text fields, User Comments, etc., for which the input has no solid syntax.
The main advantage has always seemed to be: validating allowed chars reduces the risk of missing a potentially malicious character, but increases the risk of not allowing a character which the user may want to use. The former is more important.
But, provided I am correctly preventing SQL injection (with prepared statements) and also escaping output, is there any need for this extra barrier of protection? It seems to me as if I am just allowing practically every character on the keyboard, while still forgetting to allow some common characters.
Is there an accepted practice for this situation? Or am I missing something obvious?
Thanks.
Best Answer
Any implementation trying to detect "malicious characters" is flawed, when you look at the combined properties of such an implementation:
I'd go so far as to say that validating allowed characters reduces security, because it encourages sloppy implementation (lack of testing/escaping). If you escape where necessary you can instead just test the "nasty" characters, and if they work, you have pretty much guaranteed that other nasty characters will also be harmless to the system.
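To make "escape where necessary, then test the nasty characters" concrete, here is a minimal sketch in Python. The `users` table, the `store_bio` and `render_bio` helpers, and the sample string are all made up for illustration; it assumes SQLite as the database and HTML as the only output context, so treat it as one possible shape rather than a reference implementation.

```python
import html
import sqlite3

# Deliberately hostile sample input (hypothetical): SQL metacharacters plus HTML markup.
NASTY = "Robert'); DROP TABLE users;-- <script>alert(1)</script>"

def store_bio(conn, bio):
    # The ? placeholder binds the value separately from the SQL text,
    # so no character in the input can change the statement.
    cur = conn.execute("INSERT INTO users (bio) VALUES (?)", (bio,))
    return cur.lastrowid

def render_bio(conn, user_id):
    row = conn.execute("SELECT bio FROM users WHERE id = ?", (user_id,)).fetchone()
    # Escape at output time, for the context being written to (HTML here).
    return "<p>" + html.escape(row[0]) + "</p>"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, bio TEXT)")
uid = store_bio(conn, NASTY)
page = render_bio(conn, uid)
```

If a string like this is stored unchanged and rendered inert, no character whitelist was needed to get there; the stored value stays exactly what the user typed, and only the rendered form is escaped.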
Of course, some characters are nonsensical in some fields, such as "two" in a numeric field. But even this is often not trivial:
1,000 == 1 in much of Europe, and 1'000 is a valid way to write 1000 in some places. You don't want to tell those users their way of writing is wrong.
[...] (0) in Switzerland) which you have to use within the country, and have to exclude when using a language code.
[...] Gerard 't Hooft), numbers (John Doe the 5th), punctuation (John Doe, M.D.), and SQL.
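Returning to the numeric examples above, one way to avoid telling users their notation is wrong is to parse under an explicit convention rather than reject characters outright. The sketch below is only an illustration: `parse_number` and its parameters are hypothetical names, the caller is assumed to know which convention applies, and it does not check that grouping separators sit in sensible positions.

```python
from decimal import Decimal, InvalidOperation

def parse_number(text, decimal_sep=".", group_seps=(",",)):
    """Parse text under a caller-chosen convention; return None if it is not a number."""
    cleaned = text.strip()
    # Drop grouping separators first, then normalise the decimal separator.
    for sep in group_seps:
        cleaned = cleaned.replace(sep, "")
    cleaned = cleaned.replace(decimal_sep, ".")
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None
    # Decimal also accepts "NaN" and "Infinity"; treat those as invalid input here.
    return value if value.is_finite() else None

# The same text means different things under different conventions.
english = parse_number("1,000")                                      # comma groups digits
european = parse_number("1,000", decimal_sep=",", group_seps=(".",))  # comma is the decimal mark
swiss = parse_number("1'000", group_seps=("'",))                     # apostrophe groups digits
word = parse_number("two")                                           # not a number at all
```

The point of the design is that "is this character allowed?" turns into "does this parse under the user's convention?", which is a question about meaning rather than about individual characters.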