Why not just make non-parameterized queries return an error

rdbmsSecuritysql

SQL injection is a very serious security issue, in large part because it's so easy to get it wrong: the obvious, intuitive way to build a query incorporating user input leaves you vulnerable, and the Right Way to mitigate it requires you to know about parameterized queries and SQL injection first.

Seems to me that the obvious way to fix this would be to shut down the obvious (but wrong) option: fix the database engine so that any query received that uses hard-coded values in its WHERE clause instead of parameters returns a nice, descriptive error message instructing you to use parameters instead. This would obviously need to have an opt-out option so that stuff like ad-hoc queries from administrative tools will still run easily, but it should be enabled by default.

Having this would shut down SQL injection cold, almost overnight, but as far as I know, no RDBMS actually does this. Is there any good reason why not?

Best Answer

There are too many cases where using a literal is the right approach.

From a performance standpoint, there are times that you want literals in your queries. Imagine I have a bug tracker where once it gets big enough to worry about performance I expect that 70% of the bugs in the system will be "closed", 20% will be "open", 5% will be "active" and 5% will be in some other status. I may reasonably want to have the query that returns all active bugs to be

SELECT *
  FROM bug
 WHERE status = 'active'

rather than passing the status as a bind variable. I want a different query plan depending on the value passed in for status-- I'd want to do a table scan to return the closed bugs and an index scan on the status column to return the active loans. Now, different databases and different versions have different approaches to (more or less successfully) allow the same query to use a different query plan depending on the value of the bind variable. But that tends to introduce a decent amount of complexity that needs to be managed to balance out the decision of whether to bother re-parsing a query or whether to reuse an existing plan for a new bind variable value. For a developer, it may make sense to deal with this complexity. Or it may make sense to force a different path when I have more information about what my data is going to look like than the optimizer does.

From a code complexity standpoint, there are also plenty of times that it makes perfect sense to have literals in SQL statements. For example, if you have a zip_code column that has a 5 character zip code and sometimes has an additional 4 digits, it makes perfect sense to do something like

SELECT substr( zip_code, 1, 5 ) zip,
       substr( zip_code, 7, 4 ) plus_four

rather than passing in 4 separate parameters for the numeric values. These aren't things that will ever change so making them bind variables only serves to make the code potentially more difficult to read and to create the potential that someone will bind parameters in the wrong order and end up with a bug.

Related Topic