SQL Query and Java Constant Abuse

coding-styleconstsql

I am currently charged with taking over a lot of code that is written with fields names placed into Java constant at the top of the file, and then the SQL queries constructed using string concatenation with the field names. At the top of the file is the following block:

private static final String TABLE_VALID_USER = "valid_user_table";
private static final String TABLE_USER_MANAGEMENT = "V1_USER_MANAGEMENT_TABLE";
private static final String TABLE_ROLE_DEFINITION = "V1_ROLE_DEFINITION_TABLE";
private static final String TABLE_ROLE_MANAGEMENT = "V1_ROLE_MANAGEMENT_TABLE";
private static final String TABLE_USER_STATUS_DEF = "V1_USER_STATUS_DEFNITION";

private static final String COLUMN_NAME_OPEN_ID = "hashed_open_id";

private static final String COLUMN_NAME_INTERNAL_USER_ID = "internal_user_id";
private static final String COLUMN_NAME_USER_ID = "hashed_open_id";
private static final String COLUMN_NAME_USER_NAME = "name";
private static final String COLUMN_NAME_ROLE_ID = "role_id";
private static final String COLUMN_NAME_ROLE_NAME = "role_name";
private static final String COLUMN_NAME_USER_STATUS_CODE = "status_code";
private static final String COLUMN_NAME_USER_STATUS_NAME = "status_name";

private static final String DEFAULT_USER_STATUS = "Pending";

private static final String AS_NAME_COUNT = "cnt";

Later in the code, there are SQL queries that are constructed like this using these constants in the string construction.

What you have to remember is that these are very far apart in the code, so to find out what a field name is, you have to scroll up and down all the time. To be able to read the query, requires a lot of mental exercise. I realize for the original programmer this is not a big deal, because the column might be hashed_open_id but that programmer felt that the constant name COLUMN_NAME_USER_ID was a perfectly acceptable replacement for that. The good news is that this is explained to be the user id.

The bad news is that the actual column in the database is named hashed_open_id which is not obvious. When you read the code, you have learn this new, arbitrary symbol COLUMN_NAME_USER_ID and you have to remember that it really means hashed_open_id and you have to do this for every column in the table! It would seem to double the amount of things that you have to learn in order to read the code.

To understand the query, you need to have knowledge of the database. When using the database, you have to have the real column names in mind. The Java constant symbols are not helpful when dealing with the SQL data browser.

String sql = 
    "SELECT "
    + "COUNT(" + COLUMN_NAME_OPEN_ID +") as " + AS_NAME_COUNT + ", " 
    + "FROM "
        + TABLE_VALID_USER + " " 
    + "WHERE "
        + COLUMN_NAME_OPEN_ID + "= ? "
        + ";";

Wouldn't it be better to just use the real table names and the real column names directly? If you did so, the query would be in the code as:

    String sql = 
        "SELECT COUNT(hashed_open_id) as cnt, " 
        + "FROM valid_user_table " 
        + "WHERE hashed_open_id= ? ;";

This latter query is clearly easier to read and understand, especially if you know the actual table structure. You can easily see exactly what the table name is being manipulated, and what columns are involved. Go back to the original, and try looking up the values in the constants to arrive at the same understanding. And you HAVE to know the table structure because it is manipulated by other classes and other programs. Does anyone really believe that using constants as an extra level of symbol redirection in this way is a benefit?

Theoretically, if you decide to change a column name, then you go to one place, and it propagates through the entire code. Hey, that is cool, but how often do you do that? And it only changes this class … there are other classes and programs, some written in different languages, that manipulate the table. One change of the constant will never change everything.

Readability is important. Isn't the ability to read, review, and quickly understand code by the programmers that are going to maintain the code more important than the rare convenience of changing a column? Consider code that lives for 10 years. How many new programmers will have to read this code? How many times will a column name change?

And, imagine that you do need to change a column name from hashed_open_id to standard_user_id is it really that hard to do a search and replace? You probably had to do the same thing in some SQL scripts used to create the table. The constant is not a universal symbol for the column name. The column name is a universal symbol for the column name. Why not use it directly everywhere? The point of a column name is to be a symbol for the data in the column.

Can anyone think of any reason that an extra level of indirection to make Java constants for each column is worth the problems that it causes?

Best Answer

I agree with what you say. Having worked in several companies on a variety of platforms I have not seen the level of abstraction you describe. Occassionally I will write code snippets and construct SQL from them, but couched in application and problem domain terms, not DB design terms. For example
where-cluase-for-employee = "<several lines of SQL>"
whereas
where-clause-for-contractor="<a different complex predicate>".

SQL is a programming language. No one would consider it best practice to construct a Java method call as a string then execute it. Why would this be done for SQL?

Maintainability and ease of change are important, too. I believe that code should be written with consideration for the unfortunate on-call person who has to fix it at 3am, never having seen it before. Make code efficient and compact by all means, but not so much that the writer's the only person who can understand it.

There is a different approach which can abstract table and column names from the Java and keep SQL as a compact block. It can also improve runtimes, depending on the applications workload. Put the SQL into a stored procedure. Good SP naming conventions will say exactly what it does (read_count_by_user for the example in the question), and the contract between application and SQL is defined. If the returned columns change the application must change, too, but this is a problem you face currently. Impact analysis is a text search of the source code. Modularity is improved. Tuning and re-writes can be done on a per-procedure basis.

If the developers' only gripe is with the DB naming convention ask for a set of views to be defined with developer-friendly names for tables and columns. For the simple use cases shown these will have no impact on performance.

Related Topic