Database Design – Benefits of Predefined Data Types in Relational Databases

data typesdatabaserelational-databasesql

I'm working with an SQL database right now, and this has always made me curious, but Google searches don't turn much up: Why the strict data types?

I understand why you'd have a few different data types, for example like how differentiating between binary and plain text data is important. Rather than storing the 1s and 0s of binary data as plaintext, I now understand that it's more efficient to store the binary data as its own format.

But what I don't understand is what the benefit is of having so many different data types:

  • Why mediumtext, longtext, and text?
  • Why decimal, float, and int?
  • etc.

What is the benefit of telling the database "There'll only be 256 bytes of plain text data in entries to this column." or "This column can have text entries of up to 16,777,215 bytes"?

Is it a performance benefit? If so, why does knowing the size of the entry before hand help performance? Or rather is it something else altogether?

Best Answer

SQL is a statically-typed language. This means you have to know what type a variable (or field, in this case) is before you can use it. This is the opposite of dynamically-typed languages, where that is not necessarily the case.

At its core, SQL is designed to define data (DDL) and access data (DML) in a relational database engine. Static typing presents several benefits over dynamic typing to this type of system.

  • Indexes, used for quickly accessing specific records, work really well when the size is fixed. Consider a query that utilizes an index, possibly with multiple fields: if the data types and sizes are known ahead of time, I can very quickly compare my predicate (WHERE clause or JOIN criteria) against values in the index and find the desired records faster.

  • Consider two integer values. In a dynamic type system, they may be of variable size (think Java BigInteger, or Python's built-in arbitrary-precision integers). If I want to compare the integers, I need to know their bit length first. This is an aspect of integer comparison that is largely hidden by modern languages, but is very real at the CPU level. If the sizes are fixed and known ahead of time, an entire step is removed from the process. Again, databases are supposed to be able to process zillions of transactions as quickly as possible. Speed is king.

  • SQL was designed back in the 1970s. In the earlier days of microcomputing, memory was at a premium. Limiting data helped keep storage requirements in check. If an integer never grows past one byte, why allocate more storage for it? That is wasted space in the era of limited memory. Even in modern times, those extra wasted bytes can add up and kill the performance of a CPU's cache. Remember, these are database engines that may be servicing hundreds of transactions per second, not just your little development environment.

  • Along the lines of limited storage, it is helpful to be able to fit a single record in a single page in memory. Once you go over one page, there are more page misses and more slow memory access. Newer engines have optimizations to make this less of an issue, but it is still there. By sizing data appropriately, you can mitigate this risk.

  • Moreso in modern times, SQL is used to plug in to other languages via ORM or ODBC or some other layer. Some of these languages have rules about requiring strong, static types. It is best to conform to the more strict requirements, as dynamically-typed languages can deal with static types easier than the other way around.

  • SQL supports static typing because database engines need it for performance, as shown above.

It is interesting to note that there are implementations of SQL that are not strongly-typed. SQLite is probably the most popular example of such a relational database engine. Then again, it is designed for single-threaded use on a single system, so the performance concerns may not be as pronounced as in e.g. an enterprise Oracle database servicing millions of requests per minute.

Related Topic