Database API – Why Clients Can’t Query the DB Directly Using SQL

apidatabaserestsql

Given that a lot of modern web applications just consume JSON data from the sever, what is the point in having a REST api at all instead of just using SQL's query language?

Nowadays people are starting to use GraphQL. From what I understand is a query language from the client to the sever, which then gets translated into queries from the sever to the database.

I get that you don't want to let clients execute arbitrary code on your db. But why can't you just do something like execute the queries through a heavily restricted database user, with some rate-limiting and timeouts to avoid spammy requests?

Best Answer

It is possible but not generally wise to offer direct SQL access to a database.

  • I have to make the database publicly reachable. This alone would make me nervous, as it increases the attack surface and possibility of data loss.

Normal configurations would keep the DB on an intranet and a webserver in a DMZ. The webserver can then be whitelisted to connect to the DB.

  • I have to prevent unauthorized access or changes to data and ensure a consistent data model. No user should be able to corrupt my database. Because different users have access to different records, I would need to create one database user account per end user. I may have to develop a system of database views and stored procedures to make the data model fit this permission model.

    A normal web app checks permission in the web application code on the server, and therefore doesn't have to restrict itself to the database's permission model. This allows permission models on a more suitable granularity, e.g. problem domain concepts instead of database tables. Such a scheme can be much simpler, and therefore more secure.

  • I cannot restrict the specific SQL queries that will be submitted. The queries may be written suboptimally, thus hogging performance. Unless the database provides explicit support, I cannot add caching or rate limiting.

    In contrast, you can tune the queries that you use in a webserver, and use numerous caching technologies.

  • When users connect directly to the DB, the database has become your application platform. This represents a substantial lock-in into a particular technology and into a particular data model. Scaling a database server is also more difficult than scaling application servers.

    In contrast, most web applications get some amount of scalability by default just from running the database and any application logic on different servers.

If you keep all of that in mind, it is possible to create database-centric applications. This can be a good approach e.g. for intranet applications where you can trust all users. But due to the quite substantial problems this is not generally advisable.

GraphQL is weird because this turns your webserver into half a database server. However, you get the possibility to implement custom security models, implement caching, and offer a more convenient API than raw SQL.

“Real” REST APIs (i.e. not GraphQL) are conceptually very straightforward, and therefore make it very easy to implement security checks (probably one or two if-conditions per endpoint), and are very easy to cache. Their drawback (which GraphQL addresses) is that you often get more data than you need, and only get a single resource per request.

Related Topic