Database – Why not use SQL instead of GraphQL

apiArchitecturedatabaseweb servicesweb-development

Recently I learned about GraphQL which claims to be superior to RESTful. However, I started wondering why don't we simply put SQL statements into an HTTP GET request.

For example, in GraphQL I would write

{
  Movie(id: "cixos5gtq0ogi0126tvekxo27") {
    id
    title
    actors {
      name
    }
  }
}

Which isn't much simpler than its SQL counterpart

SELECT id, title FROM movies 
  WHERE id = cixos5gtq0ogi0126tvekxo27;

SELECT actors.name FROM actors, actors_movies 
  WHERE actors.id == movies.actor_id 
  AND movie.id == cixos5gtq0ogi0126tvekxo27;

Maybe we can URL-encode the query and send to the server

GET endpoint?q=SELECT%20id%2C%20title%20FROM%20movies%20WHERE%20id%20%3D%20cixos5gtq0ogi0126tvekxo27%3B%0ASELECT%20actors.name%20FROM%20actors%2C%20actors_movies%20WHERE%20actors.id%20%3D%3D%20movies.actor_id%20AND%20movie.id%20%3D%3D%20cixos5gtq0ogi0126tvekxo27%3B HTTP/1.1

Yes, the query URL can be too long, but you can put it into the body of a POST request if you don't care about REST compliance. (By the way, I think the HTTP RFC need be revised for REST to make sense: capping the length of query strings mixes implementation with specification at the very beginning)

Directly issuing SQL from the client also has the advantage of

  1. No server-side code/library is required to parse GraphQL, reducing development time.
  2. No server-side overhead is needed to parse GraphQL, reducing runtime.
  3. SQL statements are much more flexible than GraphQL because (in most cases) the latter will reduce to SQL anyway.
  4. Everyone knows SQL.

So, what the advantages GraphQL have over SQL?

Best Answer

Basically, abstraction.

SQL requires your clients to know your exact database structure, which is not good. On top of that, analysing the SQL in order to perform special operations based on the value sent as the input is a really difficult thing to do. There are entire softwares which are pretty much responsible only for that. Do you know what those are? If you have guessed the databases, you are right.

Thanks to not exposing the SQL directly, you are not limiting the consumer of the API to the internal representation of your database. You easily expose only what you want to expose.

And since clients of the API depend only on the abstraction, you are free to have as many layers as possible between the API input and the actual database (security, caching, loading data from multiple databases on a single request,...).

For public services, exposing a database directly is pretty much never the right approach. If you however have a few internal systems, sure, your approach might make sense but even then it might just be easier to connect to application A's database directly from Application B by giving the database credentials to the Application B, rather than trying to come up with a custom HTTP interface for the database SQL language.


Why can't I just compare the URL (or SQL query) against keys in Redis before performing the actual query on the RDBMS?

Because it's not easy. Even if someone uses a very simple query, such as:

SELECT st.id, jt.name
FROM some_table st
INNER JOIN join_table jt ON jt.some_table_id = st.id
WHERE st.name = 'hello
world' AND st.type = 'STANDARD'

how do you make sure the result is properly cached? This query includes newlines, but someone could just as well write the query in the following way:

SELECT st.id, jt.name FROM some_table st INNER JOIN join_table jt ON jt.some_table_id = st.id WHERE st.name = 'hello
world' AND st.type = 'STANDARD'

and it's still supposed to be cached in the same way as the one above. I have specifically included a where in which a string search contains a new line, so simply finding line endings and replacing them with a space is not going to work here, parsing the request correctly would be much more complicated.

And even if you do fix that, another query could switch the order of conditions and the query would look like this:

SELECT st.id, jt.name
FROM some_table st
INNER JOIN join_table jt ON jt.some_table_id = st.id
WHERE st.type = 'STANDARD' AND st.name = 'hello
world'

and another request could contain a redundant WHERE argument, like this:

SELECT st.id, jt.name
FROM some_table st
INNER JOIN join_table jt ON jt.some_table_id = st.id
WHERE st.type = 'STANDARD' AND st.name = 'hello
world' AND st.stype = 'STANDARD'

All of those queries are still supposed to return the same result, should be cached in the same way. But handling all of the possible options is pretty much impossible. That's why you cannot simply compare the URL against keys in Redis.

Related Topic