Flat Files vs Database/API – Transport Between Frontend and Backend

database, file handling, message-queue, programming practices

I've got an application which has generated a rather heated discussion between a couple of the developers.

Basically, it's split into a web layer and a backend layer. The web layer collects information through a simple web form and stashes this data as a JSON document (literally a .json file) in a watch folder used by the back end. The back end polls this folder every few seconds, picks the file up, and carries out its functions.
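For illustration, the hand-off described above could look roughly like the following Python sketch. The function names (`submit_message`, `poll_once`), the UUID-based file names and the `.tmp` suffix are assumptions made for this example, not details of the actual application. The one design point it shows is writing to a temporary file first and renaming it into place, so the poller never sees a half-written document.

```python
import json
import os
import tempfile
import uuid


def submit_message(watch_dir: str, payload: dict) -> str:
    """Write payload as <uuid>.json into watch_dir without exposing partial files."""
    os.makedirs(watch_dir, exist_ok=True)
    final_path = os.path.join(watch_dir, f"{uuid.uuid4().hex}.json")
    # The temp file lives in the same directory so the rename stays on one volume;
    # its ".tmp" suffix keeps the poller below from picking it up early.
    fd, tmp_path = tempfile.mkstemp(dir=watch_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
    # Publish the finished file under its final name in a single step.
    os.replace(tmp_path, final_path)
    return final_path


def poll_once(watch_dir: str) -> list:
    """Return the complete .json files currently waiting, sorted by name."""
    return sorted(
        os.path.join(watch_dir, name)
        for name in os.listdir(watch_dir)
        if name.endswith(".json")
    )
```

The back end would call something like `poll_once` every few seconds and hand each returned path to its processing step.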

The files themselves are very simple (i.e. all string data, no nesting), and around 1-2 KB at their largest, with the system spending most of its time idle (but bursting up to 100 messages at any given time). The backend processing step takes about 10 minutes per message.

The argument comes in when one developer suggests that using the filesystem as a messaging layer is a bad solution, and that something such as a relational database (MySQL), a NoSQL database (Redis), or even a plain REST API call should be used instead.

It should be noted that Redis is used elsewhere in the organization for queued message handling.

The arguments I've heard break down as follows:


In favor of flat files:

  • Flat files are more reliable than any other solution, since the file only gets moved from a "watch" folder to a "processing" folder after it's picked up, and finally to a "done" folder when finished. There's zero risk of messages disappearing, barring very low-level bugs which would break other things anyway.

  • Flat files require less technical sophistication to understand – just cat it. No queries to write, no risk of accidentally popping a message off the queue and having it be gone forever.

  • File management code is simpler than database APIs from a programming standpoint, since it's part of every language's standard library. This reduces the overall complexity of the code base and the amount of third party code that must be brought in.

  • YAGNI: flat files work just fine right now and there's no demonstrated need to change to a more complicated solution, so leave it.
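The "watch" to "processing" to "done" lifecycle from the first bullet can be sketched in a few lines of standard-library Python. This is only a minimal single-poller sketch: the folder layout under `base_dir`, the function name `process_waiting` and the `handler` callback are assumptions for the example, and the real back end may differ.

```python
import json
import os
from typing import Callable


def process_waiting(base_dir: str, handler: Callable[[dict], None]) -> int:
    """Run every waiting message through handler once; return how many finished."""
    watch, processing, done = (
        os.path.join(base_dir, name) for name in ("watch", "processing", "done")
    )
    for folder in (watch, processing, done):
        os.makedirs(folder, exist_ok=True)

    completed = 0
    for name in sorted(os.listdir(watch)):
        if not name.endswith(".json"):
            continue
        in_progress = os.path.join(processing, name)
        # Step 1: move the file out of "watch" so it is not picked up twice.
        os.rename(os.path.join(watch, name), in_progress)
        with open(in_progress, encoding="utf-8") as handle:
            message = json.load(handle)
        # Step 2: do the work. If this raises, the file stays in "processing",
        # where it can be inspected instead of silently disappearing.
        handler(message)
        # Step 3: record completion by moving the file to "done".
        os.rename(in_progress, os.path.join(done, name))
        completed += 1
    return completed
```

At every point the message exists in exactly one of the three folders, which is the property the pro-files argument relies on.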

In favor of a database:

  • It's easier to scale a database than a directory full of files.

  • Flat files have a risk of someone copying a "done" file back to the "watch" directory. Due to the nature of this application (virtual machine management), this could result in catastrophic data loss.

  • Requiring more technical sophistication to troubleshoot the app means that uneducated staff are less likely to screw something up by just poking at things.

  • DB connection code, especially for something like Redis, is at least as robust as the standard library file management functions.

  • DB connection code is visibly (if not functionally) simpler from a developer standpoint, since it's higher level than file manipulation.


From what I can see, both developers have a lot of valid points.

So of these two people, the pro-files dev, or the pro-databases dev, which one is more in line with software engineering best practice, and why?

Best Answer

Switching to a solution involving databases or the queuing systems mentioned by Ewan would

  • create dependency on a new, complex system in both backend and frontend
  • introduce unnecessary complexity and a sh*tload of new points of failure
  • increase cost (including cost of ownership)

Moving/renaming files within a single volume is guaranteed to be atomic on all current OSes, whatever their difficulties might be with regard to things like file/record locking. OS-level rights management should be sufficient for locking out the unwashed and to prevent thoughtless/accidental mis-manipulation by authorised operators (admins/devs). Hence databases have nothing to offer at all, as long as the performance of the current solution is up to snuff.
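To make the atomicity point concrete, here is a small Python sketch in which several threads race to claim the same file by renaming it. The names `try_claim` and `race_for_file` and the per-worker target names are made up for this illustration. It assumes a local filesystem with POSIX-style rename semantics; network filesystems may behave differently.

```python
import os
import tempfile
import threading


def try_claim(source: str, target: str) -> bool:
    """Return True if this caller moved source to target, False if it was already gone."""
    try:
        os.rename(source, target)
    except FileNotFoundError:
        return False  # another worker renamed the file first
    return True


def race_for_file(workers: int = 8) -> int:
    """Let several threads race to claim one file; return how many succeeded."""
    base = tempfile.mkdtemp()
    watch = os.path.join(base, "watch")
    processing = os.path.join(base, "processing")
    os.makedirs(watch)
    os.makedirs(processing)
    source = os.path.join(watch, "job.json")
    with open(source, "w", encoding="utf-8") as handle:
        handle.write("{}")

    results = []
    barrier = threading.Barrier(workers)

    def worker(index: int) -> None:
        barrier.wait()  # release all threads at roughly the same moment
        # Each worker uses its own target name so a successful claim is unambiguous.
        target = os.path.join(processing, f"job.worker{index}.json")
        results.append(try_claim(source, target))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(results)
```

Under the stated assumption, exactly one worker should win the rename no matter how many compete, so no separate lock file or lock record is needed to hand a message to a single consumer.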

At our company we have used similar file-based interfaces for decades with great success. Lots of other things have come and gone, but these interfaces have remained because of their utter simplicity, reliability and minimal coupling/dependencies.
