Database – Big amount of text: database vs file

databasefile-storageMySQLperformance

I have a web application with medium-high traffic where users can save designs made by fabricjs, so each saved design is a json with many parameters.

So basically two days after launch users have stored almost 300 rows of designs, PhpMyAdmin says a total of 31 MB of size.

Each json is stored as a longtext data type so I'm concerned about the increase size of the database with time.

I've checked the COMPRESS MySQL function and made some tests about it and the conclusion is that the compression ratio is about 30 – 40%, so it may be a bit low for what I want to achieve.

Another thing to know is that the saving process is automatically made by the application (this is a must by requirement), so when user downloads the design as a PNG file, its json automatically saved. All the saved designs can be continued on any other moment by user.

My goal is try to minimize space consuming in my server, is saving as files filesystem a valid alternative to minimize MySQL size (so don't penalize queries performance) or it would penalize server's CPU load times with read & write disk operations?

Best Answer

This is exactly what premature optimization is about. Do the math. 31 MB for two days means 5.5 GB per year, which is absolutely nothing. Don't you have more important stuff to worry about, like, I don't know, the indexes?

This doesn't mean you shouldn't try to minimize disk usage when it's easy to do. You had an excellent idea to enable compression: in one click, you gained 35%, meaning that from 5.5 GB per year, you're now at 3.6 GB/year. Not bad. Well, unless compression is now killing your CPU, meaning that you made your app much slower, and your electricity bill much higher.

However, other optimizations may not be that successful. The choice between storing on disk vs. storing in database, or the choice between basic compression and some custom, more powerful but also more CPU-intensive compression, should be done empirically, based on actual data. Not by asking a question to strangers on a Q&A website and expecting them to guess what's best for you.

Stick with the solution which is easy and straightforward to implement. When that part of the application will actually become the bottleneck, come back and redesign this part, by comparing the different implementations specific to your case. For instance, with a bit of background, you may discover that implementing JSON differential storage would remove the bottleneck. Or you may find a different solution from the observation of the actual usage of your app by the users.

Note that while postponing the decision of location where data will actually be stored, you'll also be encouraged to design your app in a way that future changes will be easier to perform. If you expect to be able to move the data from the database to flat files, you'll probably also be able to migrate more easily to, say, Amazon S3.

Related Topic