Debian – How to store data on a machine whose power gets cut at random

corruption, debian, filesystems

I have a virtual machine (Debian) running on a physical host. The virtual machine acts as a buffer for data that it receives over the local network (one packet every 0.5 s, so a fairly high throughput). Any data received is stored on the virtual machine and repeatedly forwarded to an external server over UDP. Once the external server acknowledges (also over UDP) that it has received a data packet, the original data is deleted from the virtual machine and not sent again. The internet connection between the VM and the external server is unreliable and can be down for days at a time.

The physical machine that hosts the VM gets its power cut several times per day at random. There is no way to tell when this is about to happen and it is not possible to add a UPS, a battery, or a similar solution to the system.

Originally, the data was stored in a file-based HSQLDB database on the virtual machine. However, the frequent power cuts eventually corrupt the database script file (not at the file system level: the file is still readable, but HSQLDB can't make sense of it), which leads to my question:

How should data be stored in an environment where power cuts can and do happen frequently?

One option I can think of is using flat files, saving each packet of data as a file on the file system. This way, if a file is corrupted due to loss of power, it can be ignored and the rest of the data remains intact. This poses a few issues, however, mainly related to the amount of data likely to be stored on the virtual machine. At 0.5 s between packets, 1,728,000 files will be generated in 10 days (2 per second × 86,400 seconds per day × 10 days). At the very least, this means using a file system with an increased number of inodes (the current file system ran out of inodes at ~250,000 messages with only 30% of the disk space used). That many files would also be hard (though not impossible) to manage.
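A minimal sketch of the one-file-per-packet idea, written to stay within Java 6. The class name `PacketFileStore`, the `.tmp`/`.pkt` suffixes, and the numeric packet id are all hypothetical choices made for illustration. Each packet is written to a temporary file, flushed with `FileDescriptor.sync()`, and only then renamed, so a file with the final suffix should never be half-written. Two assumptions to keep in mind: the rename is atomic on typical Linux file systems but Java 6 offers no portable way to sync the directory entry itself, and the sync only helps if the hypervisor and the disk actually honor flush requests.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical helper: stores each packet as its own file, written via a temp file.
class PacketFileStore {
    private final File dir;

    PacketFileStore(File dir) {
        this.dir = dir;
    }

    // Write to "<id>.tmp", force the contents to disk, then rename to "<id>.pkt".
    File store(long id, byte[] payload) throws IOException {
        File tmp = new File(dir, id + ".tmp");
        File dst = new File(dir, id + ".pkt");
        FileOutputStream out = new FileOutputStream(tmp);
        try {
            out.write(payload);
            out.getFD().sync(); // ask the OS to flush file contents to the device
        } finally {
            out.close();
        }
        if (!tmp.renameTo(dst)) {
            throw new IOException("rename failed: " + tmp);
        }
        return dst;
    }

    // Called once the external server has acknowledged the packet.
    boolean delete(long id) {
        return new File(dir, id + ".pkt").delete();
    }

    // Called at startup: leftover ".tmp" files are writes interrupted by a power cut.
    int removeIncomplete() {
        File[] files = dir.listFiles();
        int removed = 0;
        if (files != null) {
            for (File f : files) {
                if (f.getName().endsWith(".tmp") && f.delete()) {
                    removed++;
                }
            }
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "packets-demo");
        dir.mkdirs();
        PacketFileStore store = new PacketFileStore(dir);
        store.removeIncomplete();
        File f = store.store(42L, "hello".getBytes("UTF-8"));
        System.out.println("stored " + f);
        store.delete(42L);
    }
}
```

This does nothing about the inode limit described above; it only narrows the window in which a power cut can leave a partially written packet under its final name.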

Are there any other options? Are there database engines that run on Debian that would not get corrupted by power cuts? Also, what file system should be used for this? ext3 is what is used at the moment.

The software that runs on the virtual machine is written in Java 6, so ideally the solution would be compatible with that.

Best Answer

Honestly, your best approach here is either to fix the power cuts or to deploy a different system in a better location.

Yes, there are systems such as Redis that store data in an append-only log for replay, but you still risk corruption at lower levels: if the filesystem itself is scrambled, any data on disk is potentially at risk.
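To make the append-only idea concrete, here is a rough Java 6 compatible sketch. It is not how Redis implements its log; the class name `AppendLog`, the record layout (4-byte length, 8-byte CRC32 value, payload), and the `MAX_RECORD` limit are assumptions chosen for illustration. Each record is synced after it is appended, and replay stops at the first truncated or corrupt record, so a power cut in the middle of a write should cost at most the tail of the log. As noted above, this cannot protect against damage below the file level.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical append-only log with a checksum per record.
class AppendLog {
    private static final int MAX_RECORD = 1 << 20; // assumed sanity limit: 1 MiB
    private final File file;

    AppendLog(File file) {
        this.file = file;
    }

    // Append one record: [int length][long crc32][payload], then sync to disk.
    // Reopening the file per record is simple but slow; a real logger would keep it open.
    void append(byte[] payload) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(payload);
        ByteBuffer buf = ByteBuffer.allocate(12 + payload.length);
        buf.putInt(payload.length).putLong(crc.getValue()).put(payload);
        FileOutputStream out = new FileOutputStream(file, true); // true = append mode
        try {
            out.write(buf.array());
            out.getFD().sync();
        } finally {
            out.close();
        }
    }

    // Read records in order, stopping at the first truncated or corrupt one.
    List<byte[]> replay() throws IOException {
        List<byte[]> records = new ArrayList<byte[]>();
        if (!file.exists()) {
            return records;
        }
        DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)));
        try {
            while (true) {
                int len = in.readInt();
                long expected = in.readLong();
                if (len < 0 || len > MAX_RECORD) {
                    break; // implausible length: treat as corruption
                }
                byte[] payload = new byte[len];
                in.readFully(payload);
                CRC32 crc = new CRC32();
                crc.update(payload);
                if (crc.getValue() != expected) {
                    break; // checksum mismatch: stop replay here
                }
                records.add(payload);
            }
        } catch (EOFException e) {
            // End of file or a truncated final record: keep what was read so far.
        } finally {
            in.close();
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("applog-demo", ".bin");
        AppendLog log = new AppendLog(f);
        log.append("first".getBytes("UTF-8"));
        log.append("second".getBytes("UTF-8"));
        System.out.println("records replayed: " + log.replay().size());
        f.delete();
    }
}
```

A single log file avoids the inode pressure of one file per packet, but acknowledged packets then have to be tracked and compacted separately, which this sketch does not attempt.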

I appreciate that any improvement would be useful to you, but the problem cannot really be solved within the scenario you've outlined.