Mounting the filesystem with sync specified in fstab would probably help. I suspect someone will have a recommendation better suited for your particular application.
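For example, a hypothetical fstab entry (device name and mount point are placeholders) would simply add sync to the mount options:

```
# /etc/fstab — illustrative entry; sync makes all writes synchronous,
# trading throughput for durability on power loss
/dev/sda1  /data  ext4  defaults,sync  0  2
```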
I began initial research on filesystems used with flash storage, as I want to custom-build a home theater PC as an appliance. You may find a different storage solution better suited for your device. Unfortunately, I have yet to find something I prefer, so I do not have a detailed recommendation there.
Edit 1
According to the smb.conf(5) manpage, Samba supports immediate syncing:
strict sync (S)
    Many Windows applications (including the Windows 98 explorer shell) seem to confuse flushing buffer contents to disk with doing a sync to disk. Under UNIX, a sync call forces the process to be suspended until the kernel has ensured that all outstanding data in kernel disk buffers has been safely stored onto stable storage. This is very slow and should only be done rarely. Setting this parameter to no (the default) means that smbd(8) ignores the Windows applications requests for a sync call. There is only a possibility of losing data if the operating system itself that Samba is running on crashes, so there is little danger in this default setting. In addition, this fixes many performance problems that people have reported with the new Windows 98 explorer shell file copies.

    Default: strict sync = no
sync always (S)
    This is a boolean parameter that controls whether writes will always be written to stable storage before the write call returns. If this is no then the server will be guided by the client's request in each write call (clients can set a bit indicating that a particular write should be synchronous). If this is yes then every write will be followed by a fsync() call to ensure the data is written to disk. Note that the strict sync parameter must be set to yes in order for this parameter to have any effect.

    Default: sync always = no
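Putting the two options together, a share that forces synchronous writes might look like this (the share name and path are hypothetical):

```
# smb.conf — illustrative share section; both options are needed,
# since sync always has no effect unless strict sync = yes
[appliance]
    path = /srv/share
    strict sync = yes
    sync always = yes
```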
On sudden power loss, MLC/TLC/QLC SSDs have two failure modes:
- they lose the in-flight and in-DRAM-only writes;
- they can corrupt any data-at-rest stored in the lower page of the NAND cell being programmed.
The first failure condition is obvious: without power protection, any data which are not on stable storage (ie: the NAND itself) but in volatile cache only (DRAM) will be lost. The same happens with classical mechanical disks, and that alone can wreak havoc on a filesystem which does not properly issue fsyncs.
The second failure condition is an MLC+ SSD affair: when reprogramming the high page bit to store new data, an unexpected power loss can also destroy/alter the lower bit (ie: previously committed data).
The only true, and most obvious, solution is to integrate a power-loss-protected DRAM cache (generally backed by a battery or supercaps), as done since forever by high-end RAID controllers; this, however, increases drive cost/price. Consumer drives typically have no power-loss-protected caches; rather, they use an array of more economical solutions, such as:
- partially protected write cache (ie: Crucial M500/M550/M600+);
- NAND changes journal (ie: Samsung drives, see SMART PoR attribute);
- special SLC/pseudo-SLC NAND regions to absorb new writes without putting previous data at risk (ie: Sandisk, Samsung, etc).
Back to your question: your Kingston drives are ultra-cheap ones, with an unspecified controller and basically no public specs. It does not surprise me that a sudden power loss corrupted previous data. Unfortunately, even disabling the disk's DRAM cache (with the massive performance loss that entails) will not solve your problem, as previous data (ie: data-at-rest) can, and will, be corrupted by unexpected power losses. If they are based on the old SandForce controller, even a total drive brick can be expected under the "right" circumstances.
I strongly suggest reviewing your UPS and, in the mid-term, replacing these aging drives.
A last note about PostgreSQL and other Linux databases: they will not disable the disk's cache, and they should not be expected to do that. Rather, they issue periodic/required fsyncs/FUAs to commit key data to stable storage. This is the way things should be done unless a very compelling reason exists (ie: a drive which lies about ATA FLUSH/FUA commands).
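The write-then-fsync pattern these databases rely on can be sketched in a few lines of Python (the file name is purely illustrative):

```python
import os

# Minimal sketch of the durability pattern databases use: write the
# record, flush the userspace buffer, then fsync so the kernel commits
# the data to stable storage before we consider it "written".
path = "journal.dat"
with open(path, "wb") as f:
    f.write(b"important record\n")
    f.flush()              # push Python's internal buffer to the kernel
    os.fsync(f.fileno())   # ask the kernel to flush to stable storage
```

Note that fsync only commits this one file; the disk's own cache behavior (and honesty about FLUSH/FUA) is still in play below this layer.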
EDIT: if possible, consider migrating to a checksumming filesystem such as ZFS or BTRFS. At the very least consider XFS, which has journal checksums and, lately, even metadata checksums. If you are forced to use EXT4, consider enabling auto-fsck at startup (fsck.ext4 is very good at repairing corruption).
Best Answer
Potentially, yes. There are two obvious routes via which this could happen.
Ext4 is a metadata journaling filesystem - it only journals the changes to the file's metadata (size, location, dates) - not the file contents (btrfs and zfs do full-data journalling at a big performance cost). So although you should never have to fsck the disk, it doesn't follow that every write operation between opening the file and closing + flushing the buffers will have completed. There is no transactional control over writes to the file data.
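For what it's worth, ext4 can be asked to journal file contents as well via the data=journal mount option; a hypothetical fstab entry (device and mount point are placeholders):

```
# /etc/fstab — illustrative; data=journal journals file data too,
# at a significant performance cost (the default is data=ordered)
/dev/sda2  /srv/data  ext4  defaults,data=journal  0  2
```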
A second possibility is that the disk may be physically damaged by power spikes. Although the rest of the hardware tends to do a good job of isolating the hard disk, there will still be some leakage.
That's a very different question - this is a lot less likely. Certainly the first scenario only applies if you happen to be writing the kernel, bootloader, ramdisk etc at the time of the outage.
See also this Q&A on unix.stackexchange