The filesystem is probably mounted with the option errors=remount-ro, which, as the name suggests, means that if an error is detected, the filesystem is immediately set to read-only to avoid further damage.
There will be information in the kernel logs (/var/log/kern.log on most Linux distributions).
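A quick way to confirm this state (assuming the affected filesystem is on /dev/sdb1; adjust to your device):

mount | grep sdb1            # look for "ro" among the mount options
dmesg | grep -i remount      # the kernel logs why it flipped to read-only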
What to do next depends on the cause. Here are the most likely ones:
It could be a failing disk. Often you'll see I/O errors reported in the kernel logs. smartctl -a /dev/sdb can tell you more. Back up your data as soon as possible and replace the disk.
It could be a problem with your RAM. Run a memtest just to make sure.
It could be a kernel bug. This is hard for mere mortals to diagnose. Make sure you have the latest kernel released for your distribution.
The filesystem could have been damaged earlier, for a reason that no longer applies (e.g. a kernel bug that has since been fixed). Running fsck (sketched below) should fix the problem for good; if it doesn't, then unfortunately this case doesn't apply to you.
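A minimal fsck run, assuming the filesystem is /dev/sdb1 mounted at /data and the machine can be taken offline (never run fsck on a mounted filesystem; use a rescue or live system if it's the root filesystem):

umount /dev/sdb1
fsck -f /dev/sdb1        # -f forces a full check even if the superblock says "clean"
mount /dev/sdb1 /data    # remount once the check completes without errors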
You can force a check of (e.g.) md0 with

echo "check" > /sys/block/md0/md/sync_action

You can check the state of the test with

cat /sys/block/md0/md/sync_action

While this returns check, the check is still running; once it returns idle, you can do a

cat /sys/block/md0/md/mismatch_cnt
to see if the mismatch count is zero or not. Many distros automate this check to run (e.g.) weekly for you anyway, just as most industrial hardware RAID controllers continually run this in the background (they often call it "RAID scrubbing") while the array is otherwise idle. Note that, according to the comments in Fedora's automated check file, RAID1 writes in the kernel are unbuffered, and therefore mismatch counts can be non-zero even for a healthy array if the array is mounted.
So quiescing the arrays by doing this check while the VM is down, if at all possible, is probably a good idea.
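As a rough sketch, the whole check can be scripted like this (the array name md0 and the 60-second polling interval are assumptions; run as root):

#!/bin/bash
# Start a consistency check on the array and wait for it to finish.
dev=md0
echo check > /sys/block/$dev/md/sync_action

# sync_action reads back "check" while running and "idle" when done.
while [ "$(cat /sys/block/$dev/md/sync_action)" = "check" ]; do
    sleep 60
done

# A non-zero count means some stripes did not match during the check.
cat /sys/block/$dev/md/mismatch_cnt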
I'd add that I agree with the docs when they say that

RAID cannot and is not supposed to guard against data corruption on the media
RAID is supposed to guard against complete failure of a device; guarding against incremental random failures in elements of a storage device is a job for error-checking and block-remapping, which is probably best done in the controller itself. I'm happy that the docs warn people of the limitations of RAID, especially if it's implemented on top of flaky devices. I find that frequent smartctl health checks of my drives help me to stay on top of drives which are starting to show the sort of errors that lead to out-of-sync mirrors.
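As an illustration, a periodic check along those lines might look like this (the device name is an assumption; smartctl ships with the smartmontools package, and smartd can automate the same checks):

smartctl -H /dev/sda     # overall health self-assessment (PASSED/FAILED)
smartctl -a /dev/sda | grep -E 'Reallocated_Sector|Current_Pending|Offline_Uncorrectable'
                         # attributes that typically creep up on a dying disk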
Best Answer
I deduce you are using ext3 or ext4 as the filesystem. If so, you can mount it with the errors=panic option and configure watchdog to reboot your system in case a panic happens. While more complex than roelvanmeer's answer (which I upvoted), this has the added bonus of working for any panic-level kernel crash.
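A rough sketch of the watchdog side, using the Debian package and service names (other distros differ; if there is no hardware watchdog, the softdog kernel module can stand in):

apt install watchdog
# In /etc/watchdog.conf, point the daemon at the watchdog device:
#   watchdog-device = /dev/watchdog
systemctl enable --now watchdog
# After a kernel panic the daemon stops feeding the device, the timer
# expires, and the hardware (or softdog) resets the machine.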
As suggested by NikitaKipriyanov, setting the panic=5 kernel boot option can be a simpler alternative to watchdog (which has more configuration options but is slightly more complex as a result).
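A minimal sketch of the combination (the device /dev/sdb1 is an assumption; update-grub is the Debian/Ubuntu command, other distros use grub2-mkconfig):

# Make ext4 panic on errors instead of remounting read-only
# (equivalently, add errors=panic to the options in /etc/fstab)
tune2fs -e panic /dev/sdb1

# Reboot 5 seconds after any panic: add panic=5 to
# GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then:
update-grub

# Or set the same behaviour at runtime, without a reboot:
sysctl kernel.panic=5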