Linux – How to find cause of main file system going to read only mode


Ubuntu 12.04

The file system frequently goes into read-only mode.
I have already read the question "file system is going into read only mode frequently".
However, I need to know whether this could be caused by something other than a dying hard drive. The server is provided by my client; I am just running some node.js workers and one node.js server on it, and I am using mongodb.

From time to time (every 20-50 h) the system suddenly makes the file system read-only, the mongodb process fails (due to the read-only fs), and my node workers/server (which are started by forever) are simply killed.

Here is the log from dmesg. I can see some errors and messages saying that the FS is going read-only, and there is also a JOURNAL error, but I would like to find the cause of those errors.

http://speedy.sh/Ux2VV/dmesg.log.txt


edit

smartctl -t long /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.5.0-23-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

SMART support is: Unavailable - device lacks SMART capability.
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

What am I doing wrong? The same happens for sda2.

Moreover, when I now type any command that does not exist in the shell, I get this:

Sorry, command-not-found has crashed! Please file a bug report at:
https://bugs.launchpad.net/command-not-found/+filebug
Please include the following information with the report:

edit2

I just learned that this server is actually a VPS. The provider told me that the hard drives are OK and that they are on RAID 10. They also told me that "forcing fsck in fstab should help"…


edit3

Here is the output from the mount command:

/dev/sda2 on / type ext4 (rw,errors=remount-ro)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
udev on /dev type devtmpfs (rw,mode=0755)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
none on /media/psf type prl_fs (rw,nosuid,nodev,sync,noatime,share,_netdev)

So there is actually no sda drive? Only sda2?


edit4

Output from the fsck -N command:

root@ubuntu:~# fsck -N sda
fsck from util-linux 2.20.1
[/sbin/fsck.ext4 (1) -- /] fsck.ext4 sda /dev/sda2 

Best Answer

[26729.124569] Write(10): 2a 00 03 96 5a b0 00 00 08 00
[26729.124576] end_request: I/O error, dev sda, sector 60185264
[26729.125298] Buffer I/O error on device sda2, logical block 4593494
[26729.125986] lost page write due to I/O error on sda2

For me, that's pretty strong evidence that your /dev/sda is on its way out. You could run a smartctl test on it for confirmation (smartctl -t long /dev/sda), but I'd be inclined to replace it as soon as possible.
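
To see how often and on which device these block-layer errors occur, the saved dmesg log can be scanned for them. The following is a minimal Python sketch, not a definitive tool: the function name summarize_io_errors is made up for illustration, and the pattern assumes the "I/O error, dev <name>, sector <number>" wording seen in the log above (the prefix in front of it may differ on other kernels).

```python
import re

# Matches block-layer errors such as:
#   [26729.124576] end_request: I/O error, dev sda, sector 60185264
# Assumption: only the "I/O error, dev X, sector N" part is stable, so the
# prefix ("end_request:" in this log) is deliberately not matched.
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def summarize_io_errors(lines):
    """Return {device: sorted list of distinct failing sectors}."""
    sectors = {}
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            device, sector = match.group(1), int(match.group(2))
            sectors.setdefault(device, set()).add(sector)
    return {device: sorted(found) for device, found in sectors.items()}


# The lines quoted above, used as sample input.
sample = [
    "[26729.124569] Write(10): 2a 00 03 96 5a b0 00 00 08 00",
    "[26729.124576] end_request: I/O error, dev sda, sector 60185264",
    "[26729.125298] Buffer I/O error on device sda2, logical block 4593494",
    "[26729.125986] lost page write due to I/O error on sda2",
]
print(summarize_io_errors(sample))  # {'sda': [60185264]}
```

For a real log, pass an open file instead of the sample list, for example summarize_io_errors(open("dmesg.log.txt")). Errors spread over many unrelated sectors and errors that keep hitting the same few sectors are both worth reporting to whoever maintains the storage.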

Edit: the smartctl command I gave is correct as written. Thanks for showing the failure mode in your question; this looks like either you have very old hardware, or there's some kind of translation layer in the way: either virtualisation, or a hardware RAID controller. Can you clarify?

May I repeat my assertion that your HDD is on its way out? Testing's all very well, but getting the hardware replaced before your system packs up and your data are lost should be your priority now. Please, at the very least make sure that your backups are completely up-to-date before wasting any more time on smartctl.

Edit 2: it's certainly worth trying what they've suggested (fscking the file system), but I have little hope that it will fix the problem: your FS isn't dropping to ro mode because of FS inconsistencies, it's dropping to ro mode because of problems talking to the underlying hardware.
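
The switch to read-only itself is the errors=remount-ro mount option doing its job: it is visible on / in the mount output in the question, and it tells ext4 to remount the file system read-only when it hits an error. As an illustration, here is a small Python sketch that picks such file systems out of mount output. The function names are hypothetical, and the parser assumes the "source on target type fstype (options)" layout shown in the question, with no spaces in mount points.

```python
import re

# One line of `mount` output, as shown in the question:
#   /dev/sda2 on / type ext4 (rw,errors=remount-ro)
# Assumption: device names and mount points contain no spaces.
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+) \((.*)\)$")


def parse_mount_line(line):
    """Split one line of `mount` output into its fields, or return None."""
    match = MOUNT_LINE.match(line.strip())
    if match is None:
        return None
    source, target, fstype, options = match.groups()
    return {
        "source": source,
        "target": target,
        "fstype": fstype,
        "options": options.split(","),
    }


def remount_ro_targets(lines):
    """Mount points that will be remounted read-only after an error."""
    targets = []
    for line in lines:
        entry = parse_mount_line(line)
        if entry and "errors=remount-ro" in entry["options"]:
            targets.append(entry["target"])
    return targets


def read_only_targets(lines):
    """Mount points that are currently mounted read-only."""
    targets = []
    for line in lines:
        entry = parse_mount_line(line)
        if entry and "ro" in entry["options"]:
            targets.append(entry["target"])
    return targets


sample = [
    "/dev/sda2 on / type ext4 (rw,errors=remount-ro)",
    "proc on /proc type proc (rw,noexec,nosuid,nodev)",
]
print(remount_ro_targets(sample))  # ['/']
print(read_only_targets(sample))   # []
```

Running read_only_targets over fresh mount output after the next incident shows whether / has actually been flipped to ro, which can then be matched against the timestamps of the I/O errors in dmesg.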

If they have confidence that the underlying hardware is fine, then it's an issue between the kernel and the hardware, i.e. the virtualisation layer. You should probably get your VPS provider to confirm that the distro and the exact kernel version that you're running are fully supported on their VPS system.
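
When asking the provider, it helps to quote the distro and kernel strings exactly. Below is a rough Python sketch for collecting them. It assumes the distro information is available as KEY=value lines in the os-release format (usually /etc/os-release, which is an assumption about this particular system); the helper names and the sample values in the usage line are illustrative only.

```python
import platform


def parse_os_release(text):
    """Parse KEY=value lines (os-release format) into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def support_summary(os_release_text, kernel=None):
    """Build a one-line distro/kernel summary to send to the provider."""
    info = parse_os_release(os_release_text)
    # platform.release() returns the running kernel release string.
    kernel = kernel or platform.release()
    distro = info.get("PRETTY_NAME") or "{} {}".format(
        info.get("NAME", "unknown"), info.get("VERSION_ID", "")
    ).strip()
    return "distro: {}; kernel: {}".format(distro, kernel)


# Illustrative input; the kernel string is the one from the smartctl banner
# in the question.
print(support_summary('NAME="Ubuntu"\nVERSION_ID="12.04"\n',
                      kernel="3.5.0-23-generic"))
```

On the server itself, something like support_summary(open("/etc/os-release").read()) would pick up the running kernel automatically.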