Hard drives fail, filesystem goes into read-only mode

hard-drive, hardware

Two days ago I got a warning message:

A DegradedArray event had been detected on md device /dev/md1.
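
For reference, the state of the array can be checked with the standard md tools (the device name comes from the warning above; everything else is generic):

    cat /proc/mdstat                # [U_] instead of [UU] marks a missing member
    mdadm --detail /dev/md1         # shows which member failed and the array state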

I contacted my data center and asked them to change the hard drive. They told me that sometimes a server reboot solves the problem. I rebooted the server, but it didn't come back online.

The data center told me that both hard drives were damaged and suggested replacing them and proceeding with a server restore.

I restored the server, and the next day the / partition went into read-only mode. I got a message from my data center:

Dear customer,

the filesystem check has finished. As suspected, the filesystem structure was damaged and the server is unable to boot. Because of extensive filesystem errors, the data on /dev/md2 has been moved to the lost+found directory. You can access the files in the /mnt directory via the recovery mode that has now been triggered (recovery password: ***).
Please check your files and try to make a backup of them. Afterwards, do a fresh restore of the operating system.
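
For anyone in the same situation: a backup from such a recovery system could look roughly like this (the target host, user, and destination path are placeholders, not from the original mail):

    rsync -aHv /mnt/lost+found/ backupuser@backuphost:/srv/rescue/
    # or pack everything into a single archive instead:
    tar czf /tmp/rescue.tar.gz -C /mnt lost+found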

I have also checked the SMART values of both hard disks. Both have good SMART values.
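
For reference, those values come from smartmontools and can be read per disk (the device names here are assumptions):

    smartctl -H /dev/sda    # overall health self-assessment
    # the attributes that usually betray a dying disk:
    smartctl -A /dev/sda | grep -Ei 'reallocated|pending|uncorrect'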

What can cause such problems with hard drives? Is it possible that my data center didn't change the damaged drives and I did a new install on the same drives?

Best Answer

Your datacenter is staffed by lazy people, idiots, or quite possibly LAZY IDIOTS.

Rebooting won't (or at least shouldn't) magically fix a failed hard drive.
Re-seating the drive (a very common "trick") won't fix a drive marked failed for errors (it will eventually get knocked offline again).

The fact that your server didn't survive a reboot means you have logical corruption, either due to multiple physical failures or some other problem.
Back everything up as they said and replace your disks with new ones. The next time you get a disk failure, insist that they change the failed drive and let the RAID array rebuild.
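
For a single failed member, that replacement usually goes roughly like this (device and partition names are assumptions and must match your layout; the sfdisk step is for MBR disks):

    mdadm --manage /dev/md1 --fail /dev/sdb2      # if not already marked failed
    mdadm --manage /dev/md1 --remove /dev/sdb2
    # after physically swapping the disk, clone the partition layout
    # from the surviving disk, then re-add the new member:
    sfdisk -d /dev/sda | sfdisk /dev/sdb
    mdadm --manage /dev/md1 --add /dev/sdb2
    cat /proc/mdstat                              # watch the resync progress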
