Linux – Hard disk error

ext3, fsck, hard drive, linux

There seems to be something wrong with my hard drive, but I'm not sure what, or how to proceed. The first sign of any problems was this:

I tried to create a new directory on my server. The command hung for about 30 seconds, then failed with this error:

root@smallgames:~# mkdir derpherp
mkdir: cannot create directory `derpherp': Input/output error
Message from syslogd@smallgames at May  1 18:09:17 ...
 kernel:[8731601.569393] journal commit I/O error

I then tried running fsck:

root@smallgames:~# fsck
fsck from util-linux 2.20.1
e2fsck 1.41.12 (17-May-2010)
/dev/vda1: recovering journal
fsck.ext3: Bad magic number in super-block while trying to re-open /dev/vda1
e2fsck: io manager magic bad!

Running it again gives this:

root@smallgames:~# fsck
fsck from util-linux 2.20.1
fsck.ext3: Unable to resolve 'UUID=e4565c70-2bcd-40c8-ac8a-dab5bab4167c'

Running ls on anything now gives me an empty directory.
This is Debian running in a VM in Proxmox.

Running dmesg on the main server (the Proxmox host) gives a lot of these:

ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000008
ata1.00: failed command: READ FPDMA QUEUED
ata1.00: cmd 60/08:00:b9:c1:34/00:00:3f:00:00/40 tag 0 ncq 4096 in
         res 41/40:08:c0:c1:34/00:00:3f:00:00/00 Emask 0x409 (media error) <F>
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: configured for UDMA/133
ata1: EH complete

Output of mdadm --detail /dev/md*:

root@ks212866:~# mdadm --detail /dev/md*
mdadm: /dev/md does not appear to be an md device
/dev/md1:
        Version : 0.90
  Creation Time : Sat Nov  3 22:07:42 2012
     Raid Level : raid1
     Array Size : 10485696 (10.00 GiB 10.74 GB)
  Used Dev Size : 10485696 (10.00 GiB 10.74 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Wed May  1 21:42:44 2013
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0

           UUID : da7935e9:ed88ed4b:a4d2adc2:26fd5302
         Events : 0.67258

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       0        0        1      removed

       2       8       17        -      faulty spare   /dev/sdb1
/dev/md2:
        Version : 0.90
  Creation Time : Sat Nov  3 22:07:43 2012
     Raid Level : raid1
     Array Size : 965746624 (921.01 GiB 988.92 GB)
  Used Dev Size : 965746624 (921.01 GiB 988.92 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Wed May  1 21:42:59 2013
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0

           UUID : 70302f6a:598cdf5f:a4d2adc2:26fd5302
         Events : 0.351218

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       0        0        1      removed

       2       8       18        -      faulty spare   /dev/sdb2

Best Answer

Congratulations. You have encountered an uncorrectable read error on your first drive while your second drive had already failed.

I recommend replacing both drives. Start by replacing the second drive and wait for the rebuild to complete (it may fail and take your entire data set with it, which is a real possibility). Then replace the first drive. Then take a backup of everything. Finally, run fsck on your host, then within your guest.

However, you will likely not be able to get your data back. With drives that large, the chances of encountering an unrecoverable read error during the resync start at likely and get worse from there.
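
To put a rough number on that risk, here is a back-of-the-envelope sketch. It assumes a nominal rate of one unrecoverable read error per 10^14 bits read, a figure commonly quoted on consumer drive datasheets; that rate is an assumption and does not come from the output above. It also assumes errors are independent, which is optimistic. The function name `p_ure` is made up for illustration. The array size is the md2 `Array Size` from the mdadm output, which mdadm reports in KiB.

```python
import math

def p_ure(bytes_read, ure_per_bit=1e-14):
    """Chance of at least one unrecoverable read error while reading
    bytes_read bytes, assuming independent errors at a fixed per-bit rate.
    The 1e-14 default is an assumed datasheet-style figure."""
    bits = bytes_read * 8
    # 1 - (1 - p)^n, computed via log1p/expm1 to avoid rounding to zero
    return -math.expm1(bits * math.log1p(-ure_per_bit))

# md2 "Array Size" from mdadm --detail, given in KiB
md2_bytes = 965746624 * 1024

print(f"one full read of md2 at 1e-14: {p_ure(md2_bytes):.3f}")
print(f"one full read of md2 at 1e-15: {p_ure(md2_bytes, 1e-15):.4f}")
```

With those assumptions a single full pass over md2 comes out at roughly 7 to 8 percent on a healthy drive. Treat that as a floor, not an estimate for this machine: the surviving drive has already logged UNC media errors in the dmesg output above, so it is well outside the datasheet assumption and the real odds of the resync hitting a bad sector are considerably worse.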