QNAP TS-859U+ RAID5 volume unmounted, e2fsck_64 hangs

data-recovery, network-attached-storage, qnap, raid5, storage

We have a QNAP TS-859U+ with firmware version 3.8.1 Build 20121205 at our datacenter. It has an Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1GB RAM and eight 3TB drives (Seagate ST33000651AS, firmware CC44); seven of them form a RAID5 array and the eighth is a global spare.

My intention is to recover as much data as possible.

After a power failure, there was this log message:

[RAID5 Disk Volume: Drive 1 2 8 4 5 6 7] The file system is not clean. It is suggested that you run "check disk".

At that point the RAID5 logical volume was still mounted and we had the chance to start a filesystem check from the QNAP Web GUI, but we decided to postpone it until after work hours so as not to inconvenience the users. We never got that chance again: the device rebooted itself, the RAID5 logical volume became "Unmounted", and it was no longer possible to start a filesystem check from the GUI because the "CHECK NOW" button became inactive.

I started "Bad Blocks Scan" for all drives and they all completed successfully. They all say "GOOD" for SMART information.

Then I tried to mount that volume manually via SSH and this is the output:

[~] # mount /dev/md0 /share/MD0_DATA -t ext4
wrong fs type, bad option, bad superblock on /dev/md0, missing codepage or other error

This is how that mount attempt shows up in dmesg:

[  187.927061] EXT4-fs (md0): ext4_check_descriptors: Checksum for group 0 failed (50238!=44925)
[  187.927297] EXT4-fs (md0): group descriptors corrupted!
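(For completeness, a read-only mount that skips journal replay would look like the line below. I note it only as a sketch; with the group descriptors reported as corrupted I don't expect it to behave any differently.)

mount -t ext4 -o ro,noload /dev/md0 /share/MD0_DATA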

Here is a longer dmesg output from device startup:

[  181.203693] raid5: device sda3 operational as raid disk 0
[  181.203794] raid5: device sdg3 operational as raid disk 6
[  181.203893] raid5: device sdf3 operational as raid disk 5
[  181.203992] raid5: device sde3 operational as raid disk 4
[  181.204095] raid5: device sdd3 operational as raid disk 3
[  181.204199] raid5: device sdh3 operational as raid disk 2
[  181.204302] raid5: device sdb3 operational as raid disk 1
[  181.219295] raid5: allocated 119008kB for md0
[  181.219532] 0: w=1 pa=0 pr=7 m=1 a=2 r=7 op1=0 op2=0
[  181.219634] 6: w=2 pa=0 pr=7 m=1 a=2 r=7 op1=0 op2=0
[  181.219732] 5: w=3 pa=0 pr=7 m=1 a=2 r=7 op1=0 op2=0
[  181.219830] 4: w=4 pa=0 pr=7 m=1 a=2 r=7 op1=0 op2=0
[  181.219928] 3: w=5 pa=0 pr=7 m=1 a=2 r=7 op1=0 op2=0
[  181.220030] 2: w=6 pa=0 pr=7 m=1 a=2 r=7 op1=0 op2=0
[  181.220129] 1: w=7 pa=0 pr=7 m=1 a=2 r=7 op1=0 op2=0
[  181.220230] raid5: raid level 5 set md0 active with 7 out of 7 devices, algorithm 2
[  181.220402] RAID5 conf printout:
[  181.220492]  --- rd:7 wd:7
[  181.220582]  disk 0, o:1, dev:sda3
[  181.220674]  disk 1, o:1, dev:sdb3
[  181.220767]  disk 2, o:1, dev:sdh3
[  181.220859]  disk 3, o:1, dev:sdd3
[  181.220951]  disk 4, o:1, dev:sde3
[  181.221048]  disk 5, o:1, dev:sdf3
[  181.221144]  disk 6, o:1, dev:sdg3
[  181.221324] md0: detected capacity change from 0 to 17993917661184
[  182.417718]  md0: unknown partition table
[  182.680943] md: bind<sdf2>
[  184.776414] md: bind<sdg2>
[  186.852363] md: bind<sdh2>
[  187.927061] EXT4-fs (md0): ext4_check_descriptors: Checksum for group 0 failed (50238!=44925)
[  187.927297] EXT4-fs (md0): group descriptors corrupted!

I checked and the RAID is active for md0:

[~] # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] 
md0 : active raid5 sda3[0] sdg3[6] sdf3[5] sde3[4] sdd3[3] sdh3[7] sdb3[1]
      17572185216 blocks super 1.0 level 5, 64k chunk, algorithm 2 [7/7] [UUUUUUU]

md13 : active raid1 sda4[0] sdc4[7] sdh4[6] sdg4[5] sdf4[4] sde4[3] sdd4[2] sdb4[1]
      458880 blocks [8/8] [UUUUUUUU]
      bitmap: 0/57 pages [0KB], 4KB chunk

md9 : active raid1 sda1[0] sdc1[7] sdh1[6] sdg1[5] sdf1[4] sde1[3] sdd1[2] sdb1[1]
      530048 blocks [8/8] [UUUUUUUU]
      bitmap: 0/65 pages [0KB], 4KB chunk

unused devices: <none>

Superblock is persistent as well:

[~] # mdadm --detail /dev/md0
/dev/md0:
        Version : 01.00.03
  Creation Time : Tue Jun 14 13:16:30 2011
     Raid Level : raid5
     Array Size : 17572185216 (16758.14 GiB 17993.92 GB)
  Used Dev Size : 2928697536 (2793.02 GiB 2998.99 GB)
   Raid Devices : 7
  Total Devices : 7
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Apr 12 14:55:35 2015
          State : clean
 Active Devices : 7
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : 0
           UUID : 43865f30:c89546e6:c4d0f23f:d3de8e1c
         Events : 16118285

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       7       8      115        2      active sync   /dev/sdh3
       3       8       51        3      active sync   /dev/sdd3
       4       8       67        4      active sync   /dev/sde3
       5       8       83        5      active sync   /dev/sdf3
       6       8       99        6      active sync   /dev/sdg3
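(The per-member RAID superblocks can also be dumped and compared; event counts and device roles should agree across all seven members. Something like the following, output omitted here:)

mdadm --examine /dev/sda3    # and likewise for sdb3, sdh3, sdd3, sde3, sdf3, sdg3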

I tried various e2fsck_64 (even e2fsck_64_qnap) command combinations like:

e2fsck_64 -f /dev/md0
e2fsck_64 -fy /dev/md0
e2fsck_64 -p /dev/md0

..of course, only after the "add extra swap" ritual, because e2fsck_64 quickly throws a "memory allocation error" otherwise:

swapoff /dev/md8
mdadm -S /dev/md8
mkswap /dev/sda2
mkswap /dev/sdb2
mkswap /dev/sdc2
mkswap /dev/sdd2
mkswap /dev/sde2
mkswap /dev/sdf2
mkswap /dev/sdg2
mkswap /dev/sdh2
swapon /dev/sda2
swapon /dev/sdb2
swapon /dev/sdc2
swapon /dev/sdd2
swapon /dev/sde2
swapon /dev/sdf2
swapon /dev/sdg2
swapon /dev/sdh2
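(The same swap juggling as a loop, in case anyone wants to repeat it; this assumes all eight members appear as sda..sdh and that partition 2 on each disk is the swap partition, as on this box:)

for d in a b c d e f g h; do
    mkswap /dev/sd${d}2
    swapon /dev/sd${d}2
done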

The scan hangs like this:

/dev/md0: Inode 255856286 has compression flag set on filesystem without compression support.

If I use e2fsck_64 -p, it also appends a CLEARED. message at the end of that line, but it doesn't go any further. Meanwhile, the e2fsck_64 process' CPU usage drops to ~0.9% while it still holds around 46% of memory, so it doesn't look like it's making any progress. System RAM is almost full, but it no longer seems to be filling any swap space.
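(The inode it stops on can at least be inspected read-only with debugfs, if the firmware ships it; the angle brackets tell debugfs that the argument is an inode number rather than a path:)

debugfs -R "stat <255856286>" /dev/md0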

I tried adding a USB stick as a bigger swap device, as the user RottUlf described here: http://forum.qnap.com/viewtopic.php?p=216117, but it didn't change a thing.
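(Roughly: plug the stick in, note the device name it gets, and then, bearing in mind this destroys whatever is on the stick:)

mkswap /dev/sdX     # sdX = whatever device name the USB stick gets
swapon /dev/sdX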

I also created a config file at /etc/e2fsck.conf like this:

[scratch_files]
directory = /tmp/e2fsck
dirinfo = false

..and used a USB stick for that purpose:

mkdir /tmp/e2fsck
mount /dev/sds /tmp/e2fsck

..as mentioned here: http://forum.qnap.com/viewtopic.php?f=142&t=102879&p=460976&hilit=e2fsck.conf#p460976

It didn't help either.

Some documents recommend running e2fsck_64 with a backup superblock, but I couldn't find any:

[~] # /usr/local/sbin/dumpe2fs /dev/md0 | grep superblock
dumpe2fs 1.41.4 (27-Jan-2009)
/usr/local/sbin/dumpe2fs: The ext2 superblock is corrupt while trying to open /dev/md0
Couldn't find valid filesystem superblock.
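(The usual trick for finding them is a dry-run mke2fs, which prints where the backup superblocks would live without writing anything, followed by pointing e2fsck_64 at one of those offsets with -b. The printed locations only mean something if the mke2fs defaults match whatever QNAP originally used, so treat this as a sketch:)

mke2fs -n /dev/md0             # -n: report only, makes no changes
e2fsck_64 -b 32768 /dev/md0    # 32768 assumes a 4k block size; use a value printed above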

Lastly, I tried to recreate the RAID with mdadm -CfR --assume-clean, because I've read that it has helped people with similar issues get their volume mounted again so they could see their data and back it up:

[~] # mdadm -CfR --assume-clean /dev/md0 -l 5 -n 7 /dev/sda3 /dev/sdb3 /dev/sdh3 /dev/sdd3 /dev/sde3 /dev/sdf3 /dev/sdg3
mdadm: Defaulting to version 1.-1 metadata
mdadm: /dev/sda3 appears to contain an ext2fs file system
    size=392316032K  mtime=Thu Jan  1 02:00:00 1970
mdadm: /dev/sda3 appears to be part of a raid array:
    level=raid5 devices=7 ctime=Tue Jun 14 13:16:30 2011
mdadm: /dev/sdb3 appears to be part of a raid array:
    level=raid5 devices=7 ctime=Tue Jun 14 13:16:30 2011
mdadm: /dev/sdh3 appears to be part of a raid array:
    level=raid5 devices=7 ctime=Tue Jun 14 13:16:30 2011
mdadm: /dev/sdd3 appears to be part of a raid array:
    level=raid5 devices=7 ctime=Tue Jun 14 13:16:30 2011
mdadm: /dev/sde3 appears to be part of a raid array:
    level=raid5 devices=7 ctime=Tue Jun 14 13:16:30 2011
mdadm: /dev/sdf3 appears to be part of a raid array:
    level=raid5 devices=7 ctime=Tue Jun 14 13:16:30 2011
mdadm: /dev/sdg3 appears to contain an ext2fs file system
    size=818037952K  mtime=Thu Jan  1 02:00:00 1970
mdadm: /dev/sdg3 appears to be part of a raid array:
    level=raid5 devices=7 ctime=Tue Jun 14 13:16:30 2011
mdadm: array /dev/md0 started.

..but it didn't help: the volume still can't be mounted, with the same errors.
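(In hindsight, a recreate like this should probably pin the original parameters explicitly: metadata 1.0, 64k chunk, left-symmetric layout, and members in the RaidDevice order shown by mdadm --detail above, since the defaults hinted at by the "Defaulting to version ..." line may place the data differently. A sketch of what I mean, not something I've run yet:)

mdadm -CfR --assume-clean -e 1.0 -c 64 -l 5 -n 7 --layout=left-symmetric /dev/md0 \
    /dev/sda3 /dev/sdb3 /dev/sdh3 /dev/sdd3 /dev/sde3 /dev/sdf3 /dev/sdg3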

We also have a beefier QNAP, model TS-EC879U-RP with firmware 3.8.4 Build 20130816. It has around 3.76 GB of usable RAM and an Intel(R) Xeon(R) CPU E31225 @ 3.10GHz, but it's completely full with another set of important data.

So, what I have in mind is to shut both QNAPs down, take all 8 disks out of each while marking the slot order, keep the working QNAP's 8 disks in a safe place, put the TS-859U+'s disks into the TS-EC879U-RP in the correct order, and run e2fsck_64 on that more powerful QNAP. But I don't know whether the other QNAP will correctly detect the problematic RAID in its "Unmounted" state…

..or whether the data on the powerful QNAP will be retained after it ever manages to finish e2fsck_64'ing the "guest disks" and I put all the disks back into their original slots and power everything on.

Any help will be greatly appreciated,

Thanks in advance..

Best Answer

The order of the disks won't matter; the configuration for the RAID is stored on the controller, which is in your older system, and moving the disks to another controller will just present 8 new disks for it to use. It won't know about any existing data.

Was the file system encrypted or just a standard RAID 5? Use RAID 6 next time :)
