QNAP TS-859U+ RAID5 volume unmounted, e2fsck_64 hangs

data-recovery, network-attached-storage, qnap, raid5, storage

We have a QNAP TS-859U+ with firmware version 3.8.1 Build 20121205 at our datacenter. It has an Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1GB RAM and eight 3TB drives (Seagate ST33000651AS, firmware CC44); seven of them form a RAID5 array and the eighth is a global spare.

My intention is to recover as much data as possible.

After a power failure, there was this log message:

[RAID5 Disk Volume: Drive 1 2 8 4 5 6 7] The file system is not clean. It is suggested that you run "check disk".

At that point the RAID5 logical volume was still mounted and we had the chance to start a filesystem check from the QNAP Web GUI, but we decided to postpone it until after work hours so as not to inconvenience the users. We never got that chance again: the device rebooted itself, the RAID5 logical volume became "Unmounted", and it was no longer possible to start a filesystem check from the GUI because the "CHECK NOW" button became inactive.

I started "Bad Blocks Scan" for all drives and they all completed successfully. They all say "GOOD" for SMART information.

Then I tried to mount that volume manually via SSH and this is the output:

[~] # mount /dev/md0 /share/MD0_DATA -t ext4
wrong fs type, bad option, bad superblock on /dev/md0, missing codepage or other error

This is how that mount attempt shows up in dmesg:

[  187.927061] EXT4-fs (md0): ext4_check_descriptors: Checksum for group 0 failed (50238!=44925)
[  187.927297] EXT4-fs (md0): group descriptors corrupted!
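(For completeness, a read-only mount that skips journal replay would look like the line below. I note it only as a sketch; with the group descriptors reported as corrupted I don't expect it to behave any differently.)

mount -t ext4 -o ro,noload /dev/md0 /share/MD0_DATA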

Here is a longer dmesg output from device startup:

[  181.203693] raid5: device sda3 operational as raid disk 0
[  181.203794] raid5: device sdg3 operational as raid disk 6
[  181.203893] raid5: device sdf3 operational as raid disk 5
[  181.203992] raid5: device sde3 operational as raid disk 4
[  181.204095] raid5: device sdd3 operational as raid disk 3
[  181.204199] raid5: device sdh3 operational as raid disk 2
[  181.204302] raid5: device sdb3 operational as raid disk 1
[  181.219295] raid5: allocated 119008kB for md0
[  181.219532] 0: w=1 pa=0 pr=7 m=1 a=2 r=7 op1=0 op2=0
[  181.219634] 6: w=2 pa=0 pr=7 m=1 a=2 r=7 op1=0 op2=0
[  181.219732] 5: w=3 pa=0 pr=7 m=1 a=2 r=7 op1=0 op2=0
[  181.219830] 4: w=4 pa=0 pr=7 m=1 a=2 r=7 op1=0 op2=0
[  181.219928] 3: w=5 pa=0 pr=7 m=1 a=2 r=7 op1=0 op2=0
[  181.220030] 2: w=6 pa=0 pr=7 m=1 a=2 r=7 op1=0 op2=0
[  181.220129] 1: w=7 pa=0 pr=7 m=1 a=2 r=7 op1=0 op2=0
[  181.220230] raid5: raid level 5 set md0 active with 7 out of 7 devices, algorithm 2
[  181.220402] RAID5 conf printout:
[  181.220492]  --- rd:7 wd:7
[  181.220582]  disk 0, o:1, dev:sda3
[  181.220674]  disk 1, o:1, dev:sdb3
[  181.220767]  disk 2, o:1, dev:sdh3
[  181.220859]  disk 3, o:1, dev:sdd3
[  181.220951]  disk 4, o:1, dev:sde3
[  181.221048]  disk 5, o:1, dev:sdf3
[  181.221144]  disk 6, o:1, dev:sdg3
[  181.221324] md0: detected capacity change from 0 to 17993917661184
[  182.417718]  md0: unknown partition table
[  182.680943] md: bind<sdf2>
[  184.776414] md: bind<sdg2>
[  186.852363] md: bind<sdh2>
[  187.927061] EXT4-fs (md0): ext4_check_descriptors: Checksum for group 0 failed (50238!=44925)
[  187.927297] EXT4-fs (md0): group descriptors corrupted!

I checked and the RAID is active for md0:

[~] # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] 
md0 : active raid5 sda3[0] sdg3[6] sdf3[5] sde3[4] sdd3[3] sdh3[7] sdb3[1]
      17572185216 blocks super 1.0 level 5, 64k chunk, algorithm 2 [7/7] [UUUUUUU]

md13 : active raid1 sda4[0] sdc4[7] sdh4[6] sdg4[5] sdf4[4] sde4[3] sdd4[2] sdb4[1]
      458880 blocks [8/8] [UUUUUUUU]
      bitmap: 0/57 pages [0KB], 4KB chunk

md9 : active raid1 sda1[0] sdc1[7] sdh1[6] sdg1[5] sdf1[4] sde1[3] sdd1[2] sdb1[1]
      530048 blocks [8/8] [UUUUUUUU]
      bitmap: 0/65 pages [0KB], 4KB chunk

unused devices: <none>

Superblock is persistent as well:

[~] # mdadm --detail /dev/md0
/dev/md0:
        Version : 01.00.03
  Creation Time : Tue Jun 14 13:16:30 2011
     Raid Level : raid5
     Array Size : 17572185216 (16758.14 GiB 17993.92 GB)
  Used Dev Size : 2928697536 (2793.02 GiB 2998.99 GB)
   Raid Devices : 7
  Total Devices : 7
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Apr 12 14:55:35 2015
          State : clean
 Active Devices : 7
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : 0
           UUID : 43865f30:c89546e6:c4d0f23f:d3de8e1c
         Events : 16118285

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       7       8      115        2      active sync   /dev/sdh3
       3       8       51        3      active sync   /dev/sdd3
       4       8       67        4      active sync   /dev/sde3
       5       8       83        5      active sync   /dev/sdf3
       6       8       99        6      active sync   /dev/sdg3
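(The per-member RAID superblocks can also be dumped and compared; event counts and device roles should agree across all seven members. Something like the following, output omitted here:)

mdadm --examine /dev/sda3    # and likewise for sdb3, sdh3, sdd3, sde3, sdf3, sdg3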

I tried various e2fsck_64 (even e2fsck_64_qnap) command combinations like:

e2fsck_64 -f /dev/md0
e2fsck_64 -fy /dev/md0
e2fsck_64 -p /dev/md0

..of course, only after the "add extra swap" ritual, because e2fsck_64 quickly throws a "memory allocation error" otherwise:

swapoff /dev/md8
mdadm -S /dev/md8
mkswap /dev/sda2
mkswap /dev/sdb2
mkswap /dev/sdc2
mkswap /dev/sdd2
mkswap /dev/sde2
mkswap /dev/sdf2
mkswap /dev/sdg2
mkswap /dev/sdh2
swapon /dev/sda2
swapon /dev/sdb2
swapon /dev/sdc2
swapon /dev/sdd2
swapon /dev/sde2
swapon /dev/sdf2
swapon /dev/sdg2
swapon /dev/sdh2
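(The same swap juggling as a loop, in case anyone wants to repeat it; this assumes all eight members appear as sda..sdh and that partition 2 on each disk is the swap partition, as on this box:)

for d in a b c d e f g h; do
    mkswap /dev/sd${d}2
    swapon /dev/sd${d}2
done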

The scan hangs like this:

/dev/md0: Inode 255856286 has compression flag set on filesystem without compression support.

If I use e2fsck_64 -p, it also appends a CLEARED. message at the end of that line, but it doesn't go any further. Meanwhile, the e2fsck_64 process' CPU usage drops to ~0.9% while it still holds around 46% of memory, so it doesn't look like it's making any progress. System RAM is almost full, but it no longer seems to be filling any swap space.
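(The inode it stops on can at least be inspected read-only with debugfs, if the firmware ships it; the angle brackets tell debugfs that the argument is an inode number rather than a path:)

debugfs -R "stat <255856286>" /dev/md0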

I tried adding a USB stick as a bigger swap device, as the user RottUlf described here: http://forum.qnap.com/viewtopic.php?p=216117, but it didn't change a thing.
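(Roughly: plug the stick in, note the device name it gets, and then, bearing in mind this destroys whatever is on the stick:)

mkswap /dev/sdX     # sdX = whatever device name the USB stick gets
swapon /dev/sdX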

I also created a config file at /etc/e2fsck.conf like this:

[scratch_files]
directory = /tmp/e2fsck
dirinfo = false

..and used a USB stick for that purpose:

mkdir /tmp/e2fsck
mount /dev/sds /tmp/e2fsck

..as mentioned here: http://forum.qnap.com/viewtopic.php?f=142&t=102879&p=460976&hilit=e2fsck.conf#p460976

It didn't help either.

Some documents recommend running e2fsck_64 with a backup superblock, but I couldn't find any:

[~] # /usr/local/sbin/dumpe2fs /dev/md0 | grep superblock
dumpe2fs 1.41.4 (27-Jan-2009)
/usr/local/sbin/dumpe2fs: The ext2 superblock is corrupt while trying to open /dev/md0
Couldn't find valid filesystem superblock.
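(The usual trick for finding them is a dry-run mke2fs, which prints where the backup superblocks would live without writing anything, followed by pointing e2fsck_64 at one of those offsets with -b. The printed locations only mean something if the mke2fs defaults match whatever QNAP originally used, so treat this as a sketch:)

mke2fs -n /dev/md0             # -n: report only, makes no changes
e2fsck_64 -b 32768 /dev/md0    # 32768 assumes a 4k block size; use a value printed above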

Lastly, I tried to recreate the RAID with mdadm -CfR --assume-clean, because I've read that it has helped people with similar issues get their volume mounted again so they could see their data and back it up:

[~] # mdadm -CfR --assume-clean /dev/md0 -l 5 -n 7 /dev/sda3 /dev/sdb3 /dev/sdh3 /dev/sdd3 /dev/sde3 /dev/sdf3 /dev/sdg3
mdadm: Defaulting to version 1.-1 metadata
mdadm: /dev/sda3 appears to contain an ext2fs file system
    size=392316032K  mtime=Thu Jan  1 02:00:00 1970
mdadm: /dev/sda3 appears to be part of a raid array:
    level=raid5 devices=7 ctime=Tue Jun 14 13:16:30 2011
mdadm: /dev/sdb3 appears to be part of a raid array:
    level=raid5 devices=7 ctime=Tue Jun 14 13:16:30 2011
mdadm: /dev/sdh3 appears to be part of a raid array:
    level=raid5 devices=7 ctime=Tue Jun 14 13:16:30 2011
mdadm: /dev/sdd3 appears to be part of a raid array:
    level=raid5 devices=7 ctime=Tue Jun 14 13:16:30 2011
mdadm: /dev/sde3 appears to be part of a raid array:
    level=raid5 devices=7 ctime=Tue Jun 14 13:16:30 2011
mdadm: /dev/sdf3 appears to be part of a raid array:
    level=raid5 devices=7 ctime=Tue Jun 14 13:16:30 2011
mdadm: /dev/sdg3 appears to contain an ext2fs file system
    size=818037952K  mtime=Thu Jan  1 02:00:00 1970
mdadm: /dev/sdg3 appears to be part of a raid array:
    level=raid5 devices=7 ctime=Tue Jun 14 13:16:30 2011
mdadm: array /dev/md0 started.

..but it didn't help: the volume still can't be mounted, with the same errors.
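(In hindsight, a recreate like this should probably pin the original parameters explicitly: metadata 1.0, 64k chunk, left-symmetric layout, and members in the RaidDevice order shown by mdadm --detail above, since the defaults hinted at by the "Defaulting to version ..." line may place the data differently. A sketch of what I mean, not something I've run yet:)

mdadm -CfR --assume-clean -e 1.0 -c 64 -l 5 -n 7 --layout=left-symmetric /dev/md0 \
    /dev/sda3 /dev/sdb3 /dev/sdh3 /dev/sdd3 /dev/sde3 /dev/sdf3 /dev/sdg3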

We also have a beefier QNAP, model TS-EC879U-RP with firmware 3.8.4 Build 20130816. It has around 3.76 GB of usable RAM and an Intel(R) Xeon(R) CPU E31225 @ 3.10GHz, but it's completely full with another set of important data.

So, what I have in mind is to shut both QNAPs down, take all 8 disks out of each while marking the slot order, keep the working QNAP's 8 disks in a safe place, put the TS-859U+'s disks into the TS-EC879U-RP in the correct order, and run e2fsck_64 on that more powerful QNAP. But I don't know whether the other QNAP will correctly detect the problematic RAID in its "Unmounted" state…

..or whether the data on the powerful QNAP will be retained after it ever manages to finish e2fsck_64'ing the "guest disks" and I put all the disks back into their original slots and power everything on.

Any help will be greatly appreciated,

Thanks in advance..

Best Answer

The order of the disks won't matter; the configuration for the RAID is stored on the controller, which is in your older system, and moving the disks to another controller will just present 8 new disks for it to use. It won't know about any existing data.

Was the file system encrypted or just a standard RAID 5? Use RAID 6 next time :)
