Recover RAID 5 data after creating a new array instead of re-using the existing one

data-recovery, mdadm, raid5

Folks please help – I am a newb with a major headache at hand (perfect storm situation).

I have three 1 TB HDDs in my Ubuntu 11.04 box configured as software RAID 5. The data had been copied weekly onto a separate external hard drive, until that drive completely failed and was thrown away. A few days back we had a power outage, and after rebooting my box wouldn't mount the RAID. In my infinite wisdom I entered

mdadm --create -f...

command instead of

mdadm --assemble

and didn't notice the travesty I had done until afterwards. It started the array degraded and proceeded with building and syncing it, which took ~10 hours. After I was back I saw that the array is successfully up and running, but the RAID is not.

I mean the individual drives are partitioned (partition type fd), but the md0 device is not. Realizing in horror what I had done, I am trying to find some solutions. I just pray that --create didn't overwrite the entire content of the hard drives.

Could someone PLEASE help me out with this? The data on the drives is very important and unique: ~10 years of photos, docs, etc.

Is it possible that specifying the participating hard drives in the wrong order made mdadm overwrite them? When I do

mdadm --examine --scan 

I get something like:

ARRAY /dev/md/0 metadata=1.2 UUID=f1b4084a:720b5712:6d03b9e9:43afe51b name=<hostname>:0

Interestingly enough, the name used to be 'raid' and not the hostname with :0 appended.

Here are the 'sanitized' config entries:

DEVICE /dev/sdf1 /dev/sde1 /dev/sdd1

CREATE owner=root group=disk mode=0660 auto=yes

HOMEHOST <system>

MAILADDR root


ARRAY /dev/md0 metadata=1.2 name=tanserv:0 UUID=f1b4084a:720b5712:6d03b9e9:43afe51b


Here is the output of /proc/mdstat:

cat /proc/mdstat 
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : active raid5 sdd1[0] sdf1[3] sde1[1]
1953517568 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>


fdisk shows the following:

fdisk -l

Disk /dev/sda: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000bf62e

Device Boot Start End Blocks Id System
/dev/sda1 * 1 9443 75846656 83 Linux
/dev/sda2 9443 9730 2301953 5 Extended
/dev/sda5 9443 9730 2301952 82 Linux swap / Solaris

Disk /dev/sdb: 750.2 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000de8dd

Device Boot Start End Blocks Id System
/dev/sdb1 1 91201 732572001 8e Linux LVM

Disk /dev/sdc: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00056a17

Device Boot Start End Blocks Id System
/dev/sdc1 1 60801 488384001 8e Linux LVM

Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000ca948

Device Boot Start End Blocks Id System
/dev/sdd1 1 121601 976760001 fd Linux raid autodetect

Disk /dev/dm-0: 1250.3 GB, 1250254913536 bytes
255 heads, 63 sectors/track, 152001 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/dm-0 doesn't contain a valid partition table

Disk /dev/sde: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x93a66687

Device Boot Start End Blocks Id System
/dev/sde1 1 121601 976760001 fd Linux raid autodetect

Disk /dev/sdf: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xe6edc059

Device Boot Start End Blocks Id System
/dev/sdf1 1 121601 976760001 fd Linux raid autodetect

Disk /dev/md0: 2000.4 GB, 2000401989632 bytes
2 heads, 4 sectors/track, 488379392 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 524288 bytes / 1048576 bytes
Disk identifier: 0x00000000

Disk /dev/md0 doesn't contain a valid partition table

Per suggestions, I cleaned up the superblocks and re-created the array with the --assume-clean option, but with no luck at all.

Is there any tool that will help me revive at least some of the data? Can someone tell me what mdadm --create does when it syncs that destroys the data, so I can write a tool to undo whatever was done?

After re-creating the RAID I ran fsck.ext4 /dev/md0; here is the output:

root@tanserv:/etc/mdadm# fsck.ext4 /dev/md0
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext4: Superblock invalid, trying backup blocks…
fsck.ext4: Bad magic number in super-block while trying to open /dev/md0

The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193


Per Shane's suggestion I tried:

root@tanserv:/home/mushegh# mkfs.ext4 -n /dev/md0
mke2fs 1.41.14 (22-Dec-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=128 blocks, Stripe width=256 blocks
122101760 inodes, 488379392 blocks
24418969 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=0
14905 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
    4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
    102400000, 214990848

and ran fsck.ext4 with every backup superblock, but all attempts returned the following:

root@tanserv:/home/mushegh# fsck.ext4 -b 214990848 /dev/md0
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext4: Invalid argument while trying to open /dev/md0

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

Any suggestions?

Regards!

Best Answer

Ok - something was bugging me about your issue, so I fired up a VM to dive into the behavior that should be expected. I'll get to what was bugging me in a minute; first let me say this:

Back up these drives before attempting anything!!

You may have already done damage beyond what the resync did; can you clarify what you meant when you said:

Per suggestions I did clean up the superblocks and re-created the array with --assume-clean option but with no luck at all.

If you ran a mdadm --misc --zero-superblock, then you should be fine.
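
If you're not sure exactly what got written in the meantime, a harmless way to check what the superblocks currently say is mdadm --examine on each member (device names taken from your fdisk output); for 1.2 metadata it reports the creation time, chunk size, layout, and device role of whatever --create last wrote there:

# read-only: dump the current superblock on each RAID member
mdadm --examine /dev/sdd1 /dev/sde1 /dev/sdf1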

Anyway, scavenge up some new disks and grab exact current images of them before doing anything at all that might do any more writing to these disks.

dd if=/dev/sdd of=/path/to/store/sdd.img
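
And the same for the other two members; a minimal sketch, assuming the device names from your fdisk -l output and enough free space under /path/to/store for three 1 TB images:

# raw images of the remaining two RAID members
dd if=/dev/sde of=/path/to/store/sde.img
dd if=/dev/sdf of=/path/to/store/sdf.img

If you want to experiment against the copies instead of the live disks, you can attach the images as loop devices (losetup --find --show /path/to/store/sdd.img, and so on) and point mdadm at those.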

That being said.. it looks like data stored on these things is shockingly resilient to wayward resyncs. Read on, there is hope, and this may be the day that I hit the answer length limit.


The Best Case Scenario

I threw together a VM to recreate your scenario. The drives are just 100 MB so I wouldn't be waiting forever on each resync, but this should be a pretty accurate representation otherwise.

Built the array as generically and default as possible - 512k chunks, left-symmetric layout, disks in letter order.. nothing special.

root@test:~# mdadm --create /dev/md0 --chunk=512 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
root@test:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdd1[3] sdc1[1] sdb1[0]
      203776 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>

So far, so good; let's make a filesystem, and put some data on it.

root@test:~# mkfs.ext4 /dev/md0
mke2fs 1.41.14 (22-Dec-2010)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=512 blocks, Stripe width=1024 blocks
51000 inodes, 203776 blocks
10188 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67371008
25 block groups
8192 blocks per group, 8192 fragments per group
2040 inodes per group
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729

Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 30 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
root@test:~# mkdir /mnt/raid5
root@test:~# mount /dev/md0 /mnt/raid5
root@test:~# echo "data" > /mnt/raid5/datafile
root@test:~# dd if=/dev/urandom of=/mnt/raid5/randomdata count=10000
10000+0 records in
10000+0 records out
5120000 bytes (5.1 MB) copied, 0.706526 s, 7.2 MB/s
root@test:~# sha1sum /mnt/raid5/randomdata
847685a5d42524e5b1d5484452a649e854b59064  /mnt/raid5/randomdata

Ok. We've got a filesystem and some data ("data" in datafile, and 5MB worth of random data with that SHA1 hash in randomdata) on it; let's see what happens when we do a re-create.

root@test:~# umount /mnt/raid5
root@test:~# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
root@test:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>
root@test:~# mdadm --create /dev/md1 --chunk=512 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: /dev/sdb1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan  7 21:07:06 2012
mdadm: /dev/sdc1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan  7 21:07:06 2012
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan  7 21:07:06 2012
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdd1[2] sdc1[1] sdb1[0]
      203776 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>

The resync finished very quickly with these tiny disks, but it did occur. So here's what was bugging me from earlier: your fdisk -l output. Having no partition table on the md device is not a problem at all; it's expected. Your filesystem resides directly on the fake block device with no partition table.
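
As a quick read-only sanity check on your real system, blkid or file can tell you whether an ext filesystem signature sits directly on the md device (both commands are assumptions about what's installed on your box, and neither writes anything):

# both just read the device and report a filesystem signature
# if one is present at the expected offset
blkid /dev/md0
file -s /dev/md0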

root@test:~# fdisk -l
...
Disk /dev/md1: 208 MB, 208666624 bytes
2 heads, 4 sectors/track, 50944 cylinders, total 407552 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 524288 bytes / 1048576 bytes
Disk identifier: 0x00000000

Disk /dev/md1 doesn't contain a valid partition table

Yeah, no partition table. But...

root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
/dev/md1: clean, 12/51000 files, 12085/203776 blocks

Perfectly valid filesystem, after a resync. So that's good; let's check on our data files:

root@test:~# mount /dev/md1 /mnt/raid5/
root@test:~# cat /mnt/raid5/datafile
data
root@test:~# sha1sum /mnt/raid5/randomdata
847685a5d42524e5b1d5484452a649e854b59064  /mnt/raid5/randomdata

Solid - no data corruption at all! But this is with the exact same settings, so nothing was mapped differently between the two RAID groups. Let's drop this thing down before we try to break it.

root@test:~# umount /mnt/raid5
root@test:~# mdadm --stop /dev/md1

Taking a Step Back

Before we try to break this, let's talk about why it's hard to break. RAID 5 works by using a parity block that protects an area the same size as the block on every other disk in the array. The parity isn't just on one specific disk, it's rotated around the disks evenly to better spread read load out across the disks in normal operation.

The XOR operation to calculate the parity looks like this:

DISK1  DISK2  DISK3  DISK4  PARITY
1      0      1      1    = 1
0      0      1      1    = 0
1      1      1      1    = 0
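
The useful property here is that XOR is its own inverse: the very same operation that builds a parity block from the data blocks also rebuilds any single missing block from the surviving blocks plus the parity. A quick shell sanity check with arbitrary example bytes:

# parity of three data bytes
printf '%x\n' $(( 0xA5 ^ 0x3C ^ 0xF0 ))    # prints 69
# XOR the parity with the two surviving bytes to recover the missing one
printf '%x\n' $(( 0x69 ^ 0x3C ^ 0xF0 ))    # prints a5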

So, the parity is spread out among the disks.

DISK1  DISK2  DISK3  DISK4  DISK5
DATA   DATA   DATA   DATA   PARITY
PARITY DATA   DATA   DATA   DATA
DATA   PARITY DATA   DATA   DATA

A resync is typically done when replacing a dead or missing disk; it's also done on mdadm create to assure that the data on the disks aligns with what the RAID's geometry is supposed to look like. In that case, the last disk in the array spec is the one that is 'synced to' - all of the existing data on the other disks is used for the sync.

So, all of the data on the 'new' disk is wiped out and rebuilt; either building fresh data blocks out of parity blocks for what should have been there, or else building fresh parity blocks.

What's cool is that the procedure for both of those things is exactly the same: an XOR operation across the data from the rest of the disks. The resync process in this case may decide, based on its layout, that a certain block should be a parity block and think it's building a new parity block, when in fact it's re-creating an old data block. So even if it thinks it's building this:

DISK1  DISK2  DISK3  DISK4  DISK5
PARITY DATA   DATA   DATA   DATA
DATA   PARITY DATA   DATA   DATA
DATA   DATA   PARITY DATA   DATA

...it may just be rebuilding DISK5 from the layout above.

So, it's possible for data to stay consistent even if the array's built wrong.


Throwing a Monkey in the Works

(not a wrench; the whole monkey)

Test 1:

Let's make the array in the wrong order! sdc, then sdd, then sdb..

root@test:~# mdadm --create /dev/md1 --chunk=512 --level=5 --raid-devices=3 /dev/sdc1 /dev/sdd1 /dev/sdb1
mdadm: /dev/sdc1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan  7 23:06:34 2012
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan  7 23:06:34 2012
mdadm: /dev/sdb1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan  7 23:06:34 2012
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdb1[3] sdd1[1] sdc1[0]
      203776 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>

Ok, that's all well and good. Do we have a filesystem?

root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext4: Superblock invalid, trying backup blocks...
fsck.ext4: Bad magic number in super-block while trying to open /dev/md1

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

Nope! Why is that? Because while the data's all there, it's in the wrong order; what was once 512KB of A, then 512KB of B, A, B, and so forth, has now been shuffled to B, A, B, A. The disk now looks like gibberish to the filesystem checker, so it won't run. The output of mdadm --misc -D /dev/md1 gives us more detail; it looks like this:

Number   Major   Minor   RaidDevice State
   0       8       33        0      active sync   /dev/sdc1
   1       8       49        1      active sync   /dev/sdd1
   3       8       17        2      active sync   /dev/sdb1

When it should look like this:

Number   Major   Minor   RaidDevice State
   0       8       17        0      active sync   /dev/sdb1
   1       8       33        1      active sync   /dev/sdc1
   3       8       49        2      active sync   /dev/sdd1

So, that's all well and good. We overwrote a whole bunch of data blocks with new parity blocks this time out. Re-create, with the right order now:

root@test:~# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
root@test:~# mdadm --create /dev/md1 --chunk=512 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: /dev/sdb1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan  7 23:11:08 2012
mdadm: /dev/sdc1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan  7 23:11:08 2012
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan  7 23:11:08 2012
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
/dev/md1: clean, 12/51000 files, 12085/203776 blocks

Neat, there's still a filesystem there! Still got data?

root@test:~# mount /dev/md1 /mnt/raid5/
root@test:~# cat /mnt/raid5/datafile
data
root@test:~# sha1sum /mnt/raid5/randomdata
847685a5d42524e5b1d5484452a649e854b59064  /mnt/raid5/randomdata

Success!

Test 2

Ok, let's change the chunk size and see if that gets us some brokenness.

root@test:~# umount /mnt/raid5
root@test:~# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
root@test:~# mdadm --create /dev/md1 --chunk=64 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: /dev/sdb1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan  7 23:21:19 2012
mdadm: /dev/sdc1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan  7 23:21:19 2012
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan  7 23:21:19 2012
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext4: Superblock invalid, trying backup blocks...
fsck.ext4: Bad magic number in super-block while trying to open /dev/md1

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

Yeah, yeah, it's hosed when set up like this. But, can we recover?

root@test:~# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
root@test:~# mdadm --create /dev/md1 --chunk=512 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: /dev/sdb1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan  7 23:21:51 2012
mdadm: /dev/sdc1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan  7 23:21:51 2012
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan  7 23:21:51 2012
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
/dev/md1: clean, 12/51000 files, 12085/203776 blocks
root@test:~# mount /dev/md1 /mnt/raid5/
root@test:~# cat /mnt/raid5/datafile
data
root@test:~# sha1sum /mnt/raid5/randomdata
847685a5d42524e5b1d5484452a649e854b59064  /mnt/raid5/randomdata

Success, again!

Test 3

This is the one that I thought would kill data for sure - let's do a different layout algorithm!

root@test:~# umount /mnt/raid5
root@test:~# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
root@test:~# mdadm --create /dev/md1 --chunk=512 --level=5 --layout=right-asymmetric --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: /dev/sdb1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan  7 23:32:34 2012
mdadm: /dev/sdc1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan  7 23:32:34 2012
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan  7 23:32:34 2012
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdd1[3] sdc1[1] sdb1[0]
      203776 blocks super 1.2 level 5, 512k chunk, algorithm 1 [3/3] [UUU]

unused devices: <none>
root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext4: Superblock invalid, trying backup blocks...
Superblock has an invalid journal (inode 8).

Scary and bad - it thinks it found something and wants to do some fixing! Ctrl+C!

Clear<y>? cancelled!

fsck.ext4: Illegal inode number while checking ext3 journal for /dev/md1

Ok, crisis averted. Let's see if the data's still intact after resyncing with the wrong layout:

root@test:~# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
root@test:~# mdadm --create /dev/md1 --chunk=512 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: /dev/sdb1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan  7 23:33:02 2012
mdadm: /dev/sdc1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan  7 23:33:02 2012
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan  7 23:33:02 2012
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
/dev/md1: clean, 12/51000 files, 12085/203776 blocks
root@test:~# mount /dev/md1 /mnt/raid5/
root@test:~# cat /mnt/raid5/datafile
data
root@test:~# sha1sum /mnt/raid5/randomdata
847685a5d42524e5b1d5484452a649e854b59064  /mnt/raid5/randomdata

Success!

Test 4

Let's also just prove that that superblock zeroing isn't harmful real quick:

root@test:~# umount /mnt/raid5
root@test:~# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
root@test:~# mdadm --misc --zero-superblock /dev/sdb1 /dev/sdc1 /dev/sdd1
root@test:~# mdadm --create /dev/md1 --chunk=512 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
/dev/md1: clean, 12/51000 files, 12085/203776 blocks
root@test:~# mount /dev/md1 /mnt/raid5/
root@test:~# cat /mnt/raid5/datafile
data
root@test:~# sha1sum /mnt/raid5/randomdata
847685a5d42524e5b1d5484452a649e854b59064  /mnt/raid5/randomdata

Yeah, no big deal.

Test 5

Let's just throw everything we've got at it. All 4 previous tests, combined.

  • Wrong device order
  • Wrong chunk size
  • Wrong layout algorithm
  • Zeroed superblocks (we'll do this between both creations)

Onward!

root@test:~# umount /mnt/raid5
root@test:~# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
root@test:~# mdadm --misc --zero-superblock /dev/sdb1 /dev/sdc1 /dev/sdd1
root@test:~# mdadm --create /dev/md1 --chunk=64 --level=5 --raid-devices=3 --layout=right-symmetric /dev/sdc1 /dev/sdd1 /dev/sdb1
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdb1[3] sdd1[1] sdc1[0]
      204672 blocks super 1.2 level 5, 64k chunk, algorithm 3 [3/3] [UUU]

unused devices: <none>
root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext4: Superblock invalid, trying backup blocks...
fsck.ext4: Bad magic number in super-block while trying to open /dev/md1

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
root@test:~# mdadm --stop /dev/md1
mdadm: stopped /dev/md1

The verdict?

root@test:~# mdadm --misc --zero-superblock /dev/sdb1 /dev/sdc1 /dev/sdd1
root@test:~# mdadm --create /dev/md1 --chunk=512 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdd1[3] sdc1[1] sdb1[0]
      203776 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>

root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
/dev/md1: clean, 13/51000 files, 17085/203776 blocks
root@test:~# mount /dev/md1 /mnt/raid5/
root@test:~# cat /mnt/raid5/datafile
data
root@test:~# sha1sum /mnt/raid5/randomdata
847685a5d42524e5b1d5484452a649e854b59064  /mnt/raid5/randomdata

Wow.

So, it looks like none of these actions corrupted data in any way. I was quite surprised by this result, frankly; I expected moderate odds of data loss on the chunk size change, and some definite loss on the layout change. I learned something today.


So .. How do I get my data??

As much information as you can gather about the old system will be extremely helpful. Do you know the filesystem type? Do you have any old copies of your /proc/mdstat output with information on drive order, algorithm, chunk size, and metadata version? Do you have mdadm's email alerts set up? If so, find an old one; if not, check /var/spool/mail/root. Check your ~/.bash_history to see if your original build command is in there.
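
A hedged sketch of that digging, assuming the stock Ubuntu paths and GNU grep:

# any trace of the original mdadm create/assemble commands
grep mdadm /root/.bash_history /home/*/.bash_history
# monitor mails typically include a copy of /proc/mdstat from the time of the event
grep -i -A15 mdadm /var/spool/mail/root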

So, the list of things that you should do:

  1. Back up the disks with dd before doing anything!!
  2. Try to fsck the current, active md - you may have just happened to build in the same order as before. If you know the filesystem type, that's helpful; use that specific fsck tool. If any of the tools offer to fix anything, don't let them unless you're sure that they've actually found the valid filesystem! If an fsck offers to fix something for you, don't hesitate to leave a comment to ask whether it's actually helping or just about to nuke data.
  3. Try building the array with different parameters. If you have an old /proc/mdstat, then you can just mimic what it shows; if not, then you're kinda in the dark - trying all of the different drive orders is reasonable, but checking every possible chunk size with every possible order is futile. For each attempt, fsck it to see if you get anything promising; a rough sketch of that loop follows this list.
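
A minimal sketch of that order-cycling loop, with a few assumptions: the partition names and 512k chunk size come from your question, --assume-clean is used so each attempt skips the resync, and fsck.ext4 -n only reads. Ideally run it against loop devices backed by your dd images rather than against the real disks:

for order in "sdd1 sde1 sdf1" "sdd1 sdf1 sde1" "sde1 sdd1 sdf1" \
             "sde1 sdf1 sdd1" "sdf1 sdd1 sde1" "sdf1 sde1 sdd1"; do
    mdadm --stop /dev/md0 2>/dev/null
    # 'echo y' answers the "appears to be part of a raid array" prompt
    echo y | mdadm --create /dev/md0 --assume-clean --chunk=512 --level=5 \
        --raid-devices=3 $(printf '/dev/%s ' $order)
    echo "=== order: $order ==="
    fsck.ext4 -n /dev/md0    # report only; never let it fix anything yet
done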

So, that's that. Sorry for the novel, feel free to leave a comment if you have any questions, and good luck!

footnote: under 22 thousand characters; 8k+ shy of the length limit
