Linux – mdadm/LVM/RAID issue

linux, lvm, mdadm, raid, ubuntu

Alright, posting here in the hopes someone can help.

So I set up a 4×1.5TB RAID6 array. Got it built and everything, and it worked fine. I copied over two more drives' worth of data, and then grew those two drives into the array. There were a couple of hiccups, but otherwise it worked fine and took forever.

I copied over the last drive's worth of data today and followed the same steps to grow it into the array, then of course ran the "watch etc etc" command to keep a steady eye on the reshaping, because why not.
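(For anyone following along, the add/grow/watch sequence I mean is roughly the usual one below; this is a sketch from memory, not my exact invocation, and the device names are just the obvious ones:)

    # add the new disk, grow the array onto it, then watch the reshape
    sudo mdadm --add /dev/md0 /dev/sdc1
    sudo mdadm --grow /dev/md0 --raid-devices=7
    watch cat /proc/mdstat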

And it was going along fine; it got to maybe 10-11% and then I noticed it wasn't updating anymore.

This happened right as I was watching something off the array's filesystem over the network, and then it suddenly locked up.

I thought maybe something went hinky with the watch command, so I killed it and just ran cat /proc/mdstat from the command line.

And got nothing. No output or anything; the cursor goes down a line, but nothing else happens.

If I try an mdadm --detail /dev/md0, same thing. Nothing happens.

If I try to ls inside the mounted directory for the array, I get the root listing, but when I try to dig any deeper into the folders, ls does the same thing as the first two commands and locks up, except I can't even Ctrl-C out of it.

What I think is causing the issue is that I can see something like 7 smbd processes in state D (uninterruptible sleep), which must be left over from when I was watching the video file and it locked up. Of course I can't kill them; the system won't let me.
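(If anyone wants to see them, something like the following lists the D-state processes and what they're blocked on; the sysrq line only dumps blocked-task stack traces into the kernel log and doesn't change anything, though it does need sysrq enabled:)

    # show uninterruptible (state D) processes and the kernel function they're stuck in
    ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /D/'
    # dump stacks of all blocked tasks to the kernel log, then read them
    echo w | sudo tee /proc/sysrq-trigger
    dmesg | tail -n 50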

I am of course now incredibly paranoid that something has gone TOTALLY pear-shaped and I'm going to lose everything. I don't want to reboot because I have no idea if that will break anything.

Edit: OK, so:

I rebooted. (I actually had to power off the machine; it would not die.)

I found this: http://www.linuxquestions.org/questions/linux-server-73/raid-5-mdadm-grow-interrupted-what-to-do-next-602671/

which I tried, with no luck. I get:

mdadm: superblock on /dev/sdb1 doesn't match others - assembly aborted
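(For the record, the usual suggestion for an interrupted reshape is to re-assemble with --force, plus --backup-file if the grow was started with one; the backup path below is only an example, and in my case this kind of assemble is exactly what aborted with the error above:)

    # try to re-assemble and let mdadm resume the interrupted reshape
    # (backup-file path is only an example; use whatever the grow was started with, if anything)
    sudo mdadm --assemble --force /dev/md0 \
        /dev/sda1 /dev/sdb1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 \
        --backup-file=/root/md0-grow.backup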

When I do mdadm -E on any of the drives, I get:

/dev/sdb1:
          Magic : a92b4efc
        Version : 00.91.00
           UUID : 3c455e64:c4d0d230:c109596b:d7e29b7e
  Creation Time : Mon Nov 23 18:59:31 2009
     Raid Level : raid6
  Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
     Array Size : 7325679680 (6986.31 GiB 7501.50 GB)
   Raid Devices : 7
  Total Devices : 7
Preferred Minor : 0

  Reshape pos'n : 967965760 (923.12 GiB 991.20 GB)
  Delta Devices : 1 (6->7)

    Update Time : Tue Dec  1 20:48:48 2009
          State : active
 Active Devices : 6
Working Devices : 6
 Failed Devices : 1
  Spare Devices : 0
       Checksum : a4096474 - correct
         Events : 40943

     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     5       8       17        5      active sync   /dev/sdb1

   0     0       8       49        0      active sync   /dev/sdd1
   1     1       8       65        1      active sync   /dev/sde1
   2     2       8       81        2      active sync   /dev/sdf1
   3     3       8       97        3      active sync   /dev/sdg1
   4     4       8        1        4      active sync   /dev/sda1
   5     5       8       17        5      active sync   /dev/sdb1
   6     6       0        0        6      faulty removed

That "faulty removed" is /dev/sdc1, which was the drive I just added and was trying to grow the array by.

Then I tried what I found here: http://ubuntuforums.org/showpost.php?p=2526794&postcount=2

sudo mdadm --create /dev/md0 --assume-clean --level=6 --raid-devices=7 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sda1 /dev/sdb1 missing

this returns:

mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid6 devices=7 ctime=Mon Nov 23 18:59:31 2009
mdadm: /dev/sde1 appears to be part of a raid array:
    level=raid6 devices=7 ctime=Mon Nov 23 18:59:31 2009
mdadm: /dev/sdf1 appears to be part of a raid array:
    level=raid6 devices=7 ctime=Mon Nov 23 18:59:31 2009
mdadm: /dev/sdg1 appears to be part of a raid array:
    level=raid6 devices=7 ctime=Mon Nov 23 18:59:31 2009
mdadm: /dev/sda1 appears to contain an ext2fs file system
    size=1565523968K  mtime=Mon Nov 30 23:59:03 2009
mdadm: /dev/sda1 appears to be part of a raid array:
    level=raid6 devices=7 ctime=Mon Nov 23 18:59:31 2009
mdadm: /dev/sdb1 appears to be part of a raid array:
    level=raid6 devices=7 ctime=Mon Nov 23 18:59:31 2009
Continue creating array? y
mdadm: array /dev/md0 started.

cat /proc/mdstat

gives:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid6 sdb1[5] sda1[4] sdg1[3] sdf1[2] sde1[1] sdd1[0]
      7325679680 blocks level 6, 64k chunk, algorithm 2 [7/6] [UUUUUU_]

and

sudo lvdisplay raid

gives:

  --- Logical volume ---
  LV Name                /dev/raid/raid
  VG Name                raid
  LV UUID                R1fKIQ-dS9P-iOuN-BVGW-98Wo-Cc2w-rJ5wfp
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                5.46 TB
  Current LE             5589
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:0

Hooray! Everything's wonderful! NOT.
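(A sanity check worth doing before mounting anything, all read-only: make sure LVM really sees the re-created array underneath, and that the md geometry matches what the old superblocks reported:)

    # read-only sanity checks before touching the filesystem
    sudo pvscan                  # is the PV on top of /dev/md0 visible?
    sudo vgdisplay raid          # does the volume group look sane?
    sudo mdadm --detail /dev/md0 | grep -E 'Raid Level|Array Size|Raid Devices|Chunk Size'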

sudo mount /dev/raid/raid /blah

gives me:

mount: wrong fs type, bad option, bad superblock on /dev/mapper/raid-raid,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

Augh, if I had hair I'd be pulling it out. Help. I'd just as soon not re-download 6TB of stuff 😛
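(The only things I'm willing to run at this point are read-only checks like the ones below, which at least say whether a filesystem superblock is even findable; none of them write anything, and the last two assume the filesystem is ext3/ext4:)

    # see what the kernel said when the mount failed
    dmesg | tail -n 20
    # look for a recognizable filesystem signature (read-only)
    sudo file -s /dev/raid/raid
    # dump the ext superblock and run a no-changes fsck pass
    sudo dumpe2fs -h /dev/raid/raid | head -n 20
    sudo e2fsck -n /dev/raid/raid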

Best Answer

Step one: don't do anything. Step two: email the linux-raid mailing list.

If the reshape has gone really pear-shaped, NeilB (the software RAID maintainer) will try to help as much as he can.
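When you mail the list, include the raw state of everything up front. Something along these lines collects the usual details without modifying the array (the output file name is just a suggestion):

    # gather read-only state to attach to the mailing list post
    {
      cat /proc/mdstat
      sudo mdadm --detail /dev/md0
      for d in /dev/sd[a-g]1; do sudo mdadm -E "$d"; done
    } > raid-report.txt 2>&1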