I have had two disk failures on my RAID6 array. I added two new disks and did the following:
- I ran mdadm /dev/md1 --remove on the two failed disks
- I set up my RAID on the 1st partition of each disk (for alignment reasons). As the replacement disks are aligned the same way, I ran dd if=/dev/sdg (working disk) of=/dev/sde (new disk) bs=512 count=1 to copy over the partition layout. I am not sure this was the right thing to do, as I may have copied mdadm superblock data as well (a safer alternative is sketched after this list).
- I then ran mdadm /dev/md1 --add for each of the two disks.
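For reference, a safer way to clone just the partition layout between identically-sized disks is sfdisk, which copies the table without any boot code or data sectors; a minimal sketch using the device names above:

# Dump the partition table of the working disk and replay it onto
# the new disk; only the layout is copied, no sector contents:
sfdisk -d /dev/sdg | sfdisk /dev/sde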
I now have this when I run mdadm --detail /dev/md1:
Number   Major   Minor   RaidDevice   State
   0        8        1            0   active sync        /dev/sda1
   1        8       17            1   active sync        /dev/sdb1
   6        8       65            2   spare rebuilding   /dev/sde1
   3        0        0            3   removed
   4        8       97            4   active sync        /dev/sdg1
   5        8      113            5   active sync        /dev/sdh1
   7        8       81            -   spare              /dev/sdf1
/proc/mdstat shows one disk as rebuilding, but not the other. I don't think this is right: one slot is still 'removed' and doesn't seem to have been replaced properly. The new drives got exactly the same drive letters as the two that failed. Here is mdstat:
root@precise:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdc1[0] sdd1[1]
1953379136 blocks super 1.2 [2/2] [UU]
md1 : active raid6 sdf1[7](S) sde1[6] sdb1[1] sdh1[5] sda1[0] sdg1[4]
11720521728 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/4] [UU__UU]
[>....................] recovery = 2.2% (65163484/2930130432) finish=361.0min speed=132257K/sec
unused devices: <none>
I'd like to know whether this looks right, and what I need to do to fix the Number 3 entry so that /dev/sdf1 takes its place. I assume I will then have a proper array again. What I find odd is that adding /dev/sde1 seems to have started a sync, but /dev/sdf1 has not taken the place of Number 3 (RaidDevice 3).
All help appreciated
Cheers
Best Answer
First, let me reassure you: if your mdadm members are partition-based (e.g. sda1, etc.), the first dd was OK and did not copy any mdadm metadata (the metadata live inside the partition itself, not inside the MBR).
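If you want to double-check that, mdadm --examine can show where a superblock actually lives; a minimal sketch, reusing the device names from the question:

# The 1.2 superblock sits 4 KiB inside the member partition, so the
# whole-disk device should show no md superblock while the partition
# does (device names taken from the question):
mdadm --examine /dev/sde    # should report no md superblock
mdadm --examine /dev/sde1   # should report a valid 1.2 superblock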
What you are observing is normal MD RAID behavior. You re-added the new drives using two separate mdadm -a commands, right? In that case, mdadm first resyncs the first drive (leaving the second one in "spare" mode) and only then transitions the second drive to "spare rebuilding" status. If you re-add the two drives with a single command (e.g. mdadm /dev/mdX -a /dev/sdX1 /dev/sdY1), the rebuilds run concurrently.
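Applied to the array in the question, that single-command form would look like this (device names taken from the question; shown only for illustration, since a rebuild is already running):

# Add both replacement partitions at once so their rebuilds run
# concurrently (illustrative; do not re-run on a rebuilding array):
mdadm /dev/md1 --add /dev/sde1 /dev/sdf1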
Let's have a look at my (test) failed RAID6 array.
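A throwaway array like this can be built entirely from loop devices; a minimal sketch with illustrative file names, not the exact commands I used:

# Create eight small backing files and attach them to loop devices:
for i in $(seq 0 7); do
    truncate -s 128M /tmp/md200-$i.img
    losetup /dev/loop$i /tmp/md200-$i.img
done
# Assemble six of them into a RAID6 array, then fail and remove two
# members to reproduce a double failure:
mdadm --create /dev/md200 --level=6 --raid-devices=6 /dev/loop{0..5}
mdadm /dev/md200 --fail /dev/loop4 /dev/loop5
mdadm /dev/md200 --remove /dev/loop4 /dev/loop5
# /dev/loop6 and /dev/loop7 then act as the replacement drives.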
Re-adding the drives using two separate commands (mdadm /dev/md200 -a /dev/loop6; sleep 1; mdadm /dev/md200 -a /dev/loop7) produced the following detail report:
After some time:
Adding the two drives with a single command (mdadm /dev/md200 -a /dev/loop6 /dev/loop7) leads to this report:
So, in the end: let mdadm do its magic, then check that all drives are marked as "active sync".
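To follow the recovery and confirm the final state, something like the following works (array name from the question):

# Watch the rebuild progress live:
watch -n 5 cat /proc/mdstat
# Once finished, every slot should read "active sync" and the summary
# line should show [6/6] [UUUUUU]:
mdadm --detail /dev/md1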