Linux RAID – Device Missing from mdadm RAID but Exists

linux mdadm raid

I have a RAID 5 array over 4 devices: sda10, sdh10, sdi10, sdf10. All of these devices are available and SMART shows no errors (sdf did have errors, but I replaced it with a new disk and resynced; after that the current situation appeared).

In /proc/mdstat the array looks like this:

md19 : inactive sdh10[2](S) sdi10[4](S) sdf10[5](S)
      1171482624 blocks super 1.2

As you can see, sda10 is missing. But according to --query --examine, it should be fine:

=# mdadm --query --examine /dev/sda10
/dev/sda10:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 9216999b:8dab944b:564530eb:4a61e078
           Name : blob:19  (local to host blob)
  Creation Time : Sat Jan 21 21:05:44 2017
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 780988416 (372.40 GiB 399.87 GB)
     Array Size : 1171482624 (1117.21 GiB 1199.60 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : 8fdf8a46:4a84989c:e20fb280:c38053ea

Internal Bitmap : 8 sectors from superblock
    Update Time : Fri Mar 15 01:09:45 2019
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : b72fe223 - correct
         Events : 1848

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

On another device in the array, the state looks different:

=# mdadm --query --examine /dev/sdh10
/dev/sdh10:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 9216999b:8dab944b:564530eb:4a61e078
           Name : blob:19  (local to host blob)
  Creation Time : Sat Jan 21 21:05:44 2017
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 780988416 (372.40 GiB 399.87 GB)
     Array Size : 1171482624 (1117.21 GiB 1199.60 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : a6b44214:e50eb531:dc76d0f6:867db6ec

Internal Bitmap : 8 sectors from superblock
    Update Time : Fri Mar 15 01:14:45 2019
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 9284b6cb - correct
         Events : 5956

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : ..AA ('A' == active, '.' == missing, 'R' == replacing)

I tried stopping the array and assembling it with --scan, or with the device names given manually, but to no avail.
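
Roughly, what I ran was something like this (device names as above; --verbose only makes mdadm explain why members are rejected):

mdadm --stop /dev/md19
mdadm --assemble --scan --verbose
# or, naming the members explicitly:
mdadm --assemble --verbose /dev/md19 /dev/sda10 /dev/sdh10 /dev/sdi10 /dev/sdf10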

I am 100% sure that the data on sda10, sdh10 and sdi10 is OK. sdf10 is a new disk, so it can be resynced.

Is there any way I could repair this raid?

Best Answer

The strange thing is that your --examine output shows the array consists of 4 devices, but /proc/mdstat shows member devices 2, 4 and 5, so at least 6 device slots have been used (indexing starts at 0). Something went wrong. It's best to stop the current array and re-create it.

You should be able to stop and re-create the MD device with:

mdadm --stop /dev/md19
mdadm --create --metadata=1.2 --level=5 -n 4 --chunk=512K --layout=left-symmetric /dev/md19 /dev/sda10 missing /dev/sdh10 /dev/sdi10

Use --examine on the other devices to check the "Active device" number. It's possible that the system's earlier incorrect assembly overwrote this, so it helps if you can confirm the correct order of the devices.
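
A quick way to compare the members is something like this (adjust the device list to whatever your members are called):

for d in /dev/sda10 /dev/sdh10 /dev/sdi10 /dev/sdf10; do
    echo "== $d"
    mdadm --examine "$d" | grep -E 'Device Role|Events|Update Time'
done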

The above assumes that /dev/sdh10 is "Active device 2" and /dev/sdi10 is "Active device 3", and that the old device 1 has failed. Don't specify /dev/sdf10 in place of missing; that way no resync occurs at first.

Try running fsck on the resulting md device. If it reports a lot of errors, abort the fsck, stop the md device, and try another order.
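
A read-only check first is the safer option, for example (assuming an ext2/3/4 filesystem; -n answers "no" to every repair prompt, so nothing is written):

fsck -n /dev/md19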

Once you have a correctly working array, you can then add the replacement drive:

mdadm --add /dev/md19 /dev/sdf10
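
You can then watch the resync progress, for example with:

cat /proc/mdstat
mdadm --detail /dev/md19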

All of the above is from personal experience, from a case where the wrong drive was removed (not the faulty one, but one of the remaining working drives). Using an explicit re-create like the one above, I recovered the array and the filesystem. It's entirely possible that it will not work for you, for example because the array is already damaged. To sum it up: proceed at your own risk.