Linux RAID – Device Missing from mdadm RAID but Exists

linux mdadm raid

I have a RAID 5 array over 4 devices: sda10, sdh10, sdi10, sdf10. All of these devices are available and SMART shows no errors (sdf did have errors, but I replaced it with a new disk and resynced; after that the current situation appeared).

In /proc/mdstat the array looks like this:

md19 : inactive sdh10[2](S) sdi10[4](S) sdf10[5](S)
      1171482624 blocks super 1.2

As you can see, sda10 is missing. But according to --query --examine, it should be fine:

=# mdadm --query --examine /dev/sda10
/dev/sda10:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 9216999b:8dab944b:564530eb:4a61e078
           Name : blob:19  (local to host blob)
  Creation Time : Sat Jan 21 21:05:44 2017
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 780988416 (372.40 GiB 399.87 GB)
     Array Size : 1171482624 (1117.21 GiB 1199.60 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : 8fdf8a46:4a84989c:e20fb280:c38053ea

Internal Bitmap : 8 sectors from superblock
    Update Time : Fri Mar 15 01:09:45 2019
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : b72fe223 - correct
         Events : 1848

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

On another device in the array, the state looks different:

=# mdadm --query --examine /dev/sdh10
/dev/sdh10:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 9216999b:8dab944b:564530eb:4a61e078
           Name : blob:19  (local to host blob)
  Creation Time : Sat Jan 21 21:05:44 2017
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 780988416 (372.40 GiB 399.87 GB)
     Array Size : 1171482624 (1117.21 GiB 1199.60 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : a6b44214:e50eb531:dc76d0f6:867db6ec

Internal Bitmap : 8 sectors from superblock
    Update Time : Fri Mar 15 01:14:45 2019
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 9284b6cb - correct
         Events : 5956

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : ..AA ('A' == active, '.' == missing, 'R' == replacing)

I tried stopping the array and assembling it with --scan, or with the device names given manually, but to no avail.
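
Roughly, what I ran was something like this (device names as above; --verbose only makes mdadm explain why members are rejected):

mdadm --stop /dev/md19
mdadm --assemble --scan --verbose
# or, naming the members explicitly:
mdadm --assemble --verbose /dev/md19 /dev/sda10 /dev/sdh10 /dev/sdi10 /dev/sdf10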

I am 100% sure that the data on sda10, sdh10 and sdi10 is OK. sdf10 is a new disk, so it can be resynced.

Is there any way I could repair this raid?

Best Answer

The strange thing is that your --examine output shows the array consists of 4 devices, but /proc/mdstat shows member devices 2, 4 and 5, so at least 6 device slots have been used (indexing starts at 0). Something went wrong. It's best to stop the current array and re-create it.

You should be able to stop and re-create the MD device with:

mdadm --stop /dev/md19
mdadm --create --metadata=1.2 --level=5 -n 4 --chunk=512K --layout=left-symmetric /dev/md19 /dev/sda10 missing /dev/sdh10 /dev/sdi10

Use --examine on the other devices to check the "Active device" number. It's possible that the system's earlier incorrect assembly overwrote this, so it helps if you can confirm the correct order of the devices.
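
A quick way to compare the members is something like this (adjust the device list to whatever your members are called):

for d in /dev/sda10 /dev/sdh10 /dev/sdi10 /dev/sdf10; do
    echo "== $d"
    mdadm --examine "$d" | grep -E 'Device Role|Events|Update Time'
done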

The above assumes that /dev/sdh10 is "Active device 2" and /dev/sdi10 is "Active device 3", and that the old device 1 has failed. Don't specify /dev/sdf10 in place of missing; that way no resync occurs at first.

Try running fsck on the resulting md device. If it reports a lot of errors, abort the fsck, stop the md device, and try another order.
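
A read-only check first is the safer option, for example (assuming an ext2/3/4 filesystem; -n answers "no" to every repair prompt, so nothing is written):

fsck -n /dev/md19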

Once you have a correctly working array, you can then add the replacement drive:

mdadm --add /dev/md19 /dev/sdf10
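
You can then watch the resync progress, for example with:

cat /proc/mdstat
mdadm --detail /dev/md19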

All of the above is from personal experience, from a case where the wrong drive was removed (not the faulty one, but one of the remaining working drives). Using an explicit re-create like the one above, I recovered the array and the filesystem. It's entirely possible that it will not work for you, for example because the array is already damaged. To sum it up: proceed at your own risk.