I have a RAID 5 array over 4 devices: sda10, sdh10, sdi10, sdf10. All of these devices are available, and SMART doesn't report any errors (sdf did, but I replaced it with a new disk and resynced; the current situation appeared after that).
In /proc/mdstat the array looks like this:
md19 : inactive sdh10[2](S) sdi10[4](S) sdf10[5](S)
1171482624 blocks super 1.2
As you can see, sda10 is missing. But according to --query --examine, it should be OK:
=# mdadm --query --examine /dev/sda10
/dev/sda10:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 9216999b:8dab944b:564530eb:4a61e078
Name : blob:19 (local to host blob)
Creation Time : Sat Jan 21 21:05:44 2017
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 780988416 (372.40 GiB 399.87 GB)
Array Size : 1171482624 (1117.21 GiB 1199.60 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=0 sectors
State : clean
Device UUID : 8fdf8a46:4a84989c:e20fb280:c38053ea
Internal Bitmap : 8 sectors from superblock
Update Time : Fri Mar 15 01:09:45 2019
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : b72fe223 - correct
Events : 1848
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
On another device in the array, the state looks different:
=# mdadm --query --examine /dev/sdh10
/dev/sdh10:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 9216999b:8dab944b:564530eb:4a61e078
Name : blob:19 (local to host blob)
Creation Time : Sat Jan 21 21:05:44 2017
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 780988416 (372.40 GiB 399.87 GB)
Array Size : 1171482624 (1117.21 GiB 1199.60 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=0 sectors
State : clean
Device UUID : a6b44214:e50eb531:dc76d0f6:867db6ec
Internal Bitmap : 8 sectors from superblock
Update Time : Fri Mar 15 01:14:45 2019
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 9284b6cb - correct
Events : 5956
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : ..AA ('A' == active, '.' == missing, 'R' == replacing)
I tried stopping the array and assembling it with --scan, or by manually providing the device names, but to no avail.
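Roughly along these lines:

mdadm --stop /dev/md19
mdadm --assemble --scan
mdadm --assemble /dev/md19 /dev/sda10 /dev/sdh10 /dev/sdi10 /dev/sdf10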
I am 100% sure that the data on sda10, sdh10 and sdi10 is OK. sdf10 is a new disk, so it can be resynced.
Is there any way I can repair this RAID?
Best Answer
The strange thing is that your --examine output shows the array consists of 4 devices, while /proc/mdstat shows member device numbers 2, 4 and 5, which means at least 6 device numbers have been handed out over the array's lifetime (indexing starts at 0). So something went wrong. It's best to stop the current array and recreate it.
You should be able to stop and re-create the MD device with:
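A sketch of the commands, with the parameters taken from the --examine dumps above (metadata 1.2, RAID 5, 4 devices, 512K chunk, left-symmetric layout, 262144-sector data offset); double-check every value against your own output before running, because --create with wrong parameters overwrites the superblocks:

mdadm --stop /dev/md19
mdadm --create /dev/md19 --assume-clean --verbose \
      --level=5 --raid-devices=4 --metadata=1.2 \
      --chunk=512K --layout=left-symmetric --data-offset=128M \
      /dev/sda10 missing /dev/sdh10 /dev/sdi10

Here --assume-clean skips the initial sync, "missing" keeps slot 1 empty, and --data-offset=128M is intended to match the 262144 sectors reported by --examine; run --examine again after creating and confirm the Data Offset really comes out the same before going any further.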
Use --examine on the other devices to check their "active device" numbers. It's possible that the system's earlier incorrect assembly attempts overwrote them, so it would help if you could confirm the correct order of the devices.
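For instance, with the same device names as in the question:

mdadm --examine /dev/sdi10 | grep -E 'Device Role|Events|Array State'

If the roles no longer match what you expect, note them down and use that order in the --create command.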
The above assumes that /dev/sdh10 is "Active device 2", /dev/sdi10 is "Active device 3", and that the old device 1 has failed. Don't specify /dev/sdf10 in place of the "missing" placeholder; that way no resyncing occurs at first. Try running fsck on the resulting md device. If that gives lots of errors, abort the fsck, stop the md device, and try another device order.
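For example, assuming the array holds an ext4 filesystem, a read-only check won't modify anything while you are still testing device orders:

fsck.ext4 -n /dev/md19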
Once you have a correctly working array, you can then add the replacement drive:
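For example, with the device names used above (the rebuild onto the new disk starts immediately and can be watched in /proc/mdstat):

mdadm --add /dev/md19 /dev/sdf10
cat /proc/mdstat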
All of the above is from personal experience, from a case where the wrong drive was removed (not the faulty drive, but one of the remaining working drives). Using the explicit re-creation described above, I recovered the array and the filesystem. It's entirely possible that it will not work for you, perhaps because the array is already damaged. To sum up: proceed at your own risk.