Edit:
The scenario in this wiki, where one drive has a slightly lower and another a significantly lower event count than the rest of the array, suggests assembling with --force while leaving out the oldest drive, then adding it (or a new one, in case the disk is actually bad) back once the array has assembled in a degraded state.
Would it make sense to do this in my situation, or is it more advisable to attempt a --force assemble with all 4 drives, given that the two out-of-date ones have the same event count?
Given my limited RAID knowledge, I figured I'd ask about my specific situation before trying anything. Losing the data on these 4 drives wouldn't be the end of the world to me, but it'd still be nice to get it back.
I migrated a RAID5 array from an old machine to a new one without any problems at first. I used it for about 2 days until I noticed that 2 of the drives weren't listed in the BIOS boot screen. Since the array still assembled and worked fine after booting into Linux, I didn't think too much of it.
The next day the array stopped working, so I hooked up a PCIe SATA card and replaced all my SATA cables. After that, all 4 drives showed up in the BIOS boot screen, so I'm assuming either my cables or my SATA ports were causing the initial problem.
Now I'm left with a broken array, though. mdadm --assemble lists two drives as (possibly out of date), and mdadm --examine shows 22717 events for the out-of-date drives and 23199 for the other two. This wiki entry suggests that an event count difference of <50 can be overcome by assembling with --force, but my two pairs of drives are 482 events apart.
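To make the comparison with the wiki's rule of thumb explicit, here is a minimal Python sketch that takes the per-drive event counts from the mdadm --examine output below (copied by hand) and checks the gap against the threshold of 50 quoted from the wiki entry. EVENTS, FORCE_THRESHOLD, event_gap and stale_members are illustrative names, not anything mdadm itself exposes.

```python
# Event counts as reported by `mdadm --examine` for each member (copied from the output).
EVENTS = {
    "/dev/sdb": 22717,
    "/dev/sdc": 22717,
    "/dev/sdd": 23199,
    "/dev/sde": 23199,
}

# Rule-of-thumb threshold quoted from the wiki entry (an assumption, not an mdadm setting).
FORCE_THRESHOLD = 50


def event_gap(events: dict[str, int]) -> int:
    """Difference between the newest and the oldest event count."""
    return max(events.values()) - min(events.values())


def stale_members(events: dict[str, int]) -> list[str]:
    """Members whose event count is behind the newest one."""
    newest = max(events.values())
    return sorted(dev for dev, count in events.items() if count < newest)


if __name__ == "__main__":
    gap = event_gap(EVENTS)
    print(f"event gap: {gap}")
    print(f"stale members: {stale_members(EVENTS)}")
    print(f"within the <{FORCE_THRESHOLD} rule of thumb: {gap < FORCE_THRESHOLD}")
```

With the numbers above the gap is 482, both stale members share the same count, and the rule of thumb does not apply.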
Below is all the relevant RAID info. I was aware of all 4 drives having corrupt primary GPT tables before the array broke down, but since everything was working fine at the time, I hadn't gotten around to fixing that yet.
mdadm --assemble --scan --verbose
mdadm: /dev/sde is identified as a member of /dev/md/guyyst-server:0, slot 2.
mdadm: /dev/sdd is identified as a member of /dev/md/guyyst-server:0, slot 3.
mdadm: /dev/sdc is identified as a member of /dev/md/guyyst-server:0, slot 1.
mdadm: /dev/sdb is identified as a member of /dev/md/guyyst-server:0, slot 0.
mdadm: added /dev/sdb to /dev/md/guyyst-server:0 as 0 (possibly out of date)
mdadm: added /dev/sdc to /dev/md/guyyst-server:0 as 1 (possibly out of date)
mdadm: added /dev/sdd to /dev/md/guyyst-server:0 as 3
mdadm: added /dev/sde to /dev/md/guyyst-server:0 as 2
mdadm: /dev/md/guyyst-server:0 assembled from 2 drives - not enough to start the array.
mdadm --examine /dev/sd[bcde]
/dev/sdb:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 356cd1df:3a5c992d:c9899cbc:4c01e6d9
Name : guyyst-server:0
Creation Time : Wed Mar 27 23:49:58 2019
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 7813772976 (3725.90 GiB 4000.65 GB)
Array Size : 11720658432 (11177.69 GiB 12001.95 GB)
Used Dev Size : 7813772288 (3725.90 GiB 4000.65 GB)
Data Offset : 264192 sectors
Super Offset : 8 sectors
Unused Space : before=264112 sectors, after=688 sectors
State : clean
Device UUID : 7ea39918:2680d2f3:a6c3b0e6:0e815210
Internal Bitmap : 8 sectors from superblock
Update Time : Fri May 1 03:53:45 2020
Bad Block Log : 512 entries available at offset 24 sectors
Checksum : 76a81505 - correct
Events : 22717
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdc:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 356cd1df:3a5c992d:c9899cbc:4c01e6d9
Name : guyyst-server:0
Creation Time : Wed Mar 27 23:49:58 2019
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 7813772976 (3725.90 GiB 4000.65 GB)
Array Size : 11720658432 (11177.69 GiB 12001.95 GB)
Used Dev Size : 7813772288 (3725.90 GiB 4000.65 GB)
Data Offset : 264192 sectors
Super Offset : 8 sectors
Unused Space : before=264112 sectors, after=688 sectors
State : clean
Device UUID : 119ed456:cbb187fa:096d15e1:e544db2c
Internal Bitmap : 8 sectors from superblock
Update Time : Fri May 1 03:53:45 2020
Bad Block Log : 512 entries available at offset 24 sectors
Checksum : d285ae78 - correct
Events : 22717
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 356cd1df:3a5c992d:c9899cbc:4c01e6d9
Name : guyyst-server:0
Creation Time : Wed Mar 27 23:49:58 2019
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 7813772976 (3725.90 GiB 4000.65 GB)
Array Size : 11720658432 (11177.69 GiB 12001.95 GB)
Used Dev Size : 7813772288 (3725.90 GiB 4000.65 GB)
Data Offset : 264192 sectors
Super Offset : 8 sectors
Unused Space : before=264112 sectors, after=688 sectors
State : clean
Device UUID : 2670e048:4ebf581d:bf9ea089:0eae56c3
Internal Bitmap : 8 sectors from superblock
Update Time : Fri May 1 04:12:18 2020
Bad Block Log : 512 entries available at offset 24 sectors
Checksum : 26662f2e - correct
Events : 23199
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : A.AA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 356cd1df:3a5c992d:c9899cbc:4c01e6d9
Name : guyyst-server:0
Creation Time : Wed Mar 27 23:49:58 2019
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 7813772976 (3725.90 GiB 4000.65 GB)
Array Size : 11720658432 (11177.69 GiB 12001.95 GB)
Used Dev Size : 7813772288 (3725.90 GiB 4000.65 GB)
Data Offset : 264192 sectors
Super Offset : 8 sectors
Unused Space : before=264112 sectors, after=688 sectors
State : clean
Device UUID : 093856ae:bb19e552:102c9f77:86488154
Internal Bitmap : 8 sectors from superblock
Update Time : Fri May 1 04:12:18 2020
Bad Block Log : 512 entries available at offset 24 sectors
Checksum : 40917946 - correct
Events : 23199
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : A.AA ('A' == active, '.' == missing, 'R' == replacing)
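With four full superblock dumps, the fields that matter for choosing between the two recovery options (device role, event count, update time and each member's own view of the array state) are easy to lose in the noise. A minimal Python sketch, assuming the examine output has been saved as plain text in the layout shown above (a /dev/sdX: line followed by Key : value lines). parse_examine and summarize are hypothetical helper names, and SAMPLE is a trimmed excerpt of the output above.

```python
import re
import sys

# Trimmed excerpt of the `mdadm --examine /dev/sd[bcde]` output above (only the
# fields summarized here). Pass a file with the full output as the first argument.
SAMPLE = """\
/dev/sdb:
Update Time : Fri May 1 03:53:45 2020
Events : 22717
Device Role : Active device 0
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdc:
Update Time : Fri May 1 03:53:45 2020
Events : 22717
Device Role : Active device 1
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd:
Update Time : Fri May 1 04:12:18 2020
Events : 23199
Device Role : Active device 3
Array State : A.AA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde:
Update Time : Fri May 1 04:12:18 2020
Events : 23199
Device Role : Active device 2
Array State : A.AA ('A' == active, '.' == missing, 'R' == replacing)
"""

# "/dev/sdX:" starts a new device section.
DEVICE_RE = re.compile(r"^(/dev/\S+):\s*$")
# "Key : value" lines; keys may contain spaces, values may contain colons.
FIELD_RE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*:\s*(.*)$")


def parse_examine(text: str) -> dict[str, dict[str, str]]:
    """Split `mdadm --examine` output into {device: {field: value}}."""
    devices: dict[str, dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        header = DEVICE_RE.match(line)
        if header:
            current = devices.setdefault(header.group(1), {})
            continue
        field = FIELD_RE.match(line)
        if field and current is not None:
            current[field.group(1)] = field.group(2).strip()
    return devices


def summarize(devices: dict[str, dict[str, str]]) -> list[tuple[str, str, int, str, str]]:
    """Return (device, role, events, update time, array state) rows sorted by role."""
    rows = []
    for dev, fields in devices.items():
        rows.append((
            dev,
            fields.get("Device Role", "?"),
            int(fields.get("Events", "0")),
            fields.get("Update Time", "?"),
            fields.get("Array State", "?").split()[0],  # keep "AAAA", drop the legend
        ))
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            text = fh.read()
    else:
        text = SAMPLE
    for row in summarize(parse_examine(text)):
        print("{:<10} {:<16} {:>6}  {:<24}  {}".format(*row))
```

Run against the excerpt, this prints one row per member in role order, which puts the two stale members (events 22717, last seen array state AAAA) next to the two current ones (events 23199, array state A.AA, i.e. slot 1 missing).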
mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Raid Level : raid0
Total Devices : 4
Persistence : Superblock is persistent
State : inactive
Working Devices : 4
Name : guyyst-server:0
UUID : 356cd1df:3a5c992d:c9899cbc:4c01e6d9
Events : 23199
Number Major Minor RaidDevice
- 8 64 - /dev/sde
- 8 32 - /dev/sdc
- 8 48 - /dev/sdd
- 8 16 - /dev/sdb
fdisk -l
The primary GPT table is corrupt, but the backup appears OK, so that will be used.
Disk /dev/sdb: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFRX-68N
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 79F4A900-C9B7-03A9-402A-7DDE6D72EA00
Device Start End Sectors Size Type
/dev/sdb1 2048 7814035455 7814033408 3.7T Microsoft basic data
The primary GPT table is corrupt, but the backup appears OK, so that will be used.
Disk /dev/sdc: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFRX-68N
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 43B95B20-C9B1-03A9-C856-EE506C72EA00
Device Start End Sectors Size Type
/dev/sdc1 2048 7814035455 7814033408 3.7T Microsoft basic data
The primary GPT table is corrupt, but the backup appears OK, so that will be used.
Disk /dev/sdd: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFRX-68N
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 1E276A80-99EA-03A7-A0DA-89877AE6E900
The primary GPT table is corrupt, but the backup appears OK, so that will be used.
Disk /dev/sde: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFRX-68N
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 11BD8020-C9B5-03A9-0860-6F446D72EA00
Device Start End Sectors Size Type
/dev/sde1 2048 7814035455 7814033408 3.7T Microsoft basic data
smartctl -a -d ata /dev/sd[bcde]
Posted to pastebin since it exceeded the character limit: https://pastebin.com/vMVCX9EH
Best Answer
Generally speaking, you must expect data loss in this situation. Two of your four disks were ejected from the RAID at roughly the same point in time. When the array is assembled again, you will have a corrupt file system.
If possible, I would only experiment further after dd-ing all disks as a backup, so that you can start over.
Using all 4 disks will allow you to identify which blocks differ (in those stripes the parity will not match the data), but it will not help you compute a correct state. You could run checkarray after a forced re-assembly of all 4 and afterwards read the number of inconsistent sectors from /sys/block/mdX/md/mismatch_cnt. This may or may not be useful for estimating the "degree of brokenness" of the file system.
Re-building the array can only use information from three disks to re-calculate parity. As the ejected disks have the same event count, using either of them should result in the same (partially wrong) parity information being re-computed.
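The parity argument above can be made concrete with a toy model. This is a simplified sketch, not md's implementation: one stripe of three 4-byte data chunks plus one parity chunk on a fixed member, ignoring the rotating left-symmetric layout and the real 512K chunk size; xor_chunks and stripe_consistent are hypothetical names. RAID5 parity is the bytewise XOR of the data chunks, so a consistent stripe XORs to zero. The scenario assumes a write reached the parity member but not the stale member 0.

```python
from functools import reduce


def xor_chunks(*chunks: bytes) -> bytes:
    """Bytewise XOR of equal-length chunks; this is how RAID5 computes parity."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def stripe_consistent(chunks: list[bytes]) -> bool:
    """A stripe is consistent when the XOR of all its data and parity chunks is zero."""
    return not any(xor_chunks(*chunks))


# Toy stripe: three data chunks on members 0-2, parity on member 3.
OLD_DATA = [b"AAAA", b"BBBB", b"CCCC"]
GOOD_STRIPE = OLD_DATA + [xor_chunks(*OLD_DATA)]

# A later write changes the chunk on member 0. Member 0 is the stale disk and
# never receives it, but the parity member does get the updated parity.
NEW_D0 = b"aaaa"
ON_DISK = [OLD_DATA[0], OLD_DATA[1], OLD_DATA[2],
           xor_chunks(NEW_D0, OLD_DATA[1], OLD_DATA[2])]

# Force-assemble members 0, 2 and 3, then rebuild member 1 from those three.
REBUILT_D1 = xor_chunks(ON_DISK[0], ON_DISK[2], ON_DISK[3])
AFTER_REBUILD = [ON_DISK[0], REBUILT_D1, ON_DISK[2], ON_DISK[3]]

if __name__ == "__main__":
    print("original stripe consistent:", stripe_consistent(GOOD_STRIPE))      # True
    print("stripe with stale member consistent:", stripe_consistent(ON_DISK))  # False
    print("rebuilt member 1:", REBUILT_D1, "expected:", OLD_DATA[1])           # b'bbbb' vs b'BBBB'
    print("consistent after rebuild:", stripe_consistent(AFTER_REBUILD))       # True
```

The last line is the core of the answer: a check can flag the inconsistent stripe but cannot say which member is wrong, and a rebuild only makes the stripe self-consistent with whatever the three present members hold, so the missed write is not recovered.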