So, according to the superblock on /dev/sdk, there was a /dev/md5 and sdj was in there with it, but according to /dev/sdj, there is no raid superblock. What I fear is that /dev/sdj was added to the md5 array, then /dev/sdj was added to the volume group (instead of /dev/md5), and at some point lvm got around to overwriting the blocks that identified it as a member of the RAID device. I fear this because I honestly can't think of any other way /dev/sdj would end up being named specifically in the LVM group and not have a raid superblock anymore.
Worst case nightmare scenario: both /dev/sdj and /dev/md5 were added to the LVM. Is your XFS partition bigger than the 5.5 TB in the LVM now? If this is the case, you should be able to get md5 back using mdadm --assemble, but you need to be sure it's started in degraded mode without sdj, so it won't overwrite the data there.
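If it comes to that, the command would be along these lines (a sketch only: the member list is an assumption based on the superblock you found on sdk, and --run tells mdadm to start the array even though a member is missing):

mdadm --assemble --run /dev/md5 /dev/sdk

Check cat /proc/mdstat afterwards to make sure it really came up degraded and didn't pull sdj in.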
Assuming that your /dev/md5 was never used in the LVM:
(...had you ever looked at pvscan before today?)
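Before touching anything, it's worth capturing the current state with read-only commands, partly to double-check that assumption about md5 (these are generic LVM/md status commands, nothing destructive):

cat /proc/mdstat
pvs
vgs
lvs
mdadm --examine /dev/sdk
mdadm --examine /dev/sdj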
If you don't have backups, now is the time to start. If you do, now is the time to test them (and if they don't work, you don't have backups, see step 1).
There isn't an easy way out of this mess, and I haven't got a clue what might happen if you reboot at this point (can you unmount the filesystem?). If I were certain that what really happened was that sdj had been added as both a raid drive and as an lvm physical volume, then what I'd do is the sequence of steps below. Since the lvm wasn't using the raid driver to write to sdj, none of the data written to sdj would be on sdk, so perhaps that theory can be verified by comparing hex dumps of various chunks of /dev/sdj and /dev/sdk; someone smarter than me will know good places to look for things that say "this is XFS" versus "this is random gibberish or a blank drive".
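As a very crude spot check (the offsets here are arbitrary picks, not known-good locations: XFS puts its superblock magic "XFSB" at the start of each allocation group, so with some luck a chunk taken from sdj shows recognisable filesystem structure while the same chunk of sdk does not):

dd if=/dev/sdj bs=1M skip=4096 count=1 2>/dev/null | hexdump -C | less
dd if=/dev/sdk bs=1M skip=4096 count=1 2>/dev/null | hexdump -C | less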
Start by trying to get SMART data on sdk to see if it is trustworthy or on the way out.
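With smartmontools installed, that looks something like this; the second command kicks off a long self-test whose result shows up in the first the next time you run it:

smartctl -a /dev/sdk
smartctl -t long /dev/sdk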
If sdk is good, then I would thank my lucky stars for the former admin having wasted 63GB of /dev/sdj.
fdisk /dev/sdk
(doublecheck EVERYTHING before hitting return). Have fdisk create a partition table and an md partition (mdadm manpage says use 0xDA, but every walkthrough and my own experience says 0xFD for raid autodetect), then
mdadm --create /dev/md6 --level=1 --raid-devices=2 missing /dev/sdk1
(doublecheck EVERYTHING before hitting return). This will create a degraded raid1 array named md6 using the partition we made on sdk. These next steps are why that wasted space is important: we've lost some space due to the md superblock and due to the partition table, so our /dev/md6 is slightly smaller than /dev/sdj was. We're going to add /dev/md6 to the dedvol volume group and instruct LVM to move the 1.82TB of logical volume from /dev/sdj to /dev/md6. LVM can handle the filesystem being active while it does this.
pvcreate /dev/md6
vgextend dedvol /dev/md6
pvmove -v /dev/sdj
(doublecheck... you get the picture. I'd also run pvscan after pvcreate and again after vgextend to make sure things look right; a quick sanity-check sequence is sketched below). This will begin the process of moving all the data allocated to /dev/sdj over to /dev/md6 (specifically, the command moves everything off sdj, and md6 is the only place for it to go). Several hours later either this will complete or the system will lock up trying to read from sdj. If the system crashes, you can reboot and run pvmove without a device name to restart at the last checkpoint, or just give up and reinstall from backups.
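Those sanity checks are all read-only and might look like this after each step (pvscan should show /dev/md6 joining dedvol after the vgextend, and vgdisplay should show the extra free extents):

cat /proc/mdstat
pvscan
vgdisplay dedvol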
If we succeed, we remove /dev/sdj from the volume group, then remove it as a physical volume:
vgreduce dedvol /dev/sdj
pvremove /dev/sdj
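Before the vgreduce, it doesn't hurt to confirm that nothing is still allocated on sdj; once pvmove has finished, the Allocated PE count in this output should be 0:

pvdisplay /dev/sdj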
Now, for the corruption-checking part. The tool for checking and fixing xfs is xfs_repair (fsck will run on an xfs filesystem but it does nothing at all). The bad news? It uses gigs of RAM per terabyte of filesystem, so hopefully you have a 64 bit server with a 64 bit kernel and the 64 bit xfs_repair binary (which might be named xfs_repair64) and at least 10GB of RAM+Swap (you should be able to use some of that leftover empty space in dedvol to create a swap volume, then mkswap that volume, then swapon that volume). The filesystem must be unmounted before running xfs_repair on it. Also, xfs_repair can detect and (attempt to) fix damage to the filesystem itself, but it may not detect damage to the data (for instance, something overwriting part of a directory inode versus something overwritten in the middle of a text file).
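A sketch of that swap-and-repair sequence, with made-up names and sizes (the swap LV name and its 16G size are just examples, the mountpoint is a placeholder, and the path to your data LV is whatever lvdisplay reports, not something I can guess):

lvcreate -L 16G -n repairswap dedvol
mkswap /dev/dedvol/repairswap
swapon /dev/dedvol/repairswap
umount /your/xfs/mountpoint
xfs_repair -n /dev/dedvol/yourdatalv
xfs_repair /dev/dedvol/yourdatalv

The -n pass is read-only, so you can see what xfs_repair thinks is broken before you let it write anything.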
Finally, we need to buy a new /dev/sdj, install it, and add it to that degraded /dev/md6, keeping in mind that if we reboot the computer without sdj in it, it is possible sdk will move down to sdj and the new drive will be sdk instead (probably not, but best to be sure):
fdisk /dev/sdj
Check to make sure that it isn't the drive we partitioned and set up already (one way to check is sketched after these commands), then create a partition for md on it:
mdadm /dev/md6 -a /dev/sdj1
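One way to do that identity check before running fdisk (the by-id names are stable per physical drive, so the disk we already partitioned is easy to tell apart, and the brand-new one should show no partition table at all):

ls -l /dev/disk/by-id/
fdisk -l /dev/sdj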
(It is entirely possible that the errors could be due to raid and lvm duking it out over the content of sdj, rather than the drive actually failing (usually failing drives generate a lot of gibberish from the driver in dmesg rather than just Input/Output errors), but I'm not sure I'd risk it.)
Best Answer
Please change the partition ID... You should not have created an "extended" partition, but rather left it at the default Linux (83) ID.
Your new device/partition should look similar to this: