LVM – Recovering a Nested PV

data-recovery lvm proxmox

On my Proxmox 6.4 host, I had a 250GB LVM thin pool. I created an Ubuntu VM on it (the VM also uses LVM for its root partition), but accidentally oversubscribed the pool, so the PV inside the VM was set to 500GB.

Everything ran great for a while, until I went over the hidden 250GB limit. The VM crashed with an I/O error and now refuses to boot, so I'm trying to recover the disk. The partition table of the disk appears to be intact:

$ fdisk -l /dev/vm-disks/vm-101-disk-0
Disk /dev/vm-disks/vm-101-disk-0: 500 GiB, 536870912000 bytes, 1048576000 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes
Disklabel type: gpt
Disk identifier: 30874BBC-0B29-4083-B5BF-E973C665D87F

Device                          Start        End    Sectors  Size Type
/dev/vm-disks/vm-101-disk-0p1    2048       4095       2048    1M BIOS boot
/dev/vm-disks/vm-101-disk-0p2    4096    2101247    2097152    1G Linux filesystem
/dev/vm-disks/vm-101-disk-0p3 2101248 1048573951 1046472704  499G Linux filesystem

I've run

$ kpartx -a /dev/vm-disks/vm-101-disk-0

to create /dev/mapper entries for the 3 partitions inside vm-101-disk-0, and that works.
If I run:

$ file -sL /dev/mapper/vm--disks-vm--101--disk--0p3
/dev/mapper/vm--disks-vm--101--disk--0p3: LVM2 PV (Linux Logical Volume Manager), UUID: fdOzWR-sPcy-hyYo-Lj2H-YEnZ-wK3c-J6biES, size: 535794024448

Then I can see that PV inside the 3rd partition of the disk. But how can I mount this somewhere in the host to start recovering data? Obviously pvscan from the host system doesn't see it since it's inside another LV. Do I have any options at all here for recovery, or did the fact that the VM thought it had 500GB when it actually didn't mean that I've damaged this beyond repair?

Best Answer

The VM simply got write I/O errors when the space in the thin pool was exhausted. To the VM, this looks like a hard disk that unexpectedly started rejecting all writes. If the VM were bare hardware, the first step would be to find a new hard disk and clone the bad one onto it. Only after the hardware is fixed would you repair the logical structures.

With a virtual machine there is no broken hardware, so you can "fix" the "hard disk" by restoring normal operation of the thin volume. Just enlarge the thin pool: use lvextend on the thin pool LV to add some space.
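
As a rough sketch of that step: the VG name vm-disks comes from the device path in the question, but the pool name "data" and the +300G amount are placeholders I made up, so check the real names with lvs first. The script only prints the commands unless you set RUN=1. lvextend can only succeed if the VG still has that much free space; otherwise a new PV would have to be added to the VG first.

```shell
#!/bin/sh
# Sketch: grow the data area of a thin pool on the Proxmox host (as root).
VG="vm-disks"     # VG name taken from /dev/vm-disks/... in the question
POOL="data"       # hypothetical thin pool name; verify with `lvs -a vm-disks`
GROW="+300G"      # hypothetical amount; the VG needs this much free space

# Print the command by default; execute it only when RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

extend_thin_pool() {
    run vgs "$VG"                          # free space left in the VG
    run lvs -a "$VG"                       # pool name and fill level (Data%, Meta%)
    run lvextend -L "$GROW" "$VG/$POOL"    # grow the pool's data area
}

extend_thin_pool
```

Review the printed commands, then run the script again with RUN=1 once the names and size match your setup.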

Once that is done, boot the VM from some recovery (virtual) media and do a standard file system recovery. This should not be very difficult; modern filesystems are generally designed to withstand this kind of failure.
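
Inside the rescue environment, the recovery could look roughly like the sketch below. It assumes the guest root filesystem is ext4 on an LV. The helper names activate_guest_vgs and check_guest_fs are made up for illustration. The device path in the usage note (/dev/ubuntu-vg/ubuntu-lv) is only a guess, so take the real one from the lvs output.

```shell
#!/bin/sh
# Sketch: run as root inside the rescue system booted in the VM.

# Scan for the guest's volume groups and activate them so the LV
# device nodes appear under /dev/<vg>/<lv>.
activate_guest_vgs() {
    vgscan && vgchange -ay && lvs
}

# Read-only check of one filesystem: -f forces a full check,
# -n answers "no" to every repair prompt so nothing is modified.
check_guest_fs() {
    dev="$1"
    if [ ! -b "$dev" ]; then
        echo "not a block device: $dev (check the names with lvs)" >&2
        return 1
    fi
    fsck -f -n "$dev"
}
```

A typical session would be activate_guest_vgs followed by check_guest_fs /dev/ubuntu-vg/ubuntu-lv. If the read-only pass reports problems, run fsck -f on the same device without -n to actually repair them, ideally after taking a snapshot or copy of the disk.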


Monitor the thin pool. Running out of data space is not such a big problem, but running out of metadata space can have a much bigger impact. Don't allow this to happen.
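
One minimal way to do that is a small check you can run from cron. This is only a sketch: the function name check_thin_usage and the 80% threshold are my own choices. It treats any LV that reports both a data and a metadata percentage as a thin pool.

```shell
#!/bin/sh
# Sketch: warn when a thin pool's data or metadata usage reaches a limit.
# Reads "lv_name,data_percent,metadata_percent" lines on stdin.
# Usage on the host (as root):
#   lvs --noheadings --separator , \
#       -o lv_name,data_percent,metadata_percent vm-disks | check_thin_usage 80
check_thin_usage() {
    limit="$1"    # threshold in percent
    awk -F',' -v limit="$limit" '
        {
            # Strip the padding lvs puts around fields.
            for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
        }
        # Only thin pools report both percentages; plain and thin LVs
        # leave the metadata column empty and are skipped.
        $2 != "" && $3 != "" {
            if ($2 + 0 >= limit)
                printf "WARNING: thin pool %s: data %s%% full\n", $1, $2
            if ($3 + 0 >= limit)
                printf "WARNING: thin pool %s: metadata %s%% full\n", $1, $3
        }
    '
}
```

LVM can also grow a pool automatically via the thin_pool_autoextend_threshold and thin_pool_autoextend_percent settings in lvm.conf, provided the VG has free space to grow into.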