LVM – What happens when you temporarily lose a disk in an LVM volume group

lvm

The Dell MD storage allows a maximum virtual disk size of 64TB. We have some imaging staff who require more than 100TB of storage, preferably in one location.

I'll create two 64TB virtual disks on the same Dell storage. The storage is SAS-connected to the server, and I'll use LVM to create a volume group from the two virtual disks and form a single 128TB logical volume.

Say that a few years down the line they run out of space and we need to SAS-connect a separate Dell storage block to the server. The server then has two separate blocks of storage, and I add more disks from the second block to the volume group.

Now the volume group consists of two virtual disks from the first block and two virtual disks from the second block. What would happen if one of the blocks went offline? Would that immediately corrupt my volume?

Best Answer

The logical volume (LV) will go into partial mode (see the p flag in the lvs output). You may still be able to read from and write to it as long as the missing parts are not accessed; any access to them will result in I/O errors. (I am not saying it is a good idea to keep using the filesystem in such a state.)
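
As a rough sketch of how that state could be detected from a script: the ninth character of the lv_attr field reported by lvs is the volume-health flag, and p in that position means partial. The helper name lv_is_partial and the vg0/data name in the comments are made up for illustration.

```shell
# lv_is_partial ATTR
# Succeeds (exit status 0) when the 9th character of an lv_attr string
# is "p", i.e. the LV is missing at least one of its physical volumes.
lv_is_partial() {
    [ "$(printf '%s' "$1" | cut -c9)" = "p" ]
}

# Possible use on a live system (vg0/data is a placeholder name):
#   attr=$(lvs --noheadings -o lv_attr vg0/data | tr -d ' ')
#   lv_is_partial "$attr" && echo "vg0/data is partial"
```

A healthy, mounted linear LV typically shows an attribute string such as -wi-ao----, while the same LV with a missing PV would show -wi-ao--p-.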

Some applications or filesystems may not handle I/O failures well, and you may lose writes that have not yet made it to the disk, but with a journaling filesystem (like ext4) it is unlikely that the filesystem would be corrupted beyond repair.

You will not be able to activate or modify a partial logical volume (e.g. resize it), and that is fine. In general, you do not want to activate it.

The worst thing you could do at this point is to run fsck. Do not. Not until the volume is back. Otherwise you may as well say goodbye to a large part of your data.

If other LVs were added or removed while the disk was missing, you will need to run vgextend --restoremissing VG PV, which makes the volume group whole again (see the m flag in the pvs output).
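
A minimal sketch of that check, assuming the usual three-character pv_attr field in which the third character is m for a missing PV. The helper names pv_is_missing and restore_cmd, as well as vg0 and /dev/sdc, are placeholders and not part of LVM; restore_cmd only prints the command so it can be reviewed before being run.

```shell
# pv_is_missing ATTR
# Succeeds when the 3rd character of a pv_attr string is "m" (missing).
pv_is_missing() {
    [ "$(printf '%s' "$1" | cut -c3)" = "m" ]
}

# restore_cmd VG PV
# Prints the restore command for review; nothing is executed here.
restore_cmd() {
    printf 'vgextend --restoremissing %s %s\n' "$1" "$2"
}

# Possible use on a live system (names are placeholders):
#   pvs -o pv_name,vg_name,pv_attr
#   vgextend --restoremissing vg0 /dev/sdc
```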

The mounted filesystem may not fully recover on its own; you may need to unmount it first (optionally running fsck at this point, now that the volume is back) and then mount it again.
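
The sequence could look roughly like the sketch below. It is a dry run by default and only prints the commands; set DRY_RUN=0 to execute them. The function names, the DRY_RUN variable, and the device and mount point are all assumptions for illustration.

```shell
# run CMD [ARGS...]
# Prints the command; executes it only when DRY_RUN is set to 0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# remount_after_recovery DEVICE MOUNTPOINT
# Unmount, check, and mount again. The chain stops at the first non-zero
# exit status, which includes fsck reporting that it corrected errors,
# so the result can be inspected before mounting.
remount_after_recovery() {
    dev=$1
    mnt=$2
    run umount "$mnt" &&
    run fsck "$dev" &&
    run mount "$dev" "$mnt"
}

# Example (placeholder names): remount_after_recovery /dev/vg0/data /mnt/data
```

Only do this once the missing PV is back and the LV is no longer partial.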

You may also want to consider setting up multipath (even with a single path), which can hide short-term outages from the system because I/O is queued.