Linux – Surprising corruption and never-ending fsck after resizing a filesystem

Tags: debian · ext3 · fsck · linux · lvm

The system in question has Debian Lenny installed, running a 2.6.27.38 kernel. It has 16GB of memory and 8x1TB drives behind a 3Ware RAID card.

The storage is managed via LVM and consists exclusively of ext3 filesystems.

Short version:

  • We were running a KVM guest which had 1.7TB of storage allocated to it.
  • The guest was running out of disk space.
  • So we decided to resize the disk it was running on.

We're pretty familiar with LVM and KVM, so we figured this would be a painless operation (a command sketch follows the list):

  • Stop the KVM guest.
  • Extend the logical volume: "lvextend -L+500G …"
  • Check the filesystem: "e2fsck -f /dev/mapper/…"
  • Resize the filesystem: "resize2fs /dev/mapper/…"
  • Start the guest.
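
For reference, here is a minimal sketch of that sequence as actual commands; the guest name and VG/LV paths are hypothetical stand-ins for ours:

    # Hypothetical names throughout -- substitute your own guest and VG/LV paths.
    virsh shutdown guest01                      # stop the KVM guest cleanly
    lvextend -L +500G /dev/vg0/guest01-disk     # grow the logical volume by 500GB
    e2fsck -f /dev/vg0/guest01-disk             # force a full check before resizing
    resize2fs /dev/vg0/guest01-disk             # grow ext3 to fill the enlarged LV
    virsh start guest01                         # boot the guest again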

The guest booted successfully, and running "df" showed the extra space. However, a short time later the system remounted the filesystem read-only, without any explicit indication of an error.
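
If it helps anyone diagnose this: the remount reason normally shows up in the guest's kernel log, and ext3 records an error state in the superblock. Roughly what we checked, with a hypothetical virtio device path:

    dmesg | grep -i -e ext3 -e remount      # inside the guest: look for the remount trigger
    tune2fs -l /dev/vda1 | grep -i state    # "Filesystem state:" reads "clean with errors" after a recorded error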

Being paranoid, we shut the guest down and ran the filesystem check again. Given the new size of the filesystem we expected this to take a while, but it has now been running for > 24 hours and there is no indication of how long it will take.
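
One thing worth noting for anyone repeating this: e2fsck can print a completion bar if started with -C 0, which we didn't do. A sketch, with a hypothetical mapper path:

    # -f forces the check even if the filesystem is marked clean;
    # -C 0 prints a progress/completion bar to stdout.
    e2fsck -f -C 0 /dev/mapper/vg0-guest01--disk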

Using strace I can see that the fsck is "doing stuff". Similarly, running "vmstat 1" I can see that a lot of block input/output operations are occurring.
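
For completeness, this is roughly how I'm watching it; the pgrep lookup assumes only one e2fsck is running, and the SIGUSR1 trick (documented in the e2fsck man page) makes an already-running check start printing a completion bar:

    strace -p "$(pgrep e2fsck)"     # confirm it is still issuing syscalls
    vmstat 1                        # watch blocks in/out per second
    kill -USR1 "$(pgrep e2fsck)"    # tell the running e2fsck to display progress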

So now my question is threefold:

  • Has anybody come across a similar situation? We've generally done
    this kind of resize in the past with zero issues.

  • What is the most likely cause? (The 3Ware card shows the RAID arrays
    backing the storage as A-OK, the host system hasn't rebooted, and
    nothing in dmesg looks important or unusual.)

  • Ignoring btrfs and ext4 (not mature enough to trust yet), should we create our larger partitions with a different filesystem in the future, either to avoid this kind of corruption (whatever the cause) or to reduce the fsck time? XFS seems like the obvious candidate?
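
For comparison, growing XFS is an online operation with no forced full check afterwards. A minimal sketch, again with hypothetical names (note that xfs_growfs takes the mount point, not the device):

    lvextend -L +500G /dev/vg0/guest01-disk    # grow the LV as before
    xfs_growfs /srv/guest01                    # grow XFS while it is mounted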