How to repair a qcow2 disk image broken by a resize (libvirt/KVM)

corruption kvm-virtualization libvirt qcow2 qemu

Today I wanted to increase the disk size of a VM, so I did what I have always done before:

qemu-img resize diskimage.qcow2 +22GB

After that the image was broken and the VM no longer starts. I tried booting the VM from a CD to adjust the partitioning, but the system can no longer read the disk:

qemu-img check -r all diskimage.qcow2
tcmalloc: large alloc 389841715200 bytes == (nil) @  0x7fdb4ea66bf3 0x7fdb4ea88488 0x7fdb4e5674a6 0x7fdb50236a37 0x7fdb50236bc8 0x7fdb50237011 0x7fdb5023941e 0x7fdb5023d891 0x7fdb5027848b 0x7fdb5027c196 0x7fdb491efb35 0x7fdb5021ee4d (nil)
No errors were found on the image.

No errors? Sounds good, but virsh start vm does not work and the logs say:

2017-05-21T10:02:30.755824Z qemu-system-x86_64: -drive file=/.../diskimage.qcow2,format=qcow2,if=none,id=drive-virtio-disk0: could not open disk image /.../diskimage.qcow2: qcow2: Image is corrupt; cannot be opened read/write

I tried converting to raw but the conversion fails (exit 1):

qemu-img convert -f qcow2 diskimage.qcow2 -O raw diskimage.raw
qcow2: Image is corrupt: L2 table offset 0x2d623039326500 unaligned (L1 index: 0); further non-fatal corruption events will be suppressed
qemu-img: error while reading block status of sector 0: Input/output error

The process creates a 354334801920 byte file (much larger than it should have been with +22GB) but it is apparently unusable – when I try to convert it back to qcow2 I get a 200kB file.

Is there a way to extract data from the qcow2 file, or mount it read-write somehow even if there is corruption? I do not have the nbd kernel module on the machine.

Best Answer

Did you run "qemu-img resize diskimage.qcow2 +22GB" while the QEMU process was still running with the same disk open? If so, that would certainly explain the data corruption: two processes could have been writing to the qcow2 file at the same time, and if both writes required qcow2 metadata allocations, they could have corrupted the file's internal data structures.
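A minimal sketch of a guard against that scenario, assuming the guest is managed by libvirt and is named vm as in the question. The function names is_shut_off and safe_resize and the .bak backup name are hypothetical, and cp --sparse=always assumes GNU coreutils. It only resizes when libvirt reports the domain as shut off, and it keeps a copy of the image first.

```shell
#!/bin/sh
# Sketch only: refuse to resize an image unless libvirt says the guest is off.

# Succeeds only when the reported domain state is exactly "shut off".
is_shut_off() {
    [ "$1" = "shut off" ]
}

# safe_resize DOMAIN IMAGE DELTA
safe_resize() {
    domain=$1
    image=$2
    delta=$3

    # LC_ALL=C keeps the state string in English so the comparison works.
    state=$(LC_ALL=C virsh domstate "$domain") || return 1
    if ! is_shut_off "$state"; then
        echo "refusing to resize: $domain is '$state'" >&2
        return 1
    fi

    # Keep a backup so a failed resize can be rolled back.
    cp --sparse=always "$image" "$image.bak" || return 1
    qemu-img resize "$image" "$delta"
}

# Example call (not executed here):
#   safe_resize vm diskimage.qcow2 +22G
```

This does not detect a QEMU process started outside libvirt, so it is a precaution rather than a guarantee.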

The "qemu-img check" result looks very bogus. In particular, tcmalloc is complaining that it can't allocate a block of 389841715200 bytes, which is roughly 363 GiB. It looks like qemu-img is misinterpreting this failure as success and printing the bogus message "No errors were found on the image." This is a bug you should certainly report to QEMU.
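To put the numbers in the question's output on a common scale, here is a small sketch that converts byte counts to whole GiB. The helper name to_gib is hypothetical; the two inputs are the tcmalloc request and the size of the raw file produced by the failed conversion.

```shell
#!/bin/sh
# Convert a byte count to whole GiB (integer division, remainder dropped).
to_gib() {
    echo $(( $1 / 1073741824 ))   # 1 GiB = 2^30 bytes
}

echo "tcmalloc request: $(to_gib 389841715200) GiB"   # prints 363
echo "raw file:         $(to_gib 354334801920) GiB"   # prints 330
```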

The 'convert' error looks like a follow-up to the same error that tcmalloc hit.
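For readers wondering what "unaligned" means in that message: qcow2 expects each L2 table to start on a cluster boundary. The sketch below checks the offset from the error output against a cluster size of 64 KiB. That cluster size is an assumption (it is the qemu-img default, but the image may have been created with a different value), and the helper name is_cluster_aligned is hypothetical.

```shell
#!/bin/sh
# Assumption: 64 KiB clusters, the default when no cluster_size was given
# at image creation time.
CLUSTER_SIZE=65536

# Succeeds when offset $1 is a multiple of cluster size $2.
is_cluster_aligned() {
    [ $(( $1 % $2 )) -eq 0 ]
}

offset=0x2d623039326500   # value from the "qemu-img convert" error
if is_cluster_aligned "$offset" "$CLUSTER_SIZE"; then
    echo "aligned"
else
    # Prints: unaligned, remainder 25856 bytes
    echo "unaligned, remainder $(( $offset % $CLUSTER_SIZE )) bytes"
fi
```

Because the value ends in 0x6500, it would also fail the check for any cluster size larger than 256 bytes, so the conclusion does not depend much on the assumed default.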

Unfortunately I don't have any suggestions for fixing the problem; I was only going to recommend "check -r", which you have already tried. Your best remaining option is probably to mail the qemu-devel list and see whether any of the qcow2 maintainers have suggestions.