Linux – Qemu TRIM and discard on a physical SSD device

Tags: kvm-virtualization, linux, qemu, ssd, virtual-machines

I am running Windows 7 in a Qemu/KVM virtual machine with a passed-through GPU, which I use for work-related stuff. I recently got fed up with its unprecedented slowness due to it running off a mechanical drive, so I added an SSD to my box to 'give' to my Windows KVM guest. I'm using the following qemu command-line options for the 'passed through' disk:

-drive file=/dev/disk/by-id/wwn-0x5002538d4002d61f,if=none,id=drive-scsi0-0-0-0,format=raw,discard=on \
-device virtio-scsi-pci,id=scsi0 \
-device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0-0

I was hoping that the guest-OS TRIM commands would actually be passed-through to the physical drive on the host, but this seems to not be the case.

Does "discard=on" only affect drives backed by image files, and not actual physical SSDs? If so, how can I get TRIM commands issued by the guest OS passed on to the physical device on the host? Is using an image file on the host the only solution? I'm hoping for something better, since having a file system on that disk would only create overhead, and I don't need it for anything else.

Best Answer

Research

Qemu treats discard=unmap and discard=on the same, as you can see in its source code:

block.c (L 1102): if (!strcmp(mode, "on") || !strcmp(mode, "unmap"))

It also seems to use several of the Linux block-device ioctls, as described here, for writing zeros or discarding at the block level:

block/file-posix.c (L 744): if (ioctl(s->fd, BLKDISCARDZEROES, &arg) == 0 && arg)

block/file-posix.c (L 1621): if (ioctl(aiocb->aio_fildes, BLKZEROOUT, range) == 0)

block/file-posix.c (L 1788): if (ioctl(aiocb->aio_fildes, BLKDISCARD, range) == 0)

So based on this, block-device passthrough with SCSI emulation should work with both discard=unmap and detect-zeroes=unmap, unless you are using an old Qemu machine type or a buggy Qemu version.
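Since Qemu ends up calling BLKDISCARD on the host device, it can help to first confirm that the host kernel itself considers the SSD discard-capable. The following is a minimal sketch, assuming the usual sysfs attribute /sys/block/<dev>/queue/discard_max_bytes, where a value of 0 means discard is not supported. The function name and the optional sysfs-root argument are made up for illustration.

```shell
# Succeeds (exit 0) when the kernel reports a non-zero discard limit for a device.
# $1: kernel device name such as "sda" (hypothetical, substitute your own)
# $2: sysfs root; defaults to /sys and is overridable only so the function
#     can be exercised without real hardware
host_discard_supported() {
    dev=$1
    sysfs_root=${2:-/sys}
    limit_file="$sysfs_root/block/$dev/queue/discard_max_bytes"

    # Attribute missing: unknown device name, or not a block device.
    [ -r "$limit_file" ] || return 2

    # A limit of 0 means the block layer will not accept discards for this device.
    [ "$(cat "$limit_file")" -gt 0 ]
}
```

For example, `host_discard_supported sda && echo "host accepts discards"`. Running `lsblk --discard` shows the same limits in its DISC-GRAN and DISC-MAX columns.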

Example

Found an excellent presentation here.

Lessons learned from the presentation:

  1. You must be running Qemu/KVM as root, or as a user with the CAP_SYS_RAWIO capability, for discard to not be ignored by Linux.
  2. If your passthrough device is truly a SCSI disk, it should pay attention to the real SCSI UNMAP and WRITE SAME commands, and you can use scsi-block to pass it through.
  3. If not, you will have to emulate a SCSI disk with scsi-hd, which will send the discard commands through Qemu to the Linux block layer.
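To check the first point on a running VM, one option is to inspect the effective capability mask of the Qemu process. This is a rough sketch, assuming the CapEff line in /proc/<pid>/status and that CAP_SYS_RAWIO is capability number 17, as defined in linux/capability.h. The function names are hypothetical.

```shell
# CAP_SYS_RAWIO is capability number 17 in <linux/capability.h>.
CAP_SYS_RAWIO_BIT=17

# Succeeds when a hexadecimal capability mask (as printed on the CapEff line
# of /proc/<pid>/status, without a 0x prefix) has the CAP_SYS_RAWIO bit set.
mask_has_rawio() {
    [ $(( (0x$1 >> CAP_SYS_RAWIO_BIT) & 1 )) -eq 1 ]
}

# Looks up the effective capability mask of a running process by PID
# and tests it; returns 2 if the mask cannot be read.
pid_has_rawio() {
    mask=$(awk '/^CapEff:/ { print $2 }' "/proc/$1/status") || return 2
    [ -n "$mask" ] || return 2
    mask_has_rawio "$mask"
}
```

With the Qemu PID in hand (for instance from `pidof qemu-system-x86_64`, if that is the binary name on your system), `pid_has_rawio <pid> && echo "CAP_SYS_RAWIO present"` reports whether the capability is in effect.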

For me, although passing the disk through with scsi-block allowed access to stats and SMART info for the real device, and regular IO worked fine, the discard command was not supported.

Since my backing device is really a SATA (ATA) disk and not a SCSI LUN, I am guessing that is the reason there is no discard support this way.

Switching from scsi-block to scsi-hd, you will lose stats and SMART info but gain discard, so it is a trade-off.

Personally, I did not experience any noticeable performance drop going from 'true passthrough' to 'emulated with passthrough' for my needs.

Here is an example of Virtio SCSI with emulated SCSI and a backing block device:

    -device virtio-scsi-pci,id=scsi \
    -blockdev driver=raw,node-name=disk.0,cache.direct=on,discard=unmap,file.driver=host_device,file.aio=native,file.filename=/dev/disk/by-id/ata-Samsung_SSD_840_PRO_Series_S12PNEAD233247L \
    -device scsi-hd,drive=disk.0,bus=scsi.0

The one part you will not find in the Qemu documentation is the file.driver=host_device part. It is needed for scsi-block to work, and seems not to hurt scsi-hd either when we are using a real block device rather than a file on the host filesystem.

Test

The blktrace tool I used to test Linux block level function calls is documented here.

You can run the blktrace and blkparse programs together to intercept discard calls:

blktrace -a discard -d /dev/disk/by-id/ata-Samsung_SSD_840_PRO_Series_S12PNEAD233247L -o - | blkparse -i -

Now when you run defrag /L c: or fstrim -v / in your VM, you will see a lot of discards being printed on the host. Example snippet from the output:

    8,0    1      493     0.641661863  3118  Q  DS 45458024 + 728 [qemu-system-x86]
    8,0    1      494     0.641664662  3118  G  DS 45458024 + 728 [qemu-system-x86]
    8,0    1      495     0.641665920  3118  I  DS 45458024 + 728 [qemu-system-x86]
    8,0    1      496     0.641669312  3118  D  DS 45458024 + 728 [qemu-system-x86]

So that is proof enough for me that discard is working.
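To get a total rather than eyeballing the trace, the blkparse lines can be summed up with awk. This is only a sketch, assuming the default blkparse column layout shown in the snippet above (action in column 6, RWBS flags in column 7, sector count in column 10) and 512-byte trace sectors. The function name is made up.

```shell
# Reads blkparse output on stdin and summarises discards issued to the driver.
# Assumed default blkparse columns:
#   $6 = action (D = issued to driver), $7 = RWBS flags (leading D = discard),
#   $8 = start sector, $9 = "+", $10 = number of 512-byte sectors
summarise_discards() {
    awk '
        $6 == "D" && $7 ~ /^D/ && $9 == "+" { requests++; sectors += $10 }
        END { printf "%d requests, %d sectors, %d bytes\n", requests, sectors, sectors * 512 }
    '
}
```

Only the D (issued) events are counted, so each discard is tallied once rather than once per Q/G/I/D stage. Saving the blkparse output to a file during the trim and then running `summarise_discards < discards.txt` (a hypothetical file name) avoids losing the summary when the trace is interrupted.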