We are running Linux (CentOS 5.x) VMs on top of VMware vSphere 5.5. I am monitoring disk latency using iostat, specifically the await column, but I am noticing strange results from the device-mapper/LVM device versus the "physical" disks backing LVM.
Below is one set of output from iostat -x 5 on one of our fairly active VMs. The VM in question has two disks: sda, with a single partition holding /boot, and sdb as our main disk, with / on sdb2. While iostat shows ~20-40 ms await for the sdb2 device (the only device/partition backing my volume group / dm-0), iostat for dm-0 shows 100+ ms await.
My question is: which statistic is "correct" here, as far as the real latency the system is seeing? Is it the ~20 ms shown for the "physical" disk sdb, or is it really the 100+ ms from dm-0, perhaps due to alignment or similar issues that arise once LVM is involved? It is strange because sometimes the stats match up pretty well and other times they are way off; for example, in the block of iostat output below, sdb2 shows about 420 write IOPS while dm-0 shows almost 40k write IOPS.
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.78    0.00    8.42   39.07    0.00   46.73

Device:   rrqm/s   wrqm/s     r/s      w/s    rsec/s     wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda         0.00     0.00    0.00     0.00      0.00       0.00     0.00     0.00    0.00   0.00   0.00
sda1        0.00     0.00    0.00     0.00      0.00       0.00     0.00     0.00    0.00   0.00   0.00
sdb        15.67 39301.00  745.33   419.67  64146.67  317765.33   327.82    53.55   45.89   0.86 100.07
sdb1        0.00     0.00    0.00     0.00      0.00       0.00     0.00     0.00    0.00   0.00   0.00
sdb2       15.67 39301.00  745.33   419.67  64146.67  317765.33   327.82    53.55   45.89   0.86 100.07
dm-0        0.00     0.00  761.33 39720.67  64120.00  317765.33     9.43  4933.92  121.88   0.02 100.07
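One thing I noticed about the IOPS gap: sdb's merged-plus-issued writes add up exactly to dm-0's write IOPS, which suggests the requests are being merged below the device-mapper layer (note dm-0's rrqm/s and wrqm/s columns are zero). Here is a quick check of that arithmetic, plus a Little's-law check that dm-0's await is at least self-consistent with its own queue depth. This is just a sanity check on the figures copied from the output above, not a definitive explanation:

```shell
# Writes merged (wrqm/s) and writes issued (w/s) at sdb, versus the
# writes/s reported at dm-0. If merging happens below dm-0, the first
# two should sum to the third.
echo "39301.00 419.67 39720.67" | awk '{
    printf "sdb wrqm/s + w/s = %.2f, dm-0 w/s = %.2f\n", $1 + $2, $3
}'
# -> sdb wrqm/s + w/s = 39720.67, dm-0 w/s = 39720.67

# Little's law on dm-0: await ~= avgqu-sz / total IOPS (r/s + w/s).
echo "4933.92 761.33 39720.67" | awk '{
    printf "predicted dm-0 await = %.2f ms\n", $1 / ($2 + $3) * 1000
}'
# -> predicted dm-0 await = 121.88 ms
```

So dm-0's 121.88 ms await matches its own enormous queue (avgqu-sz 4933.92) exactly; the extra latency appears to be time spent queued above the block device, not extra time at the disk.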
Update:
I did some further reading, including the links in Gene's answer below. I know there are a lot of variables involved (virtualization, the backing file system, etc.), but that side of things seems sorted per our vendor's and VMware's best practices, and performance is actually very good. I am really just looking at this from the "within the VM" perspective here.
On that note, I suspect there is an issue with our partition + LVM alignment:
GNU Parted 1.8.1
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) unit s
(parted) print
Model: VMware Virtual disk (scsi)
Disk /dev/sdb: 2147483647s
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number  Start     End          Size         Type     File system  Flags
 1      63s       4192964s     4192902s     primary  linux-swap   boot
 2      4192965s  2097151999s  2092959035s  primary  lvm
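With 512-byte sectors (as shown in the parted output), a 4 KiB boundary means a start sector divisible by 8. Checking the two start sectors from the table above:

```shell
# With 512 B sectors, a 4 KiB boundary requires start % 8 == 0.
# Start sectors taken from the parted output above.
for start in 63 4192965; do
    if [ $(( start % 8 )) -eq 0 ]; then
        echo "sector $start: aligned"
    else
        echo "sector $start: misaligned by $(( start % 8 )) sectors"
    fi
done
# -> sector 63: misaligned by 7 sectors
# -> sector 4192965: misaligned by 5 sectors
```

So /dev/sdb2 starts 5 sectors (2.5 KiB) past a 4 KiB boundary, which would leave everything LVM lays down on it offset relative to the backing storage.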
~]# pvdisplay
  --- Physical volume ---
  PV Name               /dev/sdb2
  VG Name               VolGroup00
  PV Size               998.00 GB / not usable 477.50 KB
  Allocatable           yes (but full)
  PE Size (KByte)       32768
  Total PE              31936
  Free PE               0
  Allocated PE          31936
  PV UUID               tk873g-uSZA-JaWV-R8yD-swXg-lPvM-dgwPQv
Reading up on alignment, it looks like your start sector should be divisible by 8, so that you align on a 4 KiB boundary given the standard 512-byte sector size. LVM is apparently able to align automatically when you apply it to an entire disk, but since we're partitioning the disk first and then making a partition (here /dev/sdb2) a physical volume for LVM to use, I'm not sure it is able to calculate an offset in that case. Per http://linux.die.net/man/5/lvm.conf, the data_alignment_offset_detection parameter reads: "If set to 1, and your kernel provides topology information in sysfs for the Physical Volume, the start of the aligned data area of the Physical Volume will be shifted by the alignment_offset exposed in sysfs." This is CentOS 5, and I don't see any of that information exposed in sysfs; I only see it on our CentOS 6 and newer VMs. So LVM might not be able to align the physical volume correctly here.
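A simple way to see whether a kernel exposes the topology information that this detection relies on is to look for the files under /sys/block. These paths are present on CentOS 6+ kernels and absent on CentOS 5, which is consistent with what I'm seeing:

```shell
# Check whether the kernel exposes block-device topology in sysfs
# (sdb is the disk from this VM; substitute your own device name).
for f in alignment_offset queue/minimum_io_size queue/optimal_io_size; do
    path=/sys/block/sdb/$f
    if [ -r "$path" ]; then
        echo "$f = $(cat "$path")"
    else
        echo "$f: not exposed"
    fi
done
```

On a kernel without these files, LVM has nothing to shift by, regardless of the data_alignment_offset_detection setting.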
I found this NetApp whitepaper on VM partition alignment: http://www.netapp.com/us/system/pdf-reader.aspx?m=tr-3747.pdf&cc=us
Specifically, there's good information in section 4.5 (page 29) about partitioning a VM for proper alignment with LVM. I'll follow that so our new VMs are aligned correctly.
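For reference, a sketch of how I plan to lay out a new disk so the LVM partition starts on a 1 MiB boundary (which is divisible by 4 KiB and by common stripe sizes). /dev/sdc is a hypothetical new disk; destructive commands, so treat this as an outline, not something to paste blindly:

```shell
# Hypothetical new disk /dev/sdc. Starting the partition at sector
# 2048 (1 MiB) keeps it 4 KiB-aligned with 512 B sectors.
parted -s /dev/sdc mklabel msdos
parted -s /dev/sdc unit s mkpart primary 2048 100%
parted -s /dev/sdc set 1 lvm on
pvcreate /dev/sdc1
```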
This seems like it could cause the behavior I'm seeing. Can anyone with more knowledge or experience confirm that?
Best Answer
There's no easy answer, since virtualisation is involved: you have a virtual disk sitting on top of a file system, on top of a block device presented to a virtual guest, which has its own driver presenting a block device to LVM. I don't know for sure whether that would necessarily cause such a huge difference, but it may be possible.
Beyond that...
LVM adds overhead, so there will be a difference. If your LVM and block devices aren't aligned properly, that can also be a contributing factor.
Alignment isn't a simple subject that can be covered in a setting such as this. The best I can do is refer you to a couple of documents, maybe you'll find more answers in them: