A small machine (1 core, 1 GB RAM, CentOS 6.3) virtualized via Citrix Xen has 3 virtual disks of very different sizes.
> cat /etc/fstab (snippet)
...
/dev/mapper/vg_stagingnfs-lv_root / ext4 defaults 1 1 # on /dev/xvda
/dev/disk/by-uuid/8048fd86-3aa3-4cdd-92fe-c19cc97d3c2e /opt/xxx/data/nexus ext4 defaults 0 0
/dev/disk/by-uuid/58f16c69-786e-47d0-93ae-d57fb0cbd2a9 /opt/xxx/data/nfs ext4 defaults 0 0
> mount (snippet)
...
/dev/mapper/vg_stagingnfs-lv_root on / type ext4 (rw)
/dev/xvdb1 on /opt/xxx/data/nexus type ext4 (rw)
/dev/xvdc1 on /opt/xxx/data/nfs type ext4 (rw)
> df -h (snippet)
...
/dev/mapper/vg_stagingnfs-lv_root
5.5G 3.1G 2.2G 59% /
/dev/xvdb1 2.0T 60G 1.9T 4% /opt/xxx/data/nexus
/dev/xvdc1 729G 144G 548G 21% /opt/xxx/data/nfs
Device /dev/xvda is a virtual disk inside a "storage repository" backed by a 4-disk RAID5. Devices /dev/xvdb and /dev/xvdc are both virtual disks inside another "storage repository", backed by a different 4-disk RAID5. Disk performance between (let's keep it simple) xvda and xvdb is dramatically different:
> dd if=/dev/zero of=/root/outfile bs=1024k count=1000
1048576000 bytes (1.0 GB) copied, 8.61225 s, 122 MB/s
> dd if=/dev/zero of=/opt/xxx/data/nexus/outfile bs=1024k count=1000
1048576000 bytes (1.0 GB) copied, 86.241 s, 12.2 MB/s
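(One caveat worth ruling out first: plain `dd` from /dev/zero largely measures the guest's page cache, so the two numbers may not be comparable. A minimal sketch to repeat the test with the cache taken out of the picture — the target path here is a hypothetical /tmp file; point it at / and /opt/xxx/data/nexus in turn:)

```shell
# Hypothetical target; substitute the mount points being compared.
OUT=/tmp/ddtest.$$

# conv=fdatasync forces the data to disk before dd reports its rate,
# so the number reflects the device rather than the page cache.
dd if=/dev/zero of="$OUT" bs=1M count=100 conv=fdatasync

# oflag=direct bypasses the page cache entirely (block-aligned I/O only;
# guarded because not every filesystem supports O_DIRECT).
dd if=/dev/zero of="$OUT" bs=1M count=100 oflag=direct 2>/dev/null \
    || echo "direct I/O unsupported on this filesystem"

rm -f "$OUT"
```

If the gap survives conv=fdatasync, the difference really is below the filesystem.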
I haven't spotted any obvious explanation via top, atop, iotop or iostat. During both dd runs I notice 3 main commands causing load: dd, flush-xxx and jbd2/xvdbxxx. The main types of load are %sy and %wa. While dd-ing to xvda the %sy:%wa ratio looks roughly like 20%:80%; while dd-ing to xvdb it looks almost like 0%:100%.
Now the big question: WTF? I'm running out of ideas for tracking down the root cause. Any ideas how to get to the bottom of this?
Your help is highly appreciated!
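One way to push past the bare %wa figure: compute per-device average I/O completion times from /proc/diskstats inside the guest while each dd runs; a dying RAID member typically shows up as write latencies in the hundreds of milliseconds on one device only. A small sketch (field positions per the kernel's diskstats format; the device-name pattern is an assumption matching xvd*/sd*):

```shell
# Field 3 = device name, field 8 = writes completed, field 11 = ms spent
# writing (both are totals since boot, so this is a long-run average;
# sample twice and diff for a live view).
awk '$3 ~ /^(xvd|sd)[a-z]+$/ {
    writes = $8; ms = $11
    if (writes > 0)
        printf "%-6s %10d writes  avg %7.2f ms/write\n", $3, writes, ms / writes
}' /proc/diskstats
```

`iostat -x 1` reports the same thing per interval (the `await` column), if sysstat is installed.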
I'll add some extra information:
- both storage repositories are LVM-backed
- both are local to the Xen host
- strange: the faster storage repository contains the virtual disks of > 20 other VMs (plus xvda of this VM); xvdb/xvdc are the only disks in the slower storage repository and are attached only to this very VM. Anyway, I additionally created a third virtual disk on that slow storage repository and attached it to a different VM – same effect…
Information gathered on the Xen host (mostly looking for evidence of bad disks):
# xe sr-list (snippet)
...
uuid ( RO) : 88decbcc-a88c-b368-38dd-dc11bfa723f6
name-label ( RW): Local storage 2 on xen-build2
name-description ( RW): RAID5 4x1TB 7.200 rpm MDL Disks # a.k.a. the too slow one
host ( RO): xen-build2
type ( RO): lvm
content-type ( RO): user
uuid ( RO) : b4bae2a7-02fd-f146-fd95-51f573c9b27d
name-label ( RW): Local storage
name-description ( RW): # a.k.a. the reasonably fast one
host ( RO): xen-build2
type ( RO): lvm
content-type ( RO): user
# vgscan -v (snippet)
Wiping cache of LVM-capable devices
Wiping internal VG cache
Reading all physical volumes. This may take a while...
Finding all volume groups
Finding volume group "VG_XenStorage-88decbcc-a88c-b368-38dd-dc11bfa723f6"
Found volume group "VG_XenStorage-88decbcc-a88c-b368-38dd-dc11bfa723f6" using metadata type lvm2
Finding volume group "VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d"
Found volume group "VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d" using metadata type lvm2
# lvmdiskscan (snippet)
...
/dev/sdb [ 838.33 GB] LVM physical volume # reasonably fast
/dev/sdc [ 2.73 TB] LVM physical volume # too slow
3 disks
16 partitions
2 LVM physical volume whole disks
1 LVM physical volume
# vgck -v
Finding all volume groups
Finding volume group "VG_XenStorage-88decbcc-a88c-b368-38dd-dc11bfa723f6"
Finding volume group "VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d"
# pvck -v
(no output)
# lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
MGT VG_XenStorage-88decbcc-a88c-b368-38dd-dc11bfa723f6 -wi-a- 4.00M
VHD-2190be94-2e94-4df1-a78e-b2ee1edf2400 VG_XenStorage-88decbcc-a88c-b368-38dd-dc11bfa723f6 -wi-ao 1.76G
VHD-b1971dad-60f0-4d3a-a63d-2f3184d74035 VG_XenStorage-88decbcc-a88c-b368-38dd-dc11bfa723f6 -wi-ao 741.45G
VHD-f0c7cc8f-1d69-421d-8a57-97b20c32e170 VG_XenStorage-88decbcc-a88c-b368-38dd-dc11bfa723f6 -wi-ao 2.00T
MGT VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-a- 4.00M
VHD-02a0d5b5-a7e5-4163-a2fa-8fd651ed6df3 VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao 20.05G
VHD-0911628d-e03a-459a-83f4-f8c699aee619 VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao 50.11G
VHD-0950ba89-401d-433f-87bb-8f1ab9407a4b VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao 30.07G
VHD-18e93da6-d18d-4c27-8ea6-4fece41c75c1 VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi--- 8.02G
VHD-1b5ced06-a788-4e72-9adf-ea648c816e2e VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi--- 256.00M
VHD-22fe1662-6b5d-49f5-b729-ec9acd7787ee VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao 120.24G
VHD-23cb8155-39c1-45aa-b6a5-bb8a961707b7 VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao 8.02G
VHD-25913e86-214f-4b7f-b886-770247c1d716 VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao 10.03G
VHD-44c5045c-6432-48cf-85d3-646e46a3d849 VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi--- 20.05G
VHD-4d5f779d-51a9-4087-b113-4d99f16d6779 VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao 50.11G
VHD-4e4749c7-8de6-4013-87cb-be53ac112f4f VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao 30.07G
VHD-503a68d4-182f-450e-8c34-7568f9472668 VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao 20.05G
VHD-5dc961e0-beb2-4ce3-b888-b16a26dd77a5 VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao 50.11G
VHD-6d4ee024-789a-46f5-8922-edf15ac415cd VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao 50.11G
VHD-7b80f83f-6a0f-4311-8d32-c8f51b547b3d VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao 120.24G
VHD-81aa93fa-dbf5-4a4a-ba21-20693508ec4a VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao 10.03G
VHD-85cb8e94-fd07-4717-8cca-871f07099fb0 VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao 50.11G
VHD-8e8f63c3-ab21-4707-8736-af0b279c9b7e VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi--- 16.00M
VHD-965cc67a-5cb9-4d79-8916-047bfd42955d VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao 64.13G
VHD-c1abfb8d-12bc-4852-a83f-ccbc6ca488b8 VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi--- 100.20G
VHD-d679959b-2749-47e2-9933-e9f008aea248 VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao 75.15G
AFAICS adding more "-v"s still outputs nothing pointing to a bad disk… Any other checks that would identify a bad disk? Thx!
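(Note that vgck and pvck only validate LVM metadata, so they would not notice a degraded array or a slowly-dying member disk. Some checks that look at the disks themselves, to be run in dom0 — device names are assumptions, smartctl needs smartmontools, and a hardware RAID controller may need a controller-specific `-d` type:)

```shell
# Kernel log: I/O errors, link resets and command timeouts from a bad disk.
dmesg | grep -iE 'i/o error|ata[0-9]|reset|medium error' | tail -n 20

# If the RAID5 is Linux md: a degraded or rebuilding array (e.g. [3/4])
# alone explains an order-of-magnitude write slowdown.
cat /proc/mdstat 2>/dev/null

# SMART health and the classic "dying but not dead" attributes per member
# disk. Guarded so devices smartctl can't probe don't abort the loop.
for d in /dev/sd?; do
    echo "== $d =="
    smartctl -H -A "$d" 2>/dev/null \
        | grep -iE 'overall-health|Reallocated|Pending|UDMA_CRC' || true
done
```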
Best Answer
With a performance difference this large, assume a bug (:-))
In all seriousness, common accidental-pessimization problems will slow you by 10-20%. Performance bugs, like the previously-mentioned dying disk, will slow you by orders of magnitude.
As a performance engineer, I find that most of what I see are bugs.
--dave