Linux – Xen HVM Linux DomU fails with I/O errors

Tags: linux, timeout, virtualization, xen

I am using Xen hardware-based virtualization (HVM) for several Linux DomUs. One of them keeps failing randomly with I/O errors whenever there is heavy I/O load in the other DomUs.

dmesg contains the following:

[885434.196928] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT,SUGGEST_OK
[885434.196934] end_request: I/O error, dev sda, sector 1557062
[885434.246997] Aborting journal on device dm-1.
[885438.713821] __journal_remove_journal_head: freeing b_committed_data
[885438.728478] ext3_abort called.
[885438.728698] EXT3-fs error (device dm-1): ext3_journal_start_sb: Detected aborted journal
[885438.729192] Remounting filesystem read-only

And here is the output from a second, different incident:

[1532214.100163] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT,SUGGEST_OK
[1532214.100169] end_request: I/O error, dev sda, sector 3751150
[1532214.100172] Buffer I/O error on device dm-1, logical block 275514
[1532214.100442] lost page write due to I/O error on dm-1
[1547950.515890] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT,SUGGEST_OK
[1547950.515896] end_request: I/O error, dev sda, sector 5477734
[1547950.515900] Buffer I/O error on device dm-1, logical block 491337
[1547950.516358] lost page write due to I/O error on dm-1
[1547972.401281] Aborting journal on device dm-1.
[1547950.541130] ext3_abort called.
[1547950.541357] EXT3-fs error (device dm-1): ext3_journal_start_sb: Detected aborted journal
[1547950.541869] Remounting filesystem read-only
[1547950.542125] EXT3-fs error (device dm-1) in ext3_ordered_write_end: IO failure

The sectors vary between crashes, and I cannot find any errors when I check the disks (which are part of md1) from Dom0.
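To compare incidents more systematically, one could pull the failing sectors out of saved dmesg dumps and count how many of the errors are reported as timeouts. The following is a minimal sketch, not something from the original post: the function names are hypothetical, and the regular expression assumes exactly the `end_request` line format shown in the logs above.

```python
import re
from typing import List, NamedTuple

# Matches lines such as:
# [885434.196934] end_request: I/O error, dev sda, sector 1557062
# (\s* also tolerates the space padding dmesg uses for small timestamps)
END_REQUEST = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+end_request: I/O error, "
    r"dev (?P<dev>\w+), sector (?P<sector>\d+)"
)


class FailedRequest(NamedTuple):
    timestamp: float  # seconds since boot, as printed by dmesg
    device: str
    sector: int


def failed_requests(dmesg_text: str) -> List[FailedRequest]:
    """Return every 'end_request: I/O error' entry found in a dmesg dump."""
    hits = []
    for line in dmesg_text.splitlines():
        m = END_REQUEST.match(line.strip())
        if m:
            hits.append(FailedRequest(float(m["ts"]), m["dev"], int(m["sector"])))
    return hits


def timeout_count(dmesg_text: str) -> int:
    """Count SCSI results that the guest kernel attributes to a driver timeout."""
    return sum("driverbyte=DRIVER_TIMEOUT" in line for line in dmesg_text.splitlines())


if __name__ == "__main__":
    sample = """\
[1532214.100163] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT,SUGGEST_OK
[1532214.100169] end_request: I/O error, dev sda, sector 3751150
[1547950.515890] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT,SUGGEST_OK
[1547950.515896] end_request: I/O error, dev sda, sector 5477734
"""
    for req in failed_requests(sample):
        print(req)
    print("timeouts:", timeout_count(sample))
```

If every failed request is paired with a `DRIVER_TIMEOUT` result and the sectors never repeat, that points towards requests not being served in time rather than towards bad blocks on the physical disks.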

I experienced the same issues at work with VMware ESX-based virtualization before installing the VMware Tools, so I suspect a driver issue. However, since there are no "Xen Tools" (comparable to the VMware Tools) containing special drivers, I don't know how to fix it.

The DomUs are using Linux 2.6.24-24-server #1 SMP Tue Aug 18 16:51:43 UTC 2009 x86_64 GNU/Linux, and are based on Ubuntu 8.04.3 LTS (hardy), whereas the hypervisor and the Dom0 are Linux 2.6.26-2-xen-amd64 #1 SMP Fri Aug 14 10:19:53 UTC 2009 x86_64 GNU/Linux on Debian Lenny.

Anybody got any ideas on how to proceed?

Best Answer

Apparently some people have encountered the same problem and discussed it in this mail thread: http://lists.centos.org/pipermail/centos-virt/2009-June/001026.html

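You should try the 'xm sched-credit' trick :), i.e. tuning the credit scheduler weights with xm sched-credit so that Dom0, which serves the DomUs' disk I/O, still gets enough CPU time when the other DomUs are busy.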
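Assuming the trick means raising Dom0's weight in Xen's credit scheduler (every domain starts with the default weight of 256), a simplified model shows why it helps: when all domains are busy and uncapped, each one gets CPU roughly in proportion to its weight. The sketch below is only that model, not the scheduler's real accounting; the DomU names are hypothetical, and the command it builds uses the documented `xm sched-credit -d DOMAIN -w WEIGHT` form.

```python
from typing import Dict, List

DEFAULT_WEIGHT = 256  # credit scheduler default weight for every domain


def cpu_shares(weights: Dict[str, int]) -> Dict[str, float]:
    """Simplified model: CPU fraction per domain when all are busy and uncapped."""
    total = sum(weights.values())
    return {dom: w / total for dom, w in weights.items()}


def sched_credit_cmd(domain: str, weight: int) -> List[str]:
    """argv for setting a domain's credit scheduler weight with the xm toolstack."""
    return ["xm", "sched-credit", "-d", domain, "-w", str(weight)]


if __name__ == "__main__":
    # Hypothetical host: Dom0 plus three busy DomUs, all at the default weight.
    domains = {
        "Domain-0": DEFAULT_WEIGHT,
        "domU-a": DEFAULT_WEIGHT,
        "domU-b": DEFAULT_WEIGHT,
        "domU-c": DEFAULT_WEIGHT,
    }
    print("before:", cpu_shares(domains))  # Domain-0 gets 0.25
    domains["Domain-0"] = 512
    print("after: ", cpu_shares(domains))  # Domain-0 gets 0.4, each DomU 0.2
    print(" ".join(sched_credit_cmd("Domain-0", 512)))
```

In practice a domain cannot use more CPU than its VCPUs allow, and idle domains leave their share to the others, so the real effect under load will differ from these numbers; the point is only that doubling Dom0's weight shifts priority towards it when everyone is competing.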

Regards, Romain