I have read some answers related to this problem, for example:
Will the OS crash if the system partition can't be access for a short period?
But I still cannot solve the problem.
I use iSCSI as the Storage Repository on XenServer. When a DomU (VM) is under heavy disk I/O
and the iSCSI connection is lost (mainly because of a network problem or a storage failover),
the DomU filesystem (especially an ext3 Linux filesystem) crashes.
In this case, the DomU's ext3 filesystem becomes read-only or unrecoverable.
How can I protect the VM's filesystem when the iSCSI connection is lost at Dom0?
This is my XenServer environment.
[root@cnode01-m ~]# iscsiadm -m session
tcp: [1] 10.32.1.240:3260,2 iqn.1986-03.com.sun:02:c5544ae6-9715-6f38-f83b-a446896ac614
tcp: [3569] 10.32.1.240:3260,2 iqn.1986-03.com.sun:02:5c41ce31-3fbb-c6aa-d479-947e85515ac7
[root@cnode01-m ~]# vgs
VG #PV #LV #SN Attr VSize VFree
VG_XenStorage-1aeee13b-2a87-1d0d-1834-7b8c868009b0 1 40 0 wz--n- 6.35T 4.93T
VG_XenStorage-28e2c663-dae5-9504-9733-e05063ff081d 1 57 0 wz--n- 6.35T 4.52T
VG_XenStorage-365d6e13-5caa-1fea-9940-e1bb553e3513 1 42 0 wz--n- 6.35T 5.13T
VG_XenStorage-4ea23f9a-f945-5d45-cbd2-f3eab3fe75b3 1 42 0 wz--n- 6.35T 5.40T
VG_XenStorage-54d69165-2eed-c058-d587-1b84d488adea 1 37 0 wz--n- 6.35T 5.01T
VG_XenStorage-598b7237-282b-ea61-8edc-5101a70ea001 1 63 0 wz--n- 6.35T 5.01T
VG_XenStorage-6a063762-26de-a3f8-f18c-734fce25433a 1 49 0 wz--n- 6.35T 5.56T
VG_XenStorage-6b7bea84-7269-fa88-7b95-23dce431e1aa 1 71 0 wz--n- 6.35T 4.80T
VG_XenStorage-6d6d263b-243c-fb24-4f0c-28b226a22bab 1 47 0 wz--n- 6.35T 4.94T
VG_XenStorage-76fe6d6d-a37a-698d-9af2-50ea3f55e127 1 44 0 wz--n- 6.35T 5.37T
VG_XenStorage-80e2df33-268c-b8a6-cc02-71f27ebe3326 1 39 0 wz--n- 6.35T 5.80T
VG_XenStorage-886070b7-34e8-eb96-0931-2c31952608a6 1 13 0 wz--n- 457.65G 369.31G
VG_XenStorage-97136f70-cf33-2593-38e0-b8c09785a754 1 60 0 wz--n- 6.35T 5.14T
VG_XenStorage-c910e9fd-8817-0b99-8c8d-1ee0883705de 1 37 0 wz--n- 6.35T 5.67T
VG_XenStorage-cd709bcb-d46a-8483-acbf-49b2b0c59c06 1 58 0 wz--n- 6.35T 4.80T
VG_XenStorage-e153d09a-716a-9764-8967-f704278d55bd 1 43 0 wz--n- 6.35T 4.45T
VG_XenStorage-f8574b51-31d4-7b0e-c71e-8253e1cdd230 1 61 0 wz--n- 6.35T 4.20T
[root@cnode01-m ~]# ls -la /dev/sd[a-z]
brw-r----- 1 root disk 8, 0 Jun 8 17:37 /dev/sda
brw-r----- 1 root disk 8, 16 Aug 1 10:14 /dev/sdb
brw-r----- 1 root disk 8, 32 Jun 8 17:38 /dev/sdc
brw-r----- 1 root disk 8, 48 Jul 31 14:49 /dev/sdd
brw-r----- 1 root disk 8, 64 Jul 31 14:46 /dev/sde
brw-r----- 1 root disk 8, 80 Jul 31 14:51 /dev/sdf
brw-r----- 1 root disk 8, 96 Aug 3 13:52 /dev/sdg
brw-r----- 1 root disk 8, 112 Aug 3 10:53 /dev/sdh
brw-r----- 1 root disk 8, 128 Aug 2 13:40 /dev/sdi
brw-r----- 1 root disk 8, 144 Jul 30 00:17 /dev/sdj
brw-r----- 1 root disk 8, 160 Jul 30 00:17 /dev/sdk
brw-r----- 1 root disk 8, 176 Jul 30 00:17 /dev/sdl
brw-r----- 1 root disk 8, 192 Jul 30 00:17 /dev/sdm
brw-r----- 1 root disk 8, 208 Jul 30 00:17 /dev/sdn
brw-r----- 1 root disk 8, 224 Jul 30 00:17 /dev/sdo
brw-r----- 1 root disk 8, 240 Jul 30 00:17 /dev/sdp
brw-r----- 1 root disk 65, 0 Jul 30 00:17 /dev/sdq
This is my DomU (VM) environment.
[root@i-58-7172-VM ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
16G 1.5G 14G 11% /
/dev/xvda1 99M 30M 65M 32% /boot
tmpfs 512M 0 512M 0% /dev/shm
When I put a heavy I/O load on the / partition in the VM and the iSCSI connection has a problem
(network problem, iSCSI target failover event), the / partition crashes.
How can I solve this problem? Thank you very much in advance.
Added
This is my iscsid.conf on Dom0.
[root@cnode01-m ~]# more /etc/iscsi/iscsid.conf
node.startup = manual
node.session.timeo.replacement_timeout = 86400
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.noop_out_interval = 0
node.conn[0].timeo.noop_out_timeout = 0
node.session.initial_login_retry_max = 4
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.conn[0].iscsi.MaxRecvDataSegmentLength = 131072
discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768
node.session.iscsi.FastAbort = No
10G Ethernet and jumbo frames are already implemented at the storage layer.
Citrix XenServer also has a command for pausing VMs when the storage service has an issue,
but pausing and unpausing a VM breaks the integrity of the VM's system clock.
So I think it may have side effects, normally at the application layer.
Best Answer
First, you should address the source of the issue: storage access. With iSCSI you can tweak iscsid.conf and increase the queue length, buffer sizes, and timeouts so that the connection can sustain longer outages. Besides that, implementing multipathing, 10G Ethernet (if the SAN supports it), and jumbo frames is a good idea.
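Before changing any of these values, it may help to see what is currently configured. The following is only a minimal sketch: the function names iscsi_setting and show_iscsi_timeouts are made up for illustration, and the path /etc/iscsi/iscsid.conf is assumed from the question. It reads the file and prints the timeout and queue settings mentioned above. It does not change anything and does not suggest particular values.

```shell
# Print the value of one "key = value" setting from an iscsid.conf-style file.
# $1 = path to the file, $2 = setting name (hypothetical helper, not part of open-iscsi)
iscsi_setting() {
    awk -F'=' -v key="$2" '
        /^[ \t]*#/ { next }                        # skip comment lines
        {
            name = $1; value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", name)      # trim whitespace around the name
            gsub(/^[ \t]+|[ \t]+$/, "", value)     # trim whitespace around the value
            if (name == key) found = value         # remember the last matching line
        }
        END { if (found != "") print found }
    ' "$1"
}

# Print the timeout and queue settings discussed in this thread.
# $1 = path to the file (defaults to the location shown in the question)
show_iscsi_timeouts() {
    conf="${1:-/etc/iscsi/iscsid.conf}"
    for key in \
        node.session.timeo.replacement_timeout \
        'node.conn[0].timeo.noop_out_interval' \
        'node.conn[0].timeo.noop_out_timeout' \
        node.session.cmds_max \
        node.session.queue_depth
    do
        printf '%s = %s\n' "$key" "$(iscsi_setting "$conf" "$key")"
    done
}
```

On Dom0 you would call show_iscsi_timeouts with no argument, or pass another path to compare a backup copy. As far as I know, iscsid.conf only supplies defaults for node records created at discovery time, so records that already exist may hold different values; treat the output as a starting point, not as the effective session settings.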
I'm no Xen expert, but with KVM there is an option to pause the VMs when the storage layer returns EIO or ENOSPC. In my opinion it should be possible with Xen too if you dig into the options; if not, I'd try filing a feature request with the developers.