How to upgrade Xen with minimal risk and downtime

virtualization xen

I've recently noticed that one of my servers is running a pretty old version of Xen:

$ dpkg-query -l | grep xen
ii  libc6-xen                          2.11.3-4                     Embedded GNU C Library: Shared libraries [Xen version]
ii  libxenstore3.0                     4.0.1-5.10                   Xenstore communications library for Xen
ii  linux-image-2.6.32-5-xen-686       2.6.32-48squeeze1            Linux 2.6.32 for modern PCs, Xen dom0 support
ii  xen-hypervisor-4.0-amd64           4.0.1-5.10                   The Xen Hypervisor on AMD64
ii  xen-linux-system-2.6-xen-686       2.6.32+29                    Xen system with Linux 2.6 for modern PCs (meta-package)
ii  xen-linux-system-2.6.32-5-xen-686  2.6.32-48squeeze1            Xen system with Linux 2.6.32 on modern PCs (meta-package)
ii  xen-tools                          4.2-1                        Tools to manage Xen virtual servers
ii  xen-utils-4.0                      4.0.1-5.10                   XEN administrative tools
ii  xen-utils-common                   4.0.0-1                      XEN administrative tools - common files
ii  xenstore-utils                     4.0.1-5.10                   Xenstore utilities for Xen

The Dom0 is quite old too:

$ uname -a 
Linux Dom0 2.6.32-5-xen-686 #1 SMP Mon Feb 25 05:55:06 UTC 2013 i686 GNU/Linux

I am not very familiar with production servers like this, and I would rather think twice before running:

$ apt-get update
$ apt-get upgrade

What do I need to check before doing the upgrade, and do I need to shut down and reboot all my VMs during the upgrade?

Best Answer

Minimal risk and downtime are somewhat subjective terms in this case, and what is achievable may be limited by the available resources.

The 'ideal' way to update with no downtime and minimal risk to VM data would involve multiple servers (at least three, possibly more depending on load and storage requirements):

  • Backend storage for the VMs, ideally not on a hypervisor. Virtual machine images and snapshots can be stored here, as well as data that may need to be accessed by multiple VMs.
  • Two hypervisor systems.
  • Depending on the total number of hypervisors, the IO requirements, and the storage required, a dedicated high-speed network between the hypervisors and the storage servers could improve performance.

Once the systems are in place, it is relatively simple to migrate a live VM. Once the migration from server0 to server1 has completed and everything is verified to be running correctly on server1, the relevant services on server0 can be stopped and upgraded.
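As a rough sketch of the migration step described above, assuming the `xm` toolstack that ships with Xen 4.0: the helper name `migrate_guest`, the `DRY_RUN` switch, and the guest and host names are made up for illustration. By default it only prints the command it would run, so it is safe to try on a machine without Xen.

```shell
#!/bin/sh
# Hypothetical helper, not part of Xen: live-migrate one guest to another host.
# DRY_RUN=1 (the default here) only prints the command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

migrate_guest() {
    guest="$1"    # domain name as shown by `xm list`
    target="$2"   # hostname of the receiving hypervisor
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: xm migrate --live $guest $target"
    else
        xm migrate --live "$guest" "$target"
    fi
}
```

For example, `migrate_guest web01 server1` prints the command for a guest called web01. Live migration generally also needs the relocation server enabled in the target's xend configuration and the guest's disks reachable from both hosts, so treat this as a starting point rather than a complete procedure.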

If you have the resources to set up this sort of infrastructure, there can be many advantages to running hypervisors/VM pools in this manner. Having a tested and documented process for migrating VMs between hypervisors will allow you to schedule regular maintenance and downtime on the hypervisors. Planned upgrades and maintenance allow you to stay on top of updates that can impact security and performance.

Having even a minimal infrastructure set up to allow temporary migration of services between hypervisors can reduce the impact on, and visibility to, customers if/when critical security patches need to be applied to production systems.

In cases where some downtime is acceptable and the infrastructure for an 'ideal' update scenario is not available, I have generally had success using the following process, though the occasional unforeseen issue can occur. Based on past experience, keeping a hot spare for critical systems and infrastructure, if possible, is always a good idea. I have used some variation of this set of steps with KVM and Xen on openSUSE and CentOS:

  1. Make sure all backups and snapshots from VMs are up to date
  2. Shut down running VMs as gracefully as possible
  3. Upgrade/patch the hypervisor
  4. Reboot the hypervisor. This is not strictly necessary, but depending on the upgrades performed it may be the easiest way to make sure all the changes take effect.
  5. Pace about the server room while waiting for hypervisor to reboot
  6. Restart the VMs
  7. Test to see if everything is working
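The steps above could be sketched roughly as follows for a Debian dom0 like the one in the question, assuming the `xm` toolstack and `apt-get`. The `run` and `upgrade_hypervisor` helpers and the `DRY_RUN` switch are hypothetical names for illustration. Backups (step 1) are assumed to have been taken already, and restarting and testing the guests (steps 6 and 7) is left to do by hand after the reboot.

```shell
#!/bin/sh
# Hypothetical sketch of the maintenance window, not a tested procedure.
# DRY_RUN=1 (the default here) only prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

upgrade_hypervisor() {
    run xm list               # note which guests are running before touching anything
    run xm shutdown -a -w     # step 2: ask all guests to shut down and wait for them
    run apt-get update        # step 3: refresh the package lists
    run apt-get -s upgrade    # simulate only, to review what would change
    run apt-get upgrade       # the real upgrade (prompts for confirmation)
    run reboot                # step 4: reboot the dom0 so the changes take effect
}
```

Calling `upgrade_hypervisor` with the default `DRY_RUN=1` prints the planned commands in order, which can be useful as a checklist. In practice you would probably stop after the simulated upgrade and read its output before going any further.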