Linux – Upgrading Ubuntu remotely: How to minimize the risk of losing the server

Background: I am forced to remotely upgrade a server from Ubuntu 8.04 LTS to 10.04 LTS because of an incompatibility issue with the RAID controller.

The internet connection to the server is somewhat stable and seldom drops. Despite that, I am concerned about losing the SSH connection during the upgrade and leaving the server in an unreachable state. I am also worried that the server might not boot after the upgrade, in which case I would have no way of knowing what the problem is.

Action plan: I am looking for advice on minimizing the risk of losing the server; I am aware that what I am doing is very risky. This is my current action plan:

1) Back up everything that matters, locally and externally.

2) Temporarily disable boot-time disk checks with fsck (I would have no idea what was going on if the disk check took a long time to finish). This would be done in /etc/fstab by changing the very last field (the fsck pass number) of the root entry from 1 to 0:

UUID=5b1ff964-7608-44fd-a38d-7e43ad6b4c11 /               ext3    relatime,errors=remount-ro 0       0
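The fstab edit in step 2 can also be scripted. Below is a minimal sketch (the function name set_root_passno is hypothetical) that sets the sixth field of the "/" entry, skips comment lines, and keeps a backup first. Note that awk rewrites the modified line with single spaces, so column alignment is lost on that line only.

```shell
#!/bin/sh
# Hypothetical helper: set the fsck pass number (6th field) of the "/" entry
# in an fstab-format file. Defaults: file /etc/fstab, pass number 0.
# A copy is saved to FILE.bak before anything is changed.
set_root_passno() {
    file=${1:-/etc/fstab}
    pass=${2:-0}
    cp -p "$file" "$file.bak" || return 1
    # Read from the backup and write back into the original file, so the
    # original keeps its inode, owner and permissions.
    awk -v pass="$pass" '
        # Only non-comment entries whose mount point (2nd field) is "/"
        $1 !~ /^#/ && $2 == "/" && NF >= 6 { $6 = pass }
        { print }
    ' "$file.bak" > "$file"
}
```

Run it as root as set_root_passno /etc/fstab before the upgrade, and set_root_passno /etc/fstab 1 afterwards to re-enable the check.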

3) Start all upgrade processes inside screen so that they can be resumed if I lose the connection, e.g.:

sudo screen apt-get upgrade
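A sketch of how step 3 plays out, assuming GNU screen and two hypothetical helper names, run as root: create a named session once, and after a dropped connection log in again and reattach to it rather than starting a second upgrade.

```shell
#!/bin/sh
# Hypothetical helpers around GNU screen; run them as root.

# Create a session (named "upgrade" unless another name is given) and attach
# to it; run the apt-get commands inside it. -L turns on logging of the
# window's output (by default to screenlog.0), so it can be reviewed later.
upgrade_start() {
    screen -L -S "${1:-upgrade}"
}

# After reconnecting over SSH: -d -r detaches the session from the dropped
# login (it may still be marked as attached) and reattaches it here.
upgrade_resume() {
    screen -d -r "${1:-upgrade}"
}
```

If you forget the session name, screen -ls lists the existing sessions.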

Questions:

  • Does my proposed action plan seem reasonable?
  • Is disabling the boot-time disk check a bad idea?
  • What else could be done to decrease the risk of losing the server?

Update: Almost all answers suggested that I set up DRAC/IPMI, which I have now done. This feels like a great achievement that should make the risk much smaller, as I can follow the entire power cycle over KVM/console redirection. For future reference, this is what I did:

1) Installed ipmitool to set up the IP address, gateway, etc. for IPMI v2.0:

sudo ipmitool lan set 1 ipaddr 192.168.1.99 
sudo ipmitool lan set 1 defgw ipaddr 192.168.1.1
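The two commands above set only the address and gateway. A slightly fuller sketch, assuming the same LAN channel 1 and a netmask you supply (the function name is hypothetical), also forces a static address source, since the BMC may otherwise still be configured for DHCP, and prints the channel settings for checking:

```shell
#!/bin/sh
# Hypothetical helper: static BMC network settings on LAN channel 1.
# Run as root on the server itself; ipmitool talks to the local BMC.
# Arguments: IP address, netmask, default gateway.
configure_bmc_lan() {
    ip=$1
    mask=$2
    gw=$3
    # Stop at the first failing command; print the result at the end.
    ipmitool lan set 1 ipsrc static &&
    ipmitool lan set 1 ipaddr "$ip" &&
    ipmitool lan set 1 netmask "$mask" &&
    ipmitool lan set 1 defgw ipaddr "$gw" &&
    ipmitool lan print 1
}
```

For the setup above this would be called as configure_bmc_lan 192.168.1.99 255.255.255.0 192.168.1.1, where the /24 netmask is only an example.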

2) Installed FreeIPMI to change the NIC selection mode to shared (I have only one network interface connected to the network):

sudo ipmi-oem dell set-nic-selection shared 

3) Used the DRAC's HTTPS interface at https://192.168.1.99 to launch the console redirection viewer. This lets me follow the entire boot sequence as well as configure the BIOS, RAID controllers, etc. Awesome.


Update 2: Done. Everything went like a charm and took less than 30 minutes. I ended up not turning off the disk check, as the redirected console gave me the freedom to interrupt it whenever I wanted, but I let it run to the end.

Thank you guys, your wisdom is invaluable!

Best Answer

Unless the hardware breaks, there is nothing you can't do with a serial console, so that's the way to go:

  • get remote access to a serial console (IPMI Serial over LAN if the system supports IPMI 2.0 or later, or a null-modem serial cable connected to another system where you run minicom)
  • configure GRUB and Linux to use the serial console
  • redirect the system BIOS interface to the serial port if possible (many server systems can do that)
  • reboot the system and check that you can use the BIOS (if redirected) and GRUB, see the dmesg output and init scripts, and log in, all over the serial console
  • run the upgrade
  • cross your fingers
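For the "configure GRUB and Linux" step, a minimal sketch for an Ubuntu 8.04 system, which boots with GRUB legacy, might look like the fragment below. It assumes the first serial port (ttyS0, GRUB unit 0) at 115200 baud 8N1; on Dell servers the port wired to Serial over LAN is often the second one (ttyS1, unit 1), so check the BIOS serial communication settings. The root UUID is taken from the fstab line in the question; keep whatever root= your own menu.lst already has. GRUB 2 configuration and a getty on the serial port (needed for the login prompt) are not shown.

```
# /boot/grub/menu.lst (GRUB legacy). Place these two lines near the top,
# before the first "title" entry.
serial --unit=0 --speed=115200 --word=8 --parity=no --stop=1
terminal --timeout=10 serial console

# Debian/Ubuntu's update-grub copies the options in this magic comment onto
# every generated kernel line. Append the console= options to your existing
# line; the last console= listed becomes /dev/console.
# kopt=root=UUID=5b1ff964-7608-44fd-a38d-7e43ad6b4c11 ro console=tty0 console=ttyS0,115200n8
```

After editing the kopt line, run sudo update-grub so the kernel entries are regenerated.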

Also, install the new system on another disk or partition if at all possible, so you can test the new system before erasing the old one. On two-disk systems I usually do it like this: I take one disk out of the mirror, create a new (degraded) mirror with the freed disk, and install there; if everything is OK, I destroy the old mirror, hot-add the 'old' disk to the new mirror and let it rebuild.
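A sketch of that mirror-splitting procedure, assuming Linux software RAID managed with mdadm rather than a hardware RAID controller. All device names are hypothetical: the current system on RAID1 /dev/md0 built from /dev/sda1 and /dev/sdb1, and /dev/md1 free for the new mirror. Installing and configuring the boot loader on the new mirror is not shown, and wrong device names here destroy data.

```shell
#!/bin/sh
# Hypothetical helpers for the disk-swap approach with mdadm (run as root).

# Step 1: take one disk out of the old mirror, wipe its RAID metadata, and
# build a degraded RAID1 on it ("missing" reserves the second slot).
# Arguments: old md device, disk to move, new md device.
split_mirror() {
    old_md=$1 disk=$2 new_md=$3
    mdadm "$old_md" --fail "$disk" --remove "$disk" &&
    mdadm --zero-superblock "$disk" &&
    mdadm --create "$new_md" --level=1 --raid-devices=2 "$disk" missing
}

# Step 2, only after the new system boots and works from the new mirror:
# stop the old mirror and hot-add its remaining disk to the new one.
# Arguments: old md device, its remaining disk, new md device.
join_mirror() {
    old_md=$1 disk=$2 new_md=$3
    mdadm --stop "$old_md" &&
    mdadm --zero-superblock "$disk" &&
    mdadm "$new_md" --add "$disk"
}
```

With the names above, this would be split_mirror /dev/md0 /dev/sdb1 /dev/md1, then the install, then join_mirror /dev/md0 /dev/sda1 /dev/md1; the rebuild progress shows up in /proc/mdstat.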

EDIT: I read that it's a Dell R710; AFAIK that should have IPMI 2.0. Configure it by running ipmitool locally on the system, and test the Serial over LAN feature by running ipmitool sol activate (over the lanplus interface) from another system. Bang! You have your serial console. Dells can also redirect the BIOS to the serial console (which IPMI in turn redirects over Serial over LAN). You should set that up anyway, so you can reach the system if anything goes really wrong. I manage a couple of old Dell PE1425s using null-modem cables with BIOS, GRUB and system serial consoles, and a couple of Dell R300s the same way, but using IPMI Serial over LAN instead of a physical serial cable.
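A minimal sketch of that Serial over LAN test with ipmitool. The function names are hypothetical; the defaults assume LAN channel 1, the BMC address 192.168.1.99 from the question, and "root" as the iDRAC user (a common Dell default), so substitute your own values.

```shell
#!/bin/sh
# Hypothetical helpers for testing IPMI Serial over LAN with ipmitool.

# On the server (as root): enable SOL on LAN channel 1 and show its settings.
sol_enable_local() {
    ipmitool sol set enabled true 1 &&
    ipmitool sol info 1
}

# On another machine: open the serial console over the network.
# SOL needs the IPMI 2.0 "lanplus" interface; -a prompts for the password.
# Type ~. to end the session.
sol_connect() {
    ipmitool -I lanplus -H "${1:-192.168.1.99}" -U "${2:-root}" -a sol activate
}
```

If sol_connect shows the GRUB menu and a login prompt after a reboot, the console path works end to end.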
