Web-server – Best practices for Windows Server 2003 live server reboot

maintenance web-server windows-server-2003

We have a Windows Server 2003 box running as a web server in a remote data centre. Last night I installed a large batch of Windows Updates and then triggered a reboot around 1:30am via RDP. The reboot failed and although the server is still running, we're locked out via RDP. Cue panic and mayhem!

I did not route the applications to another machine (which is only partly possible with the equipment we currently have) because a few hours' downtime for upgrades in the middle of the night is acceptable to our customers.

For now, I am liaising with the data centre staff to reboot the machine via other means late tonight, but obviously I want to avoid this in the future.

My questions:

  • How can I prevent losses of service like these? Note that we are a very small company with a light load on our server, so while I am interested in best practices that involve buying lots of extra hardware, I would prefer to hear about cheaper things that can be done.
  • Is having RDP as the only means to perform certain important tasks (namely modifying the database to solve customer service issues) a liability?

Details of the machine:

  • Windows Server 2003, unsure which service pack
  • Running two websites on ASP.NET 3.5, and another four on ASP classic
  • SQL Server 2005 backend for all the websites
  • Uses a continuous backup solution to an identically configured machine that is ready to be plugged in and switched on
  • Running a VMware instance that contains a staging environment and is not mission critical
  • One hard drive partition with mirroring
  • 4GB RAM
  • Core 2 Duo, ~2 GHz

Thanks in advance. More information on request.

UPDATE:

Some excellent answers here so far.

For remote management, several have suggested using KVM and remote power management over IP, or hardware such as HP's iLO or Dell's DRAC. We have HP servers so I will look into iLO. Irritatingly, our hosting centre has KVM over IP for all its machines but doesn't allow customers access to it, as it's not set up securely. In selecting future hosting services, I will make sure this is not the case.

For prevention, mh suggested stopping services and closing sessions that may be preventing the reboot. In our case that would probably have identified the issue and prevented the problem. It seems like the VMware instance running our staging environment was not shut down and that stopped the main server restarting.
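As a rough illustration of that pre-reboot routine, here is a minimal Python sketch that builds an ordered list of commands: any extra steps first (for example shutting down the staging VM), then stopping services, then the restart itself. It assumes a Python 3.5+ interpreter is available on the server, which may not be the case on a Windows Server 2003 box; the same sequence can simply be typed by hand. The function names `build_reboot_plan` and `run_plan` are hypothetical, the service names `W3SVC` (IIS) and `MSSQLSERVER` (default SQL Server instance) are assumptions about this machine, and the command for shutting down the VM is deliberately left to the caller because it depends on the VMware product in use.

```python
import subprocess


def build_reboot_plan(services, extra_steps=(), delay_seconds=60):
    """Return the ordered commands for a manual reboot (hypothetical helper).

    extra_steps: caller-supplied commands to run first, e.g. whatever
                 shuts down a guest VM on this particular host.
    services:    Windows service names to stop with `net stop`.
    """
    plan = [list(step) for step in extra_steps]
    plan += [["net", "stop", name] for name in services]
    # `shutdown /r /t N` restarts the machine after N seconds.
    plan.append(["shutdown", "/r", "/t", str(delay_seconds)])
    return plan


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False.

    Returns False as soon as a command exits non-zero, so the reboot
    is never triggered after a step that failed.
    """
    for argv in plan:
        print(" ".join(argv))
        if not dry_run and subprocess.call(argv) != 0:
            print("Step failed, stopping before the reboot.")
            return False
    return True


if __name__ == "__main__":
    # Dry run only: prints the commands without executing anything.
    run_plan(build_reboot_plan(["W3SVC", "MSSQLSERVER"]))
```

Running it in the default dry-run mode only prints the plan, which is a cheap way to review the order before doing anything for real. Note that `net stop` also returns a non-zero exit code for a service that is already stopped, so the sketch would halt in that case too.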

John Gardeniers suggested performing reboots manually after installing updates, and not letting Windows Update carry out the reboot. I will do this in future.
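After a manually triggered reboot it also helps to confirm from another machine that the server has actually come back, rather than finding out from customers. The following is a minimal sketch, not a monitoring solution: it checks whether TCP ports 80 (HTTP) and 3389 (RDP) accept connections. The function names `port_open` and `check_server` are hypothetical, and the port list is an assumption about which services matter on this server.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_server(host, ports=(80, 3389)):
    """Map each port to whether it currently accepts connections."""
    return {port: port_open(host, port) for port in ports}
```

For example, `check_server("web01.example.com")` (a placeholder host name) returns a dictionary keyed by port. A result where port 80 is open but 3389 is closed would match the situation described in the question: the websites are up but RDP is not reachable.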

Thanks everyone.

Best Answer

There are several options for accessing the console remotely without relying on RDP into a working Windows install:

  1. Lights Out Management - some servers from Sun, HP, IBM and several others have a Lights Out Management (LOM) chip. Most of my experience is with HP's iLO technology, which has saved me several trips. Essentially, this chip gives you authenticated remote access to the controls on the front of the server and, in most instances, much more. The downside is that if your server doesn't have it, you can't use it.

  2. IP KVMs or Remote Power Management - several vendors provide products that let you either access the keyboard, video and mouse remotely (IP KVM) or, less expensively, manage power remotely by cycling the mains power to the server off and on again via a PDU. The latter option is not recommended unless you are sure that simply power cycling the server will help.

The final option would be to not install updates overnight, and instead schedule short outages during the day when you or your "hands and eyes" can be at the data center to sort out any problems. This is at the whim of your customer really, although a quarterly downtime window is often a good thing to build into your agreements.