Amazon EC2 Windows Server 2008 – Reboot/Boot Failure – Logs

amazon ec2, boot, windows-server-2008

For the 2nd time in 24 hours, our Web Production and Database Server is failing to boot.
It has IIS7 and MSSQL 2008 R2 installed and runs ASP.NET and legacy classic ASP applications.

The initial failure was apparently due to a pagefile size error. These boot errors were visible in the system logs after we mounted the volume from the failed instance on another existing instance and opened the logs in Event Viewer.
Microsoft said there was a hotfix available. After a fresh server re-image, applying the hotfix, and redoing all the system configuration (a few reboots and 8 hours later), there were no errors.

… until the 2nd failure happened after a simple reboot. Viewing the event logs in the same way as before, we are none the wiser as to why this instance (which is practically identical to our other EC2 instances, except for the classic ASP applications) is refusing to boot. There are no security, system, application, or any other errors to speak of.

So the broad question is: how can we find out what the h*ll has happened? Can someone suggest where to look in the recovered volume for startup errors etc.?

2nd question: any bright ideas for getting it working again?
We notice that the "Get System Log" option in the Amazon interface just results in a blank screen.

Many Thanks
Brett

Best Answer

I understand your frustration, as we deal heavily with Windows images on EC2, both as a consumer and as an EC2 solution/management tool provider.

The blank log retrieval is a huge annoyance, and seems to just happen every so often. I contacted Amazon about this, and the general response was that sometimes the log info is blank... Not much to go on, so I am just passing along what I learned from my own communications about this.
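
Since the blank log seems to be intermittent, one thing that may be worth trying is asking for it several times from a script instead of clicking "Get System Log" over and over. The following is only a rough sketch, not a fix: it assumes the `boto3` library, whose EC2 client has a `get_console_output` call (the API counterpart of "Get System Log"). The helper name `fetch_console_output`, its parameters, and the instance ID in the usage comment are all made up for illustration.

```python
import time


def fetch_console_output(client, instance_id, attempts=5, delay_seconds=30.0,
                         sleep=time.sleep):
    """Ask for the instance's console output until it is non-blank.

    `client` is assumed to behave like a boto3 EC2 client, i.e. it has
    get_console_output(InstanceId=...) returning a dict that may or may
    not contain an "Output" key. Returns the text, or None if every
    attempt came back blank.
    """
    for attempt in range(1, attempts + 1):
        response = client.get_console_output(InstanceId=instance_id)
        # A missing or None "Output" is treated the same as an empty log.
        output = (response.get("Output") or "").strip()
        if output:
            return output
        if attempt < attempts:
            sleep(delay_seconds)  # wait before asking again
    return None


# Hypothetical usage (needs boto3 installed and AWS credentials configured):
#
#   import boto3
#   ec2 = boto3.client("ec2")
#   log = fetch_console_output(ec2, "i-0123456789abcdef0")  # placeholder ID
#   print(log if log is not None else "Still blank after retries")
```

If this still returns nothing after several attempts, you are no worse off than with the web console, and the mounted-volume approach from the question remains the fallback.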

Reboots of Windows EC2 instances also make me nervous. After many failed reboots I can only recommend that you be at the ready to restore your image and database if need be. I spent many hours searching for information about reboot delays on EC2, and generally there are two findings:

  1. It can simply be really, really slow. Think hours, not minutes.
  2. I was once able to find a posting that seemed to hint that Windows instances with SSL certificates can be a problem after a reboot. Sounds weird, but again I am just passing on the information.

Good luck!
