Our EC2 instance (Windows Server 2008) has crashed multiple times over the past 3 months (the last time was today at 1:05 EST). Upon reviewing the MEMORY.DMP file, we noticed that a possible cause of the crashes is rhelnet.sys (RedHat PV NIC Driver).
The server's Event Viewer shows the following records right after the crash:
Critical - Kernel Power:
The system has rebooted without cleanly shutting down first.
This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
BugCheck:
The computer has rebooted from a bugcheck. The bugcheck was:
0x000000d1 (0x000000000000002d, 0x0000000000000002, 0x0000000000000000, 0xfffff88001402d14).
A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: 100113-35849-01.
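For anyone reading the BugCheck record above, here is a minimal Python sketch that splits the logged line into the bug check code and its four arguments and labels them. The argument meanings are the ones Microsoft documents for bug check 0xD1 (DRIVER_IRQL_NOT_LESS_OR_EQUAL); the helper name parse_bugcheck and the D1_PARAMS table are just illustrative names, not part of any tool.

```python
import re

# Meaning of the four arguments for bug check 0xD1
# (DRIVER_IRQL_NOT_LESS_OR_EQUAL), per Microsoft's bug check reference.
D1_PARAMS = (
    "memory address referenced",
    "IRQL at time of reference",
    "access type (0 = read, 1 = write)",
    "address of the instruction that referenced the memory",
)

def parse_bugcheck(line):
    """Split an Event Viewer bugcheck line into (code, [args]) as integers.

    Hypothetical helper for illustration only.
    """
    values = [int(v, 16) for v in re.findall(r"0x[0-9a-fA-F]+", line)]
    if len(values) != 5:
        raise ValueError("expected a bug check code followed by four arguments")
    return values[0], values[1:]

# The line exactly as it appears in the event record above.
line = ("0x000000d1 (0x000000000000002d, 0x0000000000000002, "
        "0x0000000000000000, 0xfffff88001402d14).")
code, args = parse_bugcheck(line)

if code == 0xD1:
    for label, value in zip(D1_PARAMS, args):
        print(f"{label}: {value:#x}")
```

Read this way, the record says that code running at IRQL 2 (DISPATCH_LEVEL) tried to read address 0x2d, and the faulting instruction was at 0xfffff88001402d14. That pattern usually points at a kernel driver rather than failing hardware, which would be consistent with the dump naming rhelnet.sys, but treat that as a likely reading rather than a certainty.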
Could this be a hardware issue? Would it help if we stop and start the instance? Or is it more likely that this is caused by the software running on the system?
[Update 10.01.2013]
An Amazon rep suggested upgrading the RedHat drivers on our instance to the Citrix PV drivers.
[Update 10.08.2013]
We performed a driver upgrade on the cloned instance. Right after the upgrade we noticed the following errors in Event Viewer:
Xennet6 errors in Event Viewer (Event ID# 5001)
After digging a bit more, I found this article, which suggests installing the latest Citrix drivers. Unfortunately, this didn't help at all, and our cloned instance became unresponsive.
[Update 10.08.2013 2]
I recreated the instance and updated the PV drivers again.
After searching the Internet, I found this article where an Amazon rep explains that:
"Event ID 5001 from source Xennet6 cannot be found" message does not
indicate anything wrong, just that the PV driver is looking for a feature
that we have not implemented in our version of Xen.
I will keep my test system running for a while to see if there are any issues with it.
Best Answer
Upgrading the drivers as suggested by the Amazon rep fixed the issue.
Regarding the
Event ID 5001...
issue, below is the reply I got from Amazon: