Windows – How to kill a hung service on Windows 2008R2

windows, windows-server-2008-r2, windows-service

I have a Windows 2008R2 server running NSClient++. For some reason the service has got its knickers in a twist and stopped responding to Nagios polling.

When I tried to restart the service, the service manager took a long time trying to stop it and eventually gave up with a message along the lines of "the service took too long to respond". But it also started a new instance of the service.

If I look in Task Manager or tasklist I can now see two instances of nsclient++.exe running.
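
To tell which of the two instances the Service Control Manager still tracks, sc queryex <service> prints the service state and the PID it has on record. Below is a minimal Python sketch that pulls those two fields out of that output. The service name NSClientpp and the sample text are assumptions for illustration only; neither was captured from the server in question, and the installed service name may differ.

```python
import re
import subprocess


def query_service(service_name):
    """Windows only: return the raw text printed by `sc queryex <service>`."""
    return subprocess.run(
        ["sc", "queryex", service_name],
        capture_output=True, text=True,
    ).stdout


def parse_sc_queryex(text):
    """Extract (state_name, pid) from `sc queryex` output; None if a field is absent."""
    state = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", text, re.MULTILINE)
    pid = re.search(r"^\s*PID\s*:\s*(\d+)", text, re.MULTILINE)
    return (
        state.group(1) if state else None,
        int(pid.group(1)) if pid else None,
    )


# Illustrative output only (hypothetical service name and values).
sample = """\
SERVICE_NAME: NSClientpp
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 3  STOP_PENDING
                                (NOT_STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN)
        WIN32_EXIT_CODE    : 0  (0x0)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x0
        WAIT_HINT          : 0x0
        PID                : 6672
        FLAGS              :
"""
state, pid = parse_sc_queryex(sample)
```

Any nsclient++.exe PID shown by tasklist that does not match the PID reported here would be the orphaned instance.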

I tried to kill both of these using:

  • Right-click and "End Process" in Task Manager: appears to kill the process and reports no error (such as Access Denied), but the process is still there.

  • taskkill /PID <proc id> /F: reports "SUCCESS: The process with PID 6672 has been terminated." but the process is still running.

  • Downloaded SysInternals PsTools and ran pskill <PID>: reports "Process <PID> killed", yet the process is still there.

  • Scheduled pskill <PID> with at hh:mm so that it runs under the SYSTEM account, and you guessed it, the process is still running.

All of the above were run in an Administrator command prompt.
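
Since every tool here reports success while the process survives, it helps to verify the kill independently rather than trust the message. Below is a rough Python sketch, assuming tasklist is filtered to a single PID with /FI "PID eq <pid>" and asked for CSV output without a header (/FO CSV /NH). The function names and the polling parameters are hypothetical, chosen for illustration.

```python
import csv
import io
import subprocess
import time


def tasklist_for_pid(pid):
    """Windows only: raw `tasklist` CSV output filtered to a single PID."""
    return subprocess.run(
        ["tasklist", "/FI", f"PID eq {pid}", "/FO", "CSV", "/NH"],
        capture_output=True, text=True,
    ).stdout


def pid_listed(pid, tasklist_output):
    """True if the PID appears in the second CSV column of any output row."""
    for row in csv.reader(io.StringIO(tasklist_output)):
        if len(row) >= 2 and row[1] == str(pid):
            return True
    return False


def wait_for_exit(pid, query=tasklist_for_pid, attempts=5, delay=1.0):
    """Poll until the PID disappears; return False if it is still listed at the end."""
    for _ in range(attempts):
        if not pid_listed(pid, query(pid)):
            return True
        time.sleep(delay)
    return False
```

On the server, running taskkill /PID 6672 /F and then calling wait_for_exit(6672) would return False for the behaviour described above: the kill is acknowledged but the process never leaves the list.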

Other than a reboot which is not really ideal (the box is a fairly mission critical production server), what else can I try?

The server isn't under any resource pressure (memory, CPU, disk etc) and everything running on it is chugging along just fine.

A quick look at the Threads tab in SysInternals Process Explorer shows that all of these nsclient++.exe instances are stuck unloading:

[Screenshot: Process Explorer Threads tab for nsclient++.exe]

As an aside, I also tried killing all of the TCP connections for these zombie(?) processes (with TCPView) in the hope that I could start a new instance and it would be able to grab port 5666. Then we could reboot the server when things are quieter, but alas that didn't work.

Best Answer

Even though it seems you've figured this out already, the problem is that the process is waiting on the kernel for something. (This is usually a driver-level problem, but not always.) The only way to kill such a process is to unload the kernel, which, of course, you can't do without rebooting.

Might be worth trying some kernel debugging (does this tool work on 2008 R2?) in the hopes of narrowing down the specific cause or conflict, but your options for handling the problem are either living with it, or rebooting the server to eliminate it.

Is there a reason you haven't considered living with it? If it's just a zombie process and it's not impacting anything, I'd think you could put off a reboot until a maintenance window or a more opportune time. That is typically my approach when the zombie or hung process isn't interfering with anything: take care of it during the next patch cycle or scheduled maintenance window.