C# – the proper way for a Windows service to fail

c# windows-services

I have inherited a Windows service written in C#. Under rare conditions it fails badly. However, it isn't at all clear how to fail well. Ross Bennett states the problem elegantly at bytes.com. For the sake of simplicity I will just quote him here.

Ahoy, Folks!

I've been looking all over for this,
but I just can't seem to shake any
documentation out of the MSDN or from
Google. I've reviewed every .NET
article on developing Windows Services
in the MSDN I've located.

I'm developing a Windows Service
application. This service reads its
configuration data from the system
registry (HKLM) where it was deposited
by another "manager" application. No
problems there.

The service uses a worker thread to do
its work. The thread is created in the
OnStart() and signaled/joined/disposed
in the OnStop(). Again, no problems.

Everything works beautifully when:

  1. The system administrator has set up everything properly, and
  2. the foreign network resources are all reachable.

But of course, we as developers simply
can't rely on:

  1. The system administrator having set up everything properly, or
  2. the foreign network resources being reachable.

Really, what we need is for the
service application to have some way
of dying on its own. If a network
resource goes down, we need the
service to stop. But more to the
point, we need the SCM to know it has
stopped on its own accord. SCM needs
to know that the service has
"failed"…and hasn't just been shut
down by someone.

Calling "return" or throwing an
exception in the "OnStart()" method
isn't even helpful for services still
in the start-up process. The SCM goes
merrily on and the process keeps
running in the Task Manager–though
it's not actually doing anything since
the worker thread was never created
and started.

Using a ServiceController instance
doesn't do it, either. That appears to
the SCM as a normal shutdown–not a
service failure. So none of the
recovery actions or restarts happen.
(Also, there is MSDNful documentation
warning about the perils of a
ServiceBase descendant using a
ServiceController to make things
happen with itself.)

I've read articles where people were
messing about with PInvoking calls to
the native code just to set the
"Stopped" status flag in the SCM. But
that doesn't shut down the process the
service is running within.

I'd really like to know the Intended
Way of:

  1. Shutting down a service from within the service, where
  2. The SCM is appropriately notified that the service has "Stopped", and
  3. The process disappears from the Task Manager.

Solutions involving ServiceControllers
don't seem to be appropriate, if only
because 2 is not satisfied. (That the
Framework documentation specifically
contraindicates doing that carries a
good deal of weight, incidentally.)

I'd appreciate any recommendations,
pointers to documentation, or even
well-reasoned conjecture. 🙂 Oh! And
I'm perfectly happy to entertain that
I've missed the point.

Most cordially,

Ross Bennett

Best Answer

Best practice in native code is to call SetServiceStatus with a non-zero exit code to indicate 1) it's stopped and 2) something went wrong.

In managed code, you could achieve the same effect by obtaining the service's status handle through the ServiceBase.ServiceHandle property and P/Invoking the Win32 SetServiceStatus API.
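A minimal sketch of that P/Invoke route, assuming the service runs in its own process. The class name `FailingService` and the helper `ReportFailure` are made up for illustration; only `SetServiceStatus`, the `SERVICE_STATUS` layout, and `ServiceBase.ServiceHandle` come from the Win32 and .NET APIs.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using System.ServiceProcess;

public class FailingService : ServiceBase
{
    // Values from winsvc.h / winnt.h.
    private const int SERVICE_WIN32_OWN_PROCESS = 0x00000010;
    private const int SERVICE_STOPPED = 0x00000001;

    // Mirrors the native SERVICE_STATUS structure (seven DWORD fields).
    [StructLayout(LayoutKind.Sequential)]
    private struct SERVICE_STATUS
    {
        public int dwServiceType;
        public int dwCurrentState;
        public int dwControlsAccepted;
        public int dwWin32ExitCode;
        public int dwServiceSpecificExitCode;
        public int dwCheckPoint;
        public int dwWaitHint;
    }

    [DllImport("advapi32.dll", SetLastError = true)]
    private static extern bool SetServiceStatus(
        IntPtr hServiceStatus, ref SERVICE_STATUS lpServiceStatus);

    // Hypothetical helper: tells the SCM the service has stopped with an error.
    protected void ReportFailure(int win32ExitCode)
    {
        var status = new SERVICE_STATUS
        {
            dwServiceType = SERVICE_WIN32_OWN_PROCESS, // assumption: own-process service
            dwCurrentState = SERVICE_STOPPED,
            dwWin32ExitCode = win32ExitCode            // non-zero signals failure
        };

        if (!SetServiceStatus(ServiceHandle, ref status))
        {
            throw new Win32Exception(Marshal.GetLastWin32Error());
        }
    }
}
```

This only reports the state to the SCM. Any worker threads still have to be wound down by the service itself so that the process can actually exit.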

I don't see why the SCM would treat this any differently than setting the ServiceBase.ExitCode property non-zero and then calling ServiceBase.Stop, actually. P/Invoke is a bit more direct perhaps, if the service is in panic mode.
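The purely managed variant might look like the sketch below. The class name `WorkerService`, the `DoWork` placeholder, the polling interval, and the choice of 1064 (ERROR_EXCEPTION_IN_SERVICE) as the exit code are illustrative assumptions; `ExitCode` and `Stop()` are the ServiceBase members mentioned above.

```csharp
using System;
using System.ServiceProcess;
using System.Threading;

public class WorkerService : ServiceBase
{
    private readonly ManualResetEvent _stopRequested = new ManualResetEvent(false);
    private Thread _worker;

    protected override void OnStart(string[] args)
    {
        _worker = new Thread(WorkLoop) { IsBackground = true };
        _worker.Start();
    }

    private void WorkLoop()
    {
        try
        {
            // Run one unit of work every 5 seconds until a stop is requested.
            while (!_stopRequested.WaitOne(TimeSpan.FromSeconds(5)))
            {
                DoWork();
            }
        }
        catch (Exception)
        {
            // Unrecoverable failure: record a non-zero exit code, then stop.
            ExitCode = 1064; // ERROR_EXCEPTION_IN_SERVICE (illustrative choice)
            Stop();          // invokes OnStop on this thread
        }
    }

    protected override void OnStop()
    {
        _stopRequested.Set();

        // Stop() may have been called from the worker itself; never join self.
        if (_worker != null && Thread.CurrentThread != _worker)
        {
            _worker.Join(TimeSpan.FromSeconds(10));
        }
    }

    // Hypothetical placeholder for the real work (network calls and so on).
    private void DoWork()
    {
    }
}
```

The guard in `OnStop` matters because `Stop()` runs `OnStop` on the calling thread, so a failure-driven stop would otherwise have the worker waiting on itself.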


As noted in the comments (see also https://serverfault.com/questions/72318/set-up-recovery-actions-to-take-place-when-a-service-fails), if a process calls SetServiceStatus(SERVICE_STOPPED) with a non-zero exit code, the Recovery Actions for the service run only if the option "Enable Actions For Stops With Errors" (sc.exe failureflag) is ticked. This case is logged as System Event ID 7024.

If a service process exits (Environment.Exit()) or crashes without consulting the SCM, the Recovery Actions always run. This case is logged as System Event ID 7031.
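For the second case, a last-resort sketch could look like this. The `FatalExit` class, the `Die` method, and the exit code value are hypothetical, and a real service would probably write to the event log rather than to standard error.

```csharp
using System;

public static class FatalExit
{
    // Hypothetical helper: terminate the process without notifying the SCM,
    // so the SCM sees the service as having terminated unexpectedly.
    public static void Die(Exception cause)
    {
        // Best-effort diagnostics; a service normally has no console attached.
        Console.Error.WriteLine(cause);

        // Non-zero exit code chosen arbitrarily for illustration.
        Environment.Exit(1);
    }
}
```

This trades a clean shutdown for recovery actions that run regardless of the failure flag setting.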