Apache – How to Diagnose Stuck Worker Process in Apache and mod_wsgi

apache-2.2, mod-wsgi, mpm-worker, python

I am running Apache 2.2 with mod_wsgi, Python 2.7 and mpm_worker. Occasionally, one of the worker processes gets stuck and all of its threads stop in the writing state (as shown in the screenshot below).

This happens roughly once a day, to one worker process at a time.

I assume this is caused either by some internal problem in Apache, or by all of my Python threads inside the mod_wsgi worker process somehow deadlocking.

So far, the only remedy I have found is a full (non-graceful) Apache restart.

I'm hoping for some pointers on how to diagnose what is causing this:

Why doesn't the Apache Timeout kill the worker threads/processes? The timeout is one minute, but these threads and workers appear to have been running happily on a single request for several hours.
Is it possible to obtain a thread dump from inside mod_wsgi and see whether the Python threads themselves are deadlocked?
Any idea what could be causing this and how to remedy the situation?
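On the thread-dump question: CPython exposes `sys._current_frames()` (underscore-prefixed, documented for "internal and specialized purposes"), which maps each thread's ident to its current top frame, so a WSGI script can format every thread's stack with the `traceback` module. The sketch below is one minimal way to do that, not a built-in mod_wsgi feature; `dump_all_thread_stacks`, `start_stack_dump_monitor` and the trigger path are hypothetical names. A daemon thread polls for a trigger file and writes the dump to stderr, which under mod_wsgi should land in the Apache error log. It can only help while Python code still gets to run: if the process is blocked inside Apache's C code, or a thread is holding the GIL, the monitor thread may be stuck as well. It avoids Python-3-only syntax, so it is intended to work on the question's Python 2.7 too.

```python
import os
import sys
import threading
import time
import traceback


def dump_all_thread_stacks():
    """Return the current Python stack of every thread as one string."""
    # Map thread idents to readable names where threading knows about them.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append("# Thread %s (%s)" % (ident, names.get(ident, "unknown")))
        for entry in traceback.format_stack(frame):
            lines.append(entry.rstrip("\n"))
        lines.append("")
    return "\n".join(lines)


def start_stack_dump_monitor(trigger_path, interval=1.0, stream=None):
    """Start a daemon thread that writes a stack dump whenever trigger_path
    exists. The file is removed after each dump, so one `touch` normally
    gives one dump. Output goes to `stream`, or to sys.stderr if None."""

    def monitor():
        while True:
            if os.path.exists(trigger_path):
                try:
                    os.remove(trigger_path)
                except OSError:
                    pass  # e.g. removed concurrently; still emit the dump.
                out = stream if stream is not None else sys.stderr
                out.write(dump_all_thread_stacks() + "\n")
                out.flush()
            time.sleep(interval)

    thread = threading.Thread(target=monitor, name="stack-dump-monitor")
    thread.daemon = True  # Never keep the process alive on its own.
    thread.start()
    return thread


# Hypothetical usage near the top of the .wsgi script file:
# start_stack_dump_monitor("/tmp/dump-stacks")
```

After `touch /tmp/dump-stacks`, the next poll should write one block per thread to the error log. If several processes load the same script, each runs its own monitor and whichever notices the file first dumps; including `os.getpid()` in the trigger path is one way to target a specific process.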

Below is a screenshot of Apache server-status in which one of the worker processes (1-0) is stuck.


Best Answer

As recommended in the linked question, I turned on WSGIDaemonProcess and switched to running the application in separate daemon processes. Since then the issue has not reappeared.
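Since the fix depends on requests really being handled by the daemon processes rather than in embedded mode, it can be worth confirming that. The mod_wsgi documentation describes checking `environ['mod_wsgi.process_group']`, which is an empty string in embedded mode and the process group name in daemon mode. The sketch below is a minimal WSGI application built on that check; the group name `myapp` in the usage is hypothetical, and outside mod_wsgi (where the key is absent) it simply reports embedded mode.

```python
def application(environ, start_response):
    """Report whether mod_wsgi is running this app in embedded or daemon mode."""
    # Empty string in embedded mode; the WSGIDaemonProcess group name in
    # daemon mode. Missing entirely when not running under mod_wsgi.
    group = environ.get("mod_wsgi.process_group", "")
    if group:
        body = "DAEMON MODE (process group: %s)" % group
    else:
        body = "EMBEDDED MODE"
    data = body.encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(data)))])
    return [data]
```

Mounted temporarily at a test URL, a response of `EMBEDDED MODE` would suggest the WSGIProcessGroup/WSGIDaemonProcess configuration is not being applied to that URL.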

Related Solutions

Apache 2.2 mpm_worker: more threads or more processes

So far, these have been my top considerations when choosing between Threads and Processes:

1. Threads use much less resident memory than Processes. Yes, with dynamically linked libraries a lot of memory is shared between the Apache control process and its child Processes; however, each new Process still has to instantiate all of the modules you have enabled.

This is easy to test by comparing the memory usage of each Process when you have, for example, either 5 Processes with 1 Thread each or 5 Processes with 25 Threads each. In my case, each child Process takes about 7 MB regardless of the number of Threads.
+For Threads

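To make point 1 concrete, here is the arithmetic using the ~7 MB per child measured above. It is a sketch under a simplifying assumption (and `memory_per_slot` is a hypothetical helper): a child's footprint is treated as fixed, ignoring per-thread stacks and whatever memory each request allocates.

```python
def memory_per_slot(processes, threads_per_child, mb_per_child=7.0):
    """Estimate total child memory and memory per concurrent request slot.

    Assumes each child costs mb_per_child MB no matter how many threads it
    runs (the ~7 MB observation above); thread stacks and per-request
    allocations are ignored.
    """
    total_mb = processes * mb_per_child
    slots = processes * threads_per_child
    return total_mb, slots, total_mb / slots


for threads in (1, 25):
    total, slots, per_slot = memory_per_slot(5, threads)
    print("5 x %2d threads: %.0f MB for %3d slots, %.2f MB per slot"
          % (threads, total, slots, per_slot))
```

Both configurations cost the same 35 MB, but the second one serves 125 concurrent requests instead of 5, i.e. 0.28 MB per slot instead of 7 MB.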
2. It takes longer, in both time and CPU cycles, to start a new Process than a new Thread. This can be tested by comparing the average number of requests served per second with 'ab' (ApacheBench).
+For Threads

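As a rough complement to an `ab` run, the raw cost of the two primitives can be compared with a microbenchmark. This is a sketch, not the author's test: it uses Python 3 and `os.fork` (POSIX only), the function names are hypothetical, and it does not include any per-child initialisation Apache itself performs, so treat the numbers as an indication rather than a measurement of Apache.

```python
import os
import threading
import time


def avg_thread_start_seconds(n=200):
    """Average time to create, start and join a thread that does nothing."""
    start = time.perf_counter()
    for _ in range(n):
        t = threading.Thread(target=lambda: None)
        t.start()
        t.join()
    return (time.perf_counter() - start) / n


def avg_fork_seconds(n=200):
    """Average time to fork a child that exits at once, and reap it."""
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            # Child: leave immediately, skipping Python's cleanup handlers.
            os._exit(0)
        os.waitpid(pid, 0)
    return (time.perf_counter() - start) / n


if __name__ == "__main__":
    t = avg_thread_start_seconds()
    p = avg_fork_seconds()
    print("thread: %.1f us, fork: %.1f us" % (t * 1e6, p * 1e6))
```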
3. A Process's Threads all depend on that Process. The biggest concern here is that if something happens to the Process, it affects all of the Threads associated with it: if you run a single Process with a bunch of Threads, then when the Process dies, so do all of its Threads. More Processes therefore give better separation and thus greater fault tolerance, if you will.
+For Processes

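Point 3 can be put into numbers: for the same total number of threads, one crashed child takes out ThreadsPerChild of them. A minimal sketch (`capacity_lost_on_crash` is a hypothetical helper; it assumes a crash kills exactly one child and ignores Apache starting a replacement):

```python
def capacity_lost_on_crash(processes, threads_per_child):
    """Request slots lost, and the fraction of total capacity, when one child
    process dies and takes all of its threads down with it."""
    total = processes * threads_per_child
    lost = threads_per_child
    return lost, lost / float(total)


# Three ways to provide 150 threads in total.
for processes, threads in ((1, 150), (6, 25), (150, 1)):
    lost, fraction = capacity_lost_on_crash(processes, threads)
    print("%3d x %3d: one crash loses %3d slots (%.1f%%)"
          % (processes, threads, lost, 100 * fraction))
```

With a single 150-thread child one crash takes out everything; with 6 children of 25 threads it costs 25 slots (about 16.7%), at the memory price discussed in point 1.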
4. Related to (3), for modules such as PHP, memory is loaded by the Process and shared across all of its Threads. This means that if you have PHP with memory_limit set to 100 MB and 25 Threads below it, then at maximum load each Thread could technically allocate at most 4 MB (of course it won't work out evenly in practice; some will hog and some will starve).

So in the end, it really depends on your use case. That said, you'll want to maximize the number of Threads used, to reduce memory usage and increase responsiveness. However, you'll have to balance that against a suitable number of Processes for better fault tolerance.

Of course, I'm no expert here, as I've only recently had to become concerned with this, so I look forward to seeing what other answers turn up!

Apache2 worker mpm too many processes

MaxClients doesn't determine the number of child processes; rather, the number of child processes multiplied by ThreadsPerChild determines the maximum acceptable value of MaxClients.

To get 6 child processes and a MaxClients of 150, use the following settings:

StartServers 2
ServerLimit 6
MinSpareThreads 10
MaxSpareThreads 35
ThreadsPerChild 25
MaxClients 150
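The relationship described above is simple multiplication, which a couple of hypothetical helper functions can make explicit. This sketch only encodes the rule as the answer states it (children x ThreadsPerChild caps MaxClients); it does not model every check Apache performs at startup.

```python
def max_clients_limit(server_limit, threads_per_child):
    """Largest MaxClients the worker MPM can honour: each of the
    server_limit child processes contributes threads_per_child threads."""
    return server_limit * threads_per_child


def min_server_limit(max_clients, threads_per_child):
    """Smallest number of child processes whose threads cover max_clients."""
    return -(-max_clients // threads_per_child)  # ceiling division
```

For the settings above, `max_clients_limit(6, 25)` is 150, so MaxClients 150 sits exactly at the ceiling, and `min_server_limit(150, 25)` is 6, the fewest children that can reach it. Asking for 151 clients would already need a seventh child.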

Note also that I modified the MaxSpareThreads value. From the documentation:

The range of the MaxSpareThreads value is restricted. Apache will correct the given value automatically according to the following rules:
* mpm_netware wants the value to be greater than MinSpareThreads.
* For worker the value must be greater or equal than the sum of MinSpareThreads and ThreadsPerChild.
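The worker rule quoted above explains the MaxSpareThreads change: with MinSpareThreads 10 and ThreadsPerChild 25, any value below 10 + 25 = 35 does not satisfy it. A minimal sketch of the check, with one labelled assumption: that Apache's automatic correction raises a too-small value to exactly that sum (the helper name is hypothetical).

```python
def effective_max_spare_threads(max_spare, min_spare, threads_per_child):
    """Worker-MPM rule quoted above: MaxSpareThreads must be at least
    MinSpareThreads + ThreadsPerChild. Assumption: a smaller value is
    raised to exactly that sum."""
    return max(max_spare, min_spare + threads_per_child)
```

With the recommended settings, `effective_max_spare_threads(35, 10, 25)` returns 35 unchanged, while a value such as 20 would be lifted to 35.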
