Django process hangs when connecting to database in Apache

apache-2.2 django mod-wsgi

I'm having an issue with Django in Apache. It is running in daemon mode and using the worker MPM. The application is mostly a RESTful API that serves connections from mobile devices and external web services.

When a server process reaches its maximum number of requests, it restarts. It seems that when the process restarts there is a chance that the Python process will lock up. I've grabbed stack traces of the hung processes, and it looks like they are locking up when they open a database connection.

Worker settings:

<IfModule worker.c>
StartServers         5 
MaxClients          150
MinSpareThreads     20
MaxSpareThreads     100
ThreadsPerChild     10
MaxRequestsPerChild  0
</IfModule>

and VirtualHost:

NameVirtualHost *:443
<VirtualHost *:443>
Header set Access-Control-Allow-Origin "*"

ServerName example.com


WSGIScriptAlias / /example/API/apache/django.wsgi
ErrorLog /example/API/apache/log
<Directory /example/API>
<Files django.wsgi>
Order allow,deny
Allow from all
</Files>
</Directory>

</VirtualHost>

I've grabbed a couple of stack traces using the method described here: https://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Extracting_Python_Stack_Traces
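For readers who cannot reach that page, the general idea behind this kind of dump can be sketched in a few lines of Python. This is a minimal illustration built on `sys._current_frames()` and the `traceback` module, not the exact code from the mod_wsgi wiki; the function name `format_thread_stacks` is made up for this example.

```python
import os
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a text dump of the current Python stack of every thread."""
    lines = ["# ProcessId: %d" % os.getpid()]
    # sys._current_frames() maps each thread id to its topmost frame.
    for thread_id, frame in sys._current_frames().items():
        lines.append("# ThreadID: %d" % thread_id)
        for filename, lineno, name, source in traceback.extract_stack(frame):
            lines.append('File: "%s", line %d, in %s' % (filename, lineno, name))
            if source:
                lines.append("  %s" % source.strip())
    return "\n".join(lines)


if __name__ == "__main__":
    # Demo: park a helper thread on an event, then dump all stacks.
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait)
    worker.start()
    print(format_thread_stacks())
    stop.set()
    worker.join()
```

In a real deployment the dump would have to be triggered from outside the hung request threads, for example from a background monitor thread, since the request threads themselves are blocked.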

The process has 10 threads. Nine of them produced this trace:

# ProcessId: 10353
# ThreadID: 140479095494400
File: "/usr/lib64/python2.6/site-packages/newrelic-2.20.0.17/newrelic/api/web_transaction.py", line 853, in __call__
File: "/usr/lib64/python2.6/site-packages/newrelic-2.20.0.17/newrelic/api/function_trace.py", line 108, in literal_wrapper
File: "/usr/lib/python2.6/site-packages/django/core/handlers/wsgi.py", line 206, in __call__
File: "/usr/lib/python2.6/site-packages/django/core/handlers/base.py", line 112, in get_response
File: "/usr/lib64/python2.6/site-packages/newrelic-2.20.0.17/newrelic/hooks/framework_django.py", line 492, in wrapper
File: "/usr/lib/python2.6/site-packages/django/views/generic/base.py", line 69, in view
File: "/usr/lib/python2.6/site-packages/django/views/decorators/csrf.py", line 57, in wrapped_view
File: "/usr/lib/python2.6/site-packages/rest_framework/views.py", line 388, in dispatch
File: "/usr/lib/python2.6/site-packages/rest_framework/views.py", line 317, in initial
File: "/usr/lib/python2.6/site-packages/rest_framework/views.py", line 267, in perform_authentication
File: "/usr/lib/python2.6/site-packages/rest_framework/request.py", line 219, in user
File: "/usr/lib/python2.6/site-packages/rest_framework/request.py", line 385, in _authenticate
File: "/Locqus/API/PathServe/LocqusSimpleAuth.py", line 35, in authenticate
File: "/usr/lib/python2.6/site-packages/django/contrib/auth/__init__.py", line 49, in authenticate
File: "/usr/lib/python2.6/site-packages/django/contrib/auth/backends.py", line 16, in authenticate
File: "/usr/lib/python2.6/site-packages/django/contrib/auth/models.py", line 167, in get_by_natural_key
File: "/usr/lib/python2.6/site-packages/django/db/models/manager.py", line 151, in get
File: "/usr/lib/python2.6/site-packages/django/db/models/query.py", line 304, in get
File: "/usr/lib/python2.6/site-packages/django/db/models/query.py", line 77, in __len__
File: "/usr/lib/python2.6/site-packages/django/db/models/query.py", line 857, in _fetch_all
File: "/usr/lib/python2.6/site-packages/django/db/models/query.py", line 220, in iterator
File: "/usr/lib/python2.6/site-packages/django/db/models/sql/compiler.py", line 713, in results_iter
File: "/usr/lib/python2.6/site-packages/django/db/models/sql/compiler.py", line 785, in execute_sql
File: "/usr/lib/python2.6/site-packages/django/db/backends/__init__.py", line 162, in cursor
File: "/usr/lib/python2.6/site-packages/django/db/backends/__init__.py", line 132, in _cursor
File: "/usr/lib/python2.6/site-packages/django/db/backends/__init__.py", line 127, in ensure_connection
File: "/usr/lib/python2.6/site-packages/django/db/backends/__init__.py", line 115, in connect
File: "/usr/lib/python2.6/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 115, in get_new_connection
File: "/usr/lib64/python2.6/site-packages/newrelic-2.20.0.17/newrelic/hooks/database_dbapi2.py", line 91, in __call__
File: "/usr/local/lib64/python2.6/site-packages/psycopg2/__init__.py", line 164, in connect
  conn = _connect(dsn, connection_factory=connection_factory, async=async)

One of them produced this trace.

# ProcessId: 10353
# ThreadID: 140478641071872
File: "/usr/lib64/python2.6/threading.py", line 504, in __bootstrap
File: "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
File: "/usr/lib64/python2.6/threading.py", line 484, in run
File: "/usr/lib64/python2.6/site-packages/newrelic-2.20.0.17/newrelic/core/agent.py", line 543, in _harvest_loop
File: "/usr/lib64/python2.6/site-packages/newrelic-2.20.0.17/newrelic/core/agent.py", line 584, in _run_harvest
File: "/usr/lib64/python2.6/site-packages/newrelic-2.20.0.17/newrelic/core/application.py", line 1258, in harvest
File: "/usr/lib64/python2.6/site-packages/newrelic-2.20.0.17/newrelic/core/internal_metrics.py", line 82, in __call__
File: "/usr/lib64/python2.6/site-packages/newrelic-2.20.0.17/newrelic/core/data_collector.py", line 614, in send_metric_data
File: "/usr/lib64/python2.6/site-packages/newrelic-2.20.0.17/newrelic/core/data_collector.py", line 340, in send_request
File: "/usr/lib64/python2.6/site-packages/newrelic-2.20.0.17/newrelic/packages/requests/sessions.py", line 399, in post
File: "/usr/lib64/python2.6/site-packages/newrelic-2.20.0.17/newrelic/packages/requests/sessions.py", line 357, in request
File: "/usr/lib64/python2.6/site-packages/newrelic-2.20.0.17/newrelic/packages/requests/sessions.py", line 460, in send
File: "/usr/lib64/python2.6/site-packages/newrelic-2.20.0.17/newrelic/packages/requests/adapters.py", line 320, in send
File: "/usr/lib64/python2.6/site-packages/newrelic-2.20.0.17/newrelic/packages/requests/packages/urllib3/connectionpool.py", line 544, in urlopen
File: "/usr/lib64/python2.6/site-packages/newrelic-2.20.0.17/newrelic/packages/requests/packages/urllib3/connectionpool.py", line 369, in _make_request
File: "/usr/lib64/python2.6/httplib.py", line 920, in request
File: "/usr/lib64/python2.6/httplib.py", line 957, in _send_request
File: "/usr/lib64/python2.6/site-packages/newrelic-2.20.0.17/newrelic/hooks/external_httplib.py", line 35, in httplib_endheaders_wrapper
File: "/usr/lib64/python2.6/httplib.py", line 914, in endheaders
File: "/usr/lib64/python2.6/httplib.py", line 786, in _send_output
File: "/usr/lib64/python2.6/httplib.py", line 745, in send
File: "/usr/lib64/python2.6/site-packages/newrelic-2.20.0.17/newrelic/packages/requests/packages/urllib3/connectionpool.py", line 134, in connect
File: "/usr/lib64/python2.6/site-packages/newrelic-2.20.0.17/newrelic/packages/requests/packages/urllib3/util.py", line 626, in ssl_wrap_socket
File: "/usr/lib64/python2.6/ssl.py", line 338, in wrap_socket
File: "/usr/lib64/python2.6/ssl.py", line 118, in __init__
  cert_reqs, ssl_version, ca_certs)

Finally, I attached to the process in gdb and grabbed the thread info:

Id   Target Id         Frame 
  14   Thread 0x7fc3d695d700 (LWP 10355) "httpd.worker" 0x00007fc3e27e5f7d in __lll_lock_wait ()
   from /lib64/libpthread.so.0
  13   Thread 0x7fc3d615c700 (LWP 10356) "httpd.worker" 0x00007fc3e27e5f7d in __lll_lock_wait ()
   from /lib64/libpthread.so.0
  12   Thread 0x7fc3d595b700 (LWP 10357) "httpd.worker" 0x00007fc3e27e5f7d in __lll_lock_wait ()
   from /lib64/libpthread.so.0
  11   Thread 0x7fc3d515a700 (LWP 10358) "httpd.worker" 0x00007fc3e27e5f7d in __lll_lock_wait ()
   from /lib64/libpthread.so.0
  10   Thread 0x7fc3d4959700 (LWP 10359) "httpd.worker" 0x00007fc3e27e5f7d in __lll_lock_wait ()
   from /lib64/libpthread.so.0
  9    Thread 0x7fc3cffff700 (LWP 10360) "httpd.worker" 0x00007fc3e27e5f7d in __lll_lock_wait ()
   from /lib64/libpthread.so.0
  8    Thread 0x7fc3cf7fe700 (LWP 10361) "httpd.worker" 0x00007fc3e27e5f7d in __lll_lock_wait ()
   from /lib64/libpthread.so.0
  7    Thread 0x7fc3ceffd700 (LWP 10362) "httpd.worker" 0x00007fc3e27e5f7d in __lll_lock_wait ()
   from /lib64/libpthread.so.0
  6    Thread 0x7fc3ce7fc700 (LWP 10363) "httpd.worker" 0x00007fc3e27e5f7d in __lll_lock_wait ()
   from /lib64/libpthread.so.0
  5    Thread 0x7fc3cdffb700 (LWP 10364) "httpd.worker" 0x00007fc3e27e5f7d in __lll_lock_wait ()
   from /lib64/libpthread.so.0
  4    Thread 0x7fc3cd7fa700 (LWP 10365) "httpd.worker" 0x00007fc3e27e3705 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
  3    Thread 0x7fc3bbfff700 (LWP 10372) "httpd.worker" 0x00007fc3e2300933 in select () from /lib64/libc.so.6
  2    Thread 0x7fc3bb7fe700 (LWP 10373) "httpd.worker" 0x00007fc3e27e5f7d in __lll_lock_wait ()
   from /lib64/libpthread.so.0
* 1    Thread 0x7fc3e3fa5840 (LWP 10353) "httpd.worker" 0x00007fc3e27e625d in read () from /lib64/libpthread.so.0

Best Answer

What you are seeing looks like a known bug in the Postgres client library.

When there are concurrent threads and an SSL connection to Postgres is being made at the same time as an SSL connection for something else, such as a web service call, the Postgres client library can cause a deadlock.
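One way to check whether the SSL overlap is involved, assuming the database link is trusted (for example a local or private-network server), is to turn SSL off for the Postgres connection and see whether the hangs stop. The settings fragment below is only a sketch: the database name, user, password, host and port are placeholders, and `sslmode` is the standard libpq connection parameter that Django passes through to `psycopg2.connect()` via `OPTIONS`.

```python
# Hypothetical Django settings fragment; all names and credentials are placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "example_db",
        "USER": "example_user",
        "PASSWORD": "change-me",
        "HOST": "127.0.0.1",
        "PORT": "5432",
        # OPTIONS are handed to psycopg2.connect(); "sslmode" is a libpq
        # connection parameter. "disable" means the client does not use SSL,
        # which is only reasonable on a trusted link.
        "OPTIONS": {"sslmode": "disable"},
    }
}
```

This is a diagnostic step rather than a fix: if SSL to the database is required, it does not apply.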

I am not sure when or if Postgres will be fixed, but a workaround for it was recently added to psycopg2. The psycopg2 issue for it is:
