MySQL – Error 2006: MySQL server has gone away

Tags: mysql, python, uwsgi

I'm running a Python Pyramid app on a CentOS server using uWSGI and nginx. I'm using SQLAlchemy as an ORM, MySQLdb as the API, and MySQL as a database. The site hasn't gone live yet, so the only traffic is me and some other employees of the company. We purchased some data to populate the database, so the largest (and most frequently queried) table is ~150,000 rows.

Yesterday I opened four new tabs of the website in quick succession, and I got back a couple 502 Bad Gateway errors. I looked in the uWSGI log, and found the following:

sqlalchemy.exc.OperationalError: (OperationalError) (2006, 'MySQL server has gone away') 'SELECT ge...

Important note: This error is not due to MySQL's wait_timeout. Been there, done that.

I wondered if the issue was caused by concurrent requests being served simultaneously. I made myself a poor man's load tester:

for i in {1..10}; do (curl -o /dev/null http://domain.com &); done;

Sure enough, within those ten requests at least one would throw a 2006 error, oftentimes more. Sometimes the errors would get even stranger, for example:

sqlalchemy.exc.NoSuchColumnError: "Could not locate column in row for column 'table.id'"

This happened even though the column most definitely exists and worked fine on all the other identical requests. Or, this one:

sqlalchemy.exc.ResourceClosedError: This result object does not return rows. It has been closed automatically.

Once again, it worked fine for all the other requests.

To further verify that the problem stemmed from concurrent database connections, I set uWSGI to a single worker with multi-threading disabled, forcing the requests to be processed one at a time. Sure enough, the problems disappeared.
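
For the record, forcing requests to be handled one at a time in an ini-style uWSGI config looks roughly like the sketch below; processes and threads are standard uWSGI options, but the rest of the file is an assumption.

[uwsgi]
# hypothetical excerpt: one worker, no threads, so requests queue up
processes = 1
threads = 1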

In an attempt to find the problem, I set up an error log for MySQL. Apart from a few notices written while MySQL starts up, it remains empty.

Here is my MySQL config:

[mysqld]
default-storage-engine = myisam
key_buffer = 1M
query_cache_size = 1M
query_cache_limit = 128k
max_connections=25
thread_cache=1
skip-innodb
query_cache_min_res_unit=0
tmp_table_size = 1M
max_heap_table_size = 1M
table_cache=256
concurrent_insert=2
max_allowed_packet = 1M
sort_buffer_size = 64K
read_buffer_size = 256K
read_rnd_buffer_size = 256K
net_buffer_length = 2K
thread_stack = 64K
innodb_file_per_table=1
log-error=/var/log/mysql/error.log

Heavy Googling on the error revealed little, but suggested that I increase max_allowed_packet. I increased it to 100M and restarted MySQL, but that didn't help at all.

To summarize: Concurrent connections to MySQL cause 2006, 'MySQL server has gone away' and some other strange errors. There is nothing of relevance in MySQL's error log.

I have been working at this for hours and haven't made any progress. Can somebody help me out?

Best Answer

I've encountered this as well and found the reason and fix.

The reason this happens is that the Python uWSGI plugin (or, more likely, all uWSGI plugins) forks new workers after the application is loaded in the parent process. As a result, the children inherit all of the parent's resources, including file descriptors such as the database connection.

You can read about this briefly on the uWSGI wiki:

uWSGI tries to abuse fork() copy on write whenever possible. By default it will fork after having loaded your applications. If you do not want that behaviour use the --lazy option. Enabling it, will instruct uWSGI to load the applications after each worker's fork()

And as you may know, Python's MySQLdb connections and cursors are not thread-safe unless you explicitly protect them. Consequently, multiple processes (such as the uWSGI workers) using the same MySQL connection/cursor concurrently will corrupt it.

In my case (the King Arthur's Gold API), this worked fine when I created the MySQL connection per request in another module's scope, but when I wanted persistent connections to help with performance, I moved the database connection and cursor to the global scope of the parent module. As a result, my connections were stepping on each other, just as yours are.
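
To make the two patterns concrete, here is a minimal sketch using MySQLdb directly; the host, credentials, query, and handler names are made up (in a real app both handlers would simply be called application):

import MySQLdb

# Problematic pattern: the connection is created at import time, before uWSGI
# forks its workers, so every worker inherits and shares the same socket.
shared_db = MySQLdb.connect(host="localhost", user="api", passwd="secret", db="mydb")

def application_shared(environ, start_response):
    cur = shared_db.cursor()
    cur.execute("SELECT 1")  # concurrent workers interleave traffic on one connection
    row = cur.fetchone()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [str(row[0])]

# Safer pattern: open and close the connection inside the request, so each
# worker talks to MySQL over its own socket.
def application_per_request(environ, start_response):
    db = MySQLdb.connect(host="localhost", user="api", passwd="secret", db="mydb")
    try:
        cur = db.cursor()
        cur.execute("SELECT 1")
        row = cur.fetchone()
    finally:
        db.close()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [str(row[0])]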

The fix is to add the lazy keyword (or the --lazy command-line option) to your uWSGI configuration. With it enabled, the application is loaded anew in each child after fork(), instead of being loaded once in the parent and sharing a connection that eventually gets stepped on and is force-closed by the MySQL server when it receives a corrupt request.
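
In an ini-style config that is a single extra line; everything else in this sketch (module name, worker count) is an assumption:

[uwsgi]
# hypothetical application settings
module = myapp.wsgi
processes = 4
# load the application after each worker's fork() instead of once in the master
lazy = true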

Lastly, if you want a way to do this without modifying your uWSGI config, you can likely use the @postfork decorator to create a fresh database connection immediately after a worker process is forked. You can read about that here.
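
If you go that route, a minimal sketch looks like the following; the connection parameters and the module-level db variable are made up, and uwsgidecorators is only importable when running under uWSGI:

import MySQLdb
from uwsgidecorators import postfork

db = None  # filled in per worker, after fork()

@postfork
def connect_to_database():
    # Runs in every worker immediately after it is forked, so each worker
    # opens its own MySQL connection instead of inheriting the parent's
    # file descriptor.
    global db
    db = MySQLdb.connect(host="localhost", user="api",
                         passwd="secret", db="mydb")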

I see from your follow-up that you have already switched to pgsql, but here's the answer so you can sleep better at night, and for anyone like you and me trying to track this down!

P.S. Once I understood the problem (the cursor being corrupted by workers stepping on each other) but before I realized the bit about fork() and --lazy, I considered implementing my own pool, where workers would "check out" a MySQL connection from a pool at the global scope and "check it back in" just before exiting application(). However, it is likely much better to use --lazy unless your web/application load varies enough that you are constantly creating new workers. Even then, I might prefer --lazy, because it is significantly cleaner than implementing your own database connection pool.

Edit: since there is a shortage of information on this for others who have encountered it, here is a more thorough writeup of the problem and solution: http://tns.u13.net/?p=190