MySql *completely random* Unable to connect

asp.netmonoMySQLosx-lionubuntu-11.10

This one is killing me. We have an ASP.NET/Mono application on a MySQL database.

For months we've been plagued by a completely random but consistent occurrence of "Unable to connect to any of the specified MySQL hosts". I'm by no means a MySQL pro, and I hope this is something silly.

Specifics:

MySQL 5.5.17
MySQL Connector/Net 6.4.4
Mono 2.10.8
MonoDevelop 2.8.6.5

Connection string: SERVER=localhost,3306;DATABASE=xxx;UID=xxx;PWD=xxx;CONNECTION TIMEOUT=90;Encrypt=false;

Max_connections is set to 150. There are never more than 5 connections, and they are all active (probably due to the pooling).

The application is C# ASP.NET 3.5. Runs solid as a rock for years on Windows/IIS. Nine months ago our market led us to get off Windows, so we (easily I might add) moved the whole thing to Mono/Apache.

The issue only occurs on the Mono side, but that's critical since we've retired our Windows support.

Happens on average about 10 times an hour, but spacing varies wildly.

The issue has been witnessed in the following configurations:

Ubuntu 11/Apache2/mod_mono
OSX Lion/Apacke2/mod_mono
Ubuntu 11/xsp2/MonoDevelop debug server
OSX Lion/xsp2/MonoDevelop debug server

Have messed with keepalive, wait_timeout, connectionreset, etc to no avail. Timings seem to have no effect. The error is instantaneous on conn.open – not after any connection timeout.

Here's a clue – the app absolutely will not run with Pooling=false. Nearly every connection attempt fails. Of course I want pooling on, but I'm not sure why it wouldn't work without.

The application has a built in "heartbeat" (user's session in the database) – every 2 minutes.

The most maddening of all? It absolutely cannot be reproduced on demand in a running environment – it's random. I've examined query sizes, execution times, etc.

In exactly one scenario can I make it happen: On initial startup of an Ubuntu server or OSX, the very first time Mono makes it's first connection to MySQL – it will happen then. After that, it's anybody's guess.

Best Answer

Thanks to good advice here and elsewhere, I ran down a couple issues.

1) no matter what the mysql docs say, I can't seem to get "localhost" to reference the local unix socket file - it always goes to tcp.
2) I additionally noticed every failure is accompanied with "no route to host" buried down in the stack trace.
3) using a conn string like this: "SERVER=/tmp/mysql.sock;DATABASE=xxx;UID=xxx;PWD=xxx;" and setting my server to "skip-networking" I was able to force the app to run on a local socket. I've tested enough to be satisfied the issue does not occur on the socket.

Conclusions?

I guess it's possible that multiple different OSX and Ubuntu instances all have random, flaky network issues causing the occasional "no route to host". I'm surprised the Connector/NET isn't more resilient, and possibly a Mono thing since we never saw it on Windows.

Final Resolution? I put my connection function in a for loop. It will now try 15 times to get a connection before failing. After hours of testing, I still see the issue, but it's never taken more than 11 tries to connect.

I don't like this answer, but I gotta move on!