Python – Cloud SQL periodic connection issues: OperationalError: (2062, 'Cloud SQL socket open failed with error: Transport endpoint is not connected')

django, google-cloud-platform, python, sql

Periodically, connection attempts to the 2nd Gen MySQL server fail with this error:

OperationalError: (2062, 'Cloud SQL socket open failed with error: Transport endpoint is not connected')

This will persist for ~10 minutes and then go away.

The only reference to this error in conjunction with Cloud SQL that I could find suggests it may be related to connection limits (https://groups.google.com/d/msg/google-cloud-sql-discuss/sdeD17oDBOQ/wtTewl4-EgAJ). However, very little else is going on with the instance apart from one TaskQueue task, so I find it unlikely that we're hitting even the 12-connection per-instance limit (and definitely not the 4k overall limit).

The code that causes this exception uses the same DB settings as the rest of the app and is simply trying to SELECT a row by primary key (so a small query).

The DB logs contain many entries like the following:

[Note] Aborted connection 39643 to db: 'my_schema' user: 'root' host: 'cloudsqlproxy~<instance_ip>' (Got an error reading communication packets)

but I'm not sure whether they're related, as they occur fairly consistently throughout the day, while the error above only occurs at a certain time for ~10 minutes.

This only started after upgrading to 2nd Gen Cloud SQL.

Has anyone else seen this, or does anyone know more about it?

Best Answer

It turns out that the source of this was connection limits as originally suggested by the Google Groups chat linked in the question.

The bug appeared because the Pipelines API uses webapp while the rest of the app uses Django. Because Django closes connections for us at the end of each request, we had no code to close them manually. The routes that went through webapp and accessed the DB (using Django's connection code) bypassed that cleanup and left their connections open, so the limit was eventually hit.
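
For anyone hitting the same thing, here is a minimal sketch of one way to close connections for the non-Django routes. It is not the exact code we used: `CloseDbConnections`, `demo_app`, and the `close_connections` parameter are hypothetical names for illustration. The idea is to wrap the webapp WSGI application so that a cleanup callable runs after every request, whether or not the handler raised.

```python
class CloseDbConnections:
    """WSGI middleware that runs a cleanup callable after every request.

    Hypothetical helper: `close_connections` is any zero-argument callable.
    In a Django-backed app this would typically be `django.db.connection.close`
    (an assumption about your setup; it is not imported here so the sketch
    stays runnable without Django installed).
    """

    def __init__(self, app, close_connections):
        self.app = app
        self.close_connections = close_connections

    def __call__(self, environ, start_response):
        try:
            result = self.app(environ, start_response)
            try:
                # Stream the wrapped app's body through unchanged.
                for chunk in result:
                    yield chunk
            finally:
                # WSGI requires calling close() on the iterable if it exists.
                if hasattr(result, "close"):
                    result.close()
        finally:
            # Runs once the response is finished (or the handler failed),
            # so the DB connection is not left open.
            self.close_connections()


def demo_app(environ, start_response):
    """Stand-in for the webapp application that touches the DB."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


# Demo with a fake cleanup function that just records that it was called.
closed = []
wrapped = CloseDbConnections(demo_app, lambda: closed.append(True))
body = b"".join(wrapped({}, lambda status, headers: None))
```

In a real app you would wrap the webapp application object instead of `demo_app` and pass the real close function. The cleanup runs when the server finishes iterating the response, which mirrors what Django does itself at the end of a request.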