CentOS – curl returning error 52 or 56 when a REST API call runs longer than 5 minutes


I have been trying to figure this one out for about a week now. Here is the rundown:

I use cURL in PHP to pull data from an API. As the response grows (pulling over 15k records at once), I noticed that any API call that takes 5 minutes or more (give or take a few seconds) fails to return on my CentOS and SUSE servers. I tested the same call from the CLI with curl and got the same result. Oddly, if I run the curl command on OS X, it runs fine and returns after roughly 7 minutes.

Here is the command (creds censored) I am running via CURL:

curl -m 0 -k --trace-ascii trace.txt --trace-time -X GET -H "tenant-code: 1cmPx7tqVDVTdN1GSelwycFUmICmASnLCmNQsV72" -H "Authorization: Basic JxHAsXeUiHMRkS8Msiu6pWb3PvY20p6am3QvXCY3knXTAntlxTBS3EyEDgly" -H "Content-Type: application/json" -H "Cache-Control: no-cache" 'https://api.endpoint.com/API/v1/system/users/search?groupid=555' > dump.txt

And here is the version output from CURL for each platform:

CentOS (this is where I really need this to work)-

curl 7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.19.1 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2
Protocols: tftp ftp telnet dict ldap ldaps http file https ftps scp sftp 
Features: GSS-Negotiate IDN IPv6 Largefile NTLM SSL libz 


SUSE-

curl 7.19.7 (x86_64-suse-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8j zlib/1.2.7 libidn/1.10
Protocols: tftp ftp telnet dict ldap ldaps http file https ftps 
Features: GSS-Negotiate IDN IPv6 Largefile NTLM SSL libz


OS X-

curl 7.37.1 (x86_64-apple-darwin14.0) libcurl/7.37.1 SecureTransport zlib/1.2.5
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp smtp smtps telnet tftp 
Features: AsynchDNS GSS-Negotiate IPv6 Largefile NTLM NTLM_WB SSL libz 

And this is the error I get on CentOS:

curl: (56) SSL read: errno -5961

I can't find that code referenced in the documentation.

I get a slightly different error on SUSE:

curl: (52) SSL read: error:00000000:lib(0):func(0):reason(0), errno 104

The errno 104 led me to believe that the server is stopping or resetting the connection, but the server-side logs don't show it being dropped, and OS X can pull the data without an issue. I even tried spoofing the User-Agent to rule that out.

So, at this point I assume that the SSL library on OS X, SecureTransport, is doing something that OpenSSL and NSS are not. The question is: what is it doing differently? And if that is not the cause, what is?

Best Answer

Run the curl command on the Mac OS X machine, but don't redirect the output; let it stream to your shell window. Watch for signs of buffering: do you get output from the start, a bit at a time, or do you get nothing for 5 minutes and then a flood of data all at once?

Run the curl command again on a machine where it fails, and compare the behavior. If the output is being buffered by some background process on the API server, you may not get any results until it finishes its query. Something along the path (your client application, the client's OS, the server's OS, the server's REST API, or the SSL layer between them) likely has a non-zero timeout, and if that timer sees no data flowing for 5 minutes, it may close your connection without saying much about why. I see this happen a lot with HTTP-based services. In Perl I habitually put $|=1; at the top of the code to disable output buffering on the server side.
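Since the command in the question already writes trace.txt with --trace-time, one way to make this comparison concrete is to look for the longest silence between timestamped trace entries on each machine. The following is a rough sketch, not a curl feature: longest_gap is a made-up helper name, and it assumes that every timestamped trace line starts with HH:MM:SS.microseconds.

```shell
# longest_gap FILE
# Print the longest pause (in seconds) between two consecutive timestamped
# lines of a curl --trace-time log, followed by the timestamp that ended it.
# Assumes timestamped lines begin with HH:MM:SS.ffffff; hex-dump lines
# (which start with an offset such as "0000:") are skipped.
longest_gap() {
  awk '
    /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]/ {
      split($1, p, ":")
      t = p[1] * 3600 + p[2] * 60 + p[3]
      if (n > 0) {
        gap = t - prev
        if (gap < 0) gap += 86400   # the transfer crossed midnight
        if (gap > max) { max = gap; at = $1 }
      }
      prev = t
      n++
    }
    END { if (n > 1) printf "%.3f %s\n", max, at }
  ' "$1"
}
```

Running longest_gap trace.txt against the trace from the working machine and the trace from a failing one shows whether the server stays silent for minutes before sending anything. A gap close to 300 seconds that ends right at the error on the failing host would point toward an idle timeout somewhere along the path rather than a difference between the SSL libraries.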

It is also possible that a third-party device such as a Cisco ASA has NAT rules that time out and trigger the problem. I run into this with AMANDA backups that read from a client on the external side of an ASA. If the client takes too long to return its size estimates through the ASA to the AMANDA server, the ASA drops its dynamic NAT rule and the backup fails. This is worth investigating if the Mac OS X machine that works has no firewall between it and the API server but the machines that fail do.
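One way to test the NAT/firewall idea from the client side is to have curl send TCP keepalive probes more often than the suspected idle timeout, so the connection never looks idle to a device in between. This is a minimal sketch, assuming a curl build that supports --keepalive-time (the curl manual lists it as added in 7.18.0). The 60-second default and the fetch_with_keepalive name are illustrative choices, and the tenant-code and Authorization headers from the question would still have to be added.

```shell
# fetch_with_keepalive URL [SECONDS] [OUTFILE]
# Fetch URL with no overall time limit (-m 0) while asking for TCP keepalive
# probes after SECONDS of idle time (default 60, an arbitrary assumption).
# -k mirrors the command in the question and skips certificate verification.
fetch_with_keepalive() {
  curl -m 0 -k --keepalive-time "${2:-60}" -o "${3:-dump.txt}" "$1"
}
```

For example, fetch_with_keepalive 'https://api.endpoint.com/API/v1/system/users/search?groupid=555' 60 dump.txt. If the transfer then completes on the hosts that used to fail, an idle timeout on a NAT or firewall device becomes a plausible culprit. If it still fails at the five-minute mark, the timer is more likely in something that only counts application data, such as the API server or a proxy in front of it.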

It wouldn't surprise me in the least if Mac OS X has a timeout value set somewhere to 0 (wait forever) where Linux defaults to a limit of 60 or 90 seconds.
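Rather than guessing at the Linux side, at least one set of timers can be read directly: the kernel's TCP keepalive settings. This covers only one of the possible timeouts (it says nothing about timers in the SSL library, PHP, or the API server), and the values only matter for sockets that have keepalive enabled. show_tcp_keepalive is a hypothetical helper name, and the directory argument exists only so the function can be pointed somewhere other than the standard Linux location.

```shell
# show_tcp_keepalive [DIR]
# Print the kernel's TCP keepalive settings, one per line.
# DIR defaults to /proc/sys/net/ipv4, the usual location on Linux.
show_tcp_keepalive() {
  dir=${1:-/proc/sys/net/ipv4}
  for name in tcp_keepalive_time tcp_keepalive_intvl tcp_keepalive_probes; do
    if [ -r "$dir/$name" ]; then
      printf '%s = %s\n' "$name" "$(cat "$dir/$name")"
    else
      printf '%s = (not available)\n' "$name"
    fi
  done
}
```

On many Linux systems tcp_keepalive_time is 7200 seconds by default, far longer than five minutes, which would mean no keepalive probe is sent during the wait unless the application or the sysctl overrides it.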
