Nginx – DNS failure crashing web site – CURL requests

curl, domain-name-system, nginx, php

My PHP application keeps throwing intermittent errors related to slow scripts. The nginx and php5-fpm error logs report that the child process was terminated due to a script timeout.

The php5-fpm slow log reports that the problem is with the curl requests I make.

I'm confused, because I set timeouts on all my curl requests, e.g.:

CURLOPT_CONNECTTIMEOUT 3 #number of seconds trying to connect
CURLOPT_TIMEOUT 3 #number of seconds to run curl for

This is a production server, so the values are low; we expect a response within 1 second.

So I realised that the URLs we use for the curl requests must be failing at the DNS level.

I simulated a bad or unavailable DNS server and was able to reproduce the same errors and slow script messages in my log.

So, how do I deal with this?

1) Use CURLOPT_DNS_CACHE_TIMEOUT, but increase it from the default of 2 minutes to, say, 10 minutes?

However, I did see that in my error log these slow scripts went on for exactly 2 minutes and 3 seconds, which suggests that curl is caching a bad response for the CURLOPT_DNS_CACHE_TIMEOUT default of 2 minutes and failing every request for as long as the cache entry lives. The timings seem too convenient to ignore, but I could be wrong. Would curl cache a bad or unavailable DNS response?

2) Edit my DNS configuration somehow to allow multiple DNS servers.

However, I edited my resolv.conf file and added OpenDNS to the existing nameservers, then put a fake DNS server (1.1.1.1) first in the list. It still failed, giving me a bad gateway error on the site. Why isn't it picking another server to get a response?

I thought adding multiple servers would sort it out, but it still reads the first server in the list and fails. I think the curl cache might be caching bad responses, but I'm not sure. Ideally I would keep the cache to reduce the latency of looking up the IP.

Any suggestions on how I can stop an unavailable DNS server from crashing my PHP scripts and curl requests?

Thanks.

Best Answer

You may want to consider taking DNS out of the hands of curl and running your own local caching resolver.

You can set CURLOPT_DNS_USE_GLOBAL_CACHE to false to turn off caching then use a local DNS cache to manage your lookups.
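As a rough sketch of what that could look like in PHP (the function names `build_curl_options` and `fetch_with_timeouts` are made up for illustration, and the 3-second values are simply the ones from the question): the options below keep the existing timeouts, switch off curl's own DNS caching, and turn a failed request into a logged error rather than a hung worker. Setting `CURLOPT_DNS_CACHE_TIMEOUT` to 0 is my assumption about how to disable the per-handle cache alongside the global one; check it against your libcurl version.

```php
<?php
// Hypothetical helper: builds curl options that bound the connect and total
// time and leave DNS caching to a local resolver instead of curl.
function build_curl_options($url, $timeoutSeconds = 3)
{
    $options = array(
        CURLOPT_URL               => $url,
        CURLOPT_RETURNTRANSFER    => true,            // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT    => $timeoutSeconds, // seconds allowed for the connection phase
        CURLOPT_TIMEOUT           => $timeoutSeconds, // seconds allowed for the whole request
        CURLOPT_DNS_CACHE_TIMEOUT => 0,               // 0 disables curl's per-handle DNS cache
    );

    // The global-cache option is obsolete in newer libcurl/PHP builds,
    // so only set it where the constant is still defined.
    if (defined('CURLOPT_DNS_USE_GLOBAL_CACHE')) {
        $options[constant('CURLOPT_DNS_USE_GLOBAL_CACHE')] = false;
    }

    return $options;
}

// Hypothetical wrapper: returns the response body, or null on any curl failure
// (including DNS failures), so the caller can degrade gracefully.
function fetch_with_timeouts($url, $timeoutSeconds = 3)
{
    $handle = curl_init();
    curl_setopt_array($handle, build_curl_options($url, $timeoutSeconds));

    $body = curl_exec($handle);
    if ($body === false) {
        error_log('curl request failed: ' . curl_error($handle));
        return null;
    }

    return $body;
}
```

Calling `fetch_with_timeouts('http://example.com/')` and checking for `null` lets the page fall back to a default when the lookup or the request fails. Note that this only helps if the lookup itself returns quickly, which is what the local resolver is for.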

I am not sure what throughput you are handling, but I've found that handling our own DNS at the DNS server level is often faster than relying on the caching built into many application stacks.
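On the resolv.conf part of the question: as far as I know, the glibc resolver does move on to the next nameserver, but only after the current one times out, and the default timeout is 5 seconds per server. That is longer than the 3-second curl limits in the question, so a dead first entry can still sink the request. A sketch of a resolv.conf that shortens the wait, assuming a Linux/glibc system and a local caching resolver (dnsmasq, unbound or similar) listening on 127.0.0.1; the addresses are placeholders to adapt:

```
# /etc/resolv.conf (sketch; adjust addresses to your setup)
# Local caching resolver first, assumed to listen on 127.0.0.1
nameserver 127.0.0.1
# OpenDNS as a fallback
nameserver 208.67.222.222
# Wait 1 second per server instead of the default 5, and go through the list twice
options timeout:1 attempts:2
```

With these values a dead first server costs about one second before the fallback is tried, which fits inside a 3-second budget. Some distributions regenerate resolv.conf automatically, so the change may need to go into whatever tool manages that file.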

See: http://devblog.moz.com/2011/02/high-performance-libcurl-tips/

Also, this may be better asked over at Stack Overflow, since I think the problem is caused by your application layer rather than the server-level configuration.