Linux – Debugging slow wordpress site with strace

apache-2.2linuxPHPstrace

I'm trying to get to the bottom of a wordpress site that runs very slowly: on requesting a page, it hangs for around 20 seconds before the page starts to load.

I've tried the usual things: WP debug mode, checking for slow MySQL queries, disabling plugins, checking Apache isn't queuing requests etc, with no joy. I've now resorted to watching the Apache process (I'm running PHP as mod_php) with strace in an attempt to see what is going on. Unfortunately my kernel-fu is weak, and I need some help intepreting the strace output.

I'm running strace with -r, if I'm understanding the man pages correctly, this would indicate the time the previous sys call took.

Looking through the output, I see a few calls taking 5 seconds eg:

 0.000114 accept4(4, {sa_family=AF_INET6, sin6_port=htons(46242), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28], SOCK_CLOEXEC) = 13
 0.000116 getsockname(13, {sa_family=AF_INET6, sin6_port=htons(80), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
 0.000109 fcntl(13, F_GETFL)        = 0x2 (flags O_RDWR)
 0.000061 fcntl(13, F_SETFL, O_RDWR|O_NONBLOCK) = 0
 0.000078 read(13, "GET /wp-content/uploads/fake-ici"..., 8000) = 593
 0.000144 stat("/var/www/wp-content/uploads/fake-icicles-500x500.jpg", {st_mode=S_IFREG|0777, st_size=34241, ...}) = 0
 0.000197 open("/var/www/wp-content/uploads/fake-icicles-500x500.jpg", O_RDONLY|O_CLOEXEC) = 14
 0.000100 close(14)                 = 0
 0.000097 read(13, 0x7f66f397b048, 8000) = -1 EAGAIN (Resource temporarily unavailable)
 0.000079 writev(13, [{"HTTP/1.1 304 Not Modified\r\nDate:"..., 182}], 1) = 182
 0.000161 write(8, "127.0.0.1 - - [26/Oct/2016:14:17"..., 235) = 235
 0.000112 times({tms_utime=3107, tms_stime=70, tms_cutime=0, tms_cstime=0}) = 1802798719
 0.000090 poll([{fd=13, events=POLLIN}], 1, 5000) = 0 (Timeout)
 5.005144 shutdown(13, SHUT_WR)     = 0
 0.000130 poll([{fd=13, events=POLLIN}], 1, 2000) = 1 ([{fd=13, revents=POLLIN|POLLHUP}])
 1.000750 read(13, "", 512)         = 0

My interpretation would be that polling the file descriptor with id 13 took 5 seconds, and had timed out (4th line from the end). 13 looks like the ID of the socket that has been opened for writing the HTTP response back to the client. Is this correct? Is there anything else I can learn from the above? Should I be concerned about the 'resource temporarily unavailable' on the read() midway through my output?

Best Answer

Yes, your interpretation is right.

Actually as i understand it, you're seeing an HTTP KeepAlive.

What happens:

You recieve a connection and accept it, with file descriptor = 13:

 0.000114 accept4(4, {sa_family=AF_INET6, sin6_port=htons(46242), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28], SOCK_CLOEXEC) = 13

You recieve a HTTP GET request:

 0.000078 read(13, "GET /wp-content/uploads/fake-ici"..., 8000) = 593

You write the response, in this case a 304 "Not modified", not the entire file as the request is a conditional GET, probably due to If-Modified-Since as i don't se a read to the opened file, not sure though):

 0.000079 writev(13, [{"HTTP/1.1 304 Not Modified\r\nDate:"..., 182}], 1) = 182

You write an entry to the log file:

 0.000161 write(8, "127.0.0.1 - - [26/Oct/2016:14:17"..., 235) = 235

Then you wait for another request to the same socket due to KeepAlive, which by default in Apache is 5 seconds:

 0.000090 poll([{fd=13, events=POLLIN}], 1, 5000) = 0 (Timeout)

No other request arrives during the keepalive interval so you start closing the TCP connection:

 5.005144 shutdown(13, SHUT_WR)     = 0

As for read returning -EAGAIN (Resource temporarily unavailable) it's normal, check here for a better description than mine: https://stackoverflow.com/questions/10318191/reading-socket-eagain-resource-temporarily-unavailable#10318268