Nginx – Why is $request_time sometimes much bigger than $upstream_response_time

http, nginx, performance-monitoring

I have an HTTPS website where, for the same clients, $request_time is sometimes 10x the $upstream_response_time, or even 100x. I understand the difference between the two times: $request_time is the duration between the first bytes read from the client and the last bytes sent to it.
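To compare the two values per request, both variables can be logged side by side. A minimal sketch of such a log_format (the format name "timing" and the field order are my own choice, not anything from the question):

```nginx
# Log both timings next to each other for easy comparison
log_format timing '$remote_addr [$time_local] "$request" '
                  '$status req_time=$request_time '
                  'upstream_time=$upstream_response_time';

access_log /var/log/nginx/timing.log timing;
```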

Some users have told me they experienced connection timeouts, so I think these long $request_time values are real problems.

These long $request_time values happen for GET requests (typical request size: 185 bytes). The upstream is a fastcgi process. I wonder in which scenarios $request_time could get this high:

  1. No fastcgi worker is accepting connections, so $request_time
     includes the time spent waiting for a fastcgi process
  2. The response is malformed (wrong Content-Length, broken chunked
     encoding) and the client keeps waiting for data that never arrives
  3. SSL certificate: the client receives our SSL certificate, queries
     the OCSP responder, and only then finishes the SSL handshake.

I wonder which of these scenarios are actually possible, and how I could find out what is really causing the long $request_time.

Best Answer

OCSP is an issue every now and then, but I'd investigate more in the direction of timed-out/unavailable fastcgi workers. Is this a true heisenbug, or does it happen, when it happens, to different users? Do you have HTTP-based monitoring (i.e. real GET requests via Nagios, Selenium etc., not just a port 80/443 check)?
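If client-side OCSP lookups do turn out to be the cause, OCSP stapling lets the server deliver the OCSP response inside the TLS handshake, so clients don't have to contact the responder themselves. A minimal sketch for the server {} block (the resolver address is just an example):

```nginx
# Staple the OCSP response into the TLS handshake
ssl_stapling on;
ssl_stapling_verify on;
# Resolver nginx uses to reach the OCSP responder (example address)
resolver 8.8.8.8 valid=300s;
```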

Debug steps:

  • copy over your server {} block and use a different port (for debugging only)
  • set very short proxy/fastcgi read and send timeouts on it
  • browse your debugging server while users experience waiting times and try to catch some timeouts
  • build a parser for your logfiles to catch and analyse long-runners
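The last step can start as simply as an awk one-liner. A sketch, assuming a log format whose 4th and 5th fields are $request_time and $upstream_response_time (the field positions and the sample file below are hypothetical, not from the actual logs):

```shell
#!/bin/sh
# Hypothetical sample log: time status uri request_time upstream_response_time
cat > /tmp/access_sample.log <<'EOF'
10:00:01 200 /index.php 0.012 0.011
10:00:02 200 /slow.php 5.482 0.031
10:00:03 499 /api.php 12.004 0.027
EOF

# Flag "long-runners": requests whose request_time exceeds the
# upstream_response_time by more than one second
awk '($4 - $5) > 1 { print $3, $4, $5 }' /tmp/access_sample.log
```

This prints the slow.php and api.php lines but not index.php; from there, the threshold and printed fields can be adjusted to whatever your real log_format contains.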