Nginx – select() hangs due to resource exhaustion – but what resource

debian nginx ssh

Connecting to my server via sftp sometimes results in a hang here:

if (select(max+1, rset, wset, NULL, NULL) < 0) {

which is line 1428 of OpenSSH 5.2p1's sftp-server.c (the main loop of sftp_server_main()).

The same hang occurs when opening a data connection over e.g. vanilla
FTP. I am sometimes able to get through after a number of seconds or
minutes, but sometimes the connection times out on the client side
before the server is able to respond. When the server does respond and
I am connected, then if I issue e.g. 'ls' it will hang again at the
select() for some time.

ssh itself is fine: I can connect with no delay, issue commands, etc.

I don't think it's socket exhaustion:

root@dl:~# cat /proc/net/sockstat
sockets: used 304
TCP: inuse 444 orphan 302 tw 152 alloc 451 mem 5280
UDP: inuse 4
RAW: inuse 0
FRAG: inuse 0 memory 0

root@dl:~# netstat -tan | awk '{print $6}' | sort | uniq -c
      2 CLOSE_WAIT
    121 CLOSING
      1 established)
    109 ESTABLISHED
     17 FIN_WAIT1
      9 FIN_WAIT2
      1 Foreign
    300 LAST_ACK
     20 LISTEN
      2 SYN_RECV
    433 TIME_WAIT

It also doesn't seem to be out of file descriptors, but I'm not 100%
sure of that. And even if it were, wouldn't that produce an error
rather than a hang?
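To sanity-check that assumption, here is a minimal sketch of what I would expect descriptor exhaustion to look like. The function name errno_when_out_of_fds is made up for illustration, and it only exercises the per-process limit (RLIMIT_NOFILE), not the system-wide one. It lowers the soft limit and opens /dev/null until open() fails; the failure should come back immediately with EMFILE instead of blocking.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical demo, not taken from any real server code.
 * Lowers this process's soft descriptor limit, then opens /dev/null
 * until open() fails. Returns the errno of the failing call
 * (EMFILE is expected), or -1 if the limit could not be changed.
 * The descriptors opened along the way are leaked on purpose. */
int errno_when_out_of_fds(rlim_t soft_limit)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    /* The soft limit may not exceed the hard limit. */
    if (soft_limit > rl.rlim_max)
        soft_limit = rl.rlim_max;
    rl.rlim_cur = soft_limit;

    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    for (;;) {
        if (open("/dev/null", O_RDONLY) < 0)
            return errno; /* open() fails right away; it does not block */
    }
}
```

If that holds, a process that had run out of descriptors would log errors from open(), socket() or accept() rather than sit in select().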

It does seem to be somewhat related to the number of connections
nginx is serving. I can shut down nginx and the problem goes
away. Having said this, nginx and apache are able to coexist in
this state with no problem (apache never hangs). People can also
connect to an IRC server on the same machine with no problem during
these "episodes". So maybe it is limited to select()?

What resource, other than sockets or file descriptors, is nginx using
up that causes select() to hang? I am pulling my hair out over this.

I've tried all of the usual network tuning (the various sysctl
settings, reducing the timeouts), all with no effect. The machine is
not out of RAM, and CPU and I/O load are both fine.

Linux dl 2.6.26-2-486 #1 Sat Jun 11 14:47:34 UTC 2011 i686 GNU/Linux

It's running Debian Lenny.

What might cause select() to hang checking some sockets?

Best Answer

Two things can cause this:

  1. A bug in the code calling 'select'.

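  2. No data has arrived yet on any of the descriptors being watched. With a NULL timeout, as in the call quoted in the question, select() blocks until one of them becomes ready.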
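
The second case can be illustrated with a small sketch. This is not the sftp-server code; wait_readable is a hypothetical helper written for this example. It waits on a single descriptor and, when given a negative timeout, passes NULL to select() exactly as the call in the question does, so it only returns once data shows up.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper, not from sftp-server.c.
 * Waits until fd is readable. A negative timeout_ms passes NULL as the
 * timeout, so select() blocks until data arrives.
 * Returns select()'s result: 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set rset;
    struct timeval tv;

    /* FD_SET is undefined for descriptors outside this range. */
    assert(fd >= 0 && fd < FD_SETSIZE);

    FD_ZERO(&rset);
    FD_SET(fd, &rset);

    if (timeout_ms < 0)
        return select(fd + 1, &rset, NULL, NULL, NULL);

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return select(fd + 1, &rset, NULL, NULL, &tv);
}
```

On an empty pipe, wait_readable(fd, 100) should come back with 0 after roughly 100 ms, while wait_readable(fd, -1) would sit there until the other end writes something. So a process parked in select() with a NULL timeout usually means the data it is waiting for has not reached it yet, which points at whatever is delaying delivery rather than at select() itself.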