Sockets – close() is not closing socket properly

cclient-serversocketstcp

I have a multi-threaded server (thread pool) that is handling a large number of requests (up to 500/sec for one node), using 20 threads. There's a listener thread that accepts incoming connections and queues them for the handler threads to process. Once the response is ready, the threads then write out to the client and close the socket. All seemed to be fine until recently, a test client program started hanging randomly after reading the response. After a lot of digging, it seems that the close() from the server is not actually disconnecting the socket. I've added some debugging prints to the code with the file descriptor number and I get this type of output.

Processing request for 21
Writing to 21
Closing 21

The return value of close() is 0, or there would be another debug statement printed. After this output with a client that hangs, lsof is showing an established connection.

SERVER 8160 root 21u IPv4 32754237 TCP localhost:9980->localhost:47530 (ESTABLISHED)

CLIENT 17747 root 12u IPv4 32754228 TCP localhost:47530->localhost:9980 (ESTABLISHED)

It's as if the server never sends the shutdown sequence to the client, and this state hangs until the client is killed, leaving the server side in a close wait state

SERVER 8160 root 21u IPv4 32754237 TCP localhost:9980->localhost:47530 (CLOSE_WAIT)

Also if the client has a timeout specified, it will timeout instead of hanging. I can also manually run

call close(21)

in the server from gdb, and the client will then disconnect. This happens maybe once in 50,000 requests, but might not happen for extended periods.

Linux version: 2.6.21.7-2.fc8xen
Centos version: 5.4 (Final)

socket actions are as follows

SERVER:

int client_socket;
struct sockaddr_in client_addr;
socklen_t client_len = sizeof(client_addr);  

while(true) {
  client_socket = accept(incoming_socket, (struct sockaddr *)&client_addr, &client_len);
  if (client_socket == -1)
    continue;
  /*  insert into queue here for threads to process  */
}

Then the thread picks up the socket and builds the response.

/*  get client_socket from queue  */

/*  processing request here  */

/*  now set to blocking for write; was previously set to non-blocking for reading  */
int flags = fcntl(client_socket, F_GETFL);
if (flags < 0)
  abort();
if (fcntl(client_socket, F_SETFL, flags|O_NONBLOCK) < 0)
  abort();

server_write(client_socket, response_buf, response_length);
server_close(client_socket);

server_write and server_close.

void server_write( int fd, char const *buf, ssize_t len ) {
    printf("Writing to %d\n", fd);
    while(len > 0) {
      ssize_t n = write(fd, buf, len);
      if(n <= 0)
        return;// I don't really care what error happened, we'll just drop the connection
      len -= n;
      buf += n;
    }
  }

void server_close( int fd ) {
    for(uint32_t i=0; i<10; i++) {
      int n = close(fd);
      if(!n) {//closed successfully                                                                                                                                   
        return;
      }
      usleep(100);
    }
    printf("Close failed for %d\n", fd);
  }

CLIENT:

Client side is using libcurl v 7.27.0

CURL *curl = curl_easy_init();
CURLcode res;
curl_easy_setopt( curl, CURLOPT_URL, url);
curl_easy_setopt( curl, CURLOPT_WRITEFUNCTION, write_callback );
curl_easy_setopt( curl, CURLOPT_WRITEDATA, write_tag );

res = curl_easy_perform(curl);

Nothing fancy, just a basic curl connection. Client hangs in tranfer.c (in libcurl) because the socket is not perceived as being closed. It's waiting for more data from the server.

Things I've tried so far:

Shutdown before close

shutdown(fd, SHUT_WR);                                                                                                                                            
char buf[64];                                                                                                                                                     
while(read(fd, buf, 64) > 0);                                                                                                                                         
/*  then close  */

Setting SO_LINGER to close forcibly in 1 second

struct linger l;
l.l_onoff = 1;
l.l_linger = 1;
if (setsockopt(client_socket, SOL_SOCKET, SO_LINGER, &l, sizeof(l)) == -1)
  abort();

These have made no difference. Any ideas would be greatly appreciated.

EDIT — This ended up being a thread-safety issue inside a queue library causing the socket to be handled inappropriately by multiple threads.

Best Answer

Here is some code I've used on many Unix-like systems (e.g SunOS 4, SGI IRIX, HPUX 10.20, CentOS 5, Cygwin) to close a socket:

int getSO_ERROR(int fd) {
   int err = 1;
   socklen_t len = sizeof err;
   if (-1 == getsockopt(fd, SOL_SOCKET, SO_ERROR, (char *)&err, &len))
      FatalError("getSO_ERROR");
   if (err)
      errno = err;              // set errno to the socket SO_ERROR
   return err;
}

void closeSocket(int fd) {      // *not* the Windows closesocket()
   if (fd >= 0) {
      getSO_ERROR(fd); // first clear any errors, which can cause close to fail
      if (shutdown(fd, SHUT_RDWR) < 0) // secondly, terminate the 'reliable' delivery
         if (errno != ENOTCONN && errno != EINVAL) // SGI causes EINVAL
            Perror("shutdown");
      if (close(fd) < 0) // finally call close()
         Perror("close");
   }
}

But the above does not guarantee that any buffered writes are sent.

Graceful close: It took me about 10 years to figure out how to close a socket. But for another 10 years I just lazily called usleep(20000) for a slight delay to 'ensure' that the write buffer was flushed before the close. This obviously is not very clever, because:

The delay was too long most of the time.
The delay was too short some of the time--maybe!
A signal such SIGCHLD could occur to end usleep() (but I usually called usleep() twice to handle this case--a hack).
There was no indication whether this works. But this is perhaps not important if a) hard resets are perfectly ok, and/or b) you have control over both sides of the link.

But doing a proper flush is surprisingly hard. Using SO_LINGER is apparently not the way to go; see for example:

And SIOCOUTQ appears to be Linux-specific.

Note shutdown(fd, SHUT_WR) doesn't stop writing, contrary to its name, and maybe contrary to man 2 shutdown.

This code flushSocketBeforeClose() waits until a read of zero bytes, or until the timer expires. The function haveInput() is a simple wrapper for select(2), and is set to block for up to 1/100th of a second.

bool haveInput(int fd, double timeout) {
   int status;
   fd_set fds;
   struct timeval tv;
   FD_ZERO(&fds);
   FD_SET(fd, &fds);
   tv.tv_sec  = (long)timeout; // cast needed for C++
   tv.tv_usec = (long)((timeout - tv.tv_sec) * 1000000); // 'suseconds_t'

   while (1) {
      if (!(status = select(fd + 1, &fds, 0, 0, &tv)))
         return FALSE;
      else if (status > 0 && FD_ISSET(fd, &fds))
         return TRUE;
      else if (status > 0)
         FatalError("I am confused");
      else if (errno != EINTR)
         FatalError("select"); // tbd EBADF: man page "an error has occurred"
   }
}

bool flushSocketBeforeClose(int fd, double timeout) {
   const double start = getWallTimeEpoch();
   char discard[99];
   ASSERT(SHUT_WR == 1);
   if (shutdown(fd, 1) != -1)
      while (getWallTimeEpoch() < start + timeout)
         while (haveInput(fd, 0.01)) // can block for 0.01 secs
            if (!read(fd, discard, sizeof discard))
               return TRUE; // success!
   return FALSE;
}

Example of use:

   if (!flushSocketBeforeClose(fd, 2.0)) // can block for 2s
       printf("Warning: Cannot gracefully close socket\n");
   closeSocket(fd);

In the above, my getWallTimeEpoch() is similar to time(), and Perror() is a wrapper for perror().

Edit: Some comments:

My first admission is a bit embarrassing. The OP and Nemo challenged the need to clear the internal so_error before close, but I cannot now find any reference for this. The system in question was HPUX 10.20. After a failed connect(), just calling close() did not release the file descriptor, because the system wished to deliver an outstanding error to me. But I, like most people, never bothered to check the return value of close. So I eventually ran out of file descriptors (ulimit -n), which finally got my attention.
(very minor point) One commentator objected to the hard-coded numerical arguments to shutdown(), rather than e.g. SHUT_WR for 1. The simplest answer is that Windows uses different #defines/enums e.g. SD_SEND. And many other writers (e.g. Beej) use constants, as do many legacy systems.
Also, I always, always, set FD_CLOEXEC on all my sockets, since in my applications I never want them passed to a child and, more importantly, I don't want a hung child to impact me.

Sample code to set CLOEXEC:

   static void setFD_CLOEXEC(int fd) {
      int status = fcntl(fd, F_GETFD, 0);
      if (status >= 0)
         status = fcntl(fd, F_SETFD, status | FD_CLOEXEC);
      if (status < 0)
         Perror("Error getting/setting socket FD_CLOEXEC flags");
   }

Summary

A TCP socket is an endpoint instance defined by an IP address and a port in the context of either a particular TCP connection or the listening state.

A port is a virtualisation identifier defining a service endpoint (as distinct from a service instance endpoint aka session identifier).

A TCP socket is not a connection, it is the endpoint of a specific connection.

There can be concurrent connections to a service endpoint, because a connection is identified by both its local and remote endpoints, allowing traffic to be routed to a specific service instance.

There can only be one listener socket for a given address/port combination.

Exposition

This was an interesting question that forced me to re-examine a number of things I thought I knew inside out. You'd think a name like "socket" would be self-explanatory: it was obviously chosen to evoke imagery of the endpoint into which you plug a network cable, there being strong functional parallels. Nevertheless, in network parlance the word "socket" carries so much baggage that a careful re-examination is necessary.

In the broadest possible sense, a port is a point of ingress or egress. Although not used in a networking context, the French word porte literally means door or gateway, further emphasising the fact that ports are transportation endpoints whether you ship data or big steel containers.

For the purpose of this discussion I will limit consideration to the context of TCP-IP networks. The OSI model is all very well but has never been completely implemented, much less widely deployed in high-traffic high-stress conditions.

The combination of an IP address and a port is strictly known as an endpoint and is sometimes called a socket. This usage originates with RFC793, the original TCP specification.

A TCP connection is defined by two endpoints aka sockets.

An endpoint (socket) is defined by the combination of a network address and a port identifier. Note that address/port does not completely identify a socket (more on this later).

The purpose of ports is to differentiate multiple endpoints on a given network address. You could say that a port is a virtualised endpoint. This virtualisation makes multiple concurrent connections on a single network interface possible.

It is the socket pair (the 4-tuple consisting of the client IP address, client port number, server IP address, and server port number) that specifies the two endpoints that uniquely identifies each TCP connection in an internet. (TCP-IP Illustrated Volume 1, W. Richard Stevens)

In most C-derived languages, TCP connections are established and manipulated using methods on an instance of a Socket class. Although it is common to operate on a higher level of abstraction, typically an instance of a NetworkStream class, this generally exposes a reference to a socket object. To the coder this socket object appears to represent the connection because the connection is created and manipulated using methods of the socket object.

In C#, to establish a TCP connection (to an existing listener) first you create a TcpClient. If you don't specify an endpoint to the TcpClient constructor it uses defaults - one way or another the local endpoint is defined. Then you invoke the Connect method on the instance you've created. This method requires a parameter describing the other endpoint.

All this is a bit confusing and leads you to believe that a socket is a connection, which is bollocks. I was labouring under this misapprehension until Richard Dorman asked the question.

Having done a lot of reading and thinking, I'm now convinced that it would make a lot more sense to have a class TcpConnection with a constructor that takes two arguments, LocalEndpoint and RemoteEndpoint. You could probably support a single argument RemoteEndpoint when defaults are acceptable for the local endpoint. This is ambiguous on multihomed computers, but the ambiguity can be resolved using the routing table by selecting the interface with the shortest route to the remote endpoint.

Clarity would be enhanced in other respects, too. A socket is not identified by the combination of IP address and port:

[...]TCP demultiplexes incoming segments using all four values that comprise the local and foreign addresses: destination IP address, destination port number, source IP address, and source port number. TCP cannot determine which process gets an incoming segment by looking at the destination port only. Also, the only one of the [various] endpoints at [a given port number] that will receive incoming connection requests is the one in the listen state. (p255, TCP-IP Illustrated Volume 1, W. Richard Stevens)

As you can see, it is not just possible but quite likely for a network service to have numerous sockets with the same address/port, but only one listener socket on a particular address/port combination. Typical library implementations present a socket class, an instance of which is used to create and manage a connection. This is extremely unfortunate, since it causes confusion and has lead to widespread conflation of the two concepts.

Hagrawal doesn't believe me (see comments) so here's a real sample. I connected a web browser to http://dilbert.com and then ran netstat -an -p tcp. The last six lines of the output contain two examples of the fact that address and port are not enough to uniquely identify a socket. There are two distinct connections between 192.168.1.3 (my workstation) and 54.252.94.236:80 (the remote HTTP server)

  TCP    192.168.1.3:63240      54.252.94.236:80       SYN_SENT
  TCP    192.168.1.3:63241      54.252.94.236:80       SYN_SENT
  TCP    192.168.1.3:63242      207.38.110.62:80       SYN_SENT
  TCP    192.168.1.3:63243      207.38.110.62:80       SYN_SENT
  TCP    192.168.1.3:64161      65.54.225.168:443      ESTABLISHED

Since a socket is the endpoint of a connection, there are two sockets with the address/port combination 207.38.110.62:80 and two more with the address/port combination 54.252.94.236:80.

I think Hagrawal's misunderstanding arises from my very careful use of the word "identifies". I mean "completely, unambiguously and uniquely identifies". In the above sample there are two endpoints with the address/port combination 54.252.94.236:80. If all you have is address and port, you don't have enough information to tell these sockets apart. It's not enough information to identify a socket.

Addendum

Paragraph two of section 2.7 of RFC793 says

A connection is fully specified by the pair of sockets at the ends. A local socket may participate in many connections to different foreign sockets.

This definition of socket is not helpful from a programming perspective because it is not the same as a socket object, which is the endpoint of a particular connection. To a programmer, and most of this question's audience are programmers, this is a vital functional difference.

@plugwash makes a salient observation.

The fundamental problem is that the TCP RFC definition of socket is in conflict with the defintion of socket used by all major operating systems and libraries.

By definition the RFC is correct. When a library misuses terminology, this does not supersede the RFC. Instead, it imposes a burden of responsibility on users of that library to understand both interpretations and to be careful with words and context. Where RFCs do not agree, the most recent and most directly applicable RFC takes precedence.

References

TCP-IP Illustrated Volume 1 The Protocols, W. Richard Stevens, 1994 Addison Wesley
RFC793, Information Sciences Institute, University of Southern California for DARPA
RFC147, The Definition of a Socket, Joel M. Winett, Lincoln Laboratory

C# – How to Abort a Socket in C#

I've managed to simulate this situation:

To do a normal graceful disconnection:

you do:

socket.Shutdown(SocketShutdown.Both);
socket.Close();

However, to do an Abort, you do:

socket.Shutdown(SocketShutdown.Send);
socket.Close();

I think the difference is that the client will not receive any ACK packets and thinks the computer reset or something.

Best Answer

Related Solutions

Sockets – the difference between a port and a socket

Summary

Exposition

Addendum

References

C# – How to Abort a Socket in C#

Related Topic