Sockets – Unix Domain Socket: Using datagram communication between one server process and several client processes

cdatagramsocketsunix-socket

I would like to establish an IPC connection between several processes on Linux. I have never used UNIX sockets before, and thus I don't know if this is the correct approach to this problem.

One process receives data (unformated, binary) and shall distribute this data via a local AF_UNIX socket using the datagram protocol (i.e. similar to UDP with AF_INET). The data sent from this process to a local Unix socket shall be received by multiple clients listening on the same socket. The number of receivers may vary.

To achieve this the following code is used to create a socket and send data to it (the server process):

struct sockaddr_un ipcFile;
memset(&ipcFile, 0, sizeof(ipcFile));
ipcFile.sun_family = AF_UNIX;
strcpy(ipcFile.sun_path, filename.c_str());

int socket = socket(AF_UNIX, SOCK_DGRAM, 0);
bind(socket, (struct sockaddr *) &ipcFile, sizeof(ipcFile));
...
// buf contains the data, buflen contains the number of bytes
int bytes = write(socket, buf, buflen);
...
close(socket);
unlink(ipcFile.sun_path);

This write returns -1 with errno reporting ENOTCONN ("Transport endpoint is not connected"). I guess this is because no receiving process is currently listening to this local socket, correct?

Then, I tried to create a client who connects to this socket.

struct sockaddr_un ipcFile;
memset(&ipcFile, 0, sizeof(ipcFile));
ipcFile.sun_family = AF_UNIX;
strcpy(ipcFile.sun_path, filename.c_str());

int socket = socket(AF_UNIX, SOCK_DGRAM, 0);
bind(socket, (struct sockaddr *) &ipcFile, sizeof(ipcFile));
...
char buf[1024];
int bytes = read(socket, buf, sizeof(buf));
...
close(socket);

Here, the bind fails ("Address already in use"). So, do I need to set some socket options, or is this generally the wrong approach?

Thanks in advance for any comments / solutions!

Best Answer

There's a trick to using Unix Domain Socket with datagram configuration. Unlike stream sockets (tcp or unix domain socket), datagram sockets need endpoints defined for both the server AND the client. When one establishes a connection in stream sockets, an endpoint for the client is implicitly created by the operating system. Whether this corresponds to an ephemeral TCP/UDP port, or a temporary inode for the unix domain, the endpoint for the client is created for you. Thats why you don't normally need to issue a call to bind() for stream sockets in the client.

The reason you're seeing "Address already in use" is because you're telling the client to bind to the same address as the server. bind() is about asserting external identity. Two sockets can't normally have the same name.

With datagram sockets, specifically unix domain datagram sockets, the client has to bind() to its own endpoint, then connect() to the server's endpoint. Here is your client code, slightly modified, with some other goodies thrown in:

char * server_filename = "/tmp/socket-server";
char * client_filename = "/tmp/socket-client";

struct sockaddr_un server_addr;
struct sockaddr_un client_addr;
memset(&server_addr, 0, sizeof(server_addr));
server_addr.sun_family = AF_UNIX;
strncpy(server_addr.sun_path, server_filename, 104); // XXX: should be limited to about 104 characters, system dependent

memset(&client_addr, 0, sizeof(client_addr));
client_addr.sun_family = AF_UNIX;
strncpy(client_addr.sun_path, client_filename, 104);

// get socket
int sockfd = socket(AF_UNIX, SOCK_DGRAM, 0);

// bind client to client_filename
bind(sockfd, (struct sockaddr *) &client_addr, sizeof(client_addr));

// connect client to server_filename
connect(sockfd, (struct sockaddr *) &server_addr, sizeof(server_addr));

...
char buf[1024];
int bytes = read(sockfd, buf, sizeof(buf));
...
close(sockfd);

At this point your socket should be fully setup. I think theoretically you can use read()/write(), but usually I'd use send()/recv() for datagram sockets.

Normally you'll want to check error after each of these calls and issue a perror() afterwards. It will greatly aid you when things go wrong. In general, use a pattern like this:

if ((sockfd = socket(AF_UNIX, SOCK_DGRAM, 0)) < 0) {
    perror("socket failed");
}

This goes for pretty much any C system calls.

The best reference for this is Steven's "Unix Network Programming". In the 3rd edition, section 15.4, pages 415-419 show some examples and lists many of the caveats.

By the way, in reference to

I guess this is because no receiving process is currently listening to this local socket, correct?

I think you're right about the ENOTCONN error from write() in the server. A UDP socket would normally not complain because it has no facility to know if the client process is listening. However, unix domain datagram sockets are different. In fact, the write() will actually block if the client's receive buffer is full rather than drop the packet. This makes unix domain datagram sockets much superior to UDP for IPC because UDP will most certainly drop packets when under load, even on localhost. On the other hand, it means you have to be careful with fast writers and slow readers.

Edit:

Yes, it's not portable and not POSIX, but we work on real platforms, don't we?

The thing is that you have to zero-terminate the path and the above code is as good as sizeof( struct sockaddr_un ) but might save you a few bytes when copying from user to kernel but wastes a few cycles in strlen.

Look at how Linux handles that length (from http://lxr.linux.no/linux+v2.6.32/net/unix/af_unix.c#L200):

static int unix_mkname(struct sockaddr_un *sunaddr, int len, unsigned *hashp)
{
    if (len <= sizeof(short) || len > sizeof(*sunaddr))
        return -EINVAL;
    if (!sunaddr || sunaddr->sun_family != AF_UNIX)
        return -EINVAL;
    if (sunaddr->sun_path[0]) {
        /*
         * This may look like an off by one error but it is a bit more
         * subtle. 108 is the longest valid AF_UNIX path for a binding.
         * sun_path[108] doesnt as such exist.  However in kernel space
         * we are guaranteed that it is a valid memory location in our
         * kernel address buffer.
         */
        ((char *)sunaddr)[len] = 0;
        len = strlen(sunaddr->sun_path)+1+sizeof(short);
        return len;
    }

    *hashp = unix_hash_fold(csum_partial(sunaddr, len, 0));
    return len;
}

Here len is directly from third argument to bind system call, but sunaddr is already copied into kernel space with that length. You can't have address longer then sizeof( sockadd_un ). Kernel does the strlen anyway.

So yes, doing sizeof( sockaddr_un ) is probably safer across the board, but telling kernel exact length doesn't hurt either.

Sockets – How to know whether any process is bound to a Unix domain socket

I know I am very late to the party and that this was answered a long time ago but I just encountered this searching for something else and I have an alternate proposal.

When you encounter the EADDRINUSE return from bind() you can enter an error checking routine that connects to the socket. If the connection succeeds, there is a running process that is at least alive enough to have done the accept(). This strikes me as being the simplest and most portable way of achieving what you want to achieve. It has drawbacks in that the server that created the UDS in the first place may actually still be running but "stuck" somehow and unable to do an accept(), so this solution certainly isn't fool-proof, but it is a step in the right direction I think.

If the connect() fails then go ahead and unlink() the endpoint and try the bind() again.

Best Answer

Related Solutions

Sockets – Proper length of an AF_UNIX socket when calling bind()

Edit:

Sockets – How to know whether any process is bound to a Unix domain socket

Related Topic