Java – Choosing Between Socket and SocketChannel

java server sockets

I am designing a server for a remote file store in Java and trying to decide whether to use non-blocking java.nio.channels.SocketChannels or the older blocking model with Sockets. What factors should I consider when choosing between these two server architectures, and what kinds of tasks are better suited to one versus the other?

Specifically, I am wondering how these respective approaches compare as:

the number of connected clients scales up,
the length of a client connection increases, and
the amount of data transferred increases.

Also, are there any other factors that I should be aware of that would favor one approach over the other?

Best Answer

Why are you trying to solve an already solved problem? There already exist at least two good solutions:

Grizzly

Writing scalable server applications in the Java™ programming language has always been difficult. Before the advent of the Java New I/O API (NIO), thread management issues made it impossible for a server to scale to thousands of users. The Grizzly NIO framework has been designed to help developers to take advantage of the Java™ NIO API. Grizzly’s goal is to help developers to build scalable and robust servers using NIO as well as offering extended framework components: Web Framework (HTTP/S), WebSocket, Comet, and more!

Netty

Netty is a NIO client server framework which enables quick and easy development of network applications such as protocol servers and clients. It greatly simplifies and streamlines network programming such as TCP and UDP socket server.

'Quick and easy' doesn't mean that a resulting application will suffer from a maintainability or a performance issue. Netty has been designed carefully with the experiences earned from the implementation of a lot of protocols such as FTP, SMTP, HTTP, and various binary and text-based legacy protocols. As a result, Netty has succeeded to find a way to achieve ease of development, performance, stability, and flexibility without a compromise.

It sounds to me like you are less interested in solving the problem yourself than you are in just having the problem be solved - so you can have your remote file store. Personally, I use Netty in my application and it works great. Don't reinvent the wheel - use someone else's.
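For context, here is a rough sketch of the two models the question asks about, using only JDK classes. It is not how Grizzly or Netty are implemented; the class EchoModels, its method names, and the echo behaviour are my own illustration, and error handling is deliberately minimal. It assumes Java 11 or later.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

class EchoModels {

    /** Blocking model: accept() blocks, and every connection occupies its own thread. */
    static void serveBlocking(ServerSocket server) throws IOException {
        while (true) {
            Socket client = server.accept(); // throws once the server socket is closed
            Thread worker = new Thread(() -> {
                try (client) {
                    // Blocks in read() until the client sends data or disconnects.
                    client.getInputStream().transferTo(client.getOutputStream());
                } catch (IOException ignored) {
                    // Connection dropped; nothing to clean up in this sketch.
                }
            });
            worker.setDaemon(true);
            worker.start();
        }
    }

    /** Non-blocking model: a single thread multiplexes all connections through a Selector. */
    static void serveNonBlocking(ServerSocketChannel server) throws IOException {
        try (Selector selector = Selector.open()) {
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            ByteBuffer buf = ByteBuffer.allocate(4096);
            while (server.isOpen()) {
                selector.select(200); // wake up periodically to notice a closed server
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) {
                        SocketChannel ch = server.accept();
                        if (ch != null) {
                            ch.configureBlocking(false);
                            ch.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel ch = (SocketChannel) key.channel();
                        buf.clear();
                        if (ch.read(buf) < 0) { // peer closed the connection
                            ch.close();
                            continue;
                        }
                        buf.flip();
                        // Simplification: spins if the send buffer is full; real code uses OP_WRITE.
                        while (buf.hasRemaining()) ch.write(buf);
                    }
                }
            }
        }
    }

    /** Starts the chosen server on an OS-assigned loopback port and echoes "ping" through it. */
    static String demoRoundTrip(boolean nonBlocking) throws Exception {
        InetSocketAddress any = new InetSocketAddress("127.0.0.1", 0); // port 0: OS picks one
        try (ServerSocketChannel nio = ServerSocketChannel.open().bind(any);
             ServerSocket blocking = new ServerSocket(0, 50, any.getAddress())) {
            Thread t = new Thread(() -> {
                try {
                    if (nonBlocking) serveNonBlocking(nio); else serveBlocking(blocking);
                } catch (IOException | RuntimeException closed) {
                    // Expected when the enclosing try-with-resources closes the server.
                }
            });
            t.setDaemon(true);
            t.start();
            int port = nonBlocking ? nio.socket().getLocalPort() : blocking.getLocalPort();
            try (Socket s = new Socket("127.0.0.1", port)) {
                s.getOutputStream().write("ping".getBytes(StandardCharsets.US_ASCII));
                return new String(s.getInputStream().readNBytes(4), StandardCharsets.US_ASCII);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("blocking:     " + demoRoundTrip(false));
        System.out.println("non-blocking: " + demoRoundTrip(true));
    }
}
```

The blocking version is shorter but spends one thread per client. The selector version keeps the thread count flat but has to track per-connection state by hand, and that bookkeeping is what the frameworks above take off your plate.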

Related Solutions

Java – Retry design for high volume

The issue here is throttling. When the system comes up, the application needs to be designed so that neither the publisher nor the consumer is overwhelmed.

You could get clever with your algorithm. If you have the ability to classify a message by priority, then the failed messages could be saved with a lower priority. After the publisher has published a new message, it can look into the lower-priority queue to check whether any failed messages need republishing, and republish them.
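The two-queue idea can be sketched as follows. This is a minimal in-memory illustration, not a production design: the class ThrottledPublisher, the retriesPerPublish limit, and the Predicate standing in for the real message transport are all assumptions of mine.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical publisher: new messages go out first, failed ones are retried at lower priority. */
class ThrottledPublisher {
    private final Queue<String> fresh = new ArrayDeque<>();   // normal priority
    private final Queue<String> failed = new ArrayDeque<>();  // lower priority
    private final Predicate<String> transport;                // returns true if delivery succeeded
    private final int retriesPerPublish;                      // throttle on republishing

    ThrottledPublisher(Predicate<String> transport, int retriesPerPublish) {
        this.transport = transport;
        this.retriesPerPublish = retriesPerPublish;
    }

    void submit(String message) { fresh.add(message); }

    int pendingRetries() { return failed.size(); }

    /** Publishes one new message (if any), then retries a bounded number of earlier failures. */
    List<String> publishNext() {
        List<String> delivered = new ArrayList<>();
        // Fix the retry budget first so a message that fails now is not retried in the same call.
        int budget = Math.min(retriesPerPublish, failed.size());
        String message = fresh.poll();
        if (message != null) attempt(message, delivered);
        for (int i = 0; i < budget; i++) attempt(failed.poll(), delivered);
        return delivered;
    }

    private void attempt(String message, List<String> delivered) {
        if (transport.test(message)) delivered.add(message);
        else failed.add(message); // park it at the back of the low-priority queue
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<>();
        // Simulated transport: message "b" fails on its first attempt, everything else succeeds.
        ThrottledPublisher p = new ThrottledPublisher(m -> !m.equals("b") || !seen.add(m), 1);
        p.submit("a"); p.submit("b"); p.submit("c");
        System.out.println(p.publishNext()); // [a]
        System.out.println(p.publishNext()); // []  ("b" failed and was parked)
        System.out.println(p.publishNext()); // [c, b]
    }
}
```

Capping retries per publish is what provides the throttling: a backlog of failed messages cannot crowd out new traffic when the system comes back up.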

This is one well-known approach to throttling messages. I am sure there are other throttling algorithms that could be applied here, depending on your specific needs.

Socket on a webserver

Sockets are file descriptors with special abilities. While every socket somehow uses a port, they are not the same thing.

A socket is identified by a local address+port and a remote address+port. That means the same local port can be part of multiple sockets if the remote part is different.

A TCP server (such as a web server process) listens on a local port. For a listening socket, the local address only controls who can connect to the port: everyone, or only clients on localhost. Its remote address is all zeros (shown as 0.0.0.0:*), which means it is not connected to any peer. Here I've started python3 -m http.server on localhost port 7001:

tcp  127.0.0.1:7001   0.0.0.0:*       LISTEN        32143/python3

When I connect to that web server via my web browser, we see two additional sockets:

tcp  127.0.0.1:7001   0.0.0.0:*        LISTEN       32143/python3
tcp  127.0.0.1:50204  127.0.0.1:7001   ESTABLISHED  1658/firefox
tcp  127.0.0.1:7001   127.0.0.1:50204  ESTABLISHED  32143/python3

(data obtained via netstat, and edited for clarity)

The Firefox browser created a socket to connect() to the server. Firefox uses port 50204 in this case, so its socket is identified as local 127.0.0.1:50204 remote 127.0.0.1:7001. When the server accept()ed the connection, this connection got its own socket, which is basically the reverse of the client socket: local 127.0.0.1:7001 remote 127.0.0.1:50204. The local port is the same port the server is listening to.
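The same mirroring can be observed from Java. The sketch below is only an illustration: the class SocketPairs and its describe helper are names I made up, and it binds to an OS-assigned port rather than 7001. It opens a listener on the loopback interface, connects two clients, and prints the local and remote endpoints of every socket involved.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class SocketPairs {
    /** Formats a connected socket as "local -> remote", e.g. "127.0.0.1:7001 -> 127.0.0.1:50204". */
    static String describe(Socket s) {
        return s.getLocalAddress().getHostAddress() + ":" + s.getLocalPort()
                + " -> " + s.getInetAddress().getHostAddress() + ":" + s.getPort();
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port; the netstat example above used 7001.
        try (ServerSocket listener = new ServerSocket(0, 50, loopback);
             Socket client1 = new Socket(loopback, listener.getLocalPort());
             Socket client2 = new Socket(loopback, listener.getLocalPort());
             Socket conn1 = listener.accept();
             Socket conn2 = listener.accept()) {
            System.out.println("client 1:     " + describe(client1));
            System.out.println("client 2:     " + describe(client2));
            // Both accepted sockets share the listener's local port and differ only in the remote part.
            System.out.println("connection 1: " + describe(conn1));
            System.out.println("connection 2: " + describe(conn2));
        }
    }
}
```

The port numbers differ from run to run, but each accepted connection should show the listener's port on the local side and one client's ephemeral port on the remote side.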

The client socket and server connection socket always mirror each other, although in reality the server often sees a different client IP+port due to network address translation (NAT).

Why can the server use the same port for all connections? Well, every TCP/IP packet contains the IP+port of the sender and receiver. When the server operating system gets a connection request from a client for some port, the connection will usually be refused unless a server process is listening on that port. If a process is listening, it may accept the connection and we get a socket representing that connection.

For all subsequently received TCP packets, the OS will look at the addresses and see whether they match an established socket connection. If so, the packet content is stored in a buffer that can be read from the socket file descriptor by the server process. When the server writes to the socket connection file descriptor, the OS knows the local and remote address, and can therefore create a TCP packet with the appropriate metadata.

So the sockets are entries in a lookup table used by the OS to translate between file descriptors and network addresses/ports.
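As a toy model of that lookup table (not how any real kernel implements it; listening sockets and wildcard addresses are left out), the sketch below maps the four-part connection identity to a file descriptor number and back. The class SocketTable, the record Endpoints, and the descriptor numbers are invented for illustration, and records require Java 16 or later.

```java
import java.util.HashMap;
import java.util.Map;

class SocketTable {
    /** Connection identity as described above: local address+port plus remote address+port. */
    record Endpoints(String localAddr, int localPort, String remoteAddr, int remotePort) {}

    private final Map<Endpoints, Integer> fdByConnection = new HashMap<>();
    private final Map<Integer, Endpoints> connectionByFd = new HashMap<>();

    /** Called when a connection is established, e.g. after accept(). */
    void register(int fd, Endpoints endpoints) {
        fdByConnection.put(endpoints, fd);
        connectionByFd.put(fd, endpoints);
    }

    /** Incoming packet: its destination is our local side, its source is the remote side. */
    Integer fdForIncoming(String srcAddr, int srcPort, String dstAddr, int dstPort) {
        return fdByConnection.get(new Endpoints(dstAddr, dstPort, srcAddr, srcPort));
    }

    /** Outgoing write: the descriptor alone determines the addresses to put in the packet. */
    Endpoints endpointsForFd(int fd) {
        return connectionByFd.get(fd);
    }

    public static void main(String[] args) {
        SocketTable table = new SocketTable();
        // Two connections to the same local port 7001, distinguished only by the remote port.
        table.register(5, new Endpoints("127.0.0.1", 7001, "127.0.0.1", 50204));
        table.register(6, new Endpoints("127.0.0.1", 7001, "127.0.0.1", 50205));
        System.out.println(table.fdForIncoming("127.0.0.1", 50204, "127.0.0.1", 7001)); // 5
        System.out.println(table.fdForIncoming("127.0.0.1", 50205, "127.0.0.1", 7001)); // 6
        System.out.println(table.fdForIncoming("127.0.0.1", 50999, "127.0.0.1", 7001)); // null
        System.out.println(table.endpointsForFd(6).remotePort());                       // 50205
    }
}
```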