There's no silver bullet
In practice it depends...
tl;dr - easy solution, use nginx...
Blocking:
For instance, Apache by default uses a blocking scheme where a process is forked for every connection. That means every connection needs its own memory space, and context-switching overhead grows as the number of connections increases. The benefit is that once a connection is closed its context can be disposed of and all of its memory easily reclaimed.
A multi-threaded approach would be similar in that the overhead of context switching still increases with the number of connections, but it may be more memory-efficient because threads share a context. The problem with such an approach is that it's difficult to manage shared memory safely. The approaches that overcome memory-synchronization problems often bring overhead of their own: locking can freeze the main thread under CPU-intensive loads, and immutable types add a lot of unnecessary copying of data.
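As a minimal sketch of the thread-per-connection model (the echo handler is a hypothetical stand-in for real request processing; the class name and port are made up):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// Thread-per-connection blocking server (sketch). Each connection gets
// its own thread; the per-connection context lives and dies with that
// thread, which is what makes cleanup so simple in the blocking model.
public class BlockingEchoServer {
    public static void main(String[] args) throws IOException {
        ServerSocket listener = new ServerSocket(8080);
        while (true) {
            Socket client = listener.accept();         // blocks until a client connects
            new Thread(() -> handle(client)).start();  // one thread (context) per connection
        }
    }

    private static void handle(Socket client) {
        try (Socket c = client;
             BufferedReader in = new BufferedReader(new InputStreamReader(c.getInputStream()));
             PrintWriter out = new PrintWriter(c.getOutputStream(), true)) {
            String line;
            while ((line = in.readLine()) != null) {   // blocking read
                out.println(line);                     // blocking write
            }
        } catch (IOException ignored) {
            // client went away: the thread exits and its memory is reclaimed
        }
    }
}
```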
AFAIK, a multi-process approach is generally preferred for blocking HTTP servers because it's safer/simpler to manage and recover memory. Garbage collection becomes a non-issue when recovering memory is as simple as stopping a process. For long-running processes (i.e. a daemon) that characteristic is especially important.
While context-switching overhead may seem insignificant with a small number of workers, the disadvantages become more relevant as the load scales up to hundreds or thousands of concurrent connections. At best, context switching scales O(n) with the number of workers present, but in practice it's most likely worse.
While blocking servers may not be the ideal choice for IO-heavy loads, they are ideal for CPU-intensive work where message passing is kept to a minimum.
Non-Blocking:
Non-blocking would be something like Node.js or nginx. These are especially known for scaling to a much larger number of connections per node under IO-intensive load. Basically, once people hit the upper limit of what thread/process-based servers could handle, they started to explore alternative options. This is otherwise known as the C10K problem (i.e. the ability to handle 10,000 concurrent connections).
Non-blocking async servers generally share a lot of characteristics with a multi-threaded-with-locking approach, in that you have to be careful to avoid CPU-intensive loads because you don't want to overload the main thread. The advantage is that the overhead incurred by context switching is essentially eliminated, and with only one context, message passing becomes a non-issue.
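For contrast, a minimal sketch of the single-threaded event-loop model using Java NIO (again a hypothetical echo handler; a production server would also track write interest and partial reads):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

// Single-threaded non-blocking server (sketch). One thread services
// every connection; nothing may block the loop, so CPU-heavy work here
// would stall all clients at once.
public class NonBlockingEchoServer {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel listener = ServerSocketChannel.open();
        listener.bind(new InetSocketAddress(8080));
        listener.configureBlocking(false);
        listener.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buf = ByteBuffer.allocate(4096);
        while (true) {
            selector.select();                        // wait for readiness events
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {             // new connection: register it, no new thread
                    SocketChannel client = listener.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {        // data ready: handle it, return to the loop
                    SocketChannel client = (SocketChannel) key.channel();
                    buf.clear();
                    int n = client.read(buf);
                    if (n == -1) { client.close(); continue; }
                    buf.flip();
                    client.write(buf);                // echo back (assumes the write completes)
                }
            }
        }
    }
}
```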
While it may not work for many networking protocols, HTTP's stateless nature works especially well for non-blocking architectures. By combining a reverse proxy with multiple non-blocking HTTP servers, it's possible to identify and route around the nodes experiencing heavy load.
Even on a setup with only one node, it's very common to run one server per processor core to maximize throughput.
Both:
The 'ideal' setup would be a combination of both: a reverse proxy at the front dedicated to routing requests, then a mix of blocking and non-blocking servers behind it. Non-blocking for IO-bound tasks like serving static content, cached content, and HTML content; blocking for CPU-heavy tasks like encoding images/video, streaming content, number crunching, and database writes.
In your case:
If you're just checking headers but not actually processing the requests, what you're essentially describing is a reverse proxy. In such a case I'd definitely go with an async approach.
I'd suggest checking out the documentation for the nginx built-in reverse proxy.
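As a rough, hypothetical example of what that looks like (the backend addresses are made up; see the nginx docs for the real options):

```nginx
# Sketch: nginx terminates client connections and load-balances them
# across a pool of backend servers, skipping nodes that stop responding.
events {}

http {
    upstream backend {
        server 10.0.0.1:8080;
        server 10.0.0.2:8080;
    }

    server {
        listen 80;
        location / {
            proxy_pass http://backend;
            proxy_set_header Host $host;
        }
    }
}
```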
Aside:
I read the write-up from the link you provided and it makes sense that async was a poor choice for their particular implementation. The issue can be summed up in one statement.
Found that when switching between clients, the code for saving and restoring values/state was difficult
They were building a stateful platform. In such a case, an async approach would mean constantly saving/loading the state every time the context switches (i.e. when an event fires). In addition, on the SMTP side they're doing a lot of CPU-intensive work.
It sounds like they had a pretty poor grasp of async and, as a result, made a lot of bad assumptions.
Is it correct that only one ServerSocket may bind to a Port? I take it there's no way to have a pool of objects accepting Socket connections?
Correct. Only one thing can bind to a particular port on a given network interface at a time. Whenever a connection is accepted, a new socket is created for that client; it still uses the listener's port, and the OS tells the connections apart by the client's address and port. This is how a single listener can spawn any number of client handlers without waiting for the previous one to disconnect.
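A quick sketch showing that in Java; note that each accepted Socket reports the same local port as the listener:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Sketch: one listener, many accepted sockets. Each accept() returns a
// distinct Socket, but all of them share the listener's local port; the
// OS distinguishes them by the remote (client) address and port.
public class AcceptDemo {
    public static void main(String[] args) throws IOException {
        try (ServerSocket listener = new ServerSocket(9090)) {
            while (true) {
                Socket client = listener.accept();  // new socket per connection
                System.out.println("local port: " + client.getLocalPort()      // always 9090
                        + ", remote: " + client.getRemoteSocketAddress());     // unique per client
                new Thread(() -> serve(client)).start();
            }
        }
    }

    private static void serve(Socket client) {
        try (Socket c = client) {
            // handle this client here...
        } catch (IOException ignored) { }
    }
}
```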
I have two ideas for passing messages between ClientProxy objects. Either directly between ClientProxy objects that are "buddies" or via a central "Exchange" object, or better yet, pool of objects.
You can have clients receive a proxy to their buddies' message queues whenever the buddies come online. They can then queue messages to each other directly, avoiding the need for a centralized exchange. The server would keep a reference to all of its online clients and notify the interested parties of their friends' connections/disconnections via the queue.
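A rough sketch of that idea, assuming hypothetical ClientProxy objects that each own an inbox queue (names and wiring are made up for illustration):

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: each client owns an inbox; buddies hold a direct reference
// (the "proxy") to each other's queue, so no central exchange is needed.
public class ClientProxy {
    final String name;
    final BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
    // buddy name -> reference to that buddy's inbox, handed out on connect
    final Map<String, BlockingQueue<String>> buddies = new ConcurrentHashMap<>();

    ClientProxy(String name) { this.name = name; }

    // Called by the server when a buddy comes online / goes offline.
    void buddyOnline(ClientProxy buddy) { buddies.put(buddy.name, buddy.inbox); }
    void buddyOffline(String buddyName) { buddies.remove(buddyName); }

    // Queue a message straight into the buddy's inbox, no middleman.
    void send(String buddyName, String msg) {
        BlockingQueue<String> q = buddies.get(buddyName);
        if (q != null) q.offer(name + ": " + msg);
    }

    // The client's handler thread drains its own inbox.
    String receive() throws InterruptedException { return inbox.take(); }

    public static void main(String[] args) throws InterruptedException {
        ClientProxy alice = new ClientProxy("alice");
        ClientProxy bob = new ClientProxy("bob");
        alice.buddyOnline(bob);            // the server wires them up on connect
        bob.buddyOnline(alice);
        alice.send("bob", "hello");
        System.out.println(bob.receive()); // prints "alice: hello"
    }
}
```

The server's only remaining job is calling buddyOnline/buddyOffline as presence changes.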
Scaling this to fill a single box would become a matter of adding network interfaces (or using a port range on the one interface). Scaling this across multiple boxes would require server synchronization / passing queue proxies to-and-fro willy nilly.
Best Answer
I've created a server with a limited maximum number of threads before. The solution is to put a cap on the lifetime of open connections and/or the number of requests a connection will serve before being closed by the server. The client then simply gets back in line to request another connection. This only works if your client requests are independent and do not require a long-lived connection (which should be the case). I used blocking I/O, but had a timeout on receiving request data.
In my case, I allowed an HTTP connection (handled by a newly spawned thread) to process up to 10 requests and live for up to 2 seconds (the final request runs to completion, of course); then the thread finishes. This ensures fairness. I used a counting semaphore to limit the number of open connections/threads. I also provided a means of running multiple server processes so that if a server process crashed (which didn't happen), requests would simply go to another process until the failed one restarted. I could update the software live that way, sending a hangup signal to tell the server to restart.
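A rough sketch of that scheme (the limits mirror the ones above; the class name, port, and request-handling stub are hypothetical):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.Semaphore;

// Sketch: blocking I/O with bounded threads. A counting semaphore caps
// concurrency; each connection serves at most MAX_REQUESTS requests or
// MAX_LIFETIME_MS of wall time, then closes so waiting clients get a turn.
public class CappedServer {
    static final int MAX_THREADS = 64;          // hypothetical thread cap
    static final int MAX_REQUESTS = 10;         // requests per connection
    static final long MAX_LIFETIME_MS = 2000;   // connection lifetime
    static final Semaphore slots = new Semaphore(MAX_THREADS);

    public static void main(String[] args) throws Exception {
        try (ServerSocket listener = new ServerSocket(8080)) {
            while (true) {
                slots.acquire();                 // block until a thread slot is free
                Socket client = listener.accept();
                new Thread(() -> {
                    try { serve(client); }
                    finally { slots.release(); } // always return the slot
                }).start();
            }
        }
    }

    static void serve(Socket client) {
        long deadline = System.currentTimeMillis() + MAX_LIFETIME_MS;
        try (Socket c = client) {
            c.setSoTimeout(1000);                // timeout on receiving request data
            for (int i = 0; i < MAX_REQUESTS && System.currentTimeMillis() < deadline; i++) {
                // read one request and write one response here;
                // the request in flight always runs to completion
            }
        } catch (IOException ignored) {
            // timeout or disconnect: close, and the client gets back in line
        }
    }
}
```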
I had a way of monitoring connection status across all of the servers, and it worked very smoothly and well. Unix did all the heavy lifting; I just had to learn about and take advantage of what it provided. This was back in 1999.