There are a number of important missing bits of information:
- Why is OpenMPI relevant?
- Why is heterogeneous relevant?
- What is the work that the server is orchestrating?
- You mention one thread per connection is a problem.
If you're using Boost::Asio and you're having the problem of one thread per client, then you're likely doing something wrong. Make sure that you're doing things asynchronously and not blocking any one thread too long. This is probably the easiest thing for you to do at the moment.
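The single-threaded asynchronous model that Boost::Asio supports can be illustrated in Python's asyncio (used here only because it is concise; the same pattern applies to Asio's `async_read`/`async_write` handlers). One event loop on one thread services many clients, because no handler ever blocks. The port and message format below are purely illustrative:

```python
import asyncio

async def handle_client(reader, writer):
    data = await reader.readline()   # yields to the event loop while waiting
    writer.write(b"ack:" + data)     # reply is queued, not sent synchronously
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main():
    # Port 0 lets the OS pick a free port.
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    async def client(i):
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(b"job %d\n" % i)
        await writer.drain()
        reply = await reader.readline()
        writer.close()
        await writer.wait_closed()
        return reply

    # 50 concurrent clients, all serviced by this single thread.
    replies = await asyncio.gather(*(client(i) for i in range(50)))
    server.close()
    await server.wait_closed()
    return replies

replies = asyncio.run(main())
print(len(replies))
```

The point is that concurrency comes from the event loop interleaving I/O, not from spawning a thread per connection.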
OpenMPI is not what you need. MPI was originally designed for distributed-memory architectures. It is possible to use it over the internet, but it is probably not the best choice. The fact that you mention heterogeneity and MPI makes me think that you've taken a CS course in HPC. That is not relevant in this case.
When deciding on a protocol and architecture, the important factors are the type of work the clients are doing, how you share state between the clients and the server, and how long each message takes to process.
If you're optimizing for throughput, then per-message overhead is less important, so verbose serialization formats like XML and JSON are fine. You can even tear down connections between message-processing jobs, meaning that the server does not have to maintain a thread for each client in the meantime.
If you're trying to keep latency low, then per-message overhead is important, so maintain a connection and use a terse serialization format.
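To make the overhead concrete, here is a sketch comparing a verbose, self-describing format (JSON) against a terse fixed-layout binary encoding (the message fields are hypothetical):

```python
import json
import struct

# A small work-result message serialized two ways.
msg = {"job_id": 12345, "status": 1, "value": 3.14159}

# Self-describing: field names travel with every message.
verbose = json.dumps(msg).encode("utf-8")

# Fixed layout, little-endian, no padding: uint32 + uint8 + double = 13 bytes.
terse = struct.pack("<IBd", msg["job_id"], msg["status"], msg["value"])

print(len(verbose), len(terse))
```

The terse encoding is a fraction of the size, but both ends must agree on the layout in advance; the JSON version is larger yet debuggable and evolvable.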
A message broker is an architectural pattern you could consider. It is responsible for distributing messages between your applications.
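Conceptually, a broker is just a named set of queues that producers publish to and consumers pull from. This toy in-memory sketch (the class and topic names are made up; real brokers like RabbitMQ add persistence, acknowledgements, and network transport) shows the shape of the pattern:

```python
from collections import defaultdict, deque

class MessageBroker:
    """Toy in-memory broker: producers publish to a topic, consumers poll it."""
    def __init__(self):
        self.queues = defaultdict(deque)

    def publish(self, topic, message):
        self.queues[topic].append(message)

    def consume(self, topic):
        """Return the next message on the topic, or None if it is empty."""
        q = self.queues[topic]
        return q.popleft() if q else None

broker = MessageBroker()
broker.publish("work", {"job_id": 1, "payload": "render frame 1"})
broker.publish("work", {"job_id": 2, "payload": "render frame 2"})

job = broker.consume("work")   # a client pulls the next job
print(job["job_id"])           # 1
```

The value of the pattern is decoupling: the server never needs to know which client will pick up a given message.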
You're better off using internet-friendly protocols, like HTTP, so you don't have to worry about proxies or NAT traversal. The internet can be considered a massive distributed system with millions of heterogeneous clients; REST is an architectural style inspired by that, so it might be appropriate.
I'm going to assume that:
- the server produces work items for clients to do and aggregates the responses.
- each client receives a message, does some work, and sends a response back.
My default choice of technology would be a web server. It would expose two URLs: one where clients can GET new work, and one where clients POST responses.
When the client starts, it should periodically poll for new work (you could use WebSockets or long-poll HTTP requests). When the server returns a work item, exactly one client should consume it. Each work item should be stamped with a unique ID, so that you can ensure there are no duplicates and correlate responses with the initial job.
When the client has completed the task it should POST the response and restart the process of polling for new work.
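The GET/POST loop above can be sketched as follows. Here the two endpoints are simulated in-process by plain functions (`get_work` standing in for `GET /work`, `post_response` for `POST /responses`); the IDs, task payloads, and "work" are all hypothetical:

```python
import uuid

# Hypothetical server-side state (in a real system this lives behind the
# /work and /responses HTTP endpoints).
pending_work = [{"id": str(uuid.uuid4()), "task": n} for n in (3, 4)]
responses = {}

def get_work():                       # stands in for: GET /work
    return pending_work.pop(0) if pending_work else None

def post_response(work_id, result):   # stands in for: POST /responses
    if work_id in responses:
        return False                  # duplicate: this ID was already answered
    responses[work_id] = result
    return True

def client_loop():
    while True:
        item = get_work()
        if item is None:
            break                     # a real client would sleep or long-poll here
        result = item["task"] ** 2    # "do some work"
        post_response(item["id"], result)

client_loop()
print(sorted(responses.values()))     # [9, 16]
```

The unique ID per work item is what lets the server reject duplicate submissions and match each response to the job that produced it.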
Web servers don't cope well with long-running requests, so you would want to hand the work off to something else, perhaps via a database, IPC, or a message queue.
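One hedged sketch of that handoff, using an in-process queue to stand in for whatever transport you choose (the handler name, job IDs, and the "slow processing" are all invented for illustration):

```python
import queue
import threading

jobs = queue.Queue()
done = {}

def web_handler(job_id, payload):
    """Enqueue the job and return immediately, e.g. with HTTP 202 Accepted."""
    jobs.put((job_id, payload))
    return 202

def worker():
    """Background worker that does the slow part off the request thread."""
    while True:
        job_id, payload = jobs.get()
        if job_id is None:            # sentinel: shut down
            break
        done[job_id] = payload.upper()  # stand-in for slow processing
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()
status = web_handler("j1", "long running task")  # returns at once
jobs.join()                           # wait until the worker has finished
jobs.put((None, None))
t.join()
print(status, done["j1"])
```

The web handler stays fast and the long computation happens elsewhere; swapping the `Queue` for a database table or a real message queue changes the transport, not the shape.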
The server then does not have to worry about the capabilities of each client (which makes the heterogeneity irrelevant): the clients will just consume work at whatever rate they can.
Best Answer
Good question! Nice to see someone considering such points; philosophising over how computer applications are built and used is important. What you describe is a distributed system.
If I were you, I would consider looking at the likes of SETI@Home project and other "screensaver processes" that use redundant CPU cycles to process large amounts of data. Chances are that those guys will have considered this kind of situation before.
The main issues are:
- Fault tolerance: how can I ensure my request gets processed when the node I ask to process it may fail, and how do I handle such inevitable failures? (See Jeff Atwood's Coding Horror piece on the Chaos Monkey concept for a possible method of investigation.)
- Memory: memory speed (or CAS latency) would be so variable that any application would need enough memory available locally, at the point of use, to manage its tasks. In the same way that a Windows machine uses a page file, a distributed architecture could use specific nodes as memory caches.
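The fault-tolerance point above can be sketched as a coordinator that resubmits a job until some node succeeds. Everything here is hypothetical: `flaky_node` simulates two crashes before a success, standing in for real nodes going away mid-job:

```python
calls = {"n": 0}

def flaky_node(task):
    """Simulated worker node: the first two attempts 'crash'."""
    calls["n"] += 1
    if calls["n"] <= 2:
        raise RuntimeError("node failed")
    return task * 2

def submit_with_retry(task, max_attempts=5):
    """Resubmit the task to another node until one succeeds."""
    for _ in range(max_attempts):
        try:
            return flaky_node(task)
        except RuntimeError:
            continue                  # pick the next available node
    raise RuntimeError("all nodes failed")

result = submit_with_retry(21)
print(result)   # 42, succeeded on the third attempt
```

Real systems (SETI@Home included) go further, e.g. sending the same work unit to several nodes and cross-checking the answers, but retry-with-reassignment is the core of it.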
There are other considerations (such as security, usability, and, yes, bandwidth to some extent, although you can always add more nodes, etc.), but these will suffice as a starting point for you. Good luck with your research.