If the objects are small and simple, copy by value might be the
fastest. However, I fear that it forces unnecessary limitations on the
implementation of the supported messages, so I want to avoid it.
If you can anticipate an upper bound on message size, a fixed buffer such as char buf[256] will do. A practical alternative, if you cannot guarantee the bound, is the following hybrid, which only invokes heap allocations in the rare cases where a message exceeds the inline buffer:
struct Message
{
    // Stores the message data if it is small enough.
    char buf[256];
    // Points to 'buf' if the data fits, to a heap allocation otherwise.
    char* data;
};
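To make this concrete, here is a minimal sketch of how the two members might cooperate. The helper names message_init/message_free are hypothetical (not from the original answer), and error handling plus copy/move semantics are omitted:

#include <cstdlib>
#include <cstring>

// Hypothetical helpers illustrating the idea: small messages live in
// 'buf', large ones fall back to the heap.
void message_init(Message& m, const char* src, std::size_t len)
{
    if (len <= sizeof m.buf) {
        std::memcpy(m.buf, src, len);
        m.data = m.buf;                                 // common case: no allocation
    } else {
        m.data = static_cast<char*>(std::malloc(len));  // rare case: heap
        std::memcpy(m.data, src, len);
    }
}

void message_free(Message& m)
{
    if (m.data != m.buf)
        std::free(m.data);   // only free what we actually allocated
}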
There are a number of important missing bits of information:
- Why is OpenMPI relevant?
- Why is heterogeneous relevant?
- What is the work that the server is orchestrating?
- You mention one thread per connection is a problem.
If you're using Boost::Asio and you're having the problem of one thread per client, then you're likely doing something wrong. Make sure that you're doing things asynchronously and not blocking any one thread too long. This is probably the easiest thing for you to do at the moment.
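To illustrate the asynchronous pattern, here is a minimal sketch assuming a reasonably recent Boost (with io_context and the move-accepting async_accept overload): each connection gets a small session object, and a single thread running io.run() services all of them, instead of one thread per client.

#include <boost/asio.hpp>
#include <array>
#include <cstddef>
#include <memory>

using boost::asio::ip::tcp;

// One lightweight session object per client instead of one thread per client.
class Session : public std::enable_shared_from_this<Session>
{
public:
    explicit Session(tcp::socket socket) : socket_(std::move(socket)) {}
    void start() { read(); }

private:
    void read()
    {
        auto self = shared_from_this();
        socket_.async_read_some(
            boost::asio::buffer(buf_),
            [self](boost::system::error_code ec, std::size_t n)
            {
                if (!ec) {
                    // Handle the n bytes just read quickly (or hand them
                    // off), then immediately queue the next read.
                    self->read();
                }
            });
    }

    tcp::socket socket_;
    std::array<char, 1024> buf_;
};

void accept_loop(tcp::acceptor& acceptor)
{
    acceptor.async_accept(
        [&acceptor](boost::system::error_code ec, tcp::socket s)
        {
            if (!ec)
                std::make_shared<Session>(std::move(s))->start();
            accept_loop(acceptor);   // keep accepting new clients
        });
}

int main()
{
    boost::asio::io_context io;
    tcp::acceptor acceptor(io, tcp::endpoint(tcp::v4(), 12345));
    accept_loop(acceptor);
    io.run();   // one thread multiplexes every connection
}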
OpenMPI is not what you need. MPI was designed originally for distributed memory architectures. It is possible to use it over the internet, but it is probably not the best choice. The fact you mention heterogeneous and MPI makes me think that you've done a CS course in HPC. That is not relevant in this case.
When deciding on a protocol and architecture, the type of work the clients are doing is important, as is how you share state between the clients and the server, and how long each message takes to process.
If you're optimizing for throughput, then per-message overhead is less important, so verbose serialization formats like XML and JSON are fine. You can even tear down connections between message-processing jobs, meaning the server does not have to maintain a thread per connection.
If you're trying to keep latency low, then per-message overhead matters, so maintaining a persistent connection and using a terse serialization format are worth the effort.
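As a rough illustration of that trade-off (the message layout here is mine, purely for demonstration), compare the same logical message in a verbose text form and a terse fixed binary layout:

#include <cstdint>
#include <cstring>
#include <iostream>
#include <string>

struct Sample { std::uint32_t id; float value; };

int main()
{
    Sample s{12345, 3.14f};

    // Verbose, self-describing text (JSON-like): easy to debug and
    // proxy-friendly, but bigger on the wire and slower to parse.
    std::string text = "{\"id\":12345,\"value\":3.14}";

    // Terse fixed binary layout: 8 bytes, but both ends must agree on
    // the format (and on endianness, which this sketch ignores).
    char packed[sizeof(std::uint32_t) + sizeof(float)];
    std::memcpy(packed, &s.id, sizeof s.id);
    std::memcpy(packed + sizeof s.id, &s.value, sizeof s.value);

    std::cout << "text: " << text.size() << " bytes, binary: "
              << sizeof packed << " bytes\n";
}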
A message broker is an architectural pattern you could consider; it is responsible for distributing messages between your applications.
You're better off using internet-friendly protocols like HTTP: you don't want to worry about proxies or NAT traversal. The internet can be considered a massive distributed system with millions of heterogeneous clients, and REST is an architectural style inspired by exactly that, so it might be appropriate here.
I'm going to assume that:
- the server produces work items for clients to do and aggregates the responses.
- each client receives a message, does some work and sends a response back.
My default choice of technology would be a web server to implement this. It would have two URLs: one where the clients can GET new work, the other where they POST responses.
When you initially start the client, it should periodically poll for new work (you could use WebSockets or HTTP long polling). When the server returns a work item, exactly one client should consume it. Each work item should be stamped with a unique id so that you can ensure there are no duplicates and can correlate responses with the initial job.
When the client has completed the task, it should POST the response and go back to polling for new work.
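Here is a minimal sketch of such a client loop, using libcurl purely as an example HTTP client; the URLs are placeholders, the work-item id handling is elided, and the actual computation is stubbed out:

#include <curl/curl.h>
#include <chrono>
#include <string>
#include <thread>

// Placeholder endpoints; substitute the real server's URLs.
static const char* kWorkUrl   = "http://server.example/work";
static const char* kResultUrl = "http://server.example/result";

static size_t collect(char* ptr, size_t size, size_t nmemb, void* userdata)
{
    static_cast<std::string*>(userdata)->append(ptr, size * nmemb);
    return size * nmemb;
}

int main()
{
    curl_global_init(CURL_GLOBAL_DEFAULT);
    for (;;) {
        // GET a work item from the server.
        std::string work;
        CURL* get = curl_easy_init();
        curl_easy_setopt(get, CURLOPT_URL, kWorkUrl);
        curl_easy_setopt(get, CURLOPT_WRITEFUNCTION, collect);
        curl_easy_setopt(get, CURLOPT_WRITEDATA, &work);
        CURLcode rc = curl_easy_perform(get);
        long status = 0;
        curl_easy_getinfo(get, CURLINFO_RESPONSE_CODE, &status);
        curl_easy_cleanup(get);

        if (rc == CURLE_OK && status == 200) {
            // 'work' would carry the unique id; do the job, then POST back.
            std::string result = work;  // placeholder for the actual computation

            CURL* post = curl_easy_init();
            curl_easy_setopt(post, CURLOPT_URL, kResultUrl);
            curl_easy_setopt(post, CURLOPT_POSTFIELDS, result.c_str());
            curl_easy_perform(post);
            curl_easy_cleanup(post);
        } else {
            // No work yet (or an error): back off before polling again.
            std::this_thread::sleep_for(std::chrono::seconds(5));
        }
    }
}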
Web servers don't handle long-running requests well, so you would want to hand the work off to something else, perhaps via a database, IPC or a message queue.
The server then does not have to worry about the capabilities of each client (which makes the heterogeneity irrelevant). The clients will just consume work at whatever rate they can.
Best Answer
Since you can't have your cake and eat it (see comment thread for the question), I've decided on the following solution:
The network wrapper will keep a boost::asio::streambuf, while serialization/deserialization will happen over ostream/istream. This way the message builders don't have to deal with memory management, can use as much or as little space as they need, and can choose whether they want to serialize as binary or text. As an added benefit, I can also easily serialize messages to, and deserialize them from, a file for record/replay during debugging.
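A minimal sketch of that setup; the message format here is invented for illustration, and the same streambuf could just as well carry binary:

#include <boost/asio.hpp>
#include <iostream>
#include <string>

int main()
{
    boost::asio::streambuf buf;

    // Serialize: the message builder writes through a plain std::ostream
    // (boost::asio::streambuf derives from std::streambuf) and never
    // touches memory management.
    std::ostream os(&buf);
    os << "MSG " << 42 << " hello\n";

    // At this point buf.data() could be handed to async_write, or the
    // same bytes could be dumped to a file for record/replay.

    // Deserialize: the reader pulls the fields back through a std::istream.
    std::istream is(&buf);
    std::string tag, payload;
    int id = 0;
    is >> tag >> id >> payload;

    std::cout << tag << ' ' << id << ' ' << payload << '\n';
}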