Serialization – Protobuf Design Patterns

protobuf serialization

I am evaluating Google Protocol Buffers for a Java based service (but am expecting language agnostic patterns). I have two questions:

The first is a broad general question:

What patterns are people using? I am interested in patterns for class organization (e.g., messages per .proto file, packaging, and distribution) and message definition (e.g., repeated fields vs. repeated encapsulated fields*), etc.

There is very little information of this sort on the Google Protobuf help pages and public blogs, while there is a ton of information for established formats such as XML.

I also have specific questions over the following two different patterns:

  1. Represent messages in .proto files, package them as a separate jar, and ship it to target consumers of the service, which is basically the default approach, I guess.

  2. Do the same but also include hand-crafted wrappers (not sub-classes!) around each message that implement a contract supporting at least these two methods (T is the wrapper class, V is the message class; using generics but simplified syntax for brevity):

    public V toProtobufMessage() {
        V.Builder builder = V.newBuilder();
        for (Item item : getItemList()) {
            builder.addItem(item);
        }
        return builder.setAmountPayable(getAmountPayable())
                      .setShippingAddress(getShippingAddress())
                      .build();
    }

    public static T fromProtobufMessage(V message) {
        return new T(message.getShippingAddress(),
                     message.getItemList(),
                     message.getAmountPayable());
    }
    

One advantage I see with (2) is that I can hide away the complexities introduced by V.newBuilder().addField().build() and add some meaningful methods such as isOpenForTrade() or isAddressInFreeDeliveryZone() etc. in my wrappers. The second advantage I see with (2) is that my clients deal with immutable objects (something I can enforce in the wrapper class).
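To make approach (2) concrete, here is a minimal sketch of such a wrapper. All names are hypothetical, and since protoc-generated classes are not available here, a small hand-rolled `CustomerInvoiceMessage` with a builder stands in for the generated message class; items are plain strings for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-in for the protoc-generated message class (hypothetical; a real
// generated class would come from a CustomerInvoice definition in a .proto file).
class CustomerInvoiceMessage {
    private final String shippingAddress;
    private final List<String> items;
    private final double amountPayable;

    private CustomerInvoiceMessage(Builder b) {
        this.shippingAddress = b.shippingAddress;
        this.items = b.items;
        this.amountPayable = b.amountPayable;
    }

    static Builder newBuilder() { return new Builder(); }

    String getShippingAddress() { return shippingAddress; }
    List<String> getItemList()  { return items; }
    double getAmountPayable()   { return amountPayable; }

    static class Builder {
        private String shippingAddress;
        private final List<String> items = new ArrayList<>();
        private double amountPayable;

        Builder setShippingAddress(String s) { shippingAddress = s; return this; }
        Builder addItem(String item)         { items.add(item); return this; }
        Builder setAmountPayable(double d)   { amountPayable = d; return this; }
        CustomerInvoiceMessage build()       { return new CustomerInvoiceMessage(this); }
    }
}

// The hand-crafted wrapper: immutable, hides builder mechanics,
// and carries domain behaviour the generated class cannot.
final class CustomerInvoice {
    private final String shippingAddress;
    private final List<String> items;
    private final double amountPayable;

    CustomerInvoice(String shippingAddress, List<String> items, double amountPayable) {
        this.shippingAddress = shippingAddress;
        // Defensive copy enforces immutability for clients of the wrapper.
        this.items = Collections.unmodifiableList(new ArrayList<>(items));
        this.amountPayable = amountPayable;
    }

    // Hypothetical domain rule, just to illustrate where such methods live.
    boolean isOpenForTrade() { return amountPayable > 0 && !items.isEmpty(); }

    CustomerInvoiceMessage toProtobufMessage() {
        CustomerInvoiceMessage.Builder builder = CustomerInvoiceMessage.newBuilder();
        for (String item : items) {
            builder.addItem(item);
        }
        return builder.setShippingAddress(shippingAddress)
                      .setAmountPayable(amountPayable)
                      .build();
    }

    static CustomerInvoice fromProtobufMessage(CustomerInvoiceMessage m) {
        return new CustomerInvoice(m.getShippingAddress(), m.getItemList(), m.getAmountPayable());
    }
}
```

The defensive copy in the constructor is what lets the wrapper guarantee immutability regardless of what the caller does with the original list.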

One disadvantage I see with (2) is that I duplicate code and have to sync up my wrapper classes with .proto files.

Does anyone have better techniques, or further critiques of either of the two approaches?


*By encapsulating a repeated field I mean messages such as this one:

message ItemList {
    repeated Item item = 1;
}

message CustomerInvoice {
    required ShippingAddress address = 1;
    required ItemList itemList = 2;
    required double amountPayable = 3;
}

instead of messages such as this one:

message CustomerInvoice {
    required ShippingAddress address = 1;
    repeated Item item = 2;
    required double amountPayable = 3;
}

I like the latter but am happy to hear arguments against it.

Best Answer

Where I work, the decision was taken to conceal the use of protobuf. We don't distribute the .proto files between applications, but rather, any application that exposes a protobuf interface exports a client library which can talk to it.

I have only worked on one of these protobuf-exposing applications, but in that, each protobuf message corresponds to some concept in the domain. For each concept, there is a normal Java interface. There is then a converter class, which can take an instance of an implementation and construct an appropriate message object, and take a message object and construct an instance of an implementation of the interface (as it happens, usually a simple anonymous or local class defined inside the converter). The protobuf-generated message classes and converters together form a library which is used by both the application and the client library; the client library adds a small amount of code for setting up connections and sending and receiving messages.

Client applications then import the client library, and provide implementations of any interfaces they wish to send. Indeed, both sides do this, since each side both sends and receives messages.

To clarify, that means that if you have a request-response cycle where the client is sending a party invitation, and the server is responding with an RSVP, then the things involved are:

  • PartyInvitation message, written in the .proto file
  • PartyInvitationMessage class, generated by protoc
  • PartyInvitation interface, defined in the shared library
  • ActualPartyInvitation, a concrete implementation of PartyInvitation defined by the client app (not actually called that!)
  • StubPartyInvitation, a simple implementation of PartyInvitation defined by the shared library
  • PartyInvitationConverter, which can convert a PartyInvitation to a PartyInvitationMessage, and a PartyInvitationMessage to a StubPartyInvitation
  • RSVP message, written in the .proto file
  • RSVPMessage class, generated by protoc
  • RSVP interface, defined in the shared library
  • ActualRSVP, a concrete implementation of RSVP defined by the server app (also not actually called that!)
  • StubRSVP, a simple implementation of RSVP defined by the shared library
  • RSVPConverter, which can convert an RSVP to an RSVPMessage, and an RSVPMessage to a StubRSVP
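A minimal sketch of the invitation half of this arrangement might look as follows. The field names are invented for illustration, and a hand-rolled `PartyInvitationMessage` stands in for the protoc-generated class, which is not available here.

```java
// Stand-in for the protoc-generated PartyInvitationMessage (hypothetical fields).
class PartyInvitationMessage {
    final String host;
    final String venue;

    PartyInvitationMessage(String host, String venue) {
        this.host = host;
        this.venue = venue;
    }
}

// Domain-facing interface, defined in the shared library.
interface PartyInvitation {
    String getHost();
    String getVenue();
}

// Simple implementation used on the receiving side of the connection.
final class StubPartyInvitation implements PartyInvitation {
    private final String host;
    private final String venue;

    StubPartyInvitation(String host, String venue) {
        this.host = host;
        this.venue = venue;
    }

    public String getHost()  { return host; }
    public String getVenue() { return venue; }
}

// Converter: any PartyInvitation implementation -> message, message -> stub.
final class PartyInvitationConverter {
    PartyInvitationMessage toMessage(PartyInvitation inv) {
        return new PartyInvitationMessage(inv.getHost(), inv.getVenue());
    }

    PartyInvitation fromMessage(PartyInvitationMessage msg) {
        return new StubPartyInvitation(msg.host, msg.venue);
    }
}
```

Note the asymmetry the answer describes: the converter accepts any implementation of the interface (e.g. a JPA entity on the sending side) but always produces a stub on the receiving side.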

The reason we have separate actual and stub implementations is that the actual implementations are generally JPA-mapped entity classes; the server either creates and persists them, or queries them up from the database, then hands them off to the protobuf layer to be transmitted. It wasn't felt that it was appropriate to be creating instances of those classes on the receiving side of the connection, because they wouldn't be tied to a persistence context. Furthermore, the entities often contain rather more data than is transmitted over the wire, so it wouldn't even be possible to create intact objects on the receiving side. I am not entirely convinced that this was the right move, because it has left us with one more class per message than we would otherwise have.

Indeed, I am not entirely convinced that using protobuf at all was a good idea; if we'd stuck with plain old RMI and serialization, we wouldn't have had to create nearly as many objects. In many cases, we could just have marked our entity classes as serializable and got on with it.

Now, having said all that, I have a friend who works at Google, on a codebase that makes heavy use of protobuf for communication between modules. They take a completely different approach: they don't wrap the generated message classes at all, and enthusiastically pass them deep(ish) into their code. This is seen as a good thing, because it's a simple way of keeping interfaces flexible. There is no scaffolding code to keep in sync when messages evolve, and the generated classes provide all the necessary hasFoo() methods needed for receiving code to detect the presence or absence of fields that have been added over time. Bear in mind, though, that people who work at Google tend to be (a) rather clever and (b) a bit nuts.
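The hasFoo() point can be sketched as follows. Again the field names are invented, and a hand-rolled class stands in for the generated one, which would supply such has-methods for optional fields automatically.

```java
// Stand-in for a generated message where dressCode was added in a later
// schema revision (hypothetical field; real hasFoo() methods come from protoc).
class PartyInvitationMessage {
    private final String host;
    private final String dressCode; // null when the sender predates the field

    PartyInvitationMessage(String host, String dressCode) {
        this.host = host;
        this.dressCode = dressCode;
    }

    String getHost()       { return host; }
    boolean hasDressCode() { return dressCode != null; }
    String getDressCode()  { return dressCode; }
}

class InvitationPrinter {
    // Receiving code takes the message directly, with no wrapper, and
    // probes optional fields so that old and new senders interoperate.
    static String describe(PartyInvitationMessage msg) {
        String base = "Party at " + msg.getHost() + "'s";
        if (msg.hasDressCode()) {
            return base + " (dress code: " + msg.getDressCode() + ")";
        }
        return base;
    }
}
```

Because the receiving code checks hasDressCode() before reading the field, a message from a sender compiled against the older schema still processes cleanly.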
