Handling Optimistic Locking on a Collection with ETag Headers

Tags: concurrency, http, language-agnostic, rest

Consider an endpoint /projects that returns a list of projects with the following headers:

HTTP/1.1 200 OK
Etag: "superEtag"

The ETag value represents a hash of the entire collection, so it cannot be used to conditionally update a single resource such as /projects/1.

Fetching the resources individually makes no sense, so how can I handle optimistic locking with a collection?

Best Answer

When doing GET /projects, the ETag corresponds to the hash of the collection. Now if I want to do PUT /projects/1, I need the hash of this specific resource for the conditional request (If-Match) to be successful. I could do GET /projects/{id} to get the individual hash for each resource, but it makes no sense; the collection service would become useless.

I think the problem is that HTTP doesn't mean what you want it to mean.

Fundamentally, the semantics of HTTP are that the resources are stored in a flat key-value store. Although /collection and /collection/item are hierarchical identifiers (we can use relative resolution to get from one to the other), the resources that they identify are not hierarchical. No relationship is inferred from the similar spelling of the identifiers.

This is why DELETE /collection doesn't do anything to your locally cached copy of /collection/item.

Because there is no inferred relationship between the collection and the item, there is no generic vector available for communicating the ETag of the item(s) in the metadata for the collection.

You can certainly do either of

GET /collection
Conditional PUT /collection

GET /collection/item
Conditional PUT /collection/item

and the origin server can, at its discretion, also change the representation of the other resource as a side effect.
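As a rough illustration of the two points above, here is a minimal in-memory sketch of a conditional PUT. It is not a real HTTP server: the store dictionary, the etag_of hashing scheme, and the helper names are all assumptions made for the example. The stale second write is rejected with 412, and the collection is refreshed only because this particular server chooses to do so.

```python
import hashlib
import json


def etag_of(representation):
    """Derive a validator by hashing the serialized representation (an assumed scheme)."""
    payload = json.dumps(representation, sort_keys=True).encode("utf-8")
    return '"' + hashlib.sha256(payload).hexdigest()[:16] + '"'


def refresh_collection(store):
    """Server-side side effect: an implementation choice, not something HTTP implies."""
    store["/projects"] = [store[k] for k in sorted(store) if k.startswith("/projects/")]


def conditional_put(store, path, if_match, new_representation):
    """Return (status, current_etag); 412 when the If-Match validator does not match."""
    current = store.get(path)
    if current is None or if_match != etag_of(current):
        return 412, (etag_of(current) if current is not None else None)
    store[path] = new_representation
    refresh_collection(store)
    return 200, etag_of(new_representation)


# Flat store: the keys look related, but each one is an independent resource.
store = {
    "/projects/1": {"id": 1, "name": "alpha"},
    "/projects/2": {"id": 2, "name": "beta"},
}
refresh_collection(store)

item_etag = etag_of(store["/projects/1"])
collection_etag_before = etag_of(store["/projects"])

# First writer succeeds; second writer reuses the now-stale validator and is rejected.
status_first, _ = conditional_put(store, "/projects/1", item_etag, {"id": 1, "name": "alpha-renamed"})
status_stale, _ = conditional_put(store, "/projects/1", item_etag, {"id": 1, "name": "lost-update"})
collection_etag_after = etag_of(store["/projects"])

print(status_first, status_stale)
print(collection_etag_before != collection_etag_after)
```

The first line printed is "200 412", and the second is "True" because this sketch rebuilds /projects after every successful item update.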

This isn't to say that you can't communicate the information by hand. Nothing in the rules prevents you from returning a representation of the collection that includes the representations of the member items, along with their validators, so that a "smart" client can create the correct requests without needing to get the individual items.
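One way this could look, sketched in Python: the collection body lists each member with its own URL and ETag, and the client turns a member entry into a conditional PUT. The field names (members, href, etag, data), the hashing scheme, and both helper functions are hypothetical; HTTP does not define any such format, so client and server would have to agree on it.

```python
import hashlib
import json


def etag_of(representation):
    """Derive a validator by hashing the serialized representation (an assumed scheme)."""
    payload = json.dumps(representation, sort_keys=True).encode("utf-8")
    return '"' + hashlib.sha256(payload).hexdigest()[:16] + '"'


def collection_representation(items):
    """Hypothetical body for GET /projects: every member carries its URL and validator."""
    return {
        "members": [
            {"href": f"/projects/{item['id']}", "etag": etag_of(item), "data": item}
            for item in items
        ]
    }


def build_conditional_put(member, changes):
    """Client side: describe a conditional PUT for one member without fetching it again."""
    body = {**member["data"], **changes}
    return {
        "method": "PUT",
        "url": member["href"],
        "headers": {"If-Match": member["etag"], "Content-Type": "application/json"},
        "body": json.dumps(body),
    }


projects = [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]
listing = collection_representation(projects)

# The client picks a member from the listing and reuses the validator it was given.
request = build_conditional_put(listing["members"][0], {"name": "alpha-renamed"})
print(request["method"], request["url"], request["headers"]["If-Match"])
```

The collection response can still carry its own ETag header for the listing as a whole; the per-member validators live in the body, where only a client that understands this format will use them.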

What do you mean by "the resources that they identify are not hierarchical"?

Disclaimer: all analogies are non-normative; what's real is what's described in the specifications.

The semantics of HTTP resources are not quite like those of a file system. For example, if we issue the following command on Linux

rm -rf /collection

then one of the effects that we would expect is the removal of /collection/item. But that's not true of HTTP!

DELETE /collection

doesn't say anything at all about the resource /collection/item. When the server processes this request, there may well be side effects on other resources. But HTTP isn't describing implementations; it only assigns meaning to the messages. The meaning of the request message is constrained by the target resource only.

Another way of saying the same thing: as far as HTTP is concerned, none of these identifiers is "wrong" for an item in a collection.

/collection/item
/item/collection
/f5add126-65ef-4122-8657-03e672f159c4

Some of the server frameworks we use to implement our servers care; for instance, Rails has opinions on spelling. But those are really just implementation details behind the uniform interface.

So yes, in your domain model the project entities and the tracks entities may form a hierarchy, and you might choose spellings for the resource identifiers that reflect that hierarchy, but the semantics of HTTP are those of a flat key-value store.

# Example #1: hierarchical key value store

echo ; cat <<EOF | python
d={}
d["/collection"]={}
d["/collection"]["/item"]=456
d.pop("/collection")
print(d)
EOF

{}

# Example #2: flat key value store

echo ; cat <<EOF | python
d={}
d["/collection"]={}
d["/collection/item"]=456
d.pop("/collection")
print(d)
EOF

{'/collection/item': 456}

HTTP acts like the second example.
