Alternative Locking Strategies for Concurrency in Databases

Tags: access, concurrency, locks, relational-database, transaction

I'm having a problem with the design of my application that neither optimistic nor pessimistic locking seems to solve. Here is a simplified/altered version of the problem that describes the situation.

Premise of the problem:

  • A document-processing application
  • A database where documents are stored
  • Multiple clients simultaneously access the database

What I try to achieve:

Say there are 10 unprocessed documents in the database. The first client requests the number of unprocessed documents; the response is 10. He requests 6 of them and starts processing. Before he finishes, the second client requests the number of unprocessed documents. There is no point in him processing the same documents as the first client, so I want the program to answer "There are 4 unprocessed documents".

The problem if I optimistically lock those 6 documents:

In the particular case of this application, the transaction will take a long time, and there is a good chance that the other client will still think there are 10 unprocessed documents, not 4.

The problem if I pessimistically lock those 6 documents:

Again, the transaction takes a long time to complete, and the other client gets stuck waiting for an answer about the number of documents.

The problem if I shorten the transaction and commit a status change before processing:

There is a chance that a power outage occurs; as a result, 6 documents would have the status "exported for processing" when in reality they haven't been.

What alternative/custom locking strategy should be selected for this kind of scenario with concurrent access and long transactions?

Best Answer

I would try to solve this with (at least) four document states:

  • unprocessed
  • export started (but not under processing)
  • under processing (means: after export finished)
  • processed

Moreover, it is probably a good idea to add a timestamp field to remember when the state was last changed and, as @Bart pointed out, an additional field recording which user caused the state change.

The state changes themselves should be done for each document individually, in a single, short, atomic transaction. For example, before the export starts, set the state accordingly. After the export has finished successfully, set the state accordingly, too. The same when the document has been processed. This gives you fine-grained control over the transactions and allows individual failure recovery.
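Sketched with SQLite and a hypothetical `documents` table (the table name, column names, and `set_state` helper are all illustrative assumptions, not from the original answer), a per-document state change as one short, atomic transaction might look like:

```python
import sqlite3
import time

# Illustrative schema: four-state column plus the timestamp/user fields
# suggested above. All names are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE documents (
        id               INTEGER PRIMARY KEY,
        state            TEXT NOT NULL DEFAULT 'unprocessed',
        state_changed_at REAL,
        state_changed_by TEXT
    )
""")
conn.executemany("INSERT INTO documents (id) VALUES (?)",
                 [(i,) for i in range(1, 11)])
conn.commit()

def set_state(conn, doc_id, expected, new, user):
    """Move one document from `expected` to `new` in a single short
    transaction.  The WHERE clause guards against races: if another
    client already changed the state, no row matches and we return
    False instead of silently stealing the document."""
    cur = conn.execute(
        "UPDATE documents"
        "   SET state = ?, state_changed_at = ?, state_changed_by = ?"
        " WHERE id = ? AND state = ?",
        (new, time.time(), user, doc_id, expected))
    conn.commit()
    return cur.rowcount == 1

claimed = set_state(conn, 1, "unprocessed", "export started", "client-1")  # True
again   = set_state(conn, 1, "unprocessed", "export started", "client-2")  # False
```

Because the state check and the state change happen in the same `UPDATE`, two clients can never claim the same document, yet no transaction stays open longer than a single statement.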

For example, assume there is a power outage, or a network outage, in the middle of an export of 100 documents. This will leave you with, say, 50 docs in the state "export started" and 50 docs "under processing". Using the timestamp field, you can now easily detect documents whose state has been "export started" for more than, let's say, 15 minutes, and reset those docs back to "unprocessed" after this timeout. You can also reset docs from "under processing" back to "unprocessed" when the processing takes more than, let's say, 5 days. The check for these pending states may be done by a separate background program, running, for example, every minute or every five minutes, whatever suits your process best.
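The timeout-based recovery sweep could be sketched like this (same assumed SQLite schema and column names as above; the timeouts are the example values from the text):

```python
import sqlite3
import time

# Assumed schema for this sketch; one document stuck in "export started"
# for an hour (e.g. after a power outage), one claimed just now.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE documents (
        id               INTEGER PRIMARY KEY,
        state            TEXT NOT NULL DEFAULT 'unprocessed',
        state_changed_at REAL
    )
""")
now = time.time()
conn.executemany(
    "INSERT INTO documents (id, state, state_changed_at) VALUES (?, ?, ?)",
    [(1, "export started", now - 3600),
     (2, "export started", now - 10)])
conn.commit()

def reset_stale(conn, state, timeout_seconds):
    """Background sweep: reset any document stuck in `state` for longer
    than `timeout_seconds` back to 'unprocessed'.  Returns how many
    documents were recovered."""
    cutoff = time.time() - timeout_seconds
    cur = conn.execute(
        "UPDATE documents"
        "   SET state = 'unprocessed', state_changed_at = ?"
        " WHERE state = ? AND state_changed_at < ?",
        (time.time(), state, cutoff))
    conn.commit()
    return cur.rowcount

# Run periodically (e.g. every minute): 15-minute timeout for exports,
# 5-day timeout for processing.
recovered = reset_stale(conn, "export started", 15 * 60)
recovered += reset_stale(conn, "under processing", 5 * 24 * 3600)
```

Here only the hour-old document is recovered; the one claimed 10 seconds ago is left alone, so a slow but healthy client is not disturbed.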

Actually, "locking those documents" should mean: all docs with a state different from "unprocessed" are locked for export and processing by others, but not by means of a long-lasting database lock mechanism. The database always stays responsive and can tell you how many documents of each state are stored. To my understanding, this is still a "pessimistic locking" strategy (since the same document is not to be processed by two different users at the same time), but with a failover strategy.
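With this approach, answering "how many unprocessed documents are there?" is just a fast read that no long transaction can block. A minimal sketch (again with an assumed SQLite schema; names are illustrative):

```python
import sqlite3

# Assumed schema; client 1 has already claimed 6 of the 10 documents.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, state TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO documents (id, state) VALUES (?, ?)",
    [(i, "export started" if i <= 6 else "unprocessed")
     for i in range(1, 11)])
conn.commit()

def counts_by_state(conn):
    """Non-blocking read: number of documents per state."""
    return dict(conn.execute(
        "SELECT state, COUNT(*) FROM documents GROUP BY state"))

# Client 2 immediately sees 4 unprocessed documents, as the question asks.
unprocessed = counts_by_state(conn)["unprocessed"]  # → 4
```

The "lock" is just the state value itself, so the count is always current and never waits on another client's work.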
