Architecture – Using distributed locks in a microservice environment

architecture, distributed-computing, microservices, redis, service-bus

I'm developing a distributed system and trying to follow the best practices of microservice architecture.
I've run into a situation where I think I need something like distributed locks. Since I don't have much experience with microservices, I'm still not sure about the final solution. I'd appreciate any suggestions, thoughts, best practices, or articles that could help (I'm currently investigating the issue and already have some candidate solutions).

I'm using Azure Service Bus for pub/sub communication and Redis as a shared cache in my system.

Here is an abstract schema of what I'm actually trying to do:
[schema diagram]
Explanation:
I have a queue with incoming data (dataFlow). The format is the following: {dataOwnerId: "1", dataOwnerSubId: "1", data: "status"}.
The dataOwnerId is an aggregator for 1-n sub-owners, so it's a one-to-many relationship.

The owners periodically (say, every minute) push a new status (data) to the queue for a certain sub-owner. So if a dataOwnerId relates to dataOwnerSubIds [1, 2, 3], the messages can be {1, 1, status}, {1, 2, status}, or {1, 3, status}, and one minute later {1, 1, status2}, {1, 2, status2}, or {1, 3, status3}, and so on.
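To make the format concrete, the message shape can be modeled like this (just an illustration of the fields above, not actual code from my system):

```python
from dataclasses import dataclass

@dataclass
class DataFlowMessage:
    data_owner_id: str      # the aggregating owner, e.g. "1"
    data_owner_sub_id: str  # a sub-owner under that owner, e.g. "2"
    data: str               # the pushed status, e.g. "status"

msg = DataFlowMessage(data_owner_id="1", data_owner_sub_id="2", data="status")
```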

The queue has 1-n consumers (dataConsumers) which process data from the queue and store it in the DB. Data order is non-deterministic: the first available consumer takes the first message from the queue. So dataConsumer1-N can handle data from owner1-N and sub-owner1-N.

For example, at one moment dataConsumer1 can handle data from owner1 and sub-ownerN, and the next moment it can handle data from ownerN and sub-owner1.

Requirements:
It must be possible to automatically add new data owners and data sub-owners to the system while the queue is being processed.

Possible solution:
I can have a situation where dataOwnerN+1 (a new customer) is already pushing data to the queue but is not yet in the system. In this case the dataConsumers will skip this new owner (skip storing the data) and push a NewOwnerEvent message to Service Bus.

These messages will be handled by dataOwnerManagers, which work with dataOwners. The dataOwnerManagers' responsibility is to create the new dataOwners and the related dataSubOwners.
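Roughly, each dataConsumer then does something like the following (a sketch; `known_owners`, `publish_new_owner_event`, and `store_data` are hypothetical stand-ins for the real system check, Service Bus publish, and DB write):

```python
known_owners = {"1", "2"}  # owners already registered in the system (illustrative)

def publish_new_owner_event(owner_id: str) -> None:
    # Stand-in for pushing a NewOwnerEvent message to Azure Service Bus.
    print(f"NewOwnerEvent for owner {owner_id} -> Service Bus")

def store_data(owner_id: str, sub_id: str, data: str) -> None:
    # Stand-in for the real DB write.
    print(f"stored {data!r} for owner {owner_id}, sub-owner {sub_id}")

def handle_message(owner_id: str, sub_id: str, data: str) -> None:
    if owner_id not in known_owners:
        # New owner: skip storing the data and announce the owner instead.
        publish_new_owner_event(owner_id)
        return
    store_data(owner_id, sub_id, data)

handle_message("3", "1", "status")  # owner "3" is new -> NewOwnerEvent
```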

Problem:
At one moment the dataConsumers can process two (for example) messages from the same NEW dataOwner (one that should be added to the system before processing) with the same or different dataSubOwners, for example {1, 1, status} and {1, 2, status}. In this case two NewOwnerEvents will be created, and the dataOwnerManagers can start creating the same NEW dataOwner record twice.
The same situation can occur for dataSubOwners, but there we have at least a one-minute delay between creations. So if the first event is {1, 1, status}, the second event {1, 1, status2} arrives a minute later, and it is almost impossible that those two events will be handled by the dataConsumers at the same time.

So I need to prevent re-sending a NewOwnerEvent for the same dataOwner: if dataConsumer1 has already sent the NewOwnerEvent for NEW owner1, I must notify all other dataConsumers (give them a way to check that the same event already exists) that owner1 has already been announced and will be created soon. So I need some shared list of already sent/processed NewOwnerEvents/dataOwners.

Possible implementation:
I tried using Redis to solve this problem. I add a key each time I push a NewOwnerEvent, so all other data consumers can check this key to see whether the event has already been pushed. But the problem is that two different dataConsumers can take messages from the same owner, {1, 1, status} and {1, 2, status}, and try to push the event at the same time, because the key won't exist in Redis yet (dataConsumer2 checks faster than dataConsumer1 adds the key). So I need a distributed lock to be sure the Redis key is added by only one dataConsumer at a time. I know Redis already has a distributed-lock implementation, but I'm still confused and suspect it would be a very complex solution in my case.
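To make the race concrete: the check and the key creation can be collapsed into a single atomic step with Redis's SET ... NX. A minimal sketch, assuming redis-py and a hypothetical key-naming scheme (the TTL is an arbitrary safety net, not a requirement):

```python
import redis

r = redis.Redis()

def try_claim_new_owner_event(owner_id: str) -> bool:
    # SET ... NX succeeds only for the first caller, so two dataConsumers
    # racing on the same owner cannot both win; no separate lock is needed
    # for this particular check-then-add step.
    # The "new-owner-event:{id}" key and 5-minute TTL are illustrative choices.
    return bool(r.set(f"new-owner-event:{owner_id}", "sent", nx=True, ex=300))

if try_claim_new_owner_event("1"):
    print("first to see owner 1: push NewOwnerEvent to Service Bus")
else:
    print("someone already announced owner 1: skip")
```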

UPD:

I'm "developing the distributed system" because the requirements are availability and scalability.
I think I can't provide more details here. I tried to use abstraction for better understanding the common situation.

I can add that the dataConsumers are part of Azure Event Hubs. An Event Hub supports up to 32 partitions, and each dataConsumer is the consumer of a separate partition. As I said earlier, "data order is non-deterministic". So, since I'm using Event Hubs, I can't simplify at least the left part of my scheme.

The dataConsumer is a console app that just processes (calibrates) and stores the data flow.

The dataOwnerManager is a Web API that manages new data owners in the system and all related settings. I decided it will be a separate bounded context, but I'm not sure about the dataConsumer.

Finally, I decided that the dataConsumer can be part of the same context as the dataOwnerManager and use the same DB storage. So I can remove the message queue from my scheme and just create a shared service/library that creates the new data owners both in the dataConsumers (if necessary) and in the dataOwnerManager through the UI. Or just copy the code in both places.

But since the dataConsumers can handle data from the same dataOwners in parallel, I can still get the situation where dataConsumer1 and dataConsumer2 try to add the same NEW dataOwner to my DB.

One possible solution here is a unique key on the dataOwner's assigned Id (or using this assigned Id as the primary key instead of a surrogate key). Then I can handle the "dataOwner with this Id already exists" exception and forget about it. Maybe that will be the best solution in my case: the NEW dataOwner will be added, all other attempts will be rejected at the DB level, I just need to handle the exception, and distributed locks are not required.
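In code, the idea looks roughly like this (a sketch using SQLite purely for illustration; the real schema and RDBMS would differ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The assigned Id is the primary key, so duplicates are rejected by the DB.
conn.execute("CREATE TABLE data_owner (owner_id TEXT PRIMARY KEY, name TEXT)")

def create_owner_if_absent(owner_id: str, name: str) -> bool:
    try:
        with conn:
            conn.execute(
                "INSERT INTO data_owner (owner_id, name) VALUES (?, ?)",
                (owner_id, name),
            )
        return True   # this consumer created the owner
    except sqlite3.IntegrityError:
        return False  # another consumer got there first; safe to ignore

print(create_owner_if_absent("1", "owner1"))  # True
print(create_owner_if_absent("1", "owner1"))  # False: already exists
```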

But maybe somebody knows a better solution for this case. My question remains: can I do this in a better way?

Best Answer

As for your concern with Redis (the check happening before the actual Redis entry has been created), I have encountered the same problem in my application.

What I ended up doing is using Redis Lua scripts, each of which executes as a single atomic operation. I wrote a Lua script that tries to get the cached entry; if it doesn't exist, it tries to acquire a lock on that Redis key (by trying to create a new cached entry named {myKey}_lock). If the lock acquisition also fails, the Lua script returns null, which indicates that both actions have failed.
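A rough sketch of such a script, invoked here through redis-py (the key names, TTL, and sentinel return value are illustrative assumptions, not the exact script from my application):

```python
import redis

# Atomic "return the cached entry, or else try to take {key}_lock".
# KEYS[1] = cache key; ARGV[1] = lock owner token; ARGV[2] = lock TTL (ms).
GET_OR_LOCK = """
local value = redis.call('GET', KEYS[1])
if value then
  return value
end
-- entry missing: try to acquire the companion lock key
if redis.call('SET', KEYS[1] .. '_lock', ARGV[1], 'NX', 'PX', ARGV[2]) then
  return '__LOCK_ACQUIRED__'
end
return nil  -- both the read and the lock acquisition failed
"""

r = redis.Redis()
get_or_lock = r.register_script(GET_OR_LOCK)

result = get_or_lock(keys=["owner:1"], args=["consumer-1", 30000])
if result is None:
    print("entry missing and lock held elsewhere: back off and retry")
elif result == b"__LOCK_ACQUIRED__":
    print("we hold owner:1_lock: create the entry, then delete the lock")
else:
    print("cached entry:", result)
```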

That was just an example; my point is that you should take a look at Redis Lua scripts and see how they can help you achieve the solution you need while relying on Redis as your persistent store.

See more info here: https://redis.io/commands/eval
