Distributed Locking

What is distributed locking?

With loosely coupled distributed systems, several instances of a microservice might be accessing the same shared resource. For example, several instances of a microservice might attempt to write to the same database.

We have two kinds of locks:

  • Optimistic: instead of blocking a potentially conflicting process, the process continues in the hope that everything will work as expected. If it does not, an error is returned.
    An example of optimistic concurrency is versioning in a database system.
  • Pessimistic: block access to the resource before operating on it, perform the operation, and release the lock at the end.
    For pessimistic concurrency we rely on a third-party system that holds the locks for our microservices.
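
To make the versioning idea concrete, here is a minimal sketch of an optimistic update. It assumes a hypothetical inventory table with a version column and uses Python's built-in sqlite3 module purely for illustration; the table, column and function names are not from any particular system.

```python
import sqlite3


def update_stock(conn, item_id, new_stock, expected_version):
    """Write only if the row still carries the version we read earlier."""
    cur = conn.execute(
        "UPDATE inventory SET stock = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_stock, item_id, expected_version),
    )
    conn.commit()
    # rowcount is 0 when another writer already bumped the version.
    return cur.rowcount == 1


# Hypothetical table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (id INTEGER PRIMARY KEY, stock INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO inventory VALUES (1, 10, 1)")
conn.commit()

# Two writers both read version 1; only the first write is accepted.
first = update_stock(conn, 1, 9, expected_version=1)
second = update_stock(conn, 1, 9, expected_version=1)
```

The second writer is not blocked. Its update simply matches no rows, and it is up to the caller to re-read the row and try again or report an error.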

Why do we need it?

Distributed locking tries to resolve the following two problems:

  1. Efficiency:
    Ensure that a request is processed only once, by a single instance of the microservice.
  2. Correctness:
    If several instances of a microservice try to perform the same operation simultaneously, then we might experience data loss, data corruption or data inconsistency.
    For example, imagine an inventory system where several users are trying to purchase the same item. If parallel processes try to update the inventory at the same time, this might lead to over-selling, because the recorded availability of the product is inaccurate.

Implementation

In this post we will focus only on pessimistic concurrency locks.

The logic goes as follows: every time a microservice needs to perform an operation, it must first acquire a lease that locks the shared resource (e.g. a database) for x seconds. If the resource is available, it is locked for x seconds. If not, the process retries to acquire a lease for x seconds.
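
The lease logic can be sketched as follows. This is an in-memory stand-in for the third-party lock service, meant only to show the acquire/expire/release behaviour within a single process; the LeaseStore class and its method names are hypothetical, and a real deployment would keep this state in an external system shared by all instances.

```python
import threading
import time
import uuid


class LeaseStore:
    """Hypothetical in-memory stand-in for a third-party lock service."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock              # injectable so expiry can be tested
        self._mutex = threading.Lock()   # protects the dictionary below
        self._leases = {}                # resource -> (lease_id, expires_at)

    def acquire(self, resource, duration_s):
        """Return a lease id, or None if another lease is still live."""
        with self._mutex:
            now = self._clock()
            current = self._leases.get(resource)
            if current is not None and current[1] > now:
                return None
            lease_id = uuid.uuid4().hex
            self._leases[resource] = (lease_id, now + duration_s)
            return lease_id

    def release(self, resource, lease_id):
        """Release the lease only if the caller is still its holder."""
        with self._mutex:
            current = self._leases.get(resource)
            if current is not None and current[0] == lease_id:
                del self._leases[resource]
                return True
            return False
```

Handing out a unique lease id means that a holder whose lease has already expired cannot accidentally release a lease that was since granted to another instance.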

For the implementation we could use a data storage platform such as Azure Storage. In that case we could use a file, and lock/unlock it accordingly.

Conclusion

As a safeguard, the resource should be unlocked automatically after x minutes, so that an instance that fails to release its lock cannot starve the others.

Moreover, when an operation tries to acquire a lease, the retry mechanism should give up after x seconds, to avoid blocking indefinitely.
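
A bounded retry loop along these lines might look like the sketch below. The function name and parameters are hypothetical; try_acquire stands for any callable that returns a lease, or None when the resource is currently locked.

```python
import time


def acquire_with_retry(try_acquire, timeout_s, interval_s=0.05,
                       clock=time.monotonic, sleep=time.sleep):
    """Call try_acquire() until it returns a lease or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        lease = try_acquire()
        if lease is not None:
            return lease
        # Stop if the next attempt would fall past the deadline.
        if clock() + interval_s > deadline:
            return None
        sleep(interval_s)
```

A caller that receives None can then fail fast or report the conflict, instead of waiting on the lock forever.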
