Friday, January 10, 2020

Dealing with Concurrency Issues in Large Applications

The last few days have been hectic, spent dealing with concurrency issues in our monolithic application during the peak traffic period.

Concurrency issues are not easy to resolve, especially in an application with thousands of files. The error was in the order pipeline during checkout, where hundreds of custom pipelines execute in parallel. When the error occurred, all the previous transactions were rolled back.

Since this was the first time the issue had happened, we initially ignored the error, hoping it would not crop up again. As the traffic increased, the errors increased with it, and every error in the log pointed to a concurrency exception.

We did not have much logging, so we started evaluating every table in the transaction and its relationships. We got the list of all the tables, and close to 100 of them were being accessed. We decided to split the tables into read-only and write. Once we knew which tables were getting updated, we tried pinpointing the ones that had a foreign-key relationship. That further narrowed down the tables where the issue could potentially be present.
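The narrowing-down described above can be sketched roughly as follows. This is only an illustration, not our actual tooling: the table names, the `TableAccess` type and the `lock_candidates` helper are all made up for the example, and it assumes a table is a candidate if it is written to and appears on either side of a foreign key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableAccess:
    name: str
    written: bool  # True if the transaction inserts, updates or deletes rows


def lock_candidates(tables, foreign_keys):
    """Return written tables that take part in at least one foreign key.

    foreign_keys is a list of (child_table, parent_table) pairs.
    """
    written = {t.name for t in tables if t.written}
    linked = {name for pair in foreign_keys for name in pair}
    return sorted(written & linked)


# Hypothetical inventory of tables touched by the checkout transaction.
tables = [
    TableAccess("orders", True),
    TableAccess("order_items", True),
    TableAccess("products", False),   # read-only, so dropped in the first split
    TableAccess("audit_log", True),   # written, but has no foreign key
]
foreign_keys = [("order_items", "orders"), ("order_items", "products")]

candidates = lock_candidates(tables, foreign_keys)
print(candidates)
```

With close to 100 tables, even a crude filter like this shrinks the list to something that can be reviewed by hand.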

Lastly, on further analysis, we came across a table where locking was a possibility. Meanwhile, enabling logs gave details about concurrency errors on the same set of tables. The first thing we noticed was that there was no last-modified timestamp column on these tables. We then went back to the application code and added explicit locking, along with a check that validates the last-modified timestamp.
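To give an idea of what a lock plus a last-modified check can look like, here is a minimal sketch using Python's built-in sqlite3 module. It is not the code from our application: the `orders` table, its columns and the helper functions are assumptions for the example, and the in-process `threading.Lock` only stands in for whatever explicit locking the real system uses.

```python
import sqlite3
import threading
import time

# Hypothetical explicit lock that serializes writers within this process.
write_lock = threading.Lock()


def setup():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, status TEXT, last_modified INTEGER)"
    )
    conn.execute(
        "INSERT INTO orders VALUES (1, 'NEW', ?)", (time.time_ns(),)
    )
    conn.commit()
    return conn


def read_order(conn, order_id):
    """Return (status, last_modified) for the given order."""
    return conn.execute(
        "SELECT status, last_modified FROM orders WHERE id = ?", (order_id,)
    ).fetchone()


def update_status(conn, order_id, new_status, seen_last_modified):
    """Update only if the row is unchanged since it was read.

    Returns True on success, False if another writer got there first.
    """
    # Make sure the new timestamp always differs from the one we read.
    new_ts = max(time.time_ns(), seen_last_modified + 1)
    with write_lock:
        cur = conn.execute(
            "UPDATE orders SET status = ?, last_modified = ? "
            "WHERE id = ? AND last_modified = ?",
            (new_status, new_ts, order_id, seen_last_modified),
        )
        conn.commit()
        return cur.rowcount == 1


conn = setup()
_, seen = read_order(conn, 1)

# Two writers that both read the row before either of them wrote.
first_ok = update_status(conn, 1, "PAID", seen)
second_ok = update_status(conn, 1, "CANCELLED", seen)  # stale timestamp
final_status = read_order(conn, 1)[0]
print(first_ok, second_ok, final_status)
```

The second writer's update matches no rows because the timestamp it read is stale, so it can re-read and retry instead of silently overwriting the first change.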

All this took a week to resolve, and the issues made me realize how difficult it is to eradicate concurrency problems in systems. Years later, when I look back at this article, it will be a surprise if I have not come across the same issues again.


