Showing posts with label Development. Show all posts

Monday, August 12, 2024

Perils of Holiday Code Freezes

The holiday code freeze is a long-established practice: deployments stop so that no new bugs or issues reach the system while most of the support staff is away. However, it also means the servers and infrastructure are left untouched for several weeks, and in my experience that can get surprisingly chaotic.

Application and Database Leaks

First and foremost, the most common error I have seen is the application memory leak. I recall one instance where an e-commerce application began to slow down significantly a week into the holiday break. The application retained references to objects it no longer needed, so the heap filled up gradually. As memory usage increased, the application became sluggish and eventually crashed with "out of memory" errors.
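As a rough illustration of how retained references pile up, here is a minimal Python sketch. The class names `LeakyOrderLog` and `BoundedOrderLog` are hypothetical and not taken from the application described; the point is only the contrast between an unbounded list and a bounded `collections.deque`.

```python
from collections import deque


class LeakyOrderLog:
    """Hypothetical example: keeps a reference to every order ever seen."""

    def __init__(self):
        self.orders = []

    def record(self, order):
        # Nothing is ever removed, so memory grows for as long as the process runs.
        self.orders.append(order)


class BoundedOrderLog:
    """Hypothetical example: keeps only the most recent `max_orders` entries."""

    def __init__(self, max_orders=1000):
        self.orders = deque(maxlen=max_orders)

    def record(self, order):
        # When the deque is full, appending evicts the oldest entry,
        # which lets the garbage collector reclaim it.
        self.orders.append(order)


leaky = LeakyOrderLog()
bounded = BoundedOrderLog(max_orders=3)
for order_id in range(10):
    leaky.record({"id": order_id})
    bounded.record({"id": order_id})
```

After the loop, the leaky log still holds all ten orders, while the bounded one holds only the last three. Over a multi-week freeze with no restarts, that difference is what decides whether the heap fills up.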

Leaks can also happen when connecting to a database. In another example, the same e-commerce application began experiencing intermittent outages. The root cause was traced to a connection leak: the application was not releasing database connections after using them. As the number of open connections grew, the database server eventually refused new connections, causing the application to crash.
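One common way to guard against this kind of connection leak is to tie the connection's lifetime to a scope. The sketch below uses Python's standard `sqlite3` module and `contextlib.closing` purely for illustration; the `orders` table and the `connect_demo` and `fetch_order_count` helpers are assumptions, and the original application's database and driver are not known.

```python
import sqlite3
from contextlib import closing


def fetch_order_count(connect):
    """Run one query and always release the connection, even if the query fails."""
    with closing(connect()) as conn:
        row = conn.execute("SELECT COUNT(*) FROM orders").fetchone()
        return row[0]


# Demo setup (hypothetical): an in-memory database with a small orders table.
opened = []


def connect_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
    conn.execute("INSERT INTO orders (id) VALUES (1), (2)")
    opened.append(conn)  # kept only so the demo can inspect it afterwards
    return conn


order_count = fetch_order_count(connect_demo)
```

Because `closing` calls `conn.close()` when the `with` block exits, the connection is released whether the query succeeds or raises. With a pooled driver the same idea applies, except that "close" usually means returning the connection to the pool.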

Similarly, I have seen code freezes during which thread leaks consumed system resources and slowed the application down. This typically happens when threads are created but never terminated.
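A hedged sketch of one way to avoid leaking threads: instead of starting a new thread per task, run the work on a fixed-size pool that is shut down when the work is done. The `process` and `run_batch` functions are hypothetical stand-ins for real work.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def process(item):
    # Stand-in for real work (hypothetical).
    return item * 2


def run_batch(items, max_workers=4):
    """Process items on a fixed-size pool; leaving the `with` block joins all workers."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process, items))


threads_before = threading.active_count()
results = run_batch(range(10))
threads_after = threading.active_count()
```

The pool caps the number of threads at `max_workers`, and the context manager waits for the workers to finish and exit, so the thread count after the batch does not exceed the count before it.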

Array Index Out-of-Bounds Errors

Another issue I encountered during a recent freeze was an array index out-of-bounds error. The application was a CMS, and the downtime started midweek when it tried to access an array index that didn't exist. The cause was unexpected input and data changes that the custom code did not account for.

Array index out-of-bounds exceptions can also be caused by data mismatches when interacting with external services or APIs that are not under the code freeze. Once, during a holiday season, a financial reporting application began throwing array index out-of-bounds exceptions. The root cause was traced back to an external data feed that had changed its format. The application was expecting a certain number of fields, but the external feed had added fields, causing the application to attempt to access non-existent indices. The resulting errors took the application offline until a patch was deployed after the freeze.
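A defensive parser can turn this kind of format drift into a logged rejection instead of an outage. The following is a minimal sketch, assuming a hypothetical four-field, comma-separated feed (date, account, amount, currency); the real feed's layout is not known.

```python
EXPECTED_FIELDS = 4  # hypothetical layout: date, account, amount, currency


class FeedFormatError(ValueError):
    """Raised when a feed line does not match the expected layout."""


def parse_record(line, delimiter=","):
    fields = line.rstrip("\n").split(delimiter)
    # Check the shape before indexing or unpacking, so a changed feed
    # produces a clear error rather than an out-of-bounds access.
    if len(fields) != EXPECTED_FIELDS:
        raise FeedFormatError(
            f"expected {EXPECTED_FIELDS} fields, got {len(fields)}: {line!r}"
        )
    date, account, amount, currency = fields
    return {"date": date, "account": account, "amount": float(amount), "currency": currency}


def parse_feed(lines):
    """Return (records, rejected) so one malformed line does not abort the whole run."""
    records, rejected = [], []
    for line in lines:
        try:
            records.append(parse_record(line))
        except FeedFormatError as exc:
            rejected.append(str(exc))
    return records, rejected


records, rejected = parse_feed([
    "2024-12-24,ACC-1,100.50,USD",
    "2024-12-24,ACC-2,75.00,USD,extra-field",
])
```

Here the second line carries an extra field, so it lands in `rejected` while the first is parsed normally. Whether to reject, truncate, or alert on such lines is a design choice that depends on how much the downstream reports can tolerate missing data.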

Cache Corruption

Cache corruption is another way to bring down applications that depend heavily on caching. In online real-time applications, caches improve performance, but on several occasions I have seen caches that were not cleared become corrupt over time, serving stale or incorrect data.
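One mitigation for stale entries is to give every cached value a time-to-live, so nothing survives an entire freeze unrefreshed. Below is a minimal sketch of that idea; the `TTLCache` class and the `price:sku-1` key are hypothetical, and the injectable clock exists only to make the demo deterministic.

```python
import time


class TTLCache:
    """Hypothetical minimal cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def set(self, key, value):
        self._entries[key] = (self.clock() + self.ttl_seconds, value)

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self.clock() >= expires_at:
            # Expired: drop the entry so stale data is never served.
            del self._entries[key]
            return default
        return value


# Demo with a fake clock so expiry does not depend on real time.
now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
cache.set("price:sku-1", 19.99)
fresh = cache.get("price:sku-1")
now[0] = 61.0
stale = cache.get("price:sku-1")
```

Within the TTL the cached price is returned; after it, the lookup misses and the caller has to reload from the source of truth. This does not repair a corrupted entry, but it bounds how long one can be served.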

Conclusion

While it's funny that IT stakeholders think code freezes maintain stability, in most cases they expose underlying issues that might not be apparent during regular operations. Even funnier, most of the time these issues are resolved by a simple server restart.

Building Microservices by decreasing Entropy and increasing Negentropy - Series Part 5

The microservices journey is all about gradual overhaul: every time you make a change, you need to keep the system in a better state or the ...