Showing posts with label Caching. Show all posts

Monday, August 12, 2024

Perils of Holiday Code Freezes

The holiday code freeze has become an age-old practice: deployments stop in order to reduce the risk of introducing new bugs or issues while most of the support staff is away. However, it also means servers and infrastructure are left untouched for several weeks, and in my experience, surprisingly, that can get chaotic.

Application and Database Leaks

First and foremost, the most common problem I have seen is the application memory leak. I recall one instance where an e-commerce application began to slow down significantly a week into the holiday break. The application retained references to objects it no longer needed, so heap memory filled up gradually. As memory usage grew, the application became sluggish and eventually crashed with "out of memory" errors.
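
One way to guard against this kind of slow growth is to cap how many objects a long-lived structure may hold. The following is a minimal Python sketch, not the fix used in the incident above; `BoundedCache`, `max_entries`, and the session data are hypothetical names chosen for illustration.

```python
from collections import OrderedDict


class BoundedCache:
    """LRU-style cache that evicts the least recently used entry when full."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._items = OrderedDict()

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)  # mark as most recently used
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)  # drop the least recently used entry

    def get(self, key, default=None):
        if key in self._items:
            self._items.move_to_end(key)
            return self._items[key]
        return default

    def __len__(self):
        return len(self._items)


# A plain dict would keep all 10,000 entries alive for the life of the process;
# the bounded cache never holds more than 100.
cache = BoundedCache(max_entries=100)
for session_id in range(10_000):
    cache.put(session_id, {"cart": []})
```

With a cap in place, memory held by the structure stays flat no matter how long the process runs without a restart.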

Leaks can also happen when connecting to a database. In another example, the same e-commerce application began experiencing intermittent outages. The root cause was traced to a connection leak: the application was not releasing database connections after using them. As the number of open connections grew, the database server eventually refused new connections, causing the application to crash.
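
A common defence is to tie each connection to a scope that always closes it, even when a query fails. The sketch below uses Python's standard `sqlite3` module as a stand-in for the production database; the `count_orders` function and the `orders` table are assumptions made for the example.

```python
import sqlite3
from contextlib import closing


def count_orders(db_path):
    # closing() guarantees conn.close() runs even if a query raises,
    # so the connection is never left open by an error path.
    with closing(sqlite3.connect(db_path)) as conn:
        with closing(conn.cursor()) as cur:
            # Hypothetical table, created here only to keep the sketch runnable.
            cur.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY)")
            cur.execute("SELECT COUNT(*) FROM orders")
            return cur.fetchone()[0]
```

The same pattern applies to pooled connections: whatever is borrowed inside the block is returned when the block exits.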

Similarly, I have been through code freezes where thread leaks consumed system resources and slowed the application down. This typically happens when threads are created but never terminated.
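
A bounded pool with a clear shutdown point is one way to keep threads from piling up. This is a small Python sketch under assumed names (`handle`, `process_jobs`), not code from the application described.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle(job_id):
    # Placeholder for real work.
    return job_id * 2


def process_jobs(job_ids, max_workers=4):
    # Leaving the with-block calls shutdown(wait=True), so worker threads
    # are joined instead of accumulating across calls.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(handle, job_ids))


before = threading.active_count()
results = process_jobs(range(5))
after = threading.active_count()
```

Comparing `threading.active_count()` before and after a batch is a cheap check that no workers were left behind.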

Array Index Out-of-Bounds Errors

Another issue I encountered during a recent freeze was an array index out-of-bounds error. The application was a CMS, and the downtime started midweek when it tried to access an array index that didn't exist. The cause was unexpected input and data changes that the custom code did not account for.

Array index out-of-bounds exceptions can also be caused by a data mismatch when interacting with external services or APIs that are not under the code freeze. Once, during a holiday season, a financial reporting application began throwing array index out-of-bounds exceptions. The root cause was traced back to an external data feed that had changed its format. The application expected a certain number of fields, but the feed had added extra fields, causing the application to attempt to access non-existent indices. The resulting errors kept the application offline until a patch was deployed after the freeze.
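
Checking the shape of each incoming record before indexing into it turns this kind of crash into a skipped or logged record. The sketch below is in Python, where the equivalent failure is an `IndexError` or unpacking error; the three-field layout and the `parse_record` name are assumptions for illustration, not the real feed format.

```python
EXPECTED_FIELDS = 3  # hypothetical layout: date, account, amount


def parse_record(line, delimiter=","):
    """Return (date, account, amount), or None if the record shape is unexpected."""
    fields = line.strip().split(delimiter)
    if len(fields) != EXPECTED_FIELDS:
        # Reject instead of indexing blindly; the caller can log and skip.
        return None
    date, account, amount = fields
    return date, account, float(amount)
```

For example, `parse_record("2024-12-24,ACME,10.5,USD")` returns `None` rather than failing, so one changed record does not take the whole report down.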

Cache Corruption

Cache corruption is another way to bring down applications that depend heavily on caches. In online real-time applications, caches improve performance, but on several occasions I have seen caches that were not cleared become corrupt over time, serving stale or incorrect data.
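
Giving every entry a time-to-live is one simple way to keep a cache from serving old data indefinitely when nobody is around to clear it. A minimal Python sketch follows; `TTLCache` and its parameters are hypothetical, and the injectable `clock` exists only to make the behaviour easy to check.

```python
import time


class TTLCache:
    """Cache whose entries expire ttl_seconds after they are written."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._items = {}  # key -> (expires_at, value)

    def set(self, key, value):
        self._items[key] = (self.clock() + self.ttl, value)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._items[key]  # expired: drop it so the caller reloads
            return default
        return value
```

A miss after expiry forces the caller back to the source of truth, which bounds how stale any served value can be.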

Conclusion

While it's funny that IT stakeholders think a code freeze maintains stability, in most cases freezes expose underlying issues that might not be apparent during regular operations. Even funnier, most of the time these issues are resolved by a simple server restart.

Tuesday, February 6, 2018

Caching Strategies for Improved Website Performance

When it comes to performance, the first thing that comes to mind is caching. Caching is complex to implement for a website, and it needs the right balance with the underlying infrastructure. We need to understand the end-to-end data flows in depth: their bandwidths and how data gets stored at each layer.

One common mistake development teams make is optimizing code at random in the hope of solving all caching issues. While this resolves some performance bottlenecks and memory issues in legacy code, it does not eliminate the core problems.

The first fundamental point is that caching strategies differ between static and dynamic content. Serving static content from a Content Delivery Network (CDN) is a no-brainer: with a CDN, an application can easily cache static content for a specific amount of time at the edge layers closest to the end user.

A CDN, however, is trickier to use for semi-dynamic or dynamic pages. It is much easier to shift caching responsibility for dynamic functionality closer to the application layer.
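
The static versus dynamic split often comes down to which `Cache-Control` header a response carries. The Python sketch below shows one possible policy; the extension list, the one-day `max-age`, and the `cache_control_for` name are assumptions, not recommendations for any particular CDN.

```python
import os

# Hypothetical set of extensions treated as static assets.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".svg", ".woff2"}


def cache_control_for(path, static_max_age=86400):
    """Pick a Cache-Control value based on whether the path looks static."""
    ext = os.path.splitext(path)[1].lower()
    if ext in STATIC_EXTENSIONS:
        # Shared caches such as CDN edges may store this for static_max_age seconds.
        return f"public, max-age={static_max_age}"
    # Dynamic pages: keep them out of shared caches and revalidate each time,
    # leaving any caching of their data to the application layer.
    return "private, no-cache"
```

Here `cache_control_for("/assets/site.css")` yields a cacheable header, while `cache_control_for("/cart")` keeps the page out of the edge cache.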

The second, trickier point is that the cache at the application layer influences the health of the underlying infrastructure. After any cache tweak, it is best to review application server sizing by analyzing the amount of cached data, cache size, cache duration, cache hit ratio, and so on. Caching at the database layer influences the underlying infrastructure too. Query caches can be heavy and sometimes unnecessary, as many modern ORMs already implement an internal cache.
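
Hit ratio and current size are the easiest of these numbers to collect. As a rough Python sketch, the standard library's `functools.lru_cache` exposes both through `cache_info()`; `load_product` and the access pattern are made up for the example.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def load_product(product_id):
    # Placeholder for an expensive lookup (database, remote API, ...).
    return {"id": product_id}


def hit_ratio(cached_fn):
    """Fraction of calls served from the cache so far."""
    info = cached_fn.cache_info()
    total = info.hits + info.misses
    return info.hits / total if total else 0.0


for pid in [1, 2, 1, 1, 3, 2]:
    load_product(pid)

ratio = hit_ratio(load_product)
size = load_product.cache_info().currsize
```

A low ratio together with a large size suggests the cache is costing memory without saving much work, which is exactly the case where server sizing deserves a second look.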

While it's tempting to resort to code optimization as a quick fix, the fundamental lesson is to implement caching tailored to the nature of the content, static or dynamic. Caching at the application layer also involves a delicate balance, because it influences the overall health of the underlying infrastructure. By approaching caching strategically and understanding its nuanced impact on the entire ecosystem, development teams can navigate performance challenges effectively and deliver a seamless experience to end users.

Building Microservices by decreasing Entropy and increasing Negentropy - Series Part 5

A microservice journey is all about gradual overhaul: every time you make a change, you need to keep the system in a better state or the ...