Wednesday, January 30, 2019

Solving Legacy System Database issues inside a Common Datacenter

Customized applications, especially legacy COTS products, often end up with hundreds of database tables. As these systems age, resolving database issues becomes increasingly complicated.

This week, I encountered a system downtime where the database was the culprit. Database issues are typically tricky to solve because, in most organizations, they are handled by a separate DB team. Also, in an on-premise data center setup, the databases are physically several network layers away from the applications that call them.

Different network components and web application firewalls also affect database call performance. We were lucky to be able to identify the network components involved using the SolarWinds Orion monitoring and management platform; otherwise, it would have been nearly impossible to understand what happened at the network layer.

We used an Oracle database, so it was easy to generate an Automatic Workload Repository (AWR) report, which provided detailed query statistics, server activity, disk usage, thread counts, session information, etc. The report included details on expensive queries, table index optimization, and server upgrade recommendations. Generating an AWR report regularly also helps identify any out-of-the-ordinary system behavior.

Then came the most complex puzzle: caching the database responses at the application layer. An out-of-the-box cache implemented at the ORM level caused very high memory consumption during peak load. Typically, these issues stem from underestimating the long-term sizing requirements.
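To illustrate the failure mode (this is a generic sketch, not the actual ORM or cache from the incident): an unbounded cache keeps every distinct entry it ever sees, so memory grows with traffic, while a bounded LRU cache caps memory by evicting the least recently used entries.

```python
from functools import lru_cache

@lru_cache(maxsize=None)   # unbounded: every distinct argument is kept forever
def load_record_unbounded(record_id: int) -> dict:
    return {"id": record_id}  # stand-in for an expensive database call

@lru_cache(maxsize=1024)   # bounded: at most 1024 entries are retained
def load_record_bounded(record_id: int) -> dict:
    return {"id": record_id}

# Simulate a peak-load burst of 5000 distinct lookups.
for i in range(5000):
    load_record_unbounded(i)
    load_record_bounded(i)

print(load_record_unbounded.cache_info().currsize)  # 5000 entries held in memory
print(load_record_bounded.cache_info().currsize)    # capped at 1024
```

Under sustained peak load, the unbounded variant is exactly the pattern that exhausts heap; setting an explicit bound (and reviewing it against sizing estimates) keeps memory predictable.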

In the end, the root cause was a clogged network: a multi-TB file transfer was consuming most of the available bandwidth, which slowed down several applications sharing the same infrastructure.

In conclusion, legacy database issues are tricky to solve. The closer they are to the application layer, the easier it is to identify the bottlenecks. Another lesson learned was the importance of end-to-end monitoring and of periodically load testing the production environment.

Tuesday, February 6, 2018

Caching Strategies for Improved Website Performance

When it comes to performance, the first thing that comes to mind is caching. Caching is complex to implement correctly for a website, and it must be balanced against the underlying infrastructure. We need to understand the end-to-end data flows, their bandwidth, and how data is stored at each layer.

One common mistake development teams make is reaching for ad-hoc code optimization to solve all caching issues. While this helps resolve some performance bottlenecks and memory issues in legacy code, it does not eliminate the core problems.

The first and most fundamental point is that caching strategies differ between static and dynamic content. Serving static content through a Content Delivery Network (CDN) is a no-brainer: a CDN makes it easy to cache static assets for a set amount of time at the edge locations closest to the end user.
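A minimal sketch of how an origin might signal this split to a CDN via `Cache-Control` headers (the path rule and header values here are illustrative assumptions, not from any specific setup):

```python
# Static, fingerprinted assets never change, so shared caches (CDN edges)
# and browsers may hold them for a long time. Per-user dynamic pages must
# not be stored in shared caches at all.
STATIC_HEADERS = {"Cache-Control": "public, max-age=31536000, immutable"}
DYNAMIC_HEADERS = {"Cache-Control": "private, no-store"}

def headers_for(path: str) -> dict:
    """Pick response headers based on an assumed static-asset suffix rule."""
    static_suffixes = (".js", ".css", ".png", ".woff2")
    return STATIC_HEADERS if path.endswith(static_suffixes) else DYNAMIC_HEADERS

print(headers_for("/assets/app.3f2a1c.js")["Cache-Control"])  # cacheable at the edge
print(headers_for("/account/orders")["Cache-Control"])        # bypasses shared caches
```

The design point is that the origin, not the CDN configuration alone, declares what is safe to cache and for how long, so static and dynamic content follow different rules automatically.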

A CDN, however, is trickier to use for semi-dynamic or dynamic pages. For dynamic functionality, it is much easier to shift the caching responsibilities closer to the application layer.

The second tricky point is that a cache at the application layer influences the health of the underlying infrastructure. After any cache tweak, it is worth reviewing the application server sizing in light of the amount of cached data, the cache size, the cache duration, the cache hit ratio, and so on. Even caching at the database layer influences the underlying infrastructure: query caches can be heavy and sometimes unnecessary, as most modern ORMs already implement an internal cache.
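The sizing review above can be reduced to back-of-envelope arithmetic. A small helper sketch (the numbers below are hypothetical, purely for illustration):

```python
def cache_memory_mb(entries: int, avg_entry_bytes: int) -> float:
    """Approximate resident memory consumed by the cache, in MB."""
    return entries * avg_entry_bytes / (1024 * 1024)

def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of lookups served from the cache."""
    total = hits + misses
    return hits / total if total else 0.0

# Hypothetical figures: 200k cached entries averaging 2 KB each,
# and 9,000 hits against 1,000 misses over a sampling window.
print(round(cache_memory_mb(200_000, 2048)))   # ~391 MB of heap for the cache
print(hit_ratio(9_000, 1_000))                 # 0.9
```

If the estimated cache footprint is a large fraction of the application server's heap, or the hit ratio is low despite a large cache, the tweak is costing infrastructure health more than it buys in performance.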

While it is tempting to resort to code optimization as a quick fix, the fundamental lesson is to implement caching tailored to the nature of the content, static or dynamic. Caching at the application layer also involves a delicate balance, influencing the overall health of the underlying infrastructure. By approaching caching strategically and understanding its nuanced impact on the entire ecosystem, development teams can navigate performance challenges effectively and deliver a seamless experience to end users.
