Showing posts with label Legacy Applications. Show all posts

Wednesday, January 30, 2019

Solving Legacy System Database issues inside a Common Datacenter

Customized applications, especially legacy COTS products, often come with hundreds of tables. As a result, resolving database issues keeps getting more complicated as the systems age.

This week, I encountered a system outage where the database looked like the culprit. Database issues are typically tricky to solve because, in most organizations, they are handled by a separate DB team. Also, in an on-premise network setup inside a data center, the databases sit several physical layers away from the applications that call them.

In addition, the network components and web application firewalls in between affect database call performance. Fortunately, we were able to identify the network components involved using the SolarWinds Orion monitoring and management platform. Without it, understanding what happened at the network layer would have been nearly impossible.

We used an Oracle database, so it was easy to generate an Automatic Workload Repository (AWR) report, which provided detailed query statistics, server activity, disk usage, thread counts, session information, and more. The report included details on expensive queries, table index optimization, and server upgrade recommendations. Generating an AWR report regularly also helps identify out-of-the-ordinary system behavior.

Then came the most complex puzzle: caching database responses at the application layer. An out-of-the-box cache implemented at the ORM level caused very high memory usage during peak load times. Typically, such issues stem from underestimating long-term sizing requirements.
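One common way to keep an application-layer cache from growing without limit is to cap the number of entries and evict the least recently used ones. The snippet below is only a minimal sketch of that idea, not the caching mechanism of the ORM in question; the `BoundedCache` class, its `max_entries` parameter, and the `loader` callback are hypothetical names chosen for illustration.

```python
from collections import OrderedDict


class BoundedCache:
    """Hypothetical LRU cache with a hard cap on the number of entries."""

    def __init__(self, max_entries):
        if max_entries <= 0:
            raise ValueError("max_entries must be positive")
        self._max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key, loader):
        """Return the cached value, calling loader(key) on a miss."""
        if key in self._data:
            # Mark the entry as most recently used.
            self._data.move_to_end(key)
            return self._data[key]
        value = loader(key)  # e.g. the actual database query
        self._data[key] = value
        if len(self._data) > self._max_entries:
            # Evict the least recently used entry to bound memory.
            self._data.popitem(last=False)
        return value

    def __len__(self):
        return len(self._data)
```

With a cap like this, memory use at peak load depends on the configured limit rather than on how many distinct rows the application happens to touch, which makes long-term sizing easier to reason about.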

In the end, the root cause was a congested network: a multi-TB file transfer was consuming a large share of the available bandwidth. As a result, several applications sharing the same network infrastructure slowed down.
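A quick back-of-the-envelope calculation shows why a transfer of this size can occupy a shared link for hours. The sketch below is illustrative only: the file size and link speed are assumed values, not figures from this incident, and `transfer_hours` is a hypothetical helper.

```python
def transfer_hours(size_tb, link_gbps, utilization=1.0):
    """Estimate hours needed to move size_tb terabytes over a link.

    Assumes decimal units (1 TB = 1e12 bytes, 1 Gbps = 1e9 bits/s) and
    that the transfer gets the given fraction of the link's capacity.
    """
    size_bits = size_tb * 1e12 * 8
    rate_bits_per_second = link_gbps * 1e9 * utilization
    return size_bits / rate_bits_per_second / 3600


# Assumed example: a 2 TB file on a fully saturated 1 Gbps link.
print(round(transfer_hours(2, 1), 2))
```

Under these assumptions, the transfer saturates the link for well over four hours, during which every other application on the same path competes for whatever bandwidth is left.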

In conclusion, legacy database issues are really tricky to solve. The closer they are to the application layer, the easier it is to identify the bottlenecks. Another lesson learned was the importance of enhancing end-to-end monitoring and periodically running real-time load tests in the production environment.

Building Microservices by decreasing Entropy and increasing Negentropy - Series Part 5

The microservices journey is all about gradual overhaul: every time you make a change, you need to keep the system in a better state or the ...