Showing posts with label Performance. Show all posts

Wednesday, November 29, 2023

Mastering the Art of Reading Thread Dumps

For years, I have tried to find a structured way to read thread dumps in production whenever there is an issue, and I have often found myself on a wild goose chase, deciphering their cryptic language. These snapshots of thread activity within a running application hold a wealth of information, providing insights into performance bottlenecks, resource contention, and high memory or CPU usage.


In this article, I'll share tips and tricks from my experience reading production thread dumps across multiple projects, demystifying the process for my fellow engineers.


Tip 1: Understand the Thread States

Thread states, such as RUNNABLE, WAITING, or TIMED_WAITING, offer a quick glimpse into what a thread is currently doing. Mastering these states helps in identifying threads that might be causing performance issues. For instance, a thread stuck in a WAITING state can be a candidate for further investigation.
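A quick way to get that glimpse is to tally the state lines of a dump. Here is a minimal sketch, assuming HotSpot-style `jstack` output (the dump fragment below is fabricated for illustration):

```python
import re
from collections import Counter

# Fabricated fragment of `jstack` output; real dumps are far longer.
dump = """
"http-nio-8080-exec-1" #27 daemon prio=5 os_prio=0 tid=0x1 nid=0x2b03 waiting on condition
   java.lang.Thread.State: WAITING (parking)
"http-nio-8080-exec-2" #28 daemon prio=5 os_prio=0 tid=0x2 nid=0x2b04 runnable
   java.lang.Thread.State: RUNNABLE
"pool-1-thread-1" #31 prio=5 os_prio=0 tid=0x3 nid=0x2b07 waiting on condition
   java.lang.Thread.State: TIMED_WAITING (sleeping)
"""

def state_counts(dump_text):
    """Tally thread states from the `java.lang.Thread.State:` lines."""
    states = re.findall(r"java\.lang\.Thread\.State: (\w+)", dump_text)
    return Counter(states)

print(state_counts(dump))  # e.g. a Counter with WAITING, RUNNABLE, TIMED_WAITING
```

A dump of a healthy application is dominated by WAITING and TIMED_WAITING pool threads; a sudden skew toward RUNNABLE or BLOCKED is the first hint of trouble.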


Tip 2: Identify High CPU Threads

The threads consuming a significant amount of CPU time are often the culprits behind performance degradation. Look for the "Top 5 Threads by CPU time" section and dig into those threads' stack traces; the full stack trace pinpoints the exact method or task responsible for the CPU spike.
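One trick worth knowing here: `top -H -p <pid>` reports the hot OS thread id in decimal, while a HotSpot thread dump prints the same id as the hexadecimal `nid` field. A one-line conversion (the pid below is made up) links the two:

```python
def pid_to_nid(pid):
    """Convert a decimal OS thread id (from `top -H`) to the hex `nid` format."""
    return "0x" + format(pid, "x")

# A hypothetical hot thread id seen in `top -H`:
print(pid_to_nid(11011))  # 0x2b03
```

Searching the dump for that `nid` takes you straight from the CPU report to the offending stack trace.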


Tip 3: Leverage Thread Grouping

Grouping threads by their purpose or functionality can simplify the analysis process. In complex applications, the sheer number of threads can be overwhelming, so collating or grouping them together is helpful: for example, grouping together the threads related to database connections, HTTP requests, or background tasks. This approach often provides a more coherent view of the application's concurrent activities.
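A small sketch of this grouping idea: collapse the per-thread counters at the end of pool thread names so each pool can be counted at a glance (the thread names below are illustrative):

```python
import re
from collections import defaultdict

names = [
    "http-nio-8080-exec-1", "http-nio-8080-exec-2",
    "pool-1-thread-1", "pool-1-thread-2", "GC Thread#0",
]

def group_threads(thread_names):
    """Group thread names by pool prefix, with trailing counters stripped."""
    groups = defaultdict(list)
    for name in thread_names:
        prefix = re.sub(r"[#-]?\d+$", "", name)  # strip the per-thread counter
        groups[prefix].append(name)
    return dict(groups)

for prefix, members in group_threads(names).items():
    print(f"{prefix}: {len(members)} thread(s)")
```

Seeing "http-nio-8080-exec: 200 thread(s)" instead of 200 individual entries makes an exhausted request pool obvious immediately.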


Tip 4: Pay Attention to Deadlocks

Deadlocks are the nightmares of multithreaded applications. Thread dumps provide clear indications of deadlock scenarios. Look for threads marked as "BLOCKED" and investigate their lock dependencies to identify the circular wait causing the deadlock.
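As a toy illustration (not part of any JVM tooling), the circular-wait check can be sketched as a cycle search over a wait-for graph, where each BLOCKED thread points at the thread holding the lock it wants:

```python
# Toy wait-for graph: thread -> thread whose lock it is blocked on.
waits_for = {"T1": "T2", "T2": "T3", "T3": "T1", "T4": "T1"}

def find_deadlock(graph):
    """Return a cyclic chain of threads, or None if there is no cycle."""
    for start in graph:
        seen, cur = [], start
        while cur in graph:
            if cur in seen:
                return seen[seen.index(cur):]  # the cycle itself
            seen.append(cur)
            cur = graph[cur]
    return None

print(find_deadlock(waits_for))  # ['T1', 'T2', 'T3']
```

This is exactly the analysis `jstack` performs for you when it prints "Found one Java-level deadlock"; doing it by hand is only needed when locks are acquired through mechanisms the JVM cannot see.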


Tip 5: Explore External Dependencies

Modern applications often rely on external services or APIs. Threads waiting for responses from these external dependencies can significantly impact performance. Identify threads in WAITING states and trace their dependencies to external services.


Tip 6: Utilize Profiling Tools

While thread dumps offer a snapshot of the application state, profiling tools like VisualVM or YourKit provide a dynamic and interactive way to analyze thread behavior. These tools allow you to trace thread activities in real time, making it easier to pinpoint performance bottlenecks.


Tip 7: Contextualize with Application Logs

Thread dumps are more powerful when correlated with application logs. Integrate logging within critical sections of your code to capture additional context. This fusion of thread dump analysis and log inspection provides a holistic view of your application's behavior.


In conclusion, reading thread dumps is both an art and a science. It requires a keen eye, a deep understanding of the application's architecture, and the ability to connect the dots between threads and their activities. By mastering this skill, one can unravel the intricacies of their applications, ensuring optimal performance and a seamless user experience.

Wednesday, August 3, 2022

Instilling the idea of Sustainability into Development Teams

Inculcating green coding practices and patterns in a development team is a modern-day challenge, but it can go a long way toward reducing an organization's carbon footprint and meeting its long-term sustainability goals.

Good green coding practices improve the quality of a software application and directly impact the energy efficiency of the systems the application runs on. However, software developers in today's agile work environment seldom look beyond rapid solution building in compressed sprint cycles. They have all the modern frameworks and libraries at their disposal, and writing energy-efficient code is not always the focus. Furthermore, modern data centers and cloud infrastructure give developers virtually unlimited resources, resulting in high energy consumption and environmental impact.

Below are some of the factors that improve programming practices and can have a drastic impact on the Green Index.

a) Fundamental Programming Practices

Fundamental programming practices start with proper error and exception handling. They also include paying extra attention to the modularity and structure of the code, and being prepared for unexpected deviations and behavior, especially when integrating with a different component or system.

b) Efficient Code Development

Efficient code development makes the code more readable and maintainable. It includes avoiding memory leaks and high CPU cycles, and managing network and infrastructure storage proficiently. It also includes avoiding expensive calls and unnecessary loops, and eliminating unessential operations.
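As a small illustration of eliminating unessential operations, consider hoisting an invariant expensive call out of a loop (the function names and tax-rate scenario below are made up for the sketch):

```python
def expensive_rate_lookup():
    # Stand-in for an expensive call (e.g., a remote config or database read).
    return 0.18

def total_with_tax_naive(prices):
    # Wasteful: repeats the expensive lookup on every iteration.
    return sum(p * (1 + expensive_rate_lookup()) for p in prices)

def total_with_tax_efficient(prices):
    # Hoist the invariant call out of the loop: one lookup, same result.
    rate = expensive_rate_lookup()
    return sum(p * (1 + rate) for p in prices)

prices = [100.0, 200.0, 50.0]
assert total_with_tax_naive(prices) == total_with_tax_efficient(prices)
```

The result is identical, but the efficient version performs one lookup instead of one per item; multiplied across millions of requests, such small savings are exactly where energy efficiency comes from.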

c) Secured Programming Mindset

A secure programming mindset ensures that the software application has no weak security features or vulnerabilities. Secure programming also includes protecting data, data encoding, and encryption. Awareness of the OWASP vulnerability list and timely penetration testing assessments ensure the application code complies with the required level of security.

d) Avoidance of Complexity

Complex code is modified the least and is the hardest to follow. Code written today may be modified by several different developers in the future, so avoiding complexity up front goes a long way toward keeping the code maintainable. Reducing the cyclomatic complexity of methods by dividing code and logic into smaller reusable components keeps the code simple and easy to understand.

e) Clean Architecture concepts

Understanding Clean Architecture concepts is essential for keeping codebases amenable to change. In a layered architecture, understanding the problems of tight coupling and weak cohesion helps with reusability, minimizes the disruption caused by changes, and avoids rewriting code and capabilities.

Conclusion

As architects and developers, it is essential to collect green metrics regularly and evaluate the code for compliance and violations. These coding practices can be measured using various static code analysis tools, which can be integrated into the IDE, into the compilation or deployment pipeline, or run as standalone tools.

With organizations across industries now pursuing their own sustainability goals, green coding practices have become an integral part of every software developer's craft. Little tweaks to our development approach can immensely reduce environmental impact in the long run.

Saturday, December 7, 2019

Quick 5 Point Website Inspection for Peak Holiday Traffic

The holiday season is around the corner, and with code freezes in place, teams are wary of risking any major production change that could hamper the most critical business time of the year.
Below are five quick non-code checks that can help prepare the website for holiday traffic and sales.

1) Examine the Infrastructure Sizing

If on-premise servers are running critical applications, recalculate their approximate capacity, i.e., how much traffic these applications can hold up under.
This is based on different data points, including requests per second, average application server response times, and instance details such as CPU, cores, threads, and sockets. All the required information can typically be gathered from analytics or monitoring tools.

Calculating the number of Cores?

For this, all that is needed is to load the CPU information (lscpu) and view the values for Thread(s) per core, Core(s) per socket, and Socket(s).
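As a sketch, the arithmetic over an `lscpu`-style fragment (the numbers below are fabricated, but the field names match real `lscpu` output) looks like:

```python
# Fabricated lscpu fragment; field names match real lscpu output.
lscpu = """
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           2
"""

def parse_lscpu(text):
    """Parse 'key: value' lines into a dict of integers."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = int(value)
    return fields

info = parse_lscpu(lscpu)
total_cores = info["Socket(s)"] * info["Core(s) per socket"]
total_threads = total_cores * info["Thread(s) per core"]
print(total_cores, total_threads)  # 16 32
```

With 2 sockets of 8 cores each, the box has 16 physical cores and, with hyper-threading, 32 hardware threads.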

Calculate the maximum load or throughput of the system?

The next step is to calculate the average number of requests reaching the application servers. This can again be read from the monitoring tool's data, fetched for the highest observed peak traffic, or the expected peak traffic, of the application or website.

Calculate the Average response times?

The next value needed is the average response time of the application server. This is also available from any monitoring tool, or an expected average value in seconds can be used instead.
Now, with the required information in place, the number of cores can be calculated using the formula:
Number of cores = Requests per second * average response time in seconds
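The same formula as a tiny function (it is an application of Little's Law: concurrency equals arrival rate times service time), with illustrative numbers:

```python
def cores_needed(requests_per_second, avg_response_seconds):
    """Number of cores = requests/sec * average response time in seconds."""
    return requests_per_second * avg_response_seconds

# e.g. 100 req/s at an average of 250 ms each keeps ~25 cores busy
print(cores_needed(100, 0.25))  # 25.0
```

If that number approaches the physical core count computed from `lscpu`, the infrastructure has little headroom for a holiday traffic spike.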

2) Tweak the Enterprise Search Relevancy

Several research articles show that browse and search pages are the most visited pages, as well as the pages where most customers are lost. Many websites spend a lot of money and time on external search engine optimization, while one key feature that is often overlooked is the relevancy ranking of products on the website's browse and search pages.
Below are some tips that can help boost the most relevant products to the top and bury the least preferred products on the site.

Prioritization of Attributes

Attribute precedence is key for relevancy. There needs to be a clear definition of the data associated with these attributes. Special consideration has to be given to data like Category Name, Product Name, Descriptions, and other keywords, as these attributes form the basis of the search engine index. Custom attributes can be created as required and added to the prioritization list.

Monitoring Search Results

Use analytics to monitor user behavior on the search and browse results. Information like the most clicked products and categories, the most popular searches for a product or category, error results, zero-result searches, searches served with auto spell correction, page behavior, etc., clearly indicates how relevant the displayed results are to the user. These data form the basis for boosting the required results and suppressing irrelevant ones.

Enterprise Search Engines

The settings for core search functionalities like synonyms, thesaurus, stemming, spell correction, and "Did you mean?" play a vital role in displaying relevant results. For sites with multiple locales, update custom stemming files with the top search keywords. It is also essential to keep building the dictionary as new data sets are added.

3) Keep a Close Vigil on the Browser Load times

Page load times matter a lot during the peak traffic time. While application health is one aspect of evaluation, the end customer load times also need to be closely monitored.
Below are a few of the areas that need to be evaluated so that the browser load times are not hampered.

Evaluate the Caching TTL

Retrieving static content over the public network is not cheap: the larger the file, the more bandwidth it consumes, the higher the cost, and the slower the response. Caching static content in the browser plays a critical role in reducing those server calls, resulting in faster performance and quicker page loads.
Ensure HTTP caching is enabled, which lets servers direct the browser cache to serve the content for a longer duration.
If a CDN is used, re-evaluate the TTL for that static content, especially the images.
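As a sketch of the decision a browser or CDN makes, here is a deliberately simplified freshness check that honors only the `max-age` directive of `Cache-Control` (real HTTP caching involves many more rules, such as `Expires`, validators, and revalidation):

```python
def is_fresh(age_seconds, cache_control):
    """Simplified: a cached response is fresh while its age is under max-age."""
    for directive in cache_control.split(","):
        directive = directive.strip()
        if directive.startswith("max-age="):
            return age_seconds < int(directive.split("=", 1)[1])
    return False  # no max-age: treated as not cacheable in this sketch

print(is_fresh(3000, "public, max-age=3600"))  # True
print(is_fresh(4000, "public, max-age=3600"))  # False
```

Auditing the `max-age` values your servers and CDN actually emit, against how often the underlying assets really change, is the quick pre-holiday win.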

Avoid those Content Publishes

Every content push to the live site requires clearing caches. Avoid the risk of content pushes during peak times, and if one is unavoidable, ensure that only the specific content cache is cleared.

Avoid those External Scripts Calls

All third-party integrations the website references, especially those loaded in the browser, have to be reviewed. During peak traffic hours, every external script is a point of vulnerability that can drag down browser load times and, in some unfortunate cases, the website itself.

4) Tune Database Performance

Evaluate the relational database's performance to prevent long-running queries. Below are a few basic tips that can help optimize the database quickly.

Evaluate the Database Statistics

One of the critical aspects of tuning the database is collecting and understanding the statistics. While the monitoring tools give certain information on the most time-consuming queries, the statistics collected from the database server can give detailed information about specific tables, the index distribution, details related to database sizing, etc.

Optimize Indexes

It is vital to have the right insight into the indexes on tables; missing or wrong indexes are the most common issue found when dealing with an RDBMS. It is essential to understand the fine line between no indexes and too many indexes, and the different indexing strategies for inserting, reading, and updating data.
Most modern databases report which columns should be indexed, their priority order, and missing or incorrect indexes, and also provide suggestions for improving them.

5) Network Optimization

Retrieving a website over the enterprise network is often underestimated. Responses within the internal organization network require many round trips between the enterprise internet gateway, firewalls, load balancers, security gateways, and the web and application servers placed in different data centers. Below are some suggestions that can help optimize the internal network.
Remove unnecessary firewall rules, which can gain a few milliseconds for every request.
Keep the internal network free from excessive internal traffic.
Identify infrastructure with a single point of failure and monitor it closely.
Look out for bots and hostile IP addresses clogging the network bandwidth.

Tuesday, February 6, 2018

Caching Strategies for Improved Website Performance

When it comes to performance, the first thing that comes to mind is caching. Caching is complex to implement for a website, and we need to strike the right balance with the underlying infrastructure. We need to understand the end-to-end data communication in depth: the bandwidths involved, and how data gets stored at each layer.

One general mistake development teams make is reaching for random code optimization to solve all caching issues. While this helps resolve some performance bottlenecks and memory issues in legacy code, it does not eliminate the core problems.

The first and most fundamental point is that caching strategies differ between static and dynamic content. Serving content from a Content Delivery Network is a no-brainer, and applications behind a CDN can easily cache static content for a specific amount of time on the edge layers closest to the end user.

A CDN, however, is tricky when it comes to semi-dynamic or dynamic pages. For dynamic functionality, it is much easier to shift the caching responsibility closer to the application layers.

The second tricky point is that the cache at the application layer influences the health of the underlying infrastructure. After any cache tweaks, it is always better to review application server sizing in light of the amount of cached data, cache size, cache duration, cache hit ratio, etc. Even caching at the database layer influences the underlying infrastructure: query caches can be heavy and sometimes unnecessary, as most modern ORMs already implement an internal cache.

While it's tempting to resort to code optimization as a quick fix, the fundamental lesson learned is implementing caching tailored to the nature of content—static or dynamic. Also, the application layer involves a delicate balance, influencing the overall health of the underlying infrastructure. By approaching caching strategically and understanding its nuanced impact on the entire ecosystem, development teams can navigate performance challenges effectively and deliver a seamless experience to end users.
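To make the application-layer trade-off concrete, here is a minimal cache sketch with a fixed TTL and lazy eviction. It is illustrative only: it is not thread-safe and has no size bound, which is precisely why cache size and duration feed back into server sizing as described above.

```python
import time

class TTLCache:
    """Minimal time-bounded cache sketch; not thread-safe, unbounded in size."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def put(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expiry = entry
        if time.monotonic() >= expiry:
            del self._store[key]  # lazy eviction on read
            return None
        return value

cache = TTLCache(ttl_seconds=0.05)
cache.put("page:/home", "<html>...</html>")
print(cache.get("page:/home") is not None)  # True while fresh
time.sleep(0.06)
print(cache.get("page:/home"))  # None after the TTL expires
```

Every entry held here is heap memory on the application server, so a longer TTL or a bigger cached payload is never free; it must be reflected in the sizing review.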
