
Monday, August 12, 2024

Perils of Holiday Code Freezes

A holiday code freeze is a long-standing practice in which deployments are halted to reduce the risk of new bugs or issues while most of the support staff is away. However, it also means the servers and infrastructure are left untouched for several weeks, and in my experience the results can be surprisingly chaotic.

Application and Database Leaks

First and foremost, the most common problem I have seen is the application memory leak. I recall one particular instance where an e-commerce application began to slow down significantly a week into the holiday break. The application retained references to objects it no longer needed, so the heap filled up gradually. As memory usage increased, the application became sluggish and eventually crashed with "out of memory" errors.
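A minimal Python sketch of one way to avoid this pattern, assuming the retained objects live in an in-process cache. The `BoundedCache` class and the order data are hypothetical and only illustrate the idea of capping how many references are kept alive.

```python
from collections import OrderedDict


class BoundedCache:
    """Keeps at most max_entries items, evicting the least recently used."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._items = OrderedDict()

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)          # mark as most recently used
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)   # drop the oldest entry

    def get(self, key, default=None):
        if key in self._items:
            self._items.move_to_end(key)
            return self._items[key]
        return default

    def __len__(self):
        return len(self._items)


# Simulate weeks of untouched traffic: the cache never grows past its limit.
cache = BoundedCache(max_entries=1000)
for order_id in range(50_000):
    cache.put(order_id, {"order": order_id})
```

An unbounded dictionary in the same loop would hold all 50,000 entries until the process is restarted, which is the slow heap growth described above.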

Leaks can also happen when connecting to a database. In another case, the same e-commerce application began experiencing intermittent outages. The root cause was traced to a connection leak: the application was not releasing database connections after using them. As the number of open connections grew, the database server eventually refused new connections, causing the application to crash.
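As a sketch of the usual fix, the connection is released in a block that runs even when the query fails. This example uses Python's built-in `sqlite3` module purely so it is self-contained; the `orders` table and the `count_orders` function are hypothetical. Note that using a `sqlite3` connection directly in a `with` statement only commits or rolls back the transaction and does not close the connection, which is why `contextlib.closing` is used here.

```python
import sqlite3
from contextlib import closing


def count_orders(db_path):
    # closing() guarantees conn.close() runs even if the query raises.
    with closing(sqlite3.connect(db_path)) as conn:
        with closing(conn.cursor()) as cur:
            # Table creation is only here to keep the sketch self-contained.
            cur.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER)")
            cur.execute("SELECT COUNT(*) FROM orders")
            return cur.fetchone()[0]
```

With a connection pool the same rule applies: every borrowed connection has to be returned on every code path, including the error paths.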

Similarly, I have seen code freezes during which thread leaks consumed system resources and slowed the application down. This typically happens when threads are created but never terminated.
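A small sketch of the difference a managed pool makes, assuming the work can be expressed as short tasks. The `process_batch` function is hypothetical; the point is that the pool is shut down and its workers joined every time the block exits.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def process_batch(items):
    # The with-block shuts the pool down and joins its workers on exit,
    # so repeated calls do not accumulate idle threads.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda x: x * 2, items))


before = threading.active_count()
for _ in range(20):
    results = process_batch(range(10))
after = threading.active_count()
```

Creating a new executor (or raw threads) per request without shutting it down would instead leave a few more live threads behind after every call.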

Array index out-of-bounds errors

Another issue I encountered during a recent freeze was an array index out-of-bounds error. The application was a CMS, and the downtime started mid-week when it tried to access an array index that didn't exist. The cause was unexpected input and data changes that the custom code did not account for.

Array index out-of-bounds exceptions can also be caused by data mismatches when interacting with external services or APIs that are not under the code freeze. Once, during a holiday season, a financial reporting application began throwing array index out-of-bounds exceptions. The root cause was traced back to an external data feed that had changed its format. The application expected a fixed number of fields, but the feed had added fields and changed its layout, so the application attempted to access non-existent indices. The errors kept the application offline until a patch was deployed after the freeze.
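One defensive approach is to validate the shape of each record before indexing into it. The sketch below assumes a comma-separated feed with four fields; the field names, `EXPECTED_FIELDS`, and `parse_feed_line` are hypothetical and not taken from the application described above.

```python
EXPECTED_FIELDS = 4  # hypothetical layout: account, date, currency, amount


def parse_feed_line(line, delimiter=","):
    """Return a dict for a usable row, or None if the row is too short."""
    fields = line.rstrip("\n").split(delimiter)
    if len(fields) < EXPECTED_FIELDS:
        return None  # skip the row instead of indexing past the end
    account, date, currency, amount = fields[:EXPECTED_FIELDS]
    return {
        "account": account,
        "date": date,
        "currency": currency,
        "amount": float(amount),
        "extra": fields[EXPECTED_FIELDS:],  # tolerate fields added upstream
    }
```

Rows that are rejected can be logged and counted, so a format change in the feed shows up as an alert rather than as an outage.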

Cache Corruption

Cache corruption is another way heavily cache-dependent applications can be brought down. In online real-time applications, caches improve performance, but on several occasions I have seen caches that were not cleared for a long time become corrupt or stale and serve incorrect data.
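A common mitigation for stale entries is to give every cached value a time-to-live, so nothing survives an entire freeze unrefreshed. The sketch below is a minimal illustration; the `TTLCache` class and its injectable `clock` parameter are assumptions made for this example.

```python
import time


class TTLCache:
    """Cache whose entries expire ttl_seconds after they are written."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._store = {}

    def put(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]    # expired: force a reload from the source
            return default
        return value
```

A miss after expiry sends the caller back to the system of record, which trades a little latency for not serving data that is weeks old.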

Conclusion

It's funny that IT stakeholders expect a code freeze to maintain stability when, in most cases, it exposes underlying issues that are not apparent during regular operations. Even funnier, most of the time these issues are resolved by a simple server restart.

Saturday, August 3, 2024

Instilling the idea of Sustainability into Development Teams

Inculcating green coding practices and patterns in a development team is a modern-day challenge. It can go a long way toward reducing an organization's carbon footprint and meeting its long-term sustainability goals.

Good green coding practices improve the quality of a software application and directly affect the energy efficiency of the infrastructure on which it runs. However, software developers in today's agile work environments seldom shift their focus away from rapid solution building in short sprint cycles. They have all the modern frameworks and libraries at their disposal, and writing energy-efficient code is not always the priority. Furthermore, modern data centers and cloud infrastructure give developers seemingly unlimited resources, which results in high energy consumption and harms the environment.

Below are some of the factors that improve programming practices and can have a significant impact on the Green Index:

a) Fundamental Programming Practices

Fundamental programming practices start with proper error and exception handling. They also include paying extra attention to the modularity and structure of the code and being prepared for unexpected deviations and behavior, especially when integrating with a different component or system.

b) Efficient Code Development

Efficient code development helps to make the code more readable and maintainable. Writing efficient code includes avoiding memory leaks and excessive CPU cycles, and managing network and infrastructure storage proficiently. It also includes avoiding expensive calls and unnecessary loops, and eliminating nonessential operations.

c) Secured Programming Mindset

A secure programming mindset ensures that the software application has no weak security features or vulnerabilities. Secure programming also includes protecting data through encoding and encryption. Awareness of the OWASP vulnerability lists and timely penetration testing help ensure the application code meets the required level of security.

d) Avoidance of Complexity

Complex code is the hardest to follow, and developers are the most reluctant to modify it. A piece of code may in the future be modified by several different developers, so avoiding complexity when writing it goes a long way toward keeping it maintainable. Reducing the cyclomatic complexity of methods by dividing the code and logic into smaller reusable components helps the code remain simple and easy to understand.
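To make the idea of cyclomatic complexity concrete, here is a rough approximation in Python that counts branch points per function using the standard `ast` module. It is a simplified sketch, not a replacement for a real static analysis tool: the set of node types counted in `BRANCH_NODES` is an assumption, and established tools apply more detailed rules.

```python
import ast

# Node types treated as branch points in this simplified count.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.BoolOp, ast.IfExp)


def approx_complexity(source):
    """Map each function name to 1 + the number of branch points inside it."""
    tree = ast.parse(source)
    scores = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            branches = sum(isinstance(child, BRANCH_NODES)
                           for child in ast.walk(node))
            scores[node.name] = 1 + branches
    return scores


SAMPLE = '''
def simple(x):
    return x + 1

def tangled(x):
    if x > 0:
        for i in range(x):
            if i % 2 and x > 3:
                return i
    return 0
'''

scores = approx_complexity(SAMPLE)
```

Here `simple` scores 1 and `tangled` scores 5, which is the kind of signal that suggests splitting a method into smaller pieces.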

e) Clean Architecture concepts

Understanding Clean Architecture concepts is essential to allow changes to the codebase. In a layered architecture, understanding the problems caused by tight coupling and weak cohesion improves reusability, minimizes the disruption caused by changes, and avoids rewriting code and capabilities.

Conclusion

As Architects and developers, it is essential to collect Green metrics regularly and evaluate the code for compliance and violations. Measuring these coding practices can be done using various static code analysis tools. The tools can be integrated into the IDE, into the code compilation or deployment layer, or run as standalone tools.

With organizations in several industries now focusing on individual sustainability goals, green coding practices have become an integral part of every software developer's job. Small tweaks to our development approach can contribute immensely to reducing environmental impact in the long run.

Saturday, February 3, 2024

The Coevolution of API Centric and Event-based Architecture

When evaluating communication between different systems, there is always a debate between an API-first approach and an event-first approach. In a distributed ecosystem, it’s not one or the other, but the combination of both strategies that can solve data transmission between two or more systems.

APIs are the de facto way of interacting for synchronous operations, where the caller sends a request and waits for the response before continuing. When designing systems with a specific responsibility, APIs shield the underlying systems from being accessed directly and expose only the reusable data, thus ensuring no duplication of data happens elsewhere. When using simple APIs, all that is needed is a readable API structure and systems that follow a request-and-response pattern. APIs are beneficial for real-time integrations where the requesting system needs information immediately.

However, designing and scaling APIs can also get intricate. In a high-transaction microservices architecture, throttling and caching APIs is not simple, as the APIs need to scale on demand. In such integrations, an API gateway also becomes necessary to keep the systems loosely coupled.

The below example depicts a reporting system that creates different reports based on the Customer, Order, and Catalog data. The source system exposes an API. The reporting system fetches the data via the API and sends the information to the underlying destination systems.

This architecture works well as long as the information from the source systems does not change. But if the order information has properties that keep getting updated, the Reporting system needs to ensure that the changed state is propagated to the downstream systems.

Handling Cascading Failures

In a chain of systems that interact using APIs, handling errors or failure can also become cumbersome. Similarly, if there are multiple dependent API calls between two systems, the cascading failures become complex. The complexity further increases when there is a need for systems to react based on dynamic state changes. This is where Event-based architecture can help address some of the issues.

The basis of Event-based strategy is asynchronous means of communication. There is an intermediate system that decouples the source and the destination service interfaces. This strategy is apt for applications that need near real-time communication and when scalability is a bottleneck.

With an Event-based architecture, all the source system has to do is adhere to a contract, and on any state changes, trigger a message to the intermediate broker system. One or more destination systems can subscribe to the broker system to receive messages on any state changes. Also, since the source system triggers an event, the scalability of the APIs is not an issue.
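The publish and subscribe relationship described above can be sketched with a tiny in-memory broker. This is an illustration only: the `Broker` class, the topic name `order.state_changed`, and the message fields are hypothetical, and delivery here is synchronous, whereas a real message broker delivers asynchronously and durably.

```python
from collections import defaultdict


class Broker:
    """In-memory stand-in for a message broker (for illustration only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self._subscribers[topic]:
            handler(message)


broker = Broker()
reporting_inbox, billing_inbox = [], []
broker.subscribe("order.state_changed", reporting_inbox.append)
broker.subscribe("order.state_changed", billing_inbox.append)

# The source system only publishes; it does not know who is listening.
broker.publish("order.state_changed", {"order_id": 42, "state": "SHIPPED"})
```

Adding a third destination system is just another `subscribe` call, with no change to the source system.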

Event First Architecture

With a pure Event-based architecture, the design can get complicated as the number of messages increases. Tracking whether each message has been processed becomes tricky. In this case, every order has to be tracked to its latest state, and error handling needs to be robust. The end-to-end process can also be slow, with high latency between the systems.

Another way of simplifying the architecture is by combining the API and event designs. The below diagram illustrates that the Reporting system interacts with the Order system using both APIs and events. The Order system sends the state change notification to the broker. The Reporting system reads the state change and then triggers an API call to fetch the updated Order information. The Reporting system makes API calls to the Catalog and Customer systems to fetch the static data. It can then publish the resulting messages to the event broker for the destination systems to consume.
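A minimal sketch of this combined flow, assuming the event carries only the order identifier and the full state is fetched through the API. `OrderApi`, `ReportingSystem`, and the field names are hypothetical stand-ins rather than real service interfaces.

```python
class OrderApi:
    """Hypothetical stand-in for the Order system's API."""

    def __init__(self, orders):
        self._orders = orders

    def get_order(self, order_id):
        return self._orders[order_id]


class ReportingSystem:
    def __init__(self, order_api):
        self.order_api = order_api
        self.reports = {}

    def on_order_changed(self, event):
        # The event is only a notification; the latest state comes from the API.
        order = self.order_api.get_order(event["order_id"])
        self.reports[order["id"]] = order["status"]


orders = {7: {"id": 7, "status": "PAID"}}
reporting = ReportingSystem(OrderApi(orders))

reporting.on_order_changed({"order_id": 7})
first_status = reporting.reports[7]

orders[7]["status"] = "SHIPPED"              # state changes in the Order system
reporting.on_order_changed({"order_id": 7})  # a new notification arrives
```

Keeping the event small and fetching the current state by API means a delayed or duplicated notification still results in the latest order data being reported.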

In conclusion, both APIs and events have their pros and cons, and each solves a specific problem. They are not a replacement for one another, and an architecture can be made less complex if they co-exist. In a modern microservices architecture, having both on hand can help ease the complexity of distributed system interactions.

Wednesday, November 29, 2023

Mastering the Art of Reading Thread Dumps

I have spent years trying to find a structured way to read thread dumps in production whenever there is an issue. I have often found myself on a wild goose chase, deciphering the cryptic language of thread dumps. These snapshots of thread activity within a running application hold a wealth of information, providing insights into performance bottlenecks, resource contention, and high memory or CPU usage.

In this article, I'll share tips and tricks based on my experience of reading production thread dumps across multiple projects, demystifying the process for my fellow engineers.

Tip 1: Understand the Thread States

Thread states, such as RUNNABLE, WAITING, or TIMED_WAITING, offer a quick glimpse into what a thread is currently doing. Mastering these states helps in identifying threads that might be causing performance issues. For instance, a thread stuck in a WAITING state can be a candidate for further investigation.
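A quick way to get this overview is to count the states in a dump. The sketch below assumes the HotSpot text format produced by tools such as `jstack`, where each thread has a `java.lang.Thread.State:` line; the sample dump and thread names are made up for illustration.

```python
import re
from collections import Counter

STATE_LINE = re.compile(r"java\.lang\.Thread\.State: (\w+)")


def count_thread_states(dump_text):
    """Count how many threads are in each state in a text thread dump."""
    return Counter(STATE_LINE.findall(dump_text))


SAMPLE_DUMP = '''
"http-nio-8080-exec-1" #25 daemon prio=5 os_prio=0 tid=0x00007f1c2c001000 nid=0x1a2b waiting on condition [0x00007f1c0d5fe000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)

"http-nio-8080-exec-2" #26 daemon prio=5 os_prio=0 tid=0x00007f1c2c002800 nid=0x1a2c runnable [0x00007f1c0d4fd000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)

"scheduler-1" #31 prio=5 os_prio=0 tid=0x00007f1c2c003000 nid=0x1a2d waiting on condition [0x00007f1c0d3fc000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
'''

state_counts = count_thread_states(SAMPLE_DUMP)
```

Comparing these counts across several dumps taken a few seconds apart shows whether threads are genuinely stuck or just passing through a state.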

Tip 2: Identify High CPU Threads

The threads consuming a significant amount of CPU time are often the culprits behind performance degradation. Look for the threads with the highest CPU time (for example, a "Top 5 Threads by CPU time" listing where your analysis tool provides one) and dig into their stack traces, which pinpoint the exact method or task responsible for the CPU spike.
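One widely used technique on Linux is to take the busy thread ID reported by the operating system (for example from `top -H`) and match it against the `nid` value in the dump. The sketch below assumes the classic HotSpot format in which `nid` is printed in hexadecimal; some newer JDK versions print it differently, so check your own dump first. The function name and sample lines are hypothetical.

```python
import re


def find_thread_by_os_id(dump_text, os_thread_id):
    """Return the name of the thread whose nid matches the OS thread ID."""
    nid = hex(os_thread_id)  # e.g. 6699 -> '0x1a2b'
    pattern = re.compile(
        r'^"([^"]+)".*\bnid=' + re.escape(nid) + r"\b", re.MULTILINE
    )
    match = pattern.search(dump_text)
    return match.group(1) if match else None


SAMPLE_HEADERS = '''
"http-nio-8080-exec-1" #25 daemon prio=5 os_prio=0 tid=0x00007f1c2c001000 nid=0x1a2b waiting on condition [0x00007f1c0d5fe000]
"http-nio-8080-exec-2" #26 daemon prio=5 os_prio=0 tid=0x00007f1c2c002800 nid=0x1a2c runnable [0x00007f1c0d4fd000]
'''

busy_thread = find_thread_by_os_id(SAMPLE_HEADERS, 6700)
```

Once the thread name is known, its stack trace in the same dump shows what the CPU was being spent on.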

Tip 3: Leverage Thread Grouping

Grouping threads by their purpose or functionality can simplify the analysis process. In complex applications, the sheer number of threads can be overwhelming, so collating them into groups is helpful. For example, group the threads related to database connections, HTTP requests, or background tasks together. This approach often provides a more coherent view of the application's concurrent activities.
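Thread names usually encode their pool, so a simple grouping is to strip the trailing worker number from each name. This is a sketch under that assumption; the naming pattern, the `group_threads` function, and the sample names are illustrative and will need adjusting to the naming used in a given application.

```python
import re
from collections import Counter

THREAD_HEADER = re.compile(r'^"([^"]+)"', re.MULTILINE)


def group_threads(dump_text):
    """Count threads per name prefix, ignoring the trailing worker number."""
    groups = Counter()
    for name in THREAD_HEADER.findall(dump_text):
        prefix = re.sub(r"[-_ #]*\d+$", "", name)
        groups[prefix or name] += 1
    return groups


SAMPLE_NAMES = '''
"http-nio-8080-exec-1" #25 daemon prio=5
"http-nio-8080-exec-2" #26 daemon prio=5
"scheduler-1" #31 prio=5
"main" #1 prio=5
'''

thread_groups = group_threads(SAMPLE_NAMES)
```

A group whose count keeps growing between dumps is also a useful hint when hunting for a thread leak.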

Tip 4: Pay Attention to Deadlocks

Deadlocks are the nightmares of multithreaded applications. Thread dumps provide clear indications of deadlock scenarios. Look for threads marked as "BLOCKED" and investigate their dependencies to identify the circular dependencies causing the deadlock.
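The circular dependency can be found mechanically by recording which thread holds each monitor and which monitor each thread is waiting for. The sketch below assumes the `- locked <0x...>` and `- waiting to lock <0x...>` lines of the HotSpot dump format; the sample dump, class names, and addresses are made up.

```python
import re

HEADER = re.compile(r'^"([^"]+)"')
LOCKED = re.compile(r"- locked <(0x[0-9a-f]+)>")
WAITING = re.compile(r"- waiting to lock <(0x[0-9a-f]+)>")


def find_deadlock(dump_text):
    """Return the thread names forming a lock cycle, or [] if none is found."""
    owner, wants = {}, {}
    current = None
    for line in dump_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        if current is None:
            continue
        locked = LOCKED.search(line)
        if locked:
            owner[locked.group(1)] = current
        waiting = WAITING.search(line)
        if waiting:
            wants[current] = waiting.group(1)

    # Follow "waits for the owner of" links until a thread repeats.
    for start in wants:
        path, thread = [], start
        while thread in wants and thread not in path:
            path.append(thread)
            thread = owner.get(wants[thread])
        if thread in path:
            return path[path.index(thread):]
    return []


SAMPLE_DEADLOCK = '''
"worker-A" #11 prio=5 os_prio=0 tid=0x00007f0a10001000 nid=0x2f01 waiting for monitor entry [0x00007f0a0c1fe000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at com.example.Transfer.run(Transfer.java:42)
        - waiting to lock <0x000000076ab62208> (a java.lang.Object)
        - locked <0x000000076ab621f8> (a java.lang.Object)

"worker-B" #12 prio=5 os_prio=0 tid=0x00007f0a10002800 nid=0x2f02 waiting for monitor entry [0x00007f0a0c0fd000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at com.example.Transfer.run(Transfer.java:42)
        - waiting to lock <0x000000076ab621f8> (a java.lang.Object)
        - locked <0x000000076ab62208> (a java.lang.Object)
'''

cycle = find_deadlock(SAMPLE_DEADLOCK)
```

A BLOCKED thread that is not part of any cycle is simply contending for a lock, which is a performance concern rather than a deadlock.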

Tip 5: Explore External Dependencies

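Modern applications often rely on external services or APIs. Threads waiting for responses from these external dependencies can significantly impact performance. Identify threads in WAITING or TIMED_WAITING states and trace their dependencies to external services. Threads blocked in a socket read are often reported as RUNNABLE, so also check for network I/O frames at the top of the stack traces.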

Tip 6: Utilize Profiling Tools

While thread dumps offer a snapshot of the application state, profiling tools like VisualVM (previously bundled with the JDK as jvisualvm) or YourKit provide a dynamic and interactive way to analyze thread behavior. These tools allow you to trace thread activities in real time, making it easier to pinpoint performance bottlenecks.

Tip 7: Contextualize with Application Logs

Thread dumps are more powerful when correlated with application logs. Integrate logging within critical sections of your code to capture additional context. This fusion of thread dump analysis and log inspection provides a holistic view of your application's behavior.

In conclusion, reading thread dumps is both an art and a science. It requires a keen eye, a deep understanding of the application's architecture, and the ability to connect the dots between threads and their activities. By mastering this skill, one can unravel the intricacies of their applications, ensuring optimal performance and a seamless user experience.

Saturday, November 25, 2023

The Plight of an Architect in an Agile Project

Agile methodology in software development has emerged as a guiding light, promising flexibility, collaboration, and adaptability. But organizations have mistaken it for a luxury cruise liner while treating it like the Black Pearl from Pirates of the Caribbean on the high seas of chaos.

Agile, with its sprints, stand-ups, and user stories, was supposed to be the antidote to the rigid and often cumbersome Waterfall methodology. However, in the real world, Agile is sometimes wielded like a double-edged sword – misused by developers and misunderstood by business leaders.

The Agile coaches are like the Pirate Captain: they are mainly responsible for steering the meetings, and they navigate the ship without an ounce of technical know-how. Picture the Agile stand-up meetings as meetings of the Brethren Court, which typically turn into recitations of individual developers' achievements. Each developer tries to resolve epics and makes up stories for their everyday chores, doing their best to please the captain.

Then there are the Product Owners who act as the Pirate Lords, holding the keys to the treasure chest of project priorities. These lords of prioritization often struggle to let go of the old ways of Waterfall, treating Agile like a mere parrot on their shoulder rather than a shipboard companion. They treat technical debt as the Dead Man's Chest, which is not supposed to be opened or seen.

Amidst all of them, the Architects are often left in the lurch as Agile teams treat their decisions as an afterthought. Their long-term vision gets lost in the relentless pursuit of project priorities and sprint goals. Good Architects are aware of the so-called mirage on the horizon, but often find themselves relegated to the back of the ship, like a passenger reduced to a mere spectator, watching their maps of successful navigation become damp and tattered in this unpredictable Agile storm.

Friday, April 21, 2023

Funnel-based Architecture for application Security on the Cloud - Part 1 - The Framework

As a Solution Architect, I've had a few opportunities to work with organizations facing security challenges on the cloud, especially with public-facing applications. One of the most common issues I've encountered is a lack of visibility and control over their cloud environments.

To solve these security issues, I've implemented a funnel-based framework for enhancing security on the cloud. This framework involves identifying the data flow within the cloud platform and implementing funnel points, which act as choke points at each layer for security controls. The last steps of the framework include increasing observability and continuous security improvements.

Below are the different steps:

Step 1: Identify the Data Flow within the Cloud Platform

The first step in implementing a funnel-based framework for security on the cloud is to identify the data flow within the platform. This means understanding the types of data processed by the platform and identifying the various stages in the data flow. It also includes getting to know every service or layer through which the data flows.

Step 2: Implement Funnel Points

Based on the data flow, the next step involves implementing funnel points throughout the platform. Funnel points are choke points in the data flow where security controls are added at each layer to protect from threats. These funnel points are part of the Network, Transport, and Application Layers. These funnel points in the system may include network gateways, data storage, web and application services, and other components.

Step 3: Implement Security Controls at Each Funnel Point

At each funnel point, security controls at each layer or service protect the cloud environment. These include access controls, encryption and decryption processes, network security controls, monitoring and logging mechanisms, vulnerability management, and incident response processes. Each security control is designed to address a specific threat or vulnerability, and the controls work together to provide comprehensive protection for the cloud environment.

Step 4: Regularly Monitor and Update the Security Controls

Once the security controls are implemented in each layer, it is critical to regularly monitor and update them to ensure they are working effectively. It involves monitoring the platform for suspicious activity, regularly reviewing access controls, updating software and security patches, and testing the security controls to identify any weaknesses or vulnerabilities.

Step 5: Continuously Improve the Framework

Finally, to continuously improve the funnel-based framework for security on the cloud, it is critical to stay ahead of emerging threats and vulnerabilities. It involves staying up-to-date on the latest security trends and best practices, regularly reviewing the security controls to identify areas for improvement, and working with clients to identify new threats and risks.

By following these steps, I was able to implement a comprehensive funnel-based framework for security on the cloud that provided good protection against a wide range of threats and vulnerabilities. I will deep dive into the Funnel based Architecture with examples in Part 2.

Funnel-based Architecture for Website Security on the Cloud - Part 2 - Using Microsoft Azure Services

In Part 1 of the article, I described the Funnel-based framework and various steps to improve web application security on the cloud. In this article, I will cite a real-world example of how I used the funnel-based framework and designed a Funnel-based architecture to filter and analyze malicious traffic for a web application.

The layered approach of Funnel-based Architecture is essential in providing multiple levels of security to web applications. By having multiple layers of security, each layer is responsible for detecting and blocking various attacks, making it more challenging for attackers to breach several layers at once. If an attacker bypasses one layer of defense, the other layers can still provide protection, making it harder for them to compromise the web application.

Below is an example of a multi-layered funnel that blocks malicious web requests, with each layer providing an increased level of security. The diagram illustrates:

a) The data or request flow from the browser, DNS, across edge layers, and all Azure services in the background.

b) All layered funnel points have independent layers to choke malicious traffic by IP filtering, geo-blocking, custom WAF rules, rate limiting, content caching, etc.

c) Security controls at each layer or funnel point, with access controls and restrictions enforced through user authentication, authorization, audit trails, data encryption at rest and in transit, and an Intrusion Detection and Prevention System.

d) Deep Monitoring and Alerting of each layer and creating custom automated ways to update infrastructure and WAF rules, log analysis, auto threat detections, triggering application protection via scaling, captchas, static sites, etc.

e) Finally, continuous improvement by providing regular security assessments and benchmarking, performing penetration testing, security awareness training, incident response planning, etc.
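The choke-point idea in (b) can be sketched as an ordered chain of checks, where a request has to pass every layer and the first failing layer is recorded. The layer functions, deny lists, and the limit of three requests per client are hypothetical values chosen for illustration; in the architecture described here these checks are performed by managed cloud services rather than application code.

```python
from collections import defaultdict

BLOCKED_IPS = {"203.0.113.7"}       # hypothetical deny list
BLOCKED_COUNTRIES = {"XX"}          # hypothetical geo-block
MAX_REQUESTS_PER_CLIENT = 3         # hypothetical rate limit per window


def ip_filter(request, counters):
    return request["ip"] not in BLOCKED_IPS


def geo_filter(request, counters):
    return request["country"] not in BLOCKED_COUNTRIES


def rate_limiter(request, counters):
    counters[request["ip"]] += 1
    return counters[request["ip"]] <= MAX_REQUESTS_PER_CLIENT


# Order matters: cheap, coarse checks run first, like the wide end of a funnel.
FUNNEL = [
    ("ip_filter", ip_filter),
    ("geo_filter", geo_filter),
    ("rate_limiter", rate_limiter),
]


def pass_through_funnel(request, counters):
    """Return the name of the layer that blocked the request, or None."""
    for name, layer in FUNNEL:
        if not layer(request, counters):
            return name
    return None


counters = defaultdict(int)
good = {"ip": "198.51.100.10", "country": "NL"}
outcomes = [pass_through_funnel(good, counters) for _ in range(4)]
```

Recording which layer blocked each request also feeds the monitoring described in (d), since it shows where in the funnel malicious traffic is being stopped.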

Here are some examples of security tools that we used to create a Funnel-based Architecture on Azure:

Azure Firewall: A network layer security tool that provides a managed, cloud-based firewall service to protect Azure virtual networks and resources from network-based threats.
Azure Front Door: A global, scalable, and secure entry point that provides routing, caching, and load balancing of web traffic at the application layer (layer 7).
Azure Application Gateway: A layer-7 load balancer that provides WAF and SSL termination capabilities to protect web applications from application-layer attacks.
Marketplace WAF: An Advanced WAF that provides robust in-house web application firewall protection by securing applications against layer 7 DDoS attacks, malicious bot traffic, all OWASP top 10 threats, and API protocol vulnerabilities.
Azure DDoS Protection: A layer 3/4 protection service that protects against DDoS attacks by automatically mitigating them in the Azure network before they reach the targeted resource.
Azure Key Vault: A cloud-based service that provides secure storage and management of cryptographic keys and secrets used by cloud applications and services.
Azure Sentinel: A cloud-native SIEM and SOAR solution that provides intelligent security analytics and threat intelligence across the enterprise.

Sunday, March 26, 2023

Role of a Solution Architect in Modern Organizations

This week I came across a discussion of how the market lacks good Solution Architects and how the role of a Solution Architect is often misused in the IT industry, especially with the advent of cloud certifications. With Agile organizations, the role of the Solution Architect also seems to be diminishing. This has also led organizations to create other Architecture titles for specific tasks.

Having played the role of Solution Architect for about a decade now, I feel the title typically requires years of experience designing and implementing complex software solutions in different projects and domains.

Solution Architect Mindset

It involves a deep understanding of Business requirements, Technical constraints, Architecture Design and Principles, and staying up-to-date on Emerging Technologies. Solution Architects must also possess strong communication and leadership skills to work effectively with Stakeholders, Developers, Business Architects, and other team members.

With the rise of cloud certifications, many individuals get the title of Solution Architect without the necessary experience and expertise. This is probably the main reason for the proliferation of Solution Architects who lack the skills required to design and implement complex solutions.

Also, Organizations do not clearly define the roles and responsibilities of a Solution Architect. The role has evolved in the past decade with emerging technologies and ways of working, but the fundamentals have always remained the same.

By definition, a Solution Architect's role is to design and oversee the implementation of software solutions that meet business requirements. To think like a modern Solution Architect, for any project, one has to work across the five areas below.

a) Understand the business requirements

b) Evaluate the requirements based on the abilities

c) Define the Architecture principles of the solution

d) Define the Architecture and design the overall solution

e) Participate in the development of the end-to-end solution

Conclusion

In conclusion, a successful Solution Architect mindset requires problem-solving ability, strategic thinking, clear communication for collaboration, and an urge for continuous learning.

Thursday, October 1, 2020

Building Composite Architectures

Gartner, in a recent report, highlighted “Composite Architecture” or “Composable Architecture” as one of the five emerging trends in innovation and technology for the next 10 years. Since then, I have been coming across this topic in various technical forums.

“Composability” as such is not a new topic, as we have frequently used composition in object-oriented programming to achieve polymorphism. In software architecture terms, it is defined as the combination of software systems to produce a new system. In other words, it is directly connected to the goals of agility and reusability, and its whole crux is responding to the changing business landscape.

Domain-Driven Design to build Composable Application

If we take a step back and look at how a simple application is created using domain-driven design with an onion architecture, the orchestration layer plays a pivotal role in making the application composable by interacting directly with the repository or service layers.

The orchestration layer as such can be a webhooks API, a data importer, an API controller, a messaging service, or a simple REST or SOAP request.

This kind of atomic structure, if done properly, results in a system that can change its external integrations seamlessly and adapt to the changing business landscape.
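As a small sketch of this layering, the example below puts two different orchestration entry points in front of the same service, which depends only on a repository interface. All names here (`OrderRepository`, `OrderService`, `handle_rest_request`, `handle_import_row`) are hypothetical and chosen for illustration.

```python
from typing import Protocol


class OrderRepository(Protocol):
    """Interface the service depends on; storage details stay outside the core."""

    def save(self, order: dict) -> None: ...


class InMemoryOrderRepository:
    def __init__(self):
        self.saved = []

    def save(self, order):
        self.saved.append(order)


class OrderService:
    """Core business logic, unaware of how requests arrive."""

    def __init__(self, repository: OrderRepository):
        self.repository = repository

    def place_order(self, sku, quantity):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        order = {"sku": sku, "quantity": quantity}
        self.repository.save(order)
        return order


# Orchestration layer: interchangeable entry points over the same service.
def handle_rest_request(service, payload):
    return service.place_order(payload["sku"], payload["quantity"])


def handle_import_row(service, row):
    sku, quantity = row.split(",")
    return service.place_order(sku, int(quantity))


repository = InMemoryOrderRepository()
service = OrderService(repository)
handle_rest_request(service, {"sku": "A-1", "quantity": 2})
handle_import_row(service, "B-2,3")
```

Swapping the in-memory repository for a database-backed one, or adding a messaging entry point, would not require changing `OrderService`, which is the plug-in and plug-out quality that makes the application composable.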

Atomic Architecture

If we take the earlier example and apply it in a larger context, the below visualization depicts a circular relationship between different layers in a typical business domain.

Here the applications are interconnected in an atomic way, making it easier to plug systems in and out of the organization's landscape. With the advent of native SaaS-based platforms, this type of “Composable architecture” is becoming more and more noticeable.

Elements of Composable Architecture

The basic building blocks of a composable system are still containerization, microservices, cloud, APIs, headless architecture, etc.

Conclusion

With a Composable mindset, organizations can uplift isolated business operating models and move towards a more practical loosely coupled technology landscape where systems can be plugged in and out flexibly.

This kind of model perfectly fits with organizations adopting agile ways of working or building modern omnichannel integrations with different types of native Cloud-based SaaS platforms.

This model can also be applied to bridge gaps across the entire ecosystem of legacy and modern applications including areas of a unified experience, operations, transformations, infrastructure, external and internal system integrations.

Tuesday, August 6, 2019

Key Architectural Considerations to Evaluate when moving applications to cloud - Part 2

• If an application migration requires a lot of integration or coordination between internal and external environments on top of the cloud services, that integration becomes a layer between the cloud provider and the in-house applications, which will then struggle to keep up with the rate of innovation in the cloud provider’s services. The cloud provides numerous services that are portable. Organizations should not build or acquire layers of insulation on top of the cloud provider's native features in pursuit of perceived portability.

• Modern cloud service providers can auto-scale to create resilient and highly available applications. They offer different solutions for storing and replicating data. If a legacy application is critical enough to require fault tolerance, it can be easier to manage once moved to the cloud.

• Cloud is a better fit if speed and agility are the primary business drivers of an organization. This requires applications to have continuous and direct access to the cloud provider's fast pace of innovation. Only by building directly upon provider-native features will the desired business agility and rate of improvement be achieved. Organizations that insist on easily porting applications across cloud providers do so at the cost of speed, agility, and innovation.

• Another area to consider is repeatability. Typical scheduled deployments of legacy applications require downtime along with human intervention to perform the same manual tasks repeatedly. Also, in case of disaster recovery or an outage, most of the tasks carried out are manual. Cloud services excel at executing the same tasks many times without failure. Most application recoveries and deployments are automatically managed and require little to no human intervention.

• Cloud services generally provide high flexibility and testability. Applications can be tuned to run on an as-needed basis. Test environments are good candidates to move to the cloud, especially for load or stress testing. Applications can be made available on the fly with different hardware configurations, operating systems, and regions, and can be scaled up or down as needed. This gets even easier with cloud providers excelling at containerized applications and providing seamless continuous integration and deployment.

• If high performance, monitoring, volatility, and high volume are the key requirements, then the application needs quick development and a high rate of innovation. Cloud vendors provide ready-made solutions to meet all such requirements. Performance benchmarks can be met with different solutions that address the key constraints of caching, sharding, archiving, and storage. Ready-made tools can be configured to meet the requirements of in-depth monitoring, logging, and analysis. Cloud providers have rich support for state-of-the-art agile development modes, including DevOps, containers, and microservices, and will be the first to have mature support for upcoming methods like serverless computing. Different pricing and tenancy models are also available to keep costs to a minimum.



Architecting Modern Applications - Shailendra Bhatt