Saturday, February 25, 2023

Building a Chatbot using WebSockets on Azure Services - Part 1

 Got an opportunity to build a chatbot application on Azure and it was a dilemma to use either HTTP polling or use the WebSocket way of interacting with the client. The moment WebSocket was discussed got a lot of caution from the organization architecture team regarding the complexity and the technical back draws. 

Integrating WebSockets for a chatbot using Microsoft Azure Services can greatly enhance the user experience by providing real-time, bidirectional communication between the chatbot and the user. Microsoft Azure provides a number of services that can be used to implement a WebSockets-based chatbot solution, including Azure SignalR Service, Azure Application Gateway, and Azure Functions.

One of the key advantages of using Azure SignalR Service is that it provides a fully managed service for adding real-time functionality to applications, making it easy to implement WebSockets in a chatbot solution. By using Azure Application Gateway, incoming traffic from users can be redirected to the chatbot's backend, and the WebSockets connection between the chatbot and the user can be secured using SSL offloading and authentication.

In addition to these services, Microsoft Azure also provides a number of tools for monitoring and troubleshooting, such as Azure Monitor and Azure Log Analytics, to ensure that the chatbot solution is performing optimally and that any issues are quickly identified and resolved.

To ensure a seamless user experience, it is important to plan for scalability from the start, so that the chatbot solution can handle increasing traffic and user numbers. This can be achieved using Azure's auto-scaling and load-balancing features.

Finally, thorough testing is critical to ensure that the chatbot implementation meets the requirements and expectations. This includes testing the WebSockets connection through the Azure Application Gateway, as well as any other components of the chatbot solution.

In conclusion, by using Microsoft Azure Services, it is possible to implement a highly performant and scalable WebSockets-based chatbot solution that provides a real-time, responsive experience for users.

Sunday, January 8, 2023

Tackling and Mitigating a Distributed Network attack

A Distributed Denial of Service (DDoS) attack on a public website can have severe consequences for businesses, including lost revenue, reputational damage, and even legal liability. DDoS attacks involve overwhelming a website with traffic from multiple sources, making it inaccessible to legitimate users.

There was a high-volume DDoS attack recently on one of the public sites that I am responsible for. I was in the midst of all the action around the clock and helped mitigate this issue. There was an initial glitch, but the site has been running stable for the customers, with a lot of work done behind the scenes.

We did have a finely tuned WAF layer, but this was a pure layer 7 volumetric attack from across the Globe and was purposely targeting the application layer. The attack was staggered with throughput in millions of requests per minute. Most well-known WAFs available in the market with Advanced DDoS protection and ML-based pattern detections have limitations at such high volumes.  

Some of the  best practices to prevent a DDoS attack includes

Implement DDoS protection measures
The first step in tackling a DDoS attack is to implement DDoS protection measures. These measures can include firewalls (WAF), load balancers, and intrusion prevention systems (IPS). Additionally, businesses can use cloud-based DDoS protection services, which can automatically detect and mitigate attacks. The protection needs to be done at different layers of OSI design. Also, most modern WAF products have managed rules along with advanced Bot protection that needs to be fine-tuned.  

Develop a response plan
Organizations should have a plan in place for how to respond to a DDoS attack. This kind of attack can happen at any time and a proper plan should include a response team, communication protocols, and steps to address the attack. Additionally, another key element is that teams should regularly test their plan to ensure it is effective. 

Monitor network traffic
Monitoring network traffic is critical in identifying a DDoS attack on a public website. This can be achieved by using network traffic analysis tools that can identify spikes in traffic and alert the security team in real time. Capturing logs along with ready-made queries can help identify and monitor malicious traffic. 

Block malicious traffic
One of the most effective ways to mitigate a DDoS attack is to block malicious traffic. This can be achieved by using access control lists (ACLs) and firewalls to block traffic from known malicious IP addresses or Geo Locations. Additionally, teams can use rate limiting, geo-blocks, automatic pattern detection, and URL blocks to limit the amount of traffic coming from specific sources.

Use Content Delivery Networks (CDNs)
CDNs can help to mitigate the impact of a DDoS attack by distributing traffic across multiple servers. This can help to absorb the attack and keep the website online. Additionally, CDNs can offer DDoS protection services that can automatically detect and mitigate attacks.

Tooling and Testing
Having appropriate alerts and notifications along with pattern detection can help identify malicious traffic from time to time. Performing a shadow DDoS test along with timely load testing the infrastructure can help to size the application appropriately. 

Educate users
Finally, teams should educate their users about the risks of DDoS attacks and how to prevent them. This can be achieved through regular training and awareness campaigns. Additionally, businesses can encourage users to report any suspicious activity or traffic they observe.

In conclusion, a DDoS attack on a public website can have significant consequences for businesses. There is no one solution to prevent an attack and the mitigation plan varies from application to application. But the key here is to understand the events of malicious traffic and address specific attack vectors. Also, having a WAF and continuous tuning of the WAF is required to maximize app protection without causing false positives. 

Sunday, January 1, 2023

Distributed Denial of service attacks on different OSI layers - Part 3

A Distributed Denial of Service (DDoS) attack is a malicious attempt to make a network or website unavailable to its users by overwhelming it with traffic from multiple sources. These attacks can occur at different layers of the Open Systems Interconnection (OSI) model, and each layer presents different challenges for mitigation. In this article, we will discuss DDoS attacks on different OSI layers with examples.

Layer 3 (Network Layer)

DDoS attacks at the network layer target the routing of IP packets. These attacks aim to consume network bandwidth, making the targeted service unavailable to legitimate users. An example of a network layer DDoS attack is the Ping of Death attack, where an attacker sends oversized ping packets to a target, causing the system to crash or become unavailable.

Mitigation strategies for network layer DDoS attacks include implementing access control lists (ACLs) to filter out unwanted traffic and deploying routers with built-in DDoS protection features.

Layer 4 (Transport Layer)

DDoS attacks at the transport layer target the transport protocol, such as Transmission Control Protocol (TCP) or User Datagram Protocol (UDP). These attacks aim to consume server resources, making the targeted service unavailable to legitimate users. An example of a transport layer DDoS attack is the SYN flood attack, where an attacker sends a flood of TCP SYN requests to a server, consuming server resources and causing the service to become unavailable.

Mitigation strategies for transport layer DDoS attacks include implementing rate limiting and implementing SYN cookies to prevent SYN flood attacks.

Layer 7 (Application Layer)

DDoS attacks at the application layer target the application protocol, such as HTTP or HTTPS. These attacks aim to consume server resources, making the targeted service unavailable to legitimate users. An example of an application layer DDoS attack is the HTTP Flood attack, where an attacker sends a large number of HTTP requests to a server, consuming server resources and causing the service to become unavailable.

Mitigation strategies for application layer DDoS attacks include implementing web application firewalls (WAFs) to filter out unwanted traffic, implementing rate limiting, and using CDN services to distribute the load across multiple servers.

In conclusion, DDoS attacks can occur at different layers of the OSI model, and each layer presents unique challenges for mitigation. Mitigation strategies for DDoS attacks include implementing ACLs, deploying routers with built-in DDoS protection features, implementing rate limiting, using SYN cookies, implementing WAFs, and using CDN services. By being prepared and implementing these strategies, businesses can mitigate the risks of DDoS attacks and ensure their systems remain available to legitimate users.




Sunday, December 11, 2022

Fine Tuning a WAF to avoid False Positives - Part 2

 This week has been an action-packed week with some high-volume DDoS attacks on one of the web applications. We have been spending a lot of time understanding the importance of having a WAF for all our client-facing public domains. In today's Cloud architecture Web Application Firewalls (WAFs) is a crucial part of any organization's security posture. They protect web applications from DoS, DDoS, and attacks, such as SQL injection, cross-site scripting (XSS), and other malicious activities. However, WAFs need to be fine-tuned regularly to ensure they provide maximum protection without causing false positives. In this article, we will discuss some best practices we followed to fine-tune a WAF and prevent multiple attacks on our application.

1.  The first step in fine-tuning a WAF is to understand the web application it is protecting. This includes identifying the application's components, such as the web server, application server, and database. Additionally, it is essential to identify the web application's behavior, including the type of traffic it receives, the HTTP methods it uses, and the expected user behavior. Understanding the web application will help to identify which rules should be enabled or disabled in the WAF.

2. Configure WAF logging WAF logging is a critical component of fine-tuning. It allows security teams to analyze WAF events and understand which rules generate false positives. WAF logs should be enabled for all rules, and log data should be retained for an extended period, such as 90 days or more.

3. Start with a default configuration WAFs come with a default configuration that provides a good starting point for fine-tuning. Start with the default configuration and enable or disable rules as necessary. Additionally, some WAFs have pre-built templates for specific applications, such as WordPress or Drupal. These templates can be an excellent starting point for fine-tuning.

4. Test the WAF Once the WAF is configured, it is essential to test it thoroughly. The WAF should be tested with a variety of traffic, including legitimate traffic and malicious traffic. This will help identify any false positives or negatives generated by the WAF.

5. Tune the WAF Based on the results of testing, the WAF should be fine-tuned. This may include enabling or disabling rules, adjusting rule thresholds, or creating custom rules to address specific attack vectors. Additionally, WAFs may have machine learning or AI capabilities that can help to reduce false positives.

6. Monitor the WAF After fine-tuning, the WAF should be monitored regularly to ensure it is providing maximum protection without causing false positives. WAF logs should be analyzed regularly, and any anomalies should be investigated immediately.

In conclusion, fine-tuning a WAF is a critical component of any organization's security posture. It requires a thorough understanding of the web application, careful configuration, and extensive testing. Additionally, WAFs should be regularly monitored and fine-tuned to ensure they provide maximum protection without generating false positives. By following these best practices, organizations can ensure their WAFs provide maximum protection against web application attacks.


Thursday, December 8, 2022

Demystifying the hidden costs after moving to the Cloud

The web application at a client was hosted using a combination of services on Azure. The architecture was quite simple and used the following services. Front Door, Api Manager, App Service, SQL Database, Service Bus, Redis Cache, and Azure Functions. As the application matured, little did we think of all the hidden costs of the cloud at the start of the project.

Azure Front Door used for efficient load balancing, WAF, Content Delivery Network, and as a DNS. However, the global routing of requests through Microsoft's network incurred data transfer and routing costs. What started as a seamless solution for enhanced user experience turned into a realization that global accessibility came at a price. Also, the complexity of configuring backend pools, health probes, and routing rules can lead to unintended expenses if not optimized.

App Services had a modest cost to begin with on low-scale Premium servers. But as the application garnered a lot of hits, so did the number of users and, subsequently, the resources consumed. The need for auto-scaling to handle increased traffic and custom domains brought unforeseen expenses, turning the initially reasonable hosting costs into a growing concern. So, keep an eye on the server configuration and the frequency of scaling events.

Azure SQL Database brought both power and complexity. Scaling to meet performance demands led to increased DTU consumption and storage requirements. The once manageable monthly expenses now reflected the intricate dance between database size, transaction units, and backup storage. Not scaling down the backups also incurred costs, especially for databases with high transaction rates. Inefficient queries and suboptimal indexing can increase resource consumption, impacting DTU usage and costs.

Azure Service Bus, the messenger between the application's distributed components, began with reasonable costs for message ingress and egress. Yet, as the communication patterns grew, the charges for additional features like transactions and dead-lettering added expenses to the budget. Also, long message TTLs can lead to increased storage costs. 

Azure Cache for Redis, used for in-memory data storage, initially provided high-performance benefits. However, as the application scaled, the usage to accommodate larger datasets, the costs associated with caching capacity, and data transfer began to rise, challenging the notion that performance came without a price. Eviction of data from the cache, may result in increased data transfer costs, especially if the cache is frequently repopulated from the data source. Also, fine-tuning cache expiration policies is crucial to avoid unnecessary storage costs for stale or rarely accessed data.

Lastly, the Azure Functions, with its pay-as-you-go model, was supposed to be the least cost of all services as it allowed to invoke functions as needed. But, the cumulative charges for execution, execution time, and additional resources reminded me that serverless, too, had its hidden cost. Including unnecessary dependencies in your function can inflate execution times and costs.

Demystifying the expenses after moving to Azure required a keen understanding of its pricing models and a strategic approach to balancing innovation with fiscal responsibility.

Sunday, November 27, 2022

Choosing the right WAF for your Enterprise wide Applications - Part 1

 This is a multi-part series on how to protect a web application using a WAF. To start with this part explains how to choose the right WAF for an Enterprise-wide web application. 

Web Application Firewalls (WAFs) are a crucial part of any organization's security infrastructure, protecting their web applications from cyber threats. With so many WAFs available in the market, choosing the best one can be a daunting task. I have been reading Gartner reports, along with performing POCs and trying to choose a tool that best suits the client, Below are the different criteria to consider when choosing the best WAF for your organization.

Security Features

When choosing a WAF, the first and most crucial criterion is its security features. The WAF should have strong protection against various cyber threats, including  DDoS, SQL injection, cross-site scripting (XSS), and other common OWASP web application vulnerabilities. Additionally, the WAF should offer threat intelligence services that provide continuous updates on the latest security threats and attack patterns.

Customization and Configuration

The WAF should be easily customizable and configurable to suit an organization's specific security needs. It should allow for custom rule creation, custom signature creation, and other customization options that allow you to fine-tune the WAF's security policies according to an organization's requirements. The ability to perform extensive rate-limiting or geo-blocking features is some of the common requirements of a WAF.

Performance and Scalability

The WAF should offer excellent performance and scalability, especially for high-traffic websites or applications. It should be able to handle a large number of concurrent connections without compromising performance or introducing latency. Additionally, the WAF should be scalable, allowing an organization to expand and grow without requiring a complete WAF overhaul. In simple words, it should not be a single point of failure. 

Integration with Existing Security Infrastructure

The WAF should be easy to integrate with the organization's existing security infrastructure, including firewalls, intrusion detection and prevention systems (IDPS), and Security Information and Event Management (SIEM) systems. This integration should also allow for seamless communication and collaboration between the different security systems, providing a holistic approach to security.

Compliance and Regulations

The WAF should comply with various regulatory standards, such as the Payment Card Industry Data Security Standard (PCI DSS) or the General Data Protection Regulation (GDPR). Additionally, the WAF should be auditable, providing detailed logs and reports allowing compliance verification and audit trails.

Ease of Use and Management

The WAF should be easy to use and manage, with a user-friendly interface that allows security administrators to monitor and manage the WAF effectively. Additionally, the WAF should offer automation and orchestration capabilities, allowing for seamless deployment and management of the WAF across different environments.

In conclusion, choosing the best WAF for an organization requires careful consideration of various criteria, including security features, customization and configuration, performance, and scalability, integration with existing security infrastructure, compliance and regulations, and ease of use and management. Selecting the right WAF that meets an organization's specific security needs can protect web applications from various cyber threats and ensure your organization's continued success.


Wednesday, August 3, 2022

Instilling the idea of Sustainability into Development Teams

Inculcating Green coding practices and patterns with the development team is a modern-day challenge. It can go a long way to reducing the carbon footprints and long-term sustainable goals of an organization. 

Good Green coding practices improve the quality of the software application and directly impact the energy efficiency on which the software applications are running. However, the software developers of today's agile work environment seldom focus away from rapid solution building in reduced sprint cycles. They have all the modern frameworks and libraries at their behest, and writing energy-efficient code is not always the focus. Furthermore, modern data centers and cloud infrastructure provide developers with unlimited resources resulting in high energy consumption and impacting the environment. 

Below are some of the factors that improve the programming practices and can show a drastic impact on the Green Index 

a) Fundamental Programming Practices

Some of the fundamental programming practices start with proper Error and Exception handling. It also includes paying extra attention to the modularity and structure of the code and being prepared for unexpected deviation and behavior, especially when integrating with a different component or system.

b) Efficient Code Development

Efficient code development helps to make the code more readable and maintainable. Efficient code writing includes avoiding memory leaks, high CPU cycles, and managing network and Infrasturc Storage in a proficient manner. It also includes avoiding expensive calls and unnecessary loops, and eliminating unessential operations. 

c) Secured Programming Mindset

A secured programming mindset ensures that the software application has no weak security features or vulnerabilities. Secured programming also includes protecting data, data encoding, and encryption. OWASP vulnerability list awareness and performing timely Penetration testing assessments ensure the application code is compliant with the required level of security.

d) Avoidance of Complexity

A complex code is the least modified and the hardest to follow. A piece of code developed may in the future be modified by several different developers, and hence avoiding complexity when writing code can go a long way to keep the code maintainable. Reducing the cyclomatic complexity of the methods by dividing the code and logic into smaller reusable components helps the code to remain simple and easy to understand. 

e) Clean Architecture concepts

Understanding Clean Architecture concepts is essential to allow changes to the codebases. In a layered architecture, understanding the concerns of tight coupling and weak cohesion helps in reusability, minimal disruption to changes, and avoids rewriting code and capabilities. 

Conclusion

As Architects and developers, it is essential to collect Green metrics on a timely basis and evaluate the compliance and violations of the code. Measuring these coding practices can be done using various static code analysis tools. The tools can further be integrated into the IDE, at the code compilation or deployment layer, or even as a standalone tool. 

With organizations in several industries now focusing on individual sustainability goals, green coding practices have become an integral part of every software developer. The little tweaks to our development approach can immensely contribute to the environmental impact in the long run.

Monday, July 18, 2022

Using Well Architect Framework to Address Technical Debt - Part 1

 Since getting my well-architected framework proficiency certification a year back, I have become a massive fan of the framework and have used it extensively at work. The Well Architected Framework is a tool with a set of standards and questionnaires that illustrates design patterns, key concepts, design principles, and best practices for designing, architecting, and running workloads in the cloud.

All major cloud providers like AWS, Azure, Google, and Oracle have defined the framework foundation, and they continue to evolve them with their platforms and services. 

Organizations that have moved to the cloud have a different set of challenges. As all workloads are running in the cloud, the typical requirement from businesses is for more agility and focus on shipping functionalities to production. Teams are very less invested in improving the technical debts. This leads to more reactive rather than proactively continuous improvements and a huge pile load of epics to resolve.

The well-architected framework (WAF) suits really well for teams that are unaware of where to start with the technical debt in terms of priority. The fundamental pillars of the WAF are  

a) System design b) Operational Excellence c) Security d) Reliability e) Performance f) Cost optimization and the newly added pillar g) Sustainability.



The framework can be fine-tuned to fit custom requirements based on the application domain. The framework is also apt to address typical Cloud challenges like the high cost of cloud subscriptions, Application Performance tuning, Cloud security, Operation Challenges in a Cloud or Hybrid setup, Quick recoveries from failure, and improvement on organizations' Green Index.

A dashboard helps to view the technical debts once the questionnaire is updated based on the WAF pillars. The below diagram illustrates the WAF dashboard heatmap and the technical debt based on prioritization and impact. The dashboard stresses the needed improvement and helps to measure the changes implemented by comparing them to all the possible best practices. 



 Performing these reviews on a timely basis helps the team to identify unknown risks and mitigate the problem very early. The WAF reviews fit well with the Agile ways of working and the principle of Continuous improvement. 

Below are the links to Well-Architected Frameworks described by different cloud vendors.





Monday, June 13, 2022

AWS Migration and Modernization Gameday Experience

 I was at the AWS gameday and it was a very fun and learning experience partnering with fellow colleagues and competitors. AWS has kind of created this concept very differently than a typical hackathon and it is more like a gamified activity in a much more stress-free environment.

For the Migration and Modernization Gameday, it was a 6-hour activity with an hour's break for lunch (most of them had it by their desk). We were asked to migrate a 2-tier e-commerce application to AWS in a specific region with all AWS services at our behest. This specific gameday was a level-300 and required at least an associate certification, but I felt even non-experts with some AWS hands-on knowledge can also contribute immensely to the team.

The first part of the day went into setting up the basic infrastructure on new VPCs and following certain guidelines to migrate databases (using DMS) and web servers (using App Migration service). We followed the AWS documentation for the migration part.

The fun part of the gameday was in the latter half of the session post-lunch when the basic migration was completed and we had to switch the DNS from the on-premise to the cloud infrastructure. That’s when the application is throttled with real-world traffic, volumetric attacks, fault injections, etc. The better your application performed the more points you got and vice versa.


Here are some of the learnings for folks wanting to participate in the next Gameday.

a) Be thorough with the networking concept in AWS. Outline your end-state network architecture view and naming conventions to begin with. As you will be on the console this will help avoid confusion.

b) Plan all the AWS services that would be the right fit for the requirements. Since it's a real-world environment scenario, extra points are awarded to teams that include all different AWS services.

E.g. Cloudfront as CDN, AWS WAF for firewall, AWS Guard Duty for threat detection, AWS Cloudwatch for monitoring, etc.

c)Ensure that the architecture follows the well-architected framework pillars.

Operation Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, Sustainability

d) Segregate tasks between team members and ensure to review each other's changes, especially the networking part which is typically confusing. 

e) Last but not least, if stuck in any step for long, try to get the application running first and then try implementing the right principles.


Monday, April 4, 2022

AWS managed Blockchain Blog

I have been part of an interesting case study on AWS-managed blockchain. Glad to be part of authoring the new AWS blog post on AWS Managed Blockchain.

--> https://aws.amazon.com/blogs/apn/capgemini-simplifies-the-letter-of-credit-process-with-amazon-managed-blockchain/

Sunday, March 20, 2022

The Sustainable Enterprise - Why cloud is key to business sustainability

I have been writing several articles on this topic and am pleased to contribute to this newly released white paper on the topic of "How Enterprises can achieve sustainable IT via the cloud, teaming up with Microsoft. Nice to share an Architects view and work with some of the market-leading experts on this topic.

Download the white paper here: 

 https://www.capgemini.com/se-en/resources/the-sustainable-enterprise-why-cloud-is-key-to-business-sustainability/ 

Friday, February 4, 2022

Harnessing Green Cloud computing to achieve Sustainable IT evolution

A few months back, I had written an article about Sustainability explaining what it is all about when it comes to software development. Since then I have come across this topic in several forums, including discussions with multiple client organizations that have pledged to quantify and improve on this subject. 


Organizations that move their applications towards cloud services tremendously improve their IT environmental impacts and goals of being sustainable. They are several factors that an enterprise has to consider beyond just selecting a cloud provider to be considered environmentally sustainable.


Focus on the following 6 areas can help organizations kick start their Green IT revolution on Cloud.




a) Cost Aware Architecture thinking

In applications built on cloud infrastructure, there are several moving parts with innumerable services. Organizations who have moved to the cloud often find it very difficult to be cost-aware, ensuring optimal usage of these services.

 They are so engrossed in building their core business applications that they don’t invest in cost-aware architecture teams that focus on optimizing the spending by eliminating unprovisioned infrastructure, resizing or terminating underutilized and using lifecycle management. Practices like energy audits, alerts and IT cloud analysis helps to identify costs and identify systems that need to be greened.


Cloud provides services like Azure Advisor and AWS Trusted Advisor helps to optimize and reduce overall cloud expenditure by recommending solutions to improve cost-effectiveness. Services like Azure Cost management and Billing, AWS Cost Explorer, and AWS Budgets can be used to analyze, understand, calculate, monitor, and forecast costs. 

b) Sustainable development

 Building applications using modern technologies and cloud services help optimize development code and ensures faster deployments. It also enables in reduction of redundant storage and end-users energy levels. 

Sustainable development on the cloud has many parts. It involves an end-to-end view of how the data traverses wholistically.  Improving load times by optimizing caching strategies reduces the data size, data transfer quantity, and bandwidth. With new innovative edge service solutions and by serving the content from the appropriate systems, energy-efficient applications can be built reducing the distance at which the data travels. 

c) Agile Architecture

One of the core Agile principles is to promote sustainable development and improve ways of working by making the development teams deliver at a consistent pace.

Cloud services provide tools like Azure and AWS DevOps, which is commonplace for development teams to organize, plan, collaborate on code development, build and deploy applications. It allows organizations to create and improve products faster than traditional software development approaches.

d) Increase Observability

There is a direct correlation between an organization's Observability maturity and Sustainability. In Observability, the focus is to cultivate ways of working within development teams to have a holistic data-driven mindset when solving system issues. The concept of Observability is becoming more and more prominent with the emergence and improvement of AI and ML-based services.

Service to improve automation diagnostics, automatic infra healing, and the advent of myriads of services used for deep code and infra drills, real-time analysis, debugging and profiling, alerts and notifications, logging and tracing, etc indirectly helps in organizations return of investment, increasing productivity

e) Consumption-based Utilization

Rightly sized applications, enhanced deployment strategies, automated backup plans, and designing systems using Cloud's well-architected frameworks result in utilizing the underlying hardware and its energy efficiency. It also serves the organization's long-term goals of reducing consumption and power usages, improving network efficiencies, and securing systems. Utilizing the right cloud computing service also helps the applications to Scale Up or Out appropriately. 

Using cloud-provided Carbon tracking calculators helps gauge systems or applications that require better optimization in terms of performance or better infrastructure. 

Conclusion

With AWS introducing Sustainability as the 6th pillar, green cloud computing has become one of the interesting topics for all organizations across different domains. While we all have come across tons of articles predicting how to save the world from various natural catastrophes and climate changes when it comes to software development on the Cloud, it's the foundational changes that one can start with to bring about the transformation

Monday, November 15, 2021

The fundamental principles for using microservices for modernization

The last few years I have spent a lot of time building new application on microservices and also moving parts of monolith to microservices. I I have researched and tried sharing my practical experience in several articles on this topic.

This week my second blog on some foundational principles of microservices published on Capgemini website.

https://www.capgemini.com/se-en/2021/11/the-fundamental-principles-for-using-microservices-for-modernization/

Wednesday, October 20, 2021

How to manage the move to microservices in a mature way

The last few years I have spent a lot of time building new application on microservices and also moving parts of monolith to microservices. I I have researched and tried sharing my practical experience in several articles on this topic.

This week my very first blog on this topic is published on Capgemini website.

 https://www.capgemini.com/se-en/2021/10/how-to-manage-the-move-to-microservices-in-a-mature-way/

Friday, September 3, 2021

The advent of Observability Driven Development

A distributed application landscape with high cardinality makes it difficult for dedicated operation teams to monitor system behavior via a dashboard or react abruptly to system alerts and notifications. In a microservices architecture with several moving parts, detecting failures becomes cumbersome, and developers end up looking at errors like finding a needle in a haystack.

What is Observability?

Observability is more than a quality attribute and one level above monitoring, where the focus applies more to cultivating ways of working within development teams to have a holistic data-driven mindset when it comes to solving system issues.












An observability thought process enables development teams to embed the monitoring aspect right at the nascent stage of development and testing.

Observability in a DevSecOps ecosystem

Several Organizations are adopting a DevSecOps culture, and it has become essential for development teams to become self-reliant and have a proactive approach to identify, heal and prevent systems faults. DevOps focuses on giving the development teams ability to make rapid decisions and more control to access infrastructure assets. Observability enhances this by empowering development teams to be more instinctive when it comes to defining system faults.










Furthermore, the modern ways of working with Agile, Test Driven Development, and Automation enable development teams to get deep insights into operations that can potentially be prone to failures.

Observability on Cloud platforms

Applications deployed on Cloud provide the development teams with several out-of-box myriads of system measurements. Developers can gauge and derive quality attributes of a system even before a code goes into production. Cloud services make it easy to collate information like metrics, diagnostics, logs, and traces for analysis, and they are available at the developer’s behest. AI-based automated diagnostics along with real-time data give developers deep acumen into their System Semantics and characteristics.

Conclusion

Observability is more of an open-ended process of inculcating modern development principles to increase the reliability of complex distributed systems. The benefits of the Observability mindset helps organizations resolve production issues speedily, reduces dependency and cost on manual operations. It also benefits development teams to build dependable systems helping end customers with a seamless user experience.

Thursday, July 15, 2021

The Coevolution of API Centric and Event-based Architecture

When evaluating communication between different systems, there is always an argument of choosing between an API-first approach and an event-first approach. In a distributed ecosystem, it’s not one or the other, but the combination of both these strategies that can solve data transmission between one or more systems.

API’s are the de facto way of interacting for synchronous operations. That means performing tasks one at a time in sequential order. When designing systems with a specific responsibility, APIs shield the underlying systems from being accessed directly and expose only the reusable data, thus ensuring no duplication of data happens elsewhere. When using simple API’s all that is needed is a readable API structure and systems that follow a request and response pattern. API’s are beneficial in the case of a real-time integration where the requesting system needs information abruptly.

However, designing and Scaling APIs can also get intricate. In high transactions microservices architecture, throttling and caching of APIs are not simple as APIs need to scale on-demand. Also, in such integrations, API gateway becomes necessary to make the systems loosely coupled.

The below example depicts a reporting system that creates different reports based on the Customer, Order, and Catalog data. The source system exposes an API. The reporting system fetches the data via the API and sends the information to the underlying destination systems.

API First Architecture

This architecture looks fine if there are no changes to the Information from the source systems. But, if the order information has properties that keep getting updated, then the Reporting system needs to have the capability of ensuring that the changed state gets updated in subsequent systems.


Handling Cascading Failures

In a chain of systems that interact using APIs, handling errors or failure can also become cumbersome. Similarly, if there are multiple dependent API calls between two systems, the cascading failures become complex. The complexity further increases when there is a need for systems to react based on dynamic state changes. This is where Event-based architecture can help address some of the issues.

The basis of Event-based strategy is asynchronous means of communication. There is an intermediate system that decouples the source and the destination service interfaces. This strategy is apt for applications that need near real-time communication and when scalability is a bottleneck.




With an Event-based architecture, all the source system has to do is adhere to a contract, and on any state changes, trigger a message to the intermediate broker system. One or more destination systems can subscribe to the broker system to receive messages on any state changes. Also, since the source system triggers an event, the scalability of the APIs is not an issue.

Event First Architecture


With a pure Event-based architecture with an increase in the number of messages, the architecture can get complicated. Tracking the statuses of a message if they are processed or not becomes tricky. In this case, every order tracking needs to happen for the latest state, and error handling needs to be robust. Also, this entire process is slow and there is a huge latency between the end-to-end systems.

Another way of simplifying the architecture is by combining API and the event design. The below diagram illustrates that the Reporting system interacts with the Order system using both API and events. The Order system sends the state change notification to the broken. The Reporting system reads the state change and then triggers an API call to update the Order information. The reporting system makes API calls to the Catalog and Customer systems to fetch the static data. It can further push the created destination messages to consume using the event broker.




In conclusion, both API and events have their pros and cons and solve a specific problem. They are not a replacement for one another and architecture can be made less complex if they co-exist. In a modern micro-services architecture to have both of them handy can help ease distributed system interaction complexities.

Monday, July 5, 2021

Driving Digital Transformation using Sustainable Software Development


The term Digital Transformation in the last decade or so has become a well-known strategy in various organizations. Businesses across every domain are reviving their traditional businesses to adapt to a more modern digital marketplace.

But in the last few years, sustainable development has become one of the essential mainstays for a successful digital transformation journey. The Covid pandemic has also pushed organizations across different domains to rethink and emphasize environmental factors, climate changes, and human well-being to lure consumers.

Embracing a cloud-first model is one of the critical constituents in digital transformation and sustainable journeys. More and more organizations are speeding up their Cloud computing journeys and investing in modern SaaS/PaaS services, thus reducing environmental impacts and eliminating major infrastructure expenses. Organizations need to be wary and invest wisely in sustainable software-building methodologies for successful software implementations and cloud migrations seamlessly.

Organizations that strive to be data-driven have a better ability to monitor operations and analyze system behaviors accurately. The real-time analysis of information results in better usage of devices and improves the defined sustainable characteristics. Companies that invest in AI/ ML can have a very substantial benefit to sustainability. The science of reliable predictability in the digital realm can bridge gaps in system information interchange, zero wastages, improve storage and distribution mechanisms,  eco-friendly products, free delivery methods, reusable infrastructure, etc. All of these can directly help in subduing environmental consequences. 

In conclusion, the principles of building next-generation Digital software and Sustainable development go hand in hand.  In the modern agile world, both of these journeys have a common goal of not jeopardizing the capability of future needs. These can be applied to systems as much as they can be related to human well-being. Adaptable working methods of Extreme Programming, Agile, Lean, Kanban help teams to strive for rapidly focused executions. These ways of organization working improve distributed system communications, their collaborations, their usages, and velocity. All of these indirectly result in contributing to energy-efficient software development.

Sunday, May 23, 2021

Tips preparing for Professional AWS Solution Architect Exam

I recently cleared my AWS Solutions Architect Professional Exam with a total of 948/1000 and thoroughly enjoyed preparing for the exam. I spent a total of 6 months of preparation. This is in spite of the fact that I got 1000/1000 in the Associate Architect exam last year. 

The exam as such is really tough. It not only evaluates one's knowledge and experience on AWS, but one also has to strategize for reading lengthy questions, time each question, and also be prepared to sit continuously for 190 minutes to finish the exam. 

Below are some of the learnings and tips that I can share so that one can make good use of and benefit from studying for the exam. Preparation of the exam can be divided into basically 3 phases 

Phase 1 Preparation

To start with, the exam requires considerable experience on the platform, I would say at least 2 years of hands-on experience on core AWS services. I would definitely recommend passing the Associate exam as the Professional one is way too tough. 

a) Plan for taking a course and stick to the same. Select a course with a good rating on popular training sites like Udemy/Coursera or Udacity. Try out different courses for few days and choose a course where you are comfortable with the language and flow of the course. The basic content of all the highly rated courses is more or less the same. Also, choose a course that has practical samples on the topics that one is not comfortable with or has not worked on. 

b) Plan a date and book the exam date. Choose somewhere between 2 to 3 months. AWS allows you to change the date twice for a booked exam. 

c) Create a personal AWS account to practice as the exam covers way too many services which one may not have implemented in day-to-day professional work. 

d) The exam is not theoretical and requires vast experience in the services. There are several real-world scenarios based questions and there are multiple ways to solve a specific problem. Read through a lot of use cases from different organizations especially the ones from the latest AWS re: Invent.

Phase 2 Preparation

In this phase, get deeper into the course and practice the below points in structuring and helping to know the services better. 

a) AWS adds new services very frequently and one has to be well versed with each and every service that is present especially the new ones. AWS updates all the latest services in the below white paper. 

AWS overview - https://d1.awsstatic.com/whitepapers/aws-overview.pdf 



b) Each of the areas has several services that can perform the same task. Try to analyze which services are the best fit when considering Non-Functional requirements of Cost Optimization, Scalability, Performance, Duration, Automation, Scalability, Availability, Reliability, Security.

For example, S3 buckets are the most durable and cost optimization in terms of storage. But when it comes to performance EBS/EFS is better. Another example is when it comes to databases DynamoDB gives near real-time performance, but has limited data support. Aurora on the other hand is the most scalable when it comes to multi-region databases but is less scalable. 

c) Try to understand what combination of all of the services is the best fit for requirements.

How to migrate on-premise systems and data to the cloud. It could be using a physical device in Snowball, or Server Migration service or Database Migration service or how to transform content using AWS Transform or AWS DataSync or Storage Gateways.

d) Start attempting to write practice tests and get the feel of the exam complexity. Slowly improve the ability to attempt more and more questions using a stopwatch.  

Phase 3 Preparation

In this phase ensure that you have gone through the course and have a very good hold on the fundamentals of all the areas and are well versed with all services. 

a) It is very difficult to master each and every service in depth. So, it is absolutely ok if one knows just the basics of certain services.  

b) During this phase ensure you are at ease writing practice tests and are able to attempt 45-50 questions in a single sitting.  

c) Your accuracy has improved and so has your reading speed. When attempting questions you are now more confident eliminating the wrong options. 

d) By this time you will know that you have the confidence and better hold on the exam. If time is not a barrier, based on your comfort level try to push yourself to prepare and postpone the exam by a week or 2. This will just help you revise multiple times and improve the chances of clearing the exam.

Building Microservices by decreasing Entropy and increasing Negentropy - Series Part 5

Microservice’s journey is all about gradually overhaul, every time you make a change you need to keep the system in a better state or the ...