Sunday, December 11, 2022

Fine Tuning a WAF to avoid False Positives - Part 2

This week has been action-packed, with some high-volume DDoS attacks on one of our web applications. We have been spending a lot of time understanding the importance of having a WAF in front of all our client-facing public domains. In today's cloud architectures, Web Application Firewalls (WAFs) are a crucial part of any organization's security posture. They protect web applications from DoS and DDoS attacks as well as SQL injection, cross-site scripting (XSS), and other malicious activities. However, WAFs need to be fine-tuned regularly to ensure they provide maximum protection without causing false positives. In this article, we will discuss some best practices we followed to fine-tune a WAF and prevent multiple attacks on our application.

1. Understand the web application: The first step in fine-tuning a WAF is to understand the web application it is protecting. This includes identifying the application's components, such as the web server, application server, and database. Additionally, it is essential to identify the web application's behavior, including the type of traffic it receives, the HTTP methods it uses, and the expected user behavior. Understanding the web application helps to identify which rules should be enabled or disabled in the WAF.

2. Configure WAF logging: WAF logging is a critical component of fine-tuning. It allows security teams to analyze WAF events and understand which rules generate false positives (see the log-analysis sketch after this list). WAF logs should be enabled for all rules, and log data should be retained for an extended period, such as 90 days or more.

3. Start with a default configuration: WAFs come with a default configuration that provides a good starting point for fine-tuning. Start with the default configuration and enable or disable rules as necessary. Additionally, some WAFs have pre-built templates for specific applications, such as WordPress or Drupal. These templates can be an excellent starting point for fine-tuning.

4. Test the WAF: Once the WAF is configured, it is essential to test it thoroughly. The WAF should be tested with a variety of traffic, including both legitimate and malicious traffic (see the test sketch after this list). This will help identify any false positives or negatives generated by the WAF.

5. Tune the WAF: Based on the results of testing, the WAF should be fine-tuned. This may include enabling or disabling rules, adjusting rule thresholds, or creating custom rules to address specific attack vectors. Additionally, WAFs may have machine learning or AI capabilities that can help to reduce false positives.

6. Monitor the WAF: After fine-tuning, the WAF should be monitored regularly to ensure it is providing maximum protection without causing false positives. WAF logs should be analyzed regularly, and any anomalies should be investigated immediately.
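As referenced in step 2 (and feeding into step 5), below is a minimal log-analysis sketch. It assumes the WAF can export its events as JSON lines with fields such as ruleName, action, and requestUri; the field names and the file path are illustrative, not any specific vendor's schema.

```python
import json
from collections import Counter

# Count blocked requests per rule to spot candidates for false-positive review.
# "waf_events.jsonl" is a placeholder path; the field names are assumptions.
blocks_per_rule = Counter()
blocked_uris = {}

with open("waf_events.jsonl") as events:
    for line in events:
        event = json.loads(line)
        if event.get("action") != "Block":
            continue
        rule = event.get("ruleName", "unknown")
        blocks_per_rule[rule] += 1
        blocked_uris.setdefault(rule, set()).add(event.get("requestUri", ""))

# Rules that block many distinct, legitimate-looking URIs deserve a closer look.
for rule, count in blocks_per_rule.most_common(10):
    print(f"{rule:<40} blocks={count:<6} distinct_uris={len(blocked_uris[rule])}")
```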
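And as referenced in step 4, a small smoke test can send both legitimate and obviously malicious requests and check how the WAF responds. The endpoint below is a placeholder, the 403 status is only an assumption about how your WAF reports a block, and a test like this should only ever be run against your own application with permission.

```python
import requests

BASE_URL = "https://example.com"  # placeholder; point this at your own test environment

test_cases = [
    # (description, path, params, expected_block)
    ("legitimate product search", "/search", {"q": "running shoes"}, False),
    ("SQL injection probe",       "/search", {"q": "' OR 1=1 --"}, True),
    ("XSS probe",                 "/search", {"q": "<script>alert(1)</script>"}, True),
]

for description, path, params, expected_block in test_cases:
    response = requests.get(f"{BASE_URL}{path}", params=params, timeout=10)
    blocked = response.status_code == 403  # assumes the WAF returns 403 on block
    verdict = "OK" if blocked == expected_block else "REVIEW"
    print(f"[{verdict}] {description}: HTTP {response.status_code} (expected block={expected_block})")
```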

In conclusion, fine-tuning a WAF is a critical component of any organization's security posture. It requires a thorough understanding of the web application, careful configuration, and extensive testing. Additionally, WAFs should be regularly monitored and fine-tuned to ensure they provide maximum protection without generating false positives. By following these best practices, organizations can ensure their WAFs provide maximum protection against web application attacks.


Thursday, December 8, 2022

Demystifying the hidden costs after moving to the Cloud

The web application at a client was hosted using a combination of services on Azure. The architecture was quite simple and used the following services: Front Door, API Management, App Service, SQL Database, Service Bus, Redis Cache, and Azure Functions. As the application matured, we realized how little we had thought about the hidden costs of the cloud at the start of the project.

Azure Front Door was used for efficient load balancing, WAF, content delivery, and DNS. However, the global routing of requests through Microsoft's network incurred data transfer and routing costs. What started as a seamless solution for an enhanced user experience turned into a realization that global accessibility came at a price. Also, the complexity of configuring backend pools, health probes, and routing rules can lead to unintended expenses if not optimized.

App Services had a modest cost to begin with on low-scale Premium servers. But as the application garnered more hits, the number of users grew and, subsequently, so did the resources consumed. The need for auto-scaling to handle increased traffic and custom domains brought unforeseen expenses, turning the initially reasonable hosting costs into a growing concern. So, keep an eye on the server configuration and the frequency of scaling events; the back-of-the-envelope sketch below shows how quickly scale-out hours add up.
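As a rough illustration (the hourly instance price below is a made-up assumption, not an actual Azure rate), a few extra scaled-out instances running for part of each day add up quickly:

```python
# Back-of-the-envelope estimate of App Service scale-out cost.
# The hourly price is a hypothetical placeholder; look up the real rate
# for your tier and region in the Azure pricing calculator.
HOURLY_PRICE_PER_INSTANCE = 0.40   # assumed price, not an actual Azure rate
BASELINE_INSTANCES = 2             # instances that run 24/7
PEAK_EXTRA_INSTANCES = 4           # extra instances added by auto-scale
PEAK_HOURS_PER_DAY = 6             # how long the scale-out lasts each day
DAYS_PER_MONTH = 30

baseline_cost = BASELINE_INSTANCES * 24 * DAYS_PER_MONTH * HOURLY_PRICE_PER_INSTANCE
peak_cost = PEAK_EXTRA_INSTANCES * PEAK_HOURS_PER_DAY * DAYS_PER_MONTH * HOURLY_PRICE_PER_INSTANCE

print(f"Baseline instances:  ${baseline_cost:,.2f}/month")
print(f"Auto-scale overhead: ${peak_cost:,.2f}/month")
print(f"Total:               ${baseline_cost + peak_cost:,.2f}/month")
```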

Azure SQL Database brought both power and complexity. Scaling to meet performance demands led to increased DTU consumption and storage requirements. The once manageable monthly expenses now reflected the intricate dance between database size, transaction units, and backup storage. Backup retention that was never scaled down also incurred costs, especially for databases with high transaction rates. Inefficient queries and suboptimal indexing can also increase resource consumption, driving up DTU usage and costs.

Azure Service Bus, the messenger between the application's distributed components, began with reasonable costs for message ingress and egress. Yet, as the communication patterns grew, the charges for additional features like transactions and dead-lettering added expenses to the budget. Also, long message TTLs can lead to increased storage costs. 

Azure Cache for Redis, used for in-memory data storage, initially provided high-performance benefits. However, as the application scaled to accommodate larger datasets, the costs associated with caching capacity and data transfer began to rise, challenging the notion that performance comes without a price. Eviction of data from the cache may also result in increased data transfer costs, especially if the cache is frequently repopulated from the data source. Fine-tuning cache expiration policies is therefore crucial to avoid unnecessary storage costs for stale or rarely accessed data.
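As a minimal sketch of such an expiration policy (using the redis-py client; the connection details, key names, and TTL are illustrative assumptions and require a reachable Redis instance):

```python
import json
import redis

# Connect to the cache (connection details are placeholders).
cache = redis.Redis(host="localhost", port=6379, db=0)

def cache_product(product_id: str, product: dict) -> None:
    # Store the value with an explicit TTL so rarely accessed entries
    # expire instead of occupying paid cache capacity indefinitely.
    cache.set(f"product:{product_id}", json.dumps(product), ex=3600)  # 1 hour TTL

def get_product(product_id: str) -> dict | None:
    raw = cache.get(f"product:{product_id}")
    return json.loads(raw) if raw else None

cache_product("42", {"name": "sample", "price": 9.99})
print(get_product("42"))
```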

Lastly, Azure Functions, with its pay-as-you-go model, was expected to be the cheapest of all the services, since functions are only invoked when needed. But the cumulative charges for executions, execution time, and additional resources reminded me that serverless, too, has its hidden costs. Including unnecessary dependencies in a function can also inflate execution times and costs.

Demystifying the expenses after moving to Azure required a keen understanding of its pricing models and a strategic approach to balancing innovation with fiscal responsibility.

Sunday, November 27, 2022

Choosing the right WAF for your Enterprise wide Applications - Part 1

This is a multi-part series on how to protect a web application using a WAF. To start with, this part explains how to choose the right WAF for an enterprise-wide web application.

Web Application Firewalls (WAFs) are a crucial part of any organization's security infrastructure, protecting their web applications from cyber threats. With so many WAFs available in the market, choosing the best one can be a daunting task. I have been reading Gartner reports, performing POCs, and trying to choose a tool that best suits the client. Below are the different criteria to consider when choosing the best WAF for your organization.

Security Features

When choosing a WAF, the first and most crucial criterion is its security features. The WAF should have strong protection against various cyber threats, including  DDoS, SQL injection, cross-site scripting (XSS), and other common OWASP web application vulnerabilities. Additionally, the WAF should offer threat intelligence services that provide continuous updates on the latest security threats and attack patterns.

Customization and Configuration

The WAF should be easily customizable and configurable to suit an organization's specific security needs. It should allow for custom rule creation, custom signature creation, and other customization options that let you fine-tune the WAF's security policies according to the organization's requirements. The ability to apply extensive rate limiting or geo-blocking is another common requirement of a WAF; a sketch of such custom rules follows below.
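As an illustration, a custom rule definition often looks something like the sketch below. The structure loosely resembles Azure Front Door WAF custom rules, but the field names, thresholds, and country codes here are illustrative assumptions rather than an exact vendor schema; consult your WAF's documentation for the real format.

```python
# A hedged sketch of two custom WAF rules: a rate limit and a geo-block.
custom_rules = [
    {
        "name": "RateLimitLoginEndpoint",
        "priority": 10,
        "ruleType": "RateLimitRule",
        "rateLimitThreshold": 100,          # requests per window per client IP
        "rateLimitDurationInMinutes": 1,
        "matchConditions": [
            {"matchVariable": "RequestUri", "operator": "Contains", "matchValues": ["/login"]}
        ],
        "action": "Block",
    },
    {
        "name": "GeoBlockUnsupportedRegions",
        "priority": 20,
        "ruleType": "MatchRule",
        "matchConditions": [
            # Placeholder country codes; replace with the regions you need to block.
            {"matchVariable": "RemoteAddr", "operator": "GeoMatch", "matchValues": ["XX", "YY"]}
        ],
        "action": "Block",
    },
]

for rule in custom_rules:
    print(f"{rule['priority']:>3}  {rule['name']:<28} -> {rule['action']}")
```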

Performance and Scalability

The WAF should offer excellent performance and scalability, especially for high-traffic websites or applications. It should be able to handle a large number of concurrent connections without compromising performance or introducing latency. Additionally, the WAF should be scalable, allowing an organization to expand and grow without requiring a complete WAF overhaul. In simple words, it should not be a single point of failure. 

Integration with Existing Security Infrastructure

The WAF should be easy to integrate with the organization's existing security infrastructure, including firewalls, intrusion detection and prevention systems (IDPS), and Security Information and Event Management (SIEM) systems. This integration should also allow for seamless communication and collaboration between the different security systems, providing a holistic approach to security.

Compliance and Regulations

The WAF should comply with various regulatory standards, such as the Payment Card Industry Data Security Standard (PCI DSS) or the General Data Protection Regulation (GDPR). Additionally, the WAF should be auditable, providing detailed logs and reports that allow for compliance verification and audit trails.

Ease of Use and Management

The WAF should be easy to use and manage, with a user-friendly interface that allows security administrators to monitor and manage the WAF effectively. Additionally, the WAF should offer automation and orchestration capabilities, allowing for seamless deployment and management of the WAF across different environments.

In conclusion, choosing the best WAF for an organization requires careful consideration of various criteria, including security features, customization and configuration, performance and scalability, integration with existing security infrastructure, compliance and regulations, and ease of use and management. Selecting a WAF that meets the organization's specific security needs can protect its web applications from various cyber threats and ensure its continued success.


Monday, July 18, 2022

Using the Well-Architected Framework to Address Technical Debt - Part 1

Since getting my Well-Architected Framework proficiency certification a year back, I have become a massive fan of the framework and have used it extensively at work. The Well-Architected Framework is a tool with a set of standards and questionnaires that illustrates design patterns, key concepts, design principles, and best practices for designing, architecting, and running workloads in the cloud.

All major cloud providers like AWS, Azure, Google, and Oracle have defined their own framework foundation, and they continue to evolve it along with their platforms and services.

Organizations that have moved to the cloud face a different set of challenges. With all workloads running in the cloud, the typical demand from the business is for more agility and a focus on shipping functionality to production. Teams invest far less in paying down technical debt. This leads to reactive rather than proactive continuous improvement and a huge pile of epics to resolve.

The Well-Architected Framework (WAF) is a really good fit for teams that are unsure where to start with technical debt in terms of priority. The fundamental pillars of the WAF are:

a) System Design, b) Operational Excellence, c) Security, d) Reliability, e) Performance, f) Cost Optimization, and the newly added pillar, g) Sustainability.



The framework can be fine-tuned to fit custom requirements based on the application domain. It is also apt for addressing typical cloud challenges such as the high cost of cloud subscriptions, application performance tuning, cloud security, operational challenges in a cloud or hybrid setup, quick recovery from failures, and improvement of an organization's green index.

A dashboard helps visualize the technical debt once the questionnaire has been filled in across the WAF pillars. The diagram below illustrates the WAF dashboard heatmap and the technical debt prioritized by impact. The dashboard highlights the areas needing improvement and helps measure the changes implemented by comparing them against all the possible best practices.
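As a rough illustration of how such a heatmap can be derived from questionnaire answers, here is a minimal scoring sketch. The pillar names follow the post, and the scores are made-up sample data, not a real assessment.

```python
# Questionnaire answers scored 0 (not addressed) to 5 (best practice met).
answers = {
    "Operational Excellence": [3, 2, 4, 1],
    "Security":               [5, 4, 4, 3],
    "Reliability":            [2, 1, 3, 2],
    "Performance":            [4, 3, 3, 4],
    "Cost Optimization":      [1, 2, 2, 1],
    "Sustainability":         [2, 2, 1, 3],
}

def heat(score: float) -> str:
    """Map an average score to a coarse heatmap bucket."""
    if score < 2.0:
        return "HIGH risk  - prioritize"
    if score < 3.5:
        return "MEDIUM risk"
    return "LOW risk"

# Print the pillars with the biggest technical-debt gap first.
for pillar, scores in sorted(answers.items(), key=lambda kv: sum(kv[1]) / len(kv[1])):
    avg = sum(scores) / len(scores)
    print(f"{pillar:<22} avg={avg:.1f}  {heat(avg)}")
```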



Performing these reviews on a timely basis helps the team identify unknown risks and mitigate problems very early. The WAF reviews fit well with agile ways of working and the principle of continuous improvement.

Below are the links to Well-Architected Frameworks described by different cloud vendors.





Monday, April 4, 2022

AWS managed Blockchain Blog

I have been part of an interesting case study on Amazon Managed Blockchain and am glad to have co-authored the new AWS blog post about it.

--> https://aws.amazon.com/blogs/apn/capgemini-simplifies-the-letter-of-credit-process-with-amazon-managed-blockchain/

Sunday, March 20, 2022

The Sustainable Enterprise - Why cloud is key to business sustainability

I have been writing several articles on this topic and am pleased to contribute to this newly released white paper, "How Enterprises can achieve sustainable IT via the cloud," written in collaboration with Microsoft. It was nice to share an architect's view and work with some of the market-leading experts on this topic.

Download the white paper here: 

 https://www.capgemini.com/se-en/resources/the-sustainable-enterprise-why-cloud-is-key-to-business-sustainability/ 

Friday, February 4, 2022

Harnessing Green Cloud computing to achieve Sustainable IT evolution

A few months back, I wrote an article about sustainability, explaining what it is all about when it comes to software development. Since then, I have come across this topic in several forums, including discussions with multiple client organizations that have pledged to quantify and improve on this subject.


Organizations that move their applications to cloud services tremendously improve their IT environmental impact and their goal of being sustainable. There are several factors that an enterprise has to consider beyond just selecting a cloud provider to be considered environmentally sustainable.


A focus on the following areas can help organizations kick-start their Green IT revolution on the cloud.




a) Cost Aware Architecture thinking

In applications built on cloud infrastructure, there are several moving parts and innumerable services. Organizations that have moved to the cloud often find it very difficult to be cost-aware and to ensure optimal usage of these services.

They are so engrossed in building their core business applications that they don't invest in cost-aware architecture teams that focus on optimizing spending by eliminating unprovisioned infrastructure, resizing or terminating underutilized resources, and using lifecycle management. Practices like energy audits, alerts, and IT cloud analysis help to identify costs and the systems that need to be greened.


Cloud providers offer services like Azure Advisor and AWS Trusted Advisor that help optimize and reduce overall cloud expenditure by recommending solutions to improve cost-effectiveness. Services like Azure Cost Management and Billing, AWS Cost Explorer, and AWS Budgets can be used to analyze, understand, calculate, monitor, and forecast costs; the Cost Explorer API, for example, can be queried programmatically, as sketched below.
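As a minimal sketch (assuming AWS credentials are configured and Cost Explorer is enabled on the account; the dates are placeholders), the Cost Explorer API can break a month's spend down per service:

```python
import boto3

# Cost Explorer must be enabled on the account; credentials come from the environment.
ce = boto3.client("ce")

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2022-01-01", "End": "2022-02-01"},  # adjust to the month you want
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

# Print each service's cost, largest first, to spot what needs "greening".
groups = response["ResultsByTime"][0]["Groups"]
for group in sorted(groups, key=lambda g: float(g["Metrics"]["UnblendedCost"]["Amount"]), reverse=True):
    service = group["Keys"][0]
    amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
    print(f"{service:<40} ${amount:,.2f}")
```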

b) Sustainable development

Building applications using modern technologies and cloud services helps optimize development code and ensures faster deployments. It also enables a reduction in redundant storage and in end users' energy consumption.

Sustainable development on the cloud has many parts. It involves an end-to-end view of how data traverses the system holistically. Improving load times by optimizing caching strategies reduces the data size, data transfer volume, and bandwidth. With new, innovative edge services and by serving content from the appropriate systems, energy-efficient applications can be built that reduce the distance data has to travel.
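As a small, hedged illustration (using Flask purely as an example framework; the routes and max-age values are arbitrary assumptions), explicit cache headers let a CDN or edge layer serve content without repeatedly hitting the origin, cutting both transfer and compute:

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/catalog")
def catalog():
    # Static-ish content: let the CDN/edge cache it for an hour so most
    # requests never travel back to the origin, saving transfer and energy.
    response = jsonify(items=["item-a", "item-b"])
    response.headers["Cache-Control"] = "public, max-age=3600"
    return response

@app.route("/account")
def account():
    # Personal data: never cache at shared edges.
    response = jsonify(user="example")
    response.headers["Cache-Control"] = "private, no-store"
    return response

if __name__ == "__main__":
    app.run()
```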

c) Agile Architecture

One of the core Agile principles is to promote sustainable development and improve ways of working by making the development teams deliver at a consistent pace.

Cloud platforms provide tools like Azure DevOps and the AWS developer tool chain, which give development teams a common place to organize, plan, collaborate on code, and build and deploy applications. They allow organizations to create and improve products faster than traditional software development approaches.

d) Increase Observability

There is a direct correlation between an organization's observability maturity and its sustainability. With observability, the focus is on cultivating ways of working within development teams so that they have a holistic, data-driven mindset when solving system issues. The concept is becoming more and more prominent with the emergence and improvement of AI- and ML-based services.

Services that improve automated diagnostics and automatic infrastructure healing, together with the advent of myriad services for deep code and infrastructure drill-downs, real-time analysis, debugging and profiling, alerts and notifications, and logging and tracing, indirectly improve an organization's return on investment and increase productivity.

e) Consumption-based Utilization

Rightly sized applications, enhanced deployment strategies, automated backup plans, and designing systems using the cloud providers' well-architected frameworks result in better utilization of the underlying hardware and its energy efficiency. It also serves the organization's long-term goals of reducing consumption and power usage, improving network efficiency, and securing systems. Utilizing the right cloud computing service also helps applications scale up or out appropriately.

Using the carbon-tracking calculators provided by the cloud vendors helps gauge which systems or applications require better optimization in terms of performance or infrastructure.

Conclusion

With AWS introducing Sustainability as the sixth pillar, green cloud computing has become one of the most interesting topics for organizations across different domains. While we have all come across tons of articles predicting how to save the world from natural catastrophes and climate change, when it comes to software development on the cloud, it is these foundational changes that one can start with to bring about the transformation.

Monday, November 15, 2021

The fundamental principles for using microservices for modernization

Over the last few years I have spent a lot of time building new applications on microservices and also moving parts of monoliths to microservices. I have researched the topic and tried to share my practical experience in several articles.

This week my second blog post, on some foundational principles of microservices, was published on the Capgemini website.

https://www.capgemini.com/se-en/2021/11/the-fundamental-principles-for-using-microservices-for-modernization/

Wednesday, October 20, 2021

How to manage the move to microservices in a mature way

Over the last few years I have spent a lot of time building new applications on microservices and also moving parts of monoliths to microservices. I have researched the topic and tried to share my practical experience in several articles.

This week my very first blog post on this topic was published on the Capgemini website.

 https://www.capgemini.com/se-en/2021/10/how-to-manage-the-move-to-microservices-in-a-mature-way/

Friday, September 3, 2021

The advent of Observability Driven Development

A distributed application landscape with high cardinality makes it difficult for dedicated operations teams to monitor system behavior via a dashboard or react promptly to system alerts and notifications. In a microservices architecture with several moving parts, detecting failures becomes cumbersome, and developers end up hunting for errors like finding a needle in a haystack.

What is Observability?

Observability is more than a quality attribute and sits one level above monitoring; the focus is on cultivating ways of working within development teams so that they have a holistic, data-driven mindset when it comes to solving system issues.












An observability thought process enables development teams to embed the monitoring aspect right at the nascent stage of development and testing.

Observability in a DevSecOps ecosystem

Several organizations are adopting a DevSecOps culture, and it has become essential for development teams to be self-reliant and take a proactive approach to identifying, healing, and preventing system faults. DevOps focuses on giving development teams the ability to make rapid decisions and more control over infrastructure assets. Observability enhances this by empowering development teams to be more instinctive when it comes to identifying system faults.










Furthermore, the modern ways of working with Agile, Test Driven Development, and Automation enable development teams to get deep insights into operations that can potentially be prone to failures.

Observability on Cloud platforms

Applications deployed in the cloud provide development teams with a myriad of out-of-the-box system measurements. Developers can gauge and derive the quality attributes of a system even before the code goes into production. Cloud services make it easy to collate information like metrics, diagnostics, logs, and traces for analysis, and they are available at the developer's behest. AI-based automated diagnostics along with real-time data give developers deep insight into their system's semantics and characteristics.
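As a minimal sketch of embedding such telemetry at development time (using the OpenTelemetry Python SDK with a console exporter; the span and attribute names are illustrative, and a real deployment would export to the cloud provider's tracing backend instead):

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

# Wire up a tracer that prints spans to the console for local development.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

def place_order(order_id: str) -> None:
    # Each unit of work becomes a span with searchable attributes,
    # so failures can be traced instead of hunted through raw logs.
    with tracer.start_as_current_span("place_order") as span:
        span.set_attribute("order.id", order_id)
        with tracer.start_as_current_span("reserve_stock"):
            pass  # call the inventory service here
        with tracer.start_as_current_span("charge_payment"):
            pass  # call the payment service here

place_order("ORD-1001")
```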

Conclusion

Observability is more of an open-ended process of inculcating modern development principles to increase the reliability of complex distributed systems. An observability mindset helps organizations resolve production issues quickly and reduces the dependency on, and cost of, manual operations. It also helps development teams build dependable systems, giving end customers a seamless user experience.

Monday, July 5, 2021

Driving Digital Transformation using Sustainable Software Development


The term Digital Transformation in the last decade or so has become a well-known strategy in various organizations. Businesses across every domain are reviving their traditional businesses to adapt to a more modern digital marketplace.

But in the last few years, sustainable development has become one of the essential mainstays for a successful digital transformation journey. The Covid pandemic has also pushed organizations across different domains to rethink and emphasize environmental factors, climate changes, and human well-being to lure consumers.

Embracing a cloud-first model is one of the critical constituents of digital transformation and sustainability journeys. More and more organizations are speeding up their cloud computing journeys and investing in modern SaaS/PaaS services, thus reducing environmental impacts and eliminating major infrastructure expenses. Organizations need to be wary and invest wisely in sustainable software-building methodologies so that software implementations and cloud migrations succeed seamlessly.

Organizations that strive to be data-driven have a better ability to monitor operations and analyze system behaviors accurately. The real-time analysis of information results in better usage of devices and improves the defined sustainability characteristics. Companies that invest in AI/ML can see very substantial benefits to sustainability. The science of reliable predictability in the digital realm can bridge gaps in system information interchange, eliminate wastage, improve storage and distribution mechanisms, and enable eco-friendly products, delivery methods, reusable infrastructure, and more. All of these directly help in subduing environmental consequences.

In conclusion, the principles of building next-generation digital software and sustainable development go hand in hand. In the modern agile world, both journeys share the common goal of not jeopardizing the capability to meet future needs. These principles apply to systems as much as they relate to human well-being. Adaptable working methods such as Extreme Programming, Agile, Lean, and Kanban help teams strive for rapid, focused execution. These ways of working improve distributed system communications, collaboration, usage, and velocity, all of which indirectly contribute to energy-efficient software development.

Sunday, May 23, 2021

Tips preparing for Professional AWS Solution Architect Exam

I recently cleared the AWS Solutions Architect Professional exam with a score of 948/1000 and thoroughly enjoyed preparing for it. I spent a total of six months preparing, and that is despite having scored 1000/1000 in the Associate Architect exam last year.

The exam as such is really tough. It not only evaluates one's knowledge of and experience with AWS; one also has to strategize for reading lengthy questions, time each question, and be prepared to sit continuously for 190 minutes to finish the exam.

Below are some of the learnings and tips that I can share so that others can make good use of them and benefit while studying for the exam. Preparation can be divided into three phases.

Phase 1 Preparation

To start with, the exam requires considerable experience with the platform; I would say at least two years of hands-on experience with the core AWS services. I would definitely recommend passing the Associate exam first, as the Professional one is far tougher.

a) Plan for taking a course and stick to it. Select a course with a good rating on popular training sites like Udemy, Coursera, or Udacity. Try out different courses for a few days and choose one where you are comfortable with the language and flow. The basic content of all the highly rated courses is more or less the same. Also, choose a course that has practical samples on the topics you are not comfortable with or have not worked on.

b) Plan a date and book the exam, somewhere between two and three months out. AWS allows you to change the date twice for a booked exam.

c) Create a personal AWS account to practice in, as the exam covers far too many services, many of which one may not have implemented in day-to-day professional work.

d) The exam is not theoretical and requires broad experience with the services. There are several real-world, scenario-based questions, and there are multiple ways to solve a specific problem. Read through a lot of use cases from different organizations, especially the ones from the latest AWS re:Invent.

Phase 2 Preparation

In this phase, get deeper into the course and practice the points below; they help structure your study and get to know the services better.

a) AWS adds new services very frequently, and one has to be well versed in every service available, especially the new ones. AWS lists all the latest services in the white paper below.

AWS overview - https://d1.awsstatic.com/whitepapers/aws-overview.pdf 



b) Each area has several services that can perform the same task. Try to analyze which services are the best fit when considering non-functional requirements such as cost optimization, scalability, performance, duration, automation, availability, reliability, and security.

For example, S3 buckets are the most durable and the most cost-effective in terms of storage, but when it comes to raw performance, EBS/EFS is better. Another example: when it comes to databases, DynamoDB gives near real-time performance but has limited data-model support, whereas Aurora is the most scalable option for multi-region relational databases.

c) Try to understand which combination of services is the best fit for a given set of requirements.

For example, how do you migrate on-premises systems and data to the cloud? It could be by using a physical device such as Snowball, the Server Migration Service, or the Database Migration Service, or by moving and transforming content with the AWS Transfer Family, AWS DataSync, or Storage Gateway.

d) Start attempting practice tests to get a feel for the exam's complexity. Slowly improve your ability to attempt more and more questions against a stopwatch.

Phase 3 Preparation

In this phase, ensure that you have gone through the course, have a very good hold on the fundamentals of all the areas, and are well versed in all the services.

a) It is very difficult to master each and every service in depth, so it is absolutely fine if one knows just the basics of certain services.

b) During this phase ensure you are at ease writing practice tests and are able to attempt 45-50 questions in a single sitting.  

c) Your accuracy should have improved by now, and so should your reading speed. When attempting questions, you should be more confident in eliminating the wrong options.

d) By this time you will have the confidence and a better hold on the exam. If time is not a barrier, and depending on your comfort level, consider pushing yourself to prepare further and postponing the exam by a week or two. This will let you revise multiple times and improve your chances of clearing the exam.

Thursday, October 1, 2020

Building Composite Architectures

Recently, Gartner highlighted “Composite Architecture” (or “Composable Architecture”) as one of the five emerging trends in modern innovation and technology for the next 10 years, and I have since started coming across this topic in various technical forums.

“Composability” as such is not a new topic; we have used it frequently in object-oriented programming to achieve polymorphism. In software architecture terms it is defined as the combination of software systems to produce a new system. In other words, it is directly connected to the goals of agility and reusability, and the whole crux of it is to respond to the changing business spectrum.

Domain-Driven Design to build Composable Application

If we take a step back and look at a simple application created using domain-driven design with an onion architecture, the orchestration layer plays a pivotal role in making the application composable by interacting directly with the repository or service layers.

The orchestration layer as such can be a WebHooks API, a data importer, an API controller, a messaging service, or a simple REST or SOAP request.

If done properly, this kind of atomic structure can result in a system that is open to changing its external integrations seamlessly while also meeting the changing business landscape. A minimal sketch of such an orchestration layer follows below.
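As a minimal, hedged sketch (the class and method names are illustrative, not from any specific framework), the orchestration layer depends only on abstractions of the repository and service layers, so either side can be swapped without touching the domain:

```python
from typing import Protocol

class OrderRepository(Protocol):
    def save(self, order_id: str, payload: dict) -> None: ...

class NotificationService(Protocol):
    def notify(self, message: str) -> None: ...

class OrderOrchestrator:
    """Orchestration layer: composes the repository and service layers
    without knowing their concrete implementations."""

    def __init__(self, repository: OrderRepository, notifier: NotificationService) -> None:
        self._repository = repository
        self._notifier = notifier

    def place_order(self, order_id: str, payload: dict) -> None:
        self._repository.save(order_id, payload)
        self._notifier.notify(f"Order {order_id} placed")

# Concrete implementations can be swapped freely: a REST controller, a
# message consumer, or a data importer can all reuse the same orchestrator.
class InMemoryRepository:
    def __init__(self) -> None:
        self.rows: dict[str, dict] = {}
    def save(self, order_id: str, payload: dict) -> None:
        self.rows[order_id] = payload

class ConsoleNotifier:
    def notify(self, message: str) -> None:
        print(message)

orchestrator = OrderOrchestrator(InMemoryRepository(), ConsoleNotifier())
orchestrator.place_order("ORD-1", {"sku": "ABC", "qty": 2})
```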

Atomic Architecture

If we take the earlier example and apply it in a larger context, the below visualization depicts a circular relationship between different layers in a typical business domain.


Here the applications are interconnected in an atomic way, letting the organization's landscape plug systems in and out more easily. With the advent of native SaaS-based platforms, this type of “composable architecture” is becoming more and more noticeable.

Elements of Composable Architecture

The basic building blocks of a composable system are still founded on containerization, microservices, cloud, APIs, headless architecture, etc.

Conclusion

With a Composable mindset, organizations can uplift isolated business operating models and move towards a more practical loosely coupled technology landscape where systems can be plugged in and out flexibly.

This kind of model perfectly fits with organizations adopting agile ways of working or building modern omnichannel integrations with different types of native Cloud-based SaaS platforms.

This model can also be applied to bridge gaps across the entire ecosystem of legacy and modern applications including areas of a unified experience, operations, transformations, infrastructure, external and internal system integrations.

Friday, June 19, 2020

10 Fundamental Principles one needs to ask before breaking the monolith platform

          Below are some of the key principles that need to be evaluated when one starts to break out services from a monolithic platform.


1.    Target Core Services or Fringe Services First?

          Target functionality that doesn’t require changes to the end-customer application and possibly doesn’t need any core database migration or changes either. This makes it easier for subsequent services by establishing CI/CD pipelines, the required alerting and monitoring systems, testing strategies, and version control.

2.     Split Schema or Code First?

          If the core services are clear, then always split out the schema first and keep the services together before splitting the application code out into microservices. If the services are too coarse-grained, they will later be split into smaller services, creating another data migration. Also, two services accessing the same database results in tight coupling between those services.

3.    Moving out Services Vertically or Horizontally?

        Moving services out can happen either vertically or horizontally. Try to move out a single core service at a time by first moving the database, then the functionality, and then the front end. This technique avoids costly and repeated data migrations and makes it easier to adjust the service granularity when needed.

4.     Building Micro or Macro or Mini services?
       
          When creating a service, first identify the core services and define clear bounded contexts. Until the core services are clearly demarcated, the first step is to create a macro service. Once the demarcations are clear, it is easy to split it further into microservices.

5.     Outside in or Inside Out Creation of Services?
        
          The easiest way to create services is outside-in, understanding how the various integrations need to talk to the various applications. However, this can lead to data inconsistencies and data integrity issues. Designing services inside-out is more time-consuming but cleaner, with clearly defined boundaries for each service. If approached properly, this will reduce possible data integrity issues.

6.     Where to build New functionalities?

       Target building any new functionality as new microservices, and target services that are business-centric. Do not add a dependency on the monolithic platform. Ensure that the new services do not call the monolithic application directly and always access it via an anti-corruption layer (see the sketch after this list).

7.     Rewriting Code or Capability?
       
          When building new functionality, try to rewrite the capability and not the code. This may be more time-consuming to build, but the monolithic platform already carries a lot of redundant code. Rewriting the capability gives an opportunity to improve the granularity of the service, revisit the business functionality, and maintain a clean codebase.

8.    Incremental or Radical updates?

          Target decoupling modules or services in a way that reduces traffic towards the monolithic application. This will improve the performance of the application as well as help in decommissioning infrastructure and reducing costs (licenses).

9.   Versioning Services Incrementally or Concurrently?

          Having multiple versions of the same code leads to issues concerning maintainability and cost. However, until the microservices and surrounding integrations have matured, maintaining multiple versions of a service endpoint at any given time helps reduce failure risks and the dependence on external systems.

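As promised in principle 6, here is a minimal, hedged sketch of an anti-corruption layer (the legacy field names and the stub client are made-up assumptions): the new service talks only to the adapter, which translates the monolith's legacy representation into the service's own model.

```python
from dataclasses import dataclass

# --- The new microservice's own, clean domain model ----------------------
@dataclass
class Customer:
    customer_id: str
    email: str
    is_active: bool

# --- Anti-corruption layer ------------------------------------------------
class LegacyCustomerAdapter:
    """Translates the monolith's legacy payloads into the new domain model,
    so the microservice never depends on the monolith's representation."""

    def __init__(self, legacy_client) -> None:
        self._legacy_client = legacy_client  # e.g. an HTTP client for the monolith

    def get_customer(self, customer_id: str) -> Customer:
        raw = self._legacy_client.fetch_customer(customer_id)  # legacy call
        # Legacy field names ("CUST_NO", "STATUS_FLG") are assumptions for illustration.
        return Customer(
            customer_id=str(raw["CUST_NO"]),
            email=raw.get("EMAIL_ADDR", "").lower(),
            is_active=raw.get("STATUS_FLG") == "A",
        )

# --- Example usage with a stubbed legacy client ---------------------------
class StubLegacyClient:
    def fetch_customer(self, customer_id: str) -> dict:
        return {"CUST_NO": customer_id, "EMAIL_ADDR": "USER@EXAMPLE.COM", "STATUS_FLG": "A"}

adapter = LegacyCustomerAdapter(StubLegacyClient())
print(adapter.get_customer("1001"))
```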



Thursday, June 11, 2020

The myth of Sharing State when breaking large applications


One of the most complex puzzles in a microservices journey is how and when to break up the database. When thinking about breaking up a legacy monolith application, the very first non-risky thought that comes to mind is to decompose the platform module by module as standalone microservices, using multiple ORMs and hitting the same database.

If it were an application with limited tables and modules, this would have been the simplest approach. If there is a firm partition between each microservice's data, with few dependencies, then it becomes fairly easy to adopt services while maintaining one large database with several schemas.

However, legacy applications are seldom that portable, and sharing data or state is, to all intents and purposes, convoluted. Below are some of the typical concerns that need to be evaluated when building or maintaining applications with a single shared state.


Tight Coupling of Services

One of the key principles that architects strive for is to build a loosely coupled application that can cater to future, unknown requirements. In data terms, that essentially means building functionality using new ways of persisting state without impacting the existing application or state.

Most legacy applications have been built and maintained for years and have a very tight coupling of out-of-the-box and custom modules and libraries. This results in huge state dependencies between modules. Any new requirement, whether to make a module event-driven or to introduce a non-SQL database to solve certain quality attributes, is no easy task and requires a complete revamp of several services.

Weak Cohesion

The basic principle of building microservices is separation of concerns, i.e. each service, or group of services, should have its own dedicated state.

Large legacy applications generally have a large database with several schemas, and each schema is accessed by several services. Hence, any change to the logic that requires a DB change impacts all the corresponding services. If a database table changes, all the related services have to change, which creates huge dependencies between development teams along with a huge sunk-cost fallacy. The sketch below contrasts that shared state with state owned per service.
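As a small, hedged sketch (using SQLite in-memory databases purely for illustration; the services and tables are made up), each service owning its own schema keeps a table change local to the owning team, while other services go through that service's API rather than its tables:

```python
import sqlite3

# Each service owns its own database/schema; other services may only go
# through that service's API, never its tables.
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, customer_id TEXT, total REAL)")

billing_db = sqlite3.connect(":memory:")
billing_db.execute("CREATE TABLE invoices (id TEXT PRIMARY KEY, order_id TEXT, amount REAL)")

class OrderService:
    def __init__(self, db: sqlite3.Connection) -> None:
        self._db = db
    def create_order(self, order_id: str, customer_id: str, total: float) -> None:
        self._db.execute("INSERT INTO orders VALUES (?, ?, ?)", (order_id, customer_id, total))
    def get_total(self, order_id: str) -> float:
        row = self._db.execute("SELECT total FROM orders WHERE id = ?", (order_id,)).fetchone()
        return row[0]

class BillingService:
    """Depends on OrderService's API, not on its tables, so the orders
    schema can evolve without forcing a change here."""
    def __init__(self, db: sqlite3.Connection, orders: OrderService) -> None:
        self._db = db
        self._orders = orders
    def invoice(self, invoice_id: str, order_id: str) -> None:
        amount = self._orders.get_total(order_id)
        self._db.execute("INSERT INTO invoices VALUES (?, ?, ?)", (invoice_id, order_id, amount))

orders = OrderService(orders_db)
billing = BillingService(billing_db, orders)
orders.create_order("ORD-1", "CUST-9", 120.0)
billing.invoice("INV-1", "ORD-1")
print(billing_db.execute("SELECT * FROM invoices").fetchall())
```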





Friday, January 10, 2020

Dealing with Concurrency Issues in large applications

The last few days have been hectic dealing with concurrency issues with our monolith application during the peak traffic period.

Concurrency issues are not easy to resolve, especially in an application with thousands of files. The error occurred in the order pipeline during checkout, when hundreds of custom pipelines execute in parallel. When the error occurred, all the previous transactions were rolled back.

Since the issue happened for the first time, we initially just ignored the error, hoping it would not crop up again. As the traffic increased, the errors increased with it, and every error in the log pointed to a concurrency exception.

We did not have much logging, so we started evaluating every table in the transaction and their relationships. We listed all the tables involved; there were close to 100 tables being accessed. We decided to split the tables into read-only and write sets. Once we had the set of tables that were being updated, we tried pinpointing the ones with foreign-key relationships. That further narrowed down the tables where the issue could potentially be present.

Lastly, on further analysis, we came across a table where locking was a possibility. Meanwhile, enabling logs gave us details about concurrency errors on the same set of tables. The first thing we noticed was that there was no last-modified timestamp column on these tables. We then went back to the application code and added explicit locking along with a check that validates the last-modified timestamp, as sketched below.
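A minimal sketch of that timestamp check, an optimistic-concurrency pattern, is shown below with SQLite and illustrative table and column names; it is not the actual application code:

```python
import sqlite3
from datetime import datetime, timezone

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT, last_modified TEXT)")
db.execute("INSERT INTO orders VALUES ('ORD-1', 'NEW', '2020-01-10T10:00:00+00:00')")
db.commit()

def update_status(order_id: str, new_status: str, last_modified_seen: str) -> bool:
    """Update only if nobody else has modified the row since we read it."""
    now = datetime.now(timezone.utc).isoformat()
    cursor = db.execute(
        "UPDATE orders SET status = ?, last_modified = ? "
        "WHERE id = ? AND last_modified = ?",
        (new_status, now, order_id, last_modified_seen),
    )
    db.commit()
    # rowcount == 0 means a concurrent writer got there first; re-read and retry.
    return cursor.rowcount == 1

# First writer wins; the second sees a stale timestamp and must re-read.
seen = db.execute("SELECT last_modified FROM orders WHERE id = 'ORD-1'").fetchone()[0]
print(update_status("ORD-1", "PAID", seen))      # True
print(update_status("ORD-1", "SHIPPED", seen))   # False, stale timestamp
```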

All this took a week to resolve, and the issue made me realize how difficult it is to eradicate concurrency problems in systems. Years later, when I look back at this article, it would be a surprise not to have come across the same issues again.


