Wednesday, July 24, 2024

AI Aspirations but lacking the Automation Foundation

I am witnessing a growing need for more clarity among IT teams regarding AI and automation. They see competitors touting AI initiatives and feel pressured to follow suit, often without grasping the fundamental differences between AI and automation. Everyone wants to implement AI, but they do not realize that they have yet to scratch the surface of basic automation.


At a recent client event, management announced an AI workshop day and their plans to bring AI into the development process. However, once the workshop started, I observed a lack of technical know-how regarding AI. Even developers struggled to differentiate between rule-based automation and the more complex, adaptive nature of AI. This knowledge gap has led to unrealistic expectations and misaligned strategies.


Let me elaborate with another client example. A year ago, business management pushed to implement an AI-driven customer service chatbot, which was seen as the need of the hour, and it went live with some cutting-edge services and technology. However, since going live, the chatbot has not seen much traffic. When I looked into it, I found several reasons:


  1. Poor integration with existing systems such as the CRM, customer service tools, or even marketing automation. This meant the chatbot could not access or update customer information in real time. Everything was done manually.
  2. Because the underlying processes were not automated, it could not efficiently handle typical customer interactions such as personalization, order tracking, appointment scheduling, or even FAQs.
  3. It could not seamlessly hand off to a human agent.
  4. Finally, the bot engine lacked sufficient training and updates.

All of the above reasons are directly related to the lack of automation in various aspects of IT and business.


One initiative that hopefully works is to begin by asking teams to map out their current automated processes. This exercise usually reveals significant gaps and helps shift the focus from AI to necessary automation steps.


As we read and learn from others, a successful AI implementation is a journey, not a destination. It requires a solid foundation of automated processes, clean data, and a clear understanding of organizational goals. Until this reality is grasped, AI initiatives will continue to fall short of expectations.

Friday, July 19, 2024

Learnings from the Microsoft Global Outage Caused by the CrowdStrike Incident: An Architect’s View

Today’s global system outage finally gave us some action to follow in these quiet few weeks of summer vacation.

Ironically, Microsoft itself has published plenty of guidance on avoiding a single point of failure, implementing robust testing and effective rollback/roll-forward mechanisms, designing for graceful degradation, diversifying critical infrastructure, and the list goes on.

As an architect, I find it an apt problem to preach about and a perfect example for learning many anti-patterns and what can go wrong if we are not careful even with simple system designs. I wanted to share some thoughts on what we should avoid to prevent similar issues in any IT system or landscape.

Don't Put All Your Eggs in One Basket

The first and foremost principle is to avoid a single point of failure. Relying too much on one vendor, service or solution is always risky. It's like putting all your eggs in one basket. If that basket falls, you're in big trouble. We need to mix things up and have backup plans.

Test, Test, and Test Again

We have all heard the saying, "Measure twice, cut once." In IT it's more like "test a hundred times, deploy once." We can't just roll out updates and hope for the best. We need to test thoroughly in a safe, production-like environment first.

Have an "Undo" Button

Sometimes, things go wrong no matter how careful we are. That's why we need a way to undo changes quickly. It's like having a time machine for our systems. If we can't roll back or roll forward easily, small problems can soon turn into big headaches.
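
To make the "undo" idea concrete, here is a minimal sketch in Python. It assumes a deployment history kept as a simple list and a caller-supplied health check; the function name, the version strings, and the health check are all hypothetical and only illustrate the pattern of reverting automatically when a new version turns out to be unhealthy.

```python
from typing import Callable, List


def deploy_with_rollback(
    versions: List[str],
    new_version: str,
    health_check: Callable[[str], bool],
) -> str:
    """Activate new_version, reverting to the previous one if it is unhealthy.

    `versions` is the deployment history, newest last. Returns the version
    that is active once the function finishes.
    """
    previous = versions[-1]
    versions.append(new_version)      # activate the new version
    if health_check(new_version):
        return new_version
    versions.pop()                    # the "undo": drop the bad version
    return previous


history = ["1.0.0"]
# Hypothetical health check that rejects version 1.1.0.
active = deploy_with_rollback(history, "1.1.0", lambda v: v != "1.1.0")
print(active, history)
```

In this run the health check fails, so the active version stays at 1.0.0 and the history is left unchanged. A real rollback also has to deal with data migrations and state, which this sketch deliberately ignores.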

Keep the Lines of Communication Open

When things go south, we need to be able to talk to everyone affected. It's not just about fixing the problem, it's about keeping people in the loop. We should have multiple ways to reach out and give updates.

Plan for the Worst

Our systems should be like cats - able to land on their feet. Even if part of the system fails, the rest should keep working. It's about being prepared for the worst while hoping for the best.
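
As a rough illustration of landing on your feet, the sketch below wraps a call to a dependency and serves a reduced default when that dependency fails. The recommendation service and the "bestsellers" default are made-up examples, not taken from any real system.

```python
from typing import Callable, List


def with_fallback(primary: Callable[[], List[str]], fallback: List[str]) -> List[str]:
    """Return the primary result, or a safe default if the primary call fails."""
    try:
        return primary()
    except Exception:
        # Degrade gracefully: serve a reduced result instead of an error page.
        return fallback


def broken_recommendations() -> List[str]:
    # Hypothetical dependency that is currently down.
    raise ConnectionError("recommendation service unavailable")


items = with_fallback(broken_recommendations, fallback=["bestsellers"])
print(items)
```

The page still renders with a generic list even though the personalized part is unavailable. In practice you would likely also log the failure and add timeouts, so that a slow dependency degrades in the same way as a dead one.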

Know Your Weak Spots

We should regularly check our technology supply chain. Which third-party systems, services, and tools are we depending on? What could go wrong? It's like doing a health check-up but for our IT systems.

Change with Care

Rushing changes is asking for trouble, especially in production. We need a solid process for making updates. Think of it like air traffic control for our systems - everything needs to be cleared before it takes off.
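
One common way to get that clearance before take-off is a staged (ring-based) rollout: update a small canary group first and only continue if it stays healthy. The sketch below is a simplified illustration under that assumption; the ring names, host names, and the `apply_update` callback are hypothetical.

```python
from typing import Callable, Dict, List


def staged_rollout(
    rings: Dict[str, List[str]],
    apply_update: Callable[[str], bool],
) -> List[str]:
    """Apply an update ring by ring, stopping at the first ring with a failure.

    Returns the hosts that were updated successfully.
    """
    updated: List[str] = []
    for ring_name, hosts in rings.items():
        results = [(host, apply_update(host)) for host in hosts]
        updated.extend(host for host, ok in results if ok)
        if not all(ok for _, ok in results):
            print(f"halting rollout: failure in ring '{ring_name}'")
            break
    return updated


rings = {
    "canary": ["test-host-1"],
    "early": ["host-a", "host-b"],
    "broad": ["host-c", "host-d", "host-e"],
}
# Hypothetical update that fails on host-b.
done = staged_rollout(rings, lambda host: host != "host-b")
print(done)
```

Here the failure on host-b stops the rollout in the "early" ring, so the three hosts in the "broad" ring are never touched. The point of the design is that a bad update reaches a handful of machines rather than the whole fleet.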

Don't Put All Your Faith in One System

Using the same operating system or platform for everything is convenient, but risky. It's good to mix things up a bit. That way, if one system has issues, not everything goes down.

In the end, it's all about being prepared and thinking ahead. For me, the CrowdStrike incident is not a surprise; it's more of a wake-up call for all of us in IT. We need to learn from this to build stronger, more reliable systems that can weather any storm.

Building Microservices by decreasing Entropy and increasing Negentropy - Series Part 5

The microservices journey is all about gradual overhaul: every time you make a change, you need to keep the system in a better state or the ...