Monday, August 12, 2024

The Burden of Network on Architecture - Part 1: Network Bandwidth

In most organizations, IT infrastructure is run by a separate department that operates and manages the enterprise's IT environment. As Solution Architects, we are often confined to our own landscape and seldom dive deep into the hardware and network services that directly impact our applications.

One of the critical factors that can make or break an architecture's performance is network bandwidth. Bandwidth is not infinite, and it directly impacts the cost, speed, and performance of applications. It is essential to understand how much bandwidth is allocated to a given set of applications and which applications share that network. As network traffic grows, bandwidth utilization rises and less capacity is left for each application, eventually clogging them.

In one client's on-premises environment, I encountered a situation where a massive 10 TB file transfer brought the entire network to its knees. The transfer was initiated without proper planning and quickly saturated the available bandwidth. As a result, critical business applications slowed down, and some systems even crashed due to timeout errors. Employees couldn't access essential resources, and customer-facing services experienced significant delays.
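A quick back-of-the-envelope calculation shows why a transfer of this size is so disruptive. The sketch below assumes decimal units (1 TB = 10^12 bytes) and ignores protocol overhead and competing traffic; the 1 Gbps and 10 Gbps link speeds and the 30% cap are illustrative assumptions, since the client's actual capacity isn't stated.

```python
def transfer_time_hours(size_bytes: float, link_bps: float, share: float = 1.0) -> float:
    """Idealized transfer time in hours.

    size_bytes: payload size in bytes (decimal units assumed).
    link_bps:   link capacity in bits per second.
    share:      fraction of the link the transfer may use (0 < share <= 1).
    Protocol overhead, retransmissions and other traffic are ignored.
    """
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    seconds = size_bytes * 8 / (link_bps * share)
    return seconds / 3600


TEN_TB = 10 * 10**12  # 10 TB in bytes

for label, bps in [("1 Gbps", 1e9), ("10 Gbps", 10e9)]:
    full = transfer_time_hours(TEN_TB, bps)
    capped = transfer_time_hours(TEN_TB, bps, share=0.3)
    print(f"{label}: {full:.1f} h at full rate, {capped:.1f} h if capped at 30%")
```

On a 1 Gbps link the transfer alone would saturate the network for roughly 22 hours; capping it at 30% stretches that to about 74 hours but leaves most of the capacity for business traffic, which is exactly the trade-off that off-peak scheduling and shaping try to manage.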

The incident taught us the importance of implementing robust traffic shaping and prioritization mechanisms. After it, the client's network team kept bandwidth alerts in place at all times. They also ensured that large data transfers were scheduled during off-peak hours and that critical services had guaranteed bandwidth allocations.
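Traffic shaping is normally configured on network devices rather than in application code, but the idea behind one common shaping algorithm, the token bucket, is easy to sketch. This is an illustrative model only; the post doesn't say which mechanism the client's network team used, and the rate and burst values in the usage below are assumptions.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket shaper.

    Tokens (here: bytes) accumulate at `rate` per second up to `capacity`.
    A sender may transmit only when enough tokens are available, which caps
    its average rate at `rate` while still allowing bursts up to `capacity`.
    """

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock  # injectable so the behavior can be tested
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self, nbytes: float) -> bool:
        """Return True and deduct tokens if `nbytes` may be sent now."""
        self._refill()
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

A bulk transfer job could call `try_consume` before sending each chunk and back off while it returns False, so the job never exceeds its allotted rate and interactive traffic keeps the remaining capacity.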

Perils of Holiday Code Freezes

The holiday code freeze has become a long-standing practice in which deployments are halted to reduce the risk of new bugs or issues in the system while most of the support staff is away. However, it also means that servers and infrastructure are left untouched for several weeks, and, surprisingly, in my experience this can get a bit chaotic.

Application and Database Leaks

First and foremost, the most common issue I have noticed is the application memory leak. I recall one instance where an e-commerce application began to slow down significantly a week into the holiday break. The application retained references to several objects it no longer needed, causing the heap memory to fill up gradually. As memory usage increased, the application became sluggish, eventually leading to crashes and "out of memory" errors.
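The post doesn't say what stack the e-commerce application used, so the sketch below uses Python purely to illustrate the pattern: a long-lived registry that keeps strong references to per-request objects, compared with one holding weak references so finished objects can be reclaimed. The `Session` class and both registries are hypothetical.

```python
import gc
import weakref


class Session:
    """Hypothetical per-request object holding some data."""

    def __init__(self, session_id: str):
        self.session_id = session_id
        self.payload = bytearray(1024)  # pretend each session holds 1 KiB


# Leaky pattern: a module-level dict keeps a strong reference to every
# session ever created, so none of them can be garbage-collected.
leaky_registry: dict = {}

# Safer pattern: entries disappear automatically once no other code
# holds a reference to the Session.
weak_registry = weakref.WeakValueDictionary()


def handle_request_leaky(session_id: str) -> None:
    leaky_registry[session_id] = Session(session_id)


def handle_request_weak(session_id: str) -> None:
    session = Session(session_id)
    weak_registry[session_id] = session
    # ... request handling would use `session` here ...
    # When this function returns, the last strong reference goes away.


for i in range(1000):
    handle_request_leaky(f"s{i}")
    handle_request_weak(f"s{i}")
gc.collect()
print(len(leaky_registry), len(weak_registry))  # on CPython: 1000 0
```

In practice the fix is often even simpler: remove entries explicitly when a session ends, or bound the structure (for example with an LRU eviction policy).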

Leaks can also happen with database connections. In another example, the same e-commerce application began experiencing intermittent outages. The root cause was traced to a connection leak: the application was not releasing database connections after using them. As the number of open connections grew, the database server eventually refused new connections, causing the application to crash.
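A connection leak is easy to reproduce with a small fixed-size pool. The sketch below is a simplified, hypothetical pool built on Python's standard `queue` and `sqlite3` modules, not any particular production pool: one function forgets to return its connection, while the other uses a context manager so the connection is released even if the query raises.

```python
import contextlib
import queue
import sqlite3
from typing import Iterator


class ConnectionPool:
    """Tiny fixed-size pool for illustration only (no health checks, no resizing)."""

    def __init__(self, size: int, database: str = ":memory:"):
        self._idle: queue.Queue = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    def acquire(self, timeout: float = 0.1) -> sqlite3.Connection:
        # Raises queue.Empty once every connection is checked out, much like
        # a database server refusing new connections.
        return self._idle.get(timeout=timeout)

    def release(self, conn: sqlite3.Connection) -> None:
        self._idle.put(conn)

    @contextlib.contextmanager
    def connection(self) -> Iterator[sqlite3.Connection]:
        conn = self.acquire()
        try:
            yield conn
        finally:
            self.release(conn)  # runs even if the caller's query raises


def leaky_query(pool: ConnectionPool) -> int:
    conn = pool.acquire()
    return conn.execute("SELECT 1").fetchone()[0]  # connection is never released


def safe_query(pool: ConnectionPool) -> int:
    with pool.connection() as conn:
        return conn.execute("SELECT 1").fetchone()[0]
```

With a pool of two connections, `safe_query` can run any number of times, while the third call to `leaky_query` fails because the first two never gave their connections back.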

Similarly, I have experienced code freezes where thread leaks consumed system resources and slowed down the application. This typically happens when threads are created but never terminated.
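A leaked thread is typically a worker that blocks forever with no way of being told to stop. The Python sketch below (the `worker` function is a hypothetical stand-in for a background task) gives every thread a shared stop event and joins the threads on shutdown, so none are left running.

```python
import threading
from typing import List


def worker(stop: threading.Event) -> None:
    # Stand-in for a background task; it blocks until asked to stop.
    # Without a stop signal (or a timeout), a thread like this never exits.
    stop.wait()


def spawn_workers(n: int, stop: threading.Event) -> List[threading.Thread]:
    threads = [threading.Thread(target=worker, args=(stop,)) for _ in range(n)]
    for t in threads:
        t.start()
    return threads


def shutdown(threads: List[threading.Thread], stop: threading.Event,
             timeout: float = 1.0) -> None:
    stop.set()  # signal every worker to finish
    for t in threads:
        t.join(timeout)  # wait for each one so none are left behind


stop_event = threading.Event()
pool = spawn_workers(5, stop_event)
print(sum(t.is_alive() for t in pool))  # 5
shutdown(pool, stop_event)
print(sum(t.is_alive() for t in pool))  # 0
```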

Array index out-of-bounds errors

Another issue I encountered during a recent freeze was an array index out-of-bounds error. The application was a CMS, and the downtime started in the middle of the week when the application tried to access an array index that didn't exist. It was caused by unexpected input and data changes that the custom code did not account for.

Array index out-of-bounds exceptions can also be caused by data mismatches when interacting with external services or APIs that are not under the code freeze. Once, during a holiday season, a financial reporting application began throwing array index out-of-bounds exceptions. The root cause was traced back to an external data feed that had changed its format. The application was expecting a certain number of fields, but the external feed had added fields, causing the application to attempt to access non-existent indices. The errors kept the application offline until a patch was deployed after the freeze.
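One defensive pattern for both incidents is to validate external input before indexing into it. For a delimited feed, looking fields up by header name instead of by position tolerates added or reordered columns, and a missing field produces a clear error rather than an index error deep in the code. The field names below are hypothetical; the post doesn't describe the feed's actual format.

```python
import csv
import io
from typing import Dict, List

# Hypothetical field names; the real feed's layout isn't described.
REQUIRED_FIELDS = ("account_id", "amount", "currency")


def parse_feed(text: str) -> List[Dict[str, str]]:
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [f for f in REQUIRED_FIELDS if f not in header]
    if missing:
        # Fail loudly with a clear message instead of an index error later.
        raise ValueError(f"feed is missing expected fields: {missing}")
    rows = []
    for n, row in enumerate(reader, start=1):
        # DictReader fills absent trailing values with None.
        if any(row[f] is None for f in REQUIRED_FIELDS):
            raise ValueError(f"data row {n}: too few values")
        # Look values up by name, so added or reordered columns don't shift positions.
        rows.append({f: row[f] for f in REQUIRED_FIELDS})
    return rows
```

A feed that later gains `region` and `fx_rate` columns still parses to the same records, while a feed that drops `currency` is rejected up front.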

Cache Corruption

Cache corruption is another way to bring down heavily cache-dependent applications. In real-time online applications, caches improve performance, but on several occasions I have seen caches that were not cleared over time become corrupt, serving stale or incorrect data.
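The post doesn't describe how those caches were managed, but one common safeguard against stale entries is to give every entry a time-to-live so nothing stays cached indefinitely. Below is a minimal in-process sketch with an injectable clock so the expiry can be tested; the 60-second TTL in the usage is an assumption to tune per use case.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """In-memory cache whose entries expire `ttl` seconds after being set."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._data: Dict[Hashable, Tuple[Any, float]] = {}

    def set(self, key: Hashable, value: Any) -> None:
        self._data[key] = (value, self.clock() + self.ttl)

    def get(self, key: Hashable, default: Any = None) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Expired: evict so the caller reloads fresh data from the source.
            del self._data[key]
            return default
        return value
```

A TTL bounds how long bad or outdated data can be served; it doesn't detect corruption itself, so values that must be trusted may still need validation when they are loaded.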

Conclusion

It's funny that IT stakeholders see the code freeze as a way to maintain stability when, in most cases, it exposes underlying issues that might not be apparent during regular operations. Even funnier, most of the time these issues are resolved by a simple server restart.

Wednesday, August 7, 2024

Extracting running data out of NRC/Nike+ (Nike Run Club) using APIs

For the past few weeks, my running kilometers have not been updating in my Nike+ app. It could be a bug or a weird feature of the app, but since it was demotivating, I decided to create my own dashboard to calculate the results. Also, for some reason, Nike has discontinued viewing and editing activities on the web.

I had about 8 years of data, and you never know when apps like this will cease to exist or become paid. It's always better to persist your data to a known source and, if required, use it to feed other applications. I also uploaded my data to Under Armour's "MapMyFitness" app, which has much better open-source documentation.

It turns out that the NRC app captures a lot of additional information that is typically not shown in the mobile app. Some of this information includes:

  1. Total steps during the workout, including a detailed split between intervals
  2. Weather details during the workout
  3. Amount of time the workout was halted
  4. Location details, including latitude and longitude, that can help you plot your own map
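Since the kilometers in the app weren't updating, the GPS points offer one way to recompute distance yourself. The sketch below uses the haversine (great-circle) formula and assumes the points are available as (latitude, longitude) pairs in decimal degrees; the actual field names in the NRC data aren't covered here, and GPS noise means the result will differ somewhat from the app's own figure.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def route_distance_km(points: Sequence[Tuple[float, float]]) -> float:
    """Sum the distances between consecutive (lat, lon) points of a run."""
    return sum(haversine_km(*p, *q) for p, q in zip(points, points[1:]))
```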

As for the API, I could not find any official Nike documentation, but I came across some older posts (https://gist.github.com/niw/858c1ecaef89858893681e46db63db66) that mention a few API endpoints for fetching historical activities. I ended up creating a Spring Boot application that fetches the activities and stores them in CSV format in my Google Drive.

The code can be downloaded here: https://github.com/shailendrabhatt/Nike-run-stats (currently unavailable)

The code also includes a Postman collection that can be used to fetch your activities. Just update the {{access_token}} variable and run the GET requests.
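The linked implementation is a Spring Boot application; purely as an illustration, here is a Python sketch of the "store the activities as CSV" step. The fields it reads (`id`, `start_epoch_ms`, and a `summaries` list containing a total `distance` value, assumed to be in kilometers) are assumptions about the response shape, so check them against a real response before relying on them.

```python
import csv
import io
from typing import Iterable, Optional


def total_metric(activity: dict, metric: str) -> Optional[float]:
    # Assumed shape: activity["summaries"] is a list of
    # {"metric": ..., "summary": "total", "value": ...} entries.
    for s in activity.get("summaries", []):
        if s.get("metric") == metric and s.get("summary") == "total":
            return s.get("value")
    return None


def activities_to_csv(activities: Iterable[dict]) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "start_epoch_ms", "distance_km"])
    for a in activities:
        # Missing values are written as empty cells.
        writer.writerow([a.get("id"), a.get("start_epoch_ms"), total_metric(a, "distance")])
    return buf.getvalue()
```

Summing the `distance_km` column of the resulting file then gives the running total that the app stopped showing.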

While the post describing the API was good enough, here are a few tips that can help:

  • Fetching the authorization token can be tricky, and the token expires. You will need an account at https://www.nike.com/se/en/nrc-app, from which you can copy the authorization token out of the request headers of the XHR (XMLHttpRequest) calls made to api.nike.com. Several requests hit this host, and the token can be taken from any of them.
  • The API described in the link covers after_time, but you can also fetch activities using before_time:
      /sport/v3/me/activities/after_time/${time}
      /sport/v3/me/activities/before_time/${time}
  • Pagination can easily be achieved using before_id and after_id. These IDs come in different formats, ranging from GUIDs to single-digit numbers, which can be confusing.
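Putting these tips together, the paging loop might look like the Python sketch below (again only an illustration, not the linked Spring Boot code). Several parts are assumptions not confirmed by any official documentation: the copied token is sent as a Bearer token in the Authorization header, each response contains an `activities` list and a `paging` object with an `after_id`, and the next page is requested from an `after_id/${id}` path mirroring `after_time`. Verify each against a real response first.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterator

# Path prefix taken from the endpoints above; the after_id URL pattern and
# the "activities"/"paging" response keys are assumptions.
BASE_URL = "https://api.nike.com/sport/v3/me/activities"

Fetch = Callable[[str, str], Dict]


def http_get_json(url: str, token: str) -> Dict:
    """GET a URL, sending the copied token as a Bearer token (assumed scheme)."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def iter_activities(token: str, start_time_ms: int = 0,
                    fetch: Fetch = http_get_json) -> Iterator[Dict]:
    """Yield activities page by page, following paging.after_id until it runs out."""
    url = f"{BASE_URL}/after_time/{start_time_ms}"
    while True:
        page = fetch(url, token)
        yield from page.get("activities", [])
        after_id = page.get("paging", {}).get("after_id")
        if not after_id:
            break
        # IDs come in mixed formats, so treat them as opaque strings.
        url = f"{BASE_URL}/after_id/{after_id}"
```

Passing a fake `fetch` function lets you test the paging logic without a live token. If real requests start failing with an authorization error, the token has most likely expired and needs to be copied again.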

Saturday, August 3, 2024

Instilling the idea of Sustainability into Development Teams

Instilling green coding practices and patterns in development teams is a modern-day challenge. It can go a long way toward reducing an organization's carbon footprint and meeting its long-term sustainability goals.

Good green coding practices improve the quality of a software application and directly impact the energy efficiency of the infrastructure it runs on. However, developers in today's agile work environments rarely look beyond rapid solution building in ever-shorter sprint cycles. They have all the modern frameworks and libraries at their disposal, and writing energy-efficient code is not always the focus. Furthermore, modern data centers and cloud infrastructure give developers seemingly unlimited resources, resulting in high energy consumption and an impact on the environment.

Below are some of the factors that improve programming practices and can have a significant impact on the Green Index:

a) Fundamental Programming Practices

Fundamental programming practices start with proper error and exception handling. They also include paying extra attention to the modularity and structure of the code and being prepared for unexpected deviations and behavior, especially when integrating with a different component or system.
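As an illustration of being prepared for unexpected behavior at an integration point, the sketch below wraps a call to another component in a bounded retry with exponential backoff and turns malformed responses into explicit errors instead of letting them propagate. Treating only `ConnectionError`/`TimeoutError` as transient, and the `{"rate": ...}` response format, are assumptions for the example.

```python
import time
from typing import Callable


class IntegrationError(Exception):
    """Raised when a downstream component misbehaves in a way we can't recover from."""


def call_with_retry(fn: Callable[[], dict], attempts: int = 3, base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fn, retrying transient failures with exponential backoff.

    Only ConnectionError/TimeoutError are treated as transient (an assumption);
    anything else is a bug or a contract violation and propagates immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except (ConnectionError, TimeoutError) as exc:
            if attempt == attempts:
                raise IntegrationError(f"gave up after {attempts} attempts") from exc
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5 s, 1 s, 2 s, ...
    raise AssertionError("unreachable")  # every loop path returns or raises


def parse_rate(response: dict) -> float:
    # Validate the response instead of assuming its shape.
    try:
        return float(response["rate"])
    except (KeyError, TypeError, ValueError) as exc:
        raise IntegrationError(f"unexpected response: {response!r}") from exc
```

Bounding the retries matters for efficiency too: an unbounded retry loop against a failing dependency burns CPU and network for no result.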

b) Efficient Code Development

Efficient code development makes the code more readable and maintainable. Writing efficient code includes avoiding memory leaks and excessive CPU cycles, and managing network and infrastructure storage efficiently. It also includes avoiding expensive calls and unnecessary loops, and eliminating nonessential operations.
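A small example of avoiding expensive calls inside loops: the naive version below calls a loader on every iteration and scans a list, while the efficient version loads once and uses a set. `load_blocklist` is a hypothetical stand-in for any expensive call, such as a database query or remote API.

```python
from typing import Callable, Iterable, List


def flag_blocked_naive(order_ids: Iterable[str],
                       load_blocklist: Callable[[], List[str]]) -> List[str]:
    # Wasteful: calls the expensive loader once per order and scans a list
    # (linear per lookup), so total work grows with orders x blocklist size.
    return [oid for oid in order_ids if oid in load_blocklist()]


def flag_blocked(order_ids: Iterable[str],
                 load_blocklist: Callable[[], List[str]]) -> List[str]:
    # Hoist the expensive call out of the loop and build a set once,
    # giving constant average-time membership checks.
    blocked = set(load_blocklist())
    return [oid for oid in order_ids if oid in blocked]
```

Both functions return the same result; for four orders the naive version calls the loader four times, the efficient one once.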

c) Secure Programming Mindset

A secure programming mindset aims to ensure that the software application has no weak security features or vulnerabilities. Secure programming also includes data protection, data encoding, and encryption. Awareness of the OWASP vulnerability list and timely penetration testing help ensure that the application code meets the required level of security.
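Injection is one of the best-known OWASP categories, and parameterized queries are the standard defense. The sketch below uses Python's built-in `sqlite3` with a hypothetical `users` table to contrast splicing user input into SQL text with passing it as a bound parameter.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, username: str) -> list:
    # Vulnerable: user input becomes part of the SQL text.
    return conn.execute(f"SELECT id FROM users WHERE name = '{username}'").fetchall()


def find_user(conn: sqlite3.Connection, username: str) -> list:
    # Parameterized: the driver passes username as data, never as SQL.
    return conn.execute("SELECT id FROM users WHERE name = ?", (username,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

payload = "x' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # 2: the payload matched every row
print(find_user(conn, payload))              # []: treated as a literal name
```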

d) Avoidance of Complexity

Complex code is the hardest to follow and the code developers are most reluctant to modify. A piece of code may later be modified by several different developers, so avoiding complexity when writing it goes a long way toward keeping it maintainable. Reducing the cyclomatic complexity of methods by dividing the code and logic into smaller, reusable components helps the code remain simple and easy to understand.
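One common way to reduce cyclomatic complexity is to replace nested conditionals with a lookup table. In the sketch below, both functions behave identically, but the second has a single path regardless of how many regions are added. The regions and rates are hypothetical.

```python
def shipping_cost_nested(region: str, express: bool) -> float:
    # Every region adds two more branches to follow and test.
    if region == "EU":
        if express:
            return 15.0
        return 5.0
    elif region == "US":
        if express:
            return 20.0
        return 7.0
    else:
        raise ValueError(f"unsupported region: {region}")


# Same behavior expressed as data: a new region means new rows, not new branches.
SHIPPING_RATES = {
    ("EU", False): 5.0,
    ("EU", True): 15.0,
    ("US", False): 7.0,
    ("US", True): 20.0,
}


def shipping_cost(region: str, express: bool) -> float:
    try:
        return SHIPPING_RATES[(region, express)]
    except KeyError:
        raise ValueError(f"unsupported region: {region}") from None
```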

e) Clean Architecture concepts

Understanding Clean Architecture concepts is essential for keeping codebases open to change. In a layered architecture, understanding the problems of tight coupling and weak cohesion improves reusability, minimizes the disruption caused by changes, and avoids rewriting code and capabilities.
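A core Clean Architecture idea is that inner layers depend on abstractions they own, not on infrastructure details. In the Python sketch below, the domain service depends only on an `OrderRepository` protocol, so a database- or API-backed adapter could replace the in-memory one without touching the business logic. All class names and the VAT calculation are hypothetical.

```python
from typing import Dict, Optional, Protocol


class OrderRepository(Protocol):
    """Port owned by the domain layer; infrastructure provides implementations."""

    def get_total(self, order_id: str) -> Optional[float]: ...


class InMemoryOrderRepository:
    """Adapter for tests; a SQL- or API-backed adapter could replace it
    without any change to InvoiceService."""

    def __init__(self, totals: Dict[str, float]):
        self._totals = dict(totals)

    def get_total(self, order_id: str) -> Optional[float]:
        return self._totals.get(order_id)


class InvoiceService:
    """Domain logic that depends only on the OrderRepository abstraction."""

    def __init__(self, repo: OrderRepository, vat_rate: float):
        self._repo = repo
        self._vat_rate = vat_rate

    def invoice_amount(self, order_id: str) -> float:
        total = self._repo.get_total(order_id)
        if total is None:
            raise KeyError(order_id)
        return round(total * (1 + self._vat_rate), 2)
```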

Conclusion

As architects and developers, it is essential to collect green metrics regularly and evaluate the code for compliance and violations. These coding practices can be measured using various static code analysis tools, which can be integrated into the IDE, run at the compilation or deployment stage, or used as standalone tools.
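Real projects should rely on an established static analysis tool, but the core idea behind one widely used metric, cyclomatic complexity, fits in a few lines of Python's `ast` module. This is a simplified approximation (counting branch points and extra boolean operands), not a substitute for such tools; which threshold should fail a build is a team decision.

```python
import ast
from typing import Dict

# Node types that each add one decision point (a common approximation of
# McCabe's cyclomatic complexity; real tools differ in the details).
BRANCH_NODES = (ast.If, ast.IfExp, ast.For, ast.AsyncFor, ast.While,
                ast.ExceptHandler, ast.comprehension)


def cyclomatic_complexity(source: str) -> Dict[str, int]:
    """Approximate complexity per function: 1 + branch points + extra boolean operands.

    Simplification: a nested function's branches also count toward its parent.
    """
    scores: Dict[str, int] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = 1
            for child in ast.walk(node):
                if isinstance(child, BRANCH_NODES):
                    score += 1
                elif isinstance(child, ast.BoolOp):
                    # `a and b and c` adds two decision points
                    score += len(child.values) - 1
            scores[node.name] = score
    return scores
```

Run over a codebase in CI, a check like this can flag functions whose score creeps past an agreed limit, which is where refactoring effort pays off most.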

With organizations in several industries now focusing on their own sustainability goals, green coding practices have become an integral part of every software developer's work. Small tweaks to our development approach can contribute immensely to reducing environmental impact in the long run.

Building Microservices by decreasing Entropy and increasing Negentropy - Series Part 5

A microservices journey is all about gradual overhaul: every time you make a change, you need to keep the system in a better state or the ...