Road to Cloud

The Sweet Spot for Azure Kubernetes Service Multi-Tenancy

2024-12-26T00:00:00+00:00

Introduction

Created with DALL-E 3

Azure Kubernetes Service (AKS) is a powerful tool for managing containerized applications at scale. However, as organizations adopt AKS, the question of multi-tenancy often arises: how do you design an architecture that balances reliability, cost, and security when serving multiple tenants? For smaller companies with limited funding, where cost-optimization is paramount, this question is particularly crucial.

Having worked at a large Fortune 50 organization, where cloud cost was not always a critical strategic decider, I witnessed firsthand how cloud costs, initially seen as manageable, ballooned over time. They became such a significant issue that they contributed to cost-cutting across IT, including layoffs (or RIFs, as they are colloquially known). Now, working for a smaller organization, I’ve come to appreciate the “magic triangle” of cloud cost (directly related to funding), reliability, and security. These three elements must work hand in hand to build a sustainable, scalable, secure, and cost-conscious infrastructure. What does this mean in the context of Azure and Kubernetes?

The answer lies in identifying the “sweet spot” for multi-tenancy. While the Azure Architecture Center outlines several approaches with their pros and cons, one of them fits startups and smaller organizations particularly well. This post explores it: using namespaces to separate tenants’ transactional workloads, combined with strict security measures to maintain isolation and control.

Why Namespace-Based Multi-Tenancy?

Namespaces in Kubernetes provide logical separation within a cluster, making them an ideal candidate for multi-tenancy in cost-conscious environments. Here are a few reasons why namespace-based separation is appealing:

Cost Efficiency: Running separate clusters for each tenant can be prohibitively expensive. By sharing a single AKS cluster, you reduce the operational and financial overhead associated with maintaining multiple clusters.
Resource Management: Kubernetes namespaces allow you to apply resource quotas, limits, and policies at the namespace level. This enables you to allocate resources fairly and prevent tenants from over-consuming cluster resources.
Scalability: A namespace-based approach is well-suited to scaling tenant workloads within a single cluster, allowing smaller companies to start lean and expand gradually.

Achieving Secure Namespace Separation

Security is a common concern when sharing a cluster across multiple tenants. To address this, it is critical to implement robust measures that prevent cross-tenant interference.

Kubernetes’ built-in network policies can enforce strict traffic isolation. For example, you can configure network policies to:

Block communication between namespaces.
Allow traffic only from trusted ingress controllers.
Restrict egress traffic to specific external endpoints.

To limit network traffic for pods within a namespace so that they can only communicate with other pods in the same namespace, you can define a Kubernetes NetworkPolicy as follows:

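   apiVersion: networking.k8s.io/v1
   kind: NetworkPolicy
   metadata:
     name: <policy-name>            # placeholder
     namespace: <tenant-namespace>  # placeholder
   spec:
     podSelector: {}          # applies to every pod in the namespace
     policyTypes:
     - Ingress
     - Egress
     ingress:
     - from:
       - podSelector: {}      # only pods in this same namespace
     # Note: this egress rule also blocks DNS lookups to kube-system;
     # add a rule for DNS if your pods need to resolve names.
     egress:
     - to:
       - podSelector: {}      # only pods in this same namespace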

There you have it! Pretty simple and straightforward. We can use NetworkPolicy to balance security and optimize cost using a single Kubernetes cluster. For more information and more complex scenarios, take a look at the official Kubernetes Network Policies documentation.

Separate Resource Groups

However, I must caution you against using namespaces for anything but purely transactional workloads that do not store data. Separating stored data by namespace is technically possible, but I find it more challenging. While namespaces handle separation within a cluster for transaction-based processing through pods, other Azure resources that store data (such as Storage Accounts, databases, and Key Vaults) should reside in tenant-specific Resource Groups (RGs). Data needs to be strictly separated to maintain compliance and security. Azure Platform as a Service (PaaS) offerings separated into tenant-specific RGs also offload operational burden for organizations with limited staff and funding.

Final Thoughts

For startups and smaller companies, namespace-based multi-tenancy offers a pragmatic balance between cost and control. By leveraging namespaces for separation and implementing strict security measures, you can create a multi-tenant architecture that scales with your business.

PaaS Showdown: Choosing a Hosting Platform for a Ruby on Rails Application

2024-04-05T00:00:00+00:00

Created with DALL-E 3

Introduction

Nowadays there are many ways to host your web app, ranging from containerization for portability to full-fledged IaaS (Infrastructure as a Service) deployments. However, these two options come with significant management overhead, which can be quite challenging. Another solution is choosing a platform that manages this for you; this is often referred to as Platform as a Service (PaaS). Two popular choices are Azure and Heroku. However, the choice between the two depends on the individual application as well as business requirements. Let’s assume this scenario: a team is currently using Azure but wants to consider Heroku. Let’s take a look at how this shakes out!

Requirements

In Scope

How would you begin analyzing a showdown between Azure and Heroku? Obviously, we’ll need to settle on some requirements. For this, let’s consider two important benchmarks: security and performance. This is a business-critical application, so performance is paramount. We’ll also need to make sure to host our database in Azure. Sensitive data will be flowing, so we’ll need to take that into account.

Additional Assumptions

Let’s assume this Ruby on Rails application requires significant uptime. As a result, this business-critical application requires High Availability (HA). Additionally, it is usually best practice for companies to advertise one IP address to the world. Multiple IP addresses can present challenges and unnecessary security risks. This analysis therefore assumes that SNAT (Source Network Address Translation) is required, so that only one IP address is advertised. This avoids exposing additional attack vectors. Finally, one last assumption: this is a standard 3-tier app with a web front-end, an API middle tier, and a back-end database tier.

Out of Scope

For this analysis I have omitted DevOps and CI/CD and how they could affect the outcome. Both platforms offer support for Git and should offer support for popular code repositories like GitHub. However, this adds an additional layer of complexity and takes away from the main concern of analyzing the impact on performance and security. DevOps and CI/CD practices wouldn’t necessarily change that analysis.

Similarly, I have omitted disaster recovery (DR), as this, again, introduces an additional layer of complexity and would require global load balancers (like Traffic Manager). It would not change the analysis of performance and security.

Finally, I have also omitted a Content Delivery Network (CDN) for caching, which, for a critical application, should probably be part of the architecture.

Option 1: Azure

Architecture Option 1

First, we’ll require a Hub-and-Spoke architecture. Most LOB (line of business) applications should be separated into their own subscriptions and, as a result, their own virtual networks (VNets). This creates an additional layer of security because it reduces the blast radius of an attack. Take a look at Azure’s Cloud Adoption Framework Landing Zones for a more detailed deep dive on this. The only public endpoint will be a public IP attached to an Application Gateway (Azure’s Layer 7 load balancer), which will use the WAFv2 SKU to protect incoming web traffic. For application hosting, we should utilize an isolated Internal Load Balancer (ILB) App Service Environment (ASE). This will keep traffic private for security. App Services, Azure’s PaaS option, in an isolated environment can host a Ruby on Rails application either as a Linux web app or inside a Docker container. ASEs are great for workloads that require:

High scale
Isolation and secure network access
High memory utilization
High requests per second (RPS)

You can create multiple App Service Environments in a single Azure region or across multiple Azure regions. This flexibility makes an App Service Environment ideal for horizontally scaling stateless applications with a high RPS requirement.

Additionally, to further keep traffic private, we’ll utilize Private Endpoints to connect to an Azure SQL database. This combination offers full control of traffic flow, and is completely isolated over private network connections within a single LOB VNet with private IPs. This demonstrates network enclaving and offers protection and reduces the blast radius should an attack by a malicious actor occur.

Finally, in order to satisfy our HA requirement for this business-critical application, we’ll utilize zone-redundant deployments for all of our resources.

Pros

Team is familiar with and currently uses Azure
Network enclaving increases security and reduces risk
Fine-grained security control
Better performance, especially when both application and database are hosted on Azure
Integration with Microsoft Ecosystem

Cons

App Service support for Ruby has ended, requiring containerization
Infrastructure complexity
Management overhead

Architecture Option 2

Another option, similar to ASEs, is Azure Container Apps (ACA). Ever wanted to deploy your containers to Kubernetes but found all the plumbing too complicated? This is where ACA comes in as a managed platform built on Kubernetes. Whereas Azure Kubernetes Service (AKS) only abstracts away the Kubernetes control plane, ACA takes it a step further and also abstracts away everything else, making it a fully managed PaaS on top of Kubernetes. Most of the architecture will be similar to our ASE architecture above, but now instead of App Services, we are using ACA containers. The architecture appears simpler and cleaner. We maintain security through enclaving and by deploying ACA into an existing VNet with existing security controls. For more information on this, see Azure Container Apps Environment Security.

Pros

Team is familiar with and currently uses Azure
Network enclaving increases security and reduces risk
Fine-grained security control
Better performance, especially when both application and database are hosted on Azure
Integration with Microsoft Ecosystem
Includes functionality for auto-scaling and versioning

Cons

Kubernetes-specific knowledge still required to reap most benefits
Infrastructure complexity
Management overhead

Option 2: Heroku

Architecture Option 1

The combination of Heroku application hosting and Azure database hosting creates additional complexity. For security, right out of the gate we want to use Heroku Shield Spaces. This will offer isolation and security similar to Azure App Service Environments (ASEs) or Azure Container Apps (ACA) environments for isolated hosting. Here are the major selling points for Heroku Shield:

Dedicated environment for high compliance apps
Ability to sign BAAs for HIPAA compliance
PCI compliance
Keystroke logging
Space level log drains
Strict TLS enforcement

However, we still need to ensure network enclaving for Azure and individual VNets for LOB applications. Additionally, we’ll need to secure traffic in transit between our Ruby on Rails application hosted in Heroku and our database hosted in Azure. For this, we’ll use a Site-to-Site VPN at a minimum. Heroku offers this very capability with Shield Space VPN Connections. This VPN tunnel offers redundancy and IPsec encryption. (Another option would be to use HTTPS/TLS encrypted traffic, but this would not only introduce additional latency, it would also let our sensitive data flow across the internet. This is the least desirable option and I would not recommend it.)

Finally, we’ll still need to use the same Site-to-Site VPN tunnel for egress traffic from Heroku through our Azure VNet in order to expose only one public business IP address using SNAT.

Pros

Heroku Shield is offered for high compliance applications
Ruby applications can run natively and do not require containerization
Managed infrastructure offers simplicity
Applications can be up and running fast

Cons

Limited control due to managed infrastructure
Complicated network routing with additional hops and points of failure between Heroku and Azure
Degraded performance due to additional Site-to-Site VPN hops, which introduce additional latency
Increased security risk because traffic flows over open internet, even though it is encrypted over a VPN tunnel
Team has not worked with Heroku
Potential for vendor lock-in

Architecture Option 2

But what would our Heroku architecture look like if both our web app and database were hosted in Heroku? As can be seen in the diagram below, this represents a massive simplification of the architecture. Not only do we drop the need to containerize our Ruby on Rails application as is the case with Azure PaaS services, but now we can also utilize Heroku Postgres for Shield Spaces. Our entire architecture can now be hosted in Heroku, inside an isolated, secure environment built specifically for high compliance applications.

Pros

Heroku Shield is offered for high compliance applications
Ruby applications can run natively and do not require containerization
Managed infrastructure offers simplicity
Applications can be up and running fast
Heroku Postgres for Shield Spaces addition massively simplifies architecture

Cons

Limited control due to managed infrastructure
Team has not worked with Heroku
Potential for vendor lock-in

Recommendation

The choice between Azure and Heroku application hosting for Ruby on Rails depends on project size, complexity, priorities, level of needed control and business direction. Azure offers ASEs and ACAs as PaaS options for performance and high scale through isolation. This is ideal for larger, complex applications requiring flexibility, scalability and fine-grained control. Similarly, a team’s familiarity and built-out processes currently existing with Azure should not be discounted, because migrating those to Heroku could introduce additional cost and risk.

However, Heroku could offer faster application deployments for tasks such as prototyping and proofs of concept. Heroku is great for simplicity and fast deployment, because it does not require containerization for Ruby on Rails applications. In addition, if we drop the requirement to keep data on Azure, Heroku offers a clear choice: it massively simplifies the cloud architecture and its maintenance. However, if there is a hard requirement to keep data within Azure, this creates additional complexity and really handicaps Heroku as a choice, because a Site-to-Site VPN would degrade performance and decrease security.

In the end, if a team currently uses containers to deploy a Ruby on Rails application, and absolutely must maintain data within Azure, I would stick to either Azure PaaS offering for web app hosting: Azure App Service Environments or Azure Container Apps. The added overhead of setting up and maintaining a Site-to-Site VPN between Heroku and Azure opens up security risks and decreases performance. However, without the data-in-Azure requirement, Heroku would be the clear winner and a much better choice for developer experience, decreased management overhead, and deployment simplicity. The choice depends on data hosting preference, future business direction, and a more detailed price comparison.

Deploy Azure Kubernetes Service (AKS) using GitHub and Terraform

2024-03-22T00:00:00+00:00

TL;DR

If you wanna jump straight into it, check out my GitHub repository for this project here. It includes a step-by-step guide covering prerequisites, configuration, and deployment of AKS using OIDC, Terraform, and GitHub Actions. The goal of this project is to demonstrate the three (3) main points below:

Use of OIDC for Microsoft Entra and GitHub Actions authentication,
Use of Azure Storage Account as Terraform Backend for state storage, and
Deployment of minimum viable product (MVP) Azure Kubernetes Service (AKS).

Flexibility, Portability and Security with Kubernetes, Terraform, GitHub, and OpenID Connect (OIDC)

Let’s talk about something that sounds complicated but is actually not as bad as it might seem: deploying Azure Kubernetes Service (AKS) using Terraform, orchestrated through GitHub Actions. And the cherry on top? Let’s make this entire process seamless and more secure by integrating OpenID Connect (OIDC) for authentication. This will give your developer experience a major upgrade!

Keeping It Simple with Trunk-Based Development

For this project, I kept it simple with a Trunk-Based Development strategy. Simply put, it makes the most sense for a single person or a very small, highly skilled team working on a project. For a complete overview of all the different ways you can structure your repositories and workflow strategies, see my blog post on Branching Strategies.

Goodbye Passwords, Hello OIDC

Imagine deploying infrastructure without worrying about safeguarding a pile of secrets or passwords. From an engineer’s or software developer’s lens, secrets present a host of challenges: how do you store, manage, and rotate them? Thanks to OIDC, that daunting endeavor is a thing of the past! OIDC simplifies authenticating with Azure, meaning less time fretting over security and more time doing what you love: crafting great code. It’s a smarter, streamlined way to work, and honestly, who doesn’t want that? For more information on how OIDC works with Entra, Azure, GitHub and Azure DevOps, see my blog post titled Authenticating GitHub and Azure DevOps using OpenID Connect. This project uses OIDC. It just makes sense.

Flexibility and Portability

Kubernetes

Why make a big deal about Kubernetes, or K8s as the cool kids call it? Simply put, it gives you the flexibility to manage your applications with ease, regardless of where you choose to run them. The major Cloud Service Providers (CSPs) have all jumped on board: Microsoft with Azure Kubernetes Service (AKS), AWS with Elastic Kubernetes Service (EKS), and Google Cloud Platform (GCP) with Google Kubernetes Engine (GKE). Think of K8s as your cloud Swiss Army knife, ready to adapt to whatever your project needs. K8s is a cloud-agnostic platform, pitting the major CSPs against each other to earn your business. Competition is a good thing for the consumer (i.e. you and me!).

Terraform

Finally, what’s all the Terraform fuss about? Some might say: can’t I just use Azure Resource Manager (ARM) Templates? Well, technically, yes. But Terraform is a cross-CSP, declarative Infrastructure as Code (IaC) deployment tool, and learning it gives you the same flexibility for IaC that K8s gives you for application hosting. A CSP decides to raise prices for infrastructure? Don’t fret, take your Terraform and Kubernetes knowledge to another vendor! Once again, competition is the name of the game. Let the CSPs compete for your business!

Why This All Matters

Let’s make sure we hit the point of this post home. In the grand scheme of things, embracing tools such as OIDC, Kubernetes, Terraform and GitHub is about remaining flexible and secure in a technology landscape that is evolving at a rapid pace. It’s about ensuring that as developers, architects, or managers, we’re not just keeping pace but setting the pace, ready to adapt and thrive no matter what comes our way.

So, here’s to making deployment a smoother, more secure part of our development journey. Ready to take the next step? Let’s dive in and see just how much easier and more enjoyable your work can become. Check out the entire repo here!

Brewing Healthy Cloud Applications

2024-03-04T00:00:00+00:00

Introduction

One common misconception is that highly available, resilient, and elastic cloud architectures represent a silver bullet. This is not necessarily the case. Cloud architecture is tightly coupled to healthy programming techniques: a healthy cloud architecture and healthy code hosted on that architecture are two sides of the same coin. This fundamental concept often gets lost. Consider having the most resilient, multi-AZ (Availability Zone), multi-Region cloud architecture that indiscriminately autoscales additional nodes horizontally to handle additional load caused by sudden spikes of REST API calls. Now also consider that these additional calls are caused by unhealthy code. This is not only inefficient; it can crash a REST API, bring down an entire cloud infrastructure, and cause runaway costs from the additional horizontal scaling. In this post, let’s examine healthy programming techniques such as reusing HTTP connections and building thread-safe code with a coffee shop metaphor.

In the world of software development, optimizing cloud application performance, especially REST APIs, is similar to managing a highly efficient coffee shop. Just as a coffee shop tries to serve as many customers as efficiently as possible, cloud applications try to handle requests and tasks effectively. What can coffee shops teach us about reusing HTTP connections, thread-safe programming, avoiding deadlocks, and employing the singleton pattern for better performance and resource management? Let’s dive in!

Reusing HTTP Connections: The Art of Efficient Service

Imagine as soon as you walk into your favorite coffee shop, the barista immediately serves your favorite coffee from a large coffee pot. I’d call that efficient service.

Creating a new connection for each HTTP request (similar to grinding new coffee beans for every cup) can put additional load on your web servers, aka cloud compute nodes. This can quickly overwhelm a node in a cloud architecture, causing autoscaling to kick in. Modern HTTP client libraries use connection pooling, similar to a coffee shop having a large pot of coffee ready to serve multiple customers quickly. This approach minimizes the overhead of establishing new connections for every request, similar to avoiding the time-consuming process of grinding beans and brewing coffee for every single cup of coffee.
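As a minimal sketch of connection reuse in .NET, assuming a single shared client suits the application: the class name SharedHttp and the two-minute lifetime are illustrative choices, not part of the original project. SocketsHttpHandler.PooledConnectionLifetime is the standard setting for recycling pooled connections.

```csharp
using System;
using System.Net.Http;

public static class SharedHttp
{
    // One handler owns the connection pool; every request sent through
    // Client draws from that pool instead of opening a fresh connection.
    private static readonly SocketsHttpHandler Handler = new SocketsHttpHandler
    {
        // Illustrative value: recycle pooled connections periodically
        // so that DNS changes are eventually picked up.
        PooledConnectionLifetime = TimeSpan.FromMinutes(2)
    };

    // The "large pot of coffee": created once, reused for every request.
    public static readonly HttpClient Client = new HttpClient(Handler);
}
```

Callers would then use SharedHttp.Client.GetAsync(...) everywhere rather than constructing a new HttpClient per request.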

Singleton Pattern: One Coffee Machine to Serve Them All

The singleton pattern in software development is like having a single, highly efficient coffee machine in a shop that all baristas share. This pattern ensures that only one instance of a resource (e.g., an HttpClient in .NET) is created and reused across the application, optimizing resource usage and cloud hosting cost and avoiding runaway autoscaling of cloud compute nodes.
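To make the pattern concrete, here is a small sketch of a thread-safe singleton built on Lazy<T>. The CoffeeMachine class and its Brew method are hypothetical names that follow the metaphor; they are not taken from a real library.

```csharp
using System;
using System.Threading;

public sealed class CoffeeMachine
{
    // Lazy<T> with a value factory is thread-safe by default, so the
    // machine is constructed exactly once even if many threads ask at once.
    private static readonly Lazy<CoffeeMachine> LazyInstance =
        new Lazy<CoffeeMachine>(() => new CoffeeMachine());

    private int _cupsBrewed;

    // Private constructor: nobody outside this class can build a second machine.
    private CoffeeMachine() { }

    public static CoffeeMachine Instance => LazyInstance.Value;

    // Atomically counts a cup and returns the running total.
    public int Brew() => Interlocked.Increment(ref _cupsBrewed);
}
```

Every caller that reads CoffeeMachine.Instance receives the same object, which is the property the shared coffee machine in the metaphor relies on.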

Dependency Injection (DI)

Use dependency injection to implement the singleton pattern - it is like a coffee shop where the manager ensures that all baristas use the same coffee machine, maintaining efficiency and consistency. It allows for flexible configuration and easy sharing of the coffee machine (or HttpClient) across different parts of the application.

In a .NET Core or .NET 5/6/7/8 application, you can use the built-in DI container to manage HttpClient instances efficiently. With AddHttpClient, the typed client is created whenever it is injected, while the underlying message handlers and their connections are pooled and reused behind the scenes. That reuse is what matters for managing connections and resources effectively.

Step 1: Define Typed Client

First, create a class that will serve as your typed client. This class will encapsulate all logic for making HTTP requests to a specific external service.

using System.Net.Http;
using System.Threading.Tasks;

public class CoffeeServiceClient
{
    private readonly HttpClient _httpClient;

    public CoffeeServiceClient(HttpClient httpClient)
    {
        _httpClient = httpClient;
        // Assuming the external service requires an API key in the header.
        // Placeholder only: load the real key from configuration or a secret store.
        _httpClient.DefaultRequestHeaders.Add("ApiKey", "YourApiKeyHere");
        // Other configuration goes here
    }

    public async Task<string> GetCoffeeAsync(string typeOfRoast)
    {
        var response = await _httpClient.GetAsync($"coffee/{typeOfRoast}");
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}

Step 2: Configure the Typed Client in Program.cs

In your Program.cs, register the typed client with the dependency injection (DI) container using AddHttpClient. This method allows you to configure the HttpClient that will be injected into your typed client.

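using System;
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = WebApplication.CreateBuilder(args);

// Register MVC controllers so that app.MapControllers() can find them
builder.Services.AddControllers();
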
// Register the typed client with HttpClient configured
builder.Services.AddHttpClient<CoffeeServiceClient>(client =>
{
    client.BaseAddress = new Uri("https://api.coffeeapi.com/v1/");
    // Set other default configurations here
});

var app = builder.Build();

// Map controllers and run app (assuming you're using controllers)
app.MapControllers();

app.Run();

Step 3: Use the Typed Client in a Controller

Inject the typed client into your controllers or services where you need to make HTTP requests.

using Microsoft.AspNetCore.Mvc;
using System.Net.Http; // needed for HttpRequestException
using System.Threading.Tasks;

[ApiController]
[Route("[controller]")]
public class CoffeeController : ControllerBase
{
    private readonly CoffeeServiceClient _coffeeServiceClient;

    public CoffeeController(CoffeeServiceClient coffeeServiceClient)
    {
        _coffeeServiceClient = coffeeServiceClient;
    }

    [HttpGet("{typeOfRoast}")]
    public async Task<IActionResult> Get(string typeOfRoast)
    {
        try
        {
            var data = await _coffeeServiceClient.GetCoffeeAsync(typeOfRoast);
            return Ok(data);
        }
        catch (HttpRequestException e)
        {
            return StatusCode(500, e.Message);
        }
    }
}

Thread-Safe Programming: The Coordination of Multiple Baristas

In a busy coffee shop, multiple baristas (threads) work in parallel to serve customers efficiently. However, without proper coordination, they might bump into each other or duplicate efforts, leading to wasted resources and time. Similarly, in software applications, thread-safe programming ensures that multiple threads access shared resources (like a shared coffee machine) in a manner that prevents conflicts and ensures consistency.
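A minimal sketch of that coordination, assuming a shared tip jar that several baristas update at once. TipJar and TipJarDemo are hypothetical names used only for illustration. The lock ensures that only one thread touches the total at a time, so no update is lost.

```csharp
using System.Threading.Tasks;

public class TipJar
{
    private readonly object _gate = new object();
    private int _total;

    // Only one barista (thread) may update the total at a time.
    public void Add(int amount)
    {
        lock (_gate)
        {
            _total += amount;
        }
    }

    public int Total
    {
        get { lock (_gate) { return _total; } }
    }
}

public static class TipJarDemo
{
    // Simulates several baristas adding one-unit tips in parallel.
    public static int Run(int baristas, int tipsEach)
    {
        var jar = new TipJar();
        Parallel.For(0, baristas, _ =>
        {
            for (var i = 0; i < tipsEach; i++)
            {
                jar.Add(1);
            }
        });
        return jar.Total;
    }
}
```

Without the lock, the unsynchronized `_total += amount` could lose updates when two threads read the same old value, and the final total would come up short.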

Key Concepts

First, some key concepts, simplified:

Synchronization Context: Imagine the coffee shop has a rule: When your coffee is ready, it must be handed to you personally, and you must receive it where you placed the order. This “personal handover” rule is like the synchronization context in programming, ensuring some tasks are completed in a specific “place” or thread.
Asynchronous Task (async/await): An asynchronous task is like ordering a coffee at a busy coffee shop. You place your order (async) and then wait (await) for your name to be called when the coffee is ready. While waiting, you can do other things instead of standing still and blocking the line of other people wanting to order.
Blocking Calls (.Result or .Wait()): This is like insisting on standing at the counter, staring at the barista uncomfortably until your coffee is ready, not doing anything else, and not letting anyone else order.

Deadlock Scenario: The Uncomfortable Stare

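This example demonstrates how blocking with .Result or .Wait() can lead to a deadlock when a synchronization context is captured, as in UI applications or classic ASP.NET. ASP.NET Core itself has no synchronization context, so in a controller like this one the same blocking call typically surfaces as thread-pool starvation under load rather than a hard deadlock; either way it should be avoided: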

[HttpGet("{typeOfRoast}")]
public IActionResult Get(string typeOfRoast)
{
    try
    {
        // Incorrectly using .Result can lead to a deadlock
        var data = _coffeeServiceClient.GetCoffeeAsync(typeOfRoast).Result;
        return Ok(data);
    }
    catch (HttpRequestException e)
    {
        return StatusCode(500, e.Message);
    }
}

Placing the Order (Calling the Async Method): You go to the coffee shop and order a coffee (call an asynchronous method, GetCoffeeAsync). The coffee shop is busy (the system is doing work), so you’re told to wait for your name to be called (the async operation will complete in the future).
Waiting for the Coffee (Blocking on the Async Task): Instead of waiting normally, you decide to stand at the counter, not move, and stare at the barista uncomfortably until you get your coffee (using .Result or .Wait()), effectively blocking anyone else from ordering (blocking the main thread).
The Coffee Shop Rule (Synchronization Context): The coffee shop has a rule: Coffee must be handed to you personally at the counter (the continuation after an await must happen on the original synchronization context, or thread). But, because you’re blocking the counter (the main thread), the barista can’t serve anyone else, nor can they complete your order, because they need you to step aside to finish it (the async operation needs the blocked thread to continue).
The Deadlock: You’re waiting for your coffee to be ready before you move (waiting for the async operation to complete), but the coffee shop can’t finish making your coffee until you stop blocking the counter and stop making the barista uncomfortable with your stare (the system can’t complete the async operation because it’s waiting for the blocked thread to become available). As a result, everything stops. You’re not moving. The barista can’t serve your coffee. No one else can order. This standstill is the deadlock.

Avoiding Deadlocks

To avoid the deadlock, you can ensure that the asynchronous method is allowed to complete without blocking the main thread. Here’s how you can do it:

[HttpGet("{typeOfRoast}")]
public async Task<IActionResult> Get(string typeOfRoast)
{
    try
    {
        // Properly awaiting the asynchronous operation
        var data = await _coffeeServiceClient.GetCoffeeAsync(typeOfRoast);
        return Ok(data);
    }
    catch (HttpRequestException e)
    {
        return StatusCode(500, e.Message);
    }
}

Just like in the coffee shop, where you could wait for your coffee without blocking the counter (maybe sit down or step aside), programming has a way to avoid deadlocks.

Use await properly: This is like waiting for your coffee without blocking the counter. You allow other customers to order, and the barista to serve other orders while yours is being prepared.
ConfigureAwait(false): This tells the system, “I don’t need to receive my coffee exactly at the counter where I ordered. You can call my name, and I’ll pick it up wherever I am.” This means the continuation of your task doesn’t need to be on the original thread, avoiding the need for you to block any “place” or thread.
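The ConfigureAwait(false) option can be sketched as follows for library-style code that does not need to resume on the caller’s context. RoastCatalog is a hypothetical class modeled on the typed client shown earlier, and the relative URL is the same illustrative one.

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public class RoastCatalog
{
    private readonly HttpClient _httpClient;

    public RoastCatalog(HttpClient httpClient) => _httpClient = httpClient;

    public async Task<string> GetRoastAsync(string typeOfRoast)
    {
        // ConfigureAwait(false): the continuation may resume on any
        // available thread instead of the captured context.
        var response = await _httpClient
            .GetAsync($"coffee/{typeOfRoast}")
            .ConfigureAwait(false);

        response.EnsureSuccessStatusCode();

        return await response.Content
            .ReadAsStringAsync()
            .ConfigureAwait(false);
    }
}
```

This is mostly relevant for reusable library code. Application code, such as the controller above, is usually better served by simply awaiting all the way up.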

Conclusion

Just as the goal of a coffee shop is to serve its customers efficiently and effectively, the goal of software development is to create applications that handle tasks and requests with optimal performance. By drawing lessons from the operation of a coffee shop—reusing resources like HTTP connections, ensuring thread-safe programming, avoiding deadlocks, and efficiently managing shared resources through the singleton pattern—we can brew applications that stand out for their performance and reliability, much like a cup of finely crafted coffee.

Authenticating GitHub and Azure DevOps using OpenID Connect

2024-03-01T00:00:00+00:00

Introduction

What if I told you we could improve developer experience and security with DevOps pipelines? OpenID Connect (OIDC) to the rescue! Read on for more info!

Traditional Secrets Management

Traditionally teams have used Entra ID (formerly Azure AD, RIP) service principals to allow Azure DevOps and GitHub Workflows to deploy resources using Azure Resource Manager to Azure. This included setting up App Registrations, which are service principals with appropriate role-based access controls (RBAC). This is still documented in Microsoft’s documentation here.

The problem with this approach is that we are now creating a client secret, basically a password, which has to be stored, managed and secured somewhere, just like any other password. This could be Azure Key Vault or HashiCorp Vault. If this client secret (password) is ever exposed, it becomes a security risk. It also makes for a less than ideal developer experience.

A Better Approach with OIDC

A solution that is becoming increasingly popular involves using token exchange with OpenID Connect (OIDC). Microsoft even recommends this in their latest and greatest documentation:

However, using hardcoded secrets requires you to create credentials in the cloud provider and then duplicate them in GitHub as a secret. With OpenID Connect (OIDC), you can take a different approach by configuring your workflow to request a short-lived access token directly from the cloud provider. Your cloud provider also needs to support OIDC on their end, and you must configure a trust relationship that controls which workflows are able to request the access tokens. Providers that currently support OIDC include Amazon Web Services, Azure, Google Cloud Platform, and HashiCorp Vault, among others.

A step-by-step guide to configure OIDC using Workload Identity Federation can be found here.

Azure DevOps, Entra ID OIDC Authentication

A picture is worth a thousand words. I’ve laid out the authentication flow for Azure DevOps in the diagram below. No more secrets management!

Azure DevOps, Entra ID, Azure Flow
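On the pipeline side, this flow mostly comes down to referencing a service connection that was created with Workload Identity Federation. Below is a minimal sketch, not a definitive setup. It assumes such a service connection already exists; the name my-oidc-service-connection is a hypothetical placeholder, and the inline script only prints the subscription the pipeline signed in to.

```yaml
# azure-pipelines.yml (sketch)
trigger: none   # run manually while experimenting

pool:
  vmImage: ubuntu-latest

steps:
  - task: AzureCLI@2
    displayName: Show the subscription the pipeline signed in to
    inputs:
      # Hypothetical service connection configured with Workload Identity Federation.
      azureSubscription: my-oidc-service-connection
      scriptType: bash
      scriptLocation: inlineScript
      inlineScript: az account show --output table
```

Note that no client secret appears anywhere in the pipeline definition; the trust relationship lives in the service connection and its federated credential.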

Microsoft’s Chief Architect John Savill Explains

Microsoft’s Chief Architect John Savill explains OIDC authentication with Workload Identity Federation in his awesome YouTube videos that I have linked below.

GitHub, Entra ID OIDC Authentication

How does this flow look for GitHub and GitHub Actions workflows? Pretty similar to Azure DevOps and Azure Pipelines. Check it out: the secrets are gone!

GitHub, Entra ID, Azure Flow
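In a workflow file, the flow above could look roughly like the sketch below. It assumes an Entra ID app registration with a federated credential for this repository has already been set up, and that its identifiers are stored under the names AZURE_CLIENT_ID, AZURE_TENANT_ID and AZURE_SUBSCRIPTION_ID (these names are my assumption; they are identifiers, not passwords).

```yaml
name: Azure login with OIDC (sketch)

on:
  workflow_dispatch:

permissions:
  id-token: write   # allows the job to request an OIDC token from GitHub
  contents: read

jobs:
  login:
    runs-on: ubuntu-latest
    steps:
      - name: Sign in to Azure without a client secret
        uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - name: Verify the session
        run: az account show --output table
```

The id-token: write permission is the important part: without it the job cannot request the token that Entra ID exchanges for an access token.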

Microsoft’s Chief Architect John Savill Explains

Once again, also check out Microsoft’s Chief Architect John Savill explain this OIDC flow for GitHub.

Deploying to Azure using GitHub Actions and Terraform Cloud

2024-02-19T00:00:00+00:00

Azure Function App Project

Overview

This project demonstrates the deployment of an Azure Function App using Terraform, Terraform Cloud and GitHub Actions for CI/CD. The full code can be found in my GitHub repo here.

Prerequisites

Before you begin, you’ll need to have the following:

An Azure subscription.
A GitHub account.
A Terraform Cloud account, with a workspace configured and mapped to this repository.
Azure CLI installed locally (for development and testing).
Your favorite IDE - I prefer Visual Studio Code with the below extensions installed
- GitHub Actions
- HashiCorp HCL
- HashiCorp Terraform

Configuration

Azure Subscription

Azure Service Principal: Create a service principal with Contributor access and configure it as a variable in Terraform Cloud. You can accomplish this in the Azure Portal or via the Azure CLI. There are many guides out there for this. Contributor access is appropriate only for this tutorial. In practice, you will want to adhere to the Principle of Least Privilege and assign only the role-based access controls (RBAC) needed to deploy to the appropriate scope.

Terraform Cloud

Workspace Setup: Ensure your Terraform Cloud workspace is set up and linked to your GitHub repository containing the Terraform configuration.
Variables: Configure the necessary environment variables and Terraform variables in your Terraform Cloud workspace. This includes Azure credentials, resource naming, and any other configurations specific to your deployment.
State Storage: Terraform Cloud will automatically manage the state of your infrastructure, providing a secure and collaborative environment for your team.

GitHub Actions

Workflow Configuration: The .github/workflows directory contains the YAML files for GitHub Actions. These define the CI/CD pipeline. Deployments are currently configured to be triggered manually only. This is accomplished with a workflow_dispatch trigger in the GitHub Actions workflows plan-apply.yml and destroy.yml.
Secrets: Set up the required secrets in your GitHub repository’s secrets and variables settings. This should include the Terraform Cloud access token, TF_API_TOKEN.
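To illustrate how the manual trigger and the TF_API_TOKEN secret fit together, here is a minimal sketch of what a workflow like plan-apply.yml could look like. It is not the exact file from the repo. It assumes a CLI-driven Terraform Cloud workspace and a Terraform configuration in the repository root that already points at that workspace.

```yaml
name: Plan and apply (sketch)

on:
  workflow_dispatch:   # manual runs only

jobs:
  terraform:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Terraform with Terraform Cloud credentials
        uses: hashicorp/setup-terraform@v3
        with:
          cli_config_credentials_token: ${{ secrets.TF_API_TOKEN }}

      - name: Initialize
        run: terraform init

      - name: Plan
        run: terraform plan

      - name: Apply
        run: terraform apply -auto-approve
```

A destroy workflow would follow the same shape, with terraform destroy -auto-approve as the final step.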

What exactly does CI/CD mean?

2024-02-01T00:00:00+00:00

I often see people confusing the terms Continuous Integration, Continuous Delivery and Continuous Deployment. Let’s break it down in simpler terms.

Continuous Integration (CI) involves developers merging their tested code into a common branch. Depending on the branching strategy used, this can be as often as multiple times a day, or a few times a week. The next phase is Continuous Delivery (CD). Here, we need to make sure to package the code in an artifact and store it somewhere - this is our release artifact. Finally, there is Continuous Deployment (CD). This takes Continuous Delivery (CD) a step further by sending our finished release artifact out into the real world for people to use. It’s all about making things smoother and faster!
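The three terms can be mapped onto three jobs in a pipeline. The sketch below is illustrative only: the scripts under scripts/ and the dist/ folder are hypothetical placeholders for whatever your project uses to test, package and deploy.

```yaml
name: CI/CD stages (sketch)

on:
  push:
    branches: [main]

jobs:
  integrate:            # Continuous Integration: merged code is built and tested
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: bash scripts/test.sh        # hypothetical test script

  deliver:              # Continuous Delivery: package and store the release artifact
    needs: integrate
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: bash scripts/package.sh     # hypothetical; assumed to write to dist/
      - uses: actions/upload-artifact@v4
        with:
          name: release
          path: dist/

  deploy:               # Continuous Deployment: ship the artifact to users
    needs: deliver
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: release
          path: dist
      - run: bash dist/deploy.sh         # hypothetical deploy script shipped in the artifact
```

Removing the deploy job leaves you with Continuous Delivery; keeping it, so every successful run goes out automatically, is Continuous Deployment.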

Keep it Lean and Eliminate Waste

2023-12-30T00:00:00+00:00

"Change is the only constant." Heraclitus, Greek Philosopher

In the world of cloud computing, acknowledging and embracing change becomes not just a philosophy but a practical approach for staying competitive and efficiently utilizing the ever-evolving capabilities of the cloud.

“Keep it lean and eliminate waste” is a guiding principle across Lean, Agile, DevOps, and software development. In Lean, the focus is on minimizing unnecessary processes and maximizing value, ensuring efficiency. Agile methodologies emphasize iterative development, adapting to changes, and delivering functional software incrementally, aligning with the idea of keeping it lean. DevOps extends this concept to the entire software development lifecycle, promoting collaboration and automation to eliminate bottlenecks and enhance efficiency. In software development, the mantra encourages teams to cut out unnecessary features, streamline workflows, and stay agile to create products that meet user needs effectively. It’s a unifying philosophy that promotes efficiency and continual improvement in the dynamic landscape of software development.

Lean, Agile and DevOps represent systems to force you to stop pretending that you know more than you really do. We must accept predictable unpredictability, embrace uncertainty, “epistemic humility”.

Use data to create knowledge. Use Machine Learning (ML) and Artificial Intelligence (AI) for augmentation and efficiency, resilience and ultimately customer experience.

The Seven Pillars

With the aim of keeping it lean and eliminating waste, I present the Seven Pillars. These pillars provide a systematic and collaborative approach to work toward becoming lean and eliminating waste.

Guiding Principles

Guiding principles in the context of cloud computing and software development serve as a collective North Star for teams, offering a shared vision and a set of fundamental beliefs that guide decision-making and actions. These principles become the foundation upon which shared goals are built, creating a sense of “skin in the game” for all team members. The guiding principles for this project rest on two great books - Implementing Lean Software Development and The DevOps Handbook. Each provides its own guiding principles, listed below: first the seven principles from Implementing Lean Software Development, then the three from The DevOps Handbook.

1. Eliminate Waste
2. Build Quality In (“Poka Yoke”)
3. Create Knowledge (“Epistemic Humility”)
4. Defer Commitment
5. Deliver Fast
6. Respect People
7. Optimize the Whole

1. The Principles of Flow
2. The Principles of Feedback
3. The Principles of Continual Learning

When teams align around shared goals based on these guiding principles, it fosters a collaborative environment where everyone is working towards a common objective. This shared commitment not only enhances teamwork but also cultivates a collective responsibility for the success of the project or initiative.

Furthermore, the opportunity to define shared goals provides a chance to tie Objectives and Key Results (OKRs) together. By aligning individual and team OKRs with overarching guiding principles, there is a cohesive framework that ensures everyone is moving in the same direction. This alignment not only streamlines efforts but also enhances the clarity and purpose of each team member’s contribution.

In essence, the combination of guiding principles, a North Star vision, and shared goals creates a powerful synergy. It establishes a strong foundation for collaboration, encourages a sense of ownership, and allows teams to navigate the complex landscape of cloud computing and software development with a unified purpose.

Guiding Principles, Education & Training

This is where the rubber starts to meet the road, from theory to practice. In education and training for cloud computing and software development, “Where the rubber meets the road” emphasizes the crucial application of knowledge. Achieving alignment on guiding principles becomes paramount at this juncture. Instilling a shared understanding of the aforementioned guiding principles lays a solid foundation. It ensures that theoretical learning translates seamlessly into practical application, guiding individuals as they navigate the challenges of real-world scenarios in cloud computing and software development projects. This alignment not only encourages collaboration but also empowers teams to confidently apply principles when it matters most – in the practical implementation of their skills.

Visibility, Accountability, Feedback

The importance of visibility, accountability, and feedback cannot be overstated.

Visibility ensures that everyone has a clear understanding of the data, metrics, and insights driving decision-making. This transparency is crucial as it aligns actions with guiding principles and allows teams to address issues openly.

Accountability, underpinned by data, metrics, and insights from Machine Learning (ML) and Artificial Intelligence (AI), establishes a foundation of trust. Empowering individuals to take ownership of their roles and linking shared OKRs to guiding principles fosters a sense of responsibility.

And finally feedback mechanisms complete this triad, providing a continuous loop for improvement. The Guiding Principles Scorecard becomes a dynamic tool, incorporating feedback to adapt and refine principles over time.

This iterative process ensures that guiding principles remain relevant, responsive to changing dynamics, and effective in guiding decisions and actions in the ever-evolving landscape of cloud computing and software development. In essence, visibility, accountability, and feedback form a symbiotic relationship, creating a framework that not only guides but also adapts, ensuring the resilience and relevance of guiding principles.

Self-service, Automation

In the landscape of software development and cloud computing, the concept of “One Entry Door with Automated Blueprints” becomes a pivotal solution to address the challenges posed by the shift left, YBYR (you build it, you run it) approach, which can lead to cognitive overload and increased complexity.

As teams navigate the evolving responsibilities brought by this paradigm shift, the need for self-service and automation becomes paramount. The complexity of tasks, such as dealing with intake forms, approval boards, documentation, and adherence to various patterns across DevSecFinMLOps, can be overwhelming. The traditional manual processes not only create inefficiencies and waste but also hinder the swift implementation and delivery of solutions.

The introduction of a single entry door, embodied by a centralized portal like Spotify’s Backstage, or any tool that consolidates access, offers a solution. This self-service portal serves as a unified destination, eliminating the need for teams to navigate multiple portals for diverse sets of requirements. It streamlines the process, reduces cognitive load, and accelerates the lead time and cycle time for delivering solutions.

Crucially, this central portal goes beyond just providing access; it incorporates self-service automated architecture blueprints based on enterprise-approved patterns and requirements. This not only empowers teams to initiate projects with ease but also ensures compliance with guiding principles and accelerates the adoption of best practices.

In conclusion, the integration of a single entry door with automated blueprints aligns with the principles of visibility, accountability, and feedback. It not only simplifies the onboarding process for teams but also contributes to the efficiency and resilience of software development and cloud computing practices in the face of evolving responsibilities and technologies.

Building a Cloud-Agnostic Web Application

2023-12-28T00:00:00+00:00

One of the core business tenets is business agility - the ability to adapt to changing market conditions. Businesses must also ensure availability through business continuity best practices. Businesses either change or die. How does this affect cloud in particular?

In general, a modular, cloud-agnostic application that follows the 12 factor methodology can provide business agility, continuity, scalability and cost efficiency. Below I demonstrate avoiding vendor lock-in while using PaaS with an array of architectures, though similar principles apply to XaaS (read “any”-as-a-Service). A modern web application is built using a modern cross-platform framework, such as ASP.NET Core 7 MVC, which can run on any system - Linux based or Windows based. One can take either a code-first or a database-first approach. This choice will largely depend on the size of the application and enterprise or team complexity - smaller teams may benefit from code-first, but larger teams often must follow strict change management processes, which require a database-first approach. The diagrams below show a systematic approach to designing such a modern web app. More detail and sample code can be found in my GitHub repo found here: GitHub Platform Agnostic App.

Architecture Diagrams

Logical Architecture using MVC Pattern

Microsoft Azure Physical Architecture

Google Cloud Platform Architecture

CI/CD with GitHub Actions

GitHub Actions for Microsoft Azure

name: GitHub Actions for Azure

env:
  AZURE_WEBAPP_NAME: friendly-octo-giggle   # set this to your application's name
  AZURE_WEBAPP_PACKAGE_PATH: '.'      # set this to the path to your web app project, defaults to the repository root
  DOTNET_VERSION: '6.0.x'                 # set this to the .NET Core version to use

on: 
  workflow_dispatch:
        
jobs:
  build:
    runs-on: ubuntu-latest
    
    steps:
      - uses: actions/checkout@v4

      - name: Set up .NET Core
        uses: actions/setup-dotnet@v3
        with:
          dotnet-version: ${{ env.DOTNET_VERSION }}

      #substitute production appsettings entries to appsettings json file
      - name: App Settings Variable Substitution
        uses: microsoft/variable-substitution@v1
        with:
          files: './BethanysPieShop/appsettings.json'
        env:
          ConnectionStrings.BethanysPieShopDbContextConnection: ${{ secrets.AZURE_DB_CONNECTION_STRING }}
          
      - name: Build with dotnet
        run: dotnet build --configuration Release

      - name: dotnet publish
        run: dotnet publish -c Release -o ${{env.DOTNET_ROOT}}/myapp

      - name: Upload artifact for deployment job
        uses: actions/upload-artifact@v3
        with:
          name: .net-app
          path: ${{env.DOTNET_ROOT}}/myapp    

  deploy:
    runs-on: ubuntu-latest
    needs: build
    environment:
      name: 'production'
      url: ${{ steps.deploy-to-webapp.outputs.webapp-url }}

    steps:
      - name: Download artifact from build job
        uses: actions/download-artifact@v3
        with:
          name: .net-app
          
      - name: Deploy to Azure Web App
        id: deploy-to-webapp
        uses: azure/webapps-deploy@v2
        with:
          app-name: ${{ env.AZURE_WEBAPP_NAME }}
          publish-profile: ${{ secrets.AZURE_WEBAPP_PUBLISH_PROFILE }}
          package: ${{ env.AZURE_WEBAPP_PACKAGE_PATH }}

GitHub Actions for Google Cloud Platform

name: GitHub Actions for GCP

on: 
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      # Substitute production appsettings entries to appsettings json file
      - name: App Settings Variable Substitution
        uses: microsoft/variable-substitution@v1
        with:
          files: './BethanysPieShop/appsettings.json'
        env:
          ConnectionStrings.BethanysPieShopDbContextConnection: ${{ secrets.GCP_DB_CONNECTION_STRING }}

      - id: 'auth'
        name: Authenticate to Google Cloud
        uses: 'google-github-actions/auth@v1'
        with:
          credentials_json: '${{ secrets.GCP_CREDENTIALS }}'
          
      - name: 'Set up Google Cloud SDK'
        uses: 'google-github-actions/setup-gcloud@v1'
          
      - name: Deploy to App Engine
        run: gcloud app deploy ./BethanysPieShop/app.yaml

DevOps Quality Gates to the Cloud

2023-12-20T00:00:00+00:00

The Gates to the Cloud (Created with DALL-E 3!)

Quality gating is a practice in software engineering that involves setting up checkpoints or gates to make sure the application code is ready for deployment. One of the core messages of Keep it Lean and Eliminate Waste is the concept of “epistemic humility” - that we need to accept that the quality of our knowledge about a specific component in delivery is poor, and we need to plan with that in mind. As a result, there are two separate propositions that have to be evaluated independently:

A statement and
A test.

For example:

The Statement	The Test
“The grass is green.”	“It is certain that this is true.”
“This code behaves correctly.”	“It is certain that this is true.”
“The release is ready.”	“It is certain that this is true.”

This can be accomplished by automated testing, or gating, in a DevOps pipeline. The “X” intersection of the DevOps infinite loop is where this happens.

The Gates to the Cloud

Source Code Version Control

Keep all source code in version-controlled repositories. Without them, you can’t trace issues found by scans further down the pipeline back to a specific change. The 12 factor methodology calls for this too. We can use Azure Repos or GitHub for this purpose.

Artifact Version Control

It is very important to systematically manage and track changes to binary artifacts, such as open source packages, Docker images, VM images and build pipeline artifacts. A very good tool for something like this is Artifactory. It plays a crucial role in DevOps quality gating by providing a centralized and secure repository for binary artifacts.

Optimal Branching Strategy

The best branching strategy for committing and managing code repositories will depend on your team’s specific make-up and requirements. See my post on DevOps Branching Strategies for help with this.

Unit Testing

Unit testing is essential to shifting testing left so issues are identified before the code is even deployed. You might be wondering what an appropriate unit test coverage percentage is. Well, the answer is - IT DEPENDS. This will again depend on your team’s specific make-up and on team and/or developer maturity. I found that this discussion on StackOverflow nails it. In simple terms, hitting an arbitrary coverage percentage (%) does not mean the code is well tested. Tools such as SonarQube can help here. Code quality can also be increased with artificial intelligence (AI) tools such as GitHub Copilot.
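As a gate, unit tests can simply be a workflow that runs on every pull request. The sketch below assumes a .NET solution in the repository root; the .NET version is an assumption, so adjust it to your project. For it to act as a real gate, the check also has to be marked as required in the branch protection rules.

```yaml
name: Unit test gate (sketch)

on:
  pull_request:
    branches: [main]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up .NET
        uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '8.0.x'   # assumption; match your project

      - name: Run unit tests
        run: dotnet test --configuration Release
```

A failing test fails the job, which blocks the merge once the check is required.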

Integration Testing

Integration testing ensures that all components work together as intended in every environment. Environments should be carbon copies of one another for predictability, stability and reliability. For example, in a web application scenario, integration testing ensures that user interfaces interact correctly with a RESTful API, which, in turn, communicates seamlessly with the underlying database. Testing might involve validating data flow, error handling, and overall system behavior to guarantee cohesion and reliability across the entire stack.

Performance and Load Testing

Here we need to test the throughput of an application. This involves measuring CPU load, memory (RAM) load and response times, and verifying that service level agreements (SLAs) are met. For example, evaluating how the web application performs under a specified number of concurrent users, how the REST API responds to increased requests, and how the database manages a higher volume of transactions. This testing ensures that the different components can sustain optimal performance and reliability under varying load conditions, helping identify and address potential bottlenecks. Here we can use tools such as Apache JMeter, LoadRunner and Grafana k6.

Vulnerability Testing

Here we introduce DevSecOps, which integrates security practices seamlessly into the DevOps pipeline. It encompasses three major areas outlined in the table below.

DevSecOps Area	Description	Tooling
Dependency checks / Open Source Scanning	Examining and securing external software dependencies, such as code from third party sources	Artifactory, OWASP Dependency-Check, National Vulnerability Database (NVD)
Static Application Security Testing (SAST)	Identifying vulnerabilities in the source code	Checkmarx, Fortify, GitHub CodeQL
Dynamic Application Security Testing (DAST)	Assessing security during runtime	OWASP ZAP, Burp Suite, Acunetix
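To make the SAST row concrete, here is a minimal sketch of a GitHub CodeQL scan running as a pull request gate. It assumes a C# code base that the CodeQL autobuild step can build without extra configuration; other languages or custom builds would need adjustments.

```yaml
name: CodeQL SAST scan (sketch)

on:
  pull_request:
    branches: [main]

permissions:
  contents: read
  security-events: write   # required to upload the scan results

jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Initialize CodeQL
        uses: github/codeql-action/init@v3
        with:
          languages: csharp   # assumption; set to your project's language

      - name: Build the code
        uses: github/codeql-action/autobuild@v3

      - name: Run the analysis
        uses: github/codeql-action/analyze@v3
```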

Rollbacks and Zero-downtime Releases

Automated rollbacks are crucial for maintaining system stability and resilience and enable zero-downtime releases. Techniques such as rolling updates, blue/green deployments, canary releases, and feature flags contribute to the success of automated rollbacks.

Rolling updates involve gradually replacing instances of the application with new versions, minimizing downtime - see Kubernetes - Performing a Rolling Update.
Blue/green deployments enable switching between two environments (blue for the existing version, green for the new one) to facilitate seamless rollbacks.
Canary releases deploy changes gradually to a subset of users for early validation.
Feature flags allow toggling specific features on or off, providing the ability to quickly disable problematic functionalities - see tools like LaunchDarkly or Flagsmith.
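For the rolling update technique, the behavior is configured on the Kubernetes Deployment itself. The manifest below is a sketch with hypothetical names (web, the image reference and the /healthz endpoint are placeholders), showing a rollout that never drops below the desired replica count.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                      # hypothetical application name
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                # add at most one extra pod during the rollout
      maxUnavailable: 0          # never take a pod away before its replacement is ready
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: example.azurecr.io/web:1.2.0   # hypothetical image
          ports:
            - containerPort: 8080
          readinessProbe:        # new pods receive traffic only once this passes
            httpGet:
              path: /healthz     # hypothetical health endpoint
              port: 8080
```

If the new version misbehaves, kubectl rollout undo deployment/web returns the Deployment to the previous revision.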

Automated Change Orders

This one is pretty simple - avoid manual committees and boards for approvals. They create waste, decrease efficiency and speed, and do not create knowledge through testing.

Final Thoughts

We discussed the concept of DevOps gates in the context of a continuous integration and continuous deployment (CI/CD) pipeline. We also explored the importance of implementing gates to ensure quality, security, and compliance at different stages of the development process. Above, I also provided a comprehensive list of DevOps gates that enhance the efficiency and reliability of the software delivery pipeline.