Blue/Green Deployment Pipeline

A robust CI/CD pipeline implementing zero-downtime deployments for containerized applications using AWS ECS, CodePipeline, and Terraform.

AWS ECS, Containers, CI/CD, Infrastructure as Code, Zero-Downtime, Terraform

Project Overview

In this project, I built a CI/CD pipeline that performs Blue/Green deployments for containerized applications using AWS ECS, CodePipeline, and Terraform. The infrastructure-as-code approach allows for repeatable, version-controlled deployments, while the Blue/Green strategy enables zero-downtime releases with instant rollback capability.

This architecture addresses a common challenge for development teams: how to deploy frequently with minimal risk. By running two identical environments and shifting traffic between them, we can validate a new version before exposing it to all users.

Blue/Green Deployment Architecture Diagram

Technical Architecture

Core AWS Services
  • Networking: Custom VPC with public/private subnets across 3 AZs
  • Compute: ECS with Fargate for serverless container execution
  • Database: Aurora PostgreSQL for scalable, highly-available persistence
  • CI/CD: CodePipeline, CodeBuild, and CodeDeploy for the deployment workflow
  • Load Balancing: Application Load Balancer with Blue/Green target groups
  • Registry: ECR for storing container images
  • Security: KMS for encryption, IAM roles with least privilege permissions

Project Demonstration Video

Full Walkthrough

Complete walkthrough of the Blue/Green pipeline, Terraform configuration, and AWS resources

Deployment Flow

  • Commit Detection: Changes pushed to GitLab trigger the pipeline
  • Build Phase: CodeBuild compiles the app, packages it into a container, and pushes it to ECR
  • Deployment Preparation: CodeDeploy creates a new task definition with the updated image
  • Green Environment Creation: New tasks are deployed to a separate target group
  • Testing: The new version is verified through the test listener
  • Traffic Shifting: Production traffic gradually shifts from Blue to Green environment
  • Cleanup: After successful deployment, the old environment is terminated
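
As a rough illustration of how these stages could map onto a single Terraform resource, here is a minimal sketch of an aws_codepipeline definition. It is not the project's actual configuration. Every variable name, the "main" branch, the taskdef.json file name, and the use of a CodeStar connection to reach GitLab are assumptions made for the example; the page does not say how GitLab is wired to the pipeline.

```hcl
# Hypothetical inputs; none of these names come from the project itself.
variable "pipeline_role_arn" { type = string }
variable "artifact_bucket" { type = string }
variable "kms_key_arn" { type = string }
variable "connection_arn" { type = string } # assumed connection to GitLab
variable "repository_id" { type = string }  # e.g. "group/project"
variable "build_project_name" { type = string }
variable "codedeploy_app_name" { type = string }
variable "deployment_group_name" { type = string }

resource "aws_codepipeline" "main" {
  name     = "blue-green-ecs-pipeline"
  role_arn = var.pipeline_role_arn

  # Artifacts are stored in S3 and encrypted with the KMS key.
  artifact_store {
    location = var.artifact_bucket
    type     = "S3"

    encryption_key {
      id   = var.kms_key_arn
      type = "KMS"
    }
  }

  # Commit detection: a push to the tracked branch starts the pipeline.
  stage {
    name = "Source"

    action {
      name             = "Source"
      category         = "Source"
      owner            = "AWS"
      provider         = "CodeStarSourceConnection"
      version          = "1"
      output_artifacts = ["source_output"]

      configuration = {
        ConnectionArn    = var.connection_arn
        FullRepositoryId = var.repository_id
        BranchName       = "main"
      }
    }
  }

  # Build phase: CodeBuild builds the image and pushes it to ECR.
  stage {
    name = "Build"

    action {
      name             = "Build"
      category         = "Build"
      owner            = "AWS"
      provider         = "CodeBuild"
      version          = "1"
      input_artifacts  = ["source_output"]
      output_artifacts = ["build_output"]

      configuration = {
        ProjectName = var.build_project_name
      }
    }
  }

  # Deployment: hand the task definition and AppSpec to CodeDeploy.
  stage {
    name = "Deploy"

    action {
      name            = "Deploy"
      category        = "Deploy"
      owner           = "AWS"
      provider        = "CodeDeployToECS"
      version         = "1"
      input_artifacts = ["build_output"]

      configuration = {
        ApplicationName                = var.codedeploy_app_name
        DeploymentGroupName            = var.deployment_group_name
        TaskDefinitionTemplateArtifact = "build_output"
        TaskDefinitionTemplatePath     = "taskdef.json" # assumed file name
        AppSpecTemplateArtifact        = "build_output"
        AppSpecTemplatePath            = "appspec.yaml"
      }
    }
  }
}
```

Green environment creation, testing, traffic shifting, and cleanup do not appear as pipeline stages in this sketch because CodeDeploy carries them out inside the Deploy action.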

Key Challenges & Solutions

Challenge 1: KMS Key Permissions for CodePipeline
The pipeline initially failed because CodePipeline could not use the KMS key for artifact encryption. The error was: "User is not authorized to perform: kms:GenerateDataKey on resource".
Solution

I enhanced the KMS key policy to explicitly grant the necessary permissions to the CodePipeline, CodeBuild, and CodeDeploy roles, ensuring proper encryption of artifacts throughout the pipeline.
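
A key policy along these lines is one way to express that fix. This is a sketch under assumptions rather than the policy used in the project: the variable name, statement IDs, and the exact action list are illustrative, and the account root statement is included only so the key remains administrable.

```hcl
variable "pipeline_role_arns" {
  description = "ARNs of the CodePipeline, CodeBuild, and CodeDeploy roles (hypothetical input)"
  type        = list(string)
}

data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "artifact_key" {
  # Keep the account root as key administrator so the key stays manageable.
  statement {
    sid       = "EnableAccountAdministration"
    actions   = ["kms:*"]
    resources = ["*"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]
    }
  }

  # Let the pipeline roles encrypt and decrypt artifacts with this key.
  statement {
    sid = "AllowPipelineRolesToUseKey"
    actions = [
      "kms:Encrypt",
      "kms:Decrypt",
      "kms:ReEncrypt*",
      "kms:GenerateDataKey*",
      "kms:DescribeKey",
    ]
    resources = ["*"]

    principals {
      type        = "AWS"
      identifiers = var.pipeline_role_arns
    }
  }
}

resource "aws_kms_key" "artifacts" {
  description = "Encrypts CodePipeline artifacts"
  policy      = data.aws_iam_policy_document.artifact_key.json
}
```

In a key policy, resources = ["*"] refers to the key the policy is attached to, so the grant does not extend to other keys.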

Challenge 2: Terraform Module Dependencies
Managing resource dependencies in Terraform was challenging, especially when IAM roles created in one module needed to be referenced by resources in another module.
Solution

Rather than using data sources to look up resources that might not exist yet, I updated my module interfaces to accept role ARNs as variables, creating a clear dependency chain between modules.
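
The pattern could look roughly like the following, shown as three file fragments in one listing. The output and variable names are hypothetical, the security module is assumed to define a role named aws_iam_role.codepipeline, and the other module inputs are omitted.

```hcl
# modules/security/outputs.tf
output "codepipeline_role_arn" {
  description = "ARN of the role assumed by CodePipeline"
  value       = aws_iam_role.codepipeline.arn # assumes this role exists in the module
}

# modules/codepipeline/variables.tf
variable "codepipeline_role_arn" {
  description = "Role ARN passed in from the security module"
  type        = string
}

# main.tf (root module)
module "security" {
  source = "./modules/security"
}

module "codepipeline" {
  source = "./modules/codepipeline"

  # Referencing the output makes Terraform create the role first.
  codepipeline_role_arn = module.security.codepipeline_role_arn
}
```

Because the ARN flows through a module output, Terraform sees the dependency in its graph and orders the resources itself, which a data source lookup by name would not guarantee on a first apply.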

Challenge 3: Buildspec YAML Syntax Issues
The pipeline was failing with YAML syntax errors in the buildspec.yml file, particularly when trying to create the appspec.yaml file dynamically during the build.
Solution

I switched to generating the file content with Terraform's yamlencode function, which produces valid YAML from an HCL object and eliminates the manual quoting and escaping that caused the syntax errors.
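
For illustration, an ECS AppSpec could be built with yamlencode roughly as follows. The container name, the port, and the way the rendered string reaches the build are assumptions; the page does not show the actual structure used.

```hcl
variable "container_name" {
  type    = string
  default = "app" # hypothetical container name
}

variable "container_port" {
  type    = number
  default = 8080 # assumed application port
}

locals {
  # yamlencode handles quoting and indentation, so no hand-written YAML is needed.
  appspec = yamlencode({
    # The AppSpec reference documents 0.0 as the version value. yamlencode
    # renders numbers in its own style, so confirm CodeDeploy accepts the result.
    version = 0.0
    Resources = [
      {
        TargetService = {
          Type = "AWS::ECS::Service"
          Properties = {
            # Placeholder that the CodePipeline ECS deploy action replaces
            # with the newly registered task definition.
            TaskDefinition = "<TASK_DEFINITION>"
            LoadBalancerInfo = {
              ContainerName = var.container_name
              ContainerPort = var.container_port
            }
          }
        }
      }
    ]
  })
}

output "appspec_yaml" {
  value = local.appspec
}
```

The rendered string can then be handed to the build in whatever way suits the pipeline, for example as a CodeBuild environment variable that the buildspec writes to appspec.yaml.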

Results & Impact

The pipeline now successfully performs Blue/Green deployments for our Spring Boot application with these benefits:

  • Zero-downtime deployments: Users experience no interruption during updates
  • Instant rollback capability: If an issue is detected, traffic can immediately switch back
  • Reduced deployment risk: New versions are validated before receiving production traffic
  • Infrastructure as code: All resources are defined in Terraform, enabling consistent environments
  • Automation: The entire process from commit to production requires no manual intervention

Technical Implementation Details

Terraform Structure

I organized the Terraform configuration into modules for better maintainability:

blue-green-ecs-pipeline/
├── main.tf
├── variables.tf
├── outputs.tf
├── modules/
│   ├── networking/
│   ├── security/
│   ├── database/
│   ├── ecr/
│   ├── load_balancer/
│   ├── ecs/
│   └── codepipeline/

Each module encapsulates a specific part of the infrastructure, with clear interfaces between them.

Key Configuration Insights

The critical part of enabling Blue/Green deployments is properly configuring the load balancer and ECS service:

Creating two target groups (Blue and Green) that the CodeDeploy service can alternate between.

Setting up the ECS service with a CODE_DEPLOY deployment controller:

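resource "aws_ecs_service" "main" {
  # Other configurations...

  # Hand rollouts to CodeDeploy instead of the default ECS rolling update.
  deployment_controller {
    type = "CODE_DEPLOY"
  }

  # CodeDeploy changes the task definition and the active target group on
  # each deployment, so Terraform should not revert them on the next apply.
  lifecycle {
    ignore_changes = [
      task_definition,
      load_balancer,
    ]
  }
}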
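
The two target groups and the listeners that route to them might be declared like this. It is a minimal sketch, not the project's load_balancer module: the variable names, ports, resource names, and the health check path are assumptions.

```hcl
variable "vpc_id" { type = string }  # hypothetical input
variable "alb_arn" { type = string } # hypothetical input

# One target group per environment, created from the same definition.
resource "aws_lb_target_group" "app" {
  for_each = toset(["blue", "green"])

  name        = "app-${each.key}"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip" # Fargate tasks use awsvpc networking and register by IP

  health_check {
    path    = "/actuator/health" # assumed Spring Boot Actuator endpoint
    matcher = "200"
  }
}

# Production listener; starts on Blue and is re-pointed by CodeDeploy.
resource "aws_lb_listener" "prod" {
  load_balancer_arn = var.alb_arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app["blue"].arn
  }

  lifecycle { ignore_changes = [default_action] }
}

# Test listener; used to verify the replacement tasks before the switch.
resource "aws_lb_listener" "test" {
  load_balancer_arn = var.alb_arn
  port              = 8080
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app["green"].arn
  }

  lifecycle { ignore_changes = [default_action] }
}
```

Ignoring default_action on the listeners follows the same reasoning as ignoring load_balancer on the service: CodeDeploy swaps the target groups at deploy time, and Terraform would otherwise try to swap them back.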

Configuring CodeDeploy to manage the traffic shifting:

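resource "aws_codedeploy_deployment_group" "main" {
  # Other configurations...

  # Blue/Green with traffic control: CodeDeploy re-routes the load balancer
  # listeners rather than updating tasks in place.
  deployment_style {
    deployment_option = "WITH_TRAFFIC_CONTROL"
    deployment_type   = "BLUE_GREEN"
  }

  blue_green_deployment_config {
    # Traffic shifting configuration
  }
}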
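
For readers who want to see the elided parts filled in, the following is one possible complete deployment group. It is a sketch under assumptions, not the project's configuration: all variable names, the all-at-once deployment config, and the five-minute termination wait are illustrative choices.

```hcl
# Hypothetical inputs; none of these names come from the project itself.
variable "codedeploy_app_name" { type = string }
variable "codedeploy_role_arn" { type = string }
variable "cluster_name" { type = string }
variable "service_name" { type = string }
variable "prod_listener_arn" { type = string }
variable "test_listener_arn" { type = string }
variable "blue_target_group_name" { type = string }
variable "green_target_group_name" { type = string }

resource "aws_codedeploy_deployment_group" "example" {
  app_name              = var.codedeploy_app_name
  deployment_group_name = "ecs-blue-green"
  service_role_arn      = var.codedeploy_role_arn

  # Predefined config that shifts all traffic in one step. Linear and canary
  # variants such as CodeDeployDefault.ECSLinear10PercentEvery1Minutes also exist.
  deployment_config_name = "CodeDeployDefault.ECSAllAtOnce"

  deployment_style {
    deployment_option = "WITH_TRAFFIC_CONTROL"
    deployment_type   = "BLUE_GREEN"
  }

  blue_green_deployment_config {
    # Shift traffic as soon as the replacement tasks are ready.
    deployment_ready_option {
      action_on_timeout = "CONTINUE_DEPLOYMENT"
    }

    # Keep the old tasks for a few minutes to allow a quick rollback.
    terminate_blue_instances_on_deployment_success {
      action                           = "TERMINATE"
      termination_wait_time_in_minutes = 5
    }
  }

  # Roll back automatically when a deployment fails.
  auto_rollback_configuration {
    enabled = true
    events  = ["DEPLOYMENT_FAILURE"]
  }

  ecs_service {
    cluster_name = var.cluster_name
    service_name = var.service_name
  }

  load_balancer_info {
    target_group_pair_info {
      prod_traffic_route {
        listener_arns = [var.prod_listener_arn]
      }

      test_traffic_route {
        listener_arns = [var.test_listener_arn]
      }

      target_group {
        name = var.blue_target_group_name
      }

      target_group {
        name = var.green_target_group_name
      }
    }
  }
}
```

The termination wait is the window in which the previous task set is still running, which is what makes an immediate switch back possible.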

Lessons Learned

This project taught me several valuable lessons:

  • Infrastructure design matters: Proper segmentation of resources between public and private subnets significantly enhances security
  • Dependency management is crucial: Clear resource dependencies in Terraform avoid race conditions and improve reliability
  • Proper IAM configuration is essential: Following the principle of least privilege reduces security risks
  • Validation before traffic shifting: The Blue/Green approach helps catch issues before they impact users

Future Enhancements

While the current implementation meets our immediate needs, I've identified several potential improvements:

  • Canary Deployments: Implement more granular traffic shifting (e.g., 10% increments)
  • Automated Testing: Add integration and load tests during the deployment process
  • CloudWatch Alarms: Configure automatic rollbacks based on error rates or latency
  • Cross-region Redundancy: Extend the architecture to multiple AWS regions for disaster recovery
  • Cost Optimization: Implement auto-scaling for ECS tasks based on demand patterns
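
As a starting point for the auto-scaling item, a target-tracking policy through Application Auto Scaling might look like the sketch below. The variable names, capacity bounds, and the 60% CPU target are placeholder assumptions, not tuned values from the project.

```hcl
variable "cluster_name" { type = string } # hypothetical input
variable "service_name" { type = string } # hypothetical input

# Register the ECS service's desired count as a scalable target.
resource "aws_appautoscaling_target" "ecs" {
  min_capacity       = 2
  max_capacity       = 6
  resource_id        = "service/${var.cluster_name}/${var.service_name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

# Add or remove tasks to keep average CPU near the target value.
resource "aws_appautoscaling_policy" "cpu" {
  name               = "cpu-target-tracking"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }

    target_value = 60
  }
}
```

If the service's desired count is managed this way, desired_count would likely need to join the ignore_changes list on the ECS service so that Terraform does not reset it on each apply.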