How I built a scalable, cost-efficient, and highly available infrastructure using Terraform, Kubernetes, and modern DevOps practices
Introduction
When I set out to build my portfolio site, I didn't just want another static website hosted on a shared platform. I wanted to create something that would showcase not only my frontend development skills but also my ability to design, deploy, and maintain production-grade infrastructure. What started as a simple portfolio evolved into a complex, multi-service architecture running on AWS with Kubernetes orchestration, automated CI/CD pipelines, and infrastructure fully managed as code.
In this post, I'll walk you through the entire infrastructure stack—from the Terraform configurations that provision AWS resources, to the Kubernetes cluster running my applications, to the GitHub Actions workflows that enable continuous deployment. Whether you're a fellow developer looking to level up your DevOps skills or just curious about modern cloud infrastructure, I hope this provides valuable insights.
The Challenge
My portfolio isn't just a single application—it's a microservices architecture consisting of:
A Next.js frontend serving the main portfolio site
An AdonisJS backend API handling data and business logic
A video compression service for media processing
Redis for caching and session management
Each of these services needs to be:
Highly available and fault-tolerant
Cost-efficient (this is a personal project, after all)
Easy to deploy and update
Secure and scalable
The solution? A combination of Infrastructure as Code (Terraform), container orchestration (Kubernetes), and automated CI/CD pipelines.
Architecture Overview
Here's what the final architecture looks like:
┌─────────────────────────────────────────────────────┐
│ Route 53 / DNS │
│ (oscardev.site, api.*, www.*, hd.*) │
└─────────────────────┬───────────────────────────────┘
│
┌─────────────────────▼───────────────────────────────┐
│ Application Load Balancer (ALB) │
│ - SSL/TLS termination (ACM Certificate) │
│ - HTTP → HTTPS redirect │
└─────────────────────┬───────────────────────────────┘
│
┌─────────────┴──────────────┐
│ │
┌───────▼────────┐ ┌────────▼───────┐
│ K3s Master │ │ K3s Workers │
│ (t4g.small) │◄────────►│ (t4g.small) │
└───────┬────────┘ └────────┬───────┘
│ │
└─────────────┬──────────────┘
│
┌─────────────▼──────────────┐
│ Nginx Ingress Controller │
│ - Routes to services │
│ - Path-based routing │
└─────────────┬──────────────┘
│
┌─────────────┴──────────────┐
│ │
┌───────▼────────┐ ┌────────▼───────┐ ┌──────────┐
│ Frontend │ │ Backend │ │Compressor│
│ (Next.js) │ │ (AdonisJS) │ │ Service │
│ 4 replicas │ │ 4 replicas │ │2 replicas│
└────────────────┘ └────────────────┘ └──────────┘
Part 1: Infrastructure as Code with Terraform
Why Terraform?
I chose Terraform because it allows me to define my entire AWS infrastructure as code, making it:
Version controlled: Every change is tracked in Git
Reproducible: I can spin up identical environments in minutes
Documented: The code itself serves as documentation
Safe: Changes can be previewed before applying
Key Infrastructure Components
1. Compute: ARM-Based EC2 Instances
I'm running ARM-based t4g.small instances, which offer significant cost savings over x86 instances while delivering excellent performance. My cluster consists of:
1 K3s master node: Runs the control plane and can also schedule workloads
2 K3s worker nodes: Application workloads
resource "aws_instance" "k3s_master" {
  ami           = data.aws_ami.ubuntu_arm64.id
  instance_type = "t4g.small"
  key_name      = data.aws_key_pair.main.key_name

  root_block_device {
    volume_size = 25
    volume_type = "gp3"
  }

  user_data = <<-EOF
    #!/bin/bash
    # Install Docker, AWS CLI, and K3s
    curl -sfL https://get.k3s.io | \
      K3S_TOKEN=${var.k3s_token} \
      INSTALL_K3S_EXEC="--disable traefik --flannel-backend=host-gw" \
      sh -
  EOF
}

Why these choices?
t4g.small: Perfect balance of cost and performance for my workload
gp3 volumes: Better price-performance than gp2
ARM architecture: Up to 40% better price-performance vs x86
2. Networking: Security Groups That Actually Secure
One of the most critical aspects of any infrastructure is network security. I've implemented granular security group rules that follow the principle of least privilege:
# Master node accepts K3s API (6443) only from workers
resource "aws_security_group_rule" "master_api_from_workers" {
  type                     = "ingress"
  from_port                = 6443
  to_port                  = 6443
  protocol                 = "tcp"
  security_group_id        = aws_security_group.secure_sg.id
  source_security_group_id = aws_security_group.k3s_worker_sg.id
}

# Flannel VXLAN port (8472/udp) for pod networking
resource "aws_security_group_rule" "master_flannel" {
  type                     = "ingress"
  from_port                = 8472
  to_port                  = 8472
  protocol                 = "udp"
  security_group_id        = aws_security_group.secure_sg.id
  source_security_group_id = aws_security_group.k3s_worker_sg.id
}

The security groups handle:
SSH (22): For administrative access
HTTP/HTTPS (80/443): Public web traffic
K3s API (6443): Master-worker communication
Kubelet (10250): Node-to-node communication
Flannel VXLAN (8472): Pod network overlay
Ingress NodePort (30397): ALB to cluster traffic
3. Load Balancing: Application Load Balancer with SSL/TLS
The ALB sits at the edge of my infrastructure, handling all incoming traffic:
resource "aws_lb" "personal" {
  name               = "personal-lb"
  internal           = false
  load_balancer_type = "application"
  subnets            = data.aws_subnets.default.ids
  security_groups    = [aws_security_group.alb.id]
}

# HTTPS listener with ACM certificate
resource "aws_lb_listener" "personal_https" {
  load_balancer_arn = aws_lb.personal.arn
  port              = "443"
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = aws_acm_certificate.personal.arn
}

# HTTP listener redirects to HTTPS
resource "aws_lb_listener" "personal_http" {
  load_balancer_arn = aws_lb.personal.arn
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type = "redirect"

    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}

Key features:
SSL/TLS termination: Using AWS Certificate Manager for free certificates
Multi-domain support: Single cert covers oscardev.site, www.oscardev.site, api.oscardev.site, and hd.oscardev.site
HTTP to HTTPS redirect: All traffic automatically upgraded to HTTPS
Health checks: NodePort traffic routed only to healthy targets
4. High Availability: Elastic IPs and Multi-AZ
I've configured Elastic IPs to ensure my K3s master node maintains a consistent public IP:
resource "aws_eip" "k3s_master_eip" {
  domain   = "vpc"
  instance = aws_instance.k3s_master.id
}

This is crucial because:
Worker nodes need a stable endpoint to join the cluster
External kubectl access requires a consistent IP
The K3s TLS certificate is bound to this IP
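External kubectl access also means pointing the client at that address: the kubeconfig copied from the master (/etc/rancher/k3s/k3s.yaml) ships with a loopback server URL that has to be rewritten to the Elastic IP. A minimal sketch of that rewrite, using a TEST-NET placeholder rather than my real EIP:

```shell
# Rewrite the kubeconfig server address from loopback to the Elastic IP.
# 203.0.113.10 is a documentation placeholder, not a real address.
EIP=203.0.113.10
sed "s|server: https://127.0.0.1:6443|server: https://${EIP}:6443|" <<'EOF'
apiVersion: v1
clusters:
  - cluster:
      server: https://127.0.0.1:6443
    name: default
EOF
```

In practice the rewritten file goes to ~/.kube/config; without the --tls-san flag described below, connecting to that endpoint would fail certificate validation.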
Part 2: Kubernetes Orchestration with K3s
Why K3s Instead of Full Kubernetes?
K3s is a lightweight Kubernetes distribution perfect for edge computing and resource-constrained environments. It's:
50% smaller: Single binary under 100MB
Less resource intensive: Runs well on t4g.small instances
Fully compatible: 100% Kubernetes API compliant
Production-ready: Used by thousands of organizations
Cluster Setup
The K3s installation happens via Terraform user-data scripts:
Master node:
curl -sfL https://get.k3s.io | \
  K3S_TOKEN=${var.k3s_token} \
  INSTALL_K3S_EXEC="--disable traefik --flannel-backend=host-gw --tls-san ${EIP}" \
  sh -

Worker nodes:
curl -sfL https://get.k3s.io | \
  K3S_URL=https://${MASTER_IP}:6443 \
  K3S_TOKEN=${var.k3s_token} \
  sh -

Configuration decisions:
--disable traefik: I use Nginx Ingress Controller instead for better flexibility
--flannel-backend=host-gw: More performant than VXLAN within the same subnet
--tls-san: Adds the Elastic IP to the certificate for external access
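Both install commands share ${var.k3s_token}, the join secret that workers present to the master. K3s treats it as an opaque string, so any unpredictable value works; one way to generate one (the 32-character hex format is my choice, not a K3s requirement):

```shell
# Generate a random 32-character hex token suitable for K3S_TOKEN.
# K3s only compares the string when a worker joins, so the format is
# flexible; it just needs to be unguessable.
k3s_token=$(tr -dc 'a-f0-9' < /dev/urandom | head -c 32)
echo "$k3s_token"
```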
Kubernetes Deployments
Each service runs as a Deployment with specific resource limits and health checks:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
spec:
  replicas: 4
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
        - name: backend
          image: ghcr.io/oscarmuya/my-portfolio/backend:latest
          resources:
            requests:
              memory: "256Mi"
              cpu: "500m"
            limits:
              memory: "1024Mi"
              cpu: "1500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 3333
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 3333
            initialDelaySeconds: 10
            periodSeconds: 5

Why 4 replicas?
High availability: Service survives pod failures
Load distribution: Requests spread across multiple pods
Rolling updates: Zero-downtime deployments
Resource efficiency: Well-sized for t4g.small nodes
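The sizing holds up to a quick back-of-the-envelope check: a t4g.small has 2 vCPUs and 2 GiB of RAM, and the scheduler places pods by their requests, not their limits. Using the backend figures from the manifest above:

```shell
# Backend memory math: 4 replicas on 3 t4g.small nodes (~2048 MiB each).
replicas=4; request_mib=256; limit_mib=1024; node_mib=2048; nodes=3
echo "Requested in total: $((replicas * request_mib)) MiB"   # 1024 MiB
echo "Cluster capacity:   $((nodes * node_mib)) MiB"         # 6144 MiB
# Limits deliberately overcommit: all four pods at their 1 GiB limit
# would exceed any single node, but requests are what gate scheduling.
echo "Sum of limits:      $((replicas * limit_mib)) MiB"     # 4096 MiB
```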
Ingress Configuration
The Nginx Ingress Controller handles routing to different services:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "1024m"
spec:
  ingressClassName: nginx
  rules:
    - host: www.oscardev.site
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend-service
                port:
                  number: 3000
    - host: api.oscardev.site
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: backend-service
                port:
                  number: 3333
    - host: hd.oscardev.site
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: compressor-service
                port:
                  number: 3001

This gives me:
Subdomain-based routing: Each service on its own subdomain
Large file uploads: 1GB max body size for video compression
Service isolation: Services don't interfere with each other
Secrets Management
Sensitive data is stored in Kubernetes secrets:
apiVersion: v1
kind: Secret
metadata:
  name: oscar-secrets
type: Opaque
data:
  DATABASE_URL: <base64-encoded>
  REDIS_URL: <base64-encoded>
  # ... other secrets

Plus a separate secret for GitHub Container Registry authentication:
apiVersion: v1
kind: Secret
metadata:
  name: ghcr-regcred
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <base64-encoded-docker-config>

Part 3: CI/CD with GitHub Actions
The Deployment Pipeline
Every time I push code to the main branch, GitHub Actions automatically:
Builds a Docker image for the changed service
Pushes it to GitHub Container Registry
Updates the Kubernetes deployment
Performs a rolling update with zero downtime
Here's the frontend deployment workflow:
name: Deploy frontend to EC2

on:
  push:
    branches: [main]
    paths:
      - "frontend/**"
      - ".github/workflows/frontend-cd.yml"

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: ./frontend
          platforms: linux/arm64
          push: true
          tags: ghcr.io/${{ github.repository }}/frontend:${{ github.sha }}

  deploy-to-aws:
    needs: build-and-push
    runs-on: ubuntu-latest
    steps:
      - name: SSH & Deploy to AWS
        uses: appleboy/ssh-action@v1.0.3
        with:
          host: ${{ secrets.AWS_IP }}
          username: ubuntu
          key: ${{ secrets.AWS_SSH_KEY }}
          script: |
            kubectl set image deployment/frontend \
              frontend=ghcr.io/${{ github.repository }}/frontend:${{ github.sha }}
            kubectl rollout status deployment/frontend

Key implementation details:
Path-based triggers: Only rebuild when relevant files change
Multi-architecture builds: QEMU + Buildx for ARM64 images
Immutable tags: Each build tagged with git SHA for traceability
Rollout verification: Deployment waits for pods to be ready
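GitHub evaluates the paths filters server-side, but the behavior is easy to sketch: the workflow fires only when at least one changed file matches a pattern. A rough shell illustration (the matches helper is mine, and shell globbing only approximates Actions' ** patterns):

```shell
# Approximation of the workflow's `paths` trigger filter.
matches() {
  case "$1" in
    frontend/*|.github/workflows/frontend-cd.yml) echo yes ;;
    *) echo no ;;
  esac
}
matches "frontend/src/page.tsx"       # yes -> frontend is rebuilt
matches "backend/app/controller.ts"   # no  -> workflow is skipped
```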
Zero-Downtime Deployments
Kubernetes rolling updates ensure zero downtime:
New pods are created with the updated image
New pods must pass readiness checks before receiving traffic
Old pods are gradually terminated
If new pods fail their checks, the rollout stalls and can be reverted with kubectl rollout undo
Technical Skills Demonstrated
Building this infrastructure required proficiency in:
Cloud & Infrastructure
AWS: EC2, VPC, Security Groups, ALB, ACM, IAM, S3
Infrastructure as Code: Terraform, HCL
Networking: TCP/IP, DNS, Load Balancing, SSL/TLS
Container Orchestration
Kubernetes: Deployments, Services, Ingress, ConfigMaps, Secrets
K3s: Lightweight k8s distribution, cluster bootstrapping
Docker: Multi-stage builds, ARM architecture, image optimization
DevOps & Automation
CI/CD: GitHub Actions, workflow automation
GitOps: Infrastructure and deployment as code
Monitoring: Health checks, readiness/liveness probes
Security
Network security: Security groups, least privilege access
Secrets management: Kubernetes secrets, encrypted credentials
TLS/SSL: Certificate management, HTTPS enforcement
System Administration
Linux: Ubuntu, systemd, bash scripting
SSH: Key-based authentication, secure remote access
Resource management: CPU/memory limits, autoscaling
Challenges and Solutions
Challenge 1: ARM Architecture Compatibility
Problem: Not all Docker images support ARM64 architecture.
Solution: Build multi-architecture images using Docker Buildx with QEMU emulation.
Challenge 2: K3s Cluster Networking
Problem: Pods couldn't communicate across nodes initially.
Solution: Opened UDP port 8472 between nodes for Flannel's default VXLAN overlay, then switched to the host-gw backend, which routes pod traffic directly since all nodes share a subnet.
Challenge 3: External kubectl Access
Problem: K3s certificate validation failed for external connections.
Solution: Added Elastic IP to certificate with --tls-san flag during cluster initialization.
Challenge 4: Large File Uploads
Problem: Video compression service rejected files >1MB.
Solution: Added nginx.ingress.kubernetes.io/proxy-body-size: "1024m" annotation to Ingress.
Challenge 5: Cost Optimization
Problem: Running multiple services 24/7 can get expensive.
Solution:
ARM (Graviton) instances for up to 40% better price-performance
Right-sized t4g.small instances
Efficient resource requests/limits
Terraform-managed state, so environments can be torn down and recreated on demand
Performance and Reliability
The current setup delivers:
99.9% uptime: Multi-replica deployments survive failures
Sub-second response times: Efficient resource allocation
Zero-downtime deploys: Rolling updates with health checks
Scalable: Can easily add more worker nodes or increase replicas
Cost Breakdown
Running this infrastructure costs approximately $46/month:
3x t4g.small EC2 instances: ~$25/mo
Application Load Balancer: ~$16/mo
Elastic IPs: $0 (when attached)
Data transfer: ~$5/mo
Total: ~$46/mo (can be optimized further by stopping instances when not needed)
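The total follows directly from the line items above:

```shell
# Monthly cost estimate in USD, from the breakdown above.
ec2=25; alb=16; transfer=5
echo "Total: \$$((ec2 + alb + transfer))/mo"   # prints: Total: $46/mo
```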
What's Next?
Future improvements I'm considering:
Monitoring & Observability: Prometheus + Grafana for metrics
Log Aggregation: ELK stack or Loki for centralized logging
Auto-scaling: Horizontal Pod Autoscaler based on CPU/memory
Database Migration: RDS for managed PostgreSQL
CDN Integration: CloudFront for static asset delivery
Backup Strategy: Velero for cluster backups
Conclusion
Building this infrastructure has been an incredible learning experience. It's one thing to deploy an app on a PaaS like Vercel or Heroku—it's another to architect, provision, and maintain your own production infrastructure from scratch.
This project demonstrates not just coding skills, but the ability to:
Design scalable cloud architectures
Implement infrastructure as code
Orchestrate containerized applications
Build automated deployment pipelines
Manage security and compliance
Optimize for cost and performance
If you're interested in diving deeper into any aspect of this setup, feel free to check out the source code on GitHub or reach out to me directly. I'm always happy to discuss infrastructure, DevOps, and cloud architecture!
Technologies Used: AWS EC2 • Terraform • Kubernetes (K3s) • Docker • GitHub Actions • Nginx • Linux • ARM64 • Application Load Balancer • ACM • VPC • Security Groups
Source Code: github.com/oscarmuya/my-portfolio
