Spring PetClinic on AWS EKS: Infrastructure Engineer Journey

Introduction
As part of the DevOps Micro Internship (DMI) Cohort 2, I participated in a collaborative project to deploy the Spring PetClinic Microservices application on Amazon Elastic Kubernetes Service (EKS).
The objective was not simply to deploy an application, but to simulate a real-world cloud engineering environment where team members owned different components of the platform and worked together to deliver a production-style deployment.
In this project, I served as both the Infrastructure Engineer and Team Lead, responsible for designing, provisioning, securing, and supporting the AWS infrastructure that hosted the application.
This article documents the architecture, responsibilities, challenges, lessons learned, and key takeaways from the project.
Project Overview
Spring PetClinic is a microservices-based application composed of multiple services including:
API Gateway
Config Server
Discovery Server
Customers Service
Vets Service
Visits Service
GenAI Service
Multiple MySQL databases
The application was deployed on Amazon EKS using Kubernetes, with AWS services providing networking, storage, security, and load balancing capabilities.
My Responsibilities
As the Infrastructure Engineer, my responsibilities included:
Provisioning AWS infrastructure
Deploying and managing Amazon EKS
Managing worker nodes
Configuring IAM roles and access controls
Implementing OIDC integration
Installing Kubernetes add-ons
Configuring AWS Load Balancer Controller
Managing Kubernetes networking
Enabling persistent storage using EBS CSI Driver
Supporting deployment and database teams
Troubleshooting infrastructure issues
In addition, I served as Team Lead, helping coordinate activities, support teammates, unblock technical issues, and ensure progress across different project areas.
Architecture Overview
The deployment architecture followed a cloud-native microservices model:
Users
↓
AWS Application Load Balancer
↓
Kubernetes Ingress
↓
API Gateway
↓
Microservices
↓
MySQL Databases
↓
Amazon EBS Volumes
AWS services and supporting components included:
Amazon EKS
Amazon EC2
Amazon ECR
Amazon VPC
IAM
Application Load Balancer
EBS CSI Driver
Route Tables
NAT Gateway
Infrastructure Provisioning
Infrastructure was provisioned using Terraform.
Key resources included:
VPC
Public Subnets
Private Subnets
Internet Gateway
NAT Gateway
Route Tables
EKS Cluster
Managed Node Groups
IAM Roles
Defining these resources in Terraform kept the environment repeatable and version-controlled, in line with Infrastructure as Code practice.
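To make the public/private subnet layout concrete, here is a minimal sketch in Python (not the Terraform used in the project). It carves a VPC CIDR into one public and one private subnet per Availability Zone. The CIDR `10.0.0.0/16`, the `/20` subnet size, and the zone names are assumptions chosen for illustration; the article does not state the actual values.

```python
import ipaddress


def plan_subnets(vpc_cidr, az_names, new_prefix=20):
    """Split a VPC CIDR into one public and one private subnet per AZ."""
    vpc = ipaddress.ip_network(vpc_cidr)
    needed = 2 * len(az_names)
    available = 2 ** (new_prefix - vpc.prefixlen)
    if needed > available:
        raise ValueError(f"need {needed} subnets, only {available} fit in {vpc_cidr}")

    blocks = vpc.subnets(new_prefix=new_prefix)  # yields blocks in address order
    plan = {"public": {}, "private": {}}
    for tier in ("public", "private"):
        for az in az_names:
            plan[tier][az] = str(next(blocks))
    return plan


# Hypothetical inputs: neither the CIDR nor the zones come from the project.
plan = plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b"])
for tier, subnets in plan.items():
    for az, cidr in subnets.items():
        print(tier, az, cidr)
```

The resulting blocks could then be passed to whichever Terraform variables define the subnets, which keeps the address plan reviewable before anything is provisioned.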
Major Challenges and Troubleshooting
1. AWS Load Balancer Controller Failure
One of the most significant issues involved the AWS Load Balancer Controller.
The controller failed with AccessDenied errors and was unable to provision Application Load Balancers.
After investigation, I discovered that the IAM trust relationship referenced an incorrect OIDC provider.
Correcting the trust policy and validating it against the cluster's OIDC configuration restored the controller.
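The kind of mismatch described here can be checked mechanically. The sketch below builds the trust policy shape that IAM roles for service accounts rely on and tests whether a policy's federated principal matches a given cluster issuer. The account ID, issuer URLs, and the `kube-system` / `aws-load-balancer-controller` names are illustrative assumptions, not values taken from the project.

```python
import json


def _provider_path(oidc_issuer_url):
    """Strip the scheme so the issuer can be used inside ARNs and condition keys."""
    return oidc_issuer_url.split("://", 1)[-1]


def build_irsa_trust_policy(account_id, oidc_issuer_url, namespace, service_account):
    """Trust policy allowing one Kubernetes service account to assume the role."""
    provider = _provider_path(oidc_issuer_url)
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringEquals": {
                f"{provider}:aud": "sts.amazonaws.com",
                f"{provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
            }},
        }],
    }


def trust_policy_matches_cluster(trust_policy, oidc_issuer_url):
    """True if any statement trusts the OIDC provider of the given cluster issuer."""
    suffix = f":oidc-provider/{_provider_path(oidc_issuer_url)}"
    for statement in trust_policy.get("Statement", []):
        federated = statement.get("Principal", {}).get("Federated", "")
        if federated.endswith(suffix):
            return True
    return False


# Hypothetical identifiers for illustration only.
cluster_issuer = "https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLECLUSTER"
stale_issuer = "https://oidc.eks.us-east-1.amazonaws.com/id/OLDCLUSTER"

policy = build_irsa_trust_policy(
    "111122223333", stale_issuer, "kube-system", "aws-load-balancer-controller"
)
print(json.dumps(policy, indent=2))
print("matches current cluster:", trust_policy_matches_cluster(policy, cluster_issuer))
```

A role whose trust policy still points at an old issuer, for example after a cluster is recreated, fails this check, and that is one way the AccessDenied symptom above can arise.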
2. EBS CSI Driver CrashLoopBackOff
The EBS CSI Driver initially entered a CrashLoopBackOff state.
The root cause was insufficient IAM permissions.
After attaching the correct AWS-managed policy and validating the service account configuration, persistent volume provisioning began working correctly.
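The two validations mentioned above (policy attachment and service account configuration) can be sketched as a small offline check. This assumes the managed policy in question is `AmazonEBSCSIDriverPolicy`, which the article does not name explicitly. It also assumes the inputs were exported beforehand: a service account as a dict shaped like `kubectl get serviceaccount -o json` output, and a plain list of policy ARNs attached to the role. The role name and account ID are hypothetical.

```python
# Assumption: this is the AWS-managed policy the driver's role needs.
EBS_CSI_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"
# Annotation EKS uses to link a service account to an IAM role.
ROLE_ANNOTATION = "eks.amazonaws.com/role-arn"


def check_ebs_csi_setup(service_account, attached_policy_arns):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []

    annotations = service_account.get("metadata", {}).get("annotations") or {}
    if not annotations.get(ROLE_ANNOTATION):
        problems.append(f"service account is missing the {ROLE_ANNOTATION} annotation")

    if EBS_CSI_POLICY_ARN not in attached_policy_arns:
        problems.append("role does not have AmazonEBSCSIDriverPolicy attached")

    return problems


# Hypothetical service account before the fix: annotated, but the role lacks the policy.
service_account = {
    "metadata": {
        "name": "ebs-csi-controller-sa",
        "namespace": "kube-system",
        "annotations": {
            ROLE_ANNOTATION: "arn:aws:iam::111122223333:role/example-ebs-csi-role"
        },
    }
}

for problem in check_ebs_csi_setup(service_account, attached_policy_arns=[]):
    print("PROBLEM:", problem)
```

Running a check like this before digging through pod logs narrows a CrashLoopBackOff to either the Kubernetes side (annotation) or the IAM side (policy).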
3. Node Capacity and Worker Recovery
As additional services were deployed, a single worker node became resource constrained.
The cluster experienced scheduling pressure and reduced capacity.
To resolve this:
Worker node capacity was increased
Additional nodes were added
Unhealthy instances were replaced through Auto Scaling
The cluster subsequently stabilised.
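A rough way to see this kind of scheduling pressure coming is to compare pod CPU requests with node allocatable CPU. The sketch below uses simple first-fit placement on CPU only. The real Kubernetes scheduler also weighs memory, affinity, taints, and more, so treat this as an approximation. All request and capacity figures are hypothetical; the article does not give the actual numbers.

```python
def parse_cpu(value):
    """Convert a Kubernetes CPU quantity such as '500m' or '2' to millicores."""
    if value.endswith("m"):
        return int(value[:-1])
    return int(float(value) * 1000)


def unschedulable_pods(pod_cpu_requests, node_allocatable_cpu):
    """First-fit placement by CPU request; returns names of pods that do not fit."""
    free = [parse_cpu(cpu) for cpu in node_allocatable_cpu]
    unplaced = []
    for name, request in pod_cpu_requests.items():
        need = parse_cpu(request)
        for index, available in enumerate(free):
            if need <= available:
                free[index] -= need
                break
        else:
            # No node had enough free CPU for this pod.
            unplaced.append(name)
    return unplaced


# Hypothetical CPU requests for the PetClinic services.
requests = {
    "api-gateway": "500m",
    "config-server": "250m",
    "discovery-server": "250m",
    "customers-service": "500m",
    "vets-service": "500m",
    "visits-service": "500m",
    "genai-service": "500m",
}

# One node with a hypothetical 1930m allocatable, then two such nodes.
print("one node:", unschedulable_pods(requests, ["1930m"]))
print("two nodes:", unschedulable_pods(requests, ["1930m", "1930m"]))
```

With these made-up figures a single node leaves three services pending, while a second node of the same size places them all, mirroring the fix described above.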
4. NAT Gateway Issues
At one stage, the NAT Gateway was removed to reduce costs.
This caused worker nodes to lose outbound internet access, resulting in image pull failures and connectivity issues.
Recreating the NAT Gateway restored normal cluster operations.
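A failure like this can be caught early by checking that each private route table still has an active default route through a NAT gateway. The sketch below works on plain dictionaries shaped like the `RouteTables` entries returned by the EC2 DescribeRouteTables API, so it runs without AWS access. The route table and gateway IDs are made up, and the input is assumed to contain only the route tables associated with private subnets.

```python
def route_tables_without_nat(route_tables):
    """Return IDs of route tables lacking an active 0.0.0.0/0 route via a NAT gateway."""
    broken = []
    for table in route_tables:
        default_routes = [
            route for route in table.get("Routes", [])
            if route.get("DestinationCidrBlock") == "0.0.0.0/0"
        ]
        has_active_nat = any(
            route.get("NatGatewayId") and route.get("State") == "active"
            for route in default_routes
        )
        if not has_active_nat:
            broken.append(table["RouteTableId"])
    return broken


# Hypothetical private route tables: one healthy, one left with a blackhole
# route after its NAT gateway was deleted, one with no default route at all.
private_route_tables = [
    {"RouteTableId": "rtb-healthy", "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
        {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-111", "State": "active"},
    ]},
    {"RouteTableId": "rtb-blackhole", "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
        {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-222", "State": "blackhole"},
    ]},
    {"RouteTableId": "rtb-no-default", "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
    ]},
]

print(route_tables_without_nat(private_route_tables))
```

Any ID this returns points at subnets whose nodes cannot reach public endpoints such as external image registries, which would explain image pull failures like the ones described above.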
My Responsibilities as Infrastructure Engineer
My primary responsibility was ensuring that the cloud infrastructure was available, secure, scalable, and ready for application deployment.
Infrastructure Provisioning with Terraform
I provisioned the AWS infrastructure using Terraform.
This included:
VPC creation
Public and private subnets
Route tables
Internet Gateway
Security groups
EKS cluster resources
Worker node groups
Using Infrastructure as Code allowed the environment to be reproducible and version-controlled.
Lessons Learned
This project reinforced several important lessons:
Infrastructure Is More Than Terraform
Building infrastructure is only one part of the job. Maintaining reliability, troubleshooting issues, and supporting users are equally important.
IAM Is Critical
Many AWS-related failures ultimately came down to permissions, trust relationships, and access configuration.
Documentation Matters
Having clear documentation dramatically reduced troubleshooting time and improved collaboration.
Communication Is a Technical Skill
As Team Lead, I learned that project success depends heavily on communication, coordination, and ownership.
Cost Optimisation Cannot Be Ignored
Cloud resources are powerful, but they must be managed carefully. Monitoring resource usage and understanding AWS pricing were valuable lessons throughout this project, as was learning which resources the cluster cannot run without.
Final Outcome
The project successfully delivered:
✅ Amazon EKS cluster deployment
✅ Kubernetes-based microservices platform
✅ Persistent database storage
✅ AWS Load Balancer integration
✅ Secure IAM and OIDC configuration
✅ Infrastructure as Code using Terraform
✅ Multi-node Kubernetes environment
✅ Real-world troubleshooting experience
Most importantly, the project provided hands-on exposure to the type of challenges faced by Cloud and DevOps Engineers in production environments.
Closing Thoughts
This project was one of the most valuable practical learning experiences in my cloud engineering journey.
It strengthened my understanding of AWS, Kubernetes, Terraform, networking, IAM, troubleshooting, and team collaboration.
While there were challenges along the way, each issue provided an opportunity to learn and improve.
I am grateful to the DMI mentors, co-mentors, and teammates who contributed to the experience.
🚀 DMI Cohort 3 is starting on 27 June 2026. If you are interested in gaining hands-on DevOps experience, you can apply here:
https://docs.google.com/forms/d/e/1FAIpQLSel7ai7nyb0P1qLW4vEyfB_nEsD4lUF1XG88vmAaFGBOb6hPA/viewform

