A detailed walkthrough of deploying Spring PetClinic Microservices on AWS EKS using Terraform and Kubernetes

Introduction

As part of the DevOps Micro Internship (DMI) Cohort 2, I participated in a collaborative project to deploy the Spring PetClinic Microservices application on Amazon Elastic Kubernetes Service (EKS).

The objective was not simply to deploy an application, but to simulate a real-world cloud engineering environment where team members owned different components of the platform and worked together to deliver a production-style deployment.

In this project, I served as both the Infrastructure Engineer and Team Lead, responsible for designing, provisioning, securing, and supporting the AWS infrastructure that hosted the application.

This article documents the architecture, responsibilities, challenges, lessons learned, and key takeaways from the project.

Project Overview

Spring PetClinic is a microservices-based application composed of multiple services including:

API Gateway
Config Server
Discovery Server
Customers Service
Vets Service
Visits Service
GenAI Service
Multiple MySQL databases

The application was deployed on Amazon EKS using Kubernetes, with AWS services providing networking, storage, security, and load balancing capabilities.

My Responsibilities

As the Infrastructure Engineer, my responsibilities included:

Provisioning AWS infrastructure
Deploying and managing Amazon EKS
Managing worker nodes
Configuring IAM roles and access controls
Implementing OIDC integration
Installing Kubernetes add-ons
Configuring AWS Load Balancer Controller
Managing Kubernetes networking
Enabling persistent storage using EBS CSI Driver
Supporting deployment and database teams
Troubleshooting infrastructure issues

In addition, I served as Team Lead, helping coordinate activities, support teammates, unblock technical issues, and ensure progress across different project areas.

Architecture Overview

The deployment architecture followed a cloud-native microservices model:

Users

↓

AWS Application Load Balancer

↓

Kubernetes Ingress

↓

API Gateway

↓

Microservices

↓

MySQL Databases

↓

Amazon EBS Volumes

AWS services used included:

Amazon EKS
Amazon EC2
Amazon ECR
Amazon VPC
IAM
Application Load Balancer
EBS CSI Driver
Route Tables
NAT Gateway

Infrastructure Provisioning

Infrastructure was provisioned using Terraform.

Key resources included:

VPC
Public Subnets
Private Subnets
Internet Gateway
NAT Gateway
Route Tables
EKS Cluster
Managed Node Groups
IAM Roles

This approach ensured repeatability, version control, and Infrastructure as Code best practices.
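To make the public/private subnet split more concrete, here is a minimal Python sketch of how a VPC address range can be carved into non-overlapping subnet CIDRs before they are written into Terraform variables. The `10.0.0.0/16` range, the `/20` subnet size, the subnet counts, and the `plan_subnets` helper are all assumptions for illustration, not the values used in the project.

```python
import ipaddress


def plan_subnets(vpc_cidr, new_prefix, public_count, private_count):
    """Split a VPC CIDR into non-overlapping public and private subnet CIDRs."""
    blocks = [str(b) for b in ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)]
    needed = public_count + private_count
    if needed > len(blocks):
        raise ValueError(f"{vpc_cidr} only yields {len(blocks)} /{new_prefix} subnets")
    return {
        # The first blocks go to public subnets (routed to the Internet Gateway).
        "public": blocks[:public_count],
        # The next blocks go to private subnets (routed to the NAT Gateway).
        "private": blocks[public_count:needed],
    }


# Hypothetical example values, not the project's real address plan.
plan = plan_subnets("10.0.0.0/16", 20, public_count=2, private_count=2)
print(plan)
```

Planning the ranges up front avoids overlapping CIDRs, which Terraform would only report when the subnets are created.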

Major Challenges and Troubleshooting

1. AWS Load Balancer Controller Failure

One of the most significant issues involved the AWS Load Balancer Controller.

The controller failed with AccessDenied errors and was unable to provision Application Load Balancers.

After investigation, I discovered that the IAM trust relationship referenced an incorrect OIDC provider.

After I corrected the trust policy and validated the cluster's OIDC configuration, the controller was able to provision Application Load Balancers.
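The check behind this fix can be sketched in a few lines of Python. The sketch builds the general shape of a trust policy for IAM Roles for Service Accounts and tests whether a role's trust policy points at the cluster's own OIDC issuer. The account ID, issuer URL, and helper names are hypothetical, and the `aws-load-balancer-controller` service account in `kube-system` is the conventional name rather than something confirmed for this project.

```python
def _provider_path(oidc_issuer_url):
    """Strip the scheme: IAM identifies the provider by host and path only."""
    return oidc_issuer_url.split("://", 1)[-1]


def build_irsa_trust_policy(account_id, oidc_issuer_url, namespace, service_account):
    """Return a trust policy dict that lets one service account assume the role."""
    provider = _provider_path(oidc_issuer_url)
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringEquals": {
                f"{provider}:aud": "sts.amazonaws.com",
                f"{provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
            }},
        }],
    }


def trust_policy_matches_cluster(trust_policy, oidc_issuer_url):
    """True if any statement federates with the cluster's OIDC provider."""
    provider = _provider_path(oidc_issuer_url)
    for statement in trust_policy.get("Statement", []):
        federated = statement.get("Principal", {}).get("Federated", "")
        if federated.endswith(f":oidc-provider/{provider}"):
            return True
    return False


# Hypothetical values for illustration only.
issuer = "https://oidc.eks.eu-west-2.amazonaws.com/id/EXAMPLE1234"
policy = build_irsa_trust_policy(
    "111122223333", issuer, "kube-system", "aws-load-balancer-controller"
)
print(trust_policy_matches_cluster(policy, issuer))
```

A role whose trust policy names a different issuer ID, for example one left over from an earlier cluster, fails this check, which is consistent with the AccessDenied symptom described above.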

2. EBS CSI Driver CrashLoopBackOff

The EBS CSI Driver initially entered a CrashLoopBackOff state.

The root cause was insufficient IAM permissions.

After I attached the correct AWS-managed policy and validated the service account configuration, persistent volume provisioning began working correctly.
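The two things validated here, the service account annotation and the policy attached to its role, can be expressed as a small Python check. This is a sketch under assumptions: I take the AWS-managed policy to be `AmazonEBSCSIDriverPolicy`, the service account manifest and policy list are passed in as plain data rather than fetched from the cluster, and `find_irsa_problems` is a hypothetical helper name.

```python
# Assumed to be the AWS-managed policy referred to above.
EBS_CSI_MANAGED_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"
# Annotation EKS uses to link a Kubernetes service account to an IAM role.
ROLE_ANNOTATION = "eks.amazonaws.com/role-arn"


def find_irsa_problems(service_account, attached_policy_arns):
    """Return a list of human-readable problems; an empty list means both checks pass."""
    problems = []
    annotations = service_account.get("metadata", {}).get("annotations") or {}
    if not annotations.get(ROLE_ANNOTATION):
        problems.append(f"service account is missing the {ROLE_ANNOTATION} annotation")
    if EBS_CSI_MANAGED_POLICY_ARN not in attached_policy_arns:
        problems.append("role does not have AmazonEBSCSIDriverPolicy attached")
    return problems


# Hypothetical manifest: annotated correctly, but the role has no policies yet.
service_account = {
    "metadata": {
        "name": "ebs-csi-controller-sa",
        "namespace": "kube-system",
        "annotations": {ROLE_ANNOTATION: "arn:aws:iam::111122223333:role/example-ebs-csi-role"},
    }
}
print(find_irsa_problems(service_account, attached_policy_arns=[]))
```

Checking both sides matters because a correctly annotated service account still fails if the role behind it carries no permissions.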

3. Node Capacity and Worker Recovery

As additional services were deployed, a single worker node became resource constrained.

The cluster experienced scheduling pressure and reduced capacity.

To resolve this:

Worker node capacity was increased
Additional nodes were added
Unhealthy instances were replaced through Auto Scaling

The cluster subsequently stabilised.
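A rough way to see why one node was not enough is to add up the pods' resource requests and compare them with what a node can allocate. The Python sketch below does that arithmetic. The request sizes and node allocatable figures are made-up example numbers, and the result is only a lower bound, since it ignores per-node pod limits, DaemonSets, and how pods actually pack onto nodes.

```python
import math


def parse_cpu(quantity):
    """Convert a Kubernetes CPU quantity ('250m' or '0.5') to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def parse_memory_mi(quantity):
    """Convert a Kubernetes memory quantity ('512Mi', '1Gi') to mebibytes."""
    units = {"Ki": 1 / 1024, "Mi": 1, "Gi": 1024}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported memory quantity: {quantity}")


def min_nodes_needed(pod_requests, node_cpu_m, node_memory_mi):
    """Lower bound on node count from total (cpu, memory) requests."""
    total_cpu = sum(parse_cpu(cpu) for cpu, _ in pod_requests)
    total_mem = sum(parse_memory_mi(mem) for _, mem in pod_requests)
    return max(math.ceil(total_cpu / node_cpu_m), math.ceil(total_mem / node_memory_mi))


# Hypothetical: eight pods each requesting 500m CPU and 512Mi memory,
# on nodes with 1900m CPU and 3400Mi memory allocatable.
requests = [("500m", "512Mi")] * 8
print(min_nodes_needed(requests, node_cpu_m=1900, node_memory_mi=3400))
```

In this example CPU is the binding constraint, so adding memory alone would not relieve the scheduling pressure.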

4. NAT Gateway Issues

At one stage, the NAT Gateway was removed to reduce costs.

This caused worker nodes to lose outbound internet access, resulting in image pull failures and connectivity issues.

Recreating the NAT Gateway restored normal cluster operations.
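This failure mode is easy to detect from route table data. The Python sketch below inspects a route table in the shape returned by the EC2 DescribeRouteTables API and reports whether it still has a usable default route through a NAT Gateway. The route table contents and IDs are invented for illustration; when a NAT Gateway is deleted, the route that pointed at it is left in the `blackhole` state, which is the case the check is meant to catch.

```python
def has_nat_default_route(route_table):
    """True if the table has an active 0.0.0.0/0 route via a NAT Gateway."""
    for route in route_table.get("Routes", []):
        if (
            route.get("DestinationCidrBlock") == "0.0.0.0/0"
            and route.get("NatGatewayId")
            and route.get("State") == "active"
        ):
            return True
    return False


# Hypothetical private route table after the NAT Gateway was deleted:
# the default route still exists but is now a blackhole.
broken_table = {
    "RouteTableId": "rtb-0example",
    "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
        {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0example", "State": "blackhole"},
    ],
}
print(has_nat_default_route(broken_table))
```

Running a check like this against the private subnets' route tables before and after a cost-saving change would surface the lost egress path before image pulls start failing.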

My Responsibilities as Infrastructure Engineer

My primary responsibility was ensuring that the cloud infrastructure was available, secure, scalable, and ready for application deployment.

Infrastructure Provisioning with Terraform

I provisioned the AWS infrastructure using Terraform.

This included:

VPC creation
Public and private subnets
Route tables
Internet Gateway
Security groups
EKS cluster resources
Worker node groups

Using Infrastructure as Code allowed the environment to be reproducible and version-controlled.

Lessons Learned

This project reinforced several important lessons:

Infrastructure Is More Than Terraform

Building infrastructure is only one part of the job. Maintaining reliability, troubleshooting issues, and supporting users are equally important.

IAM Is Critical

Many AWS-related failures ultimately came down to permissions, trust relationships, and access configuration.

Documentation Matters

Clear documentation reduced troubleshooting time and improved collaboration.

Communication Is a Technical Skill

As Team Lead, I learned that project success depends heavily on communication, coordination, and ownership.

Cost Optimisation Cannot Be Ignored

Cloud resources are powerful, but they must be managed carefully. Monitoring resource usage and understanding AWS pricing became a valuable lesson throughout this project.

Final Outcome

The project successfully delivered:

✅ Amazon EKS cluster deployment

✅ Kubernetes-based microservices platform

✅ Persistent database storage

✅ AWS Load Balancer integration

✅ Secure IAM and OIDC configuration

✅ Infrastructure as Code using Terraform

✅ Multi-node Kubernetes environment

✅ Real-world troubleshooting experience

Most importantly, the project provided hands-on exposure to the type of challenges faced by Cloud and DevOps Engineers in production environments.

Closing Thoughts

This project was one of the most valuable practical learning experiences in my cloud engineering journey.

It strengthened my understanding of AWS, Kubernetes, Terraform, networking, IAM, troubleshooting, and team collaboration.

While there were challenges along the way, each issue provided an opportunity to learn and improve.

I am grateful to the DMI mentors, co-mentors, and teammates who contributed to the experience.

🚀 DMI Cohort 3 is starting on 27 June 2026. If you are interested in gaining hands-on DevOps experience, you can apply here:

https://docs.google.com/forms/d/e/1FAIpQLSel7ai7nyb0P1qLW4vEyfB_nEsD4lUF1XG88vmAaFGBOb6hPA/viewform
