Spring PetClinic on AWS EKS: Infrastructure Engineer Journey

Cloud & DevOps Engineer | Infrastructure Engineer | AWS | Kubernetes | Amazon EKS | Terraform | Docker | AI-Assisted DevOps

Introduction

As part of the DevOps Micro Internship (DMI) Cohort 2, I participated in a collaborative project to deploy the Spring PetClinic Microservices application on Amazon Elastic Kubernetes Service (EKS).

The objective was not simply to deploy an application, but to simulate a real-world cloud engineering environment where team members owned different components of the platform and worked together to deliver a production-style deployment.

In this project, I served as both the Infrastructure Engineer and Team Lead, responsible for designing, provisioning, securing, and supporting the AWS infrastructure that hosted the application.

This article documents the architecture, responsibilities, challenges, lessons learned, and key takeaways from the project.


Project Overview

Spring PetClinic is a microservices-based application composed of multiple services including:

  • API Gateway

  • Config Server

  • Discovery Server

  • Customers Service

  • Vets Service

  • Visits Service

  • GenAI Service

  • Multiple MySQL databases

The application was deployed on Amazon EKS using Kubernetes, with AWS services providing networking, storage, security, and load balancing capabilities.


My Responsibilities

As the Infrastructure Engineer, my responsibilities included:

  • Provisioning AWS infrastructure

  • Deploying and managing Amazon EKS

  • Managing worker nodes

  • Configuring IAM roles and access controls

  • Implementing OIDC integration

  • Installing Kubernetes add-ons

  • Configuring AWS Load Balancer Controller

  • Managing Kubernetes networking

  • Enabling persistent storage using EBS CSI Driver

  • Supporting deployment and database teams

  • Troubleshooting infrastructure issues

In addition, I served as Team Lead, helping coordinate activities, support teammates, unblock technical issues, and ensure progress across different project areas.


Architecture Overview

The deployment architecture followed a cloud-native microservices model:

Users
  ↓
AWS Application Load Balancer
  ↓
Kubernetes Ingress
  ↓
API Gateway
  ↓
Microservices
  ↓
MySQL Databases
  ↓
Amazon EBS Volumes

AWS services and components used included:

  • Amazon EKS

  • Amazon EC2

  • Amazon ECR

  • Amazon VPC

  • IAM

  • Application Load Balancer

  • EBS CSI Driver

  • Route Tables

  • NAT Gateway
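
To make the request path above more concrete, here is a minimal sketch of how the Ingress that fronts the API Gateway might be described for the AWS Load Balancer Controller. It is an illustration rather than the project's actual manifest: the names `petclinic` and `api-gateway` and the port `8080` are assumptions, and the manifest is built as a Python dictionary only so it can be printed or checked easily.

```python
import json


def build_alb_ingress(name: str, namespace: str, service_name: str, service_port: int) -> dict:
    """Return a Kubernetes Ingress manifest (as a dict) that asks the
    AWS Load Balancer Controller for an internet-facing ALB."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {
                # Public ALB that sends traffic straight to pod IPs.
                "alb.ingress.kubernetes.io/scheme": "internet-facing",
                "alb.ingress.kubernetes.io/target-type": "ip",
            },
        },
        "spec": {
            "ingressClassName": "alb",
            "rules": [
                {
                    "http": {
                        "paths": [
                            {
                                "path": "/",
                                "pathType": "Prefix",
                                "backend": {
                                    "service": {
                                        "name": service_name,
                                        "port": {"number": service_port},
                                    }
                                },
                            }
                        ]
                    }
                }
            ],
        },
    }


# Hypothetical names and port, chosen only for illustration.
ingress = build_alb_ingress("petclinic", "petclinic", "api-gateway", 8080)
print(json.dumps(ingress, indent=2))
```

The printed JSON can be applied with kubectl, which accepts JSON as well as YAML manifests.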


Infrastructure Provisioning

Infrastructure was provisioned using Terraform.

Key resources included:

  • VPC

  • Public Subnets

  • Private Subnets

  • Internet Gateway

  • NAT Gateway

  • Route Tables

  • EKS Cluster

  • Managed Node Groups

  • IAM Roles

This approach ensured repeatability, version control, and Infrastructure as Code best practices.
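
As a small illustration of the planning behind the VPC layout, the sketch below carves a VPC CIDR block into public and private subnets, one of each per Availability Zone. The CIDR `10.0.0.0/16`, the `/24` subnet size, and the two-zone layout are assumptions for the example, not the values used in the project.

```python
import ipaddress
from itertools import islice


def plan_subnets(vpc_cidr: str, az_count: int, new_prefix: int = 24) -> dict:
    """Split a VPC CIDR into `az_count` public and `az_count` private subnets."""
    vpc = ipaddress.ip_network(vpc_cidr)
    needed = 2 * az_count
    # Take only as many subnet blocks as we need from the generator.
    blocks = [str(net) for net in islice(vpc.subnets(new_prefix=new_prefix), needed)]
    if len(blocks) < needed:
        raise ValueError(f"{vpc_cidr} cannot hold {needed} /{new_prefix} subnets")
    return {"public": blocks[:az_count], "private": blocks[az_count:]}


# Hypothetical VPC range and zone count.
plan = plan_subnets("10.0.0.0/16", az_count=2)
print(plan)
# {'public': ['10.0.0.0/24', '10.0.1.0/24'], 'private': ['10.0.2.0/24', '10.0.3.0/24']}
```

The resulting lists could then be passed to Terraform as input variables, so the subnet layout stays in version control alongside the rest of the configuration.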


Major Challenges and Troubleshooting

1. AWS Load Balancer Controller Failure

One of the most significant issues involved the AWS Load Balancer Controller.

The controller failed with AccessDenied errors and was unable to provision Application Load Balancers.

After investigation, I discovered that the IAM trust relationship referenced an incorrect OIDC provider.

By correcting the trust policy and validating the cluster's OIDC configuration, the controller was successfully restored.
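
The kind of mismatch described above can be checked mechanically. The sketch below builds the trust policy that an IAM role for a service account typically carries, and tests whether an existing policy points at the cluster's OIDC provider. The account ID, region, issuer IDs, and the service account name are placeholders, not values from the project.

```python
def oidc_provider_path(issuer_url: str) -> str:
    """Drop the scheme from the cluster's OIDC issuer URL."""
    return issuer_url.split("://", 1)[-1]


def build_irsa_trust_policy(account_id: str, issuer_url: str,
                            namespace: str, service_account: str) -> dict:
    """Trust policy allowing one Kubernetes service account to assume the role."""
    provider = oidc_provider_path(issuer_url)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{provider}:aud": "sts.amazonaws.com",
                        f"{provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                    }
                },
            }
        ],
    }


def trust_policy_matches_cluster(policy: dict, account_id: str, issuer_url: str) -> bool:
    """True if any statement trusts the OIDC provider of the given cluster."""
    expected = f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider_path(issuer_url)}"
    return any(
        stmt.get("Principal", {}).get("Federated") == expected
        for stmt in policy.get("Statement", [])
    )


# Placeholder values: a role created against an old cluster's issuer.
old_issuer = "https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLEOLD"
new_issuer = "https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLENEW"
policy = build_irsa_trust_policy(
    "111122223333", old_issuer, "kube-system", "aws-load-balancer-controller"
)
print(trust_policy_matches_cluster(policy, "111122223333", new_issuer))  # False
```

In practice the cluster's issuer URL can be read from the EKS cluster description and compared against the role's trust policy in the same way.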

2. EBS CSI Driver CrashLoopBackOff

The EBS CSI Driver initially entered a CrashLoopBackOff state.

The root cause was insufficient IAM permissions.

After attaching the correct AWS-managed policy and validating the service account configuration, persistent volume provisioning began working correctly.
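
The two checks mentioned above (policy attachment and service account configuration) can be sketched as a small validation function. This is an illustration under assumptions: it assumes the AWS-managed policy in question is `AmazonEBSCSIDriverPolicy`, that the driver runs under a service account named `ebs-csi-controller-sa`, and the role ARN shown is a placeholder.

```python
# Assumed to be the AWS-managed policy referred to in the text.
EBS_CSI_MANAGED_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"
# Annotation EKS uses to link a service account to an IAM role.
ROLE_ANNOTATION = "eks.amazonaws.com/role-arn"


def check_csi_setup(service_account: dict, attached_policy_arns: list,
                    expected_role_arn: str) -> list:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    annotations = service_account.get("metadata", {}).get("annotations") or {}
    actual_role = annotations.get(ROLE_ANNOTATION)
    if actual_role is None:
        problems.append(f"service account has no {ROLE_ANNOTATION} annotation")
    elif actual_role != expected_role_arn:
        problems.append(f"service account points at {actual_role}, expected {expected_role_arn}")
    if EBS_CSI_MANAGED_POLICY_ARN not in attached_policy_arns:
        problems.append("managed EBS CSI policy is not attached to the role")
    return problems


# Placeholder role ARN and a service account missing its annotation.
role_arn = "arn:aws:iam::111122223333:role/ebs-csi-driver-role"
broken_sa = {"metadata": {"name": "ebs-csi-controller-sa", "namespace": "kube-system"}}
for problem in check_csi_setup(broken_sa, [], role_arn):
    print(problem)
```

The service account dictionary has the same shape as the JSON returned by `kubectl get serviceaccount ... -o json`, so real output could be fed in directly.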

3. Node Capacity and Worker Recovery

As additional services were deployed, a single worker node became resource constrained.

The cluster experienced scheduling pressure and reduced capacity.

To resolve this:

  • Worker node capacity was increased

  • Additional nodes were added

  • Unhealthy instances were replaced through Auto Scaling

The cluster subsequently stabilised.
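
Scheduling pressure of this kind comes down to simple arithmetic: the sum of pod resource requests must fit within what the node can allocate. The sketch below shows that calculation using Kubernetes-style quantities. The pod requests and node size are invented numbers for illustration, not measurements from the project, and the quantity parser covers only the common suffixes.

```python
_MEMORY_UNITS = {
    "Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3,
    "K": 1000, "M": 1000 ** 2, "G": 1000 ** 3,
}


def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '250m' or '2' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '512Mi' or '1Gi' to bytes."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return round(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # plain bytes


def capacity_report(pod_requests: list, allocatable: dict) -> dict:
    """Compare total pod requests with a node's allocatable CPU and memory."""
    cpu_req = sum(cpu_millicores(p["cpu"]) for p in pod_requests)
    mem_req = sum(memory_bytes(p["memory"]) for p in pod_requests)
    cpu_alloc = cpu_millicores(allocatable["cpu"])
    mem_alloc = memory_bytes(allocatable["memory"])
    return {
        "cpu_requested_m": cpu_req,
        "cpu_allocatable_m": cpu_alloc,
        "memory_requested": mem_req,
        "memory_allocatable": mem_alloc,
        "fits": cpu_req <= cpu_alloc and mem_req <= mem_alloc,
    }


# Hypothetical workload: eight pods, each requesting 250m CPU and 512Mi memory,
# against a single node with roughly 2 vCPU and 3Gi allocatable.
pods = [{"cpu": "250m", "memory": "512Mi"} for _ in range(8)]
report = capacity_report(pods, {"cpu": "1930m", "memory": "3Gi"})
print(report["fits"])  # False
```

When `fits` is false, the options are the same ones listed above: larger nodes, more nodes, or smaller requests.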

4. NAT Gateway Issues

At one stage, the NAT Gateway was removed to reduce costs.

This caused worker nodes to lose outbound internet access, resulting in image pull failures and connectivity issues.

Recreating the NAT Gateway restored normal cluster operations.
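
A failure like this can be caught early by checking that each private route table still has a usable default route. The sketch below inspects route table data in the shape returned by the EC2 DescribeRouteTables API, where a route whose target has been deleted is reported with the state `blackhole`. The IDs are placeholders, and the sample data is hand-written rather than captured from the project.

```python
def has_active_nat_route(route_table: dict) -> bool:
    """True if the table has an active 0.0.0.0/0 route through a NAT Gateway."""
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("NatGatewayId") and route.get("State") == "active":
            return True
    return False


# Hand-written sample: the default route still names a NAT Gateway,
# but the gateway was deleted, so the route is a blackhole.
private_table = {
    "RouteTableId": "rtb-0123456789abcdef0",
    "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
        {
            "DestinationCidrBlock": "0.0.0.0/0",
            "NatGatewayId": "nat-0123456789abcdef0",
            "State": "blackhole",
        },
    ],
}
print(has_active_nat_route(private_table))  # False
```

Running a check like this after any cost-driven cleanup would flag the missing outbound path before image pulls start failing.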

My Responsibilities as Infrastructure Engineer

My primary responsibility was ensuring that the cloud infrastructure was available, secure, scalable, and ready for application deployment.

Infrastructure Provisioning with Terraform

I provisioned the AWS infrastructure using Terraform.

This included:

  • VPC creation

  • Public and private subnets

  • Route tables

  • Internet Gateway

  • Security groups

  • EKS cluster resources

  • Worker node groups

Using Infrastructure as Code allowed the environment to be reproducible and version-controlled.


Lessons Learned

This project reinforced several important lessons:

Infrastructure Is More Than Terraform

Building infrastructure is only one part of the job. Maintaining reliability, troubleshooting issues, and supporting users are equally important.

IAM Is Critical

Many AWS-related failures ultimately came down to permissions, trust relationships, and access configuration.

Documentation Matters

Having clear documentation dramatically reduced troubleshooting time and improved collaboration.

Communication Is a Technical Skill

As Team Lead, I learned that project success depends heavily on communication, coordination, and ownership.

Cost Optimisation Cannot Be Ignored

Cloud resources are powerful, but they must be managed carefully. Monitoring resource usage and understanding AWS pricing became a valuable lesson throughout this project.


Final Outcome

The project successfully delivered:

✅ Amazon EKS cluster deployment

✅ Kubernetes-based microservices platform

✅ Persistent database storage

✅ AWS Load Balancer integration

✅ Secure IAM and OIDC configuration

✅ Infrastructure as Code using Terraform

✅ Multi-node Kubernetes environment

✅ Real-world troubleshooting experience

Most importantly, the project provided hands-on exposure to the type of challenges faced by Cloud and DevOps Engineers in production environments.


Closing Thoughts

This project was one of the most valuable practical learning experiences in my cloud engineering journey.

It strengthened my understanding of AWS, Kubernetes, Terraform, networking, IAM, troubleshooting, and team collaboration.

While there were challenges along the way, each issue provided an opportunity to learn and improve.

I am grateful to the DMI mentors, co-mentors, and teammates who contributed to the experience.

🚀 DMI Cohort 3 is starting on 27 June 2026. If you are interested in gaining hands-on DevOps experience, you can apply here:

https://docs.google.com/forms/d/e/1FAIpQLSel7ai7nyb0P1qLW4vEyfB_nEsD4lUF1XG88vmAaFGBOb6hPA/viewform