Introduction
In this lab, you will start a local Kubernetes cluster using Minikube, deploy a sample NGINX application, and then scale it to meet varying demands. You will observe load balancing across multiple pods, monitor cluster events, and gain a brief introduction to Horizontal Pod Autoscaler (HPA) for future scaling automation. This lab aims to provide a comprehensive, hands-on experience for understanding Kubernetes scaling and load balancing.
Start the Kubernetes Cluster
In this step, you'll learn how to start and verify a local Kubernetes cluster using Minikube. This is an essential first step for deploying and managing containerized applications in a Kubernetes environment.
First, start the Minikube cluster:
minikube start
Example output:
😄 minikube v1.29.0 on Ubuntu 22.04
✨ Automatically selected the docker driver
📌 Using Docker driver with root permissions
🔥 Creating docker container (CPUs=2, Memory=2200MB) ...
🐳 Preparing Kubernetes v1.26.1 on Docker 20.10.23 ...
🚀 Launching Kubernetes ...
🌟 Enabling addons: storage-provisioner, default-storageclass
🏄 Done! kubectl is now configured to use "minikube" cluster and "default" namespace
Verify the cluster status using multiple commands:
minikube status
kubectl get nodes
Example output for minikube status:
minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured
Example output for kubectl get nodes:
NAME       STATUS   ROLES           AGE   VERSION
minikube   Ready    control-plane   1m    v1.26.1
These commands confirm that:
- Minikube is successfully running.
- A local Kubernetes cluster has been created.
- The cluster is ready to use.
- You have a single-node cluster with control plane capabilities.
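As an optional extra check, kubectl cluster-info prints the control plane and DNS endpoints the cluster exposes (the exact URL and port vary per machine):

kubectl cluster-info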
Deploy a Sample Application
In this step, you'll learn how to deploy a simple web application using a Kubernetes Deployment with a single replica. We'll create a YAML manifest for an NGINX web server and apply it to the Minikube cluster. Understanding how to deploy an application is fundamental to using Kubernetes.
First, create a directory for your Kubernetes manifests:
mkdir -p ~/project/k8s-manifests
cd ~/project/k8s-manifests
Create a new YAML file for the deployment:
nano nginx-deployment.yaml
Add the following deployment configuration:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
Save the file (Ctrl+X, then Y, then Enter).
Explanation of the YAML configuration:
- apiVersion: apps/v1: Specifies the API version for Deployments.
- kind: Deployment: Indicates that this is a Deployment object, used to manage replicated applications.
- metadata: Contains metadata about the Deployment.
  - name: nginx-deployment: The name of the deployment.
  - labels: app: nginx: A label used to identify this deployment.
- spec: Contains the deployment specification.
  - replicas: 1: The desired number of pod instances (replicas). In this initial deployment, we have only one replica.
  - selector: Defines how the deployment selects the pods it manages.
    - matchLabels: app: nginx: Pods with the label app: nginx are managed by this deployment.
- template: The pod template, which specifies the configuration for the pods the deployment creates.
  - metadata.labels: app: nginx: The label applied to pods managed by this deployment.
  - spec.containers: Defines the containers in the pod.
    - name: nginx: The name of the container.
    - image: nginx:latest: The Docker image for the container (the latest NGINX image).
    - ports: containerPort: 80: Exposes port 80 in the container.
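As a side note, you don't have to write manifests from scratch: recent kubectl versions can generate an equivalent starting point for you. A minimal sketch (the file name is just a suggestion):

kubectl create deployment nginx-deployment --image=nginx:latest --port=80 --dry-run=client -o yaml > nginx-deployment-generated.yaml

The --dry-run=client flag means nothing is sent to the cluster; the command only prints the YAML it would have submitted.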
Apply the deployment to the Kubernetes cluster:
kubectl apply -f nginx-deployment.yaml
Example output:
deployment.apps/nginx-deployment created
Verify the deployment status:
kubectl get deployments
kubectl get pods
Example output for kubectl get deployments:
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   1/1     1            1           30s
Example output for kubectl get pods:
NAME                       READY   STATUS    RESTARTS   AGE
nginx-deployment-xxx-yyy   1/1     Running   0          30s
Key points about this deployment:
- We created a Deployment with a single replica.
- The deployment uses the latest NGINX image.
- The container exposes port 80.
- The deployment has a label app: nginx for identification.
Inspect the deployment details:
kubectl describe deployment nginx-deployment
Example output will show deployment configuration, events, and current state.
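If you later script deployments, kubectl rollout status is a handy way to block until the rollout completes; it exits once all replicas are available:

kubectl rollout status deployment/nginx-deployment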
Scale Deployments to Handle Increased Load
In this step, you will learn how to scale your application to handle more traffic. In the real world, as your application becomes more popular, one replica may not be sufficient to handle the load. To address this, Kubernetes allows you to easily scale out your application by increasing the number of pod instances (replicas).
Before scaling, let's briefly discuss why multiple replicas are necessary. A single replica of an application can only handle a certain amount of concurrent requests. If the traffic increases beyond that capacity, the application can become slow or unresponsive. By having multiple replicas, the load can be distributed across different pod instances, ensuring that the application remains responsive and available. This concept is essential for creating scalable applications.
You will now learn how to scale your Kubernetes deployment by modifying the replicas field in the YAML manifest, and also by using the kubectl scale command.
Open the previously created deployment manifest:
nano ~/project/k8s-manifests/nginx-deployment.yaml
Modify the replicas field from 1 to 3:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3 # Changed from 1 to 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
Save the file (Ctrl+X, then Y, then Enter).
Apply the updated deployment:
kubectl apply -f ~/project/k8s-manifests/nginx-deployment.yaml
Example output:
deployment.apps/nginx-deployment configured
Verify the scaled deployment:
kubectl get deployments
kubectl get pods
Example output for kubectl get deployments:
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   3/3     3            3           5m
Example output for kubectl get pods:
NAME                       READY   STATUS    RESTARTS   AGE
nginx-deployment-xxx-yyy   1/1     Running   0          5m
nginx-deployment-xxx-zzz   1/1     Running   0          30s
nginx-deployment-xxx-www   1/1     Running   0          30s
Alternative scaling method using kubectl scale:
kubectl scale deployment nginx-deployment --replicas=4
Example output:
deployment.apps/nginx-deployment scaled
Verify the new number of replicas:
kubectl get deployments
kubectl get pods
Key points about scaling:
- Modify replicas in the YAML file or use the kubectl scale command.
- Use kubectl apply to update the deployment when changing the YAML file.
- Kubernetes ensures the desired number of replicas is running.
- You can scale both up (increase replicas) and down (decrease replicas).
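To watch Kubernetes converge on the desired replica count in real time, you can leave a watch running in a second terminal while you scale (press Ctrl+C to stop it):

kubectl get pods -l app=nginx -w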
Verify Load Balancing by Checking Multiple Pod Responses
In this step, you'll learn how to verify load balancing in Kubernetes by creating a Service and checking responses from multiple pods. Load balancing is crucial for distributing traffic across multiple replicas, ensuring that no single pod is overwhelmed. Kubernetes Services handle this process automatically.
Create a service to expose the deployment:
nano ~/project/k8s-manifests/nginx-service.yaml
Add the following service configuration:
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  type: ClusterIP
  ports:
    - port: 80
      targetPort: 80
Save the file (Ctrl+X, then Y, then Enter).
Explanation of the YAML configuration:
- apiVersion: v1: Specifies the API version for Services.
- kind: Service: Indicates that this is a Service object.
- metadata: Contains metadata about the Service.
  - name: nginx-service: The name of the Service.
- spec: Contains the service specification.
  - selector: Defines which pods this service routes traffic to.
    - app: nginx: Selects pods with the label app: nginx, matching the pods created in the previous step.
  - type: ClusterIP: Creates an internal service with a cluster IP address, used for internal communication. This service type is reachable only within the Kubernetes cluster.
  - ports: Defines how the service maps traffic.
    - port: 80: The port that the service exposes.
    - targetPort: 80: The port that the application inside the container listens on.
Apply the service:
kubectl apply -f ~/project/k8s-manifests/nginx-service.yaml
Example output:
service/nginx-service created
Verify the service:
kubectl get services
Example output:
NAME            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
kubernetes      ClusterIP   10.96.0.1       <none>        443/TCP   30m
nginx-service   ClusterIP   10.96.xxx.xxx   <none>        80/TCP    30s
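You can also inspect the pod IPs the Service will balance traffic across; each address corresponds to one NGINX pod (the IPs below are illustrative):

kubectl get endpoints nginx-service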
Now, to truly verify load balancing, you will create a temporary pod and send multiple requests to the service. This allows you to see that the requests are being distributed across different NGINX pods.
Create a temporary pod to test load balancing:
kubectl run curl-test --image=curlimages/curl --rm -it -- sh
This command does the following:
- kubectl run curl-test: Creates a new pod named curl-test.
- --image=curlimages/curl: Uses a Docker image with curl installed.
- --rm: Automatically removes the pod when the session ends.
- -it: Allocates a pseudo-TTY and keeps stdin open.
- -- sh: Starts a shell session in the pod.
Inside the temporary pod, run multiple requests:
for i in $(seq 1 10); do curl -s nginx-service | grep -q "Welcome to nginx!" && echo "Welcome to nginx - Request $i"; done
This loop will send 10 requests to the nginx-service. Each request should be routed to one of the available NGINX pods. The output will print Welcome to nginx - Request $i for each successful request.
Example output:
Welcome to nginx - Request 1
Welcome to nginx - Request 2
Welcome to nginx - Request 3
...
Exit the temporary pod:
exit
Key points about load balancing:
- Services distribute traffic across all matching pods.
- Each request can potentially hit a different pod.
- With kube-proxy's default iptables mode, each connection is routed to a backend essentially at random rather than in strict round-robin order, so short tests may not hit every pod evenly.
- The ClusterIP service type provides internal load balancing.
- The curl test shows the load being distributed across multiple NGINX instances.
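Because every pod serves the same default page, the curl loop above proves the Service is reachable but not which pod answered each request. One way to see the actual distribution (assuming kubectl v1.17+ for the --prefix flag) is to tail the access logs of all NGINX pods right after running the loop:

kubectl logs -l app=nginx --prefix --tail=5

Each access-log line is prefixed with the name of the pod that served it, so you can count how the 10 requests were spread across the replicas.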
Dynamically Adjust the Deployment Scale to Meet Demand
In this step, you will practice dynamically adjusting your Kubernetes deployment scale to meet changing application demands using the kubectl scale command. This step emphasizes the practical aspect of adjusting the number of running replicas without directly modifying the YAML file, which can be useful for rapid adjustments in response to traffic spikes.
First, check the current deployment status:
kubectl get deployments
Example output:
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   4/4     4            4           45m
Scale the deployment using the kubectl command:
kubectl scale deployment nginx-deployment --replicas=5
Example output:
deployment.apps/nginx-deployment scaled
Verify the new number of replicas:
kubectl get deployments
kubectl get pods
Example output for deployments:
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   5/5     5            5           46m
Example output for pods:
NAME                       READY   STATUS    RESTARTS   AGE
nginx-deployment-xxx-yyy   1/1     Running   0          1m
nginx-deployment-xxx-zzz   1/1     Running   0          1m
nginx-deployment-xxx-www   1/1     Running   0          1m
nginx-deployment-xxx-aaa   1/1     Running   0          1m
nginx-deployment-xxx-bbb   1/1     Running   0          1m
Now, update the deployment YAML for persistent scaling:
nano ~/project/k8s-manifests/nginx-deployment.yaml
Modify the replicas field:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 5 # Updated from the previous value
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
Apply the updated configuration:
kubectl apply -f ~/project/k8s-manifests/nginx-deployment.yaml
Example output:
deployment.apps/nginx-deployment configured
Simulate scaling down for reduced demand:
kubectl scale deployment nginx-deployment --replicas=2
Example output:
deployment.apps/nginx-deployment scaled
Verify the reduced number of replicas:
kubectl get deployments
kubectl get pods
Key points about scaling:
- Use kubectl scale for quick, temporary scaling.
- Update the YAML file for a persistent configuration.
- Kubernetes ensures smooth scaling with minimal disruption.
- You can scale up or down based on application needs, using either the command or the configuration file.
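One caveat: kubectl scale changes only the live object, so the cluster can drift from what your YAML file says. Before re-applying, kubectl diff shows exactly what apply would change:

kubectl diff -f ~/project/k8s-manifests/nginx-deployment.yaml

Here it would report the replica count going from 2 back to 5, since the file still says 5.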
Monitor Deployment and Pod Events for Changes
In this step, you'll learn how to monitor Kubernetes deployments and pods using various kubectl commands to track changes, troubleshoot issues, and understand the lifecycle of your applications. Observability is crucial for ensuring the health and performance of your applications.
Describe the current deployment to get detailed information:
kubectl describe deployment nginx-deployment
Example output:
Name: nginx-deployment
Namespace: default
CreationTimestamp: [timestamp]
Labels: app=nginx
Replicas: 2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=nginx
Containers:
nginx:
Image: nginx:latest
Port: 80/TCP
Host Port: 0/TCP
Environment: <none>
Mounts: <none>
Conditions:
  Type           Status   Reason
  ----           ------   ------
  Available      True     MinimumReplicasAvailable
  Progressing    True     NewReplicaSetAvailable
OldReplicaSets: <none>
NewReplicaSet: nginx-deployment-xxx (2/2 replicas created)
Events: <some deployment events>
Get detailed information about individual pods:
kubectl describe pods -l app=nginx
Example output will show details for each pod, including:
- Current status
- Container information
- Events
- IP addresses
- Node information
View cluster-wide events:
kubectl get events
Example output:
LAST SEEN   TYPE     REASON      OBJECT                         MESSAGE
5m          Normal   Scheduled   pod/nginx-deployment-xxx-yyy   Successfully assigned default/nginx-deployment-xxx-yyy to minikube
5m          Normal   Pulled      pod/nginx-deployment-xxx-yyy   Container image "nginx:latest" already present on machine
5m          Normal   Created     pod/nginx-deployment-xxx-yyy   Created container nginx
5m          Normal   Started     pod/nginx-deployment-xxx-yyy   Started container nginx
Filter events for specific resources:
kubectl get events --field-selector involvedObject.kind=Deployment
Example output will show only deployment-related events.
Simulate an event by deleting a pod:
## Get a pod name
POD_NAME=$(kubectl get pods -l app=nginx -o jsonpath='{.items[0].metadata.name}')
## Delete the pod
kubectl delete pod $POD_NAME
Observe the events and pod recreation:
kubectl get events
kubectl get pods
Key points about monitoring:
- kubectl describe provides detailed resource information.
- kubectl get events shows cluster-wide events.
- Kubernetes automatically replaces deleted pods.
- Events help troubleshoot deployment issues.
- Use describe for detailed object information and events to track actions.
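To narrow the event stream to a single object, such as the pod you just deleted, you can combine a field selector with the pod name captured earlier (reusing the $POD_NAME variable from above):

kubectl get events --field-selector involvedObject.name=$POD_NAME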
Briefly Introduce Horizontal Pod Autoscaler (HPA) for Future Learning
In this step, you'll get an introduction to Horizontal Pod Autoscaler (HPA), a powerful Kubernetes feature that automatically scales applications based on resource utilization. HPA allows you to define scaling rules based on metrics like CPU utilization, memory usage, or even custom metrics.
Understanding HPA:
HPA automatically adjusts the number of running pod replicas in a Deployment, ReplicaSet, or StatefulSet based on observed CPU or memory usage, or based on custom metrics provided by your applications. This ensures that your application can automatically scale to handle changing traffic loads, improving performance and availability.
Enable metrics server addon in Minikube:
minikube addons enable metrics-server
Example output:
* The 'metrics-server' addon is enabled
The metrics server supplies Kubernetes with resource-usage data about nodes and pods, and it is essential for the HPA to function.
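Once the addon is running (it can take a minute before it starts reporting), you can confirm metrics are flowing with kubectl top; if these commands return numbers instead of an error, the HPA will have the data it needs:

kubectl top nodes
kubectl top pods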
Create a deployment with resource requests:
nano ~/project/k8s-manifests/hpa-example.yaml
Add the following content:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  replicas: 1
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
        - name: php-apache
          image: registry.k8s.io/hpa-example
          ports:
            - containerPort: 80
          resources:
            limits:
              cpu: 500m
            requests:
              cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
    - port: 80
  selector:
    run: php-apache
Apply the deployment:
kubectl apply -f ~/project/k8s-manifests/hpa-example.yaml
Explanation of the YAML configuration:
- This YAML file defines a Deployment for a PHP application and the corresponding Service.
- The Deployment configuration is very similar to the NGINX one, with the exception of:
  - name: php-apache: The name of the deployment and of the container.
  - image: registry.k8s.io/hpa-example: The Docker image for the container (registry.k8s.io is the current home of this image; the older k8s.gcr.io path redirects to it).
  - resources: Specifies the container's resource requirements.
    - limits.cpu: 500m: The maximum CPU the container may use.
    - requests.cpu: 200m: The CPU amount guaranteed to the container. HPA utilization percentages are computed against this request.
- The service is a standard service configuration, exposing the deployment internally.
Create an HPA configuration:
nano ~/project/k8s-manifests/php-apache-hpa.yaml
Add the following HPA manifest:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
Apply the HPA configuration:
kubectl apply -f ~/project/k8s-manifests/php-apache-hpa.yaml
Explanation of the YAML configuration:
- apiVersion: autoscaling/v2: Specifies the API version for HorizontalPodAutoscaler.
- kind: HorizontalPodAutoscaler: Indicates that this is an HPA object.
- metadata: Contains metadata about the HPA.
  - name: php-apache: The name of the HPA.
- spec: Contains the HPA specification.
  - scaleTargetRef: Identifies the target Deployment to be scaled.
    - apiVersion: apps/v1: The API version of the target resource.
    - kind: Deployment: The target resource type.
    - name: php-apache: The name of the target Deployment.
  - minReplicas: 1: The minimum number of replicas to keep running.
  - maxReplicas: 10: The maximum number of replicas to scale to.
  - metrics: Defines the scaling metrics.
    - type: Resource: Scales based on a resource metric.
    - resource.name: cpu: Scales based on CPU usage.
    - resource.target.type: Utilization: Scales based on a percentage of the CPU requested by the pods.
    - resource.target.averageUtilization: 50: Scales out when average CPU usage across all pods exceeds 50% of the requested amount.
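For quick experiments there is also an imperative equivalent that creates the same autoscaler without a manifest (shown for reference; the YAML approach above is better for version control):

kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10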
Verify the HPA configuration:
kubectl get hpa
Example output:
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   0%/50%    1         10        1          30s
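kubectl describe also works on HPAs and shows the conditions and scaling events behind those numbers:

kubectl describe hpa php-apache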
Simulate Load and Observe Autoscaling in Real-time
To simulate high load and trigger the autoscaler, you'll run a load generator in one terminal and monitor the scaling activity in a separate terminal.
First, open a terminal for the load generator:
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
DO NOT CLOSE the terminal with the load generator. OPEN ANOTHER TERMINAL to monitor the scaling activity.
In the second terminal, you can use several commands to observe the autoscaling in real-time:
- Monitor HPA status (updates every few seconds):
kubectl get hpa -w
- Watch the pods being created as the HPA scales up:
kubectl get pods -w
- Track events related to the scaling activity:
kubectl get events --sort-by='.lastTimestamp' -w
You can run any of these commands to observe different aspects of the autoscaling process. For example, watching pods with the -w flag shows pods being created in real time as the system scales:
Example output for kubectl get pods -w:
NAME                         READY   STATUS              RESTARTS   AGE
php-apache-xxxxxxxxx-xxxxx   1/1     Running             0          2m
load-generator               1/1     Running             0          30s
php-apache-xxxxxxxxx-yyyyy   0/1     Pending             0          0s
php-apache-xxxxxxxxx-yyyyy   0/1     ContainerCreating   0          0s
php-apache-xxxxxxxxx-yyyyy   1/1     Running             0          3s
php-apache-xxxxxxxxx-zzzzz   0/1     Pending             0          0s
php-apache-xxxxxxxxx-zzzzz   0/1     ContainerCreating   0          0s
php-apache-xxxxxxxxx-zzzzz   1/1     Running             0          2s
You will see the HPA responding to the increased load by scaling up the number of pods. The metrics update may take a minute or more to reflect changes:
Example output for kubectl get hpa -w:
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   0%/50%    1         10        1          30s
php-apache   Deployment/php-apache   68%/50%   1         10        1          90s
php-apache   Deployment/php-apache   68%/50%   1         10        2          90s
php-apache   Deployment/php-apache   79%/50%   1         10        2          2m
php-apache   Deployment/php-apache   79%/50%   1         10        4          2m15s
When you're done observing, press Ctrl+C to stop the monitoring command, and go back to the first terminal and press Ctrl+C to stop the load generator.
Key points about HPA:
- Automatically scales pods based on resource utilization, which improves application resilience.
- Can scale based on CPU, memory, or custom metrics.
- Defines min and max replica counts, ensuring balanced and efficient scaling.
- HPA is a crucial component for maintaining application performance and availability under varying load.
- Using the -w (watch) flag with kubectl commands provides real-time monitoring of cluster changes.
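When you're finished experimenting, you may want to remove the HPA demo resources so they stop consuming cluster capacity. A suggested cleanup, using the manifests created above:

kubectl delete -f ~/project/k8s-manifests/php-apache-hpa.yaml
kubectl delete -f ~/project/k8s-manifests/hpa-example.yaml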
Summary
In this lab, you've gained hands-on experience with scaling and load balancing in Kubernetes. You started by creating a local Kubernetes cluster with Minikube and deploying a basic NGINX web application. You then explored different scaling methods, including modifying deployment YAML files and using kubectl scale to adjust the number of pod replicas. You learned how to verify load balancing using Kubernetes Services and a temporary test pod.
Furthermore, you learned how to monitor deployments and pods through kubectl describe and kubectl get events commands. Finally, you gained a basic understanding of the Horizontal Pod Autoscaler (HPA), including how it can automatically scale your application based on resource utilization, using an example based on a php-apache image. This lab provides a comprehensive introduction to Kubernetes scaling, load balancing, monitoring, and autoscaling techniques, and sets the foundation for managing more complex applications in Kubernetes.