Briefly Introduce Horizontal Pod Autoscaler (HPA) for Future Learning
In this step, you'll get an introduction to Horizontal Pod Autoscaler (HPA), a powerful Kubernetes feature that automatically scales applications based on resource utilization. HPA allows you to define scaling rules based on metrics like CPU utilization, memory usage, or even custom metrics.
Understanding HPA:
HPA automatically adjusts the number of running pod replicas in a Deployment, ReplicaSet, or StatefulSet based on observed CPU or memory usage, or based on custom metrics provided by your applications. This ensures that your application can automatically scale to handle changing traffic loads, improving performance and availability.
Enable metrics server addon in Minikube:
minikube addons enable metrics-server
Example output:
* The 'metrics-server' addon is enabled
The metrics server provides Kubernetes with usage data about your resources and it is essential for the HPA to function.
Create a deployment with resource requests:
nano ~/project/k8s-manifests/hpa-example.yaml
Add the following content:
apiVersion: apps/v1
kind: Deployment
metadata:
name: php-apache
spec:
selector:
matchLabels:
run: php-apache
replicas: 1
template:
metadata:
labels:
run: php-apache
spec:
containers:
- name: php-apache
image: k8s.gcr.io/hpa-example
ports:
- containerPort: 80
resources:
limits:
cpu: 500m
requests:
cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
name: php-apache
labels:
run: php-apache
spec:
ports:
- port: 80
selector:
run: php-apache
Apply the deployment:
kubectl apply -f ~/project/k8s-manifests/hpa-example.yaml
Explanation of the YAML configuration:
- This YAML file defines a Deployment for a PHP application and the corresponding Service.
- The Deployment configuration is very similar to the NGINX one, with the exception of:
name: php-apache
: The name of the deployment and pod container.
image: k8s.gcr.io/hpa-example
: The Docker image for the container.
resources
: This section specifies the resource requirements for the container.
limits.cpu: 500m
: The maximum CPU allowed to use by the container.
requests.cpu: 200m
: The guaranteed CPU amount assigned to the container.
- The service is a standard service configuration, exposing the deployment internally.
Create an HPA configuration:
nano ~/project/k8s-manifests/php-apache-hpa.yaml
Add the following HPA manifest:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: php-apache
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: php-apache
minReplicas: 1
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 50
Apply the HPA configuration:
kubectl apply -f ~/project/k8s-manifests/php-apache-hpa.yaml
Explanation of the YAML configuration:
apiVersion: autoscaling/v2
: Specifies the API version for HorizontalPodAutoscaler.
kind: HorizontalPodAutoscaler
: Indicates that this is an HPA object.
metadata
: Contains metadata about the HPA.
name: php-apache
: The name of the HPA.
spec
: Contains the HPA specification.
scaleTargetRef
: Defines the target Deployment that will be scaled.
apiVersion: apps/v1
: The API version of the target resource.
kind: Deployment
: The target resource type, which is a Deployment.
name: php-apache
: The name of the target Deployment to scale.
minReplicas: 1
: The minimum number of replicas to keep running.
maxReplicas: 10
: The maximum number of replicas to scale to.
metrics
: Defines how to determine scaling metrics.
type: Resource
: Scales based on a resource metric.
resource.name: cpu
: Scales based on CPU usage.
resource.target.type: Utilization
: Scales based on a percentage of the CPU requested by the pod
resource.target.averageUtilization: 50
: Scales when average CPU usage across all pods exceeds 50% of the requests.
Verify the HPA configuration:
kubectl get hpa
Example output:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache 0%/50% 1 10 1 30s
Simulate load to trigger scaling (optional):
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
DO NOT CLOSE the terminal with the load generator. OPEN ANOTHER TERMINAL and monitor the HPA behavior:
kubectl get hpa
Wait for a few seconds (it may take more than a minute and a half) to see the HPA scale the deployment based on CPU utilization.
kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache 68%/50% 1 10 2 72s
Press Ctrl+C
to stop the load generator.
Key points about HPA:
- Automatically scales pods based on resource utilization, which improves application resilience.
- Can scale based on CPU, memory, or custom metrics.
- Defines min and max replica counts, ensuring balanced and efficient scaling.
- HPA is a crucial component for maintaining application performance and availability under varying load.