Allocating resources to pods running inside a Kubernetes cluster is challenging: how much CPU and RAM should pods get for good performance, and how many replicas are needed to handle the incoming load? To answer these questions, Kubernetes comes with a powerful feature called autoscaling.
In this hands-on lab, we will look at the different types of autoscaling and enable autoscaling of pods through the metrics server.
Lab Setup
You can start the lab setup by clicking the Lab Setup button on the right side of the screen. Please note that app-specific URLs are exposed specifically for this hands-on lab.
Our lab has been set up with all the necessary tools: a base OS (Ubuntu) and developer tools such as Git, Vim, and wget.
Autoscaling
Autoscaling is one of the most important features of a Kubernetes cluster. It increases or decreases the number of pods or nodes according to the demand on the service.
This improves the overall resource utilization of the cluster by automatically adjusting application resources and pods according to the load at any given time, avoiding many manual tasks.
Autoscaling uses two types of mechanisms:
- Pod-based scaling - To automate the scaling of pods through Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA) methods.
- Node-based scaling - To automate the scaling of cluster nodes through the Cluster Autoscaler (CA) method.
Horizontal Pod Autoscaler (HPA)
Horizontal Pod Autoscaler (HPA) automatically scales the number of pod replicas of workload resources such as Deployments and StatefulSets to match demand. Horizontal scaling means that if the load increases, HPA instructs the workload resource to deploy more pods (i.e., to scale up); similarly, if the load decreases and the number of pods is above the configured minimum, HPA instructs the workload resource to scale down.
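As a quick illustration, an HPA can also be created imperatively with kubectl. The deployment name below is hypothetical; this keeps average CPU utilization around 50% with between 1 and 10 replicas:
```sh
kubectl autoscale deployment my-app --cpu-percent=50 --min=1 --max=10
```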
Vertical Pod Autoscaler (VPA)
Vertical Pod Autoscaler (VPA) provides dynamic provisioning of compute resources (CPU and memory) to workload resources such as Deployments/StatefulSets. It analyzes metrics collected from these workloads and automatically updates their resource requests so that cluster resources are used efficiently.
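VPA is not part of core Kubernetes and must be installed separately. Assuming the VPA components are installed, a minimal sketch of a VerticalPodAutoscaler manifest (the deployment name is hypothetical) looks like this:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app        # hypothetical deployment to autoscale vertically
  updatePolicy:
    updateMode: "Auto"  # apply VPA recommendations automatically
```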
Cluster Autoscaler (CA)
Cluster Autoscaler (CA) helps maintain the size of the Kubernetes cluster by dynamically adding or removing nodes based on node utilization metrics and on the number of pending pods that could not be scheduled due to a shortage of resources.

So far, we have discussed the types of autoscaling in a Kubernetes cluster. In the next section, we will look at how the Horizontal Pod Autoscaler works and on what basis it scales pods.
Working of Horizontal Pod Autoscaler
Horizontal Pod Autoscaling works with both stateful and stateless applications, but it cannot work with DaemonSets, since they cannot be scaled this way. HPA is implemented as a Kubernetes API resource and a controller.
This resource lets the controller running inside the Kubernetes control plane periodically adjust the desired scale of workload resources to match observed metrics such as average CPU utilization, average memory utilization, or any other custom metric.
In earlier Kubernetes versions, Heapster was used as the metrics collector, but due to its limited functionality it fell out of use. Later, the metrics APIs and metrics-server were introduced for metrics collection; these can collect resource metrics from Kubernetes objects as well as custom metrics such as the number of HTTP requests.
By default, HPA collects metrics through the metrics API. The most commonly used are resource metrics, implemented through the metrics.k8s.io API provided by metrics-server.
Kubernetes implements HPA as a control loop in the kube-controller-manager, which runs at a default interval of 15 seconds set through the --horizontal-pod-autoscaler-sync-period parameter.
In each period, the kube-controller-manager queries resource utilization against the metrics specified in the HPA definition, identifies the target resource through the scaleTargetRef field, and obtains the metrics either from the resource metrics API or from the custom metrics API.
The HPA definition holds a threshold value for the CPU or memory utilization of the workload, and metrics are collected from the metrics server. If utilization rises above the threshold, HPA scales the pods up; if it falls below, HPA scales them down.
HPA calculates the number of replicas through the following formula:
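desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]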
For example, if the current metric value is 400m and the desired value is 100m, the number of replicas will be quadrupled, since 400.0 / 100.0 == 4.0.
When a targetAverageValue or targetAverageUtilization is specified, the currentMetricValue is calculated by taking the average of the given metric across all the pods in the HorizontalPodAutoscaler's scale target.
In the next section, we will deploy a Python web app through a NodePort service, increase the load on it with Locust, and scale the pods through HPA.
Horizontal Pod Autoscaler LAB
When we trigger the lab through the LAB SETUP button, we get a terminal and an IDE that already have a Kubernetes cluster running. This can be verified by running the kubectl get nodes command.
- Deploy the front end of the application by creating a deployment and exposing it through a service.
kubectl apply -f frontend.yaml
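The contents of frontend.yaml are not reproduced in the lab text. A minimal sketch of what such a manifest typically looks like is below; the image, port, and resource request are assumptions, and backend.yaml in the next step follows the same deployment-plus-service pattern:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rsvp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rsvp
  template:
    metadata:
      labels:
        app: rsvp
    spec:
      containers:
      - name: rsvp-app
        image: teamcloudyuga/rsvpapp  # assumed image for the rsvp demo app
        ports:
        - containerPort: 5000
        resources:
          requests:
            cpu: 100m                 # a CPU request is required for CPU-based HPA
---
apiVersion: v1
kind: Service
metadata:
  name: rsvp
spec:
  type: NodePort
  selector:
    app: rsvp
  ports:
  - port: 80
    targetPort: 5000
```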
- Create the backend of the app by creating its deployment and exposing it through a service.
kubectl apply -f backend.yaml
kubectl get pods,svc
- To access the application through the browser, deploy the ingress for it:
kubectl apply -f ingress.yaml
kubectl get ingress
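The ingress.yaml contents are likewise not shown. A sketch, assuming an ingress controller is already running in the cluster and routing to the rsvp service from the earlier step, might look like this:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rsvp
spec:
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: rsvp    # service created by frontend.yaml
            port:
              number: 80
```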
Now, access the app through the app-port-80 URL under the LAB-URL section; you will see an rsvp app like the one shown in the image below.

- Then, to collect metrics through the metrics server, configure it with the following command:
kubectl apply -f components.yaml
Check the metrics-server pod in the kube-system namespace:
kubectl get pods -n kube-system
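The components.yaml applied above is typically the official metrics-server manifest. If you ever need to fetch it yourself, it can be applied directly from the upstream release; note that on local or lab clusters the manifest is often patched to pass the --kubelet-insecure-tls flag to the metrics-server container:
```sh
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```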
- Check the resource utilization of the pods by running the following command
kubectl top pod --namespace default
- Now, to increase the load on the app, install Locust through pip, along with Flask as a prerequisite for Locust.
sudo apt update
sudo apt install python3-pip -y
pip install flask
pip install locust
- Create a locustfile for load testing
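The contents of locust_file.py are not shown in the lab; a minimal sketch, assuming the rsvp app serves its main page at /, might look like this:
```python
# locust_file.py - minimal sketch; the lab's actual file may differ
from locust import HttpUser, task, between

class RsvpUser(HttpUser):
    # Simulated users pause 1-2 seconds between tasks
    wait_time = between(1, 2)

    @task
    def load_home_page(self):
        # Request the app's home page; the "/" path is an assumption
        self.client.get("/")
```
Then start Locust against the app: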
locust -f locust_file.py --host <APP_URL> --users 500 --spawn-rate 20 --web-port=8089
Here, replace <APP_URL> with the rsvp app URL. Then access the Locust UI from the app-port-8089 URL under the lab URL section; you will see the Locust UI as shown in the image below.
Click the Start swarming button in the Locust UI to start generating load on the rsvp app, and you will see output like the one below.

- To enable scaling of pods, create a Horizontal Pod Autoscaler for the rsvp deployment (frontend.yaml).
Here, inside scaleTargetRef (line 9), the kind of the resource and its name have to be mentioned to specify which resource the HPA applies to.
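The lab's hpa.yaml is not reproduced in the text. A sketch, assuming the frontend deployment is named rsvp and a 50% CPU utilization target (both assumptions), might look like this; note that scaleTargetRef falls on line 9, as referenced above:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: rsvp
  namespace: default
spec:
  minReplicas: 1
  maxReplicas: 5
  scaleTargetRef:       # line 9: the resource kind and name the HPA targets
    apiVersion: apps/v1
    kind: Deployment
    name: rsvp
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50   # assumed threshold; scale up above 50% CPU
```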
kubectl apply -f hpa.yaml
kubectl get hpa
As the load on the app increases, HPA will start working and scale up the pods.
kubectl top pod --namespace default
kubectl get hpa
kubectl get pods
What Next?
We have now seen how to scale an application on the basis of CPU metrics through HPA. In the next hands-on lab, we will scale the same application with KEDA, Kubernetes Event-Driven Autoscaling.
Conclusion
In this hands-on lab, we looked at autoscaling in Kubernetes, learned how HPA works, and implemented it.