AutoScaling in Kubernetes

March 31, 2022

To know about autoscaling and its types in Kubernetes

Allocating resources to pods running inside the Kubernetes cluster is challenging as it gives rise to the questions such as how much CPU and RAM to allocate to pods for high performance and how to create enough replicas of these pods to handle the incoming load. For this, Kubernetes comes with a powerful feature called autoscaling.

In this blog, we will be going to see about different types of autoscaling and do enable autoscaling of pods through metrics server.

Autoscaling

Autoscaling is one of the important features in a Kubernetes cluster which helps in increasing or decreasing the number of pods or nodes according to the demand for service responses to it.

This helps in improving the overall resource utilization of the cluster by automatically adjusting the application resources and pods according to the load at a time and thus avoiding many manual tasks.

This autoscaling uses two types of mechanisms:

Pod-based scaling – To automate the scaling of pods through Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA) methods.
Node-based scaling – To automate the cluster node scaling through the Cluster Autoscaler(CA) method.

Horizontal Pod Autoscaler (HPA)

Horizontal Pod Autoscaler (HPA) scales the number of pod replicas automatically of workload resources such as Deployments/StatefulSets to match the workload demand. Horizontal scaling means that if the load gets increased then HPA instructs workload resources to deploy more pods(i.e. to scale up), and similarly, if the load gets decreased and the number of pods is already above the minimum configuration then HPA instructs workload resources to scale down.

Vertical Pod Autoscaler (HPA)

Vertical Pod Autoscaler (VPA) provides dynamic provisioning of compute resources (CPU and memory) to workload resources such as Deployments/StatefulSets based on the analysis of metrics collection from these workloads and automatically updates them so that the cluster resources are used efficiently.

Cluster Autoscaler (CA)

Cluster Autoscaler (CA) helps in maintaining the size of the Kubernetes cluster by dynamically adding or removing the nodes of the cluster based on the node utilization metrics and the number of pending pods that could not get scheduled due to shortage of resources.

As of now, we have discussed types of autoscaling in the Kubernetes cluster. In the next section, we will be seeing the working of Horizontal Pod Autoscaler as to what basis it scales the pod.

Working of Horizontal Pod Autoscaler

Horizontal Pod Autoscaling can work with both stateful and stateless applications but it can not work with Daemonsets as it can’t be scaled. This HPA is implemented as a Kubernetes API resource and controller.

This resource helps the controller running inside the Kubernetes control plane to adjust the desired scale of workload resources timely by matching the observed metrics such as average CPU utilization, average memory utilization, or any other custom metrics.

In the earlier Kubernetes version, Heapster was used as a metrics collector but due to its limited functionality it’s not in much use, later on, metrics API and metrics-server were introduced for metrics collection which can collect metrics from Kubernetes objects and can collect metrics on the number of HTTP requests.

By default, HPA collects metrics through metrics API, and the most commonly used is resource metrics which is implemented through metrics.k8s.io API provided through metrics-server.

*Figure 4: HPA working with Metrics Server*

Kubernetes implements the HPA in a control loop through the Kube-Controller manager by setting an interval of 15 seconds through --horizontal-pod-autoscaler-sync-period parameter.

In each of these periods, the Kube-controller-manager makes the resource utilization query against the metrics specified in the HPA definition and finds the target to be resourced through the scaleTargetRef field, and obtains the metrics either from resource metrics API or through custom metrics API.

HPA definition has a threshold value of CPU or memory utilization of workload and metrics are collected from metrics server and if the utilization is above the threshold then HPA scales up the pod and if it is less then it scales down.

The way HPA calculates a number of replicas is through the following formula

desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]

For example, if the current metric value is 400m, and the desired value is 100m, the number of replicas will be four times since 400.0 / 100.0 == 4.0

When a targetAverageValue or targetAverageUtilization is specified, the currentMetricValue is calculated by taking the average of the given metric across all the pods in the HorizontalPodAutoscaler’s scale target.

In the next section, we will be going to deploy a python web app through a NodePort service and then will increase the load on the app through locust and scale the pods through HPA.

Horizontal Pod Autoscaler LAB

Deploy the front end of the application by creating a deployment and exposing it through a service.

# frontend.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rsvp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rsvp
  template:
    metadata:
      labels:
        app: rsvp
    spec:
      containers:
      - name: rsvp-app
        image: teamcloudyuga/rsvpapp:latest
        resources:
          limits:
            cpu: "50m"
          requests:
            cpu: "50m" 
        livenessProbe:
          httpGet:
            path: /
            port: 5000
          periodSeconds: 30
          timeoutSeconds: 1
          initialDelaySeconds: 50
        env:
        - name: MONGODB_HOST
          value: mongodb
        ports:
        - containerPort: 5000
          name: web-port
---
apiVersion: v1
kind: Service
metadata:
  name: rsvp
  labels:
    app: rsvp
spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: web-port
    protocol: TCP
  selector:
    app: rsvp

kubectl apply -f frontend.yaml

Note: Remember to add resources (line 19) attributes for which you want to collect metrics using metrics-server and scale it using HPA.

Create the backend of the app by creating its deployment and exposing it through service.

# backend.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rsvp-db
spec:
  replicas: 1
  selector:
    matchLabels:
      appdb: rsvpdb
  template:
    metadata:
      labels:
        appdb: rsvpdb
    spec:
      volumes:
        - name: voldb
          emptyDir: {}
      containers:
      - name: rsvpd-db
        image: teamcloudyuga/mongo:3.3
        volumeMounts:
        - name: voldb
          mountPath: /data/db
        ports:
        - containerPort: 27017
---
apiVersion: v1
kind: Service
metadata:
  name: mongodb
  labels:
    app: rsvpdb
spec:
  ports:
  - port: 27017
    protocol: TCP
  selector:
    appdb: rsvpdb

kubectl apply -f backend.yaml

kubectl get pods,svc

To access the application through browser deploy the ingress for it

# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rsvp-ingress 
spec:
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: rsvp
            port:
              number: 80

kubectl apply -f ingress.yaml

kubectl get ingress

Now, access the app through the app-port-80 URL under the LAB-URL section and will get an rsvp app like shown in the image below.

Then, to collect metrics through the metrics server, configure it through the following.

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --kubelet-insecure-tls=true
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        image: registry.k8s.io/metrics-server/metrics-server:v0.6.4
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 4443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100

kubectl apply -f components.yaml

NOTE: Configure args – –kubelet-insecure-tls=true flag in the metrics-server deployment (check line number 136).

Check the metrics-server pod in the kube-system namespace

kubectl get pods -n kube-system

Check the resource utilization of the pods by running the following command

kubectl top pod --namespace default

Now to increase the load and usage on the app, install locust through pip, and install Flask as a prerequisite for locust.

apt update && apt install python3-pip -y

pip install flask
pip install locust

Create a locustfile for load testing

# locust_file.py
import time
from locust import HttpUser, task, between

class WebsiteUser(HttpUser):
    wait_time = between(1, 5)

    @task
    def check_page(self):
        self.client.get(url="/")

locust -f locust_file.py --host <APP_URL> --users 100 --spawn-rate 20 --web-port=8089

Here, replace <APP_URL> with the rsvp app URL and access the locust UI.

Click the Start swarming button in the locust UI to enable the load on the rsvp app and will see an output like below.

*Figure 7:Locust UI after starting load on rsvp app*

To enable scaling of pods, create a Horizontal Pod Autoscaler for the rsvp deployment (frontend.yaml)

# hpa.yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-hpa
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rsvp
  targetCPUUtilizationPercentage: 20

Here, inside scaleTargetRef (line 9) the kind of resource and its name has to be mentioned on which HPA has to be applied.

NOTE: In the rsvp deployment, we have specified CPU limits and requests and in this hpa object target value of CPU utilization is 20% . So as soon as the CPU utilization hits 20% or greater than it, scaling will take place.

kubectl apply -f hpa.yaml

kubectl get hpa

As soon as the load starts getting increased on the app, HPA will start working and will scale up the pods.

kubectl top pod --namespace default

kubectl get hpa

kubectl get pods

What Next?

As we have seen scaling the application on the basis of CPU metrics through HPA. In the blog, we will be scaling the same application with KEDA, a Kubernetes Event-Driven Autoscaling.

Conclusion

In this blog, we saw about autoscaling in Kubernetes and learned about the working of HPA and how to implement it.

Join Our Newsletter

[mautic type="form" id="9"]

Share this article:

Oshi Gupta

All Posts

AutoScaling in Kubernetes

To know about autoscaling and its types in Kubernetes

Autoscaling

Horizontal Pod Autoscaler (HPA)

Vertical Pod Autoscaler (HPA)

Cluster Autoscaler (CA)

Working of Horizontal Pod Autoscaler

Horizontal Pod Autoscaler LAB

What Next?

Conclusion

Join Our Newsletter

Oshi Gupta

Want to Master your Cloud Career ?

Company

Resources

© Copyright CloudYuga 2025. All rights reserved.

Terms & Conditions

Privacy Policy

Refund Policy