Cluster Autoscaler and Horizontal Pod Autoscaler for On-Premises Kubernetes Clusters

Jun 12, 2020

The purpose of the Cluster Autoscaler is to provision new Worker Nodes in a Kubernetes Cluster during periods of high demand, when the cluster does not have sufficient resources such as CPU, memory, or GPU to run new Pods. During quiet periods, when the cluster no longer requires these resources, the Cluster Autoscaler terminates the extra Worker Nodes. The Horizontal Pod Autoscaler works by creating additional replicas of a Pod when the monitored resources it consumes (such as CPU) exceed the defined threshold; the Kubernetes scheduler then places the new replicas on any Worker Node with spare capacity. Similarly, the Horizontal Pod Autoscaler deletes these additional Pod replicas when the consumed resources drop below the defined threshold. Together, the Cluster Autoscaler and Horizontal Pod Autoscaler help us to effectively manage resources and the running costs of managed Kubernetes Services offered and supported by many Public Cloud Providers.

On-Premises Kubernetes Clusters

Now, for those who want to deploy on-premises Kubernetes Clusters in their private infrastructure, Kubernetes resource scaling features such as the Cluster Autoscaler are not generally available. Although the Cluster Autoscaler is an upstream Kubernetes Community project, it is available mostly on Public Cloud Providers offering managed Kubernetes Services, because it requires Cloud Provider specific integrations.

https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler

To make the Cluster Autoscaler work for on-premises Kubernetes Clusters, I have implemented a version of the Cluster Autoscaler, based on the upstream Kubernetes Community project, that works on the Cisco Container Platform. The Cisco Container Platform is an on-premises Kubernetes as a Service platform based on 100% upstream Kubernetes.

Deploying On-Premises Cluster Autoscaler and Horizontal Pod Autoscaler

I will briefly describe the integration call flow between the Cluster Autoscaler and the Cisco Container Platform, but will stop short of a code walkthrough.

Below is a summary of how Cluster Autoscaler works on an on-premises Cisco Container Platform Kubernetes Cluster:

The Cisco Container Platform supports standards-based REST APIs to provision new Worker Nodes, or terminate existing ones, in the Tenant Clusters it manages.
The cluster monitoring logic and the scale up and scale down decisions in the upstream Cluster Autoscaler code have been preserved unmodified.
Each scale up decision made by the Cluster Autoscaler invokes REST API calls to the Cisco Container Platform to provision a new Worker Node when conditions such as pending or unschedulable Pods are detected.
Each scale down decision made by the Cluster Autoscaler invokes REST API calls to the Cisco Container Platform to terminate an existing Worker Node once that node has been under-utilized for 10 minutes.
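To make the call flow above a little more concrete, here is a minimal Python sketch of how scale up and scale down decisions could be translated into REST calls. It is not my actual integration code: the `NodeProvisioner` class, the `send` transport, the request bodies, and the URL paths are all hypothetical placeholders and do not reflect the real Cisco Container Platform API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical transport: (method, path, json_body) -> HTTP status code.
# In a real integration this would wrap an authenticated HTTP client.
Send = Callable[[str, str, dict], int]


@dataclass
class NodeProvisioner:
    """Translates autoscaler decisions into REST calls (illustrative only)."""

    send: Send
    cluster_id: str

    def scale_up(self, count: int = 1) -> int:
        # Placeholder path, NOT the real Cisco Container Platform endpoint.
        path = f"/clusters/{self.cluster_id}/workers"
        return self.send("POST", path, {"count": count})

    def scale_down(self, node_name: str) -> int:
        # Placeholder path, NOT the real Cisco Container Platform endpoint.
        path = f"/clusters/{self.cluster_id}/workers/{node_name}"
        return self.send("DELETE", path, {})
```

Passing the transport in as a function keeps the decision logic separate from the HTTP details, which mirrors the idea that the upstream decision code stays untouched and only the provider-specific calls change.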

The Horizontal Pod Autoscaler requires a Metrics Server to be deployed on the Kubernetes Cluster to provide metrics via the resource metrics API. The Horizontal Pod Autoscaler automatically scales the number of Pods based on observed CPU utilization or custom metrics.

Below is my Cisco Container Platform Tenant Cluster, provisioned with a single Master Node and two Worker Nodes.

On-Premises Cisco Container Platform Kubernetes Cluster with One Master and Two Worker Nodes

The Cluster Autoscaler is deployed on the Master Node of the Cluster, in the kube-system namespace. I have specified a minimum node count of 1 and a maximum node count of 5. This means that the Cluster Autoscaler will maintain a Cluster size of at least one Worker Node and will not grow the Cluster beyond 5 Worker Nodes. The Cluster Autoscaler is also provided with the credentials and API token it needs to authenticate with the Cisco Container Platform’s REST API server and make REST API calls.

Cluster Autoscaler Container Pod running on Master Node in the kube-system namespace
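The effect of the minimum and maximum node settings can be illustrated with a small Python sketch. The function name `clamp_node_count` is my own for illustration; the defaults of 1 and 5 are the values from my setup, and the real Cluster Autoscaler applies these bounds inside its own scaling logic rather than through a helper like this.

```python
def clamp_node_count(desired: int, min_nodes: int = 1, max_nodes: int = 5) -> int:
    """Keep a requested Worker Node count inside the configured bounds."""
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")
    return max(min_nodes, min(desired, max_nodes))


# A request for 0 nodes is raised to the minimum, 7 is capped at the maximum.
for desired in (0, 3, 7):
    print(desired, "->", clamp_node_count(desired))
```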

The Metrics Server required by the Horizontal Pod Autoscaler is deployed on the Cluster in the kube-system namespace.

With both the Cluster Autoscaler and the Metrics Server in place, I am ready to verify that they work as intended on the on-premises Kubernetes Cluster. First, I need to deploy some Pods on the Cluster to get it to scale up. For this, I will use the Hipster Store microservices demo application, which is located at:

https://github.com/GoogleCloudPlatform/microservices-demo (the project now describes itself as Online Boutique, a cloud-native microservices demo application)

I have created a single deployment YAML file to deploy the Hipster Store application onto my Cluster.

Deploying the Hipster Store demo application on the on-premises Cluster

Next, I will create the Horizontal Pod Autoscaler for the frontend container Pod, which is one of the ten microservices in the Hipster Store application. In my example below, I have instructed the Horizontal Pod Autoscaler to monitor the CPU utilization of the frontend container Pod. If utilization exceeds 50%, it creates additional replicas of the frontend container Pod, up to a maximum of 10. When the CPU utilization drops below 50%, it reduces the number of replicas again, down to a minimum of 1 Pod.

Horizontal Pod Autoscaler deployed for the frontend Pod with 50% CPU utilization and pod size from 1 to 10
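To show how a 50% target with a range of 1 to 10 Pods turns into a replica count, here is a simplified Python sketch of the scaling rule described in the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a small tolerance of 1.0 (0.1 by default). The function name and the sample utilization figures are illustrative assumptions, not measurements from my Cluster, and the real controller also handles missing metrics, Pods that are not ready, and stabilization delays.

```python
import math


def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct=50,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified Horizontal Pod Autoscaler replica calculation."""
    ratio = current_cpu_pct / target_cpu_pct
    # Close enough to the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    raw = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    # Never go below the minimum or above the maximum replica count.
    return max(min_replicas, min(raw, max_replicas))
```

For example, one Pod running at 200% of its CPU request against a 50% target gives a desired count of 4, while 8 Pods at 90% would ask for 15 and be capped at the maximum of 10.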

Scaling Up Behavior

With the Cluster Autoscaler and Horizontal Pod Autoscaler deployed on the Kubernetes Cluster, I expect the Horizontal Pod Autoscaler to scale up the number of frontend container Pods as their CPU load increases due to increased traffic. The Hipster Store application has a loadgenerator container Pod which generates simulated user traffic to the frontend container Pod. Almost immediately, the Horizontal Pod Autoscaler detected that the CPU utilization had crossed the 50% mark and created new replicas of the frontend container Pod.

The Horizontal Pod Autoscaler triggered the horizontal scaling of the frontend container Pod to 8 Pods

On the Cluster, there are now 8 replicas of the frontend container Pod created by the Horizontal Pod Autoscaler.

8 replicas of the frontend container Pods running on the Cluster now

The Cluster Autoscaler has not kicked in yet at this point as the existing Cluster is still able to handle the overall load. The Cluster size at this point is still two Worker Nodes.

To trigger the Cluster Autoscaler, more load on the Cluster is required. To achieve this goal, I need additional replicas of the loadgenerator container Pod. The number of loadgenerator container Pods is increased to 5. At this point, Pending loadgenerator container Pods can be seen.

The Cluster is now unable to handle the new additional loadgenerator container Pods
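The reason Pods end up Pending can be sketched with a toy placement check in Python: each Pod asks for some CPU, and a Pod that fits on no Worker Node stays unschedulable, which is the condition that triggers a scale up. This is only a first-fit illustration on CPU alone; the function name and the millicore figures are hypothetical, and the real Kubernetes scheduler also weighs memory, affinity rules, taints, and more.

```python
def unschedulable_pods(pod_requests_m, node_free_m):
    """First-fit placement; returns the requests that found no node.

    pod_requests_m: CPU requested by each new Pod, in millicores.
    node_free_m:    free CPU on each Worker Node, in millicores.
    """
    free = list(node_free_m)  # copy so the caller's list is untouched
    pending = []
    for request in pod_requests_m:
        for i, available in enumerate(free):
            if request <= available:
                free[i] -= request  # place the Pod on this node
                break
        else:
            pending.append(request)  # no node had room: Pod stays Pending
    return pending


# Five Pods on two nearly full nodes: three of them stay Pending.
print(unschedulable_pods([300] * 5, [500, 400]))
# Adding a third, mostly empty node lets every Pod be placed.
print(unschedulable_pods([300] * 5, [500, 400, 2000]))
```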

The Cluster Autoscaler has now kicked in and made REST API calls to the Cisco Container Platform to provision one new Worker Node into the Cluster. The Cisco Container Platform takes less than 2 minutes to add the new Worker Node. The Cluster now has 3 Worker Nodes, and all the Pending loadgenerator container Pods are running. Due to the increased traffic load from the new loadgenerator container Pods, the Horizontal Pod Autoscaler has also scaled up the number of frontend Pod replicas to the maximum of 10 Pods.

A new Worker Node is provisioned into the Cluster

The Pending loadgenerator container Pods are now scheduled and running

Scaling Down Behavior

To observe the scaling down behavior of the Cluster Autoscaler and Horizontal Pod Autoscaler, I first need to reduce the number of loadgenerator container Pods by scaling them back to one Pod.

Scaling down the loadgenerator container Pod

The Cluster Autoscaler, by default, terminates an existing Worker Node once that node has been under-utilized for 10 minutes. After 10 minutes, the Cluster Autoscaler deleted an existing Worker Node by making REST API calls to the Cisco Container Platform.

Scale down logs from the Cluster Autoscaler
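The 10 minute wait can be pictured as a per-node timer that restarts whenever the node becomes busy again. The Python sketch below illustrates that idea only; the `UnneededTracker` class is my own naming and is not taken from the upstream code, where this delay corresponds to the `--scale-down-unneeded-time` setting.

```python
from datetime import datetime, timedelta

UNNEEDED_TIME = timedelta(minutes=10)  # default wait described above


class UnneededTracker:
    """Tracks how long each node has been continuously under-utilized."""

    def __init__(self):
        self._since = {}  # node name -> time it was first seen as unneeded

    def observe(self, now, unneeded_nodes):
        """Record one observation and return nodes eligible for removal."""
        # Forget nodes that became busy again; their timer restarts.
        self._since = {n: t for n, t in self._since.items() if n in unneeded_nodes}
        for node in unneeded_nodes:
            self._since.setdefault(node, now)
        return sorted(n for n, t in self._since.items() if now - t >= UNNEEDED_TIME)
```

A node reported as unneeded at 12:00 and at every check since becomes eligible at 12:10; if it is busy at any check in between, the count starts over.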

The number of Worker Nodes in the Cluster is reduced back to two Worker Nodes.

Cluster Autoscaler reduced the Worker Node size back to two Worker Nodes

And, the Horizontal Pod Autoscaler has also started to scale down the number of frontend container Pods, from the previous high of 10 Pods down to 9 Pods now. It will take some time for the CPU spike to quiet down before Horizontal Pod Autoscaler continues to reduce the number of frontend Pods.

Horizontal Pod Autoscaler scaling down when CPU utilization drops

Conclusion

In conclusion, Kubernetes resource management features such as the Cluster Autoscaler and Horizontal Pod Autoscaler can help us manage infrastructure resources for on-premises Clusters just as effectively as they do on the Public Cloud Providers’ managed Kubernetes Services.