Cluster Autoscaler and Horizontal Pod Autoscaler for On-Premises Kubernetes Clusters
The Cluster Autoscaler provisions new Worker Nodes in a Kubernetes Cluster during periods of high demand, when the cluster does not have sufficient resources such as CPU, memory, or GPU to run new Pods. During quiet periods, when the cluster no longer requires these resources, the Cluster Autoscaler terminates the extra Worker Nodes. The Horizontal Pod Autoscaler creates additional replicas of a Pod when the monitored resource it consumes (such as CPU) exceeds the defined threshold; the Kubernetes scheduler places these replicas on any Worker Node with spare capacity, not necessarily the node where the original Pod runs. Similarly, the Horizontal Pod Autoscaler deletes the additional Pod replicas when resource consumption drops below the defined threshold. Together, the Cluster Autoscaler and Horizontal Pod Autoscaler help us to effectively manage resources and the running costs of managed Kubernetes Services offered and supported by many Public Cloud Providers.
On-Premises Kubernetes Clusters
Now, for those who want to deploy on-premises Kubernetes Clusters in their private infrastructure, Kubernetes resource scaling features such as the Cluster Autoscaler are not generally available. Although the Cluster Autoscaler is an upstream Kubernetes Community project, it is available mostly on Public Cloud Providers offering managed Kubernetes Services, because it requires Cloud Provider specific integrations.
https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
To make the Cluster Autoscaler work for on-premises Kubernetes Clusters, I have implemented a version of it, based on the upstream Kubernetes Community project, that works on the Cisco Container Platform. The Cisco Container Platform is an on-premises Kubernetes as a Service platform based on 100% upstream Kubernetes.
Deploying On-Premises Cluster Autoscaler and Horizontal Pod Autoscaler
I will briefly describe the integration call flows between the Cluster Autoscaler and the Cisco Container Platform, but will stop short of a code walkthrough.
Below is a summary of how Cluster Autoscaler works on an on-premises Cisco Container Platform Kubernetes Cluster:
- The Cisco Container Platform supports standards-based REST APIs to provision new Worker Nodes or terminate existing ones in the Tenant Clusters it manages.
- The cluster monitoring logic and the scale up and scale down decisions in the upstream Cluster Autoscaler code have been preserved unmodified.
- Each scale up decision made by the Cluster Autoscaler invokes REST API calls to the Cisco Container Platform to provision a new Worker Node when conditions such as pending or unschedulable Pods are detected.
- Each scale down decision made by the Cluster Autoscaler invokes REST API calls to the Cisco Container Platform to terminate an existing Worker Node once that node has been under-utilized for 10 minutes.
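The call flow in the list above can be sketched in a few lines. This is only an illustration and not the actual integration code: `FakePlatformClient`, `add_worker`, `remove_worker`, and `reconcile` are hypothetical names, and the fake client simply records calls where a real one would send authenticated REST requests to the Cisco Container Platform.

```python
class FakePlatformClient:
    """Hypothetical stand-in for a REST client; it records calls instead of sending them."""

    def __init__(self, worker_count):
        self.worker_count = worker_count
        self.calls = []

    def add_worker(self):
        # A real client would call the platform's REST API to provision a node.
        self.calls.append("add_worker")
        self.worker_count += 1

    def remove_worker(self, node_name):
        # A real client would call the platform's REST API to terminate the node.
        self.calls.append("remove_worker:" + node_name)
        self.worker_count -= 1


def reconcile(client, unschedulable_pods, removable_nodes, min_nodes, max_nodes):
    """One simplified pass: scale up if Pods are stuck, otherwise consider scale down."""
    if unschedulable_pods and client.worker_count < max_nodes:
        client.add_worker()
        return "scale-up"
    if not unschedulable_pods and removable_nodes and client.worker_count > min_nodes:
        client.remove_worker(removable_nodes[0])
        return "scale-down"
    return "no-op"
```

Keeping the platform calls behind a small client object mirrors the point made in the list: the decision logic stays generic, and only the provisioning and termination calls are platform specific.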
The Horizontal Pod Autoscaler requires the Metrics Server to be deployed on the Kubernetes Cluster to provide metrics via the resource metrics API. The Horizontal Pod Autoscaler automatically scales the number of Pods based on observed CPU utilization or custom metrics.
My Cisco Container Platform Tenant Cluster is provisioned with a single Master Node and two Worker Nodes.
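To make "scales based on observed CPU utilization" concrete, here is a minimal sketch of the ratio rule described in the upstream Kubernetes documentation: the desired replica count is the current count multiplied by the ratio of observed to target utilization, rounded up. The function name, its parameters, and the sample numbers are illustrative assumptions; the 0.1 tolerance is the documented upstream default, and the real controller also handles missing metrics and Pods that are not ready, which this sketch ignores.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count suggested by the ratio rule, clamped to the min/max bounds."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count unchanged.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))
```

For example, one replica running at 200% of its requested CPU against a 50% target yields four replicas, while eight replicas at 90% would ask for fifteen and be capped at the maximum of ten.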
The Cluster Autoscaler is deployed on the Master Node in the Cluster, in the kube-system namespace. I have also specified a minimum node count of 1 and a maximum node count of 5. This means that the Cluster Autoscaler will maintain a Cluster size of at least one Worker Node and will not grow the Cluster beyond 5 Worker Nodes. The Cluster Autoscaler is also provided with the credentials and API token it needs to authenticate with the Cisco Container Platform’s REST API server and make REST API calls.
The Metrics Server required by the Horizontal Pod Autoscaler is deployed on the Cluster in the kube-system namespace.
With both the Cluster Autoscaler and the Metrics Server required by the Horizontal Pod Autoscaler in place, I am ready to verify they are working as intended on the on-premises Kubernetes Cluster. First, I need to deploy some Pods on the Cluster to get it to scale up. To meet this objective, I will use the Hipster Store microservices demo application which is located at:
I have created a single deployment YAML file to deploy the Hipster Store application onto my Cluster.
Next, I will create the Horizontal Pod Autoscaler for the frontend container Pod, which is one of the ten microservices in the Hipster Store application. In my example below, I have instructed the Horizontal Pod Autoscaler to monitor the CPU utilization of the frontend container Pod, and if it exceeds 50% utilization, it will create additional replicas of the frontend container Pod, up to the maximum of 10. Then, when the CPU utilization drops below 50%, it will reduce the number of container Pods down to the minimum count of 1 Pod.
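As an illustration of the settings just described (50% CPU target, between 1 and 10 replicas), the sketch below builds an `autoscaling/v1` HorizontalPodAutoscaler object and prints it as JSON. It assumes the frontend microservice runs as a Deployment named `frontend` in the current namespace; that name and the helper function are assumptions for the example, not a copy of the manifest used here.

```python
import json


def frontend_hpa_manifest(name="frontend", min_replicas=1, max_replicas=10, cpu_percent=50):
    """Build an autoscaling/v1 HorizontalPodAutoscaler object as a plain dict."""
    return {
        "apiVersion": "autoscaling/v1",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            # Assumed target: a Deployment with the same name as the autoscaler.
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "targetCPUUtilizationPercentage": cpu_percent,
        },
    }


if __name__ == "__main__":
    print(json.dumps(frontend_hpa_manifest(), indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON as well as YAML. Under the same naming assumption, `kubectl autoscale deployment frontend --cpu-percent=50 --min=1 --max=10` should create an equivalent object.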
Scaling Up Behavior
With the Cluster Autoscaler and Horizontal Pod Autoscaler deployed on the Kubernetes Cluster, I expect the Horizontal Pod Autoscaler to scale up the number of frontend container Pods as their CPU load increases due to increased traffic. The Hipster Store application has a loadgenerator container Pod which generates simulated user traffic to the frontend container Pod. Almost immediately, the Horizontal Pod Autoscaler detected that the CPU utilization had crossed the 50% mark, and it created new replicas of the frontend container Pod.
On the Cluster, there are now 8 replicas of the frontend container Pod created by the Horizontal Pod Autoscaler.
The Cluster Autoscaler has not kicked in yet at this point as the existing Cluster is still able to handle the overall load. The Cluster size at this point is still two Worker Nodes.
To trigger the Cluster Autoscaler, more load on the Cluster is required, so I need additional replicas of the loadgenerator container Pod. I increased the number of loadgenerator container Pods to 5. At this point, Pending loadgenerator container Pods can be seen.
The Cluster Autoscaler has now kicked in, and it made REST API calls to the Cisco Container Platform to provision one new Worker Node into the Cluster. The Cisco Container Platform takes less than 2 minutes to add the new Worker Node. The Cluster now has 3 Worker Nodes, and all the Pending loadgenerator container Pods are running. And, due to increased traffic load from the new loadgenerator container Pods, the Horizontal Pod Autoscaler has scaled up the number of frontend Pod replicas to the maximum of 10 Pods.
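Why do some Pods end up Pending? The toy model below places Pods by CPU request only, using first-fit. It is a deliberate simplification: the real Kubernetes scheduler also weighs memory, affinity, taints, and more. The node names, free capacities, and request sizes are made-up numbers, not measurements from this Cluster.

```python
def schedule(pods, nodes):
    """First-fit placement by CPU request in millicores.

    pods  -- list of (pod_name, cpu_request_millicores)
    nodes -- dict of node_name -> free CPU in millicores
    Returns (placements, pending): a dict pod -> node and a list of unplaced pods.
    """
    free = dict(nodes)  # copy so the caller's dict is not modified
    placements, pending = {}, []
    for pod, request in pods:
        for node in free:
            if free[node] >= request:
                free[node] -= request
                placements[pod] = node
                break
        else:
            # No node has room: the Pod stays Pending.
            pending.append(pod)
    return placements, pending
```

With two nodes that each have 500m of free CPU and five Pods requesting 300m each, only two Pods fit and three stay Pending. Pending Pods of this kind are the signal the Cluster Autoscaler reacts to, and adding a third node to the model lets them all be placed.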
Scaling Down Behavior
To observe the Cluster Autoscaler and Horizontal Pod Autoscaler Scaling Down behavior, first I will need to reduce the number of loadgenerator container Pods by scaling it back to one Pod.
The Cluster Autoscaler, by default, terminates an existing Worker Node once it has been under-utilized for 10 minutes. After 10 minutes, the Cluster Autoscaler deleted a Worker Node by making REST API calls to the Cisco Container Platform.
The number of Worker Nodes in the Cluster is back to two.
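The 10-minute rule can be pictured as a per-node timer that restarts whenever the node becomes busy again. The sketch below assumes the upstream defaults of a 0.5 utilization threshold and a 10-minute unneeded time; the class and method names are hypothetical. The upstream autoscaler also verifies that the node's Pods can be rescheduled elsewhere before removing it, which is omitted here.

```python
class UnneededTracker:
    """Tracks how long each node has stayed below the utilization threshold."""

    def __init__(self, utilization_threshold=0.5, unneeded_seconds=600):
        self.utilization_threshold = utilization_threshold
        self.unneeded_seconds = unneeded_seconds
        self._since = {}  # node name -> time it was first seen under-utilized

    def observe(self, node, utilization, now):
        """Record one sample; return True once the node is eligible for removal."""
        if utilization >= self.utilization_threshold:
            # Busy again: forget the earlier under-utilized streak.
            self._since.pop(node, None)
            return False
        start = self._since.setdefault(node, now)
        return now - start >= self.unneeded_seconds
```

A node that dips below the threshold, gets busy at the 400-second mark, and then goes quiet again at 500 seconds only becomes eligible at 1100 seconds, because the clock restarts with each busy sample.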
And, the Horizontal Pod Autoscaler has also started to scale down the number of frontend container Pods, from the previous high of 10 Pods down to 9 Pods now. It will take some time for the CPU spike to quiet down before the Horizontal Pod Autoscaler continues to reduce the number of frontend Pods.
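One likely reason for the gradual scale down is the Horizontal Pod Autoscaler's downscale stabilization: upstream Kubernetes keeps recent replica recommendations and acts on the highest one within a window that defaults to 5 minutes. Whether this Cluster uses the default is an assumption, and the class below is only an illustrative sketch of that behavior, not the controller's code.

```python
from collections import deque


class DownscaleStabilizer:
    """Keeps recent recommendations and returns the highest one in the window."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self._history = deque()  # (timestamp, recommended replicas)

    def recommend(self, now, desired):
        self._history.append((now, desired))
        # Drop recommendations that have aged out of the window.
        while self._history and now - self._history[0][0] > self.window_seconds:
            self._history.popleft()
        return max(replicas for _, replicas in self._history)
```

If the recommendation was 10 at time 0 and drops to 4 a minute later, the stabilizer keeps answering 10 until the older value ages out of the window, which is why replicas linger after the load has gone.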
Conclusion
In conclusion, Kubernetes resource management features such as the Cluster Autoscaler and Horizontal Pod Autoscaler can help us manage infrastructure resources for on-premises Clusters as effectively as they do on the Public Cloud Providers’ managed Kubernetes Services.