Optimizing resource usage and implementing autoscaling in AKS clusters

Azure Kubernetes Service (AKS) provides the foundation for efficient, scalable cloud-based container management. DevOps teams can build on it with resource optimization and autoscaling to enhance performance and cost-effectiveness.

This hands-on guide demonstrates how to implement Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and Cluster Autoscaler to ensure your AKS clusters perform optimally under varying workloads.

Primer on AKS resource management

AKS relies on three basic building blocks:

  • Nodes: Nodes are your AKS cluster’s workhorses. They’re the physical or virtual machines that run your applications. Each node has a specific amount of CPU, memory, and storage resources.
  • Pods: Pods are the smallest deployable units in Kubernetes and AKS. A Pod can contain one or more containers, each hosting part of your application.
  • Container resources: You can allocate precise amounts of CPU and memory resources to individual containers, ensuring efficient operation.

You typically pay based on the cloud resources you use. Optimizing resource allocation prevents over-provisioning (which wastes money) and under-provisioning (which can degrade application performance). As your application’s needs fluctuate, effective resource management enables scaling resources up or down based on demand.

Ideally, your applications would always have the optimal amount of resources precisely when needed. Autoscaling dynamically adjusts resources based on real-time demand, adding or removing nodes and adjusting Pod counts. It ensures that your applications have enough resources during high demand while saving costs during low usage periods.

Understanding resource requests and limits

Resource requests and limits in Kubernetes help you manage the CPU, memory, and other computing resources your applications need.

Resource requests specify the minimum amount of a resource that a container requires. Think of it as reserving a portion of resources to ensure that your container can start and run under normal conditions.

Resource limits define the maximum amount of a resource that a container can use. If a container surpasses this limit, Kubernetes throttles its CPU usage and may terminate it if memory consumption exceeds the limit.
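
To make this concrete, here is a minimal sketch of a container spec with both settings. The Pod name, image, and values are illustrative, so substitute figures from your own monitoring data:

apiVersion: v1
kind: Pod
metadata:
  name: web-app              # hypothetical name for illustration
spec:
  containers:
  - name: web
    image: nginx:1.25        # any application image works here
    resources:
      requests:
        cpu: 200m            # reserved so the scheduler can place the Pod
        memory: 128Mi        # guaranteed memory under normal conditions
      limits:
        cpu: 500m            # CPU is throttled above this threshold
        memory: 256Mi        # exceeding this gets the container terminated (OOMKilled)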

These settings are critical in AKS for two main reasons:

  • Ensuring Pod scheduling: The Kubernetes scheduler uses resource requests to determine where to place Pods, ensuring that each node has enough capacity to meet its Pods’ demands.
  • Preventing resource starvation: Setting limits prevents any single container from consuming more than its share of resources and impacting other containers in the cluster.

Configuring resource requests and limits

Follow these steps to strike a balance between ensuring your containers have enough resources while avoiding over-allocation:

  • Analyze your application’s needs: Understand your application’s typical resource usage. Use monitoring tools to gather data on its CPU and memory usage.
  • Set reasonable defaults: To begin, set default requests and limits for your containers. You can do this in your Pod configuration’s container specification section.
  • Use namespace defaults: Set default resource requests and limits at the namespace level in AKS for broader control. This approach ensures that all Pods within the namespace adhere to these guidelines unless overridden (see the LimitRange sketch after this list).
  • Adjust as necessary: Continuously monitor your application, adjusting its requests and limits to optimize performance. Remember, these settings aren’t permanent and may need fine-tuning as your application evolves.
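
For the namespace defaults mentioned above, Kubernetes provides the LimitRange object. Here is a minimal sketch, assuming a namespace named my-namespace; the object name and values are illustrative:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources    # hypothetical name
  namespace: my-namespace    # hypothetical namespace
spec:
  limits:
  - type: Container
    defaultRequest:          # applied when a container omits its requests
      cpu: 200m
      memory: 128Mi
    default:                 # applied when a container omits its limits
      cpu: 500m
      memory: 256Mi

Apply it with kubectl apply -f limitrange.yaml, and every new container in my-namespace that lacks explicit settings picks up these values.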

By providing adequate resources to your applications, you avoid sluggish performance and potential downtime. Proper limits protect your cluster from rogue containers that might otherwise monopolize resources, ensuring a stable, predictable environment.

Implementing Horizontal Pod Autoscaler in AKS

HPA automatically adjusts the number of Pod replicas in a deployment, ReplicaSet, or StatefulSet based on observed CPU or memory use. This ensures your application maintains optimal performance during traffic spikes while conserving resources during quieter periods.

Let’s walk through the process of setting up HPA in AKS.

Prerequisites

To follow along, ensure you have:

  • An Azure account
  • The Azure CLI installed on your machine
  • The kubectl CLI installed on your machine
  • A running AKS cluster (the next section walks through creating one)

Create a resource group and cluster

First, open the Azure CLI and log in to your Azure account. Use the az login command to open a login window in your browser.

Next, create a resource group (a container for holding related resources) to spin up your AKS cluster. Use the following command in the Azure CLI:

az group create --name myResourceGroup --location eastus

This command creates a new resource group named “myResourceGroup” in the “East US” Azure region. Choose a distinctive name for your resource group to ensure it’s unique within Azure.

Next, create the AKS cluster:

az aks create --resource-group myResourceGroup --name myAKSCluster --node-count 2 --generate-ssh-keys

This command creates an AKS cluster named “myAKSCluster” in the resource group “myResourceGroup”. The --node-count 2 flag specifies that your cluster should have two nodes, and --generate-ssh-keys creates SSH keys (if you don’t already have them) for securely connecting to the cluster’s nodes.

Now, configure your Azure CLI to connect to the cluster:

az aks get-credentials --resource-group myResourceGroup --name myAKSCluster

This command fetches the access credentials for myAKSCluster, enabling your Azure CLI to interact seamlessly with that cluster.

Deploy a test application

Next, deploy a sample application (in this case, a PHP Apache server) on your Kubernetes cluster:

kubectl apply -f https://k8s.io/examples/application/php-apache.yaml

This command pulls the following predefined configuration (manifest) and sets up your application accordingly.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache
The manifest is a YAML file that describes your application, including the following:

  • A deployment: Sets up the PHP Apache server. It specifies the Docker image, the CPU requests and limits, and which port the server should listen on.
  • A service: Creates a stable network endpoint (php-apache) so other Pods in the cluster can reach the server

Use the following command to confirm your Pods are running as expected:

kubectl get pods

Create a Horizontal Pod Autoscaler

Next, set up an HPA to automatically increase or decrease the number of Pods running your application based on CPU usage:

kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10

This command creates the HPA, specifying that it should maintain an average CPU utilization of 50% across all Pods and can scale between 1 and 10 replicas.
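
kubectl autoscale is the imperative shortcut; if you prefer to keep your scaling configuration in version control, an equivalent declarative manifest using the autoscaling/v2 API looks roughly like this:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:            # the workload the HPA scales
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50   # target average CPU utilization across Pods

Applying this manifest with kubectl apply -f produces the same autoscaling behavior as the command above.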

Now, run:

kubectl describe hpa php-apache

You should get the following output:

Reference:                Deployment/php-apache
Target CPU utilization:   50%
Current CPU utilization:  10%
Min replicas:             1
Max replicas:             10
Deployment pods:          1 current / 10 desired

To confirm the HPA works, use the following command to run a temporary Pod that generates load against your PHP Apache server.

kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"

Run the command below in a separate terminal to track the number of Pods changing in response to the traffic.

kubectl get pods --watch


You should get an output like this:

NAME                         READY   STATUS    RESTARTS   AGE
load-generator               1/1     Running   0          12s
php-apache-89bfc85bb-67xzk   1/1     Running   0          8m5s
php-apache-89bfc85bb-bvzj5   1/1     Running   0          11s
php-apache-89bfc85bb-j2bsw   0/1     Pending   0          0s
php-apache-89bfc85bb-tgz4z   0/1     Pending   0          0s
php-apache-89bfc85bb-j2bsw   0/1     Pending   0          0s
php-apache-89bfc85bb-tgz4z   0/1     Pending   0          0s
php-apache-89bfc85bb-j2bsw   0/1     ContainerCreating   0          0s
php-apache-89bfc85bb-tgz4z   0/1     ContainerCreating   0          0s
php-apache-89bfc85bb-j2bsw   1/1     Running             0          2s
php-apache-89bfc85bb-tgz4z   1/1     Running             0          2s
php-apache-89bfc85bb-kbr5l   0/1     Pending             0          0s
php-apache-89bfc85bb-fh7n5   0/1     Pending             0          0s
php-apache-89bfc85bb-kbr5l   0/1     Pending             0          0s
php-apache-89bfc85bb-fh7n5   0/1     Pending             0          0s
php-apache-89bfc85bb-kbr5l   0/1     ContainerCreating   0          0s
php-apache-89bfc85bb-fh7n5   0/1     ContainerCreating   0          0s
php-apache-89bfc85bb-kbr5l   1/1     Running             0          1s
php-apache-89bfc85bb-fh7n5   1/1     Running             0          2s

While generating the load, observe how the Pods scale to handle it.

Run the following command to check HPA again:

kubectl describe hpa php-apache

You should get output similar to the following, with more deployed Pods:

Reference:                Deployment/php-apache
Target CPU utilization:   50%
Current CPU utilization:  20%
Min replicas:             1
Max replicas:             10
Deployment pods:          10 current / 10 desired

Press <Ctrl> + C to stop generating load after a few minutes. The Pods scale down since they no longer need the extra resources.

Now, check HPA again. You should get output like the following, with fewer deployed Pods.

Reference:                Deployment/php-apache
Target CPU utilization:   50%
Current CPU utilization:  10%
Min replicas:             1
Max replicas:             10
Deployment pods:          1 current / 10 desired

After the test, use the following commands to remove the HPA, the PHP Apache deployment, and the created service.

kubectl delete hpa php-apache
kubectl delete deployments.apps php-apache
kubectl delete service php-apache

Configuring Vertical Pod Autoscaler in AKS

VPA optimizes resource allocation within individual Pods. Unlike HPA, which adjusts the number of Pod replicas, VPA fine-tunes each Pod’s CPU and memory requests and limits based on usage trends. This approach ensures that Pods receive precisely the resources they need.

Set up VPA in AKS

Let’s continue with our previous example to explore implementing VPA to optimize the application’s performance.

Run the command below to enable the Vertical Pod Autoscaler feature in your AKS cluster.

az aks update --resource-group myResourceGroup --name myAKSCluster --enable-vpa

This action activates the VPA feature on myAKSCluster in myResourceGroup. To verify that the VPA components are running, list them in the kube-system namespace:

kubectl get pods -n kube-system | grep vpa

You should see output like the following:

vpa-admission-controller-7f7644f998-2fsfv   1/1     Running   0          85s
vpa-admission-controller-7f7644f998-vl6hx   1/1     Running   0          85s
vpa-recommender-85b7594bff-npbhm            1/1     Running   0          85s
vpa-updater-684c549c84-lgg5p                1/1     Running   0          85s

After enabling VPA, new components start running in your cluster. These components include vpa-admission-controller, vpa-recommender, and vpa-updater. They monitor your application’s resource usage and adjust its resource requests accordingly.

Deploy a test application

Now, deploy a sample application (named hamster) using a Kubernetes manifest file. Run the following command:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/vertical-pod-autoscaler/examples/hamster.yaml
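
Alongside the hamster Deployment, this manifest also defines the VerticalPodAutoscaler object (hamster-vpa) referenced later in this section. A simplified sketch of that object, based on the upstream example, looks like this:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: hamster-vpa
spec:
  targetRef:                 # the workload whose Pods the VPA resizes
    apiVersion: apps/v1
    kind: Deployment
    name: hamster
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:            # floor for the VPA's recommendations
        cpu: 100m
        memory: 50Mi
      maxAllowed:            # ceiling for the VPA's recommendations
        cpu: 1
        memory: 500Mi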

Next, use the following command to check the status of your application’s Pods.

kubectl get pods

You should get output like this, indicating the hamster Pods are running:

NAME                       READY   STATUS    RESTARTS   AGE
hamster-8688cd95f9-kqscz   1/1     Running   0          19s
hamster-8688cd95f9-n8zf2   1/1     Running   0          19s

Use the command kubectl describe vpa/hamster-vpa to view the VPA’s recommendations for the hamster application. These recommendations include how much CPU and memory the application should ideally use (Target), as well as the minimum (Lower Bound) and maximum (Upper Bound) recommended limits.

The output might look like the following:

Recommendation:
  Container Recommendations:
    Container Name: hamster
    Lower Bound:
      Cpu: 100m
      Memory: 50Mi
    Target:
      Cpu: 100m
      Memory: 50Mi
    Uncapped Target:
      Cpu: 1m
      Memory: 5242880
    Upper Bound:
      Cpu: 100m
      Memory: 50Mi

After a few minutes, the VPA adjusts your Pods’ resource requests based on actual usage. Use the command below to watch these changes in real time.

kubectl get pods --watch

The output might look like the following.

NAME                       READY   STATUS        RESTARTS   AGE
hamster-8688cd95f9-5tm9x   1/1     Running       0          27s
hamster-8688cd95f9-kqscz   1/1     Running       0          3m53s
hamster-8688cd95f9-n8zf2   1/1     Terminating   0          3m53s

After VPA makes adjustments, use the following command to check the VPA status again.

kubectl describe vpa/hamster-vpa

The Target values should now be different, reflecting the new resource requests based on the application’s needs, like the output below.

Recommendation:
  Container Recommendations:
    Container Name: hamster
    Lower Bound:
      Cpu: 355m
      Memory: 50Mi
    Target:
      Cpu: 587m
      Memory: 50Mi
    Uncapped Target:
      Cpu: 587m
      Memory: 11500k
    Upper Bound:
      Cpu: 1
      Memory: 500Mi

When you use kubectl describe pods to describe the Pods, you’ll notice that the resource requests for CPU and memory match VPA’s new targets:

Requests:
  cpu:        587m
  memory:     50Mi

Using Cluster Autoscaler for effective node scaling

Effective node scaling is essential for ensuring optimal performance and cost efficiency in your AKS cluster. Cluster Autoscaler in AKS dynamically adjusts the number of nodes in your cluster.

When demand increases, it automatically adds nodes so your applications have the necessary resources. Conversely, when demand drops, it reduces the number of nodes to cut costs. This automatic scaling is essential for handling varying workloads efficiently.

Enabling and configuring Cluster Autoscaler in AKS

Let’s walk through how to enable and configure Cluster Autoscaler in your AKS cluster.

If you already have an AKS cluster running, use the following command to enable Cluster Autoscaler.

az aks update --resource-group myResourceGroup --name myAKSCluster --enable-cluster-autoscaler --min-count 1 --max-count 3

This command updates your existing AKS cluster (myAKSCluster) in the resource group myResourceGroup. The flag --enable-cluster-autoscaler turns on the autoscaling feature, while --min-count 1 and --max-count 3 set the minimum and maximum number of nodes. In this case, your cluster can scale down to 1 node and scale up to 3 nodes as needed.
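
Beyond the node count range, you can optionally tune how aggressively the autoscaler reacts with the --cluster-autoscaler-profile flag. The following is a sketch; the profile values are illustrative, and the available keys may vary by CLI version:

az aks update --resource-group myResourceGroup --name myAKSCluster --cluster-autoscaler-profile scan-interval=30s scale-down-unneeded-time=5m

Here, scan-interval controls how often the autoscaler re-evaluates the cluster, and scale-down-unneeded-time sets how long a node must sit idle before it becomes a candidate for removal.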

Accessing the cluster in the Azure portal

After setting up your cluster, you can view and manage it in the Azure portal. Log in, then go to Kubernetes Services to find your cluster (myAKSCluster).

Fig. 1: List of Kubernetes Services with myAKSCluster type and its details

Click on myAKSCluster. Go to Settings, then Node Pools. You’ll find details about your node pool named nodepool1.

Fig. 2: myAKSCluster Settings including Node pools

Click on nodepool1 to view its configuration details, including autoscaling settings. The Scale method is Autoscale.

Fig. 3: Autoscaling settings and other configuration details

Best practices for resource optimization and autoscaling in AKS

Efficient resource management and autoscaling are crucial for optimizing performance, ensuring availability, and controlling costs. The following best practices help you maintain a well-balanced AKS environment:

  • Regular monitoring and adjustments: Continuously monitor resource usage and performance metrics (see the commands after this list). Adjust resource requests, limits, and autoscaler parameters based on observed data.
  • Balanced resource allocation: Set appropriate CPU and memory requests and limits for Pods to avoid overuse and underuse. Balance allocating enough resources for performance while avoiding over-provisioning and over-spending.
  • Effective use of autoscalers: Implement HPA for scaling Pods based on CPU and memory usage. Use VPA to adjust Pod resource requests. Configure Cluster Autoscaler to scale nodes based on workload demands.
  • Anticipate and plan for workload changes: Track seasonal trends or expected traffic spikes and adjust scaling policies accordingly. Use predictive scaling strategies if applicable.
  • Leverage Azure monitoring tools: Use Azure Monitor and AKS diagnostics for comprehensive monitoring, or leverage Site24x7 Kubernetes Monitoring for deeper insights into your Kubernetes performance via a single, intuitive interface.
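
For quick spot checks alongside these tools, the Kubernetes Metrics Server (enabled by default in AKS) lets you inspect live usage straight from kubectl:

kubectl top nodes    # current CPU and memory usage per node
kubectl top pods     # current CPU and memory usage per Pod in the namespace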

Conclusion

Setting appropriate resource requests and limits is crucial for ensuring optimal performance of your applications. HPA, VPA, and Cluster Autoscaler empower your AKS clusters to respond dynamically to workload changes and handle peak loads, reducing costs during low demand.

Continuous monitoring and regular adjustments ensure your cluster remains aligned with your application needs and operational objectives. These practices make your AKS clusters resilient, performant, and cost-effective. You’ll experience fewer downtimes, better resource use, and an overall smoother operational experience.

Want to get a complete understanding of what’s going on inside your AKS clusters? Try Site24x7 for free to gain deeper insights into your AKS clusters’ performance, resource usage, and more.
