Azure Kubernetes Service (AKS) maximizes resource efficiency and scalability in cloud-based container management. DevOps teams can leverage resource optimization and autoscaling to enhance AKS performance and cost-effectiveness.
This hands-on guide demonstrates how to implement Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and Cluster Autoscaler to ensure your AKS clusters perform optimally under varying workloads.
AKS autoscaling relies on three essential building blocks: the Horizontal Pod Autoscaler, the Vertical Pod Autoscaler, and the Cluster Autoscaler.
You typically pay based on the cloud resources you use. Optimizing resource allocation prevents over-provisioning (which wastes money) and under-provisioning (which can degrade application performance). As your application’s needs fluctuate, effective resource management enables scaling resources up or down based on demand.
Ideally, your applications would always have the optimal amount of resources precisely when needed. Autoscaling dynamically adjusts resources based on real-time demand, adding or removing nodes and adjusting Pod counts. It ensures that your applications have enough resources during high demand while saving costs during low usage periods.
Resource requests and limits in Kubernetes help you manage the CPU, memory, and other computing resources your applications need.
Resource requests specify the minimum amount of a resource that a container requires. Think of it as reserving a portion of resources to ensure that your container can start and run under normal conditions.
Resource limits define the maximum amount of a resource that a container can use. If a container surpasses this limit, Kubernetes throttles its CPU usage and may terminate it if memory consumption exceeds the limit.
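Requests and limits are set per container in a Pod or Deployment spec. The snippet below is a minimal illustration; the Pod name, image, and values are placeholders, not taken from this guide's example application:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo          # hypothetical Pod name
spec:
  containers:
  - name: app
    image: nginx:1.25          # any image works for illustration
    resources:
      requests:
        cpu: 250m              # reserve a quarter of a CPU core
        memory: 128Mi          # reserve 128 MiB of memory
      limits:
        cpu: 500m              # throttle CPU beyond half a core
        memory: 256Mi          # terminate if memory exceeds 256 MiB
```

With these settings, the scheduler only places the Pod on a node with at least 250m CPU and 128Mi memory unreserved, and the kubelet enforces the limits at runtime.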
These settings are critical in AKS for two main reasons: application performance and cluster stability. By providing adequate resources to your applications, you avoid sluggish performance and potential downtime. Proper limits protect your cluster from rogue containers that might otherwise monopolize resources, ensuring a stable, predictable environment. The goal is to strike a balance: give your containers enough resources to run reliably while avoiding over-allocation.
HPA automatically adjusts the number of Pod replicas in a deployment, ReplicaSet, or StatefulSet based on observed CPU or memory use. This ensures your application maintains optimal performance during traffic spikes while conserving resources during quieter periods.
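Under the hood, HPA follows a simple proportional rule described in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). The shell sketch below works through one hypothetical case; the numbers are made up for illustration:

```shell
# HPA scaling rule: desired = ceil(current_replicas * current_metric / target_metric)
current_replicas=3   # replicas currently running (hypothetical)
current_cpu=90       # average CPU utilization across Pods, in percent
target_cpu=50        # target utilization configured on the HPA

# Integer ceiling division: (a + b - 1) / b
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "desired replicas: $desired"   # ceil(3 * 90 / 50) = ceil(5.4) = 6
```

Because utilization (90%) is well above the target (50%), HPA scales the deployment from 3 to 6 replicas, which brings the average per-Pod utilization back toward the target.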
Let’s walk through the process of setting up HPA in AKS.
To follow along, ensure you have an Azure account with an active subscription, the Azure CLI installed, and kubectl available on your machine.
First, open the Azure CLI and log in to your Azure account. Use the az login command to open a login window in your browser.
Next, create a resource group (a container for holding related Azure resources) to spin up an AKS cluster. Use the following command in the Azure CLI:
az group create --name myResourceGroup --location eastus
This command creates a new resource group named “myResourceGroup” in the “East US” Azure region. Choose a distinctive name to ensure it’s unique within your subscription.
Next, create the AKS cluster:
az aks create --resource-group myResourceGroup --name myAKSCluster --node-count 2 --generate-ssh-keys
This command creates an AKS cluster named “myAKSCluster” associated with the resource group “myResourceGroup”. The --node-count 2 flag specifies that your cluster should have two nodes. The --generate-ssh-keys flag creates SSH keys (if you don’t already have them) so you can securely connect to the cluster’s nodes.
Now, configure your Azure CLI to connect to the cluster:
az aks get-credentials --resource-group myResourceGroup --name myAKSCluster
This command fetches the access credentials for myAKSCluster, enabling your Azure CLI to interact seamlessly with that cluster.
Next, deploy a sample application (in this case, a PHP Apache server) on your Kubernetes cluster:
kubectl apply -f https://k8s.io/examples/application/php-apache.yaml
This command pulls the following predefined configuration (manifest) and sets up your application accordingly.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache
The manifest is a YAML file that describes your application. It includes a Deployment that runs the php-apache container (with a CPU request of 200m and a limit of 500m) and a Service that exposes it on port 80.
Use the following command to confirm your Pods are running as expected:
kubectl get pods
Next, set up an HPA to automatically increase or decrease the number of Pods running your application based on CPU usage:
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
This command creates the HPA, specifying that it should maintain an average CPU usage of 50% across all Pods and can scale between 1 and 10 Pods.
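The kubectl autoscale shorthand creates an HPA object behind the scenes. A roughly equivalent declarative manifest, sketched here with the autoscaling/v2 API, would look like this:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:               # which workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50  # matches --cpu-percent=50
```

Applying this file with kubectl apply -f produces the same result as the one-line command, but keeps the configuration in version control alongside your other manifests.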
Now, run:
kubectl describe hpa php-apache
You should get the following output:
Reference:                Deployment/php-apache
Target CPU utilization:   50%
Current CPU utilization:  10%
Min replicas:             1
Max replicas:             10
Deployment pods:          1 current / 10 desired
To confirm the HPA works, use the following command to run a temporary Pod that generates load against your PHP Apache server.
kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
Run the command below in a separate terminal to track the number of Pods changing in response to the traffic.
kubectl get pods --watch
You should get an output like this:
NAME                         READY   STATUS              RESTARTS   AGE
load-generator               1/1     Running             0          12s
php-apache-89bfc85bb-67xzk   1/1     Running             0          8m5s
php-apache-89bfc85bb-bvzj5   1/1     Running             0          11s
php-apache-89bfc85bb-j2bsw   0/1     Pending             0          0s
php-apache-89bfc85bb-tgz4z   0/1     Pending             0          0s
php-apache-89bfc85bb-j2bsw   0/1     Pending             0          0s
php-apache-89bfc85bb-tgz4z   0/1     Pending             0          0s
php-apache-89bfc85bb-j2bsw   0/1     ContainerCreating   0          0s
php-apache-89bfc85bb-tgz4z   0/1     ContainerCreating   0          0s
php-apache-89bfc85bb-j2bsw   1/1     Running             0          2s
php-apache-89bfc85bb-tgz4z   1/1     Running             0          2s
php-apache-89bfc85bb-kbr5l   0/1     Pending             0          0s
php-apache-89bfc85bb-fh7n5   0/1     Pending             0          0s
php-apache-89bfc85bb-kbr5l   0/1     Pending             0          0s
php-apache-89bfc85bb-fh7n5   0/1     Pending             0          0s
php-apache-89bfc85bb-kbr5l   0/1     ContainerCreating   0          0s
php-apache-89bfc85bb-fh7n5   0/1     ContainerCreating   0          0s
php-apache-89bfc85bb-kbr5l   1/1     Running             0          1s
php-apache-89bfc85bb-fh7n5   1/1     Running             0          2s
While generating the load, observe how the Pods scale to handle it.
Run the following command to check HPA again:
kubectl describe hpa php-apache
You should get output similar to the following, with more deployed Pods:
Reference:                Deployment/php-apache
Target CPU utilization:   50%
Current CPU utilization:  20%
Min replicas:             1
Max replicas:             10
Deployment pods:          10 current / 10 desired
Press Ctrl + C to stop generating load after a few minutes. The Pods scale down since they no longer need the extra resources.
Now, check HPA again. You should get output like the following, with fewer deployed Pods.
Reference:                Deployment/php-apache
Target CPU utilization:   50%
Current CPU utilization:  10%
Min replicas:             1
Max replicas:             10
Deployment pods:          1 current / 10 desired
After the test, use the following commands to remove the HPA, the PHP Apache deployment, and the created service.
kubectl delete hpa php-apache
kubectl delete deployments.apps php-apache
kubectl delete service php-apache
VPA optimizes the resource allocation within individual Pods. Unlike HPA, which adjusts the number of Pod replicas, VPA fine-tunes the individual Pod’s CPU and memory limits and requests based on usage trends and requirements. This approach ensures that Pods receive precisely the resources they need.
Let’s continue with our previous example to explore implementing VPA to optimize the application’s performance.
Run the command below to enable the Vertical Pod Autoscaler feature in your AKS cluster.
az aks update --resource-group myResourceGroup --name myAKSCluster --enable-vpa
This command activates the VPA add-on on myAKSCluster in myResourceGroup. Once it finishes, list the VPA Pods in the kube-system namespace with kubectl get pods -n kube-system. You should see output like the following:
vpa-admission-controller-7f7644f998-2fsfv   1/1   Running   0   85s
vpa-admission-controller-7f7644f998-vl6hx   1/1   Running   0   85s
vpa-recommender-85b7594bff-npbhm            1/1   Running   0   85s
vpa-updater-684c549c84-lgg5p                1/1   Running   0   85s
After enabling VPA, new components start running in your cluster. These components include vpa-admission-controller, vpa-recommender, and vpa-updater. They monitor your application’s resource usage and adjust its resource requests accordingly.
Now, deploy a sample application (named hamster) using a Kubernetes manifest file. Run the following command:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/vertical-pod-autoscaler/examples/hamster.yaml
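The hamster.yaml file contains both a Deployment and a VerticalPodAutoscaler object. The VPA portion looks roughly like the sketch below (reproduced from memory; check the file itself for the authoritative version):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: hamster-vpa
spec:
  targetRef:                    # which workload the VPA manages
    apiVersion: apps/v1
    kind: Deployment
    name: hamster
  resourcePolicy:
    containerPolicies:
    - containerName: '*'        # apply to all containers in the Pod
      minAllowed:
        cpu: 100m
        memory: 50Mi
      maxAllowed:
        cpu: 1
        memory: 500Mi
```

The minAllowed and maxAllowed bounds cap how far the VPA's recommendations can go, which is why its reported Lower Bound and Upper Bound values never stray outside this range.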
Next, use the following command to check the status of your application’s Pods.
kubectl get pods
You should get output like this indicating the hamster Pods are running:
NAME                       READY   STATUS    RESTARTS   AGE
hamster-8688cd95f9-kqscz   1/1     Running   0          19s
hamster-8688cd95f9-n8zf2   1/1     Running   0          19s
Use the command kubectl describe vpa/hamster-vpa to view the VPA’s recommendations for the hamster application. These recommendations include how much CPU and memory the application should ideally use (Target), as well as the minimum (Lower Bound) and maximum (Upper Bound) recommended limits.
That command might have an output like below:
Recommendation:
  Container Recommendations:
    Container Name:  hamster
    Lower Bound:
      Cpu:     100m
      Memory:  50Mi
    Target:
      Cpu:     100m
      Memory:  50Mi
    Uncapped Target:
      Cpu:     1m
      Memory:  5242880
    Upper Bound:
      Cpu:     100m
      Memory:  50Mi
After a few minutes, the VPA adjusts your Pods’ resource requests based on actual usage. Use the command below to watch these changes in real time.
kubectl get pods --watch
The output might look like the following.
NAME                       READY   STATUS        RESTARTS   AGE
hamster-8688cd95f9-5tm9x   1/1     Running       0          27s
hamster-8688cd95f9-kqscz   1/1     Running       0          3m53s
hamster-8688cd95f9-n8zf2   1/1     Terminating   0          3m53s
After VPA makes adjustments, use the following command to check the VPA status again.
kubectl describe vpa/hamster-vpa
The Target values should now be different, reflecting the new resource requests based on the application’s needs, like the output below.
Recommendation:
  Container Recommendations:
    Container Name:  hamster
    Lower Bound:
      Cpu:     355m
      Memory:  50Mi
    Target:
      Cpu:     587m
      Memory:  50Mi
    Uncapped Target:
      Cpu:     587m
      Memory:  11500k
    Upper Bound:
      Cpu:     1
      Memory:  500Mi
When you use kubectl describe pods to describe the Pods, you’ll notice that the resource requests for CPU and memory match VPA’s new targets:
Requests:
  cpu:     587m
  memory:  50Mi
Effective node scaling is essential for ensuring optimal performance and cost efficiency in your AKS cluster. Cluster Autoscaler in AKS dynamically adjusts the number of nodes in your cluster.
When demand increases, it automatically adds nodes so your applications have the necessary resources. Conversely, when demand drops, it reduces the number of nodes to cut costs. This automatic scaling is essential for handling varying workloads efficiently.
Let’s walk through how to enable and configure Cluster Autoscaler in your AKS cluster.
If you already have an AKS cluster running, use the following command to enable Cluster Autoscaler.
az aks update --resource-group myResourceGroup --name myAKSCluster --enable-cluster-autoscaler --min-count 1 --max-count 3
This command updates your existing AKS cluster (myAKSCluster) in the resource group myResourceGroup. The flag --enable-cluster-autoscaler turns on the autoscaling feature, while --min-count 1 and --max-count 3 set the minimum and maximum number of nodes. In this case, your cluster can scale down to 1 node and scale up to 3 nodes as needed.
After setting up your cluster, you can view and manage it in the Azure portal. Log in, then go to Kubernetes Services to find your cluster (myAKSCluster).
Click on myAKSCluster. Go to Settings, then Node Pools. You’ll find details about your node pool named nodepool1.
Click on nodepool1 to view its configuration details, including autoscaling settings. The Scale method is Autoscale.
Efficient resource management and autoscaling are crucial for optimizing performance, ensuring availability, and controlling costs. Following the practices covered in this guide helps you maintain a well-balanced AKS environment.
Setting appropriate resource requests and limits is crucial for ensuring optimal performance of your applications. HPA, VPA, and Cluster Autoscaler empower your AKS clusters to respond dynamically to workload changes and handle peak loads, reducing costs during low demand.
Continuous monitoring and regular adjustments ensure your cluster remains aligned with your application needs and operational objectives. These practices make your AKS clusters resilient, performant, and cost-effective. You’ll experience fewer downtimes, better resource use, and an overall smoother operational experience.
Want to get a complete understanding of what’s going on inside your AKS clusters? Try Site24x7 for free to gain deeper insights into your AKS clusters’ performance, resource usage, and more.