Recently, while catching up on the latest AKS release notes, I spotted a brand-new preview feature that promises to trim unnecessary CPU and memory usage from system add-ons: cost-optimised add-on scaling.
In a nutshell, Microsoft has wired the managed Vertical Pod Autoscaler (VPA) into key AKS add-ons (think CoreDNS, Workload Identity, Image Integrity, Retina, and more) and given us knobs to fine-tune their resource profiles. In today’s post I’ll show you how to:
- Turn the feature on (or off) in a new or existing cluster.
- Customise default CPU / memory requests & limits.
- Adjust VPA policies such as min / max and update mode.
- Troubleshoot the most common “why is my pod Pending?” questions.
What exactly is Cost-Optimised Add-on Scaling?
First, let’s set the scene.
When you switch on the feature AKS does three things behind the scenes:
- Installs the managed VPA add-on (three pods called `vpa-admission-controller`, `vpa-recommender`, and `vpa-updater`).
- Creates a VPA custom resource for each supported add-on so AKS can deliver CPU and memory recommendations, or even apply them automatically.
- Lets you override the defaults by adding a few simple annotations to the Deployment/DaemonSet or the VPA itself.
Supported AKS add-ons (first wave)
| Add-on | Enablement behaviour | VPA CR name(s) | Quick validation |
|---|---|---|---|
| CoreDNS | On by default for new clusters | `coredns` | `kubectl get vpa coredns -n kube-system` |
| Workload Identity | Manually enable | `azure-wi-webhook-controller-manager` | `kubectl get vpa azure-wi-webhook-controller-manager -n kube-system` |
| Image Integrity | Manually enable | `ratify` | `kubectl get vpa ratify -n gatekeeper-system` |
| Network Observability (Retina) | Manually enable | `retina-agent`, `retina-operator` | `kubectl get vpa retina-agent -n kube-system` |
Expect more add-ons to join this list as the preview matures.
Understanding VPA modes in real life
Before you flip any switches, it’s worth pausing to decide how the Vertical Pod Autoscaler should behave once it has a recommendation.
Think of the three modes as a sliding scale between hands-off advisor and fully automatic mechanic:
| Mode | What it does | Typical use-case |
|---|---|---|
| Off | VPA gathers metrics and writes suggestions into the CR status field but never touches your pods. | Production clusters that need a strict change-control gate or GitOps-driven updates. Also handy when you’re just benchmarking to understand baseline usage. |
| Initial (default) | VPA applies the recommendation only when the pod restarts for another reason (image update, node drain, manual delete, etc.). VPA itself will not trigger that restart. | Teams that want a low-risk first step (no surprise restarts) but still want improved requests/limits over time. Works well alongside rolling deployments run by your CI/CD pipeline. |
| Auto | VPA acts immediately: it edits the pod template, which in turn causes Kubernetes to recreate the pod with the new requests/limits. This can happen at any time if utilisation drifts. | Development, performance-testing or cost-lab environments where you’re comfortable with automated restarts in exchange for maximum resource efficiency. |
Why does the mode matter?
• Operational blast-radius: Auto can restart critical add-ons at 3 a.m. if usage spikes—great for savings, possibly bad for pagers.
• Auditability: Off keeps a clear paper-trail of recommendations you can review, discuss in a CAB meeting, and commit through GitOps.
• Speed of benefit: Initial captures the “easy wins” (pods update the next time you deploy anyway) without new moving parts.
Rule of thumb:
Start with Off in prod to collect data, graduate to Initial once you’re confident in the numbers, and save Auto for clusters where a sudden, automated restart won’t ruin anyone’s day.
And remember, whatever mode you choose today isn’t permanent. A single `kubectl patch vpa` can promote or demote an add-on at any time, so adjust as your change-management posture evolves.
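As a concrete sketch, here is what such a mode switch could look like for the CoreDNS VPA. The `kubectl` line needs cluster access, so it is commented out here and only the patch JSON is validated locally:

```shell
# Merge patch that promotes the CoreDNS VPA from Initial to Auto.
PATCH='{"spec":{"updatePolicy":{"updateMode":"Auto"}}}'

# Sanity-check the JSON locally before sending it anywhere:
echo "$PATCH" | python3 -m json.tool > /dev/null && echo "patch is valid JSON"

# Apply against a real cluster (uncomment when ready):
# kubectl patch vpa coredns -n kube-system --type merge -p "$PATCH"
```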
Prerequisites
You’ll need an AKS 1.25+ cluster, the Azure CLI 2.60.0 or newer, and the `aks-preview` extension. You also need enough headroom on the system node pool (or the cluster autoscaler enabled) because VPA’s decisions are only as good as the nodes beneath them.
Install / Update the aks-preview Extension
Before we can flip any preview bits we need the extension. Run the command below in Cloud Shell or your favourite terminal.
```shell
# Install - or upgrade if you already have it
az extension add --name aks-preview --upgrade
```
Once the command completes, the latest preview features (including our shiny cost-optimised scaling flag) will be available to `az aks`.
Register the Preview Feature
Azure features hide behind subscription-level flags. Let’s register `AKS-AddonAutoscalingPreview` now:
```shell
az feature register \
  --namespace Microsoft.ContainerService \
  --name AKS-AddonAutoscalingPreview
```
Registration takes a few minutes. Check progress with `az feature show --namespace Microsoft.ContainerService --name AKS-AddonAutoscalingPreview` and wait until the state flips to Registered. Afterwards, refresh the provider so your subscription knows about the new capability:
```shell
az provider register --namespace Microsoft.ContainerService
```
That’s it! The preview feature is now registered on your subscription.
Enable Cost-Optimised Scaling on Your Cluster
Creating a brand-new cluster
If you’re spinning up fresh infrastructure, simply pass one extra flag to az aks create
:
```shell
az aks create \
  --resource-group $RG \
  --name $CLUSTER \
  --enable-optimized-addon-scaling
```
Azure will provision the cluster, deploy VPA, and restart any add-on pods that need resizing. CoreDNS is clever enough to roll without downtime.
Turning it on for an existing cluster
Already have a cluster ticking along nicely? No problem, update in place:
```shell
az aks update \
  --resource-group $RG \
  --name $CLUSTER \
  --enable-optimized-addon-scaling
```
Expect VPA components to appear in the `kube-system` namespace and selected add-on pods to bounce once.
When the command returns, your add-ons are officially under VPA’s watchful eye.
Verifying the VPA Pods
Let’s confirm everything spun up correctly. First, list the three VPA system pods:
```shell
kubectl get pods -n kube-system | grep vpa
```
You should see `vpa-admission-controller`, `vpa-recommender`, and `vpa-updater` all in the Running state. If any are pending, check your node capacity or the pod events for hints.
Applying the VPA-recommended values manually
Use the flow below whenever you want VPA’s suggested CPU / memory numbers to take effect immediately, rather than waiting for the next rollout.
Let’s have a look at what CoreDNS is currently using:
```shell
echo -e "CONTAINER\tREQ_CPU\tREQ_MEM\tLIM_CPU\tLIM_MEM" && \
kubectl get pod -n kube-system -l k8s-app=kube-dns -o json | jq -r \
  '.items[].spec.containers[] |
   [.name, .resources.requests.cpu, .resources.requests.memory,
    .resources.limits.cpu, .resources.limits.memory] | @tsv'
```
This should give you a nice table that looks like:
```
CONTAINER  REQ_CPU  REQ_MEM  LIM_CPU  LIM_MEM
coredns    100m     70Mi     3        500Mi
coredns    100m     70Mi     3        500Mi
coredns    100m     70Mi     3        500Mi
coredns    100m     70Mi     3        500Mi
coredns    100m     70Mi     3        500Mi
```
Next, have a peek at the VPA recommendations for CoreDNS:
```shell
kubectl get vpa coredns -n kube-system
```
You’ll get an output similar to this:
```
NAME      MODE      CPU   MEM        PROVIDED   AGE
coredns   Initial   11m   23574998   True       44m
```
That table tells us VPA thinks CoreDNS only needs ~11 millicores of CPU and ~24 MB of RAM. Not bad!
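The table view is only a summary; the full recommendation (lower bound, target, upper bound) lives in the CR’s status field. As a sketch, here is how you could pull the target values out with `jq` — a trimmed sample status is inlined so the snippet runs anywhere; against a live cluster, replace the here-doc with `kubectl get vpa coredns -n kube-system -o json`:

```shell
# Extract container name plus target CPU/memory from a VPA status document.
REC=$(jq -r '.status.recommendation.containerRecommendations[]
             | [.containerName, .target.cpu, .target.memory] | @tsv' <<'EOF'
{"status":{"recommendation":{"containerRecommendations":[
  {"containerName":"coredns","target":{"cpu":"11m","memory":"23574998"}}]}}}
EOF
)
echo "$REC"
```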
By default, the VPA sits in Initial mode, which means the new numbers apply when the pod restarts. If you’d like them right away, simply delete the pod and let the ReplicaSet recreate it:
```shell
kubectl delete pod <coredns-pod-name> -n kube-system
```
Give it a few seconds and the replacement pod will come up with the trimmed-down requests.
Customising Requests, Limits, Min/Max and Update Mode
Sometimes you need tighter control than “whatever VPA says”. AKS exposes three annotations to let you override specific behaviours by setting the values to `enabled` or `disabled`:

- `kubernetes.azure.com/override-requests-limits`: add to the Deployment or DaemonSet if you wish to edit CPU/memory requests or limits.
- `kubernetes.azure.com/override-min-max`: add to the VPA resource itself to tweak `minAllowed` or `maxAllowed`.
- `kubernetes.azure.com/override-update-mode`: add to the VPA resource to switch between Off, Initial, or Auto.
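If you’d rather not hand-edit manifests, the same annotations can also be set with `kubectl annotate`. The sketch below uses a dry-run `echo` prefix so the commands print instead of run; drop the prefix when pointed at a real cluster:

```shell
# Dry-run prefix: prints the kubectl commands instead of executing them.
ANNOTATE="echo kubectl"

# Allow manual edits to CoreDNS requests/limits on the Deployment:
$ANNOTATE annotate deployment coredns -n kube-system \
  kubernetes.azure.com/override-requests-limits=enabled

# Allow custom min/max bounds on the VPA resource:
$ANNOTATE annotate vpa coredns -n kube-system \
  kubernetes.azure.com/override-min-max=enabled
```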
Bumping CoreDNS Requests & Limits in the Deployment
You’ve profiled CoreDNS and discovered it occasionally bursts but rarely exceeds three vCPUs or 500 MiB. Shrink the default requests while keeping sensible limits by enabling the override on the Deployment:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredns
  namespace: kube-system
  annotations:
    kubernetes.azure.com/override-requests-limits: "enabled"
spec:
  replicas: 2
  selector:
    matchLabels:
      k8s-app: kube-dns
  template:
    metadata:
      labels:
        k8s-app: kube-dns
    spec:
      containers:
        - name: coredns
          image: mcr.microsoft.com/oss/coredns/coredns:1.11.2
          resources:
            requests:
              cpu: "100m"
              memory: "70Mi"
            limits:
              cpu: "3"
              memory: "500Mi"
```
Once applied, AKS stops reconciling these values, and a pod restart picks up the new requests straight away.
Setting Hard Min/Max Guard-Rails in the VPA
Prefer to keep automatic scaling but impose upper and lower bounds? Add the min/max override to the VPA resource itself:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: coredns
  namespace: kube-system
  annotations:
    kubernetes.azure.com/override-min-max: "enabled"
spec:
  resourcePolicy:
    containerPolicies:
      - containerName: coredns
        minAllowed:
          cpu: 10m
          memory: 10Mi
        maxAllowed:
          cpu: 3
          memory: 500Mi
  updatePolicy:
    updateMode: "Initial"
```
VPA keeps producing fresh recommendations but always clamps them inside the range you set.
Turning VPA Off for a Specific Add-on
Need recommendations only, with zero automatic changes? Enable the update-mode override and set the mode to Off:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: coredns
  namespace: kube-system
  annotations:
    kubernetes.azure.com/override-update-mode: "enabled"
spec:
  updatePolicy:
    updateMode: "Off"
```
The VPA still records usage and suggests values, but it will never modify the running pod. That makes Off handy for change-control windows or live debugging.
Disabling Cost-Optimised Scaling
Need to roll everything back? You can turn the feature off while leaving the VPA add-on in place:
```shell
az aks update \
  --resource-group $RG \
  --name $CLUSTER \
  --disable-optimized-addon-scaling
```
If you also want to remove VPA entirely, follow the official docs to disable the VPA add-on afterwards.
Troubleshooting Pointers
- If VPA pods are pending, double-check your system node pool has free CPU and memory, or enable the cluster autoscaler.
- When add-on pods stick in Pending, look for complaints about insufficient resources (`kubectl describe pod`). Either raise node capacity or lower VPA min/max.
- No recommendations showing? Tail the `vpa-recommender` logs; it usually spills helpful errors.
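The pointers above boil down to a couple of commands. The sketch below wraps them in a dry-run `echo` prefix (and a placeholder pod name) so it prints what it would do anywhere; drop the prefix against a real cluster:

```shell
# Dry-run prefix: prints the kubectl commands instead of executing them.
RUN="echo kubectl"
POD="coredns-5d78c9869d-abcde"   # placeholder: substitute your Pending pod

# Inspect scheduling events on a Pending pod (look for FailedScheduling):
$RUN describe pod "$POD" -n kube-system

# Tail the recommender logs for errors:
$RUN logs deploy/vpa-recommender -n kube-system --tail=50
```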
Wrapping up
Microsoft has been on a mission to slice AKS operating costs, and Cost-Optimised Add-on Scaling is a smart move in that direction. By letting VPA trim away wasted CPU and memory, or by giving you explicit knobs to set hard caps, the feature can free up node capacity for your real workloads.
Try it today in a dev cluster, keep an eye on pod events, and let me know over on Twitter (@Pixel_Robots) how much headroom you claw back. Until next time, happy kube-cost-optimising!