Cost-Optimised Add-on Scaling in AKS: Right-Size Your System Add-ons (Preview)


Reading Time: 6 minutes

Recently, while catching up on the latest AKS release notes, I spotted a brand-new preview feature that promises to trim unnecessary CPU and memory usage from system add-ons: cost optimised add-on scaling.
In a nutshell, Microsoft has wired the managed Vertical Pod Autoscaler (VPA) into key AKS add-ons (think CoreDNS, Workload Identity, Image Integrity, Retina, and more) and given us knobs to fine-tune their resource profiles. In today’s post I’ll show you how to:

  • Turn the feature on (or off) in a new or existing cluster.
  • Customise default CPU / memory requests & limits.
  • Adjust VPA policies such as min / max and update mode.
  • Troubleshoot the most common “why is my pod Pending?” questions.

What exactly is Cost-Optimised Add-on Scaling?

First, let’s set the scene.

When you switch on the feature, AKS does three things behind the scenes:

  1. Installs the managed VPA add-on (three pods called vpa-admission-controller, vpa-recommender, and vpa-updater).
  2. Creates a VPA custom resource for each supported add-on so AKS can deliver CPU and memory recommendations, or even apply them automatically.
  3. Lets you override the defaults by adding a few simple annotations to the Deployment/DaemonSet or the VPA itself.

Supported AKS add-ons (first wave)

| Add-on | Enablement behaviour | VPA CR name(s) | Quick validation |
|---|---|---|---|
| CoreDNS | On by default for new clusters | coredns | kubectl get vpa coredns -n kube-system |
| Workload Identity | Manually enable | azure-wi-webhook-controller-manager | kubectl get vpa azure-wi-webhook-controller-manager -n kube-system |
| Image Integrity | Manually enable | ratify | kubectl get vpa ratify -n gatekeeper-system |
| Network Observability (Retina) | Manually enable | retina-agent, retina-operator | kubectl get vpa retina-agent -n kube-system |

Expect more add-ons to join this list as the preview matures.

Understanding VPA modes in real life

Before you flip any switches, it’s worth pausing to decide how the Vertical Pod Autoscaler should behave once it has a recommendation.
Think of the three modes as a sliding scale between hands-off advisor and fully automatic mechanic:

| Mode | What it does | Typical use-case |
|---|---|---|
| Off | VPA gathers metrics and writes suggestions into the CR status field but never touches your pods. | Production clusters that need a strict change-control gate or GitOps-driven updates. Also handy when you're just benchmarking to understand baseline usage. |
| Initial (default) | VPA applies the recommendation only when the pod restarts for another reason (image update, node drain, manual delete, etc.). VPA itself will not trigger that restart. | Teams that want a low-risk first step (no surprise restarts) but still want improved requests/limits over time. Works well alongside rolling deployments run by your CI/CD pipeline. |
| Auto | VPA acts immediately: it edits the pod spec, which in turn causes Kubernetes to recreate the pod with the new requests/limits. This can happen at any time if utilisation drifts. | Development, performance-testing, or cost-lab environments where you're comfortable with automated restarts in exchange for maximum resource efficiency. |

Why does the mode matter?

  • Operational blast-radius: Auto can restart critical add-ons at 3 a.m. if usage spikes, which is great for savings but possibly bad for pagers.
  • Auditability: Off keeps a clear paper-trail of recommendations you can review, discuss in a CAB meeting, and commit through GitOps.
  • Speed of benefit: Initial captures the "easy wins" (pods update the next time you deploy anyway) without new moving parts.

Rule of thumb:
Start with Off in prod to collect data, graduate to Initial once you’re confident in the numbers, and save Auto for clusters where a sudden, automated restart won’t ruin anyone’s day.

And remember: whatever mode you choose today isn't permanent. A single kubectl patch vpa <name> -n kube-system --type merge -p '{"spec":{"updatePolicy":{"updateMode":"Auto"}}}' can promote or demote an add-on at any time, so adjust as your change-management posture evolves.

Warning

Preview caveat: Everything here could change tomorrow, so always test in a dev cluster first.

Prerequisites

You'll need an AKS 1.25+ cluster, Azure CLI 2.60.0 or newer, and the aks-preview extension. You also need enough headroom on the system node pool (or the cluster autoscaler enabled) because VPA's decisions are only as good as the nodes beneath them.

Install / Update the aks-preview Extension

Before we can flip any preview bits we need the extension. Run the command below in Cloud Shell or your favourite terminal.
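Assuming you already have the Azure CLI installed, something like this does the trick (the update line is a no-op on a fresh install but keeps an existing extension current):

```shell
# Add the aks-preview extension (or update it if already installed)
az extension add --name aks-preview
az extension update --name aks-preview
```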

Once the command completes, the latest preview features (including our shiny cost-optimised scaling flag) will be available to az aks.

Register the Preview Feature

Azure features hide behind subscription-level flags. Let’s register AKS-AddonAutoscalingPreview now:
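The standard pattern for any AKS preview flag looks like this:

```shell
# Register the subscription-level preview flag
az feature register --namespace "Microsoft.ContainerService" \
  --name "AKS-AddonAutoscalingPreview"

# Check registration progress (repeat until it reports "Registered")
az feature show --namespace "Microsoft.ContainerService" \
  --name "AKS-AddonAutoscalingPreview" --query "properties.state"
```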

Keep hitting the same command until the state flips to Registered. Afterwards, refresh the provider so your subscription knows about the new capability:
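Propagating the flag to the resource provider is one more command:

```shell
# Refresh the provider so the subscription picks up the new capability
az provider register --namespace Microsoft.ContainerService
```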

That's it! The feature is now registered on your subscription.

Enable Cost-Optimised Scaling on Your Cluster

Creating a brand-new cluster

If you’re spinning up fresh infrastructure, simply pass one extra flag to az aks create:
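Something along these lines should work. The resource group and cluster names are placeholders, and since this is a preview, double-check the flag name against az aks create --help before running:

```shell
# Flag name taken from the preview announcement; confirm with
# `az aks create --help` as preview flags can change
az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --enable-optimized-addon-scaling \
  --generate-ssh-keys
```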

Azure will provision the cluster, deploy VPA, and restart any add-on pods that need resizing. CoreDNS is clever enough to roll without downtime.

Information

If you’re deploying with Bicep, ARM templates, or Terraform, flip VerticalPodAutoscaler to true and AddonAutoscaling to enabled.

Turning it on for an existing cluster

Already have a cluster ticking along nicely? No problem, update in place:
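The same flag works on az aks update (names below are placeholders, and the flag may evolve while the feature is in preview):

```shell
# Enable cost-optimised add-on scaling on an existing cluster
az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --enable-optimized-addon-scaling
```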

Expect VPA components to appear in the kube-system namespace and selected add-on pods to bounce once.

When the command returns, your add-ons are officially under VPA’s watchful eye.

Verifying the VPA Pods

Let’s confirm everything spun up correctly. First, list the three VPA system pods:
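A quick filter on kube-system surfaces them:

```shell
# The managed VPA components live in kube-system
kubectl get pods -n kube-system | grep vpa
```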

You should see vpa-admission-controller, vpa-recommender, and vpa-updater all in the Running state. If any are pending, check your node capacity or the pod events for hints.

Applying the VPA-recommended values manually

Use the flow below whenever you want VPA’s suggested CPU / memory numbers to take effect immediately, rather than waiting for the next rollout.

Let’s have a look at what CoreDNS is currently using:
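kubectl top (backed by metrics-server, which AKS ships by default) shows live usage; CoreDNS pods carry the k8s-app=kube-dns label:

```shell
# Show live CPU/memory usage for the CoreDNS pods
kubectl top pods -n kube-system -l k8s-app=kube-dns
```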

This prints a table of the current CPU (in millicores) and memory usage for each CoreDNS pod.

Next, have a peek at the VPA recommendations for CoreDNS:
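The recommendation is stored on the VPA object itself:

```shell
# The recommendation lives in the Status section of the VPA object
kubectl describe vpa coredns -n kube-system
```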

Look for the Recommendation block in the Status section of the output.

In my cluster, it showed that VPA thinks CoreDNS only needs ~11 millicores of CPU and ~24 MiB of RAM. Not bad!

By default, the VPA sits in Initial mode, which means the new numbers apply when the pod restarts. If you’d like them right away, simply delete the pod and let the ReplicaSet recreate it:
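One hedged note on the sketch below: the label selector matches every CoreDNS pod, so all replicas restart together; if a brief DNS blip matters to you, delete the pods one at a time by name instead:

```shell
# Delete the CoreDNS pods; the ReplicaSet recreates them and the new
# pods come up with the VPA-recommended requests
kubectl delete pod -n kube-system -l k8s-app=kube-dns
```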

Give it a few seconds and the replacement pod will come up with the trimmed-down requests.

Customising Requests, Limits, Min/Max and Update Mode

Sometimes you need tighter control than "whatever VPA says". AKS exposes three annotations that let you override specific behaviours; set each to enabled or disabled:

  • kubernetes.azure.com/override-requests-limits – add to the Deployment or DaemonSet if you wish to edit CPU/memory requests or limits.
  • kubernetes.azure.com/override-min-max – add to the VPA resource itself to tweak minAllowed or maxAllowed.
  • kubernetes.azure.com/override-update-mode – add to the VPA resource to switch between Off, Initial, or Auto.

Bumping CoreDNS Requests & Limits in the Deployment
You’ve profiled CoreDNS and discovered it occasionally bursts but rarely exceeds three vCPUs or 500 MiB. Shrink the default requests while keeping sensible limits by enabling the override on the Deployment:
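A minimal sketch of that flow, with illustrative request/limit values you'd replace with your own profiling numbers:

```shell
# Opt the Deployment out of AKS reconciliation for requests/limits
kubectl annotate deployment coredns -n kube-system \
  kubernetes.azure.com/override-requests-limits=enabled --overwrite

# Example values only: set whatever your profiling suggests
kubectl set resources deployment coredns -n kube-system \
  --requests=cpu=100m,memory=128Mi \
  --limits=cpu=3,memory=500Mi
```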

Once applied, AKS stops reconciling these values, and a pod restart picks up the new requests straight away.

Setting Hard Min/Max Guard-Rails in the VPA
Prefer to keep automatic scaling but impose upper and lower bounds? Add the min/max override to the VPA resource itself:
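Sketched below with illustrative bounds; the containerPolicies block is the standard VPA resourcePolicy shape:

```shell
# Allow min/max edits on the VPA object
kubectl annotate vpa coredns -n kube-system \
  kubernetes.azure.com/override-min-max=enabled --overwrite

# Clamp recommendations between the two bounds (illustrative values)
kubectl patch vpa coredns -n kube-system --type merge -p '{
  "spec": {
    "resourcePolicy": {
      "containerPolicies": [{
        "containerName": "*",
        "minAllowed": {"cpu": "10m", "memory": "24Mi"},
        "maxAllowed": {"cpu": "3", "memory": "500Mi"}
      }]
    }
  }
}'
```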

VPA keeps producing fresh recommendations but always clamps them inside the range you set.

Turning VPA Off for a Specific Add-on
Need recommendations only, with zero automatic changes? Enable the update-mode override and set the mode to Off:
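Using CoreDNS as the example again:

```shell
# Allow update-mode edits on the VPA object
kubectl annotate vpa coredns -n kube-system \
  kubernetes.azure.com/override-update-mode=enabled --overwrite

# Switch to recommendation-only mode
kubectl patch vpa coredns -n kube-system --type merge \
  -p '{"spec":{"updatePolicy":{"updateMode":"Off"}}}'
```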

The VPA still records usage and suggests values, but it will never modify the running pod, which is handy for change-control windows or live debugging.

Disabling Cost-Optimised Scaling

Need to roll everything back? You can turn the feature off while leaving the VPA add-on in place:
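Presumably the disable flag mirrors the enable flag; as with the earlier commands, confirm against az aks update --help while the feature is in preview:

```shell
# Turn off cost-optimised add-on scaling, leaving the VPA add-on in place
az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --disable-optimized-addon-scaling
```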

If you also want to remove VPA entirely, follow the official docs to disable the VPA add-on afterwards.

Troubleshooting Pointers

  • If VPA pods are pending, double-check your system node pool has free CPU and memory, or enable the cluster autoscaler.
  • When add-on pods stick in Pending, look for complaints about insufficient resources (kubectl describe pod). Either raise node capacity or lower VPA min/max.
  • No recommendations showing? Tail the vpa-recommender logs, it usually spills helpful errors.

Wrapping up

Microsoft has been on a mission to slice AKS operating costs, and Cost-Optimised Add-on Scaling is a smart move in that direction. By letting VPA trim away wasted CPU and memory, or by giving you explicit knobs to set hard caps, the feature can free up node capacity for your real workloads.

Try it today in a dev cluster, keep an eye on pod events, and let me know over on Twitter (@Pixel_Robots) how much headroom you claw back. Until next time, happy kube-cost-optimising!

