Cost-Optimised Add-on Scaling in AKS: Right-Size Your System Add-ons (Preview)


Reading Time: 6 minutes

Recently, while catching up on the latest AKS release notes, I spotted a brand-new preview feature that promises to trim unnecessary CPU and memory usage from system add-ons: cost optimised add-on scaling.
In a nutshell, Microsoft has wired the managed Vertical Pod Autoscaler (VPA) into key AKS add-ons (think CoreDNS, Workload Identity, Image Integrity, Retina, and more) and given us knobs to fine-tune their resource profiles. In today’s post I’ll show you how to:

  • Turn the feature on (or off) in a new or existing cluster.
  • Customise default CPU / memory requests & limits.
  • Adjust VPA policies such as min / max and update mode.
  • Troubleshoot the most common “why is my pod Pending?” questions.

What exactly is Cost-Optimised Add-on Scaling?

First, let’s set the scene.

When you switch on the feature, AKS does three things behind the scenes:

  1. Installs the managed VPA add-on (three pods called vpa-admission-controller, vpa-recommender, and vpa-updater).
  2. Creates a VPA custom resource for each supported add-on so AKS can deliver CPU and memory recommendations, or even apply them automatically.
  3. Lets you override the defaults by adding a few simple annotations to the Deployment/DaemonSet or the VPA itself.

Supported AKS add-ons (first wave)

| Add-on | Enablement behaviour | VPA CR name(s) | Quick validation |
|---|---|---|---|
| CoreDNS | On by default for new clusters | coredns | kubectl get vpa coredns -n kube-system |
| Workload Identity | Manually enable | azure-wi-webhook-controller-manager | kubectl get vpa azure-wi-webhook-controller-manager -n kube-system |
| Image Integrity | Manually enable | ratify | kubectl get vpa ratify -n gatekeeper-system |
| Network Observability (Retina) | Manually enable | retina-agent, retina-operator | kubectl get vpa retina-agent -n kube-system |

Expect more add-ons to join this list as the preview matures.

Understanding VPA modes in real life

Before you flip any switches, it’s worth pausing to decide how the Vertical Pod Autoscaler should behave once it has a recommendation.
Think of the three modes as a sliding scale between hands-off advisor and fully automatic mechanic:

| Mode | What it does | Typical use-case |
|---|---|---|
| Off | VPA gathers metrics and writes suggestions into the CR status field but never touches your pods. | Production clusters that need a strict change-control gate or GitOps-driven updates. Also handy when you're just benchmarking to understand baseline usage. |
| Initial (default) | VPA applies the recommendation only when the pod restarts for another reason (image update, node drain, manual delete, etc.). VPA itself will not trigger that restart. | Teams that want a low-risk first step (no surprise restarts) but still want improved requests/limits over time. Works well alongside rolling deployments run by your CI/CD pipeline. |
| Auto | VPA acts immediately: it edits the pod spec, which in turn causes Kubernetes to recreate the pod with the new requests/limits. This can happen at any time if utilisation drifts. | Development, performance-testing, or cost-lab environments where you're comfortable with automated restarts in exchange for maximum resource efficiency. |

Why does the mode matter?

  • Operational blast-radius: Auto can restart critical add-ons at 3 a.m. if usage spikes, which is great for savings but possibly bad for pagers.
  • Auditability: Off keeps a clear paper-trail of recommendations you can review, discuss in a CAB meeting, and commit through GitOps.
  • Speed of benefit: Initial captures the "easy wins" (pods update the next time you deploy anyway) without new moving parts.

Rule of thumb:
Start with Off in prod to collect data, graduate to Initial once you’re confident in the numbers, and save Auto for clusters where a sudden, automated restart won’t ruin anyone’s day.

And remember: whatever mode you choose today isn't permanent. A single kubectl patch vpa <name> -n kube-system --type merge -p '{"spec":{"updatePolicy":{"updateMode":"Auto"}}}' can promote or demote an add-on at any time, so adjust as your change-management posture evolves.

Warning

Preview caveat: Everything here could change tomorrow, so always test in a dev cluster first.

Prerequisites

You'll need an AKS 1.25+ cluster, Azure CLI 2.60.0 or newer, and the aks-preview extension. You also need enough headroom on the system node pool (or the cluster autoscaler enabled) because VPA's decisions are only as good as the nodes beneath them.

Install / Update the aks-preview Extension

Before we can flip any preview bits we need the extension. Run the command below in Cloud Shell or your favourite terminal.
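Assuming you already have the Azure CLI installed, something like this does the trick (the update line is a no-op on a fresh install but keeps an existing extension current):

```shell
# Add the aks-preview extension (or update it if already installed)
az extension add --name aks-preview
az extension update --name aks-preview
```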

Once the command completes, the latest preview features (including our shiny cost-optimised scaling flag) will be available to az aks.

Register the Preview Feature

Azure features hide behind subscription-level flags. Let’s register AKS-AddonAutoscalingPreview now:
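The standard pattern for any AKS preview flag looks like this:

```shell
# Register the subscription-level preview flag
az feature register --namespace "Microsoft.ContainerService" \
  --name "AKS-AddonAutoscalingPreview"

# Check registration progress (repeat until it reports "Registered")
az feature show --namespace "Microsoft.ContainerService" \
  --name "AKS-AddonAutoscalingPreview" --query "properties.state"
```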

Keep hitting the same command until the state flips to Registered. Afterwards, refresh the provider so your subscription knows about the new capability:
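Propagating the flag to the resource provider is one more command:

```shell
# Refresh the provider so the subscription picks up the new capability
az provider register --namespace Microsoft.ContainerService
```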

That's it! The feature is now registered on your subscription.

Enable Cost-Optimised Scaling on Your Cluster

Creating a brand-new cluster

If you’re spinning up fresh infrastructure, simply pass one extra flag to az aks create:
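Something along these lines should work. The resource group and cluster names are placeholders, and since this is a preview, double-check the flag name against az aks create --help before running:

```shell
# Flag name taken from the preview announcement; confirm with
# `az aks create --help` as preview flags can change
az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --enable-optimized-addon-scaling \
  --generate-ssh-keys
```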

Azure will provision the cluster, deploy VPA, and restart any add-on pods that need resizing. CoreDNS is clever enough to roll without downtime.

Information

If you’re deploying with Bicep, ARM templates, or Terraform, flip VerticalPodAutoscaler to true and AddonAutoscaling to enabled.

Turning it on for an existing cluster

Already have a cluster ticking along nicely? No problem, update in place:
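The same flag works on az aks update (names below are placeholders, and the flag may evolve while the feature is in preview):

```shell
# Enable cost-optimised add-on scaling on an existing cluster
az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --enable-optimized-addon-scaling
```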

Expect VPA components to appear in the kube-system namespace and selected add-on pods to bounce once.

When the command returns, your add-ons are officially under VPA’s watchful eye.

Verifying the VPA Pods

Let’s confirm everything spun up correctly. First, list the three VPA system pods:
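A quick filter on kube-system surfaces them:

```shell
# The managed VPA components live in kube-system
kubectl get pods -n kube-system | grep vpa
```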

You should see vpa-admission-controller, vpa-recommender, and vpa-updater all in the Running state. If any are pending, check your node capacity or the pod events for hints.

Applying the VPA-recommended values manually

Use the flow below whenever you want VPA’s suggested CPU / memory numbers to take effect immediately, rather than waiting for the next rollout.

Let’s have a look at what CoreDNS is currently using:
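kubectl top (backed by metrics-server, which AKS ships by default) shows live usage; CoreDNS pods carry the k8s-app=kube-dns label:

```shell
# Show live CPU/memory usage for the CoreDNS pods
kubectl top pods -n kube-system -l k8s-app=kube-dns
```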

This prints a table of the current CPU (in millicores) and memory usage for each CoreDNS pod.

Next, have a peek at the VPA recommendations for CoreDNS:
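The recommendation is stored on the VPA object itself:

```shell
# The recommendation lives in the Status section of the VPA object
kubectl describe vpa coredns -n kube-system
```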

Look for the Recommendation block in the Status section of the output.

In my cluster, it showed that VPA thinks CoreDNS only needs ~11 millicores of CPU and ~24 MiB of RAM. Not bad!

By default, the VPA sits in Initial mode, which means the new numbers apply when the pod restarts. If you’d like them right away, simply delete the pod and let the ReplicaSet recreate it:
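One hedged note on the sketch below: the label selector matches every CoreDNS pod, so all replicas restart together; if a brief DNS blip matters to you, delete the pods one at a time by name instead:

```shell
# Delete the CoreDNS pods; the ReplicaSet recreates them and the new
# pods come up with the VPA-recommended requests
kubectl delete pod -n kube-system -l k8s-app=kube-dns
```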

Give it a few seconds and the replacement pod will come up with the trimmed-down requests.

Customising Requests, Limits, Min/Max and Update Mode

Sometimes you need tighter control than "whatever VPA says". AKS exposes three annotations that let you override specific behaviours; set each to enabled or disabled:

  • kubernetes.azure.com/override-requests-limits – add to the Deployment or DaemonSet if you wish to edit CPU/memory requests or limits.
  • kubernetes.azure.com/override-min-max – add to the VPA resource itself to tweak minAllowed or maxAllowed.
  • kubernetes.azure.com/override-update-mode – add to the VPA resource to switch between Off, Initial, or Auto.

Bumping CoreDNS Requests & Limits in the Deployment
You’ve profiled CoreDNS and discovered it occasionally bursts but rarely exceeds three vCPUs or 500 MiB. Shrink the default requests while keeping sensible limits by enabling the override on the Deployment:
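A minimal sketch of that flow, with illustrative request/limit values you'd replace with your own profiling numbers:

```shell
# Opt the Deployment out of AKS reconciliation for requests/limits
kubectl annotate deployment coredns -n kube-system \
  kubernetes.azure.com/override-requests-limits=enabled --overwrite

# Example values only: set whatever your profiling suggests
kubectl set resources deployment coredns -n kube-system \
  --requests=cpu=100m,memory=128Mi \
  --limits=cpu=3,memory=500Mi
```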

Once applied, AKS stops reconciling these values, and a pod restart picks up the new requests straight away.

Setting Hard Min/Max Guard-Rails in the VPA
Prefer to keep automatic scaling but impose upper and lower bounds? Add the min/max override to the VPA resource itself:
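Sketched below with illustrative bounds; the containerPolicies block is the standard VPA resourcePolicy shape:

```shell
# Allow min/max edits on the VPA object
kubectl annotate vpa coredns -n kube-system \
  kubernetes.azure.com/override-min-max=enabled --overwrite

# Clamp recommendations between the two bounds (illustrative values)
kubectl patch vpa coredns -n kube-system --type merge -p '{
  "spec": {
    "resourcePolicy": {
      "containerPolicies": [{
        "containerName": "*",
        "minAllowed": {"cpu": "10m", "memory": "24Mi"},
        "maxAllowed": {"cpu": "3", "memory": "500Mi"}
      }]
    }
  }
}'
```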

VPA keeps producing fresh recommendations but always clamps them inside the range you set.

Turning VPA Off for a Specific Add-on
Need recommendations only, with zero automatic changes? Enable the update-mode override and set the mode to Off:
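Using CoreDNS as the example again:

```shell
# Allow update-mode edits on the VPA object
kubectl annotate vpa coredns -n kube-system \
  kubernetes.azure.com/override-update-mode=enabled --overwrite

# Switch to recommendation-only mode
kubectl patch vpa coredns -n kube-system --type merge \
  -p '{"spec":{"updatePolicy":{"updateMode":"Off"}}}'
```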

The VPA still records usage and suggests values, but it will never modify the running pod, which is handy for change-control windows or live debugging.

Disabling Cost-Optimised Scaling

Need to roll everything back? You can turn the feature off while leaving the VPA add-on in place:
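Presumably the disable flag mirrors the enable flag; as with the earlier commands, confirm against az aks update --help while the feature is in preview:

```shell
# Turn off cost-optimised add-on scaling, leaving the VPA add-on in place
az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --disable-optimized-addon-scaling
```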

If you also want to remove VPA entirely, follow the official docs to disable the VPA add-on afterwards.

Troubleshooting Pointers

  • If VPA pods are pending, double-check your system node pool has free CPU and memory, or enable the cluster autoscaler.
  • When add-on pods stick in Pending, look for complaints about insufficient resources (kubectl describe pod). Either raise node capacity or lower VPA min/max.
  • No recommendations showing? Tail the vpa-recommender logs, it usually spills helpful errors.

Wrapping up

Microsoft has been on a mission to slice AKS operating costs, and Cost-Optimised Add-on Scaling is a smart move in that direction. By letting VPA trim away wasted CPU and memory, or by giving you explicit knobs to set hard caps, the feature can free up node capacity for your real workloads.

Try it today in a dev cluster, keep an eye on pod events, and let me know over on Twitter (@Pixel_Robots) how much headroom you claw back. Until next time, happy kube-cost-optimising!

