Right-Sizing Azure VMs: Why the CPU Average Lies to You

The quarterly cost review reaches the compute line and someone points at a D8s_v5 that has averaged 4% CPU since March. Downsizing it would save a few hundred dollars a month, and everyone in the room knows it. Nobody approves the change.

The reason is always the same: someone remembers a spike. Month-end close, a batch job, a traffic surge nobody can quite characterize. The average says the VM is asleep, but the average also flattens out the one hour that matters, and the person who owns the workload is not going to risk an outage to save the company lunch money.

So oversized VMs survive review after review. Not because the data says they should, but because the data in the room is too thin to argue with fear.

The Problem: The Average Buries the Evidence

Teams over-provision by default, and that part is rational. Picking a bigger SKU at deployment time is cheap insurance against a performance incident on day one. The failure is structural: nothing in the deployment workflow forces anyone to revisit the decision, so the insurance premium bills forever.

And when someone finally does look, the number they reach for first, average CPU, is the least useful one available:

Averages hide spikes. A VM at 4% average could be flatlined at 4%, or idle for 29 days and pegged at 100% during month-end close. Those are different sizing decisions.
CPU is only one axis. A VM can idle its cores while sitting at 90% memory usage. Downsizing it by CPU alone is how right-sizing gets its bad reputation.
No data means no decision. Without percentiles and memory numbers, "it spikes sometimes" is unanswerable, and unanswerable objections always win.
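To make the first point concrete, here is a minimal Python sketch built on made-up hourly samples rather than real telemetry. The three VM profiles and the nearest-rank percentile helper are illustrative assumptions; each profile is tuned to average roughly 4% CPU over 30 days.

```python
import statistics

HOURS = 30 * 24  # 30 days of hourly samples (synthetic, for illustration only)

# Three hypothetical VMs that all average about 4% CPU.
profiles = {
    "flat": [4.0] * HOURS,
    # Idle for 29 days, pegged at 100% for one 24-hour month-end close.
    "month_end": [0.7] * (HOURS - 24) + [100.0] * 24,
    # Idle most of the day, 40% for a 2-hour batch window every day.
    "daily_batch": ([0.7] * 22 + [40.0] * 2) * 30,
}


def summarize(samples):
    """Return average, nearest-rank 95th percentile, and max of the samples."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return {
        "avg": round(statistics.fmean(samples), 1),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }


for name, samples in profiles.items():
    print(name, summarize(samples))
```

Under these assumptions all three report a 4.0% average. The daily batch VM shows a p95 of 40%, while the month-end VM shows a p95 of 0.7% and a max of 100%: its spike covers less than 5% of the window, so it sits above the 95th percentile and only the max exposes it.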

The fix is not courage. It is better data.

Pulling Real Utilization Numbers Yourself

If your VMs report into Log Analytics through VM insights, the InsightsMetrics table has what you need. This query runs in Log Analytics (not Azure Resource Graph) and gives you average, 95th percentile, and max CPU per VM over 30 days:

InsightsMetrics
| where TimeGenerated > ago(30d)
| where Namespace == 'Processor' and Name == 'UtilizationPercentage'
| summarize avgCpu = round(avg(Val), 1),
            p95Cpu = round(percentile(Val, 95), 1),
            maxCpu = round(max(Val), 1) by Computer
| order by p95Cpu asc

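The gap between avgCpu and p95Cpu is the spike argument, settled with numbers. A VM at 4% average and 11% p95, with a maxCpu that stays modest, is oversized, full stop. A VM at 5% average and 92% p95 has a real workload hiding in it, and now you know to look at when those spikes happen before touching it. Check maxCpu before closing the case: a burst that lasts less than 5% of the window, about 36 hours out of 30 days, sits above the 95th percentile and shows up only in the max.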
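Finding out when the spikes happen can be as simple as counting busy samples per calendar day. The sketch below is one possible approach, not part of the query above: it assumes you have exported (timestamp, CPU percent) pairs for a single VM, and both the spike_days helper and its 80% threshold are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def spike_days(samples, threshold=80.0):
    """Count samples above `threshold`, grouped by day of month.

    `samples` is an iterable of (timestamp, cpu_percent) pairs.
    The 80% threshold is an arbitrary illustrative default.
    """
    return Counter(ts.day for ts, cpu in samples if cpu > threshold)


# Synthetic data: hourly samples for June 2024, busy only on the 30th.
start = datetime(2024, 6, 1)
samples = []
for hour in range(30 * 24):
    ts = start + timedelta(hours=hour)
    samples.append((ts, 95.0 if ts.day == 30 else 3.0))

print(spike_days(samples))  # Counter({30: 24})
```

A result concentrated on one or two days of the month suggests a scheduled job such as month-end close; busy hours scattered across the month suggest organic load that needs a closer look before any resize.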

Do the same for memory, because CPU-only sizing is the classic mistake:

InsightsMetrics
| where TimeGenerated > ago(30d)
| where Namespace == 'Memory' and Name == 'AvailableMB'
| summarize avgAvailableMB = round(avg(Val)),
            p5AvailableMB = round(percentile(Val, 5)) by Computer
| order by p5AvailableMB asc

A low 5th percentile of available memory means the VM genuinely needs its RAM even if its cores are idle, which usually points to a different SKU family rather than a smaller size.
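The CPU and memory results only become a sizing signal once they are read together. The following is a rough first-pass triage sketch, assuming you have joined the two query outputs per VM and know each VM's installed RAM. The VmStats type, the triage function, and every threshold in it are hypothetical starting points, not recommended values.

```python
from dataclasses import dataclass


@dataclass
class VmStats:
    name: str
    p95_cpu: float          # percent, from the CPU query
    max_cpu: float          # percent, from the CPU query
    p5_available_mb: float  # from the memory query
    total_mb: float         # installed RAM; assumed to be supplied by the caller


def triage(vm, cpu_low=20.0, cpu_spike=80.0, mem_free_low=0.15):
    """Rough first-pass label; all thresholds are hypothetical defaults."""
    free_ratio = vm.p5_available_mb / vm.total_mb
    if vm.p95_cpu >= cpu_low:
        return "keep: sustained CPU load"
    if vm.max_cpu >= cpu_spike:
        return "investigate: rare CPU spikes"
    if free_ratio < mem_free_low:
        return "memory-bound: consider another SKU family"
    return "downsize candidate"


# Made-up example fleet; 32768 MB matches a 32 GiB VM.
fleet = [
    VmStats("app01", p95_cpu=11.0, max_cpu=35.0, p5_available_mb=20000, total_mb=32768),
    VmStats("cache01", p95_cpu=9.0, max_cpu=30.0, p5_available_mb=2500, total_mb=32768),
    VmStats("close01", p95_cpu=6.0, max_cpu=100.0, p5_available_mb=18000, total_mb=32768),
]
for vm in fleet:
    print(vm.name, "->", triage(vm))
```

The ordering of the checks is a design choice: CPU evidence is tested first so that a VM with rare spikes is never labeled a downsize candidate purely because its memory looks free.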

For a single VM without any agent setup, platform metrics work from the Azure CLI. The --offset flag widens the query from the CLI's short default window to the same 30 days the queries above cover:

az monitor metrics list \
  --resource /subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Compute/virtualMachines/<vm-name> \
  --metric "Percentage CPU" \
  --offset 30d \
  --interval PT1H \
  --aggregation Average Maximum
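The CLI returns JSON, which still has to be boiled down to a few numbers. Here is a small Python sketch of that step. It assumes the output nests data points under value, then timeseries, then data, with "average" and "maximum" keys on each point; check that against your own output before relying on it. The cpu_summary helper and the sample payload are made up for illustration.

```python
import json


def cpu_summary(metrics_json):
    """Summarize hourly 'Percentage CPU' points from `az monitor metrics list`.

    Assumes the output shape value[] -> timeseries[] -> data[], where each
    point may carry "average" and "maximum" keys (missing or null when the
    VM reported nothing for that hour).
    """
    doc = json.loads(metrics_json)
    averages, maxima = [], []
    for metric in doc.get("value", []):
        for series in metric.get("timeseries", []):
            for point in series.get("data", []):
                if point.get("average") is not None:
                    averages.append(point["average"])
                if point.get("maximum") is not None:
                    maxima.append(point["maximum"])
    if not averages or not maxima:
        return None
    return {
        "hours": len(averages),
        "avg": round(sum(averages) / len(averages), 1),
        "peak": max(maxima),
    }


# Hand-written sample in the assumed shape, not real CLI output.
sample = json.dumps({"value": [{"timeseries": [{"data": [
    {"timeStamp": "2024-06-01T00:00:00Z", "average": 3.0, "maximum": 9.0},
    {"timeStamp": "2024-06-01T01:00:00Z", "average": 5.0, "maximum": 97.0},
    {"timeStamp": "2024-06-01T02:00:00Z", "average": None, "maximum": None},
]}]}]})
print(cpu_summary(sample))  # {'hours': 2, 'avg': 4.0, 'peak': 97.0}
```

Hours with no data are skipped rather than counted as zero, so a VM that was deallocated for part of the window does not look idler than it was.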

These queries work, and knowing how to run them makes you the most credible person in the sizing meeting. But look at what they leave you holding:

Agent and ingestion prerequisites. InsightsMetrics only exists for VMs enrolled in VM insights with the Azure Monitor agent, and you pay Log Analytics ingestion for the privilege. Platform metrics alone give you CPU but not guest-level memory.
Interpretation is on you, per VM. The query returns numbers. Deciding whether 18% p95 on a D8s_v5 means a D4s_v5 or an E4s_v5, given memory pressure and disk throughput caps, is manual analysis multiplied by every VM in the fleet.
No SKU mapping, no dollar figure. Nothing here says "switch to this size and save this much," which is the sentence that actually gets a change approved.
It is a snapshot. You ran it in June. The fleet keeps changing, and nobody reruns the query in August.

What StratoLens Does With the Same Data

StratoLens turns that raw utilization into decisions you can defend in the review meeting:

Recommendations built on real CPU, memory, and disk utilization, not a single headline metric, so the memory-hungry VM with idle cores does not get flagged for a naive downsize.
Specific SKU suggestions. Not "this VM is oversized" but "switch to this SKU and save this much," which is the difference between an observation and an approved change.
Cost impact per VM, so you work the list from the biggest savings down instead of debating the $12/month stragglers.
Continuous analysis instead of a one-off query. The fleet gets re-evaluated as usage changes, so right-sizing stops being an annual archaeology project.

The performance data driving those recommendations never leaves your environment. StratoLens deploys into your own Azure tenant, so utilization and cost numbers stay in your subscription.

Review Sizing on a Cadence

Right-sizing fails as a heroic quarterly initiative and works as a small standing habit: look at the current recommendations, action the top few by dollar impact, and move on. The Log Analytics queries above are a solid way to start and a good way to spot-check anything you are told. When you want the percentile analysis, SKU mapping, and savings math done for every VM continuously, that is what VM Sizing Recommendations in StratoLens is for.
