ec2 right sizing · 2026 playbook

EC2 right-sizing — match the instance to the real workload, not the guess.

Most EC2 bills are 30–45% larger than they need to be because instances were sized once, at launch, against a fear of running out — and never revisited. Right-sizing is the discipline of matching instance type and size to actual measured utilization. This page walks through how to read AWS Compute Optimizer, which metrics actually matter, how to right-size EC2 plus RDS, EBS and Lambda in the same pass, where Graviton fits, and the measure → recommend → test → apply loop that makes it safe.

typical EC2 overspend: 30–45%
idle-fleet CPU: often <10%
Graviton price-perf: ~20–40%
partner audit cost: often $0
TL;DR
  • Right-sizing means matching each instance to its real, measured load — CPU, memory, network, and disk — instead of the conservative guess made at launch. The biggest wins come from compute that runs steadily below 20% utilization: dropping a size (m5.2xlarge → m5.xlarge) halves the hourly rate, and most workloads never notice.
  • AWS Compute Optimizer is the free starting point — it reads 14 days of CloudWatch history and flags over-provisioned and under-provisioned instances, but it is blind to memory unless you install the CloudWatch agent. Without memory data you are right-sizing on CPU alone, which is how teams accidentally downsize a memory-bound app into swapping.
  • Right-sizing is not a one-time project; it is a loop. Measure (collect 2–4 weeks of real metrics) → recommend (Compute Optimizer + your judgement) → test (apply in staging or to one node) → apply (roll out with guardrails). A CloudRoute-matched AWS partner runs this loop for you — and because it usually qualifies as an AWS-funded optimization or Well-Architected engagement, you frequently cut the bill for $0.
the core idea

I. What right-sizing actually is, and why it is the highest-ROI lever you own

Right-sizing is the practice of continuously matching the instance type and size you pay for to the utilization the workload actually shows. It is the one cost lever that requires no commitment, no contract, and no architectural rewrite — you are simply paying for what you use instead of what you feared you might use.

Every EC2 instance is a fixed hourly bill regardless of how much of it you consume. An m5.2xlarge (8 vCPU, 32 GiB) costs roughly twice an m5.xlarge (4 vCPU, 16 GiB), and that 2× shows up every hour of every day whether the box is pinned at 90% CPU or idling at 6%. Right-sizing is the act of noticing the idle 6% and moving down a size — or three.

The reason this is the highest-ROI lever for most teams is that it stacks underneath everything else. Savings Plans and Reserved Instances discount the rate you pay; right-sizing reduces the quantity you buy. If you buy a 3-year Compute Savings Plan against an over-provisioned fleet, you have just locked in three years of paying for capacity you do not use. The correct order is always: right-size first, then commit to the smaller, accurate baseline. Doing it the other way around is one of the most expensive mistakes in AWS FinOps.

The numbers are not subtle. Across audited fleets, it is normal to find average CPU utilization sitting between 5% and 20% on production instances that were sized for a launch-day worst case that never arrived. AWS's own guidance and the published FinOps literature put typical right-sizing savings at 30–45% of the affected compute spend, and that is before you layer commitments on top. For a startup spending $8K/month on EC2, that is $2.4K–$3.6K/month recovered, every month, with zero loss of capability.
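As a quick illustration of that arithmetic, the sketch below applies the 30–45% band to a monthly EC2 figure. The function name and default percentages are illustrative choices, not an AWS formula; what a given fleet actually recovers depends on what its measured utilization supports.

```python
def monthly_savings_range(monthly_spend, low_pct=30, high_pct=45):
    """Return the (low, high) dollars recovered per month for a given spend.

    low_pct / high_pct default to the 30-45% band quoted above; treat them
    as planning assumptions, not guarantees.
    """
    if monthly_spend < 0:
        raise ValueError("monthly_spend must be non-negative")
    return (monthly_spend * low_pct / 100, monthly_spend * high_pct / 100)


low, high = monthly_savings_range(8_000)
print(f"${low:,.0f} to ${high:,.0f} per month")
print(f"${low * 12:,.0f} to ${high * 12:,.0f} per year")
```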

The honest tradeoff: right-sizing trades a sliver of headroom for a large chunk of money. If you downsize too aggressively and a traffic spike arrives, the instance throttles. That is why the process below is built around measurement and a reversible test step — not around guessing smaller instead of guessing bigger.

reading the tool

II. AWS Compute Optimizer: how to read the recommendations correctly

Compute Optimizer is AWS's free, ML-driven right-sizing recommender. It analyses up to 14 days of CloudWatch metrics (longer if you enable enhanced history) and classifies every instance as over-provisioned, under-provisioned, or optimized. It is the right place to start — provided you understand what it can and cannot see.

Turn it on at the account or AWS Organizations level (it is opt-in and free) and give it a couple of weeks of data. It then produces, for each instance, a finding and up to three recommended alternatives, each annotated with the projected CPU/memory/network headroom and an estimated monthly cost difference. Read those recommendations as a ranked shortlist, not gospel — the tool is conservative by design and does not know your business context (e.g. that one box must survive a Monday-morning batch spike).

Over-provisioned — the money is here

An "over-provisioned" finding means the instance has consistently more capacity than the workload uses — high idle CPU, spare memory, light network. These are your savings. Compute Optimizer will suggest a smaller size in the same family (e.g. c5.2xlarge → c5.xlarge) or a cheaper modern/Graviton equivalent. The estimated savings figure it shows is the recovered monthly spend if you adopt the top recommendation.

Treat over-provisioned findings on stateless, horizontally-scaled tiers (web/app servers behind a load balancer, worker pools) as the safest, first wins — downsizing one node in a fleet of ten is low-risk and instantly reversible.

Under-provisioned — fix these first, for safety

An "under-provisioned" finding means the instance is starved — CPU pinned near 100%, memory pressure, or network saturation. These do not save money; they protect performance and reliability. Always action the under-provisioned list before you start downsizing elsewhere, so that right-sizing improves the fleet rather than just shrinking it. Sometimes the right move is up a size; sometimes it is out (add nodes and scale horizontally).

The finding-classification and the risk score

Each recommendation carries a "performance risk" indicator (how likely the smaller size is to under-serve the workload) and a CPU/memory utilization breakdown at p99.5, not just the average. Read the percentile, not the mean — an instance averaging 15% CPU but spiking to 95% three times a day is not a clean downsize candidate. The percentile view is exactly where teams who right-size on averages alone get burned.
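To make the mean-versus-percentile point concrete, here is a minimal sketch using a nearest-rank percentile over CPU samples. The sample data, the `looks_like_clean_downsize` helper and its 40% ceiling are hypothetical (40% at p99.5 roughly doubles to 80% after dropping one size); this is not how Compute Optimizer scores risk internally.

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def looks_like_clean_downsize(cpu_samples, peak_ceiling=40.0, pct=99.5):
    """True when the high percentile, not the mean, leaves room to halve capacity."""
    return percentile(cpu_samples, pct) <= peak_ceiling


# Hypothetical day of 5-minute CPU samples (288 points each).
spiky = [12.0] * 279 + [95.0] * 9    # low average, a few short spikes to 95%
steady = [10.0] * 280 + [30.0] * 8   # low average, gentle peaks

for name, samples in (("spiky", spiky), ("steady", steady)):
    print(name, round(mean(samples), 1), percentile(samples, 99.5),
          looks_like_clean_downsize(samples))
```

Both series average under 15% CPU, yet only the steady one passes the percentile check, which is the distinction the paragraph above is drawing.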

the memory blind spot

By default Compute Optimizer (and the EC2 console) cannot see memory utilization — the EC2 hypervisor does not expose guest RAM. Until you install the CloudWatch agent to publish a memory metric, every right-sizing recommendation is CPU-and-network only. For Java/JVM, in-memory caches, analytics, and most databases, memory is the binding constraint — right-sizing those on CPU alone is how you downsize a healthy box into constant swapping and a 2am page.
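For orientation, the following sketch builds a minimal CloudWatch agent configuration that publishes memory on Linux. It assumes the agent's standard JSON layout (`metrics_collected` → `mem` → `mem_used_percent`, with an `InstanceId` dimension); the function name and the 60-second interval are illustrative, and the exact file location and Windows counter names should be checked against the agent documentation.

```python
import json


def memory_agent_config(interval_seconds=60):
    """Build a minimal CloudWatch agent config that publishes memory utilization.

    Linux-oriented sketch: the agent's "mem" plugin reports mem_used_percent.
    The InstanceId dimension lets the metric be matched to an EC2 instance.
    """
    return {
        "metrics": {
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {
                    "measurement": ["mem_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                }
            },
        }
    }


# Print the JSON you would hand to the agent (path and rollout are up to you).
print(json.dumps(memory_agent_config(), indent=2))
```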

the signals

III. The metrics that actually matter (and the one AWS hides)

Right-sizing is only as good as the telemetry behind it. There are four signals that decide whether a smaller instance is safe, and you need all four — not just the two that AWS surfaces for free.

The goal is to characterise the workload across a representative window — ideally 2–4 weeks that includes your weekly peak (month-end close, Monday traffic, a marketing send). Look at percentiles (p95/p99), not just averages, and look at the shape over time, not a single snapshot.

  • CPU utilization — The default signal, available with no agent. Read average AND p99. A box at 10% average / 30% p99 is an easy downsize; 15% average / 95% p99 is not — that spike needs the headroom or needs to be smoothed with auto-scaling.
  • Memory utilization — The signal AWS does not expose without the CloudWatch agent (or an equivalent like the OpenTelemetry collector). This is the single most important addition you can make. Memory-bound workloads — databases, caches, JVM apps, data pipelines — must be right-sized on memory headroom, often choosing an r-family (memory-optimized) size rather than a smaller general-purpose one.
  • Network throughput — EC2 network bandwidth scales with instance size. Downsizing a network-heavy node (ingestion, media, chatty microservices) can quietly cap throughput even when CPU and memory have room. Check NetworkIn/NetworkOut against the instance's rated bandwidth before moving down.
  • Disk / EBS throughput + IOPS — Smaller instances also get less EBS bandwidth and lower baseline IOPS. A box that is CPU-idle but IO-bound (a database doing heavy reads) can be throughput-constrained by a downsize. Watch EBSReadOps/EBSWriteOps and volume queue depth.
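The four signals above can be combined into one check: project the observed peak of each signal onto the smaller instance and see whether any of them would run out of room. This is a sketch under stated assumptions. The `Capacity` and `PeakLoad` types, the 80% ceiling and all the numbers are hypothetical, so look up the real rated limits for the instance types you are comparing.

```python
from dataclasses import dataclass


@dataclass
class Capacity:
    """Rated limits of an instance size (the values used below are made up)."""
    vcpus: int
    memory_gib: float
    network_gbps: float
    ebs_mbps: float


@dataclass
class PeakLoad:
    """Observed p99 load on the current instance."""
    cpu_pct: float       # percent of the current vCPUs
    memory_pct: float    # percent of the current RAM
    network_gbps: float  # absolute throughput
    ebs_mbps: float      # absolute throughput


def projected_utilization(load, current, target):
    """Project each signal's peak utilization (%) onto the target size."""
    return {
        "cpu": load.cpu_pct * current.vcpus / target.vcpus,
        "memory": load.memory_pct * current.memory_gib / target.memory_gib,
        "network": 100 * load.network_gbps / target.network_gbps,
        "ebs": 100 * load.ebs_mbps / target.ebs_mbps,
    }


def blocking_signals(load, current, target, ceiling_pct=80.0):
    """Return the signals that would exceed the ceiling on the target size."""
    projected = projected_utilization(load, current, target)
    return [name for name, pct in projected.items() if pct > ceiling_pct]


current = Capacity(vcpus=8, memory_gib=32, network_gbps=5.0, ebs_mbps=600)
target = Capacity(vcpus=4, memory_gib=16, network_gbps=2.5, ebs_mbps=300)
load = PeakLoad(cpu_pct=30, memory_pct=35, network_gbps=0.4, ebs_mbps=280)

print(blocking_signals(load, current, target))
```

In this made-up case CPU, memory and network all fit, but the EBS throughput peak would sit above the ceiling on the smaller size, which is exactly the CPU-idle, IO-bound trap described in the last bullet.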

Two practical notes. First, burstable T-family instances (t3/t4g) are measured differently — their CPU credit balance matters as much as raw utilization; a T-instance "averaging 10% CPU" can still be throttling if it has exhausted credits. Second, install the CloudWatch agent fleet-wide before you start — retrofitting memory data after you have already downsized defeats the purpose. The agent itself is near-free; the metrics it publishes carry a small per-metric CloudWatch cost that is rounding error against the savings.

the whole footprint

IV. Right-sizing is not just EC2: RDS, EBS, and Lambda in the same pass

EC2 is where right-sizing started, but the same "match the resource to real utilization" discipline applies across the stack. A complete right-sizing pass touches your databases, your block storage, and even your serverless functions — and the non-EC2 wins are frequently larger per-hour than the compute ones.

Treat right-sizing as a footprint exercise, not an EC2 exercise. Alongside EC2, the three surfaces below are the ones with the most reliable, low-risk savings, and a partner-led audit will sweep all four in a single engagement rather than stopping at compute.

RDS / Aurora — usually the biggest single line

Databases are routinely the most over-provisioned resource on the bill because nobody wants to be the person who under-sized the database. Compute Optimizer now covers RDS, surfacing instance-class recommendations from CloudWatch (CPU, memory, IOPS, connections). The high-value moves: drop a db instance class where utilization is low, switch to a Graviton db class (db.r6g/db.r7g — typically ~10–20% cheaper for similar or better performance), enable storage autoscaling so you stop pre-provisioning headroom, and reserve the right-sized class afterward (RDS still uses Reserved Instances, where EC2 has mostly moved to Savings Plans). For spiky, intermittent databases, Aurora Serverless v2 right-sizes capacity automatically by the second.

EBS — the silent over-provision

EBS right-sizing is almost pure profit because it carries very little performance risk. Three moves: (1) migrate gp2 volumes to gp3, which is ~20% cheaper per GiB and lets you provision IOPS/throughput independently of size, so you stop buying a huge volume just to get IOPS; (2) delete unattached volumes and stale snapshots, which accumulate silently when instances are terminated without delete-on-termination set; (3) right-size over-allocated volumes (a 1 TiB volume that is 8% full). EBS volumes cannot be shrunk in place, so the third move means migrating the data to a new, smaller volume. Unattached volumes and orphaned snapshots are one of the most common "free money" findings in any audit.
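The "huge volume just to get IOPS" effect is easy to see in numbers. The sketch below assumes gp2's baseline of 3 IOPS per GiB (floor 100, cap 16,000) and gp3's size-independent 3,000 IOPS baseline; the per-GiB prices are illustrative assumptions chosen to match the ~20% gap, so check current pricing before relying on them.

```python
import math

# Illustrative per-GiB-month prices (assumptions; check current pricing).
GP2_PRICE_PER_GIB = 0.10
GP3_PRICE_PER_GIB = 0.08
GP3_INCLUDED_IOPS = 3_000  # gp3 baseline, independent of volume size


def gp2_baseline_iops(size_gib):
    """gp2 baseline IOPS scale with size: 3 per GiB, floor 100, cap 16,000."""
    return min(16_000, max(100, 3 * size_gib))


def gp2_size_for_iops(target_iops):
    """Smallest gp2 volume (GiB) whose baseline reaches target_iops."""
    if target_iops > 16_000:
        raise ValueError("gp2 baseline tops out at 16,000 IOPS")
    if target_iops <= 100:
        return 1  # the 100 IOPS floor applies to any size
    return math.ceil(target_iops / 3)


def monthly_storage_cost(size_gib, price_per_gib):
    """Storage-only monthly cost; extra provisioned IOPS are not modelled."""
    return size_gib * price_per_gib


# Hypothetical workload: 200 GiB of data that needs 3,000 IOPS.
data_gib, needed_iops = 200, 3_000
gp2_gib = max(data_gib, gp2_size_for_iops(needed_iops))
print("gp2 volume needed:", gp2_gib, "GiB")
print("gp2 monthly:", monthly_storage_cost(gp2_gib, GP2_PRICE_PER_GIB))
print("gp3 monthly:", monthly_storage_cost(data_gib, GP3_PRICE_PER_GIB))
```

Under these assumptions the gp2 volume has to be five times larger than the data purely to reach the IOPS target, while gp3 covers the same target at the data's actual size.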

Lambda — memory is the dial

Lambda has no instance type — but its memory setting is a right-sizing dial, because Lambda allocates CPU proportionally to memory. Over-allocate and you pay for unused GB-seconds; under-allocate and the function runs slower (and can paradoxically cost more, because duration grows). Compute Optimizer produces Lambda recommendations, and AWS Lambda Power Tuning maps the cost/performance curve so you pick the cheapest memory setting that still hits your latency target. Pair this with moving functions to the arm64 (Graviton) architecture for a further ~20% price cut.
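The "under-allocate and it can cost more" effect follows from billing on GB-seconds. The sketch below models compute cost only (request charges are left out), with per-GB-second prices that are illustrative assumptions and duration figures that are invented for the example; real durations have to come from measurement, for instance from a Power Tuning run.

```python
# Illustrative per-GB-second prices (assumptions; check current pricing).
PRICE_PER_GB_SECOND = {"x86_64": 0.0000166667, "arm64": 0.0000133334}


def lambda_compute_cost(memory_mb, avg_duration_ms, invocations, arch="x86_64"):
    """Duration-based cost: memory (GB) x duration (s) x invocations x price."""
    gb_seconds = (memory_mb / 1024) * (avg_duration_ms / 1000) * invocations
    return gb_seconds * PRICE_PER_GB_SECOND[arch]


# Hypothetical measurements for one function: more memory means more CPU,
# so duration falls as memory rises, but not always proportionally.
measured = {256: 900, 512: 400, 1769: 180}  # memory MB -> avg duration ms
invocations = 10_000_000

for memory_mb, duration_ms in measured.items():
    cost = lambda_compute_cost(memory_mb, duration_ms, invocations)
    print(f"{memory_mb} MB @ {duration_ms} ms: ${cost:,.2f}")

cheapest = min(measured, key=lambda m: lambda_compute_cost(m, measured[m], invocations))
print("cheapest setting:", cheapest, "MB")
```

With these made-up durations the middle setting wins: the smallest allocation runs long enough to cost more, and the largest pays for memory it does not convert into speed. A real choice also has to respect the latency target, which this sketch ignores.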

modern families

V. Modern families and Graviton: right-sizing across, not just down

Right-sizing is not only about choosing a smaller size in the same family. Half the win is moving to a newer generation or a different architecture that delivers the same work for less money. The cleanest version of this is AWS Graviton.

Two "sideways" moves belong in every right-sizing pass. First, generation upgrades: an m6i or m7i delivers more performance per dollar than the equivalent m5, and an older m4/c4 is almost always worth retiring. Moving up a generation while moving down a size is frequently a net price cut with a net performance gain. Second, architecture: AWS Graviton (ARM-based — the g-suffix families like m7g, c7g, r7g, t4g) delivers roughly 20–40% better price-performance than comparable x86 instances for a wide range of workloads.

Graviton belongs inside the right-sizing conversation rather than as a separate project, because the decision is the same decision: what is the cheapest instance that serves this workload at the required performance? Often the answer is "the Graviton equivalent, one size smaller." For most interpreted and managed runtimes (Python, Node.js, Java, .NET), for Go (which only needs recompiling for arm64), for most containers, and for managed services like RDS, ElastiCache and OpenSearch, moving to Graviton is a configuration change, not a rewrite. The work is in validating that any native dependencies and container base images have arm64 builds (today almost all do) and re-running your test suite.

The combined effect compounds. Take a fleet of x86 instances averaging 12% CPU. Right-sizing down a notch recovers ~50% on those boxes; moving the smaller target to Graviton takes another ~20% off the rate. Stack a Compute Savings Plan on the new, accurate baseline and the same workload can land 60–70% below where it started — without touching application logic. That stacking order (right-size → Graviton → commit) is the core of a well-run optimization.
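The compounding is multiplicative, which a few lines make explicit. In this sketch the 50% size factor and ~20% Graviton rate difference come from the paragraph above; the 25% commitment discount is a hypothetical placeholder, since actual Savings Plan discounts vary by term, payment option and instance family.

```python
def stacked_cost_fraction(size_factor=0.5, graviton_discount=0.20,
                          commitment_discount=0.0):
    """Fraction of the original bill left after stacking the three levers.

    size_factor: new size cost relative to old (one size down is about 0.5).
    graviton_discount: rate reduction from moving to the Graviton equivalent.
    commitment_discount: Savings Plan discount on the new baseline (assumed).
    """
    return size_factor * (1 - graviton_discount) * (1 - commitment_discount)


steps = {
    "right-size only": stacked_cost_fraction(graviton_discount=0.0),
    "right-size + Graviton": stacked_cost_fraction(),
    "right-size + Graviton + commit (25% assumed)":
        stacked_cost_fraction(commitment_discount=0.25),
}
for label, fraction in steps.items():
    print(f"{label}: {1 - fraction:.0%} below the starting cost")
```

Because each lever multiplies what is left after the previous one, the order of the arithmetic does not change the result, but the order of the purchases does: a commitment sized against the old baseline locks in the larger number.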

don't commit before you right-size

The most expensive sequencing error in AWS FinOps: buying a 3-year Savings Plan or RI against an over-provisioned fleet. You lock in three years of paying for idle capacity. Always right-size and migrate to the cheapest viable family first, then commit to the smaller baseline. A Well-Architected / cost-optimization review will catch this — and a CloudRoute partner sequences it for you.

the loop

VI. The process: measure → recommend → test → apply

Right-sizing fails when it is treated as a spreadsheet exercise done once. It works when it is a repeatable loop with a reversible test step. Here is the four-stage loop a disciplined FinOps practitioner (or a CloudRoute partner) runs.

The whole point of the loop is to make downsizing safe and routine instead of scary and rare. Each stage de-risks the next; the test stage is what separates a confident right-sizing program from a gamble.

  • 1 — Measure — Install the CloudWatch agent for memory, then collect 2–4 weeks of CPU/memory/network/disk metrics that include your weekly peak. No agent, no memory data, no safe recommendation for anything memory-bound. Tag everything so you can attribute spend to teams and workloads.
  • 2 — Recommend — Run Compute Optimizer across EC2, RDS, EBS and Lambda. Cross-check its shortlist against the percentile view and your business calendar (month-end, launches, campaigns). Produce a ranked change list: safest/highest-value first — usually stateless fleet nodes and unattached EBS.
  • 3 — Test — Validate before you roll out. Apply the new size to one node behind the load balancer, or to a staging environment under load (replay production traffic if you can). Watch latency, error rate, CPU credits (for T-family), and memory headroom. For Graviton, run the full test suite on arm64. Reversible by design.
  • 4 — Apply — Roll the validated change out fleet-wide using your normal deploy path — change the launch template / Auto Scaling Group instance type, or update the RDS class in a maintenance window. Then re-measure: right-sizing is continuous, because workloads grow and shrink. Re-run the loop quarterly (or continuously, via automation).
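The ranked change list from the Recommend stage can be sketched as a simple sort: stateless targets first, then lowest performance risk, then largest saving. The `Candidate` fields, the 0–4 risk scale and the example rows are all hypothetical; a real list would be fed from Compute Optimizer findings plus your own calendar knowledge.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    monthly_savings: float
    stateless: bool
    performance_risk: int  # hypothetical scale: 0 (very low) to 4 (high)


def rank_changes(candidates):
    """Order changes safest-first, breaking ties by the larger saving."""
    return sorted(
        candidates,
        key=lambda c: (not c.stateless, c.performance_risk, -c.monthly_savings),
    )


# Made-up change list for illustration.
changes = [
    Candidate("primary database class", 380.0, stateless=False, performance_risk=2),
    Candidate("web fleet one size down", 1045.0, stateless=True, performance_risk=1),
    Candidate("unattached EBS cleanup", 520.0, stateless=True, performance_risk=0),
    Candidate("worker pool one size down", 561.0, stateless=True, performance_risk=1),
]

for position, change in enumerate(rank_changes(changes), start=1):
    print(position, change.name, f"${change.monthly_savings:,.0f}/mo")
```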
making it stick

VII. Automation, guardrails, and why teams over-provision in the first place

A right-sizing pass that is not backed by automation and guardrails decays — within two quarters the fleet drifts back to over-provisioned, because the same incentives that caused it the first time are still in place. Sustainable right-sizing needs both the cultural fix and the technical guardrails.

Start with the human reason, because it explains the whole pattern. Teams over-provision because the cost of being too small is visible and personal (an outage, a page, a blamed engineer) while the cost of being too big is invisible and diffuse (a slightly larger AWS bill nobody owns). Engineers, rationally, size up. The fix is partly cultural — make cost visible per team via cost allocation tags and showback so the over-provision has an owner — and partly structural: make downsizing safe and reversible (the test step above) so "size up just in case" stops being the only defensible choice.

Guardrails that prevent re-bloat

Put a cost ceiling and a feedback loop around the fleet. AWS Budgets with alerts catches spend creeping back up; Cost Anomaly Detection flags a sudden jump (a forgotten over-sized instance left running). Cost allocation tags + a Cost Explorer or CUR view per team turn an anonymous bill into an owned one. Together these mean a re-bloat shows up on someone's dashboard within days, not at the next quarterly review.

Automation that keeps capacity honest

The durable answer to over-provisioning is to stop sizing for the peak at all. Auto Scaling Groups add and remove capacity with demand, so you run small at the trough and wide at the peak instead of running big all the time. Scheduled scaling shuts non-production environments down nights and weekends (a dev fleet that runs 168 hours/week but is used 50 can be cut ~70% on a schedule). For variable databases, Aurora Serverless v2 right-sizes by the second. And Compute Optimizer recommendations can be pulled via API into a recurring report so right-sizing candidates surface automatically every cycle.
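The scheduled-scaling figure is straightforward to reproduce. This sketch assumes the environment is billed only for the hours it runs and ignores storage that persists while instances are stopped; the function name is illustrative.

```python
HOURS_PER_WEEK = 168


def schedule_savings(hours_needed_per_week):
    """Fraction of always-on compute cost avoided by running only when needed."""
    if not 0 <= hours_needed_per_week <= HOURS_PER_WEEK:
        raise ValueError("hours_needed_per_week must be between 0 and 168")
    return 1 - hours_needed_per_week / HOURS_PER_WEEK


# A dev fleet that is actually used about 50 hours a week.
print(f"{schedule_savings(50):.0%} of the always-on cost avoided")
```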

The guardrail against over-correcting

Automation cuts both ways — aggressive downsizing without limits causes incidents. The guardrails that keep it safe: never downsize more than one size step at a time without a test; keep p99 headroom (target peak utilization in the 60–80% band, not 95%+); exclude singleton stateful nodes from automated right-sizing; and always action the under-provisioned list first. The objective is the cheapest fleet that still meets your reliability bar — not the smallest possible fleet.
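Those guardrails can be expressed as a pre-flight check that an automated job runs before touching an instance. This is a sketch, not a production policy engine: the size ladder, the function name and the assumption that each step down halves capacity (so projected peak CPU doubles) are simplifications to be adapted to your fleet.

```python
# Sizes in ascending order; each step up doubles vCPU and memory.
SIZE_LADDER = ["large", "xlarge", "2xlarge", "4xlarge", "8xlarge"]


def downsize_blockers(current_size, target_size, p99_cpu_pct,
                      stateful_singleton, tested=False, max_peak_pct=80.0):
    """Return the reasons an automated downsize should not proceed (empty if none)."""
    blockers = []
    steps = SIZE_LADDER.index(current_size) - SIZE_LADDER.index(target_size)
    if steps <= 0:
        blockers.append("target is not smaller than current")
    if steps > 1 and not tested:
        blockers.append("more than one size step without a test")
    # Assumption: halving capacity per step doubles the projected peak.
    projected_peak = p99_cpu_pct * 2 ** steps
    if projected_peak > max_peak_pct:
        blockers.append(f"projected p99 CPU {projected_peak:.0f}% exceeds {max_peak_pct:.0f}%")
    if stateful_singleton:
        blockers.append("singleton stateful node is excluded from automation")
    return blockers


print(downsize_blockers("2xlarge", "xlarge", p99_cpu_pct=30, stateful_singleton=False))
print(downsize_blockers("2xlarge", "large", p99_cpu_pct=30, stateful_singleton=False))
```

The first call returns no blockers (one step, projected peak of 60%); the second is stopped twice over, for skipping a size without a test and for a projected peak above the ceiling.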

the safe way to do it

VIII. A partner runs the loop safely, and it is often AWS-funded

Right-sizing is conceptually simple and operationally fiddly: it needs clean telemetry, percentile-aware judgement, a safe test path, and someone with the time to roll changes out without breaking production. Most startup teams know they are over-provisioned and still never get to it, because the engineer who could is the engineer shipping the product.

That is the gap CloudRoute fills. We route you to a vetted AWS partner who runs the full measure → recommend → test → apply loop across EC2, RDS, EBS and Lambda — installs the CloudWatch agent so the recommendations include memory, sequences the Graviton moves, and applies the changes behind your normal deploy guardrails. You get the savings without pulling an engineer off the roadmap.

The honest commercial framing — because it is the part founders do not believe at first: this work is frequently AWS-funded. AWS funds partner-led cost-optimization and Well-Architected engagements through its partner programs (the partner is paid by AWS, not by you), and a Well-Architected Review can unlock remediation credits that offset the rework. For qualifying, credit-eligible engagements that means you cut your own bill for $0. Where an engagement does not qualify for AWS funding, it is a vetted-partner referral that pays for itself many times over out of the savings — a 35% cut on an $8K/month EC2 line is ~$34K/year, recurring.

Either way the structure is the same: AWS wants you efficient and committed for the long term, so it underwrites the partner; the partner does the work; CloudRoute is paid by the partner as a routing fee. You stay out of the payment loop and keep the savings. The cross-links below — the cost-optimization hub, Savings Plans (commit after you right-size), Graviton migration, Spot, and the Well-Architected review — are the natural next steps in the same motion.

before / after

A right-sizing pass, line by line — before vs after

A representative startup fleet before and after a single right-sizing engagement: downsize over-provisioned compute, move the targets to Graviton, fix the database and EBS, and tune Lambda memory. Rates are illustrative on-demand us-east-1 figures for shape — check the AWS pricing calculator / Cost Explorer for current rates. The point is the pattern, not the exact dollar.

Resource | Before (sized at launch) | After (right-sized) | Monthly change
Web/app fleet (×6) | 6× m5.2xlarge @ ~$0.384/hr · avg 11% CPU | 6× m7g.xlarge (Graviton, 1 size down) @ ~$0.145/hr | −$1,045/mo (~62%)
Worker pool (×4, stateless) | 4× c5.2xlarge @ ~$0.34/hr · avg 18% CPU | 4× c7g.xlarge (Graviton) @ ~$0.145/hr | −$561/mo (~57%)
Primary database | db.r5.2xlarge @ ~$1.00/hr · avg 22% CPU/mem | db.r7g.xlarge (Graviton) @ ~$0.48/hr | −$380/mo (~52%)
EBS storage | 8 TiB gp2 + 1.2 TiB unattached + stale snaps | 4 TiB gp3 (right-sized), orphans deleted | −$520/mo (~55%)
Lambda (event pipeline) | 1769 MB x86, untuned | 512 MB arm64, power-tuned | −$190/mo (~48%)
Non-prod environments | Running 168 hrs/week | Scheduled off nights + weekends (~50 hrs/wk used) | −$430/mo (~70%)
Fleet total | ~$8,050/mo | ~$4,924/mo | −$3,126/mo (~39%)
Net ~39% off the affected spend (~$37K/year) with no loss of capability and no application rewrite — purely matching resources to measured utilization plus Graviton. Layer a Compute Savings Plan on the new, accurate baseline and the total reduction reaches ~55–65%. The order matters: right-size first, then commit.
a recent match

A 41% EC2 cut for a Series-A SaaS — anonymized

inquiry · series-a b2b saas, ~$9.4K/mo AWS, US-East
Series-A B2B SaaS, 22 engineers, ~$9,400/month AWS bill (EC2 + RDS the bulk), single AWS account, no FinOps owner

Situation: Bill had grown ~3× in nine months and nobody could say why. Compute had been sized at launch and never revisited; production web/app fleet averaged 9–13% CPU. No CloudWatch agent, so memory was invisible and the team was scared to downsize the JVM-heavy services. The one engineer who understood the infrastructure was fully allocated to a customer-facing launch and could not take two weeks to do a right-sizing pass.

What CloudRoute did: Routed within 20 hours to a US-East AWS partner with a FinOps + Well-Architected track record. Partner ran the engagement as an AWS-funded cost-optimization review: installed the CloudWatch agent fleet-wide, collected 3 weeks of metrics across a month-end peak, then right-sized the stateless fleet down a size and onto m7g/c7g Graviton, moved the primary DB to db.r7g, migrated EBS gp2→gp3 and deleted ~1.2 TiB of orphaned volumes/snapshots, power-tuned the event-pipeline Lambdas to 512 MB arm64, and put non-prod on a nights/weekends schedule. Every change went through a one-node/staging test first; the under-provisioned queue-consumer was sized UP for safety in the same pass.

Outcome: AWS bill dropped from ~$9,400 to ~$5,520/month — a 41% cut (~$46K/year) — with zero customer-visible regression; p99 latency actually improved on the Graviton fleet. The Well-Architected review unlocked remediation credits that covered the rework, and the partner was paid through AWS's partner funding. CloudRoute's fee was paid by the partner. The customer paid $0 and walked away with Budgets alerts + anomaly detection wired up to keep the fleet honest.

engagement window: ~3 weeks · founder/eng time: ~4 hours · recurring savings: ~$46K/year · cost to customer: $0

faq

Common questions

What is EC2 right-sizing, in one sentence?
Right-sizing is matching each EC2 instance's type and size to the workload's real, measured utilization — CPU, memory, network, and disk — instead of the conservative over-estimate usually made at launch. The goal is the cheapest instance that still meets your performance and reliability bar, and it is the highest-ROI cost lever because it requires no commitment or rewrite.
How much can right-sizing actually save?
Typical right-sizing savings land at 30–45% of the affected compute spend, because most production fleets run at 5–20% average CPU after being sized for a worst case that never arrived. Fold in a move to Graviton (~20–40% better price-performance) and a Savings Plan on the new baseline, and the total reduction on the same workload commonly reaches 55–70% — with no loss of capability.
Is AWS Compute Optimizer enough on its own?
It is the right free starting point — it reads 14 days of CloudWatch metrics and flags over- and under-provisioned EC2, RDS, EBS, and Lambda — but it has one critical blind spot: it cannot see memory utilization until you install the CloudWatch agent. Without memory data you are right-sizing on CPU and network alone, which is unsafe for databases, JVM apps, and caches. Use Compute Optimizer as a ranked shortlist, then apply percentile-aware judgement and a test step.
Why can't AWS see my instance's memory usage by default?
The EC2 hypervisor exposes CPU, network, and disk metrics but not guest RAM — memory lives inside the operating system, which AWS does not instrument for you. You publish it yourself by installing the CloudWatch agent (or an OpenTelemetry collector), which adds a memory-utilization metric. This is the single most important thing to do before right-sizing anything memory-bound, and it should be done fleet-wide before you collect your measurement window.
Should I buy Savings Plans or Reserved Instances before or after right-sizing?
After — always. If you commit to a 1- or 3-year Savings Plan or RI against an over-provisioned fleet, you lock in years of paying for idle capacity. The correct sequence is right-size → migrate to the cheapest viable family (often Graviton) → then commit to the smaller, accurate baseline. Committing first is one of the most expensive sequencing mistakes in AWS FinOps.
Won't downsizing risk an outage when traffic spikes?
Only if you right-size on averages and skip the test step. The safe method reads p95/p99 utilization (not just the mean), keeps headroom (target 60–80% peak, not 95%+), validates the new size on one node or in staging under load before rolling out, and actions under-provisioned instances first. Done that way the change is reversible and low-risk — and auto-scaling means you stop sizing for the peak at all.
Does right-sizing apply to RDS, EBS, and Lambda too?
Yes — and the non-EC2 wins are often bigger. RDS/Aurora are routinely the most over-provisioned line (drop the instance class, move to Graviton db.r7g, enable storage autoscaling, then reserve). EBS is near-pure profit: migrate gp2→gp3, delete unattached volumes and stale snapshots, and shrink over-allocated volumes. Lambda's memory setting is a right-sizing dial (it controls CPU too) — tune it with Lambda Power Tuning and move to arm64. A complete pass touches all four.
How is this often "AWS-funded," and what is the catch?
AWS funds partner-led cost-optimization and Well-Architected engagements through its partner programs — the partner is paid by AWS, not by you — and a Well-Architected Review can unlock remediation credits that offset the rework. For qualifying, credit-eligible engagements you cut your own bill for $0. Where an engagement does not qualify, it is a vetted-partner referral that pays for itself many times over from the savings. CloudRoute is paid by the partner as a routing fee, so you stay out of the payment loop and keep the savings.

Stop paying for capacity you don't use.

CloudRoute routes you to a vetted AWS partner who runs the full right-sizing loop across EC2, RDS, EBS and Lambda — safely, behind your deploy guardrails. Often AWS-funded, so you cut the bill for $0.

typical EC2 cut: 30–45%
with Graviton + SP: 55–70%
cost to you: often $0