Most EC2 bills are 30–45% larger than they need to be because instances were sized once, at launch, against a fear of running out — and never revisited. Right-sizing is the discipline of matching instance type and size to actual measured utilization. This page walks through how to read AWS Compute Optimizer, which metrics actually matter, how to right-size EC2 plus RDS, EBS and Lambda in the same pass, where Graviton fits, and the measure → recommend → test → apply loop that makes it safe.
Right-sizing is the practice of continuously matching the instance type and size you pay for to the utilization the workload actually shows. It is the one cost lever that requires no commitment, no contract, and no architectural rewrite — you are simply paying for what you use instead of what you feared you might use.
Every EC2 instance is a fixed hourly bill regardless of how much of it you consume. An m5.2xlarge (8 vCPU, 32 GiB) costs roughly twice an m5.xlarge (4 vCPU, 16 GiB), and that 2× shows up every hour of every day whether the box is pinned at 90% CPU or idling at 6%. Right-sizing is the act of noticing the box that idles at 6% and moving it down a size, or three.
The reason this is the highest-ROI lever for most teams is that it stacks underneath everything else. Savings Plans and Reserved Instances discount the rate you pay; right-sizing reduces the quantity you buy. If you buy a 3-year Compute Savings Plan against an over-provisioned fleet, you have just locked in three years of paying for capacity you do not use. The correct order is always: right-size first, then commit to the smaller, accurate baseline. Doing it the other way around is one of the most expensive mistakes in AWS FinOps.
The numbers are not subtle. Across audited fleets, it is normal to find average CPU utilization sitting between 5% and 20% on production instances that were sized for a launch-day worst case that never arrived. AWS's own guidance and the published FinOps literature put typical right-sizing savings at 30–45% of the affected compute spend, and that is before you layer commitments on top. For a startup spending $8K/month on EC2, that is $2.4K–$3.6K/month recovered, every month, with zero loss of capability.
The honest tradeoff: right-sizing trades a sliver of headroom for a large chunk of money. If you downsize too aggressively and a traffic spike arrives, the instance throttles. That is why the process below is built around measurement and a reversible test step — not around guessing smaller instead of guessing bigger.
Compute Optimizer is AWS's free, ML-driven right-sizing recommender. By default it analyses the most recent 14 days of CloudWatch metrics (up to three months if you enable the paid enhanced infrastructure metrics option) and classifies every instance as over-provisioned, under-provisioned, or optimized. It is the right place to start, provided you understand what it can and cannot see.
Turn it on at the account or AWS Organizations level (it is opt-in and free) and give it a couple of weeks of data. It then produces, for each instance, a finding and up to three recommended alternatives, each annotated with the projected CPU/memory/network headroom and an estimated monthly cost difference. Read those recommendations as a ranked shortlist, not gospel — the tool is conservative by design and does not know your business context (e.g. that one box must survive a Monday-morning batch spike).
An "over-provisioned" finding means the instance has consistently more capacity than the workload uses — high idle CPU, spare memory, light network. These are your savings. Compute Optimizer will suggest a smaller size in the same family (e.g. c5.2xlarge → c5.xlarge) or a cheaper modern/Graviton equivalent. The estimated savings figure it shows is the recovered monthly spend if you adopt the top recommendation.
Treat over-provisioned findings on stateless, horizontally-scaled tiers (web/app servers behind a load balancer, worker pools) as the safest, first wins — downsizing one node in a fleet of ten is low-risk and instantly reversible.
An "under-provisioned" finding means the instance is starved — CPU pinned near 100%, memory pressure, or network saturation. These do not save money; they protect performance and reliability. Always action the under-provisioned list before you start downsizing elsewhere, so that right-sizing improves the fleet rather than just shrinking it. Sometimes the right move is up a size; sometimes it is out (add nodes and scale horizontally).
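
The "under-provisioned first" ordering can be sketched as a small triage step. This is a minimal sketch, assuming each record is shaped like one entry of the `instanceRecommendations` list returned by the Compute Optimizer `get_ec2_instance_recommendations` API (fields `finding`, `currentInstanceType`, `recommendationOptions`); the sample records and ARNs are hypothetical, and the code works on plain dictionaries so it runs without AWS credentials.

```python
from typing import Any

# Order in which findings are actioned: starved instances first, then
# savings candidates. Any other finding (e.g. "optimized") is skipped.
FINDING_PRIORITY = {"underprovisioned": 0, "overprovisioned": 1}


def triage(recommendations: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return actionable findings, under-provisioned ones first."""
    actionable = []
    for rec in recommendations:
        finding = rec.get("finding", "").lower()
        if finding not in FINDING_PRIORITY:
            continue
        # Take the best-ranked alternative as the suggestion (rank 1 = best).
        options = sorted(rec.get("recommendationOptions", []),
                         key=lambda o: o.get("rank", 0))
        top = options[0] if options else {}
        actionable.append({
            "instance": rec.get("instanceArn", "unknown"),
            "finding": finding,
            "current": rec.get("currentInstanceType"),
            "suggested": top.get("instanceType"),
        })
    # sorted() is stable, so input order is kept within each finding group.
    return sorted(actionable, key=lambda a: FINDING_PRIORITY[a["finding"]])


# Hypothetical sample data, not real API output.
sample = [
    {"instanceArn": "arn:web-1", "finding": "Overprovisioned",
     "currentInstanceType": "c5.2xlarge",
     "recommendationOptions": [{"instanceType": "c5.xlarge", "rank": 1}]},
    {"instanceArn": "arn:queue-1", "finding": "Underprovisioned",
     "currentInstanceType": "m5.large",
     "recommendationOptions": [{"instanceType": "m5.xlarge", "rank": 1}]},
    {"instanceArn": "arn:api-1", "finding": "Optimized",
     "currentInstanceType": "m5.xlarge", "recommendationOptions": []},
]

for item in triage(sample):
    print(item["finding"], item["instance"],
          item["current"], "->", item["suggested"])
```

The suggestion is still only a shortlist entry: the percentile and memory checks discussed in this section apply before anything is resized.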
Each recommendation carries a "performance risk" indicator (how likely the smaller size is to under-serve the workload) and a CPU/memory utilization breakdown at p99.5, not just the average. Read the percentile, not the mean — an instance averaging 15% CPU but spiking to 95% three times a day is not a clean downsize candidate. The percentile view is exactly where teams who right-size on averages alone get burned.
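
To make the mean-versus-percentile point concrete, here is a minimal sketch that compares the two on a day of 5-minute CPU samples. The sample data is invented, and the 40% ceiling in `is_clean_downsize` is a hypothetical threshold chosen on the assumption that halving an instance roughly doubles its utilization.

```python
import math
from statistics import mean


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of the data at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def is_clean_downsize(cpu_samples: list[float],
                      peak_pct: float = 99.5,
                      peak_limit: float = 40.0) -> bool:
    """True only if the high percentile (not the mean) stays under the limit."""
    return percentile(cpu_samples, peak_pct) < peak_limit


# One day of 5-minute samples (288 points): mostly 12% CPU, with three
# spikes to 95%. Hypothetical data for illustration.
spiky = [12.0] * 285 + [95.0] * 3
steady = [12.0] * 288

print("spiky  mean:", round(mean(spiky), 1), "p99.5:", percentile(spiky, 99.5))
print("steady mean:", round(mean(steady), 1), "p99.5:", percentile(steady, 99.5))
print("spiky clean downsize? ", is_clean_downsize(spiky))
print("steady clean downsize?", is_clean_downsize(steady))
```

Both instances average roughly 12 to 13% CPU, but only the steady one passes the check, because the spikes dominate the p99.5 value.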
By default Compute Optimizer (and the EC2 console) cannot see memory utilization — the EC2 hypervisor does not expose guest RAM. Until you install the CloudWatch agent to publish a memory metric, every right-sizing recommendation is CPU-and-network only. For Java/JVM, in-memory caches, analytics, and most databases, memory is the binding constraint — right-sizing those on CPU alone is how you downsize a healthy box into constant swapping and a 2am page.
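
As a rough illustration of what "publish a memory metric" involves, the sketch below generates a minimal CloudWatch agent configuration for Linux. It assumes the agent's documented `mem` plugin with the `mem_used_percent` measurement and the default `CWAgent` namespace; the 60-second interval is an arbitrary choice, and where the file is placed and how the agent is started depend on your own setup.

```python
import json


def memory_metrics_config(interval_seconds: int = 60) -> dict:
    """Build a minimal CloudWatch agent config that publishes memory usage."""
    return {
        "metrics": {
            "namespace": "CWAgent",
            # Tag each datapoint with the instance ID so the memory metric
            # can be matched to the instance it came from.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {
                    "measurement": ["mem_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                }
            },
        }
    }


print(json.dumps(memory_metrics_config(), indent=2))
```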
Right-sizing is only as good as the telemetry behind it. There are four signals that decide whether a smaller instance is safe, and you need all four — not just the two that AWS surfaces for free.
The goal is to characterise the workload across a representative window — ideally 2–4 weeks that includes your weekly peak (month-end close, Monday traffic, a marketing send). Look at percentiles (p95/p99), not just averages, and look at the shape over time, not a single snapshot.
Two practical notes. First, burstable T-family instances (t3/t4g) are measured differently — their CPU credit balance matters as much as raw utilization; a T-instance "averaging 10% CPU" can still be throttling if it has exhausted credits. Second, install the CloudWatch agent fleet-wide before you start — retrofitting memory data after you have already downsized defeats the purpose. The agent itself is near-free; the metrics it publishes carry a small per-metric CloudWatch cost that is rounding error against the savings.
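
The T-family caveat can be checked with a small classifier over `CPUCreditBalance` samples (a standard CloudWatch metric for burstable instances). This is a sketch under assumptions: the 10-credit low-water mark is a hypothetical threshold, and the logic assumes standard credit mode, where an empty balance means throttling; in unlimited mode the instance keeps bursting and is billed for surplus credits instead.

```python
def credit_risk(credit_balance: list[float], low_water: float = 10.0) -> str:
    """Classify a burstable instance from its CPUCreditBalance samples."""
    if not credit_balance:
        return "no-data"
    lowest = min(credit_balance)
    if lowest <= 0:
        return "exhausted"   # ran out of credits at least once in the window
    if lowest < low_water:
        return "at-risk"     # came close to running out
    return "healthy"


# Hypothetical samples: average CPU may look low in all three cases.
print(credit_risk([144.0, 120.0, 96.0]))   # healthy
print(credit_risk([50.0, 8.0, 30.0]))      # at-risk
print(credit_risk([50.0, 20.0, 0.0, 5.0])) # exhausted
```

An instance that lands in "exhausted" is a candidate for a larger T size or a non-burstable family, not for a downsize, whatever its average CPU says.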
EC2 is where right-sizing started, but the same "match the resource to real utilization" discipline applies across the stack. A complete right-sizing pass touches your databases, your block storage, and even your serverless functions — and the non-EC2 wins are frequently larger per-hour than the compute ones.
Treat right-sizing as a footprint exercise, not an EC2 exercise. The four surfaces below are the ones with the most reliable, low-risk savings — and a partner-led audit will sweep all of them in a single engagement rather than leaving three-quarters of the money on the table.
Databases are routinely the most over-provisioned resource on the bill because nobody wants to be the person who under-sized the database. Compute Optimizer now covers RDS, surfacing instance-class recommendations from CloudWatch (CPU, memory, IOPS, connections). The high-value moves: drop a db instance class where utilization is low, switch to a Graviton db class (db.r6g/db.r7g — typically ~10–20% cheaper for similar or better performance), enable storage autoscaling so you stop pre-provisioning headroom, and reserve the right-sized class afterward (RDS still uses Reserved Instances, where EC2 has mostly moved to Savings Plans). For spiky, intermittent databases, Aurora Serverless v2 right-sizes capacity automatically by the second.
EBS right-sizing is almost pure profit because it carries essentially no performance risk. Three moves: (1) migrate gp2 volumes to gp3, which is ~20% cheaper per GiB and lets you provision IOPS/throughput independently of size, so you stop buying a huge volume just to get IOPS; (2) delete unattached volumes and stale snapshots, which accumulate silently every time an instance is terminated; (3) right-size over-allocated volumes (a 1 TiB volume that is 8% full). Unattached volumes and orphaned snapshots are one of the most common "free money" findings in any audit.
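
The gp2 to gp3 move can be estimated per volume. The sketch below assumes gp2's size-linked baseline of 3 IOPS per GiB (minimum 100, maximum 16,000) and gp3's 3,000 IOPS baseline, and provisions enough extra gp3 IOPS to match what the gp2 volume was entitled to. The three prices are illustrative assumptions, and throughput charges are ignored for simplicity, so check current pricing before relying on the output.

```python
GP2_PRICE_PER_GIB = 0.10      # assumed illustrative USD per GiB-month
GP3_PRICE_PER_GIB = 0.08      # assumed illustrative USD per GiB-month
GP3_BASELINE_IOPS = 3000      # included with every gp3 volume
GP3_EXTRA_IOPS_PRICE = 0.005  # assumed USD per provisioned IOPS-month above baseline


def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 IOPS scale with size: 3 per GiB, floor 100, cap 16,000."""
    return min(16000, max(100, 3 * size_gib))


def gp3_migration(size_gib: int) -> dict:
    """Estimate monthly cost of a like-for-like gp2 -> gp3 migration."""
    iops_needed = gp2_baseline_iops(size_gib)
    extra_iops = max(0, iops_needed - GP3_BASELINE_IOPS)
    gp2_cost = size_gib * GP2_PRICE_PER_GIB
    gp3_cost = size_gib * GP3_PRICE_PER_GIB + extra_iops * GP3_EXTRA_IOPS_PRICE
    return {
        "extra_iops": extra_iops,
        "gp2_monthly": round(gp2_cost, 2),
        "gp3_monthly": round(gp3_cost, 2),
        "monthly_saving": round(gp2_cost - gp3_cost, 2),
    }


for size in (500, 2000):
    print(size, "GiB:", gp3_migration(size))
```

Under these assumed rates a 500 GiB volume needs no extra IOPS and saves the full 20%, while a 2,000 GiB volume pays for 3,000 additional IOPS and still comes out cheaper. If the volume never used its full gp2 entitlement, the extra IOPS can be dropped and the saving grows.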
Lambda has no instance type — but its memory setting is a right-sizing dial, because Lambda allocates CPU proportionally to memory. Over-allocate and you pay for unused GB-seconds; under-allocate and the function runs slower (and can paradoxically cost more, because duration grows). Compute Optimizer produces Lambda recommendations, and AWS Lambda Power Tuning maps the cost/performance curve so you pick the cheapest memory setting that still hits your latency target. Pair this with moving functions to the arm64 (Graviton) architecture for a further ~20% price cut.
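
The memory-versus-duration trade-off can be shown with a few lines of arithmetic. This is a sketch in the spirit of a power-tuning run, not the AWS Lambda Power Tuning tool itself: the measured durations are invented, the per-GB-second prices are illustrative assumptions, and the per-request charge is left out because it does not change with the memory setting.

```python
from typing import Optional

# Assumed illustrative prices in USD per GB-second; check current pricing.
PRICE_PER_GB_SECOND = {"x86_64": 0.0000166667, "arm64": 0.0000133334}


def cost_per_million(memory_mb: int, avg_duration_ms: float,
                     arch: str = "x86_64") -> float:
    """Compute cost of one million invocations at a given memory setting."""
    gb_seconds = (memory_mb / 1024) * (avg_duration_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND[arch] * 1_000_000


def cheapest_setting(measurements: dict[int, float], max_duration_ms: float,
                     arch: str = "x86_64") -> Optional[int]:
    """Pick the cheapest memory size whose duration meets the latency target.

    measurements maps memory in MB to measured average duration in ms.
    Returns None when no setting is fast enough.
    """
    eligible = {m: d for m, d in measurements.items() if d <= max_duration_ms}
    if not eligible:
        return None
    return min(eligible, key=lambda m: cost_per_million(m, eligible[m], arch))


# Hypothetical measurements: duration falls as memory (and so CPU) rises.
measured = {256: 900.0, 512: 420.0, 1024: 230.0, 1769: 210.0}

for mb, ms in measured.items():
    print(mb, "MB:", round(cost_per_million(mb, ms), 2), "USD per 1M calls")
print("cheapest under 500 ms:", cheapest_setting(measured, 500))
print("cheapest under 250 ms:", cheapest_setting(measured, 250))
```

In this invented data set the 256 MB setting costs more than 512 MB because the function runs more than twice as long, which is the "under-allocate and pay more" effect described above.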
Right-sizing is not only about choosing a smaller size in the same family. Half the win is moving to a newer generation or a different architecture that delivers the same work for less money. The cleanest version of this is AWS Graviton.
Two "sideways" moves belong in every right-sizing pass. First, generation upgrades: an m6i or m7i delivers more performance per dollar than the equivalent m5, and an older m4/c4 is almost always worth retiring. Moving up a generation while moving down a size is frequently a net price cut with a net performance gain. Second, architecture: AWS Graviton (ARM-based — the g-suffix families like m7g, c7g, r7g, t4g) delivers roughly 20–40% better price-performance than comparable x86 instances for a wide range of workloads.
Graviton belongs inside the right-sizing conversation rather than as a separate project, because the decision is the same decision: what is the cheapest instance that serves this workload at the required performance? Often the answer is "the Graviton equivalent, one size smaller." For most mainstream languages and runtimes (Python, Node.js, Go, Java, .NET), most containers, and managed services like RDS, ElastiCache and OpenSearch, moving to Graviton is a configuration change, not a rewrite. The work is in validating that any native dependencies have arm64 builds (today almost all do) and re-running your test suite.
The combined effect compounds. Take a fleet of x86 instances averaging 12% CPU. Right-sizing down a notch recovers ~50% on those boxes; moving the smaller target to Graviton takes another ~20% off the rate. Stack a Compute Savings Plan on the new, accurate baseline and the same workload can land 60–70% below where it started — without touching application logic. That stacking order (right-size → Graviton → commit) is the core of a well-run optimization.
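
The stacking works because each step multiplies what is left after the previous one, so the discounts compound and do not simply add. A minimal sketch of that arithmetic follows; the 50% and 20% figures come from the paragraph above, while the 25% commitment discount is a hypothetical value used only to show the shape.

```python
def stacked_cost_fraction(size_cut: float = 0.50,
                          graviton_cut: float = 0.20,
                          commitment_cut: float = 0.0) -> float:
    """Fraction of the original bill left after applying each cut in turn."""
    return (1 - size_cut) * (1 - graviton_cut) * (1 - commitment_cut)


no_commit = stacked_cost_fraction()
with_commit = stacked_cost_fraction(commitment_cut=0.25)  # hypothetical discount

print("right-size + Graviton:", round((1 - no_commit) * 100), "% below start")
print("plus commitment:      ", round((1 - with_commit) * 100), "% below start")
```

With these inputs the first two steps leave 40% of the original cost, and the assumed commitment discount brings it to about 30%.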
The most expensive sequencing error in AWS FinOps: buying a 3-year Savings Plan or RI against an over-provisioned fleet. You lock in three years of paying for idle capacity. Always right-size and migrate to the cheapest viable family first, then commit to the smaller baseline. A Well-Architected / cost-optimization review will catch this — and a CloudRoute partner sequences it for you.
Right-sizing fails when it is treated as a spreadsheet exercise done once. It works when it is a repeatable loop with a reversible test step. Here is the four-stage loop a disciplined FinOps practitioner (or a CloudRoute partner) runs.
The whole point of the loop is to make downsizing safe and routine instead of scary and rare. Each stage de-risks the next; the test stage is what separates a confident right-sizing program from a gamble.
A right-sizing pass that is not backed by automation and guardrails decays — within two quarters the fleet drifts back to over-provisioned, because the same incentives that caused it the first time are still in place. Sustainable right-sizing needs both the cultural fix and the technical guardrails.
Start with the human reason, because it explains the whole pattern. Teams over-provision because the cost of being too small is visible and personal (an outage, a page, a blamed engineer) while the cost of being too big is invisible and diffuse (a slightly larger AWS bill nobody owns). Engineers, rationally, size up. The fix is partly cultural — make cost visible per team via cost allocation tags and showback so the over-provision has an owner — and partly structural: make downsizing safe and reversible (the test step above) so "size up just in case" stops being the only defensible choice.
Put a cost ceiling and a feedback loop around the fleet. AWS Budgets with alerts catches spend creeping back up; Cost Anomaly Detection flags a sudden jump (a forgotten over-sized instance left running). Cost allocation tags + a Cost Explorer or CUR view per team turn an anonymous bill into an owned one. Together these mean a re-bloat shows up on someone's dashboard within days, not at the next quarterly review.
The durable answer to over-provisioning is to stop sizing for the peak at all. Auto Scaling Groups add and remove capacity with demand, so you run small at the trough and wide at the peak instead of running big all the time. Scheduled scaling shuts non-production environments down nights and weekends (a dev fleet that runs 168 hours/week but is used 50 can be cut ~70% on a schedule). For variable databases, Aurora Serverless v2 right-sizes by the second. And Compute Optimizer recommendations can be pulled via API into a recurring report so right-sizing candidates surface automatically every cycle.
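
The non-production schedule figure is simple arithmetic, sketched below together with a hypothetical on/off rule. The weekday 08:00 to 18:00 window is an assumption chosen because it adds up to 50 hours a week; a real schedule would normally be configured in the scaling or scheduler service, not in application code.

```python
from datetime import datetime

HOURS_IN_WEEK = 168


def schedule_saving(hours_on_per_week: float) -> float:
    """Fraction of always-on cost avoided by running only the given hours."""
    return 1 - hours_on_per_week / HOURS_IN_WEEK


def should_be_running(now: datetime, start_hour: int = 8,
                      end_hour: int = 18) -> bool:
    """Hypothetical rule: on during weekday working hours only.

    Monday is weekday() == 0, so values below 5 are Monday to Friday.
    Ten hours a day for five days gives the 50 hours used above.
    """
    return now.weekday() < 5 and start_hour <= now.hour < end_hour


print("saving:", round(schedule_saving(50) * 100), "%")
print(should_be_running(datetime(2024, 1, 8, 9)))    # Monday morning
print(should_be_running(datetime(2024, 1, 6, 9)))    # Saturday morning
```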
Automation cuts both ways — aggressive downsizing without limits causes incidents. The guardrails that keep it safe: never downsize more than one size step at a time without a test; keep p99 headroom (target peak utilization in the 60–80% band, not 95%+); exclude singleton stateful nodes from automated right-sizing; and always action the under-provisioned list first. The objective is the cheapest fleet that still meets your reliability bar — not the smallest possible fleet.
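
These guardrails can be encoded as a pre-flight check that runs before any automated resize. The sketch below is illustrative only: the `Candidate` fields and the size ladder are hypothetical, and the projection assumes each size step halves capacity so that peak utilization doubles, which is a simplification that real workloads will not follow exactly.

```python
from dataclasses import dataclass

# Hypothetical size ladder within one instance family, smallest first.
SIZES = ["large", "xlarge", "2xlarge", "4xlarge", "8xlarge"]


@dataclass
class Candidate:
    current_size: str
    target_size: str
    peak_util_pct: float            # high-percentile peak on the current size
    singleton_stateful: bool = False
    tested: bool = False            # has the target passed a test step?


def steps_down(c: Candidate) -> int:
    """Number of size steps between current and target (positive = smaller)."""
    return SIZES.index(c.current_size) - SIZES.index(c.target_size)


def projected_peak(c: Candidate) -> float:
    """Assumed model: each step down doubles peak utilization."""
    return c.peak_util_pct * (2 ** steps_down(c))


def guardrail_violations(c: Candidate, max_peak_pct: float = 80.0) -> list[str]:
    """Return the reasons this resize should not be applied automatically."""
    problems = []
    if c.singleton_stateful:
        problems.append("singleton stateful node")
    if steps_down(c) > 1 and not c.tested:
        problems.append("more than one size step without a test")
    if projected_peak(c) > max_peak_pct:
        problems.append("projected peak above headroom ceiling")
    return problems


print(guardrail_violations(Candidate("2xlarge", "xlarge", 35.0)))
print(guardrail_violations(Candidate("2xlarge", "large", 15.0)))
print(guardrail_violations(Candidate("2xlarge", "xlarge", 45.0)))
```

An empty list means the change may proceed to the test step; anything else sends the candidate to a human.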
Right-sizing is conceptually simple and operationally fiddly: it needs clean telemetry, percentile-aware judgement, a safe test path, and someone with the time to roll changes out without breaking production. Most startup teams know they are over-provisioned and still never get to it, because the engineer who could is the engineer shipping the product.
That is the gap CloudRoute fills. We route you to a vetted AWS partner who runs the full measure → recommend → test → apply loop across EC2, RDS, EBS and Lambda — installs the CloudWatch agent so the recommendations include memory, sequences the Graviton moves, and applies the changes behind your normal deploy guardrails. You get the savings without pulling an engineer off the roadmap.
The honest commercial framing — because it is the part founders do not believe at first: this work is frequently AWS-funded. AWS funds partner-led cost-optimization and Well-Architected engagements through its partner programs (the partner is paid by AWS, not by you), and a Well-Architected Review can unlock remediation credits that offset the rework. For qualifying, credit-eligible engagements that means you cut your own bill for $0. Where an engagement does not qualify for AWS funding, it is a vetted-partner referral that pays for itself many times over out of the savings — a 35% cut on an $8K/month EC2 line is ~$34K/year, recurring.
Either way the structure is the same: AWS wants you efficient and committed for the long term, so it underwrites the partner; the partner does the work; CloudRoute is paid by the partner as a routing fee. You stay out of the payment loop and keep the savings. The cross-links below — the cost-optimization hub, Savings Plans (commit after you right-size), Graviton migration, Spot, and the Well-Architected review — are the natural next steps in the same motion.
A representative startup fleet before and after a single right-sizing engagement: downsize over-provisioned compute, move the targets to Graviton, fix the database and EBS, and tune Lambda memory. Rates are illustrative on-demand us-east-1 figures for shape — check the AWS pricing calculator / Cost Explorer for current rates. The point is the pattern, not the exact dollar.
| Resource | Before (sized at launch) | After (right-sized) | Monthly change |
|---|---|---|---|
| Web/app fleet (×6) | 6× m5.2xlarge @ ~$0.384/hr · avg 11% CPU | 6× m7g.xlarge (Graviton, 1 size down) @ ~$0.145/hr | −$1,045/mo (~62%) |
| Worker pool (×4, stateless) | 4× c5.2xlarge @ ~$0.34/hr · avg 18% CPU | 4× c7g.xlarge (Graviton) @ ~$0.145/hr | −$561/mo (~57%) |
| Primary database | db.r5.2xlarge @ ~$1.00/hr · avg 22% CPU/mem | db.r7g.xlarge (Graviton) @ ~$0.48/hr | −$380/mo (~52%) |
| EBS storage | 8 TiB gp2 + 1.2 TiB unattached + stale snaps | 4 TiB gp3 (right-sized), orphans deleted | −$520/mo (~55%) |
| Lambda (event pipeline) | 1769 MB x86, untuned | 512 MB arm64, power-tuned | −$190/mo (~48%) |
| Non-prod environments | Running 168 hrs/week | Scheduled off nights + weekends (~50 hrs/wk used) | −$430/mo (~70%) |
| Fleet total | ~$8,050/mo | ~$4,924/mo | −$3,126/mo (~39%) |
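
The compute rows in the table follow from hourly rates and instance counts. The sketch below shows that arithmetic so the table can be rebuilt with your own rates; it assumes an average month of 730 hours, and the rates passed in are the table's illustrative figures, not current prices.

```python
HOURS_PER_MONTH = 730  # assumed average month length in hours


def monthly_change(count: int, before_rate: float, after_rate: float,
                   hours: int = HOURS_PER_MONTH) -> tuple[float, float]:
    """Return (monthly saving in USD, saving as a fraction of the old cost)."""
    before = count * before_rate * hours
    after = count * after_rate * hours
    saving = before - after
    return saving, saving / before


# Primary database row: one instance, ~$1.00/hr down to ~$0.48/hr.
saving, fraction = monthly_change(1, 1.00, 0.48)
print(round(saving), "USD/month,", round(fraction * 100), "%")

# Web/app fleet row: six instances, ~$0.384/hr down to ~$0.145/hr.
saving, fraction = monthly_change(6, 0.384, 0.145)
print(round(saving), "USD/month,", round(fraction * 100), "%")
```

The results land within a few dollars of the table, which rounds its figures; the storage, Lambda and scheduling rows depend on usage data and cannot be derived from an hourly rate alone.
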
Situation: Bill had grown ~3× in nine months and nobody could say why. Compute had been sized at launch and never revisited; production web/app fleet averaged 9–13% CPU. No CloudWatch agent, so memory was invisible and the team was scared to downsize the JVM-heavy services. The one engineer who understood the infrastructure was fully allocated to a customer-facing launch and could not take two weeks to do a right-sizing pass.
What CloudRoute did: Routed within 20 hours to a US-East AWS partner with a FinOps + Well-Architected track record. Partner ran the engagement as an AWS-funded cost-optimization review: installed the CloudWatch agent fleet-wide, collected 3 weeks of metrics across a month-end peak, then right-sized the stateless fleet down a size and onto m7g/c7g Graviton, moved the primary DB to db.r7g, migrated EBS gp2→gp3 and deleted ~1.2 TiB of orphaned volumes/snapshots, power-tuned the event-pipeline Lambdas to 512 MB arm64, and put non-prod on a nights/weekends schedule. Every change went through a one-node/staging test first; the under-provisioned queue-consumer was sized UP for safety in the same pass.
Outcome: AWS bill dropped from ~$9,400 to ~$5,520/month — a 41% cut (~$46K/year) — with zero customer-visible regression; p99 latency actually improved on the Graviton fleet. The Well-Architected review unlocked remediation credits that covered the rework, and the partner was paid through AWS's partner funding. CloudRoute's fee was paid by the partner. The customer paid $0 and walked away with Budgets alerts + anomaly detection wired up to keep the fleet honest.
engagement window: ~3 weeks · founder/eng time: ~4 hours · recurring savings: ~$46K/year · cost to customer: $0
CloudRoute routes you to a vetted AWS partner who runs the full right-sizing loop across EC2, RDS, EBS and Lambda — safely, behind your deploy guardrails. Often AWS-funded, so you cut the bill for $0.