There are roughly nine real levers that move an AWS bill, and they are not equal. Some cut 20–40% with a weekend of work; others cut 5% and take a quarter of engineering time. This is the definitive, neutral, priority-ordered playbook: the mechanism behind each lever, its typical savings range, the order to pull them in, and the point at which a partner-led — often AWS-funded — audit pays for itself.
The single most expensive mistake in cost work is pulling levers in the wrong order — spending a sprint shaving 4% off Lambda while a 30% commitment win sits untouched. Rank by impact divided by effort, and the order almost writes itself.
An AWS bill is not one number; it is a stack of largely independent cost drivers. Compute (EC2, ECS/Fargate, Lambda) is usually the biggest. Storage (EBS, S3, snapshots) is the slow accumulator. Databases (RDS, Aurora, DynamoDB) are often the most over-provisioned. And then there is the category most teams never look at directly: data movement, meaning inter-AZ traffic, NAT Gateway processing, and egress to the internet. Each driver has its own lever, and because the levers are largely independent of one another, you can sequence them.
The right mental model is impact ÷ effort. Impact is the dollar reduction; effort is the engineering and risk cost to capture it. A Savings Plan is pure financial paperwork — near-zero engineering, near-zero risk, large impact — so it ranks first despite touching nothing technical. Re-architecting a service onto event-driven Lambda might be the right long-term call, but as a cost lever it is high-effort and slow-paying, so it ranks last. Most teams invert this, because architecture is more interesting than commitment math.
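To make the ranking concrete, here is a minimal sketch of scoring levers by impact ÷ effort. The `Lever` class, the dollar figures, and the effort estimates are all hypothetical placeholders; substitute numbers from your own Cost Explorer breakdown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lever:
    name: str
    monthly_saving_usd: float  # estimated impact (hypothetical figure)
    effort_days: float         # engineering + risk cost in person-days (hypothetical)

    @property
    def score(self) -> float:
        """Impact divided by effort: dollars saved per month, per effort-day."""
        return self.monthly_saving_usd / self.effort_days


def rank(levers):
    """Return levers ordered from best to worst impact-per-effort."""
    return sorted(levers, key=lambda lever: lever.score, reverse=True)


# Illustrative inputs only; none of these numbers come from a real account.
levers = [
    Lever("Re-architect onto Lambda", 1_500, 60),
    Lever("Compute Savings Plan", 6_000, 1),
    Lever("Right-sizing", 3_000, 5),
    Lever("Non-prod scheduling", 1_000, 2),
]

for lever in rank(levers):
    print(f"{lever.name:28s} {lever.score:8.0f} $/month per effort-day")
```

With these assumed inputs the commitment lands first and the re-architecture last, which is the inversion described above.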
There is also a sequencing rule that saves real money: clean before you commit. Right-size and delete idle resources first, then buy commitments against the smaller, true baseline. Buy a three-year Savings Plan against an over-provisioned fleet and you have locked in the waste for three years. The correct order is cleanup → measure the steady-state floor → commit to that floor → then optimize the variable layer on top. The rest of this playbook follows that order.
One honest caveat up front: percentages in this guide are typical ranges drawn from common account patterns, not guarantees. Your mix of services determines which levers dominate. A GPU-heavy training shop and a CRUD SaaS app have completely different bills, and the same lever can be worth 2% to one and 35% to the other. Use the ranking as a sequence; use your own Cost Explorer breakdown to size each step.
The largest fast win in almost every account. You are paying full on-demand rates for compute you run 24/7 anyway. Committing to that baseline trades a one- or three-year promise for a 20–40% discount, with zero architecture change and effectively zero downside if sized correctly.
AWS sells the same compute at radically different prices depending on whether you commit. On-demand is the default and the most expensive. Commit to a steady hourly dollar amount of usage for one or three years and AWS discounts it — typically 20–30% for a one-year Compute Savings Plan and up to ~40% for three-year, paid up front. The discount is purely financial: the instance behaves identically, the workload doesn’t change, and the saving applies automatically against matching usage.
There are two instruments, and the distinction matters. Savings Plans commit to a dollars-per-hour spend and are the modern default. Compute Savings Plans are the flexible kind — they float across EC2, Fargate, and Lambda, and across instance family, size, OS, tenancy, and Region, so they keep applying even as you change architecture. EC2 Instance Savings Plans are locked to a family in a Region and discount slightly deeper in exchange for that rigidity. Reserved Instances are the older mechanism; for most teams the only RIs still worth buying are for services where Savings Plans don’t apply — notably RDS, ElastiCache, Redshift, and OpenSearch.
The discipline that makes this safe is coverage targeting. Don’t commit to 100% of current usage; commit to the floor you are certain to run regardless of growth or churn. A common target is 70–85% commitment coverage on the steady-state baseline, leaving the top, spiky layer on on-demand or Spot. AWS Cost Explorer’s Savings Plans recommendations will compute this floor for you from your last 7, 30, or 60 days. Start with a one-year Compute Savings Plan at conservative coverage; you can layer additional commitments on top as confidence grows, but you generally cannot unwind one early.
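The coverage-targeting arithmetic can be sketched in a few lines. This is an illustration under stated assumptions, not how Cost Explorer computes its recommendation: the function name, the low-percentile floor, and the 25% discount are hypothetical, and the hourly figures are made up.

```python
def commitment_target(hourly_on_demand_usd, coverage=0.75,
                      floor_percentile=0.10, assumed_discount=0.25):
    """Estimate an hourly Savings Plan commitment from hourly compute spend.

    hourly_on_demand_usd: on-demand-equivalent compute spend per hour.
    coverage:             share of the steady-state floor to commit to.
    floor_percentile:     low percentile used as the floor, so a brief dip
                          (for example an outage) does not define it.
    assumed_discount:     hypothetical Savings Plan discount; commitments are
                          expressed in post-discount dollars per hour.
    """
    if not hourly_on_demand_usd:
        raise ValueError("need at least one hourly data point")
    ordered = sorted(hourly_on_demand_usd)
    floor = ordered[int(floor_percentile * (len(ordered) - 1))]
    covered_usage = floor * coverage               # on-demand dollars covered
    return covered_usage * (1 - assumed_discount)  # dollars/hour to commit


# Made-up hourly spend: a steady base near $8/hour with spikes on top.
sample = [8.0, 8.5, 9.0, 12.0, 15.0, 8.2, 8.0, 20.0]
print(f"commit about ${commitment_target(sample):.2f}/hour")
```

The spikes to $15 and $20 never enter the calculation; that top layer stays on on-demand or Spot.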
The trap to avoid: buying long, rigid commitments against a bill you haven’t cleaned yet. If 20% of your fleet is over-provisioned, a three-year EC2 Instance Savings Plan against today’s usage cements that 20% waste for three years. This is exactly why commitments rank first on impact but are sequenced after a quick cleanup pass — the two are not in conflict, the order just matters.
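A rough way to put a number on that trap, as a sketch with hypothetical inputs: take the share of the committed spend that covers over-provisioned capacity and multiply it across the term.

```python
HOURS_PER_YEAR = 365 * 24


def locked_in_waste(hourly_commit_usd, waste_fraction, term_years):
    """Dollars committed to capacity you did not need, over the whole term."""
    return hourly_commit_usd * waste_fraction * term_years * HOURS_PER_YEAR


# Hypothetical: a $10/hour commitment sized against a fleet that is 20%
# over-provisioned, bought for three years.
waste = locked_in_waste(hourly_commit_usd=10.0, waste_fraction=0.20, term_years=3)
print(f"waste locked in over the term: ${waste:,.0f}")
```

Running the cleanup pass first shrinks `waste_fraction` before the commitment is sized, which is the whole argument for the sequencing.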
20–40% off committed compute. One-year Compute Savings Plan: roughly 20–30%; three-year, all-upfront: up to ~40%. RDS/ElastiCache/Redshift Reserved Instances: roughly 30–55% off those services. Effort: financial, not technical (hours, not sprints). Risk: low if coverage is targeted to the certain baseline.
The average provisioned instance runs at a fraction of its capacity. Right-sizing matches instance type and size to actual utilization — and unlike commitments, it costs nothing and locks in nothing. It is the cleanup that should happen before you commit.
Provisioning is almost always done by guesswork, then never revisited. An engineer picks an m5.2xlarge "to be safe," the service ships, and two years later it’s still an m5.2xlarge averaging 9% CPU and 30% memory. Right-sizing is the systematic correction: look at real utilization over a representative window (2–4 weeks, including peaks), and move each workload to the smallest instance that comfortably serves its actual load. Dropping one size — 2xlarge to xlarge — halves that instance’s cost.
AWS Compute Optimizer is the native engine for this. It ingests CloudWatch metrics and emits per-resource recommendations for EC2, EBS volumes, Lambda memory, ECS-on-Fargate tasks, and Auto Scaling groups, each tagged under-provisioned / optimized / over-provisioned with a projected dollar delta. Enable it (free), give it ~14 days to gather data, and work the over-provisioned list top-down by savings. Memory-based recommendations need the CloudWatch agent installed, since EC2 memory isn’t visible to AWS by default — without it you’re right-sizing on CPU alone and may leave memory waste on the table.
Right-sizing also means changing shape, not just size. A workload pinned on CPU but light on memory belongs on a compute-optimized family (c-series); a caching layer that’s all memory belongs on r-series. Moving to the family that matches the bottleneck often beats simply shrinking within the wrong family. And right-sizing isn’t one-and-done — usage drifts, so the teams that hold their savings re-run Compute Optimizer quarterly rather than treating it as a single project.
Order matters here too: right-size first, then buy commitments against the smaller footprint. Compute Optimizer’s numbers feed directly into a more accurate Savings Plan floor, so these two top levers reinforce each other when sequenced correctly.
15–30% off the compute line in an account that has never been right-sized. Effort: low–medium (read recommendations, validate, restart workloads on new sizes during a maintenance window). Risk: low — keep headroom for peaks and roll changes gradually. Cost to capture: $0.
Two levers that reduce the unit price of compute itself rather than the quantity. Spot trades guaranteed availability for a steep discount on fault-tolerant work; Graviton trades a CPU architecture port for a structural price/performance gain. Both can be large where they apply.
Spot sells AWS’s spare capacity at a discount of typically 60–90% off on-demand, with one condition: AWS can reclaim the instance on two minutes’ notice when it needs the capacity back. That makes Spot ideal for anything fault-tolerant and stateless — CI/CD runners, batch and data processing, rendering, ML training with checkpointing, and stateless web tiers behind a load balancer.
The modern way to run Spot is not to bid on a single instance type but to use a diversified pool. EC2 Auto Scaling and Karpenter (on EKS) let you request "any of these N instance families" and let AWS place you on whatever spare capacity is cheapest and most stable, falling back gracefully and even blending Spot with on-demand in one fleet. The mistake to avoid is putting a stateful primary database or a single-instance service with no failover on Spot — an interruption there is an outage, not a saving.
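As an illustration of a diversified pool on EC2 Auto Scaling, the sketch below builds a `MixedInstancesPolicy` dictionary in the shape boto3's `create_auto_scaling_group` accepts. The helper name, the launch template name, the instance types, and the on-demand split are hypothetical choices, not recommendations.

```python
def spot_mixed_policy(launch_template_name, instance_types,
                      on_demand_base=2, on_demand_pct_above_base=20):
    """Build a MixedInstancesPolicy that blends on-demand with diversified Spot."""
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # One override per acceptable instance type: the diversified pool.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            # Always keep this many instances on on-demand.
            "OnDemandBaseCapacity": on_demand_base,
            # Above the base, this percentage is on-demand; the rest is Spot.
            "OnDemandPercentageAboveBaseCapacity": on_demand_pct_above_base,
            "SpotAllocationStrategy": "price-capacity-optimized",
        },
    }


policy = spot_mixed_policy(
    "web-tier-template",  # hypothetical launch template name
    ["m5.large", "m5a.large", "m6i.large", "c5.large"],
)
print(policy["InstancesDistribution"])
```

The dictionary would be passed as the `MixedInstancesPolicy` argument when creating or updating the Auto Scaling group. The call itself is omitted so the sketch runs without AWS credentials.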
Graviton is AWS’s own ARM-based processor line. For most general-purpose, compute, and memory workloads, the Graviton-backed instance families deliver materially better price/performance than the equivalent x86 family — AWS positions it at roughly 20% better, and on the right workload the real-world saving lands in a similar band, before any commitment discount stacks on top.
The catch is architecture: your code and dependencies must run on ARM64. For interpreted and managed runtimes (most Python, Node, Java, and .NET) and for managed services like RDS, ElastiCache, and OpenSearch on Graviton, this is often a config flip with no code change. Compiled languages such as Go need a rebuild for ARM64, which is usually just a change of build target. For anything with compiled native dependencies or container images, you rebuild for ARM64 (or multi-arch) and test. The migration is a real engineering task, but a one-time one, after which the price advantage is permanent and compounds with Spot and Savings Plans.
Spot: 60–90% off the interruptible portion of compute (CI, batch, training, stateless tiers). Graviton: ~20% better price/performance on ported workloads, permanent and stackable with commitments. Effort: low for Spot on existing autoscaling; medium for Graviton (rebuild + test on ARM64).
Storage rarely spikes, so it rarely gets attention — and that’s exactly why it bloats. The lever is matching each byte to the cheapest tier that meets its real access pattern, then deleting the bytes nobody reads at all.
S3 has a ladder of storage classes priced by access frequency. S3 Standard is the default and most expensive per GB. Below it sit Standard-Infrequent Access, One Zone-IA, the Glacier tiers for archival, and — the lever most teams should reach for first — S3 Intelligent-Tiering, which monitors access per object and automatically demotes cold objects to cheaper tiers and promotes them back on access, for a tiny monitoring fee. For any bucket with mixed or unpredictable access patterns, Intelligent-Tiering captures most of the available saving with zero ongoing management and no retrieval-fee risk on the data that does get touched.
EBS is the quieter line. The lever there is twofold: migrate older gp2 volumes to gp3, which is cheaper per GB and lets you provision IOPS and throughput independently of size (so you stop paying for a giant volume just to get its bundled IOPS), and delete the snapshots nobody needs. Snapshot sprawl is endemic — automated daily snapshots accumulate for years, and orphaned snapshots from long-deleted volumes keep billing silently. A lifecycle policy that ages snapshots out and a one-time sweep of orphans both pay off immediately.
The cleanup half of this lever is pure profit: there is no tradeoff in deleting storage that serves no purpose. Unattached EBS volumes left behind by terminated instances, old AMIs and their backing snapshots, incomplete multipart uploads quietly accruing in S3, and forgotten log buckets with no expiration policy. An S3 Lifecycle policy that expires logs after N days and aborts stale multipart uploads turns a perpetually growing line into a flat one.
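A lifecycle configuration along those lines might look like the sketch below. It builds the dictionary in the shape boto3's `put_bucket_lifecycle_configuration` expects. The helper name, the `logs/` prefix, and the 90-day and 7-day windows are assumptions to adapt to your own retention requirements.

```python
def log_bucket_lifecycle(expire_days=90, abort_multipart_days=7, prefix="logs/"):
    """Lifecycle rules: expire old logs and abort stale multipart uploads."""
    return {
        "Rules": [
            {
                "ID": "expire-old-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Expiration": {"Days": expire_days},
            },
            {
                "ID": "abort-stale-multipart-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix applies to the whole bucket
                "AbortIncompleteMultipartUpload": {
                    "DaysAfterInitiation": abort_multipart_days
                },
            },
        ]
    }


config = log_bucket_lifecycle()
# Applying it (requires credentials; the bucket name is a placeholder):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-log-bucket", LifecycleConfiguration=config)
print([rule["ID"] for rule in config["Rules"]])
```

Expiration deletes data permanently, so confirm the retention window against any compliance requirement before enabling the first rule.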
20–60% off the storage line. Intelligent-Tiering or IA on cold S3 data: roughly 40–70% off those objects; gp2→gp3: roughly 20% off those volumes plus decoupled IOPS; snapshot/orphan cleanup is pure recovery. Effort: low (lifecycle policies + one cleanup sweep). Risk: low with correct lifecycle rules.
The lever almost nobody pulls deliberately, because the cost hides across thousands of tiny line items. On a chatty, multi-AZ, microservice architecture, data movement and NAT processing can quietly be 10–25% of the bill — and most of it is avoidable.
AWS charges for data movement in ways that are easy to design into a bill by accident. Traffic between Availability Zones is billed in both directions. Traffic out to the internet (egress) is billed per GB and is one of AWS’s higher-margin lines. Same-AZ traffic over private IPs is free, and inbound is free — so the cost is a direct function of how much your architecture moves data across zones and out to the world. A microservice mesh that scatters chatty services across AZs without thought can generate enormous inter-AZ charges for traffic that, with AZ-aware placement, would have been free.
The NAT Gateway is the single most common surprise on the transfer line. Resources in private subnets reach the internet through a NAT Gateway, which bills an hourly charge plus a per-GB processing charge on every byte that passes through it. The per-GB processing fee is the trap: route all your S3 reads, ECR image pulls, CloudWatch logs, and other AWS-bound traffic through the NAT and you pay a processing fee on traffic that never needed to leave AWS at all. The fix is VPC Gateway Endpoints for S3 and DynamoDB (free) and Interface Endpoints (PrivateLink) for other services, which route that traffic privately and bypass the NAT processing charge entirely. Interface Endpoints carry their own hourly and per-GB charges, so they pay off where the volume is meaningful and their per-GB rate undercuts the NAT’s. A single S3 Gateway Endpoint on a data-heavy account can cut the NAT bill dramatically on its own.
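To size the endpoint win before touching the network, a back-of-the-envelope estimate is usually enough. In this sketch the per-GB rate is an assumed illustrative figure (check current pricing for your Region), and the traffic volume and S3/DynamoDB share are hypothetical.

```python
ASSUMED_NAT_PER_GB_USD = 0.045  # illustrative rate; verify against current pricing


def nat_processing_cost(gb_per_month, per_gb_usd=ASSUMED_NAT_PER_GB_USD):
    """Monthly NAT Gateway data-processing charge, excluding the hourly fee."""
    return gb_per_month * per_gb_usd


def gateway_endpoint_saving(total_gb_per_month, s3_dynamodb_share,
                            per_gb_usd=ASSUMED_NAT_PER_GB_USD):
    """Processing charge avoided by moving S3/DynamoDB traffic off the NAT.

    Gateway Endpoints themselves are free, so the avoided charge is the saving.
    """
    return nat_processing_cost(total_gb_per_month * s3_dynamodb_share, per_gb_usd)


# Hypothetical account: 50 TB/month through the NAT, 80% of it bound for S3.
before = nat_processing_cost(50_000)
saved = gateway_endpoint_saving(50_000, s3_dynamodb_share=0.80)
print(f"NAT processing: ${before:,.0f}/month, avoidable: ${saved:,.0f}/month")
```

The share of NAT traffic bound for S3 or DynamoDB is the input that matters most; VPC Flow Logs are one way to measure it rather than guess.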
There are two more high-leverage moves on this line. First, put a CDN (CloudFront) in front of high-volume egress. CloudFront’s per-GB rates are lower than direct S3/EC2 egress, and AWS does not charge for transfer from AWS origins to CloudFront, so heavy public download or media workloads get cheaper and faster at once. Second, audit cross-AZ chatter and co-locate the chattiest service-to-service paths within an AZ where availability requirements allow. None of this changes what the application does; it changes the path the bytes take, and the bytes are what AWS meters.
10–25% of the bill on transfer-heavy / multi-AZ / microservice architectures. VPC Gateway + Interface Endpoints can erase most NAT processing charges; CloudFront cuts high-volume egress; AZ-aware placement removes avoidable cross-AZ traffic. Effort: low–medium (mostly networking config). Often the most overlooked lever in the whole account.
Two of the fastest wins in the playbook, both pure waste-elimination with effectively no architectural tradeoff. Cleanup removes resources nobody uses; scheduling switches off resources nobody uses at night.
Every account accumulates resources that bill 24/7 while doing nothing. The usual suspects: unattached Elastic IPs (an idle EIP bills hourly), idle load balancers with no healthy targets, unattached EBS volumes and old snapshots, over-provisioned NAT Gateways in dev VPCs nobody uses, forgotten RDS instances from a finished project, and dormant non-prod environments left running after a launch. AWS Trusted Advisor and Cost Explorer’s resource-level views surface most of these; a quarterly sweep keeps them from re-accumulating.
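One of these checks, unattached EBS volumes, can be sketched as a filter over records shaped like the `Volumes` list returned by EC2's `describe_volumes`. The helper names, the sample records, and the per-GB price are assumptions for illustration.

```python
ASSUMED_USD_PER_GB_MONTH = 0.08  # illustrative price; varies by volume type and Region


def unattached_volumes(volumes):
    """Volumes in the 'available' state with no attachments."""
    return [
        v for v in volumes
        if v.get("State") == "available" and not v.get("Attachments")
    ]


def monthly_cost(volumes, usd_per_gb_month=ASSUMED_USD_PER_GB_MONTH):
    """Rough monthly storage cost of the given volumes (Size is in GiB)."""
    return sum(v["Size"] for v in volumes) * usd_per_gb_month


# Hypothetical records in the describe_volumes response shape.
sample_volumes = [
    {"VolumeId": "vol-aaa", "State": "in-use", "Size": 200,
     "Attachments": [{"InstanceId": "i-123"}]},
    {"VolumeId": "vol-bbb", "State": "available", "Size": 500, "Attachments": []},
    {"VolumeId": "vol-ccc", "State": "available", "Size": 100, "Attachments": []},
]

orphans = unattached_volumes(sample_volumes)
print([v["VolumeId"] for v in orphans], f"about ${monthly_cost(orphans):.2f}/month")
```

In a live account the input would come from `boto3.client("ec2").describe_volumes()`, paginated. Snapshot a volume before deleting it if there is any doubt about its contents.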
The reason this lever is so attractive is that there is no tradeoff to weigh — you are deleting things that produce zero value and pure cost. The only discipline required is confidence that a resource is truly orphaned, which good tagging (below) makes trivial. On a neglected account, a single cleanup sweep routinely recovers several percent of the bill in an afternoon.
Development, staging, QA, and test environments are typically used during business hours, roughly 40–50 hours a week — yet they usually run 168 hours a week. Scheduling them to stop outside working hours cuts their compute and RDS cost by roughly two-thirds, because you stop paying for ~120 idle hours. AWS Instance Scheduler (a supported solution) or simple tag-driven Lambda/EventBridge automations start and stop instances and RDS databases on a defined calendar.
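The arithmetic behind that claim is simple enough to check directly. A small sketch, with a hypothetical function name:

```python
HOURS_PER_WEEK = 24 * 7  # 168


def scheduled_saving_fraction(on_hours_per_week):
    """Fraction of instance-hours avoided by running only on_hours_per_week."""
    if not 0 <= on_hours_per_week <= HOURS_PER_WEEK:
        raise ValueError("on_hours_per_week must be between 0 and 168")
    return 1 - on_hours_per_week / HOURS_PER_WEEK


for hours in (40, 50, 56):
    print(f"{hours} h/week on: {scheduled_saving_fraction(hours):.0%} of instance-hours avoided")
```

A 50-hour week avoids about 70% of instance-hours, and exactly two-thirds corresponds to a 56-hour schedule. Storage attached to stopped instances and databases keeps billing, so the saving on the full non-prod line is somewhat lower than the instance-hour figure.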
This lever applies only to non-production — you obviously don’t schedule production down. But for organizations where non-prod is a meaningful share of total spend (often 20–40% in active engineering shops), shutting it off two-thirds of the time is one of the highest impact-÷-effort moves available, and it carries essentially no risk because the environments are non-critical and start back up on schedule.
Cleanup: 3–10% of a neglected bill, recovered in hours, zero tradeoff. Scheduling non-prod: ~70% off the affected non-prod compute/RDS instance-hours by running it ~50 hours instead of 168 (attached storage keeps billing while instances are stopped). Effort: low for both. Risk: near-zero (non-prod only / orphaned only).
The lever that doesn’t cut cost directly but stops every other saving from silently eroding. Without allocation, alerts, and ownership, an optimized account drifts back to bloat within a year. Governance is what makes FinOps a state rather than a one-off project.
It all starts with cost allocation tags. Until every resource is tagged by team, environment, product, and cost center, "the bill" is one undifferentiated number and no one owns any of it. With a consistent tagging scheme activated in the Billing console, Cost Explorer can slice spend by any dimension — so you can see that one team’s staging environment is 18% of the bill, or that an untagged "miscellaneous" bucket is quietly the third-largest line. Tagging produces no savings on its own; it produces the visibility and accountability that make every other lever findable and assignable. AWS Organizations tag policies and Service Control Policies can enforce tagging so resources can’t be created untagged.
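A tagging audit can start as a simple report before any enforcement is switched on. The sketch below measures how much spend sits on resources missing a required tag. The tag keys, the record shape, and the cost figures are hypothetical.

```python
# Hypothetical required tag keys; use whatever scheme your organization adopts.
REQUIRED_TAGS = frozenset({"team", "environment", "product", "cost-center"})


def missing_tags(tags):
    """Required tag keys absent from a resource's tag dictionary."""
    return REQUIRED_TAGS - set(tags)


def unallocated_share(resources):
    """Share of spend on resources missing at least one required tag."""
    total = sum(r["monthly_cost_usd"] for r in resources)
    if total == 0:
        return 0.0
    unallocated = sum(
        r["monthly_cost_usd"] for r in resources if missing_tags(r["tags"])
    )
    return unallocated / total


full = {"team": "core", "environment": "prod", "product": "api", "cost-center": "100"}
sample_resources = [
    {"id": "i-1", "monthly_cost_usd": 700, "tags": full},
    {"id": "i-2", "monthly_cost_usd": 200, "tags": {"team": "core"}},
    {"id": "i-3", "monthly_cost_usd": 100, "tags": full},
]
print(f"unallocated spend: {unallocated_share(sample_resources):.0%}")
```

Tracking this share over time is a cheap health metric: if it rises, enforcement has a gap.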
AWS Budgets closes the loop on intent. A budget with alert thresholds (say, notify at 80% and 100% of plan) turns runaway spend into an email or Slack alert before the invoice lands, not after. Budgets can track total spend, a specific service, a specific tag, or even Savings Plan and Reserved Instance coverage and utilization — so you can be alerted not just when spend is high but when your commitment coverage drops, which is the early warning that you’re leaking back to on-demand.
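As an illustration, the 80% and 100% thresholds could be expressed as the arguments boto3's Budgets `create_budget` call takes. The helper name, budget name, limit, and email address below are placeholders.

```python
def monthly_cost_budget(name, limit_usd, email, thresholds=(80.0, 100.0)):
    """Build the Budget and NotificationsWithSubscribers arguments.

    One email notification is created per percentage threshold, triggered
    on actual (not forecasted) spend.
    """
    budget = {
        "BudgetName": name,
        "BudgetLimit": {"Amount": str(limit_usd), "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    }
    notifications = [
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": pct,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
        }
        for pct in thresholds
    ]
    return budget, notifications


budget, notifications = monthly_cost_budget(
    "total-monthly", 25000, "finops@example.com"  # placeholder values
)
# Creating it (requires credentials; the account ID is a placeholder):
#   boto3.client("budgets").create_budget(
#       AccountId="123456789012", Budget=budget,
#       NotificationsWithSubscribers=notifications)
print([n["Notification"]["Threshold"] for n in notifications])
```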
AWS Cost Anomaly Detection is the automated watchdog. It learns each service’s normal spend pattern with machine learning and alerts when a service deviates — a misconfigured cron hammering a NAT Gateway, a runaway data pipeline, a left-on GPU instance, a leaked credential spinning up crypto-mining instances. It catches the spikes a monthly budget review would miss until it’s already expensive, and it’s free to enable. Together, tagging + Budgets + Anomaly Detection convert cost optimization from a heroic quarterly cleanup into a continuously enforced baseline — which is the entire point of FinOps as a discipline.
Governance produces ~0% direct savings — and protects 100% of the savings from every lever above it. An account that’s right-sized, committed, and cleaned but ungoverned drifts back toward ~30% waste within a year. Tagging makes waste findable; Budgets make intent enforceable; Anomaly Detection catches the spike before the invoice. Effort: low–medium, one-time setup. Payoff: durability.
Everything above is doable in-house, and at small scale it should be. But there’s a crossover point where a specialist audit recovers far more than it costs — and a structural reason it frequently costs the customer nothing at all.
The in-house ceiling is real. The native tools — Cost Explorer, Compute Optimizer, Trusted Advisor, the CUR — surface the obvious levers, but the deeper savings (commitment-laddering across multiple plan types, Spot diversification on EKS, Graviton portability assessment, the full transfer/endpoint redesign) require both specialist knowledge and dedicated time that an engineering team rarely has to spare. A founder’s cloud lead at 70% allocated to shipping product is not going to model a three-instrument commitment ladder. The opportunity cost of DIY past a certain bill size is the engineering hours plus the savings left uncaptured.
As a rough rule of thumb, a partner-led audit starts paying for itself around $15K–$25K/month in AWS spend. Below that, the native tools plus this playbook usually capture most of the available saving. Above it, the absolute dollars at stake — a 25–35% reduction on a $300K/year bill is $75K–$105K a year — dwarf the cost of expert help, and the optimization is no longer a one-afternoon job but an ongoing program worth resourcing properly.
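The dollar figures in that rule of thumb follow from straightforward arithmetic, reproduced here as a sketch. The function name is hypothetical; the 25–35% band is the range quoted above.

```python
def annual_saving_range(monthly_bill_usd, low=0.25, high=0.35):
    """Annual savings at the low and high ends of the reduction band."""
    annual_bill = monthly_bill_usd * 12
    return annual_bill * low, annual_bill * high


# A $25K/month bill is the $300K/year case from the paragraph above.
low_usd, high_usd = annual_saving_range(25_000)
print(f"${low_usd:,.0f} to ${high_usd:,.0f} per year")
```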
The part most teams don’t know: AWS itself funds much of this work. AWS runs Well-Architected Framework reviews (Cost Optimization is one of the six pillars), partner-led optimization engagements, and Optimization and Licensing Assessments (OLA), and it offers funding and credits to certified partners to perform them, because a healthy, optimized, sticky customer is worth far more to AWS long-term than the short-term margin on waste. Routed through the right partner, a structured cost audit is frequently delivered at $0 to the customer, with AWS covering the partner’s engagement through these programs. That inverts the usual calculus: the question isn’t whether the savings justify the audit fee, it’s whether you’d rather keep paying the waste than accept a funded review.
The honest framing: a partner-led audit is not magic — it pulls the same nine levers in this playbook, just faster, deeper, and with the commitment math done by people who do it daily. What it changes is the slope and the cost. For a team under $15K/month with engineering slack, this guide and the native tools are enough. For a team above it, especially one that can get the audit AWS-funded, declining is usually the more expensive choice.
Both pull the same levers. The difference is depth, speed, the engineering hours you spend, and, often decisively, who pays. Below roughly $15K/month, DIY captures most of the win. Above it, a funded audit usually nets more.
| Variable | DIY first pass | Partner-led audit (often AWS-funded) |
|---|---|---|
| Best fit | Spend < ~$15K/month, engineering slack | Spend > ~$15K–$25K/month, no slack |
| Levers covered | The obvious ones (commitments, right-sizing, cleanup) | All nine, incl. commitment laddering + transfer redesign |
| Time to first savings | Days to weeks (your hours) | 2–4 week structured assessment |
| Engineering cost | Your team’s time (the hidden cost) | Minimal — partner does the analysis |
| Commitment math | Cost Explorer recommendations | Multi-instrument ladder, modeled |
| Ongoing discipline | You build the governance | Partner sets up tagging/Budgets/anomaly + handoff |
| Cost to you | $0 (your time) | Frequently $0 — AWS funds via Well-Architected / partner programs |
Situation: Bill had grown ~40% in two quarters with no clear cause. All compute on-demand. Never right-sized. Heavy cross-AZ chatter between services and all egress through a single NAT Gateway. No tagging, no budgets, no anomaly alerts. The lone platform engineer was fully allocated to product and had no time to model commitments or redesign the network path.
What CloudRoute did: Routed within 24 hours to an AWS partner with a FinOps + EKS track record who ran a Well-Architected Cost pillar review. Sequence: cleanup + right-sizing via Compute Optimizer first; then a one-year Compute Savings Plan at 75% coverage on the corrected baseline; S3 + DynamoDB Gateway Endpoints and PrivateLink to kill NAT processing charges; Karpenter Spot pools for stateless and CI workloads; gp2→gp3 and snapshot lifecycle; then tagging, Budgets, and Cost Anomaly Detection for durability. The Well-Architected engagement was AWS-funded.
Outcome: Steady-state bill down ~34% (≈$7.5K/month, ~$90K/year) within six weeks, with governance in place so it holds. NAT processing charges fell ~80% from the endpoints alone. CloudRoute’s commission was paid by the partner from AWS’s engagement funding — the customer paid $0 for the audit.
engagement window: 6 weeks · founder time: ~6 hours · run-rate cut: ~34% (~$90K/yr) · cost to customer: $0
CloudRoute routes you to a vetted AWS partner who runs the cost audit — commitments, right-sizing, Spot/Graviton, the transfer/NAT traps, governance. Often AWS-funded, so the customer pays $0. No procurement, no discovery theater.