SageMaker has no licence fee — you pay per second for the compute, plus storage and the managed features you use. That sounds simple until you count the components: notebooks, training jobs, four endpoint types, batch transform, Feature Store, Data Wrangler, and storage each bill differently. This page breaks down every line, gives a representative instance-and-GPU cost table, two worked examples, the Savings-Plans math, and the levers that actually move the bill.
Before any specific rate, understand the structure: SageMaker pricing is purely usage-based, with no licence fee, billed per second of compute, and with a separate meter for each component. Get the shape right and the numbers slot in.
There is no flat subscription or seat fee for SageMaker. Every dollar you spend traces to a resource you turned on: an instance that ran, storage you held, a feature you invoked. Compute is billed per second with no minimum, which is what makes ephemeral training jobs cheap — you pay for exactly the seconds the cluster existed.
The bill decomposes into a handful of independent meters: (1) training-job compute — per instance-second while a job runs; (2) inference compute — the instances behind real-time/async endpoints, or per-inference compute for serverless, or per-job compute for batch transform; (3) notebook/Studio compute — the instances backing interactive work; (4) Feature Store — writes, reads, and storage; (5) Data Wrangler — processing-instance time; (6) Ground Truth — per labeled object; and (7) storage and data transfer — S3 for data and artifacts, plus any provisioned volumes.
Across almost every real account, compute dominates — training instance-seconds and always-on endpoint instance-hours are the big lines; the feature and storage meters are usually rounding error by comparison until you reach large scale. So the rest of this page weights toward compute: which instances, which endpoint mode, and how to cut both.
One caveat stated once and applied throughout: every dollar figure on this page is representative as of 2026. AWS pricing varies by region and changes over time, and GPU instance pricing in particular moves. Use these numbers to reason about the shape and relative magnitude of cost — then confirm exact current rates on the AWS SageMaker pricing page before you budget.
Here is each meter, what triggers it, and how to think about its size. The pattern to notice: the things that run continuously cost the most; the things that run transiently are cheap.
Read this as a checklist of everything that can appear on a SageMaker bill, roughly ordered from biggest typical impact to smallest.
If you remember nothing else: real-time endpoints and notebooks bill while idle; serverless endpoints and batch transform jobs bill only while they do work, and asynchronous endpoints can scale to zero when auto-scaling is configured to allow it. The most expensive mistake in SageMaker is leaving an always-on real-time endpoint or a notebook running after you stop using it. Set notebooks to shut down automatically, delete test endpoints, and default to serverless or batch until steady traffic justifies real-time.
Because compute dominates the bill, the instance you pick is the biggest lever. The table below gives representative on-demand SageMaker rates by instance class so you can reason about relative cost: going by the ranges in the table, a high-end GPU instance can cost hundreds of times the hourly rate of a small CPU instance.
These figures are representative as of 2026, in the rough mid-range across major US regions, for SageMaker on-demand usage. They are intended for relative reasoning, not budgeting to the cent. Real-time hosting, training, and notebook usage of the same instance type are billed at slightly different SageMaker rates; the magnitudes below are the right ballpark. Always confirm the exact, current, region-specific rate on the AWS SageMaker pricing page.
| Instance class | Example type | Accelerator | Rough $/hour (on-demand) | Typical use |
|---|---|---|---|---|
| Small CPU | ml.t3 / ml.m5.large | None | ~$0.05–$0.15 | Notebooks, light inference, small tabular |
| General CPU | ml.c5 / ml.m5.xlarge–4xlarge | None | ~$0.20–$1.00 | CPU training, batch scoring, feature jobs |
| Entry GPU | ml.g4dn / ml.g5.xlarge | 1× NVIDIA T4 / A10G | ~$0.70–$2.00 | Small-model training, inference, fine-tuning |
| Mid GPU | ml.g5.12xlarge | 4× NVIDIA A10G | ~$5–$8 | Mid-size training, multi-GPU inference |
| High-end GPU | ml.p4d.24xlarge | 8× NVIDIA A100 | ~$30–$40 | Large-model / deep-learning training |
| Top-end GPU | ml.p5.48xlarge | 8× NVIDIA H100 | ~$60–$100+ | Foundation-model training, large distributed jobs |
| AWS silicon (train) | ml.trn1 | AWS Trainium | Lower per-equivalent FLOP than GPU | Cost-efficient large-model training |
| AWS silicon (infer) | ml.inf2 | AWS Inferentia2 | Lower $/inference than equivalent GPU | Cost-efficient high-volume inference |
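
To make the relative magnitudes concrete, the sketch below takes the midpoint of each dollar range in the table and expresses it as a multiple of the small-CPU midpoint. Using midpoints is an assumption made purely for illustration, and the top-end row is capped at $100 even though the table leaves it open-ended; the Trainium and Inferentia rows are omitted because the table gives no dollar range for them.

```python
# Representative on-demand ranges ($/hour) copied from the table above.
# Taking the midpoint of each range is an illustrative assumption.
HOURLY_RANGES = {
    "Small CPU": (0.05, 0.15),
    "General CPU": (0.20, 1.00),
    "Entry GPU": (0.70, 2.00),
    "Mid GPU": (5.00, 8.00),
    "High-end GPU": (30.00, 40.00),
    "Top-end GPU": (60.00, 100.00),  # table says "$60 to $100+"; upper bound is open-ended
}


def midpoint(low: float, high: float) -> float:
    """Midpoint of a price range."""
    return (low + high) / 2


def relative_to(baseline: str, ranges: dict) -> dict:
    """Express each class's midpoint as a multiple of the baseline class's midpoint."""
    base = midpoint(*ranges[baseline])
    return {name: midpoint(*r) / base for name, r in ranges.items()}


multiples = relative_to("Small CPU", HOURLY_RANGES)
for name, multiple in multiples.items():
    print(f"{name:<13} ~{multiple:,.0f}x the small-CPU hourly rate")
```

On these midpoints the mid-GPU row lands at about 65 times the small-CPU rate and the high-end GPU row at about 350 times, which is why instance choice outweighs every other line on the bill.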
If your SageMaker usage is steady and predictable, you are leaving money on the table paying pure on-demand. Two mechanisms cut the effective rate: Savings Plans (for steady usage) and Spot (for interruptible training).
SageMaker Savings Plans let you commit to a consistent amount of compute usage — measured in dollars per hour — for a one-year or three-year term, in exchange for a meaningful discount versus on-demand. The plan applies automatically across eligible SageMaker usage: Studio notebooks, training jobs, real-time inference, and processing. The longer the commitment and the more you pay up front, the deeper the discount. The trade-off is commitment risk: if your usage drops below the committed level, you still pay for the commitment, so size it to your reliable baseline, not your peak.
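The commitment risk is easier to see with numbers. The sketch below is a simplified model, not AWS's billing logic: it assumes a single flat discount (the 25% used here is hypothetical, not a published rate), a commitment expressed in discounted dollars per hour, and any usage beyond what the commitment covers billed at on-demand rates.

```python
def hourly_cost_with_plan(on_demand_usage: float, commitment: float, discount: float) -> float:
    """Hourly spend under a Savings Plan (simplified model).

    on_demand_usage: what the hour's usage would cost at on-demand rates ($/hour)
    commitment:      committed spend at discounted rates ($/hour), owed regardless of usage
    discount:        plan discount as a fraction (hypothetical value, not an AWS rate)
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    covered = commitment / (1.0 - discount)         # on-demand-equivalent usage the plan absorbs
    overflow = max(0.0, on_demand_usage - covered)  # usage beyond the plan, billed on-demand
    return commitment + overflow


DISCOUNT = 0.25    # hypothetical
COMMITMENT = 0.75  # $/hour; covers $1.00/hour of on-demand-equivalent usage at 25% off

for usage in (0.50, 1.00, 2.00):
    with_plan = hourly_cost_with_plan(usage, COMMITMENT, DISCOUNT)
    print(f"on-demand ${usage:.2f}/h -> with plan ${with_plan:.2f}/h")
```

In this model the plan saves money only while on-demand-equivalent usage stays above the commitment itself; at $0.50/hour of usage the team still pays the $0.75 commitment, which is the case for sizing to the baseline rather than the peak.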
Spot instances for training are the other big lever. Managed Spot training runs your training jobs on spare AWS capacity at a steep discount versus on-demand. Spot capacity can be reclaimed mid-run, so configure checkpointing to S3: a job that saves checkpoints resumes from the last one after an interruption rather than starting over. For non-urgent training that can tolerate interruptions, this is one of the largest single savings available, often cutting training compute cost by more than half. It applies to training jobs, not to real-time endpoints (you do not want a production endpoint on interruptible capacity).
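One way to check what a Spot run actually saved is to compare the two durations that the `DescribeTrainingJob` API reports for a finished job, `TrainingTimeInSeconds` and `BillableTimeInSeconds`. The sketch below works on a plain dictionary shaped like that response; the sample values are made up for illustration, and fetching the real response (for example with boto3) is left out.

```python
def spot_savings_percent(job: dict) -> float:
    """Percentage saved by managed Spot training for one job.

    Expects the two duration fields found in a DescribeTrainingJob response.
    """
    training = job["TrainingTimeInSeconds"]
    billable = job["BillableTimeInSeconds"]
    if training <= 0:
        raise ValueError("TrainingTimeInSeconds must be positive")
    return (1 - billable / training) * 100


# Hypothetical job: ran for 10 hours, billed as if for 4.
example_job = {"TrainingTimeInSeconds": 36_000, "BillableTimeInSeconds": 14_400}
print(f"Spot savings: {spot_savings_percent(example_job):.0f}%")
```

Tracking this figure per job shows whether interruptions and restarts are eroding the discount you planned for.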
A practical combined strategy: put your steady baseline (notebooks, the always-on portion of inference) under a Savings Plan, run training on Spot where the schedule allows, and keep spiky or experimental work on serverless/on-demand so you are not committing to capacity you might not use. And if you are credit-funded, the credits stack on top — they apply to the post-discount usage just as they would to on-demand.
These are not either/or. Savings Plans lower the rate you are billed; AWS credits pay the bill. A credit-funded team can still use a Savings Plan to stretch the credits further — the discounted usage simply draws down the credit balance more slowly.
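A rough sketch of how discounts stretch a credit balance: the balance lasts for the balance divided by the discounted monthly bill. The inputs below (a $100K balance, a $14K on-demand bill, a 50% combined discount) are illustrative assumptions, and the model ignores credit expiry dates and month-to-month variation.

```python
def months_of_runway(credit_balance: float, monthly_on_demand_bill: float,
                     effective_discount: float = 0.0) -> float:
    """Months a credit balance lasts when the bill is reduced by a combined discount."""
    discounted_bill = monthly_on_demand_bill * (1.0 - effective_discount)
    if discounted_bill <= 0:
        raise ValueError("discounted bill must be positive")
    return credit_balance / discounted_bill


# Illustrative inputs, not quotes or guarantees.
no_discount = months_of_runway(100_000, 14_000)
with_discount = months_of_runway(100_000, 14_000, effective_discount=0.50)
print(f"On-demand only: {no_discount:.1f} months")
print(f"With 50% off:   {with_discount:.1f} months")
```

Halving the billed rate doubles how long the same balance lasts, which is the sense in which discounts and credits compound rather than compete.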
Abstract rates only get you so far. Here are two concrete scenarios — training a model once, and hosting an endpoint around the clock — worked end to end with representative 2026 numbers so you can see how the bill actually assembles.
Both examples are illustrative and use the representative rates from section III. Your real numbers depend on region, instance choice, and current pricing — but the method (instance rate × time × count, plus the small meters) is exactly how you should estimate your own.
Scenario: fine-tune a mid-size model on a single high-end GPU instance (ml.p4d.24xlarge, 8× A100) for one 10-hour training run.
Compute: ~$35/hour × 10 hours = ~$350 for the run on-demand. On managed Spot (say ~60% off), the same run is ~$140.
Storage & data: the training dataset and output artifact in S3 cost cents-to-a-few-dollars for a typical dataset; negligible next to compute.
Total: roughly $140–$350 for the run, depending on Spot vs on-demand. The key insight: training is a spike — you pay for those 10 hours and then nothing, because the cluster is torn down. Run the same fine-tune ten times during development and you are at ~$1.4K–$3.5K, which is exactly the kind of experimentation budget AWS credits are designed to absorb.
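The arithmetic in this example can be sketched as a small estimator. The rate, duration, and 60% Spot discount are the representative figures used above; the function name and parameters are illustrative, and storage is left out because it is negligible here.

```python
def training_run_cost(hourly_rate: float, hours: float,
                      instance_count: int = 1, spot_discount: float = 0.0) -> float:
    """Compute cost of one training run: rate x time x count, less any Spot discount."""
    if not 0.0 <= spot_discount < 1.0:
        raise ValueError("spot_discount must be in [0, 1)")
    return hourly_rate * hours * instance_count * (1.0 - spot_discount)


on_demand = training_run_cost(35.0, 10)                 # one ml.p4d.24xlarge for 10 hours
spot = training_run_cost(35.0, 10, spot_discount=0.60)  # same run at ~60% off
runs = 10

print(f"One run:  ${spot:,.0f} (Spot) to ${on_demand:,.0f} (on-demand)")
print(f"{runs} runs:  ${spot * runs:,.0f} to ${on_demand * runs:,.0f}")
```

Swap in your own instance rate, run length, and instance count to estimate a different job; multi-instance distributed training scales the figure linearly through `instance_count`.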
Scenario: serve a model on a single entry-GPU real-time endpoint (ml.g5.xlarge, ~$1.40/hour representative) continuously for a month, with no auto-scaling.
Compute: ~$1.40/hour × 24 hours × 30 days ≈ $1,008, or roughly $1,000/month. That is the cost whether the endpoint serves a million requests or zero, because real-time endpoints bill for uptime, not usage.
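The same workload, re-architected: if traffic is spiky, moving to serverless inference bills only for the compute actually consumed during inference. For an endpoint that is busy only a few hours a day, that can be a fraction of the $1,000, provided the model fits serverless limits (serverless inference has historically been CPU-only, so confirm that it supports your model before planning around it). If the work is offline (e.g., nightly scoring), batch transform on the same instance type for an hour a night is ~$1.40 × 30 ≈ $42, so ~$40–$50/month instead of ~$1,000.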
Total: the same model can cost ~$1,000/month (real-time, always-on) or a small fraction of that (serverless/batch) depending purely on the endpoint mode. This single decision is why two teams running "the same model" can have wildly different SageMaker bills.
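A minimal sketch of the comparison, assuming the same representative ~$1.40/hour instance rate in every mode and a 30-day month. It models only modes billed by instance time; serverless is billed on a different basis (compute consumed per inference), so it is deliberately not modeled here. The four-busy-hours case is a hypothetical workload.

```python
def monthly_instance_cost(hourly_rate: float, hours_per_day: float, days: int = 30) -> float:
    """Monthly cost of an instance that is billed for hours_per_day each day."""
    if not 0.0 <= hours_per_day <= 24.0:
        raise ValueError("hours_per_day must be between 0 and 24")
    return hourly_rate * hours_per_day * days


RATE = 1.40  # representative entry-GPU rate used in this example ($/hour)

real_time = monthly_instance_cost(RATE, 24)       # always on, billed for uptime
four_busy_hours = monthly_instance_cost(RATE, 4)  # hypothetical: billed only while busy
nightly_batch = monthly_instance_cost(RATE, 1)    # one-hour batch transform job per night

print(f"Real-time, 24/7:      ${real_time:,.0f}/month")
print(f"Busy 4 h/day:         ${four_busy_hours:,.0f}/month")
print(f"Batch, 1 h per night: ${nightly_batch:,.0f}/month")
```

The instance rate is identical in all three lines; only the billed hours change, which is the whole argument for choosing the endpoint mode before tuning anything else.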
Training cost is a function of how big the instance and how long the run (and Spot vs on-demand). Hosting cost is a function of which endpoint mode far more than which model. Optimize those two and you have optimized most of the SageMaker bill.
There is a fairly stable hierarchy of what actually moves a SageMaker bill. Work down this list in order — the top items dwarf the bottom ones.
In rough order of impact for a typical ML team:
Every cost line on this page is credit-eligible. AWS credit programs apply to SageMaker compute, storage, and features exactly as they apply to the rest of AWS — which means a funded ML team can train and host on credits rather than cash.
AWS funds large credit pools through partner-incentive programs because it wants AI and ML workloads consolidated on AWS for the long term. The relevant pools for SageMaker work: Activate Portfolio (up to ~$100K for institutionally-funded startups, general AWS infrastructure), Bedrock / GenAI PoC funding ($10K–$50K for a defined generative-AI proof-of-concept), and the Generative AI Accelerator (up to $1M for selected AI-first companies). Credits auto-apply to your monthly AWS bill until exhausted — including the SageMaker lines.
The mechanic that makes this free to you: the credit pool is funded by AWS, the engagement is delivered by a vetted AWS partner (who AWS pays through separate partner-incentive programs), and CloudRoute is paid a routing commission by that partner. You — the customer — pay $0. There is no invoice from CloudRoute, no procurement cycle, and no cost passed to you; the structural incentives work without you in the payment loop.
In practice this means an ML team can run its experimentation budget (Example A, repeated dozens of times), stand up its training pipeline, and host its endpoints — all drawing down a credit balance rather than burning runway. And because credits stack on top of Savings Plans and Spot, disciplined cost management makes the credits last dramatically longer. CloudRoute routes you to a partner who both builds the SageMaker workload and files the credit application that funds it.
The full credit-program detail lives in the AWS Credits cluster: $100K AWS credits (the headline tier and its four routes), AWS credits for generative-AI startups, and AWS PoC / Bedrock POC funding explained. This page covers the cost; those cover the funding mechanics.
Most SageMaker overspend traces to a short list of recurring mistakes. Knowing them up front is cheaper than learning them from a surprise invoice.
The endpoint mode is the single biggest cost lever in SageMaker serving. Here is how the same model lands on the bill under each mode — illustrative monthly figures using the representative 2026 entry-GPU rate from section III. Verify live rates on the AWS pricing page.
| Endpoint mode | Billing basis | Bills when idle? | Cold starts? | Illustrative monthly cost* | Best when |
|---|---|---|---|---|---|
| Real-time | Per instance-hour, 24/7 | Yes — always-on | No | ~$1,000 (1× entry GPU, continuous) | Steady, latency-sensitive online traffic |
| Serverless | Per inference compute used | No — scales to zero | Yes (occasional) | A fraction of real-time if busy only part of the day | Spiky / intermittent online traffic |
| Asynchronous | Per instance-time while busy | No (can scale to zero with auto-scaling) | Minimal | Proportional to busy time + queue | Large payloads, long-running inferences |
| Batch transform | Per job instance-time | No — transient job | N/A | ~$40–$50 (1 hr/night offline scoring) | Offline, scheduled, whole-dataset scoring |
Situation: Their SageMaker bill had climbed to ~$14K/month: heavy GPU training during model development plus three always-on real-time endpoints serving customer inference, two of which were over-provisioned. The finance team flagged it as the fastest-growing line in AWS, and the runway math did not support it through the next milestone.
What CloudRoute did: Routed within 24 hours to an EU partner with an ML / cost-optimization track record. The partner ran a SageMaker cost review (moved development training to managed Spot, right-sized two of the three endpoints, shifted nightly batch scoring off a real-time endpoint onto batch transform) and, in parallel, filed an Activate Portfolio credit application plus a GenAI PoC application for the inference workload.
Outcome: The optimization work cut the run-rate from ~$14K to ~$7K/month (Spot training + endpoint right-sizing + batch transform for offline scoring). Credits approved within 16 days then covered the remaining bill — taking the team's effective SageMaker cost to ~$0 through the credit window. CloudRoute's commission was paid by the partner from AWS engagement funding; the customer paid $0.
bill cut: ~50% before credits · then to ~$0 on credits · matched in: < 24h · cost to customer: $0
CloudRoute connects ML and data-science teams with vetted AWS partners who optimize SageMaker spend and file the credit applications that fund training and hosting. Customer pays $0 — AWS funds it.