Amazon SageMaker pricing · full 2026 breakdown

Amazon SageMaker pricing — every component, broken down (2026).

SageMaker has no licence fee: you pay per second for the compute, plus storage and the managed features you use. That sounds simple until you count the components: notebooks, training jobs, four serving modes (real-time, serverless, and asynchronous endpoints plus batch transform), Feature Store, Data Wrangler, and storage each bill differently. This page breaks down every line and gives a representative instance-and-GPU cost table, two worked examples, the Savings Plans math, and the levers that actually move the bill.

licence fee: $0 · billing granularity: per second · biggest lever: GPU + endpoint mode · credits to cover it: up to $1M
TL;DR
  • SageMaker has no flat licence fee. You pay for what you use, billed per second: training-job compute, endpoint/notebook compute, batch-transform jobs, Feature Store reads/writes/storage, Data Wrangler processing, and S3 storage. The compute lines dominate — and the GPU instance you pick plus the endpoint mode you choose move the bill more than anything else.
  • The single most common cause of a surprise SageMaker bill is an always-on real-time endpoint (or an idle Studio notebook) left running. Real-time endpoints bill 24/7 whether traffic arrives or not; serverless and batch transform scale to zero. Matching the endpoint mode to the traffic is the highest-leverage cost decision you make.
  • SageMaker Savings Plans cut compute cost meaningfully (AWS cites up to 64% versus on-demand, in exchange for a 1- or 3-year $/hour commitment), and managed Spot training can cut training cost substantially (AWS cites up to 90%). Better still: AWS credits (Activate up to $100K, Bedrock/GenAI PoC $10K–$50K, GenAI Accelerator up to $1M) cover SageMaker compute, storage, and features. CloudRoute routes you to the partner who files them; you pay $0, and AWS funds it.
the model

I. How SageMaker billing works: the shape before the numbers

Before any specific rate, understand the structure: SageMaker is pure usage-based pricing with no licence fee, billed per second of compute, with separate meters for each component. Get the shape right and the numbers slot in.

There is no flat subscription or seat fee for SageMaker. Every dollar you spend traces to a resource you turned on: an instance that ran, storage you held, a feature you invoked. Compute is billed per second with no minimum, which is what makes ephemeral training jobs cheap — you pay for exactly the seconds the cluster existed.

The bill decomposes into a handful of independent meters: (1) training-job compute — per instance-second while a job runs; (2) inference compute — the instances behind real-time/async endpoints, or per-inference compute for serverless, or per-job compute for batch transform; (3) notebook/Studio compute — the instances backing interactive work; (4) Feature Store — writes, reads, and storage; (5) Data Wrangler — processing-instance time; (6) Ground Truth — per labeled object; and (7) storage and data transfer — S3 for data and artifacts, plus any provisioned volumes.

Across almost every real account, compute dominates — training instance-seconds and always-on endpoint instance-hours are the big lines; the feature and storage meters are usually rounding error by comparison until you reach large scale. So the rest of this page weights toward compute: which instances, which endpoint mode, and how to cut both.
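To make the per-second model and the meter breakdown concrete, here is a minimal Python sketch. The names (`per_second_cost`, `MonthlyBill`) are illustrative, not part of any AWS SDK, and the dollar amounts are hypothetical estimates you supply yourself; the code only applies rate ÷ 3600 × seconds × instance count and reports how much of a month's bill comes from the three compute meters.

```python
from dataclasses import dataclass

SECONDS_PER_HOUR = 3600


def per_second_cost(hourly_rate: float, seconds: float, instance_count: int = 1) -> float:
    """Per-second billing: (hourly rate / 3600) x seconds x instance count."""
    return hourly_rate / SECONDS_PER_HOUR * seconds * instance_count


@dataclass
class MonthlyBill:
    """One month's spend, in $, split into the independent meters (1)-(7)."""
    training: float = 0.0          # (1) training-job compute
    inference: float = 0.0         # (2) endpoint / serverless / batch compute
    notebooks: float = 0.0         # (3) Studio / notebook compute
    feature_store: float = 0.0     # (4) Feature Store
    data_wrangler: float = 0.0     # (5) Data Wrangler
    ground_truth: float = 0.0      # (6) Ground Truth labeling
    storage_transfer: float = 0.0  # (7) S3, volumes, data transfer

    def total(self) -> float:
        return (self.training + self.inference + self.notebooks + self.feature_store
                + self.data_wrangler + self.ground_truth + self.storage_transfer)

    def compute_share(self) -> float:
        """Fraction of the total that comes from the compute meters (1)-(3)."""
        total = self.total()
        if total == 0:
            return 0.0
        return (self.training + self.inference + self.notebooks) / total


# Hypothetical month: a few training runs plus one always-on endpoint.
bill = MonthlyBill(training=350, inference=1000, notebooks=60,
                   feature_store=5, storage_transfer=10)
print(round(bill.total()), round(bill.compute_share(), 2))  # 1425 0.99
```

For instance, a 7-minute job on a ~$1.40/hour instance is `per_second_cost(1.40, 420)`, about $0.16, which is why short ephemeral jobs stay cheap.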

One caveat stated once and applied throughout: every dollar figure on this page is representative as of 2026. AWS pricing varies by region and changes over time, and GPU instance pricing in particular moves. Use these numbers to reason about the shape and relative magnitude of cost — then confirm exact current rates on the AWS SageMaker pricing page before you budget.

line by line

II. Pricing by component

Here is each meter, what triggers it, and how to think about its size. The pattern to notice: the things that run continuously cost the most; the things that run transiently are cheap.

Read this as a checklist of everything that can appear on a SageMaker bill, roughly ordered from biggest typical impact to smallest.

  • Training jobs — Billed per instance-second for every instance in the job, from launch to completion. Cost = (per-hour instance rate ÷ 3600) × seconds × instance count. A short job on a small GPU is a few dollars; a multi-day distributed job on many high-end GPUs is the largest single line a research team will see. Managed Spot training can cut this substantially in exchange for interruptibility.
  • Real-time endpoints — Billed per instance-hour for as long as the endpoint exists, 24/7, regardless of whether requests arrive. This is the most expensive idle mode and the #1 source of surprise bills. Cost = per-hour instance rate × instance count × hours up. Auto-scaling adds instances under load (and cost).
  • Serverless inference — Billed for the compute consumed per inference (memory size × duration) plus the amount of data processed; scales to zero when idle, so you pay nothing between bursts. Cheaper than real-time for spiky or intermittent traffic; the trade-off is occasional cold-start latency.
  • Asynchronous inference — Billed per instance-time for the instances running behind the endpoint; with auto-scaling configured down to zero instances, nothing bills while the queue is empty. Suits large payloads and long-running inferences where you do not need a synchronous response.
  • Batch transform — Billed per instance-second of the transient job that scores a dataset — no persistent endpoint, so nothing bills between runs. Usually the cheapest way to score a whole dataset offline (e.g., nightly).
  • Studio / notebook compute — Billed per instance-hour for the compute backing your interactive notebooks and IDE apps while they are running. An idle-but-still-running notebook keeps billing — set auto-shutdown. (There is no charge for the SageMaker domain itself, only the compute apps inside it.)
  • Feature Store — Billed for reads and writes to the online store plus online-store storage; the offline store lives in your S3 bucket and is billed at standard S3 rates. Modest for most workloads; can grow with very high-throughput real-time feature serving.
  • Data Wrangler — Billed per hour for the processing instance used during interactive data prep and for the processing job when you run a transformation flow at scale.
  • Ground Truth (labeling) — Billed per labeled object, plus the human-workforce cost if you use a vendor workforce or Mechanical Turk. Automated/active-learning labeling reduces the count of objects that need human review.
  • Storage & data transfer — S3 storage for datasets and model artifacts at standard S3 rates, plus any EBS volumes attached to instances, plus standard AWS data-transfer charges. Small relative to compute but non-zero at scale.
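The billing formulas in the list above can be written as small functions for quick what-if estimates. This is a sketch under stated assumptions: a month is taken as 30 days (matching the worked examples later on), serverless is modeled as memory × duration only (any other serverless charges are left out), and every rate you pass in, including the per-GB-second price, is a placeholder to replace with a current figure from the AWS pricing page.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month


def realtime_monthly(hourly_rate: float, instance_count: int = 1,
                     hours_up: float = HOURS_PER_MONTH) -> float:
    """Real-time endpoint: bills for every hour it exists, traffic or not."""
    return hourly_rate * instance_count * hours_up


def serverless_compute_monthly(price_per_gb_second: float, memory_gb: float,
                               seconds_per_request: float, requests: int) -> float:
    """Serverless compute: memory size x duration, summed over requests.

    Idle time costs nothing, so only the requests themselves appear here.
    """
    return price_per_gb_second * memory_gb * seconds_per_request * requests


def batch_transform_monthly(hourly_rate: float, hours_per_run: float,
                            runs_per_month: int, instance_count: int = 1) -> float:
    """Batch transform: transient jobs, so nothing bills between runs."""
    return hourly_rate * instance_count * hours_per_run * runs_per_month
```

With the ~$1.40/hour entry-GPU rate used later on this page, `realtime_monthly(1.40)` comes to about $1,008 and `batch_transform_monthly(1.40, 1, 30)` to about $42.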
the one rule that saves the most money

If you remember nothing else: real-time endpoints and notebooks bill while idle; serverless, async, and batch transform scale to zero. The most expensive mistake in SageMaker is leaving an always-on real-time endpoint or a notebook running after you stop using it. Auto-shutdown notebooks; delete test endpoints; default to serverless/batch until steady traffic justifies real-time.

compute rates

III. Instance family & GPU cost table (representative 2026)

Because compute dominates the bill, the instance you pick is the lever. The table below gives representative on-demand SageMaker rates by instance class so you can reason about relative cost: at these rates a high-end GPU instance runs roughly 50–100× the hourly rate of a general-purpose CPU instance, and hundreds of times that of a small one.

These figures are representative as of 2026, in the rough mid-range across major US regions, for SageMaker on-demand usage. They are intended for relative reasoning, not budgeting to the cent. Real-time hosting, training, and notebook usage of the same instance type are billed at slightly different SageMaker rates; the magnitudes below are the right ballpark. Always confirm the exact, current, region-specific rate on the AWS SageMaker pricing page.

representative SageMaker on-demand instance rates · 2026 · verify live on AWS pricing page
| Instance class | Example type | Accelerator | Rough $/hour (on-demand) | Typical use |
|---|---|---|---|---|
| Small CPU | ml.t3 / ml.m5.large | None | ~$0.05–$0.15 | Notebooks, light inference, small tabular |
| General CPU | ml.c5 / ml.m5.xlarge–4xlarge | None | ~$0.20–$1.00 | CPU training, batch scoring, feature jobs |
| Entry GPU | ml.g4dn / ml.g5.xlarge | 1× NVIDIA T4 / A10G | ~$0.70–$2.00 | Small-model training, inference, fine-tuning |
| Mid GPU | ml.g5.12xlarge | 4× NVIDIA A10G | ~$5–$8 | Mid-size training, multi-GPU inference |
| High-end GPU | ml.p4d.24xlarge | 8× NVIDIA A100 | ~$30–$40 | Large-model / deep-learning training |
| Top-end GPU | ml.p5.48xlarge | 8× NVIDIA H100 | ~$60–$100+ | Foundation-model training, large distributed jobs |
| AWS silicon (train) | ml.trn1 | AWS Trainium | Lower cost per unit of training work than an equivalent GPU | Cost-efficient large-model training |
| AWS silicon (infer) | ml.inf2 | AWS Inferentia2 | Lower cost per inference than an equivalent GPU | Cost-efficient high-volume inference |
AWS's own silicon — Trainium for training and Inferentia for inference, programmed via the Neuron SDK — is positioned as cheaper per unit of work than equivalent NVIDIA GPU instances, and is a major cost lever for large or high-volume workloads. See the dedicated Trainium and Inferentia pages. All figures representative for 2026; verify on the AWS pricing page.
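Because the table gives ranges, a quick way to sanity-check a claim like "instance X costs N× instance Y" is to divide the ranges. A minimal sketch using only the representative figures above (not live AWS prices); the dictionary keys and helper names are illustrative.

```python
# Representative on-demand $/hour ranges from the table above: (low, high).
RATES = {
    "small_cpu": (0.05, 0.15),
    "general_cpu": (0.20, 1.00),
    "entry_gpu": (0.70, 2.00),
    "mid_gpu": (5.0, 8.0),
    "high_end_gpu": (30.0, 40.0),
    "top_end_gpu": (60.0, 100.0),
}


def ratio_range(expensive: str, cheap: str) -> tuple[float, float]:
    """Smallest and largest multiple of `expensive` over `cheap` across both ranges."""
    e_lo, e_hi = RATES[expensive]
    c_lo, c_hi = RATES[cheap]
    return e_lo / c_hi, e_hi / c_lo


def midpoint_ratio(expensive: str, cheap: str) -> float:
    """Multiple using the midpoint of each range."""
    return (sum(RATES[expensive]) / 2) / (sum(RATES[cheap]) / 2)


print(f"{midpoint_ratio('high_end_gpu', 'general_cpu'):.0f}x")  # 58x
print(f"{midpoint_ratio('top_end_gpu', 'entry_gpu'):.0f}x")     # 59x
```

Midpoints give one representative multiple; `ratio_range` shows how wide the spread gets once region and instance size vary.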
commitment discounts

IV. SageMaker Savings Plans & Spot: committing to cut the rate

If your SageMaker usage is steady and predictable, you are leaving money on the table paying pure on-demand. Two mechanisms cut the effective rate: Savings Plans (for steady usage) and Spot (for interruptible training).

SageMaker Savings Plans let you commit to a consistent amount of compute usage — measured in dollars per hour — for a one-year or three-year term, in exchange for a meaningful discount versus on-demand. The plan applies automatically across eligible SageMaker usage: Studio notebooks, training jobs, real-time inference, and processing. The longer the commitment and the more you pay up front, the deeper the discount. The trade-off is commitment risk: if your usage drops below the committed level, you still pay for the commitment, so size it to your reliable baseline, not your peak.
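To see why sizing to baseline matters, here is a minimal sketch of the commitment arithmetic. It assumes a simplified model: the $/hour commitment is charged every hour whether used or not, it absorbs up to commitment ÷ (1 − discount) of on-demand-equivalent usage, and anything above that bills at on-demand. The flat 30% discount and the usage pattern are hypothetical; real SageMaker Savings Plans discounts depend on term, payment option, and instance type.

```python
def hourly_cost(on_demand_usage: float, commit: float, discount: float) -> float:
    """Cost of one hour under a Savings Plan (simplified model).

    on_demand_usage: what this hour would cost at on-demand rates, in $.
    commit: committed $/hour, charged whether used or not.
    discount: fractional discount vs on-demand (hypothetical).
    """
    covered = commit / (1 - discount)                # on-demand $ the commitment absorbs
    overflow = max(0.0, on_demand_usage - covered)   # excess bills at on-demand
    return commit + overflow


def total_cost(hourly_usage: list[float], commit: float, discount: float) -> float:
    return sum(hourly_cost(u, commit, discount) for u in hourly_usage)


def baseline_commit(hourly_usage: list[float], discount: float) -> float:
    """Commit just enough to cover the lowest hourly usage: the reliable baseline."""
    return min(hourly_usage) * (1 - discount)


# Hypothetical day: $10/hour baseline for 20 hours, $30/hour peak for 4 hours.
usage = [10.0] * 20 + [30.0] * 4
on_demand_total = sum(usage)                                        # 320
sized_to_baseline = total_cost(usage, baseline_commit(usage, 0.3), 0.3)  # ~248
sized_to_peak = total_cost(usage, 30.0 * (1 - 0.3), 0.3)            # ~504
print(on_demand_total, round(sized_to_baseline), round(sized_to_peak))
```

In this toy day the peak-sized plan costs more than staying fully on-demand, because the commitment is paid through the 20 quiet hours too.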

Spot instances for training are the other big lever. Managed Spot training runs your training jobs on spare AWS capacity at a steep discount versus on-demand, automatically checkpointing so an interrupted job resumes rather than restarting. For non-urgent training that can tolerate interruption and restart, this is one of the largest single savings available — often cutting training compute cost by more than half. It applies to training jobs, not to real-time endpoints (you do not want a production endpoint on interruptible capacity).
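For reference, a sketch of the Spot-related fields in a low-level `CreateTrainingJob` request (the boto3 `create_training_job` call). `EnableManagedSpotTraining`, `StoppingCondition` (with `MaxWaitTimeInSeconds` at least as large as `MaxRuntimeInSeconds`), and `CheckpointConfig` are the fields that enable Spot and checkpointing; the helper names and the bucket path are hypothetical, and the rest of a real request (role, algorithm, input/output config, resource config) is omitted. The savings helper uses the `TrainingTimeInSeconds` and `BillableTimeInSeconds` values that `DescribeTrainingJob` returns.

```python
def spot_training_fields(max_run_s: int, max_wait_s: int, checkpoint_s3_uri: str) -> dict:
    """Spot-related portion of a CreateTrainingJob request (merge into the full request)."""
    if max_wait_s < max_run_s:
        # Max wait covers run time plus time spent waiting for Spot capacity.
        raise ValueError("MaxWaitTimeInSeconds must be >= MaxRuntimeInSeconds")
    return {
        "EnableManagedSpotTraining": True,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_run_s,
            "MaxWaitTimeInSeconds": max_wait_s,
        },
        # SageMaker syncs LocalPath to S3Uri; the training script must write
        # checkpoints there and reload them on start for resume to work.
        "CheckpointConfig": {"S3Uri": checkpoint_s3_uri, "LocalPath": "/opt/ml/checkpoints"},
    }


def spot_savings(training_time_s: int, billable_time_s: int) -> float:
    """Fraction saved vs on-demand, from DescribeTrainingJob timing fields."""
    return 1 - billable_time_s / training_time_s


fields = spot_training_fields(10 * 3600, 20 * 3600, "s3://example-bucket/checkpoints/")
print(spot_savings(3600, 1440))  # 0.6
```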

A practical combined strategy: put your steady baseline (notebooks, the always-on portion of inference) under a Savings Plan, run training on Spot where the schedule allows, and keep spiky or experimental work on serverless/on-demand so you are not committing to capacity you might not use. And if you are credit-funded, the credits stack on top — they apply to the post-discount usage just as they would to on-demand.

Savings Plans vs credits

These are not either/or. Savings Plans lower the rate you are billed; AWS credits pay the bill. A credit-funded team can still use a Savings Plan to stretch the credits further — the discounted usage simply draws down the credit balance more slowly.
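The stretching effect described in the box above is simple division. A minimal sketch, assuming a steady monthly spend and, as a simplification, a Savings Plan discount applied to the whole bill; the $100K balance, $7K/month spend, and 30% discount are hypothetical inputs.

```python
def months_of_credit(credit_balance: float, monthly_on_demand: float,
                     savings_plan_discount: float = 0.0) -> float:
    """Months a credit balance lasts at a steady monthly spend.

    Simplification: the discount is applied uniformly to the whole monthly bill.
    """
    effective_monthly = monthly_on_demand * (1 - savings_plan_discount)
    return credit_balance / effective_monthly


print(round(months_of_credit(100_000, 7_000), 1))       # 14.3
print(round(months_of_credit(100_000, 7_000, 0.3), 1))  # 20.4
```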

real numbers

V. Two worked examples

Abstract rates only get you so far. Here are two concrete scenarios — training a model once, and hosting an endpoint around the clock — worked end to end with representative 2026 numbers so you can see how the bill actually assembles.

Both examples are illustrative and use the representative rates from section III. Your real numbers depend on region, instance choice, and current pricing — but the method (instance rate × time × count, plus the small meters) is exactly how you should estimate your own.

Example A — Train a model once

Scenario: fine-tune a mid-size model on a single high-end GPU instance (ml.p4d.24xlarge, 8× A100) for one 10-hour training run.

Compute: ~$35/hour × 10 hours = ~$350 for the run on-demand. On managed Spot (say ~60% off), the same run is ~$140.

Storage & data: the training dataset and output artifact in S3 cost cents-to-a-few-dollars for a typical dataset; negligible next to compute.

Total: roughly $140–$350 for the run, depending on Spot vs on-demand. The key insight: training is a spike — you pay for those 10 hours and then nothing, because the cluster is torn down. Run the same fine-tune ten times during development and you are at ~$1.4K–$3.5K, which is exactly the kind of experimentation budget AWS credits are designed to absorb.
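Example A's arithmetic as a short script, using the same assumptions as the text: ~$35/hour for ml.p4d.24xlarge, one 10-hour run, and a ~60% Spot discount. The function name is illustrative, and the Spot figure ignores any extra time lost to interruptions.

```python
def training_run_cost(hourly_rate: float, hours: float,
                      instance_count: int = 1, spot_discount: float = 0.0) -> float:
    """Instance rate x time x count, optionally discounted for managed Spot."""
    return hourly_rate * hours * instance_count * (1 - spot_discount)


on_demand = training_run_cost(35.0, 10)                 # ~$350
spot = training_run_cost(35.0, 10, spot_discount=0.60)  # ~$140
ten_runs = (10 * spot, 10 * on_demand)                  # ~$1,400 to ~$3,500
print(f"one run: ${spot:,.0f}-${on_demand:,.0f}; "
      f"ten runs: ${ten_runs[0]:,.0f}-${ten_runs[1]:,.0f}")
```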

Example B — Host an endpoint 24/7

Scenario: serve a model on a single entry-GPU real-time endpoint (ml.g5.xlarge, ~$1.40/hour representative) continuously for a month, with no auto-scaling.

Compute: ~$1.40/hour × 24 × ~30 days ≈ $1,000/month, and that is whether the endpoint serves a million requests or zero, because real-time endpoints bill for uptime, not usage.

The same workload, re-architected: if traffic is spiky, moving to serverless inference bills only for the compute actually consumed during inference — for an endpoint that is busy only a few hours a day, that can be a fraction of the $1,000. If the work is offline (e.g., nightly scoring), batch transform for an hour a night might be ~$40–$50/month instead of ~$1,000.

Total: the same model can cost ~$1,000/month (real-time, always-on) or a small fraction of that (serverless/batch) depending purely on the endpoint mode. This single decision is why two teams running "the same model" can have wildly different SageMaker bills.
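Example B's numbers, plus a rough break-even test for when always-on starts to pay off. Serverless is not billed per instance-hour, so the "cost per busy hour" of a scale-to-zero mode is an assumed average you would estimate from your own traffic; the 2× premium below is purely hypothetical.

```python
HOURS_PER_MONTH = 24 * 30

rate = 1.40                        # representative entry-GPU $/hour from section III
realtime = rate * HOURS_PER_MONTH  # ~$1,008: billed for uptime
batch = rate * 1 * 30              # ~$42: one 1-hour job per night


def breakeven_busy_hours_per_day(realtime_hourly: float,
                                 scale_to_zero_cost_per_busy_hour: float) -> float:
    """Busy hours per day at which a scale-to-zero mode costs the same as always-on."""
    return 24 * realtime_hourly / scale_to_zero_cost_per_busy_hour


# Hypothetical: a scale-to-zero option costing 2x the instance rate per busy hour.
print(round(breakeven_busy_hours_per_day(rate, 2 * rate), 1))  # 12.0
```

Below the break-even (here, busy fewer than ~12 hours a day) the scale-to-zero mode wins; above it, an always-on endpoint, ideally with auto-scaling, starts to make sense.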

the takeaway from both

Training cost is a function of how big the instance is and how long the run lasts (and of Spot vs on-demand). Hosting cost depends far more on which endpoint mode you choose than on which model you serve. Optimize those two and you have optimized most of the SageMaker bill.

optimization

VI. Cost-optimization levers, ranked

There is a fairly stable hierarchy of what actually moves a SageMaker bill. Work down this list in order — the top items dwarf the bottom ones.

In rough order of impact for a typical ML team:

  • 1 · Kill idle real-time endpoints and notebooks — The highest-leverage and most-ignored lever. An always-on endpoint or an un-shut-down notebook bills 24/7. Auto-shutdown notebooks, delete test endpoints the moment an experiment ends, and audit for "zombie" endpoints monthly. This alone often cuts a wasteful bill by the largest single margin.
  • 2 · Match the endpoint mode to the traffic — Serverless for spiky online, batch transform for offline scoring, async for large/slow inferences, real-time only for steady latency-sensitive traffic. As Example B shows, this can be an order of magnitude or more for the same model (~$1,000/month always-on vs ~$40–$50/month as nightly batch transform).
  • 3 · Use Spot for training — Managed Spot training cuts training compute substantially for interruptible jobs, with automatic checkpointing. For development and non-urgent training, this is close to free money.
  • 4 · Right-size the instance — Do not train on an 8×H100 box what fits on a single A10G; do not host on a GPU what runs fine on CPU. Profile the workload and pick the smallest instance that meets the latency/throughput target.
  • 5 · Consider AWS silicon (Trainium / Inferentia) — For large training or high-volume inference, Trainium (training) and Inferentia (inference) are positioned as cheaper per unit of work than equivalent NVIDIA GPUs, via the Neuron SDK. The migration cost is real but the per-job savings compound at scale.
  • 6 · Buy a Savings Plan for the steady baseline — Once usage is predictable, commit your reliable baseline to a 1- or 3-year Savings Plan for a discounted rate. Size to baseline, not peak.
  • 7 · Use auto-scaling on real-time endpoints — Where you must run real-time, configure auto-scaling so you run the minimum instances off-peak and add capacity only under load — rather than provisioning for peak 24/7.
  • 8 · Fund it with AWS credits — The lever that takes the bill to $0 for eligible teams. Credits apply to all of the above usage; combined with right-sizing and Savings Plans, they stretch much further. Covered next.
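Lever 7 is configured through Application Auto Scaling rather than SageMaker itself. Below is a sketch of the two request bodies, built as plain dicts so they can be inspected before sending. `ServiceNamespace="sagemaker"`, the `endpoint/<name>/variant/<variant>` resource ID, the `sagemaker:variant:DesiredInstanceCount` dimension, and the `SageMakerVariantInvocationsPerInstance` metric are the Application Auto Scaling names for SageMaker variants; the endpoint name, variant name, capacity limits, target value, and cooldowns are hypothetical and should come from your own load testing.

```python
def autoscaling_requests(endpoint_name: str, variant_name: str = "AllTraffic",
                         min_instances: int = 1, max_instances: int = 4,
                         invocations_per_instance: float = 70.0) -> tuple[dict, dict]:
    """Build register_scalable_target and put_scaling_policy kwargs for one variant."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    # Off-peak the variant shrinks to min_instances; under load it grows to max.
    target = {**common, "MinCapacity": min_instances, "MaxCapacity": max_instances}
    policy = {
        **common,
        "PolicyName": f"{endpoint_name}-invocations-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Add or remove instances to hold invocations per instance near this value.
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            "ScaleInCooldown": 300,   # cooldown after a scale-in, in seconds
            "ScaleOutCooldown": 60,   # cooldown after a scale-out, in seconds
        },
    }
    return target, policy


def apply_autoscaling(endpoint_name: str) -> None:
    """Send both requests (needs boto3, AWS credentials, and an existing endpoint)."""
    import boto3

    client = boto3.client("application-autoscaling")
    target, policy = autoscaling_requests(endpoint_name)
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)
```

The minimum of 1 keeps a real-time endpoint warm off-peak; if traffic can tolerate going to zero, serverless, async, or batch transform (lever 2) is usually the better fit.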
the $0 path

VII. How AWS credits fund SageMaker training & hosting

Every cost line on this page is credit-eligible. AWS credit programs apply to SageMaker compute, storage, and features exactly as they apply to the rest of AWS — which means a funded ML team can train and host on credits rather than cash.

AWS funds large credit pools through partner-incentive programs because it wants AI and ML workloads consolidated on AWS for the long term. The relevant pools for SageMaker work: Activate Portfolio (up to ~$100K for institutionally-funded startups, general AWS infrastructure), Bedrock / GenAI PoC funding ($10K–$50K for a defined generative-AI proof-of-concept), and the Generative AI Accelerator (up to $1M for selected AI-first companies). Credits auto-apply to your monthly AWS bill until exhausted — including the SageMaker lines.

The mechanic that makes this free to you: the credit pool is funded by AWS, the engagement is delivered by a vetted AWS partner (who AWS pays through separate partner-incentive programs), and CloudRoute is paid a routing commission by that partner. You — the customer — pay $0. There is no invoice from CloudRoute, no procurement cycle, and no cost passed to you; the structural incentives work without you in the payment loop.

In practice this means an ML team can run its experimentation budget (Example A, repeated dozens of times), stand up its training pipeline, and host its endpoints — all drawing down a credit balance rather than burning runway. And because credits stack on top of Savings Plans and Spot, disciplined cost management makes the credits last dramatically longer. CloudRoute routes you to a partner who both builds the SageMaker workload and files the credit application that funds it.

where to read the credit mechanics in full

The full credit-program detail lives in the AWS Credits cluster: $100K AWS credits (the headline tier and its four routes), AWS credits for generative-AI startups, and AWS PoC / Bedrock POC funding explained. This page covers the cost; those cover the funding mechanics.

avoid these

VIII. The pricing mistakes that cost teams the most

Most SageMaker overspend traces to a short list of recurring mistakes. Knowing them up front is cheaper than learning them from a surprise invoice.

  • Leaving a real-time endpoint running after an experiment — The classic. A test endpoint on a GPU instance left up for a month is ~$1,000 of pure waste. Delete endpoints the moment you are done; audit monthly.
  • Defaulting to real-time when serverless or batch would do — Reaching for the always-on mode out of habit when the traffic is spiky or offline. Match the mode to the traffic shape (section II / Example B).
  • Training on a bigger GPU than the job needs — Provisioning an 8×H100 box for a job that fits on one A10G burns roughly 40–70× the hourly rate at the representative figures in section III. Right-size first.
  • Not using Spot for interruptible training — Paying full on-demand for development training runs that could tolerate interruption — leaving more than half the training cost on the table.
  • Buying a Savings Plan sized to peak, not baseline — Over-committing means paying for capacity you do not use. Size the commitment to your reliable baseline and keep variable usage on-demand/serverless.
  • Forgetting to shut down Studio notebooks — Idle notebook compute bills per hour. Enable auto-shutdown so a forgotten notebook does not run all weekend.
  • Budgeting without checking current GPU rates — GPU pricing moves and varies by region. Estimating from a stale number — including the representative ones on this page — without confirming on the AWS pricing page leads to a budget that is off.
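A monthly zombie-endpoint audit can start as a short script. This sketch lists `InService` endpoints through the boto3 `list_endpoints` paginator and flags any whose `LastModifiedTime` is older than a threshold. Last-modified age is only a crude proxy for "unused" (a busy production endpoint may not change for months), so treat the output as a review list and check each endpoint's invocation metrics in CloudWatch before deleting anything; the function names and the 7-day threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_endpoints(endpoints: list[dict], now: datetime, max_age_days: int = 7) -> list[str]:
    """Names of InService endpoints not modified within max_age_days.

    `endpoints` uses the shape of ListEndpoints' "Endpoints" entries
    (EndpointName, EndpointStatus, timezone-aware LastModifiedTime).
    """
    cutoff = now - timedelta(days=max_age_days)
    return [
        e["EndpointName"]
        for e in endpoints
        if e["EndpointStatus"] == "InService" and e["LastModifiedTime"] < cutoff
    ]


def fetch_in_service_endpoints() -> list[dict]:
    """Collect every InService endpoint (needs boto3 and AWS credentials)."""
    import boto3

    paginator = boto3.client("sagemaker").get_paginator("list_endpoints")
    endpoints: list[dict] = []
    for page in paginator.paginate(StatusEquals="InService"):
        endpoints.extend(page["Endpoints"])
    return endpoints


# Usage against a live account:
#   stale_endpoints(fetch_in_service_endpoints(), datetime.now(timezone.utc))
```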
the cost decision that matters most

Endpoint cost comparison — same model, four ways to serve it

The endpoint mode is the single biggest cost lever in SageMaker serving. Here is how the same model lands on the bill under each mode — illustrative monthly figures using the representative 2026 entry-GPU rate from section III. Verify live rates on the AWS pricing page.

| Endpoint mode | Billing basis | Bills when idle? | Cold starts? | Illustrative monthly cost* | Best when |
|---|---|---|---|---|---|
| Real-time | Per instance-hour, 24/7 | Yes, always-on | No | ~$1,000 (1× entry GPU, continuous) | Steady, latency-sensitive online traffic |
| Serverless | Per inference compute used | No, scales to zero | Yes (occasional) | A fraction of real-time if busy only part of the day | Spiky / intermittent online traffic |
| Asynchronous | Per instance-time while busy | No, scales to zero | Minimal | Proportional to busy time + queue | Large payloads, long-running inferences |
| Batch transform | Per job instance-time | No, transient job | N/A | ~$40–$50 (1 hr/night offline scoring) | Offline, scheduled, whole-dataset scoring |
*Illustrative, representative 2026 figures for a single entry-GPU-class workload, for relative reasoning only, not budgeting. The same model can differ by an order of magnitude or more in monthly cost purely on endpoint mode. Confirm current rates on the AWS SageMaker pricing page.
why pay the SageMaker bill in cash?
Cover SageMaker training and hosting with AWS credits — pay $0
a recent match

A SageMaker bill, taken to $0 — anonymized

inquiry · Series-A computer-vision SaaS, Germany
Series-A computer-vision SaaS, 22 people, training and serving custom defect-detection models on SageMaker

Situation: Their SageMaker bill had climbed to ~$14K/month: heavy GPU training during model development plus three always-on real-time endpoints serving customer inference, several of which were over-provisioned. The finance team flagged it as the fastest-growing line in AWS, and the runway math did not support it through the next milestone.

What CloudRoute did: Routed within 24 hours to an EU partner with an ML / cost-optimization track record. The partner ran a SageMaker cost review (moved development training to managed Spot, right-sized two of the three endpoints, shifted nightly batch scoring off a real-time endpoint onto batch transform) and, in parallel, filed an Activate Portfolio credit application plus a GenAI PoC application for the inference workload.

Outcome: The optimization work cut the run-rate from ~$14K to ~$7K/month (Spot training + endpoint right-sizing + batch transform for offline scoring). Credits approved within 16 days then covered the remaining bill — taking the team's effective SageMaker cost to ~$0 through the credit window. CloudRoute's commission was paid by the partner from AWS engagement funding; the customer paid $0.

bill cut: ~50% before credits · then to ~$0 on credits · matched in: < 24h · cost to customer: $0

faq

Common questions

How much does Amazon SageMaker cost?
SageMaker has no licence or subscription fee — you pay only for the compute (billed per second), storage, and managed features you use. The cost is dominated by compute: training-job instance-seconds and the instances behind your endpoints. A one-off fine-tuning run on a high-end GPU might be a few hundred dollars; an always-on real-time endpoint on an entry GPU is roughly $1,000/month; serverless and batch-transform serving cost a fraction of that for the same model. All figures are representative for 2026 — confirm current, region-specific rates on the AWS SageMaker pricing page.
Why is my SageMaker bill so high?
The overwhelmingly common cause is an always-on real-time endpoint (or an idle Studio notebook) left running — those bill 24/7 whether or not they are used. Other frequent causes: training on a larger GPU than the job needs, defaulting to real-time when serverless or batch transform would fit the traffic, and not using Spot for interruptible training. Audit for idle endpoints and notebooks first; that single fix usually accounts for the largest share of a surprise bill.
What is the cheapest way to host a model on SageMaker?
It depends on the traffic. For offline, whole-dataset scoring, batch transform is usually cheapest (a transient job, nothing bills between runs). For spiky or intermittent online traffic, serverless inference is typically cheapest because it scales to zero. Real-time endpoints are the most expensive when idle and only make sense for steady, latency-sensitive traffic. Matching the endpoint mode to the traffic is the single biggest cost lever in SageMaker serving.
How do SageMaker Savings Plans work?
A SageMaker Savings Plan is a commitment to a steady amount of compute usage — measured in dollars per hour — for a one-year or three-year term, in exchange for a discount versus on-demand. It applies automatically across eligible usage (Studio, training, real-time inference, processing). Deeper discounts come from longer terms and more up-front payment. Size the commitment to your reliable baseline usage, not your peak, because you pay for the commitment even if usage drops below it.
Is SageMaker or Bedrock cheaper?
They bill on different bases, so it depends on the workload. Bedrock charges per token with no infrastructure to manage — cheap to start and to run for moderate volumes of calls to an existing foundation model. SageMaker charges per instance-second of compute — which can be cheaper at very high, steady inference volume (especially on AWS silicon) but more expensive to operate and easy to overspend on if endpoints sit idle. For a generative-AI feature over an existing model, Bedrock is usually the cheaper and faster path; for high-volume custom-model serving, SageMaker with right-sized compute can win.
Can I use Spot instances to reduce SageMaker training cost?
Yes — managed Spot training runs training jobs on spare AWS capacity at a steep discount versus on-demand (often more than 50% off), with automatic checkpointing so an interrupted job resumes rather than restarting from scratch. It is one of the largest single savings available for training that can tolerate interruption. It applies to training jobs, not to real-time endpoints — you would not want a production endpoint on interruptible capacity.
Do AWS credits cover SageMaker costs?
Yes. AWS credits apply to SageMaker compute (training and inference), storage, and features just like any other AWS service, auto-applying to your monthly bill until exhausted. Eligible programs include Activate Portfolio (up to ~$100K), Bedrock/GenAI PoC funding ($10K–$50K), and the Generative AI Accelerator (up to $1M). Credits also stack on top of Savings Plans and Spot, so disciplined cost management makes them last longer. CloudRoute routes you to a vetted AWS partner who files the application; the customer pays $0 because AWS funds the pool and the partner pays CloudRoute a routing commission.
Are these SageMaker prices current and exact?
No — every figure on this page is representative as of 2026 and intended for reasoning about the shape and relative magnitude of cost, not for budgeting to the cent. AWS pricing varies by region and changes over time, and GPU instance pricing in particular moves. Always confirm the exact, current, region-specific rates on the official AWS SageMaker pricing page before you commit a budget.

Stop paying the SageMaker bill in cash

CloudRoute connects ML and data-science teams with vetted AWS partners who optimize SageMaker spend and file the credit applications that fund training and hosting. Customer pays $0 — AWS funds it.

matched within: < 24h
credit ceiling: up to $1M
cost to you: $0
Amazon SageMaker pricing — full 2026 breakdown · CloudRoute