bedrock provisioned throughput · the commitment decision · 2026

Bedrock Provisioned Throughput — when it pays off.

A neutral reference for Amazon Bedrock Provisioned Throughput in 2026: what a reserved "model unit" actually buys you, how it differs from on-demand, when it is genuinely required (hosting custom and fine-tuned models), the three commitment tiers (no-commit hourly, 1-month, 6-month), the exact break-even math against on-demand with a worked example, how to buy and manage model units, and how AWS credits can fund the commitment so the build costs you $0.

billing unit: model unit / hour
commitment tiers: none · 1mo · 6mo
required for: custom models
cost with credits: $0
TL;DR
  • Provisioned Throughput (PT) reserves dedicated capacity for a specific Bedrock model — measured in "model units" — and bills a flat hourly rate per unit regardless of how many tokens you push through it. It guarantees a fixed throughput and predictable latency, and it is the only way to serve most custom (fine-tuned, distilled, or imported) models on Bedrock.
  • PT is a commitment decision, not a default. On-demand wins for variable, low, or unknown traffic; PT wins for steady high volume where the flat hourly cost beats per-token billing, and where on-demand throttling is an operational risk. The break-even is a volume threshold: below it, on-demand is cheaper; above it, PT is. The worked example below shows how to find your own line.
  • The 6-month commitment is cheapest per hour, the 1-month is mid, and the no-commit hourly rate is highest but cancellable at any time. The hourly charge accrues 24/7 whether or not the model is used — which is exactly the kind of standing cost AWS credits are good at absorbing. CloudRoute routes you to a credit pool (Activate up to $100K, Bedrock POC $10K–$50K, GenAI Accelerator up to $1M) and a vetted partner to size and manage the commitment — customer pays $0.
the concept

I. What Provisioned Throughput actually is

Provisioned Throughput is the mode you reach for when on-demand stops being enough — either because you need guaranteed capacity, or because you are serving a model that on-demand cannot serve at all. Understanding what you are reserving, and what the reservation guarantees, is the whole decision.

Most Bedrock usage starts on the on-demand path: you call a model, you pay a published rate per 1,000 input and output tokens, and you commit to nothing. Capacity is shared across all accounts in a region and governed by per-account throughput quotas. That is ideal until your traffic is high enough, steady enough, or sensitive enough that shared capacity becomes a liability. Provisioned Throughput (PT) is the answer to that: you reserve dedicated inference capacity for one specific model and pay a flat hourly rate for it.

The unit of reservation is the model unit (MU). A model unit represents a defined, guaranteed amount of throughput for a given model — a certain number of input and output tokens per minute (the exact figures vary by model and are published per model on the Bedrock console). You buy one or more model units of a specific model, and from that moment you are billed an hourly rate per model unit for as long as the provisioned-throughput allocation exists — independent of how many requests you actually send through it. Send zero tokens for an hour and you still pay the hourly rate; saturate the unit and you pay the same hourly rate.

That flat-rate structure is the entire point. On-demand cost is a function of usage (tokens consumed); provisioned cost is a function of time and reserved capacity (model units × hours). PT decouples your bill from your traffic. For a workload with high, predictable volume, a fixed monthly capacity bill is both cheaper and more budgetable than a per-token bill that scales with every request.
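To make the two cost functions concrete, here is a minimal Python sketch. The function names are hypothetical, and the rates are the representative Sonnet-class figures used later on this page ($3 and $15 per 1M input/output tokens on-demand, $25 per model-unit-hour provisioned), treated as placeholders rather than AWS quotes.

```python
# Hypothetical helpers contrasting the two billing modes.
# Rates are representative placeholders, not AWS prices.


def on_demand_cost(input_tokens: int, output_tokens: int,
                   usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Usage-priced: the bill is a function of tokens consumed."""
    return ((input_tokens / 1_000_000) * usd_per_m_input
            + (output_tokens / 1_000_000) * usd_per_m_output)


def provisioned_cost(model_units: int, usd_per_unit_hour: float,
                     hours: float) -> float:
    """Capacity-priced: model units x hourly rate x hours.
    Tokens are not an input to this function at all."""
    return model_units * usd_per_unit_hour * hours


# A quiet month and a busy month, priced both ways (730-hour month, 2 units).
for label, inp, out in [("quiet", 100_000_000, 20_000_000),
                        ("busy", 3_000_000_000, 600_000_000)]:
    od = on_demand_cost(inp, out, 3.0, 15.0)
    pt = provisioned_cost(2, 25.0, 730)
    print(f"{label}: on-demand ${od:,.0f} vs provisioned ${pt:,.0f}")
```

This prints $600 vs $36,500 for the quiet month and $18,000 vs $36,500 for the busy one: the provisioned line does not move because `provisioned_cost` never sees a token count.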

PT also changes the performance contract. On-demand throughput is best-effort within your quota and can be throttled during regional demand spikes (the ThrottlingException that latency-sensitive teams learn to fear). A provisioned model unit delivers guaranteed, consistent throughput and latency — the capacity is yours, isolated from other tenants' spikes. For a production system with an SLA, that predictability is often worth more than the raw cost comparison.

One framing that helps: on-demand is like paying per ride; Provisioned Throughput is like leasing a dedicated vehicle. If you ride occasionally, pay-per-ride is cheaper. If you are commuting heavily every day, the lease is cheaper and always available — but you pay for it even on the days you stay home. The rest of this page is about finding the point where the lease starts to win, and the cases where you have no choice but to lease.

the one-line definition

Provisioned Throughput = reserve dedicated capacity for one Bedrock model in units called model units, billed at a flat hourly rate per unit (optionally discounted with a 1- or 6-month commitment), in exchange for guaranteed throughput and latency and the ability to serve custom models. Cost tracks reserved capacity and time, not tokens consumed.

the tradeoff

II. On-demand vs Provisioned Throughput: the real tradeoff

The choice between on-demand and PT is the largest cost-and-reliability lever on a heavy Bedrock workload. It is not "which is better" — it is "which fits this traffic shape." Get the shape wrong in either direction and you either overpay or get throttled.

The tradeoff has three axes — cost, latency/reliability, and commitment — and they pull in different directions depending on how your traffic behaves.

The honest default for most teams: start on-demand (optionally with prompt caching and Batch for the workloads those suit), measure real, sustained traffic, and only move a specific hot path onto Provisioned Throughput once the volume is high, steady, and predictable enough that the math and the reliability both favor it. Reserving capacity before you have proof of steady volume is the single most common way teams waste money on PT.

Cost — usage-priced vs capacity-priced

On-demand bills per token, so cost rises and falls exactly with usage; at low or spiky volume you pay almost nothing during quiet periods. PT bills per model-unit-hour, a fixed cost that does not move with usage. The implication is a crossover: below some volume, on-demand's per-token total is less than PT's flat monthly bill; above it, PT is less. The whole cost question reduces to: are you above or below the break-even volume? (Section V does the math.)

Latency and reliability — best-effort vs guaranteed

On-demand capacity is shared and best-effort within your quota. Most of the time it is fine, but during regional demand surges you can hit throttling, and tail latency is not contractually fixed. A provisioned model unit gives you isolated, guaranteed throughput — consistent latency and no contention with other tenants' spikes. For an interactive product with a latency SLA, or a pipeline that must not stall, this reliability is frequently the deciding factor even before cost. (Cross-region inference is the on-demand answer to spike-smoothing; PT is the reserved-capacity answer — see the cross-region-inference sibling.)

Commitment and flexibility — none vs months

On-demand commits you to nothing — switch models, change volume, or stop entirely with no penalty. PT asks for a commitment: even the no-commit hourly option ties you to paying for the reserved capacity while it exists, and the discounted 1- and 6-month tiers lock the rate (and the spend) for that term. That rigidity is the cost of the guarantee. It is fine for a stable, proven workload; it is a trap for an experiment whose model choice or volume is still moving.

no choice involved

III. When Provisioned Throughput is required (not optional)

For base models, PT is a cost-and-reliability choice you can take or leave. For an important class of workloads it is not optional at all — it is the only way to run the model. This is the case people most often miss when they budget a custom-model project.

The defining rule: most custom models on Bedrock can only be served via Provisioned Throughput. On-demand is reserved for the shared, multi-tenant base models. Anything that is yours specifically needs dedicated capacity to host it. That covers several categories:

  • Fine-tuned models — A base model you have fine-tuned on your own data produces a private model artifact unique to your account. Serving it requires you to purchase Provisioned Throughput for that custom model — there is no on-demand endpoint for a model only you have. This is the recurring hosting cost that surprises teams: the fine-tuning training run was cheap and one-time, but keeping the result available means standing model-unit-hours.
  • Distilled models — Model distillation trains a smaller, cheaper model to mimic a larger one for a narrow task. The distilled artifact is custom, so it follows the same rule — serve it on Provisioned Throughput. The trade can still be excellent: a small distilled model on one model unit can be far cheaper to run at volume than a frontier model on-demand, but you are now in the PT cost model.
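  • Imported custom-weight models — Where Bedrock supports importing your own model weights (Custom Model Import for supported architectures), the imported model runs on capacity dedicated to your weights rather than on a shared on-demand endpoint, because the weights are private to you. Imported models are metered differently from standard model units (billed per custom model unit for the model copies that are active), so confirm the current Custom Model Import pricing mechanics before budgeting.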
  • Guaranteed-capacity production paths — Even on a base model, a path that must never throttle — a checkout flow, a real-time agent under an SLA, a regulated workflow — may require PT not for cost but because best-effort on-demand capacity is not an acceptable reliability posture. Here PT is "required" by the SLA, not by Bedrock.

The practical consequence for planning: if your roadmap includes fine-tuning, distillation, or importing a model, build the Provisioned-Throughput hosting cost into the budget from day one. The most common custom-model budgeting mistake is pricing only the training and forgetting that the resulting model then sits on a 24/7 hourly charge for as long as it is deployed. For many narrow use cases that standing cost is exactly why the team should reconsider whether a base model with good prompting or RAG would have been cheaper overall (see the fine-tuning sibling for that decision).
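A rough budgeting sketch shows why the hosting line usually dominates. It assumes a hypothetical $2,000 one-time training charge and one model unit at a hypothetical $20 per model-unit-hour kept deployed for six months; neither figure is an AWS quote, and `custom_model_cost` is an illustrative name.

```python
HOURS_PER_MONTH = 730  # ~8,760 hours per year / 12


def custom_model_cost(training_usd: float, model_units: int,
                      usd_per_unit_hour: float, months_deployed: float) -> dict:
    """Split a custom model's cost into the one-time training run and the
    standing Provisioned Throughput hosting charge, which accrues 24/7."""
    hosting_usd = model_units * usd_per_unit_hour * HOURS_PER_MONTH * months_deployed
    total_usd = training_usd + hosting_usd
    return {
        "training": training_usd,
        "hosting": hosting_usd,
        "total": total_usd,
        "hosting_share": hosting_usd / total_usd if total_usd else 0.0,
    }


# Hypothetical: $2,000 training, 1 unit at $20/unit-hour, deployed 6 months.
budget = custom_model_cost(2_000, 1, 20.0, 6)
print(budget)
```

Under these placeholder numbers hosting is $87,600 of an $89,600 total, roughly 98% of the spend; pricing only the training run would miss almost all of it.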

the rule to remember

Base models: on-demand or Provisioned Throughput, your choice. Custom models (fine-tuned, distilled, imported): in most cases Provisioned Throughput only. If you fine-tune, you are committing to a standing hourly hosting cost, so budget for the hosting, not just the training.

how it is billed

IV. Pricing: model units, hourly rates, and the three commitment tiers

Provisioned Throughput pricing is refreshingly simple compared with per-token math: a model unit has an hourly rate, and the rate drops the longer you commit. The complexity is not in the formula — it is in deciding how many units and which term.

You are billed (number of model units) × (hourly rate for that model) × (hours the allocation exists). The hourly rate depends on two things: which model (larger, more capable models cost more per model-unit-hour, mirroring their higher on-demand token rates) and which commitment term you choose. There are three terms (no commitment, 1-month, and 6-month), each described below.

Two cost realities to internalize. First, the charge is per model — a model unit of one model does not serve a different model; if you run several models on PT you pay for each separately. Second, the charge is continuous: a provisioned allocation left running over a weekend, a forgotten test allocation, or an over-provisioned unit count all burn money silently because the meter runs on time, not usage. The discipline of PT is not the purchase decision alone — it is ongoing right-sizing and cleanup.
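Because every billed hour costs the same whether or not it serves traffic, the waste from idle or over-provisioned capacity is easy to quantify. A minimal sketch, assuming you have a per-hour utilization fraction for the allocation; the `wasted_spend` helper and the $40 and $25 hourly rates are hypothetical.

```python
from typing import Sequence


def wasted_spend(hourly_utilization: Sequence[float], model_units: int,
                 usd_per_unit_hour: float) -> float:
    """Dollars paid for reserved capacity that went unused.

    hourly_utilization holds, for each billed hour, the fraction (0.0 to 1.0)
    of reserved throughput actually consumed. Each hour is billed at the full
    rate, so the unused fraction of every hour is pure waste."""
    hourly_charge = model_units * usd_per_unit_hour
    return sum(hourly_charge * (1.0 - min(max(u, 0.0), 1.0))
               for u in hourly_utilization)


# A forgotten test allocation idling over a 48-hour weekend
# (hypothetical $40 per model-unit-hour, no-commitment rate).
weekend_leak = wasted_spend([0.0] * 48, model_units=1, usd_per_unit_hour=40.0)

# Two units that are only half busy for a 10-hour stretch ($25/unit-hour).
half_idle = wasted_spend([0.5] * 10, model_units=2, usd_per_unit_hour=25.0)

print(weekend_leak, half_idle)
```

The weekend leak alone is $1,920 under these placeholder rates, which is why cleanup belongs in the operating routine rather than being left to memory.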

No-commitment (hourly)

Pay the highest per-hour rate, but cancel any time — you are only on the hook for the hours the allocation actually exists. Best for: short-lived needs (a launch spike, a time-boxed campaign, a load test), validating that PT is the right move before committing to a term, or serving a custom model for a finite project. This is the flexible, no-lock option; you trade the discount for the freedom to turn it off.

1-month commitment

Commit to one month and the hourly rate drops below the no-commit rate. You pay for the full month of reserved capacity regardless of usage. Best for: a workload you are confident is steady for at least a month but whose longer-term volume you are not ready to lock — a recently-launched feature with proven early traffic, for example.

6-month commitment

The longest standard term and the cheapest per-hour rate — the deepest discount in exchange for the deepest lock. You pay for six months of capacity. Best for: a mature, high-volume production path with stable model choice and predictable demand — the classic case for reserving capacity. The risk is obvious: if you switch models or your volume falls inside the term, you are still paying for capacity you no longer need.

bedrock provisioned-throughput commitment tiers · representative 2026 structure
Commitment term | Relative hourly rate | You pay for | Flexibility | Best for
No commitment | Highest | Only the hours the allocation exists | Cancel any time | Spikes, tests, validating PT, finite projects
1-month | Mid (discounted) | A full month of capacity | Locked for the month | Proven-steady feature, near-term confidence
6-month | Lowest (deepest discount) | Six months of capacity | Locked for the term | Mature high-volume production, stable model
Rates and exact terms are representative as of 2026 — confirm current model-unit pricing and available commitment terms on the AWS Bedrock pricing page and console. Longer commitments trade flexibility for a lower hourly rate; the hourly charge accrues whether or not the unit is used.
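The table can be turned into a quick term picker for a known duration of need. This sketch assumes hypothetical rates of $40 (no commitment), $32 (1-month), and $25 (6-month) per model-unit-hour, that committed terms are bought whole and back-to-back, and that no-commitment is billed only for the hours needed; real rates and renewal rules should come from the Bedrock pricing page.

```python
import math

HOURS_PER_MONTH = 730


def tier_costs(months_needed: float, model_units: int, rate_no_commit: float,
               rate_1mo: float, rate_6mo: float) -> dict:
    """Total spend per term for a given duration of need."""
    hours_needed = months_needed * HOURS_PER_MONTH
    one_month_terms = math.ceil(months_needed)      # whole months are billed
    six_month_terms = math.ceil(months_needed / 6)  # whole 6-month terms are billed
    return {
        "no_commit": model_units * rate_no_commit * hours_needed,
        "1_month": model_units * rate_1mo * one_month_terms * HOURS_PER_MONTH,
        "6_month": model_units * rate_6mo * six_month_terms * 6 * HOURS_PER_MONTH,
    }


def cheapest_tier(months_needed: float, model_units: int = 1,
                  rates: tuple = (40.0, 32.0, 25.0)) -> str:
    """rates = hypothetical (no-commit, 1-month, 6-month) $/unit-hour."""
    costs = tier_costs(months_needed, model_units, *rates)
    return min(costs, key=costs.get)


for months in (0.25, 3, 6):
    print(months, cheapest_tier(months))
```

With these placeholders a one-week need favors no commitment, three months favors 1-month terms, and a full six months favors the 6-month term: the same "match the term to your confidence" logic as the table, expressed as arithmetic.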
the worked math

V. The break-even math vs on-demand: a worked example

The cost decision for a base model comes down to one number: the volume at which Provisioned Throughput becomes cheaper than on-demand. Below it, stay on-demand; above it, reserve. Here is exactly how to compute your own line, with representative numbers to show the shape.

The method is the same regardless of the actual rates: compare the fixed monthly cost of the model units you would need against the per-token cost of the same traffic on on-demand. Where the two lines cross is your break-even.
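The same comparison can be written as formulas. The symbols are introduced here for illustration: N model units at r dollars per model-unit-hour for H hours a month, and T_in, T_out monthly tokens priced at p_in, p_out dollars per 1M tokens. Holding the input:output mix fixed at k = T_in / T_out and setting the two monthly costs equal gives the break-even volume.

```latex
C_{\mathrm{PT}} = N \, r \, H
\qquad
C_{\mathrm{OD}} = \frac{T_{\mathrm{in}}}{10^{6}} \, p_{\mathrm{in}} + \frac{T_{\mathrm{out}}}{10^{6}} \, p_{\mathrm{out}}

T_{\mathrm{in}} = k \, T_{\mathrm{out}}, \quad C_{\mathrm{OD}} = C_{\mathrm{PT}}
\;\Longrightarrow\;
T_{\mathrm{out}}^{*} = \frac{10^{6} \, N \, r \, H}{k \, p_{\mathrm{in}} + p_{\mathrm{out}}},
\qquad
T_{\mathrm{in}}^{*} = k \, T_{\mathrm{out}}^{*}
```

With the representative numbers in the steps that follow (N = 2, r = $25, H = 730, k = 5, p_in = $3, p_out = $15), this gives T_out* ≈ 1.22 billion and T_in* ≈ 6.08 billion tokens a month.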

Step 1 — size the model units. A model unit delivers a published throughput (tokens/minute) for a given model. Take your peak sustained throughput requirement and divide by the per-unit throughput to get the number of model units you must reserve to serve the load without throttling. Say a workload needs sustained capacity that requires 2 model units of a Sonnet-class model to serve at peak.
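Step 1 can be sketched in a few lines. A model unit guarantees both an input-token and an output-token rate per minute, so the unit count must satisfy whichever limit is tighter. The per-unit throughput figures below are hypothetical placeholders; read the real ones for your model in the Bedrock console.

```python
import math


def model_units_needed(peak_input_tpm: float, peak_output_tpm: float,
                       input_tpm_per_unit: float, output_tpm_per_unit: float) -> int:
    """Smallest whole number of model units covering peak sustained load.

    Both the input and the output limit must hold; the tighter one decides."""
    if input_tpm_per_unit <= 0 or output_tpm_per_unit <= 0:
        raise ValueError("per-unit throughput must be positive")
    for_input = math.ceil(peak_input_tpm / input_tpm_per_unit)
    for_output = math.ceil(peak_output_tpm / output_tpm_per_unit)
    return max(1, for_input, for_output)  # a reservation is at least one unit


# Hypothetical: 150K input / 30K output tokens per minute at peak,
# against 80K input / 16K output tokens per minute per unit.
print(model_units_needed(150_000, 30_000, 80_000, 16_000))
```

Both limits round up to 2 here, matching the two-unit assumption in the example.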

Step 2 — compute the fixed monthly PT cost. Suppose, for illustration, the 6-month-commitment rate for that model is on the order of $25 per model-unit-hour (representative — confirm the real figure). Two units running continuously for a 730-hour month is 2 × $25 × 730 ≈ $36,500/month, fixed, no matter the token volume.

Step 3 — compute the same traffic on-demand. Now price the actual token throughput at on-demand rates. If the workload actually pushes roughly 3.0 billion input and 0.6 billion output tokens a month through those two units, then at representative Sonnet-class on-demand rates ($3 per 1M input, $15 per 1M output) that is (3,000M input tokens × $3/M) + (600M output tokens × $15/M) = $9,000 + $9,000 = $18,000/month. At that volume on-demand is far cheaper, so do not provision.

Step 4 — find the crossover. PT only wins once on-demand cost climbs past the ~$36,500 fixed PT bill. Keep the same 5:1 input:output mix and on-demand reaches ~$36,500/month at roughly 6.1 billion input + 1.2 billion output tokens — i.e. you need to be running those two model units near saturation, around the clock, before reservation pays off purely on cost. The lesson is blunt: on a base model, Provisioned Throughput beats on-demand on cost only at genuinely high, sustained utilization. If your units would sit half-idle, on-demand is cheaper.
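Steps 2 to 4 can be reproduced in a short Python sketch using the same representative rates; the function names are hypothetical and the rates are placeholders to be swapped for current AWS pricing.

```python
HOURS_PER_MONTH = 730


def monthly_pt_cost(model_units: int, usd_per_unit_hour: float) -> float:
    return model_units * usd_per_unit_hour * HOURS_PER_MONTH


def monthly_on_demand_cost(input_tokens: float, output_tokens: float,
                           usd_per_m_in: float, usd_per_m_out: float) -> float:
    return input_tokens / 1e6 * usd_per_m_in + output_tokens / 1e6 * usd_per_m_out


def break_even_tokens(pt_monthly_usd: float, in_per_out: float,
                      usd_per_m_in: float, usd_per_m_out: float) -> tuple:
    """Monthly (input, output) tokens at which on-demand equals the fixed PT
    bill, keeping in_per_out input tokens for every output token."""
    # Cost of 1M output tokens plus the in_per_out million input tokens that come with them.
    usd_per_m_output_bundle = in_per_out * usd_per_m_in + usd_per_m_out
    out_tokens = pt_monthly_usd / usd_per_m_output_bundle * 1e6
    return in_per_out * out_tokens, out_tokens


pt = monthly_pt_cost(2, 25.0)                          # Step 2
od = monthly_on_demand_cost(3.0e9, 0.6e9, 3.0, 15.0)   # Step 3
be_in, be_out = break_even_tokens(pt, 5, 3.0, 15.0)    # Step 4
print(f"PT ${pt:,.0f}/mo, on-demand ${od:,.0f}/mo")
print(f"break-even ~{be_in / 1e9:.2f}B input + {be_out / 1e9:.2f}B output tokens")
print(f"volume must grow {pt / od:.2f}x (same mix) before PT wins on cost")
```

The last line makes the lesson explicit: at these rates, traffic would have to roughly double (about 2.03x) before the two reserved units pay off on cost alone.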

Two things shift the line in PT's favor beyond raw cost. (1) Reliability: if on-demand throttling would breach an SLA, the guaranteed capacity can justify PT below the pure cost crossover. (2) Custom models: if the model is fine-tuned or distilled, there is no on-demand line to compare against — PT is the only option, and the question becomes "how few model units can serve the load," not "PT or on-demand." For everything else, the rule of thumb stands: provision only when you can keep the units busy.

the break-even rule of thumb

Provisioned Throughput beats on-demand on cost only when reserved model units run at high, sustained utilization (roughly speaking, busy most hours of most days). Idle reserved capacity is pure waste. Below the crossover, on-demand — plus Batch and prompt caching where they fit — is cheaper. Above it, and for guaranteed-SLA or custom-model paths, reserve.

buying and managing

VI. How to buy and manage model units

Provisioning throughput is a few clicks (or an API call), but managing it well — picking the term, right-sizing the unit count, and not leaking idle capacity — is where the cost discipline lives. Here is the lifecycle end to end.

You purchase Provisioned Throughput from the Amazon Bedrock console (Provisioned throughput section) or programmatically via the API/SDK/CloudFormation. The flow is consistent:

  • Pick the model — Choose the specific base model — or select your custom (fine-tuned / distilled / imported) model, which can only be served this way. Throughput is reserved against that one model.
  • Choose the commitment term — No-commitment (hourly, cancellable), 1-month, or 6-month. This sets your hourly rate. Match the term to how confident you are in the workload's longevity — short for unproven, long only for mature, stable demand.
  • Set the number of model units — Size from measured peak sustained throughput, not a guess. Under-provisioning reintroduces throttling; over-provisioning burns money on idle capacity. Start from real on-demand traffic data wherever possible.
  • Confirm and deploy — On confirmation you receive a provisioned-model identifier (an ARN) that you target in your Bedrock inference calls instead of the shared base-model ID. Billing for the model-unit-hours begins immediately and runs until you delete the allocation (or the committed term ends, per its rules).
  • Monitor utilization — Track invocation and token-throughput metrics in Amazon CloudWatch against the capacity you reserved. Consistently low utilization means you over-provisioned (cut units or move back to on-demand); sustained saturation with throttling means you need another unit.
  • Decommission promptly — For no-commitment allocations, delete them the moment the need ends — a forgotten hourly allocation is a silent recurring charge. For committed terms, plan the term boundary so you are not auto-renewing capacity you no longer use.
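The purchase and decommission steps map onto the Bedrock control-plane API. A minimal sketch, assuming the boto3 `bedrock` client's `create_provisioned_model_throughput` and `delete_provisioned_model_throughput` operations; the client is passed in rather than created here so the request-building logic can be checked without AWS credentials, and the names `purchase`, `decommission`, and the model ID in the comment are illustrative placeholders.

```python
import uuid
from typing import Optional

# commitmentDuration values for CreateProvisionedModelThroughput;
# None means no commitment (hourly, cancellable).
COMMITMENT_TERMS = {None, "OneMonth", "SixMonths"}


def provisioned_throughput_request(name: str, model_id: str, model_units: int,
                                   commitment: Optional[str] = None) -> dict:
    """Build keyword arguments for create_provisioned_model_throughput."""
    if commitment not in COMMITMENT_TERMS:
        raise ValueError(f"unknown commitment term: {commitment!r}")
    if model_units < 1:
        raise ValueError("reserve at least one model unit")
    request = {
        "provisionedModelName": name,
        "modelId": model_id,            # base model ID or your custom model ARN
        "modelUnits": model_units,      # size from measured peak throughput
        "clientRequestToken": str(uuid.uuid4()),  # idempotency token
    }
    if commitment is not None:
        request["commitmentDuration"] = commitment
    return request


def purchase(bedrock_client, name: str, model_id: str, model_units: int,
             commitment: Optional[str] = None) -> str:
    """Create the allocation and return its ARN; billing begins on creation."""
    request = provisioned_throughput_request(name, model_id, model_units, commitment)
    response = bedrock_client.create_provisioned_model_throughput(**request)
    return response["provisionedModelArn"]


def decommission(bedrock_client, provisioned_model_arn: str) -> None:
    """Delete the allocation so the hourly meter stops."""
    bedrock_client.delete_provisioned_model_throughput(
        provisionedModelId=provisioned_model_arn)


# Usage (requires AWS credentials; the model ID is a placeholder):
#   client = boto3.client("bedrock")
#   arn = purchase(client, "support-bot-pt", "<model-id-or-custom-model-arn>", 1, "OneMonth")
```

The returned ARN is what you pass as `modelId` to the `bedrock-runtime` inference calls in place of the base-model ID; check the current API reference for which commitment terms a given model supports.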

A clean operating pattern that many teams converge on: serve interactive traffic on-demand with prompt caching, run bulk jobs on Batch, and reserve Provisioned Throughput only for the one or two hot, steady paths (or any custom model) that genuinely justify it — then watch CloudWatch and resize. PT is not an all-or-nothing switch for the account; it is a surgical tool for specific paths. The cost-engineering value is in applying it precisely and cleaning it up rigorously — exactly the kind of ongoing FinOps work a vetted partner handles in the engagements CloudRoute routes.
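The "watch CloudWatch and resize" loop can be reduced to a small decision rule. This sketch assumes you already have hourly input and output token sums for the allocation (for example from the token-count metrics Bedrock publishes to CloudWatch; verify the metric names and dimensions for provisioned models in your account) plus a count of hours with throttled requests. The per-unit capacities and the 40% / 90% thresholds are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def hourly_utilization(input_tokens_per_hour: Sequence[float],
                       output_tokens_per_hour: Sequence[float],
                       model_units: int, input_tpm_per_unit: float,
                       output_tpm_per_unit: float) -> List[float]:
    """Fraction of reserved capacity used each hour; the busier dimension wins."""
    input_cap = model_units * input_tpm_per_unit * 60    # tokens per hour
    output_cap = model_units * output_tpm_per_unit * 60
    return [min(1.0, max(i / input_cap, o / output_cap))
            for i, o in zip(input_tokens_per_hour, output_tokens_per_hour)]


def resize_advice(utilization: Sequence[float], throttled_hours: int = 0,
                  low: float = 0.4, high: float = 0.9) -> str:
    """Mirror the rule of thumb: low utilization means over-provisioned;
    sustained saturation with throttling means another unit is needed."""
    if not utilization:
        raise ValueError("need at least one hour of data")
    average = mean(utilization)
    if throttled_hours > 0 and average >= high:
        return "add a unit"
    if average < low:
        return "cut units or move back to on-demand"
    return "keep"


# Hypothetical: 1 unit rated at 80K input / 16K output tokens per minute.
usage = hourly_utilization([0, 0], [480_000, 960_000], 1, 80_000, 16_000)
print(usage, resize_advice([0.1] * 24))
```

The thresholds are a judgment call; the point is to review them on a schedule rather than wait for the bill.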

how it becomes $0

VII. How AWS credits fund the commitment

Provisioned Throughput is the most "committed" line on a Bedrock bill — a standing hourly charge for months. That is precisely the kind of cost AWS credits are designed to absorb, which changes the risk calculus of reserving capacity at all.

Provisioned-Throughput charges are ordinary Bedrock spend, so they are fully credit-eligible — credits in your AWS account apply automatically against the model-unit-hours just as they do against on-demand tokens, fine-tuning, and embeddings. The relevant pools are the familiar ones: AWS Activate (general startup credits, commonly up to $100K for institutionally-funded startups), a dedicated Bedrock / Generative-AI POC pool ($10K–$50K) aimed at proving out a GenAI use case, and the competitive Generative AI Accelerator (awards up to $1M for a small cohort of AI-first startups).

Why this matters specifically for PT: the scariest thing about a commitment term is paying for reserved capacity during the months before a workload has fully ramped — the "what if volume does not materialize" risk. When the commitment is drawn from a credit pool rather than runway, that risk is largely defused. You can reserve the capacity a fine-tuned model or a high-SLA path needs, run it through launch and ramp, and let credits cover the standing hourly cost while you prove the workload out. Cost discipline becomes "make the credits last" rather than "protect the bank balance."
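"Make the credits last" is simple arithmetic: divide the credit balance by the monthly burn. A minimal sketch with hypothetical inputs ($100K of credits, one model unit at $20 per model-unit-hour, and $2,000 a month of other Bedrock spend); actual credit terms, expiry dates, and eligible services depend on the program.

```python
def credit_runway_months(credit_usd: float, monthly_pt_usd: float,
                         other_monthly_usd: float = 0.0) -> float:
    """Months the credit balance covers at the current monthly burn."""
    burn = monthly_pt_usd + other_monthly_usd
    if burn <= 0:
        return float("inf")  # nothing is drawing down the credits
    return credit_usd / burn


# Hypothetical: 1 unit at $20/unit-hour for 730 hours, plus $2,000/month other spend.
months = credit_runway_months(100_000, 1 * 20.0 * 730, 2_000)
print(f"credits cover about {months:.1f} months")
```

Under these placeholders the credits cover about six months, which is roughly one 6-month term; every idle unit shortens that window.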

The practical mechanic is that these pools are largely partner-filed — requested through the AWS Partner Network (the ACE program), not a public self-serve form — which is why teams route through an AWS partner rather than applying alone. CloudRoute matches you to the right pool for your stage and to a vetted AWS DevOps/ML partner who both files the credit application and does the actual cost engineering: sizing the model units from real traffic, choosing the commitment term, wiring CloudWatch alarms on utilization, and cleaning up idle allocations. The customer pays $0 — AWS funds the credit pool, AWS pays the partner through engagement-funding programs, and the partner pays CloudRoute a routing commission. You never see an invoice. (For the credit mechanics, see AWS credits for generative-AI startups and the Bedrock POC funding page.)

on-demand vs provisioned

On-Demand vs Provisioned Throughput (1-month / 6-month) — when each wins

The scannable version of the whole decision: on-demand against the two main provisioned commitment tiers, across cost shape, performance, commitment, and the workloads each one wins. Figures are representative 2026 illustrations, not quotes.

Dimension | On-Demand | Provisioned, 1-month | Provisioned, 6-month
How you pay | Per token (input + output) | Hourly per model unit | Hourly per model unit
Cost shape | Scales with usage | Fixed for the month | Fixed for the term
Relative hourly rate | n/a (usage-priced) | Discounted vs no-commit | Cheapest per hour
Throughput / latency | Best-effort within quota | Guaranteed, isolated | Guaranteed, isolated
Throttling risk | Possible at spikes | None within reserved capacity | None within reserved capacity
Commitment | None, cancel any time | 1 month locked | 6 months locked
Serves custom models? | No | Yes | Yes
When it wins | Variable / low / unknown volume; prototypes | Proven-steady volume, near-term confidence | Mature high-volume production; stable model
When it winsVariable / low / unknown volume; prototypesProven-steady volume, near-term confidenceMature high-volume production; stable model
There is also a no-commitment hourly PT option (highest rate, cancellable any time) for spikes and validation. Custom (fine-tuned / distilled / imported) models can only be served on Provisioned Throughput. On a base model, PT beats on-demand on cost only at high, sustained utilization — see the break-even section.
before you commit to a model unit
Get AWS credits to fund the commitment — and a partner to size it (you pay $0)
Get matched in 24h →
a recent match

A fine-tuned model that needed reserved capacity — funded at $0 — anonymized

inquiry · Series-A vertical-AI SaaS, Singapore
Series-A vertical-AI SaaS, 24 people, a fine-tuned domain model plus a high-SLA interactive path

Situation: The team had fine-tuned a domain-specific model on their proprietary data and discovered at deployment time that serving it required Provisioned Throughput — an ongoing hourly cost they had not budgeted, because they had priced only the one-time fine-tuning run. On top of that, their interactive product path could not tolerate on-demand throttling under load. They needed reserved capacity for both, but were wary of committing months of standing cost out of a runway earmarked for hiring.

What CloudRoute did: CloudRoute matched them within 24 hours to a Singapore-region AWS partner with GenAI cost-engineering experience. The partner (1) sized the model units from real traffic — one unit for the custom model, one for the SLA path — rather than over-provisioning; (2) put the proven, steady custom-model path on a 6-month commitment for the deepest rate and kept the still-ramping SLA path on a 1-month commitment; (3) wired CloudWatch utilization alarms so idle capacity would be caught; and (4) filed a Bedrock POC credit application plus an Activate Portfolio application to fund the whole commitment.

Outcome: The reserved capacity went live with guaranteed throughput on both paths, and the entire standing PT cost — plus the rest of the Bedrock bill — was covered by the approved credits, so the team paid $0 during launch and ramp. As volume on the SLA path proved out, the partner rolled it onto a 6-month commitment too. CloudRoute's commission was paid by the partner from AWS engagement funding, not by the customer.

reserved: 2 model units · terms: 6mo + 1mo · credits secured: POC + Activate · out-of-pocket during build: $0

faq

Common questions

What is Amazon Bedrock Provisioned Throughput?
Provisioned Throughput reserves dedicated inference capacity for a specific Bedrock model, measured in "model units," and bills a flat hourly rate per model unit regardless of how many tokens you send through it. It guarantees consistent throughput and latency (no shared-capacity throttling) and is the only way to serve most custom models. You can buy it with no commitment (hourly, cancellable) or commit to a 1-month or 6-month term for a lower hourly rate.
What is a model unit in Bedrock Provisioned Throughput?
A model unit (MU) is the unit of reserved capacity — it represents a defined, guaranteed amount of throughput (input and output tokens per minute) for one specific model, with the exact figures published per model. You reserve one or more model units and pay an hourly rate per unit for as long as the allocation exists. A model unit is tied to a single model; running several models on Provisioned Throughput means paying for each separately.
When should I use Provisioned Throughput instead of on-demand?
Use on-demand for variable, low, or unknown traffic and for prototypes — you pay only for what you use and commit to nothing. Use Provisioned Throughput when (1) you are serving a custom (fine-tuned, distilled, or imported) model, which on-demand cannot serve at all; (2) you have high, steady, predictable volume where the flat hourly cost beats per-token billing; or (3) you have a latency/reliability SLA that best-effort on-demand capacity could breach during spikes. On a base model, PT wins on cost only at high sustained utilization.
Is Provisioned Throughput required for fine-tuned models on Bedrock?
In most cases, yes. Most custom models (fine-tuned, distilled, or imported via Custom Model Import) are served on dedicated capacity rather than a shared on-demand endpoint, because the model is unique to your account; imported models are metered in their own custom model units, so confirm the current options for your specific model. This is why custom-model budgeting must include the ongoing hosting cost: the fine-tuning training run is a one-time charge, but keeping the resulting model deployed means paying for capacity continuously for as long as it is available.
How is Provisioned Throughput priced?
You pay (number of model units) × (hourly rate for that model) × (hours the allocation exists). The hourly rate depends on the model (larger models cost more per model-unit-hour) and the commitment term: no-commitment has the highest hourly rate but is cancellable any time; a 1-month commitment is discounted; a 6-month commitment is the cheapest per hour. The charge accrues whether or not the model is used. All exact figures are representative as of 2026 — confirm current rates on the AWS Bedrock pricing page.
How do I calculate the break-even between Provisioned Throughput and on-demand?
Compare the fixed monthly PT cost against the per-token on-demand cost of the same traffic. Step 1: size the model units needed from peak sustained throughput. Step 2: compute fixed PT cost = units × hourly rate × hours per month. Step 3: price the same monthly token volume at on-demand rates. The crossover — where on-demand cost rises past the fixed PT bill — is your break-even. Practically, PT beats on-demand on cost only when reserved units run at high, sustained utilization; idle reserved capacity is waste. Reliability needs or custom models can justify PT below the pure cost crossover.
How do I buy and manage Provisioned Throughput?
Purchase it from the Amazon Bedrock console (Provisioned throughput section) or via the API/SDK/CloudFormation: pick the model, choose the commitment term, set the number of model units (size from measured peak traffic, not a guess), and confirm. You get a provisioned-model ARN to target in your inference calls, and billing starts immediately. Manage it by monitoring utilization in CloudWatch — cut or add units as needed — and decommission no-commitment allocations the moment the need ends, since a forgotten allocation is a silent recurring charge.
Can AWS credits cover Provisioned Throughput costs?
Yes — Provisioned-Throughput charges are ordinary Bedrock spend and are fully credit-eligible; credits in your AWS account apply automatically against the model-unit-hours. This is especially useful for PT because it defuses the main risk of a commitment term — paying for reserved capacity before a workload has ramped. The relevant pools (AWS Activate up to $100K, Bedrock/GenAI POC $10K–$50K, GenAI Accelerator up to $1M) are largely partner-filed via the AWS Partner Network. CloudRoute matches you to the right pool and a vetted AWS partner who files the application and sizes/manages the commitment — customer pays $0, AWS funds it.

Reserve the capacity — let AWS fund the commitment

Whether you need Provisioned Throughput to serve a fine-tuned model or to guarantee an SLA at scale, the standing hourly cost is exactly what AWS credits absorb. CloudRoute routes you to the right credit pool (Activate up to $100K, Bedrock POC $10K–$50K, GenAI Accelerator up to $1M) and a vetted AWS partner to size and manage the commitment. Customer pays $0.

matched within: < 24h
GenAI credit ceiling: up to $1M
cost to you: $0