
amazon bedrock pricing calculator · 2026

Amazon Bedrock pricing calculator — the formula, the reference tables, your estimate.

A practical way to estimate your Amazon Bedrock bill before you build. This page gives you four things: the one cost formula; pre-computed monthly-cost tables for the three most common workloads (a chatbot, a RAG app, and a batch job) at low, medium, and high volume across Amazon Nova Lite, Claude Haiku, Claude Sonnet, and Llama; the four levers that move the number; and a step-by-step walkthrough to estimate your own bill in about five minutes. All figures are representative as of 2026, so confirm current rates on the AWS pricing page. An interactive calculator widget is on the way.


the formula: tokens × rate × requests

scenarios costed: chatbot · RAG · batch

biggest lever: model choice

cost with credits: $0

TL;DR

The Bedrock cost formula for a text workload is: (input tokens × input rate + output tokens × output rate) × requests per month, summed across the models you call. Add per-image charges for image models and a one-time training + ongoing hosting charge for fine-tuned models. Everything else flows from those numbers.
Model choice is the dominant lever — the same workload can cost 50–100× more on a frontier model than on a small one. The other three big levers: Batch (~50% cheaper for async jobs), prompt caching (cuts the cost of repeated context, often 30–55% on chat/RAG), and Provisioned Throughput (reserved capacity that wins only at steady high volume). The reference tables below show real monthly numbers for each.
A prototype is single-digit dollars; production at scale runs to thousands per month. That gap is exactly what AWS credits cover — Activate (up to $100K), a Bedrock/GenAI POC pool ($10K–$50K), and the GenAI Accelerator (up to $1M). They are largely partner-filed. CloudRoute routes you to the right pool and a vetted AWS partner, so the build costs you $0.

the math

I. The Bedrock cost formula

Every Bedrock estimate reduces to one formula. Once you can write it down for your workload, a "calculator" is just arithmetic — and you can sanity-check any number a tool gives you.

Bedrock text models are billed on tokens. A token is roughly ¾ of an English word, so 1,000 tokens ≈ 750 words. Every request is metered in two directions: input tokens (your system prompt + any conversation history or retrieved context + the user message) and output tokens (what the model generates). Each direction has its own published rate, almost always per 1,000 tokens (some pages quote per-million; divide by 1,000 to convert).

The monthly cost of a single text workload is therefore built from four inputs:

input_tokens: everything you send per request, meaning the system prompt plus conversation history (for chat) or retrieved chunks (for RAG) plus the user input. This is usually the larger of the two in chat and RAG.
output_tokens: what the model writes back. Output is typically priced 3–5× higher than input, so verbose generation is dominated by output cost.
input_rate / output_rate: set mainly by which model you pick, which makes model choice the single biggest lever (a 50–100× range from the smallest to frontier models). Region and pricing mode (on-demand, Batch, cached) shift the rates too.
requests_per_month: your traffic, counted as messages, queries, or items processed per month.

the formula

Monthly cost = ( input_tokens / 1,000 × input_rate + output_tokens / 1,000 × output_rate ) × requests_per_month, where rates are per 1,000 tokens and token counts are per request. Sum this across every model you call. For image models, swap in images × price_per_image. For embeddings, only the input side is charged. For fine-tuned models, add a one-time training charge plus an ongoing hourly hosting charge on Provisioned Throughput.
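
The formula translates directly into a few lines of Python. This is an illustrative sketch, not an official calculator: the function name `monthly_cost` and its parameter names are invented for this page, and the rates in the example are the representative Nova Lite figures from the rate card in §II. Token counts are divided by 1,000 because the rates are quoted per 1,000 tokens.

```python
def monthly_cost(input_tokens, output_tokens, input_rate, output_rate, requests_per_month):
    """On-demand monthly cost in USD for one text workload.

    Token counts are per request; rates are USD per 1,000 tokens.
    """
    cost_per_request = (input_tokens / 1000) * input_rate + (output_tokens / 1000) * output_rate
    return cost_per_request * requests_per_month


# Medium chatbot on Nova Lite: 1,500 in / 500 out, 300,000 messages per month.
nova_lite_chat = monthly_cost(1500, 500, 0.00006, 0.00024, 300_000)
print(f"${nova_lite_chat:,.2f}")  # about $63
```

Calling the function once per model and summing the results gives the multi-model total the formula describes.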

the rate card

II. Representative per-model rates used in this page

To make the reference tables concrete, here are the representative 2026 rates the math below uses. These are illustrative — confirm current rates on the AWS Bedrock pricing page before budgeting.

Four models span the practical cost range, and they are the ones the reference tables price: Amazon Nova Lite (a cheap, fast multimodal model), Claude Haiku (fast and inexpensive), Claude Sonnet (the all-round workhorse), and Llama (large, ~70B-class; a capable open-weight option). Notice the spread: Nova Lite is roughly 50× cheaper on input than Sonnet. That spread is why "which model" dominates every estimate.

representative on-demand rates used in this page · per 1K tokens · 2026

Model	Input / 1K	Output / 1K	Input / 1M	Output / 1M	Typical role
Amazon Nova Lite	$0.00006	$0.00024	$0.06	$0.24	Cheap, fast, multimodal
Claude Haiku	$0.00025	$0.00125	$0.25	$1.25	Fast, low-cost, high-throughput
Llama (large, ~70B)	$0.00265	$0.0035	$2.65	$3.50	Capable open-weight
Claude Sonnet	$0.003	$0.015	$3.00	$15.00	Best all-round workhorse

Representative 2026 figures for relative comparison only; confirm current rates on the AWS Bedrock pricing page. Output is typically 3–5× input (Llama, at about 1.3×, is the exception in this table). Rates vary by region and exclude Batch and prompt-caching discounts. Amazon Nova Micro is cheaper still, and Claude Opus-class and Nova Premier are more expensive; the four above bracket the common range.
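
To check the per-1K and per-1M columns against each other, or to see the spreads for yourself, the rate card can be held as data. The sketch below hard-codes the representative rates from the table above (they are not live AWS prices); the names `RATES_PER_1K` and `per_million` are illustrative.

```python
# Representative per-1K-token rates from the table above (USD), not live AWS prices.
RATES_PER_1K = {
    "Amazon Nova Lite": (0.00006, 0.00024),
    "Claude Haiku": (0.00025, 0.00125),
    "Llama (large, ~70B)": (0.00265, 0.0035),
    "Claude Sonnet": (0.003, 0.015),
}


def per_million(rate_per_1k):
    """Convert a per-1K-token rate to a per-1M-token rate."""
    return rate_per_1k * 1000


baseline_input = RATES_PER_1K["Amazon Nova Lite"][0]
for model, (rate_in, rate_out) in RATES_PER_1K.items():
    print(
        f"{model:<22} in ${per_million(rate_in):.2f}/1M  out ${per_million(rate_out):.2f}/1M  "
        f"output/input {rate_out / rate_in:.1f}x  input vs Nova Lite {rate_in / baseline_input:.0f}x"
    )
```

With these figures the output-to-input ratio comes out at 4 to 5 for three of the models and about 1.3 for Llama, and Sonnet's input rate is 50 times Nova Lite's.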

pre-computed scenarios

III. Reference cost tables: chatbot, RAG, batch (low / medium / high)

These are the tables most people want: the monthly on-demand cost of three common workloads, at three volumes, across the four models. Each table states its per-request token assumptions so you can scale it to your own numbers. All figures use the rate card above and are representative as of 2026.

Read these as shape, not gospel. The absolute dollars depend on rates that change; the ratios between models and between volumes are the durable insight. Two patterns jump out immediately: the cost scales linearly with volume (10× the traffic = 10× the bill), and the model choice swings the bill by 50–100× at any given volume. Pick the cheapest model that passes your quality bar and you have already won most of the cost battle.

Scenario A — Chatbot / assistant

Assumptions per message: ~1,500 input tokens (a system prompt plus a few turns of history plus the user message) and ~500 output tokens (a typical answer). Volumes: low = 30,000 messages/mo, medium = 300,000, high = 3,000,000.

chatbot · monthly on-demand cost by model · in 1,500 / out 500 per msg · 2026

Volume (msgs/mo)	Nova Lite	Claude Haiku	Llama 70B	Claude Sonnet
Low — 30,000	~$6	~$30	~$172	~$360
Medium — 300,000	~$63	~$300	~$1,718	~$3,600
High — 3,000,000	~$630	~$3,000	~$17,175	~$36,000

On-demand, no caching. Add prompt caching and the chat numbers typically fall 30–55% because the system prompt and history repeat (see §IV). A chatbot is the workload where caching pays off most.

Scenario B — RAG application

Assumptions per query: ~4,000 input tokens (the user query plus several retrieved document chunks plus a system prompt) and ~400 output tokens (a grounded answer). RAG is input-heavy because you stuff retrieved context into the prompt. Volumes: low = 20,000 queries/mo, medium = 200,000, high = 2,000,000. Embedding the corpus is a separate, usually small, one-time-ish cost (see note).

RAG · monthly on-demand cost by model · in 4,000 / out 400 per query · 2026

Volume (queries/mo)	Nova Lite	Claude Haiku	Llama 70B	Claude Sonnet
Low — 20,000	~$7	~$30	~$240	~$360
Medium — 200,000	~$67	~$300	~$2,400	~$3,600
High — 2,000,000	~$672	~$3,000	~$24,000	~$36,000

Generation cost only. Embedding the knowledge base with Titan Text Embeddings V2 (~$0.00002/1K input tokens) is cheap — embedding 50M tokens of source content is on the order of ~$1. The recurring RAG cost is the per-query generation above plus your vector-store hosting (OpenSearch/Aurora/etc.), which is billed by those services, not Bedrock.
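
The embedding figure is the same arithmetic with only the input side charged. A minimal sketch, assuming the representative Titan rate quoted above (the helper name `embedding_cost` is illustrative):

```python
EMBED_RATE_PER_1K = 0.00002  # representative Titan Text Embeddings V2 rate used on this page


def embedding_cost(corpus_tokens, rate_per_1k=EMBED_RATE_PER_1K):
    """One-time cost in USD to embed a corpus; embeddings bill input tokens only."""
    return corpus_tokens / 1000 * rate_per_1k


print(f"${embedding_cost(50_000_000):.2f}")  # 50M tokens, about $1
```

Re-embedding after large content updates repeats this charge for the changed portion only.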

Scenario C — Batch job (classification / enrichment / summarization)

Assumptions per item: ~2,000 input tokens and ~300 output tokens (e.g. read a document, emit a label or short summary). Because batch jobs are not latency-sensitive, they run on Bedrock Batch, which is ~50% cheaper than on-demand — the figures below already apply that discount. Volumes: low = 100,000 items/mo, medium = 1,000,000, high = 10,000,000.

batch · monthly cost by model · in 2,000 / out 300 per item · ~50% Batch discount applied · 2026

Volume (items/mo)	Nova Lite	Claude Haiku	Llama 70B	Claude Sonnet
Low — 100,000	~$10	~$44	~$318	~$525
Medium — 1,000,000	~$96	~$438	~$3,175	~$5,250
High — 10,000,000	~$960	~$4,375	~$31,750	~$52,500

Batch pricing (~50% off on-demand) is already baked into these numbers — the same job on-demand would cost about double. Batch is the single easiest cost win for any high-volume, non-real-time workload. See the amazon-bedrock-batch-inference sibling.
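
To rescale the three reference tables to your own token counts or volumes, the scenarios can be expressed as data and recomputed. The sketch below uses the representative rates from §II and the per-request assumptions stated above each table; `SCENARIOS` and `scenario_table` are illustrative names, and the 0.5 multiplier is the approximate Batch discount, not an exact published figure.

```python
# Representative per-1K-token rates (USD) from the rate card; not live AWS prices.
RATES_PER_1K = {
    "Nova Lite": (0.00006, 0.00024),
    "Claude Haiku": (0.00025, 0.00125),
    "Llama 70B": (0.00265, 0.0035),
    "Claude Sonnet": (0.003, 0.015),
}

SCENARIOS = {
    # name: (input tokens, output tokens, price multiplier, monthly volumes)
    "chatbot": (1500, 500, 1.0, [30_000, 300_000, 3_000_000]),
    "rag": (4000, 400, 1.0, [20_000, 200_000, 2_000_000]),
    "batch": (2000, 300, 0.5, [100_000, 1_000_000, 10_000_000]),  # ~50% Batch discount
}


def scenario_table(name):
    """Return {volume: {model: monthly USD}} for one scenario."""
    tokens_in, tokens_out, multiplier, volumes = SCENARIOS[name]
    table = {}
    for volume in volumes:
        table[volume] = {
            model: (tokens_in / 1000 * rate_in + tokens_out / 1000 * rate_out) * volume * multiplier
            for model, (rate_in, rate_out) in RATES_PER_1K.items()
        }
    return table


for name in SCENARIOS:
    print(name)
    for volume, row in scenario_table(name).items():
        cells = "  ".join(f"{model}: ${cost:,.0f}" for model, cost in row.items())
        print(f"  {volume:>10,}  {cells}")
```

Editing one tuple in `SCENARIOS` is enough to re-price a workload at different token counts or traffic levels.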

what moves the number

IV. The four levers that change your bill

A calculator output is only useful if you know which inputs to change to bring it down. Four levers dominate, in rough order of impact. Pulling them in combination is how teams take a scary estimate and make it boring.

These are not mutually exclusive — a single product can route cheap requests to a small model on-demand, push nightly bulk work to Batch, cache a large shared system prompt, and reserve capacity for a steady high-QPS path, all at once.

Model choice (biggest lever, 50–100×) — The same workload on Nova Lite vs Claude Sonnet differs by ~50× in the tables above. Match the model to the task: small/cheap models for classification, extraction, routing, and easy chat; frontier models only for the genuinely hard requests. Many production systems route across a tier of models — a cheap model handles 80–90% of traffic and escalates the rest.
Batch (~50% off, async work) — Submit a large set of requests as one job and Bedrock processes them in the background for roughly half the on-demand rate. Ideal for bulk summarization, classification, enrichment, embedding a corpus, and offline evaluation. If a workload does not need a real-time answer, Batch nearly halves it for free.
Prompt caching (30–55% on repeated context) — When many requests share a large prefix — a long system prompt, a policy document, a few-shot block, or a fixed retrieved context — caching stores that prefix so you are not billed full rate to re-process it every call. Chat and RAG benefit most; in the worked example below, caching cuts a chatbot bill from ~$63 to ~$44/mo on Nova Lite. See amazon-bedrock-prompt-caching.
Provisioned Throughput (steady high volume only) — Reserve dedicated model capacity for a fixed hourly (or committed) price instead of paying per token. This wins only when utilization is high and steady — a busy, predictable production path can come out cheaper than on-demand and avoids throttling. At low or spiky volume it is more expensive than on-demand. See amazon-bedrock-provisioned-throughput.
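
The model-routing lever is easy to quantify. The sketch below blends two full-traffic monthly costs by the share of traffic each model serves. It assumes an escalated request is billed on the frontier model only; a real router that tries the cheap model first would pay slightly more. The function name `blended_monthly_cost` is illustrative, and the dollar figures are the medium-chatbot numbers from the reference tables.

```python
def blended_monthly_cost(cheap_cost, frontier_cost, cheap_share):
    """Blend two full-traffic monthly costs by the share of traffic each model serves.

    Assumes an escalated request is billed on the frontier model only.
    """
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    return cheap_share * cheap_cost + (1.0 - cheap_share) * frontier_cost


# Medium chatbot from the reference tables: Haiku ~$300/mo, Sonnet ~$3,600/mo.
for share in (0.80, 0.85, 0.90):
    print(f"{share:.0%} on Haiku: ${blended_monthly_cost(300, 3600, share):,.0f}/mo")
```

At an 80–90% cheap-model share, the blended bill lands between roughly $630 and $960 a month, against $3,600 for sending everything to Sonnet.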

caching, quantified

Re-running the medium chatbot (300,000 msgs/mo, 1,500 in / 500 out) with ~1,200 of the 1,500 input tokens cached: Nova Lite ~$63 → ~$44, Claude Haiku ~$300 → ~$219, Claude Sonnet ~$3,600 → ~$2,628, Llama 70B ~$1,718 → ~$859. The bigger and more-repeated your shared context, the larger the cut.
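
The cached figures above are consistent with billing cached input tokens at about 10% of the normal input rate. That fraction is an inference from this page's numbers, not a published price: actual cache-read and cache-write pricing varies by model, so treat `cached_rate_fraction` in the sketch below as an assumption to replace with the current rate. Cache-write charges and cache misses are ignored.

```python
def cached_monthly_cost(input_tokens, output_tokens, input_rate, output_rate,
                        requests_per_month, cached_tokens, cached_rate_fraction=0.10):
    """Monthly USD cost when `cached_tokens` of the input are read from the prompt cache.

    cached_rate_fraction is an ASSUMPTION: cached tokens billed at 10% of the input rate.
    Cache-write charges and cache misses are ignored.
    """
    fresh_tokens = input_tokens - cached_tokens
    input_cost = (fresh_tokens + cached_tokens * cached_rate_fraction) / 1000 * input_rate
    output_cost = output_tokens / 1000 * output_rate
    return (input_cost + output_cost) * requests_per_month


# Medium chatbot on Claude Haiku: 1,500 in (1,200 cached) / 500 out, 300,000 msgs/mo.
print(f"${cached_monthly_cost(1500, 500, 0.00025, 0.00125, 300_000, 1200):,.0f}")  # about $219
```

Because caching only discounts input, models whose output rate is close to their input rate (Llama here) see the largest percentage cut.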

do it yourself

V. Estimate your own bill: a step-by-step walkthrough

Use this to produce a defensible monthly estimate for your specific workload in about five minutes. It is the same procedure an interactive calculator would automate — done by hand so you understand every input.

Work one workload at a time (chatbot, RAG, batch, image), then add the workloads together for a total. Keep a note of every assumption so you can revisit it when traffic or models change.

Step 1: Estimate tokens per request. Count input tokens (system prompt + history/retrieved context + user input) and output tokens (typical answer length). Rule of thumb: 1,000 tokens ≈ 750 words. If unsure, run 10–20 real requests and read the token counts from the API response, which is far more accurate than guessing.
Step 2: Estimate requests per month. Count messages, queries, or items processed monthly. For user-facing apps: active users × requests per user per day × 30. Be honest about peak vs average: the bill is driven by total volume, not peak.
Step 3: Pick a candidate model and read its rates. Start with the cheapest model you think might pass your quality bar (often Nova Lite or Claude Haiku). Take its input and output rates per 1K tokens from the AWS pricing page (or the rate card in §II).
Step 4: Plug into the formula. (input_tokens / 1,000 × input_rate + output_tokens / 1,000 × output_rate) × requests_per_month is your on-demand monthly cost for that model and workload. Repeat Steps 3–4 for one cheaper and one more capable model to bracket the range, which is exactly what the reference tables show.
Step 5: Apply the levers. Async/bulk work → multiply by ~0.5 for Batch. Large repeated prefix → reduce the input portion by your cache hit rate (often a 30–55% total cut on chat/RAG). Steady high QPS → price Provisioned Throughput against the on-demand number and take the lower.
Step 6: Add the non-token costs. RAG: add vector-store hosting (OpenSearch Serverless / Aurora / Pinecone) and a small one-time embedding cost. Image: add images × per-image price. Fine-tuned: add one-time training + ongoing hosting. Then sum all workloads for your monthly total and add a 20–30% buffer for growth and retries.
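
The six steps can be sketched as one small script. Everything here is illustrative: the function names are invented for this page, the sample inputs are hypothetical, and the `inputTokens` / `outputTokens` keys mirror the `usage` field of a Bedrock Converse API response, which you should treat as an assumption and adapt to whatever your client returns. Prompt caching and Provisioned Throughput are left out to keep the sketch short.

```python
def average_usage(samples):
    """Step 1: average token counts over sample responses.

    Each sample is a dict with 'inputTokens' and 'outputTokens' keys (assumed shape).
    """
    n = len(samples)
    return (sum(s["inputTokens"] for s in samples) / n,
            sum(s["outputTokens"] for s in samples) / n)


def requests_per_month(active_users, requests_per_user_per_day, days=30):
    """Step 2: active users x requests per user per day x 30."""
    return active_users * requests_per_user_per_day * days


def estimate(samples, active_users, requests_per_user_per_day, input_rate, output_rate,
             batch=False, non_token_costs=0.0, buffer=0.25):
    """Steps 3-6: on-demand cost, optional Batch multiplier, non-token costs, buffer."""
    tokens_in, tokens_out = average_usage(samples)
    volume = requests_per_month(active_users, requests_per_user_per_day)
    cost = (tokens_in / 1000 * input_rate + tokens_out / 1000 * output_rate) * volume
    if batch:
        cost *= 0.5  # ~50% Batch discount
    return (cost + non_token_costs) * (1 + buffer)


# Hypothetical inputs: 3 measured requests, 1,000 users sending 10 messages a day, Claude Haiku.
samples = [
    {"inputTokens": 1400, "outputTokens": 450},
    {"inputTokens": 1500, "outputTokens": 500},
    {"inputTokens": 1600, "outputTokens": 550},
]
total = estimate(samples, 1000, 10, 0.00025, 0.00125)
print(f"${total:,.2f}")
```

The example averages to 1,500 input and 500 output tokens at 300,000 messages a month, so it reproduces the ~$300 medium-chatbot figure for Claude Haiku before the 25% buffer is added.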

a worked example, start to finish

A RAG assistant: ~4,000 input + ~400 output tokens/query, 200,000 queries/mo, on Claude Haiku. Generation = (4,000/1,000 × $0.00025 + 400/1,000 × $0.00125) × 200,000 = ~$300/mo. Add ~$1 to embed the corpus with Titan, plus vector-store hosting (say ~$100–$300/mo on OpenSearch Serverless). Total ≈ $400–$600/mo — before credits make it $0.
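
The same worked example as arithmetic you can rerun with your own numbers. The generation figures are the ones quoted above; the hosting range is the rough OpenSearch Serverless band from the text, not a quote, and the variable names are illustrative.

```python
# Figures from the worked example: Claude Haiku rates, 200,000 queries per month.
generation = (4000 / 1000 * 0.00025 + 400 / 1000 * 0.00125) * 200_000
embedding_one_time = 1.0                   # ~$1 to embed the corpus (first month only)
hosting_low, hosting_high = 100.0, 300.0   # assumed vector-store range from the text

first_month_low = generation + embedding_one_time + hosting_low
first_month_high = generation + embedding_one_time + hosting_high
print(f"generation ${generation:,.0f}/mo, total ${first_month_low:,.0f} to ${first_month_high:,.0f}/mo")
```

Hosting, not generation, sets the width of the range here, which is common for low-cost models on RAG workloads.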

beyond tokens

VI. Costing image generation and fine-tuned models

Two workloads do not fit the token formula cleanly: image generation (billed per image) and custom/fine-tuned models (training charge plus hosting). Here is how to estimate each.

Image generation. Models like Amazon Nova Canvas, Titan Image Generator, and Stability's models are billed per generated image, with the price scaling by resolution and (sometimes) quality. Estimate: images_per_month × price_per_image. Editing operations (inpaint/outpaint) are billed per output image too. As a rough planning band, think cents-to-low-dollars per image; confirm exact per-image prices on the AWS pricing page.

Fine-tuned / custom models. Customizing a model (e.g. fine-tuning Titan or another supported model) has two costs: a one-time training charge (priced by tokens processed during training) and an ongoing hosting charge — a custom model must run on Provisioned Throughput, billed per model-unit per hour, whether or not it is busy. That standing hosting cost is the part teams forget: a fine-tuned model that serves little traffic can be more expensive than calling a base model on-demand. Fine-tune only when a base model genuinely cannot meet the quality bar and you have steady volume to amortize the hosting.

Supporting services. Whatever the model, remember the surrounding AWS bill: vector stores for RAG, S3 for documents and batch I/O, Lambda/ECS for your application, CloudWatch for logs, and data transfer. These are billed by their own services, not Bedrock, but they belong in any honest total.

the fine-tuning trap

A fine-tuned model bills for hosting per hour, continuously, on Provisioned Throughput — not per request. If your custom model handles modest or spiky traffic, on-demand calls to a strong base model (Nova or Claude) are usually cheaper. Fine-tune for quality you cannot otherwise reach, with volume high enough to justify standing capacity.
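
A quick break-even check makes the trap concrete. The sketch below compares a standing hourly hosting charge with per-request on-demand billing. The $20 hourly price is purely hypothetical (check the pricing page for real model-unit prices), $0.012 is the Sonnet chatbot cost per message implied by the rate card in §II, and the calculation ignores how many requests one model unit can actually serve.

```python
HOURS_PER_MONTH = 730  # average hours in a month


def hosting_monthly_cost(price_per_model_unit_hour, model_units=1):
    """Standing monthly cost of Provisioned Throughput, billed whether or not it is busy."""
    return price_per_model_unit_hour * model_units * HOURS_PER_MONTH


def break_even_requests(price_per_model_unit_hour, on_demand_cost_per_request, model_units=1):
    """Monthly requests at which on-demand spend equals the standing hosting cost."""
    return hosting_monthly_cost(price_per_model_unit_hour, model_units) / on_demand_cost_per_request


# HYPOTHETICAL hourly price of $20 per model unit vs $0.012 per on-demand request.
print(f"{break_even_requests(20.0, 0.012):,.0f} requests/mo")
```

Under these assumed numbers, hosting costs $14,600 a month and only beats on-demand above roughly 1.2 million requests a month; below that, the base model on-demand is cheaper.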

what's next

VII. An interactive calculator is coming

This page is intentionally a reference, not a live widget — so the numbers are transparent, citable, and easy to sanity-check by hand. An interactive calculator that automates the walkthrough above is on the roadmap.

The planned interactive tool will let you enter tokens-per-request, requests-per-month, and a model, then instantly show on-demand, Batch, cached, and Provisioned costs side by side, plus a stacked total across multiple workloads. Until it ships, the formula in §I plus the reference tables in §III give you everything the widget would — and the by-hand method means you can defend the estimate in a budget review rather than pointing at a black box.

Two siblings deepen specific parts of the estimate: amazon-bedrock-pricing for the full per-model price reference across every provider, and amazon-nova-pricing for the cheapest first-party tier in detail. For the discount mechanics, see amazon-bedrock-prompt-caching and amazon-bedrock-batch-inference.

why a content page beats a widget for this keyword

A reference page with the formula and worked tables is more useful and more citable than a black-box widget: Google and LLM answer engines can read and quote the numbers, and you can verify any estimate by hand. The interactive tool will complement this page, not replace it.

making it $0

VIII. How AWS credits zero out the bill you just estimated

Whatever number your estimate landed on — $60/mo for a prototype or $36,000/mo for a high-volume frontier chatbot — AWS credits are designed to cover exactly this spend during the build-and-prove phase. This is the part that makes the calculator academic.

CloudRoute routes startups and companies to vetted AWS partners for two things: AWS credits and DevOps/ML-as-a-service. The customer pays $0 — AWS funds the credit pools through partner-incentive programs, and the partner pays CloudRoute a routing commission. You never see an invoice from us.

The credit pools that apply to Bedrock spend: AWS Activate Portfolio (up to $100K for institutionally funded startups), a Bedrock/GenAI proof-of-concept pool ($10K–$50K aimed specifically at GenAI POCs), and the Generative AI Accelerator (up to $1M for selected AI-first companies). At the monthly rates in the tables above, $100K of Activate credits covers a long runway: a medium chatbot on Claude Haiku (~$300/mo) runs for years on credits; even a high-volume Sonnet workload (~$36,000/mo) gets multiple months fully funded while you prove the product. These pools are largely partner-filed via AWS's ACE program — see the cross-cluster pages on $100K AWS credits, AWS credits for generative-AI startups, and AWS PoC / Bedrock POC funding.
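
The runway claims are a single division, sketched below with the monthly figures from the reference tables. The helper name `runway_months` is illustrative, and the sketch ignores credit expiry, which may cap the usable runway in practice.

```python
def runway_months(credits_usd, monthly_cost_usd):
    """Months of spend a credit balance covers at a flat monthly cost (ignores credit expiry)."""
    return credits_usd / monthly_cost_usd


print(f"{runway_months(100_000, 300):.0f} months")     # medium chatbot on Claude Haiku
print(f"{runway_months(100_000, 36_000):.1f} months")  # high-volume chatbot on Claude Sonnet
```

That works out to about 333 months for the Haiku chatbot and just under three months for the high-volume Sonnet workload.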

Beyond credits, the partner can build and cost-optimize it with you — pick the right model tier, set up Batch and prompt caching, choose between on-demand and Provisioned, and put the workload in production — funded by the same AWS engagement programs. So the estimate on this page becomes the size of the credit ask, and the bill becomes $0 to you.

the CloudRoute offer, plainly

Estimate your Bedrock bill here, then let CloudRoute route you to AWS credits that cover it (Activate up to $100K, Bedrock/GenAI POC $10K–$50K) and a vetted AWS partner who builds and cost-optimizes the workload. AWS funds it; the partner pays CloudRoute; you pay $0.

one reference workload

Monthly cost by model — a medium chatbot (300,000 msgs/mo)

The clearest single comparison: one fixed workload — a chatbot at 300,000 messages/month, ~1,500 input and ~500 output tokens per message — priced across the four models, on-demand and with prompt caching. This is the number that makes "which model" concrete.

Model	On-demand / mo	With prompt caching / mo	Per-message cost	Relative to Nova Lite
Amazon Nova Lite	~$63	~$44	~$0.00021	1× (baseline)
Claude Haiku	~$300	~$219	~$0.0010	~5×
Llama (large, ~70B)	~$1,718	~$859	~$0.0057	~27×
Claude Sonnet	~$3,600	~$2,628	~$0.012	~57×

Representative 2026 figures — confirm current rates on the AWS Bedrock pricing page. Caching assumes ~1,200 of the 1,500 input tokens are a repeated, cacheable prefix. Same workload on Batch (if it were async) would be roughly half the on-demand column. The 57× spread between Nova Lite and Sonnet is why model choice is the dominant cost lever.


a recent match

A pre-build cost estimate that became $0 — anonymized

inquiry · seed-stage vertical-AI startup, customer-support copilot, Austin

Seed-stage vertical-AI startup (9 people) building a support copilot; mid-five-figure projected annual Bedrock spend at launch; new to AWS

Situation: The founders had modeled their Bedrock bill at roughly $4,000–$6,000/month at launch (a Sonnet-heavy chatbot plus RAG over customers' docs) and were nervous about burning runway on inference before product-market fit. They wanted a defensible estimate to show investors and a way to fund the first year of usage.

What CloudRoute did: Routed within 24 hours to a US-based AWS partner with a Bedrock cost-optimization track record. The partner rebuilt the estimate using the formula and reference tables here: routed ~85% of traffic to Claude Haiku with Sonnet escalation only on hard tickets, added prompt caching on the shared system prompt and retrieved context, and moved nightly bulk classification to Batch. Modeled cost dropped from ~$5,000 to ~$1,100/month. They then filed a Bedrock/GenAI POC credit request plus Activate.

Outcome: The approved credits covered the build engagement and well over a year of the optimized ~$1,100/month inference, so the live bill to the customer was $0. The cleaner estimate also strengthened the investor deck. CloudRoute's commission was paid by the partner from AWS engagement funding; the customer paid $0.

engagement window: 4 weeks · modeled spend cut ~78% (model routing + caching + batch) · credits secured covering 12+ months · cost to customer: $0

faq

Common questions

Is there an official Amazon Bedrock pricing calculator?

AWS publishes per-model token prices on the Bedrock pricing page and offers the broader AWS Pricing Calculator for overall account estimates, but neither is a focused, model-by-model Bedrock workload calculator. This page fills that gap with the cost formula plus pre-computed reference tables for the common workloads, so you can estimate by hand and verify any number. An interactive widget that automates the walkthrough is on our roadmap.

What is the formula for calculating Bedrock cost?

For a text workload: monthly cost = (input tokens / 1,000 × input rate + output tokens / 1,000 × output rate) × requests per month, where rates are per 1,000 tokens. Sum it across every model you call. For image models use images × price-per-image; for embeddings only the input side is charged; for fine-tuned models add a one-time training charge plus ongoing per-hour hosting on Provisioned Throughput.

How much does a Bedrock chatbot cost per month?

At ~1,500 input and ~500 output tokens per message, representative 2026 monthly costs are: Nova Lite ~$6 (low / 30K msgs), ~$63 (medium / 300K), ~$630 (high / 3M); Claude Haiku roughly 5× those; Claude Sonnet roughly 57× those (~$3,600 at 300K msgs). Prompt caching typically cuts these 30–55%. The dominant variable is which model you pick — see the reference tables on this page.

How do I estimate the cost of a RAG application on Bedrock?

RAG is input-heavy because you put retrieved chunks into the prompt — assume ~4,000 input and ~400 output tokens per query. At 200,000 queries/month that is ~$67/mo on Nova Lite, ~$300 on Claude Haiku, ~$3,600 on Claude Sonnet. Add a small one-time embedding cost (Titan embeddings are ~$0.00002/1K tokens — embedding 50M tokens is ~$1) and your vector-store hosting, which is billed by OpenSearch/Aurora/Pinecone rather than Bedrock.

How much can prompt caching and Batch save?

Batch is ~50% cheaper than on-demand for asynchronous, non-latency-sensitive jobs — the single easiest cost win for bulk work. Prompt caching cuts the cost of repeated context (large system prompts, shared documents, few-shot blocks) and commonly reduces chat/RAG bills 30–55% depending on how much of the input is a repeated prefix. They stack with model choice, which is the biggest lever of all.

When is Provisioned Throughput cheaper than on-demand?

Only at steady, high utilization. Provisioned Throughput reserves dedicated capacity for a fixed hourly (or committed) price instead of per-token billing, so it wins when a predictable, busy production path keeps that capacity well used — and it also removes throttling risk. At low or spiky volume it is more expensive than on-demand. Estimate both and take the lower; many teams run on-demand until a workload is steady enough to justify reserving.

Why is this a content page and not an interactive calculator?

Deliberately. A reference page with the formula and worked tables is transparent (you can verify any number by hand), citable (Google and LLM answer engines can read and quote the figures), and defensible in a budget review — versus a black-box widget. An interactive calculator that automates the same math is planned and will complement this page rather than replace it.

How do AWS credits make the estimated bill $0?

AWS credit pools are sized to cover exactly this build-and-prove spend: Activate Portfolio (up to $100K), a Bedrock/GenAI POC pool ($10K–$50K), and the GenAI Accelerator (up to $1M). At the rates in the tables, $100K funds years of a medium chatbot or many months of a high-volume frontier workload. The pools are largely partner-filed; CloudRoute routes you to the right pool and a vetted AWS partner who can also cost-optimize the build — customer pays $0 because AWS funds the engagement.

Turn your Bedrock estimate into a $0 bill

CloudRoute routes you to AWS credits sized to cover your Bedrock spend (Activate up to $100K, Bedrock/GenAI POC $10K–$50K) and a vetted partner who builds and cost-optimizes it. Customer pays $0 — AWS funds it.


matched within: < 24h

credit ceiling: up to $100K+

cost to you: $0