A practical way to estimate your Amazon Bedrock bill before you build. This page gives the one cost formula; a set of pre-computed monthly-cost tables for the three most common workloads (a chatbot, a RAG app, and a batch job) at low, medium, and high volume across Amazon Nova, Claude Haiku, Claude Sonnet, and Llama; the four levers that move the number; and a step-by-step walkthrough to estimate your own bill in five minutes. All figures are representative as of 2026; confirm current rates on the AWS pricing page. An interactive calculator widget is on the way.
Every Bedrock estimate reduces to one formula. Once you can write it down for your workload, a "calculator" is just arithmetic — and you can sanity-check any number a tool gives you.
Bedrock text models are billed on tokens. A token is roughly ¾ of an English word, so 1,000 tokens ≈ 750 words. Every request is metered in two directions: input tokens (your system prompt + any conversation history or retrieved context + the user message) and output tokens (what the model generates). Each direction has its own published rate, almost always per 1,000 tokens (some pages quote per-million; divide by 1,000 to convert).
The monthly cost of a single text workload is therefore:
Monthly cost = ( input_tokens / 1,000 × input_rate + output_tokens / 1,000 × output_rate ) × requests_per_month, where rates are per 1,000 tokens and token counts are per request. Sum this across every model you call. For image models, swap in images × price_per_image. For embeddings, only the input side is charged. For fine-tuned models, add a one-time training charge plus an ongoing hourly hosting charge on Provisioned Throughput.
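As a minimal sketch, the formula can be written as one small Python function. The name `monthly_cost` and its parameter names are illustrative and not part of any AWS SDK; token counts are divided by 1,000 because the rates are quoted per 1,000 tokens.

```python
def monthly_cost(input_tokens, output_tokens, input_rate_per_1k,
                 output_rate_per_1k, requests_per_month):
    """Monthly on-demand cost in dollars for one text workload.

    Token counts are per request; rates are dollars per 1,000 tokens.
    """
    per_request = (input_tokens / 1000 * input_rate_per_1k
                   + output_tokens / 1000 * output_rate_per_1k)
    return per_request * requests_per_month


# Example using this page's representative Claude Haiku rates:
# 1,500 input + 500 output tokens per message, 300,000 messages a month.
cost = monthly_cost(1500, 500, 0.00025, 0.00125, 300_000)
print(f"${cost:,.2f} per month")
```

With those inputs the result is about $300 a month. For several workloads or models, call the function once per workload and add the results.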
To make the reference tables concrete, here are the representative 2026 rates the math below uses. These are illustrative — confirm current rates on the AWS Bedrock pricing page before budgeting.
Four models span the practical cost range, and they are the ones the reference tables price: Amazon Nova Lite (a cheap, fast multimodal model), Claude Haiku (fast and inexpensive), Claude Sonnet (the all-round workhorse), and a large, ~70B-class Llama (a capable open-weight option). Notice the spread: Nova Lite is roughly 50× cheaper on input than Sonnet ($0.00006 vs. $0.003 per 1,000 tokens). That spread is why "which model" dominates every estimate.
| Model | Input / 1K | Output / 1K | Input / 1M | Output / 1M | Typical role |
|---|---|---|---|---|---|
| Amazon Nova Lite | $0.00006 | $0.00024 | $0.06 | $0.24 | Cheap, fast, multimodal |
| Claude Haiku | $0.00025 | $0.00125 | $0.25 | $1.25 | Fast, low-cost, high-throughput |
| Llama (large, ~70B) | $0.00265 | $0.0035 | $2.65 | $3.50 | Capable open-weight |
| Claude Sonnet | $0.003 | $0.015 | $3.00 | $15.00 | Best all-round workhorse |
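
For scripting, the rate card can be kept as a small lookup table. This is a sketch only: `RATES_PER_1K`, the helper names, and the dictionary keys are illustrative labels (not Bedrock model IDs), and the numbers are the representative rates from the table, not live prices.

```python
# Representative 2026 rates from the table above, in dollars per 1,000 tokens.
# The dictionary keys are illustrative labels, not Bedrock model IDs.
RATES_PER_1K = {
    "nova-lite": {"input": 0.00006, "output": 0.00024},
    "claude-haiku": {"input": 0.00025, "output": 0.00125},
    "llama-70b": {"input": 0.00265, "output": 0.0035},
    "claude-sonnet": {"input": 0.003, "output": 0.015},
}


def per_million(rate_per_1k):
    """Convert a per-1,000-token rate to a per-1,000,000-token rate."""
    return rate_per_1k * 1000


def input_price_ratio(expensive, cheap):
    """How many times more the first model costs per input token."""
    return RATES_PER_1K[expensive]["input"] / RATES_PER_1K[cheap]["input"]


for name, rate in RATES_PER_1K.items():
    print(f"{name}: ${per_million(rate['input']):.2f} in / "
          f"${per_million(rate['output']):.2f} out per 1M tokens")

ratio = input_price_ratio("claude-sonnet", "nova-lite")
print(f"Sonnet vs. Nova Lite on input: {ratio:.0f}x")
```

Keeping both units derivable from one source avoids the common mistake of mixing a per-1K rate with a per-1M rate in the same estimate.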
These are the tables most people want: the monthly on-demand cost of three common workloads, at three volumes, across the four models. Each table states its per-request token assumptions so you can scale it to your own numbers. All figures use the rate card above and are representative as of 2026.
Read these as shape, not gospel. The absolute dollars depend on rates that change; the ratios between models and between volumes are the durable insight. Two patterns jump out immediately: cost scales linearly with volume (10× the traffic = 10× the bill), and model choice swings the bill by roughly 50–60× at any given volume (Nova Lite versus Claude Sonnet in the tables below). Pick the cheapest model that passes your quality bar and you have already won most of the cost battle.
Assumptions per message: ~1,500 input tokens (a system prompt plus a few turns of history plus the user message) and ~500 output tokens (a typical answer). Volumes: low = 30,000 messages/mo, medium = 300,000, high = 3,000,000.
| Volume (msgs/mo) | Nova Lite | Claude Haiku | Llama 70B | Claude Sonnet |
|---|---|---|---|---|
| Low — 30,000 | ~$6 | ~$30 | ~$172 | ~$360 |
| Medium — 300,000 | ~$63 | ~$300 | ~$1,720 | ~$3,600 |
| High — 3,000,000 | ~$630 | ~$3,000 | ~$17,200 | ~$36,000 |
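
The chatbot table can be regenerated, or rescaled to your own token counts, with a short loop. This sketch assumes the representative rates from the rate card; `chatbot_cost` and the model labels are hypothetical names chosen for illustration.

```python
# (input rate, output rate) in dollars per 1,000 tokens, from the rate card.
RATES_PER_1K = {
    "nova-lite": (0.00006, 0.00024),
    "claude-haiku": (0.00025, 0.00125),
    "llama-70b": (0.00265, 0.0035),
    "claude-sonnet": (0.003, 0.015),
}
VOLUMES = {"low": 30_000, "medium": 300_000, "high": 3_000_000}


def chatbot_cost(model, messages, input_tokens=1500, output_tokens=500):
    """Monthly on-demand cost for the chatbot assumptions stated above."""
    in_rate, out_rate = RATES_PER_1K[model]
    per_message = (input_tokens / 1000 * in_rate
                   + output_tokens / 1000 * out_rate)
    return per_message * messages


for label, messages in VOLUMES.items():
    row = ", ".join(f"{model}: ${chatbot_cost(model, messages):,.0f}"
                    for model in RATES_PER_1K)
    print(f"{label:<6} {row}")
```

Change `input_tokens` and `output_tokens` to match your own prompts; the linear scaling with volume falls straight out of the final multiplication.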
Assumptions per query: ~4,000 input tokens (the user query plus several retrieved document chunks plus a system prompt) and ~400 output tokens (a grounded answer). RAG is input-heavy because you stuff retrieved context into the prompt. Volumes: low = 20,000 queries/mo, medium = 200,000, high = 2,000,000. Embedding the corpus is a separate, usually small, largely one-time cost (see note).
| Volume (queries/mo) | Nova Lite | Claude Haiku | Llama 70B | Claude Sonnet |
|---|---|---|---|---|
| Low — 20,000 | ~$7 | ~$30 | ~$240 | ~$360 |
| Medium — 200,000 | ~$67 | ~$300 | ~$2,400 | ~$3,600 |
| High — 2,000,000 | ~$672 | ~$3,000 | ~$24,000 | ~$36,000 |
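
To see how input-heavy a RAG workload is, it can help to compute the share of each request's cost that comes from input tokens. The helper below is an illustrative sketch (`input_share` is a hypothetical name) using the representative Claude Haiku rates from this page.

```python
def input_share(input_tokens, output_tokens, input_rate_per_1k,
                output_rate_per_1k):
    """Fraction of per-request cost that comes from input tokens."""
    input_cost = input_tokens / 1000 * input_rate_per_1k
    output_cost = output_tokens / 1000 * output_rate_per_1k
    return input_cost / (input_cost + output_cost)


# Claude Haiku, RAG assumptions (4,000 in / 400 out) versus the
# chatbot assumptions (1,500 in / 500 out).
rag = input_share(4000, 400, 0.00025, 0.00125)
chat = input_share(1500, 500, 0.00025, 0.00125)
print(f"RAG input share: {rag:.1%}, chatbot input share: {chat:.1%}")
```

On these assumptions about two-thirds of the RAG cost is input, against well under half for the chatbot, which is why trimming retrieved context tends to matter more for RAG than shortening answers.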
Assumptions per item: ~2,000 input tokens and ~300 output tokens (e.g. read a document, emit a label or short summary). Because batch jobs are not latency-sensitive, they can run on Bedrock Batch, which is ~50% cheaper than on-demand; the figures below already apply that discount. Volumes: low = 100,000 items/mo, medium = 1,000,000, high = 10,000,000.
| Volume (items/mo) | Nova Lite | Claude Haiku | Llama 70B | Claude Sonnet |
|---|---|---|---|---|
| Low — 100,000 | ~$10 | ~$44 | ~$318 | ~$525 |
| Medium — 1,000,000 | ~$96 | ~$438 | ~$3,175 | ~$5,250 |
| High — 10,000,000 | ~$960 | ~$4,375 | ~$31,750 | ~$52,500 |
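
The batch figures are the on-demand formula with a discount applied at the end. A minimal sketch, assuming the ~50% Batch discount stated above (the function name and the `batch_discount` parameter are illustrative):

```python
def batch_monthly_cost(input_tokens, output_tokens, input_rate_per_1k,
                       output_rate_per_1k, items_per_month,
                       batch_discount=0.5):
    """Monthly cost when every item runs through Batch.

    batch_discount=0.5 reflects the ~50% figure used on this page;
    confirm the current discount before budgeting.
    """
    on_demand = (input_tokens / 1000 * input_rate_per_1k
                 + output_tokens / 1000 * output_rate_per_1k) * items_per_month
    return on_demand * (1 - batch_discount)


# 2,000 in / 300 out per item, as in the table above.
sonnet_medium = batch_monthly_cost(2000, 300, 0.003, 0.015, 1_000_000)
haiku_high = batch_monthly_cost(2000, 300, 0.00025, 0.00125, 10_000_000)
print(f"Sonnet, 1M items: ${sonnet_medium:,.0f}")
print(f"Haiku, 10M items: ${haiku_high:,.0f}")
```

Setting `batch_discount=0` recovers the on-demand price, which makes it easy to show both numbers side by side.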
A calculator output is only useful if you know which inputs to change to bring it down. Four levers dominate, in rough order of impact. Pulling them in combination is how teams take a scary estimate and make it boring.
These are not mutually exclusive — a single product can route cheap requests to a small model on-demand, push nightly bulk work to Batch, cache a large shared system prompt, and reserve capacity for a steady high-QPS path, all at once.
Re-running the medium chatbot (300,000 msgs/mo, 1,500 in / 500 out) with ~1,200 of the 1,500 input tokens cached: Nova Lite ~$63 → ~$44, Claude Haiku ~$300 → ~$219, Llama 70B ~$1,718 → ~$859, Claude Sonnet ~$3,600 → ~$2,628. These figures are consistent with cached input tokens being billed at roughly one-tenth of the normal input rate. The bigger and more-repeated your shared context, the larger the cut.
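The cached figures above can be reproduced with a small variation on the cost formula. One assumption is worth flagging loudly: the 90% discount on cached input tokens used here is inferred from this page's own numbers, not taken from an AWS rate card, and the sketch ignores any surcharge for writing to the cache. `cached_monthly_cost` is a hypothetical helper name.

```python
def cached_monthly_cost(input_tokens, cached_tokens, output_tokens,
                        input_rate_per_1k, output_rate_per_1k,
                        requests_per_month, cache_discount=0.9):
    """Monthly cost when part of each prompt is served from the cache.

    cache_discount=0.9 is an assumption inferred from this page's
    figures; cache-write charges are not modeled.
    """
    uncached_tokens = input_tokens - cached_tokens
    input_cost = (uncached_tokens / 1000 * input_rate_per_1k
                  + cached_tokens / 1000 * input_rate_per_1k
                  * (1 - cache_discount))
    output_cost = output_tokens / 1000 * output_rate_per_1k
    return (input_cost + output_cost) * requests_per_month


# Medium chatbot: 1,500 in (1,200 cached) / 500 out, 300,000 msgs/mo.
sonnet = cached_monthly_cost(1500, 1200, 500, 0.003, 0.015, 300_000)
haiku = cached_monthly_cost(1500, 1200, 500, 0.00025, 0.00125, 300_000)
print(f"Sonnet with caching: ${sonnet:,.0f}")
print(f"Haiku with caching:  ${haiku:,.0f}")
```

Because only the input side is discounted, models whose cost is dominated by output tokens see a smaller percentage cut than models whose input and output rates are close.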
Use this to produce a defensible monthly estimate for your specific workload in about five minutes. It is the same procedure an interactive calculator would automate — done by hand so you understand every input.
Work one workload at a time (chatbot, RAG, batch, image), then add the workloads together for a total. Keep a note of every assumption so you can revisit it when traffic or models change.
A RAG assistant: ~4,000 input + ~400 output tokens/query, 200,000 queries/mo, on Claude Haiku. Generation = (4,000/1,000 × $0.00025 + 400/1,000 × $0.00125) × 200,000 = ~$300/mo. Add ~$1 to embed the corpus with Titan, plus vector-store hosting (say ~$100–$300/mo on OpenSearch Serverless). Total ≈ $400–$600/mo — before credits make it $0.
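The same worked example, written out as line items so that each assumption is visible and easy to change. The variable names are illustrative, and the embedding and vector-store figures are the rough values quoted above rather than measured costs.

```python
# Generation on Claude Haiku: 4,000 in / 400 out, 200,000 queries/mo.
generation = (4000 / 1000 * 0.00025 + 400 / 1000 * 0.00125) * 200_000

# Supporting costs quoted in the example above (rough planning values).
embedding = 1.0                       # embedding the corpus
vector_store_low, vector_store_high = 100.0, 300.0  # vector-store hosting

total_low = generation + embedding + vector_store_low
total_high = generation + embedding + vector_store_high

print(f"Generation:   ${generation:,.0f}/mo")
print(f"Total range:  ${total_low:,.0f} to ${total_high:,.0f}/mo")
```

Carrying a low and a high bound through the sum, rather than a single number, keeps the uncertainty in the supporting services visible in the final estimate.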
Two workloads do not fit the token formula cleanly: image generation (billed per image) and custom/fine-tuned models (training charge plus hosting). Here is how to estimate each.
Image generation. Models like Amazon Nova Canvas, Titan Image Generator, and Stability's models are billed per generated image, with the price scaling by resolution and (sometimes) quality. Estimate: images_per_month × price_per_image. Editing operations (inpaint/outpaint) are billed per output image too. As a rough planning band, think cents-to-low-dollars per image; confirm exact per-image prices on the AWS pricing page.
Fine-tuned / custom models. Customizing a model (e.g. fine-tuning Titan or another supported model) has two costs: a one-time training charge (priced by tokens processed during training) and an ongoing hosting charge — a custom model must run on Provisioned Throughput, billed per model-unit per hour, whether or not it is busy. That standing hosting cost is the part teams forget: a fine-tuned model that serves little traffic can be more expensive than calling a base model on-demand. Fine-tune only when a base model genuinely cannot meet the quality bar and you have steady volume to amortize the hosting.
Supporting services. Whatever the model, remember the surrounding AWS bill: vector stores for RAG, S3 for documents and batch I/O, Lambda/ECS for your application, CloudWatch for logs, and data transfer. These are billed by their own services, not Bedrock, but they belong in any honest total.
A fine-tuned model bills for hosting per hour, continuously, on Provisioned Throughput — not per request. If your custom model handles modest or spiky traffic, on-demand calls to a strong base model (Nova or Claude) are usually cheaper. Fine-tune for quality you cannot otherwise reach, with volume high enough to justify standing capacity.
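A rough break-even check makes the hosting trade-off concrete. This is a sketch under loud assumptions: the hourly rate below is a placeholder invented for illustration, not an AWS price, and the comparison treats hosting as a flat monthly cost while ignoring the one-time training charge. Substitute the current Provisioned Throughput rate for your model before relying on the result.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def hosting_cost_per_month(hourly_rate_per_model_unit, model_units=1):
    """Standing cost of hosting, billed whether or not traffic arrives."""
    return hourly_rate_per_model_unit * model_units * HOURS_PER_MONTH


def break_even_requests(hosting_per_month, on_demand_cost_per_request):
    """Monthly requests at which on-demand spend equals the hosting cost."""
    return hosting_per_month / on_demand_cost_per_request


# HYPOTHETICAL placeholder rate, for illustration only.
HYPOTHETICAL_HOURLY_RATE = 20.0

hosting = hosting_cost_per_month(HYPOTHETICAL_HOURLY_RATE)
# On-demand Claude Sonnet chatbot message at this page's representative rates.
sonnet_per_message = 1500 / 1000 * 0.003 + 500 / 1000 * 0.015
threshold = break_even_requests(hosting, sonnet_per_message)
print(f"Hosting: ${hosting:,.0f}/mo; break-even near {threshold:,.0f} msgs/mo")
```

Below the break-even volume, on-demand calls to a base model cost less than keeping the custom model hosted; above it, the standing capacity starts to pay for itself.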
This page is intentionally a reference, not a live widget — so the numbers are transparent, citable, and easy to sanity-check by hand. An interactive calculator that automates the walkthrough above is on the roadmap.
The planned interactive tool will let you enter tokens-per-request, requests-per-month, and a model, then instantly show on-demand, Batch, cached, and Provisioned costs side by side, plus a stacked total across multiple workloads. Until it ships, the formula in §I plus the reference tables in §III give you everything the widget would — and the by-hand method means you can defend the estimate in a budget review rather than pointing at a black box.
Two siblings deepen specific parts of the estimate: amazon-bedrock-pricing for the full per-model price reference across every provider, and amazon-nova-pricing for the cheapest first-party tier in detail. For the discount mechanics, see amazon-bedrock-prompt-caching and amazon-bedrock-batch-inference.
A reference page with the formula and worked tables is more useful and more citable than a black-box widget: Google and LLM answer engines can read and quote the numbers, and you can verify any estimate by hand. The interactive tool will complement this page, not replace it.
Whatever number your estimate landed on — $60/mo for a prototype or $36,000/mo for a high-volume frontier chatbot — AWS credits are designed to cover exactly this spend during the build-and-prove phase. This is the part that makes the calculator academic.
CloudRoute routes startups and companies to vetted AWS partners for two things: AWS credits and DevOps/ML-as-a-service. The customer pays $0 — AWS funds the credit pools through partner-incentive programs, and the partner pays CloudRoute a routing commission. You never see an invoice from us.
The credit pools that apply to Bedrock spend: AWS Activate Portfolio (up to $100K for institutionally funded startups), a Bedrock/GenAI proof-of-concept pool ($10K–$50K aimed specifically at GenAI POCs), and the Generative AI Accelerator (up to $1M for selected AI-first companies). At the monthly rates in the tables above, $100K of Activate credits covers a long runway: a medium chatbot on Claude Haiku (~$300/mo) runs for years on credits, and even a high-volume Sonnet workload (~$36,000/mo) gets close to three months fully funded while you prove the product. These pools are largely partner-filed via AWS's ACE program; see the cross-cluster pages on $100K AWS credits, AWS credits for generative-AI startups, and AWS PoC / Bedrock POC funding.
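The runway claim is a single division, sketched here so you can plug in your own estimate. `runway_months` is an illustrative helper; it assumes flat monthly spend and ignores credit expiry dates, which real credit programs impose.

```python
def runway_months(credits, monthly_cost):
    """Months of spend a credit balance covers at a flat monthly cost."""
    return credits / monthly_cost


haiku_medium = runway_months(100_000, 300)      # medium chatbot on Haiku
sonnet_high = runway_months(100_000, 36_000)    # high-volume Sonnet chatbot
print(f"Haiku, ~$300/mo:     {haiku_medium:,.0f} months")
print(f"Sonnet, ~$36,000/mo: {sonnet_high:.1f} months")
```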
Beyond credits, the partner can build and cost-optimize it with you — pick the right model tier, set up Batch and prompt caching, choose between on-demand and Provisioned, and put the workload in production — funded by the same AWS engagement programs. So the estimate on this page becomes the size of the credit ask, and the bill becomes $0 to you.
Estimate your Bedrock bill here, then let CloudRoute route you to AWS credits that cover it (Activate up to $100K, Bedrock/GenAI POC $10K–$50K) and a vetted AWS partner who builds and cost-optimizes the workload. AWS funds it; the partner pays CloudRoute; you pay $0.
The clearest single comparison: one fixed workload — a chatbot at 300,000 messages/month, ~1,500 input and ~500 output tokens per message — priced across the four models, on-demand and with prompt caching. This is the number that makes "which model" concrete.
| Model | On-demand / mo | With prompt caching / mo | Per-message cost | Relative to Nova Lite |
|---|---|---|---|---|
| Amazon Nova Lite | ~$63 | ~$44 | ~$0.00021 | 1× (baseline) |
| Claude Haiku | ~$300 | ~$219 | ~$0.0010 | ~5× |
| Llama (large, ~70B) | ~$1,718 | ~$859 | ~$0.0057 | ~27× |
| Claude Sonnet | ~$3,600 | ~$2,628 | ~$0.012 | ~57× |
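
The "Relative to Nova Lite" column is each model's per-message cost divided by the baseline. A short sketch, using the representative rates from this page (the model labels and helper name are illustrative):

```python
# (input rate, output rate) in dollars per 1,000 tokens, from the rate card.
RATES_PER_1K = {
    "nova-lite": (0.00006, 0.00024),
    "claude-haiku": (0.00025, 0.00125),
    "llama-70b": (0.00265, 0.0035),
    "claude-sonnet": (0.003, 0.015),
}


def per_message_cost(model, input_tokens=1500, output_tokens=500):
    """On-demand cost of one chatbot message for the given model."""
    in_rate, out_rate = RATES_PER_1K[model]
    return input_tokens / 1000 * in_rate + output_tokens / 1000 * out_rate


baseline = per_message_cost("nova-lite")
multiples = {model: per_message_cost(model) / baseline
             for model in RATES_PER_1K}

for model, multiple in multiples.items():
    print(f"{model}: ${per_message_cost(model):.5f}/msg, {multiple:.1f}x")
```

The multiples depend on the input-to-output token mix, so rerun the comparison with your own token counts before treating the ratios as fixed.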
Situation: The founders had modeled their Bedrock bill at roughly $4,000–$6,000/month at launch (a Sonnet-heavy chatbot plus RAG over customers' docs) and were nervous about burning runway on inference before product-market fit. They wanted a defensible estimate to show investors and a way to fund the first year of usage.
What CloudRoute did: Routed within 24 hours to a US-based AWS partner with a Bedrock cost-optimization track record. The partner rebuilt the estimate using the formula and reference tables here: routed ~85% of traffic to Claude Haiku with Sonnet escalation only on hard tickets, added prompt caching on the shared system prompt and retrieved context, and moved nightly bulk classification to Batch. Modeled cost dropped from ~$5,000 to ~$1,100/month. They then filed a Bedrock/GenAI POC credit request plus Activate.
Outcome: Credits approved covered the build engagement and well over a year of the optimized ~$1,100/month inference — the live bill to the customer was $0. The cleaner estimate also strengthened the investor deck. CloudRoute's commission was paid by the partner from AWS engagement funding — the customer paid $0.
Engagement window: 4 weeks · Modeled spend cut: ~78% (model routing + caching + batch) · Credits secured: covering 12+ months · Cost to customer: $0