for AWS partners →Cover your token bill with AWS credits →

bedrock token costs · input vs output · 2026

Amazon Bedrock token costs — what a token is, and what each one costs.

A focused, neutral reference for how Amazon Bedrock charges for tokens in 2026: exactly what counts as a token and how Bedrock meters it, why input and output tokens are priced separately (and why output costs 3–5× more), a big side-by-side per-model table in both per-1K and per-1M for Nova, Claude, Llama, Mistral, Cohere and Titan, how to estimate token counts for your own text before you ship, why output tokens quietly dominate most bills, and how prompt caching and Batch change the per-token math. All figures are representative as of 2026 — confirm current rates on the AWS Bedrock pricing page.

Cover your token bill with AWS credits →→ jump to the per-model token table

billing unit

the token

1,000 tokens ≈

750 words

output vs input

~3–5× pricier

cost with credits

TL;DR

Bedrock charges per token, and a token is a sub-word chunk — roughly ¾ of an English word, so 1,000 tokens ≈ 750 words. Every request is metered twice: input tokens (everything you send — prompt, system instruction, history, retrieved context) and output tokens (everything the model writes back). You pay a separate published rate for each, almost always quoted per 1,000 or per 1,000,000 tokens.
Output tokens are the expensive half — typically priced 3–5× the input rate for the same model — and in most real workloads they quietly dominate the bill even though there are fewer of them. The two numbers that set your cost are which model you call (a 400×+ range from Amazon Nova Micro to a Claude Opus-class model) and how many output tokens you let it generate.
You can estimate cost before you ship: count characters ÷ 4 (or words × 1.33) for a rough token count, multiply input and output volumes by the model's per-1K rates, and add the two. Prompt caching cuts the per-token rate on repeated input context, and Batch cuts the per-token rate ~50% on non-interactive jobs. And for startups, AWS credits cover the whole token bill — CloudRoute routes you to the credit pool and a vetted partner so the build costs $0.

the unit

IWhat a token actually is on Bedrock

Before any price makes sense you have to be precise about the thing being priced. On Bedrock the billable unit is the token, not the word, the character, or the API call — and the gap between a token and a word is where most budget surprises start.

A token is a sub-word chunk of text produced by the model's tokenizer. Common short words are usually a single token ("the", "cloud", "model"), but longer or rarer words split into several ("tokenization" might be two or three tokens), and whitespace, punctuation, numbers, and code symbols all consume tokens too. A useful working rule for English prose is that 1 token ≈ ¾ of a word, so 1,000 tokens ≈ 750 words and a token is roughly 4 characters on average. Other languages and code tokenize less efficiently — non-English text and source code often use noticeably more tokens per word.

Each model family has its own tokenizer, so the exact same paragraph can be a slightly different number of tokens on Claude versus Llama versus Amazon Nova. The differences are usually small for ordinary English, but they mean a token count is always model-specific — there is no single universal token. When you compare model prices, you are comparing the rate per token and implicitly the efficiency of that model's tokenizer, though the per-token rate dominates the comparison in practice.

Crucially, Bedrock meters tokens in both directions of every request. Input tokens are everything you send into the model on a given call: the user's message, your system prompt or instructions, any conversation history you replay, any documents or retrieved context you stuff into the prompt, and any tool or function definitions. Output tokens are everything the model generates in response. You are billed for each separately, at different rates, and the count resets every request — there is no flat monthly token allowance on the on-demand path; you pay for exactly what flows through.

A few practical consequences fall straight out of this definition. Long system prompts are not free — a 600-word instruction block re-sent on every call is ~800 input tokens billed every single time (which is exactly what prompt caching exists to fix). Conversation history compounds — replaying the whole transcript on each turn means later turns in a long chat cost far more in input than early ones. And retrieved context in RAG is usually the largest input component — pulling 3,000 tokens of documents to answer a 50-token question means the question is a rounding error and the retrieval is the cost.

the one-line definition

A token ≈ ¾ of an English word (≈ 4 characters); 1,000 tokens ≈ 750 words. Bedrock bills input tokens (everything you send: prompt + system instruction + history + retrieved context + tool defs) and output tokens (everything the model writes) separately, per request, at different rates.

two prices, one request

IIInput tokens vs output tokens — why there are two prices

The single most important thing to internalize about Bedrock token costs is that input and output are billed at different rates, and output is the expensive one. Almost every cost-design decision flows from this asymmetry.

For a given model, the published output rate is typically 3–5× the input rate. On a representative Claude Sonnet-class model that is roughly $3 per million input tokens against $15 per million output — a 5× gap. The reason is mechanical: reading your prompt is a single forward pass the model can process in parallel, but generating output is autoregressive — the model produces one token at a time, each conditioned on all the tokens before it, so output is far more compute-intensive per token. You are paying for that difference.

This asymmetry means two workloads with the same total token count can have wildly different bills depending on the input/output split. A workload that reads a lot and writes a little — classification, extraction, routing, sentiment scoring, "answer yes or no" — is cheap, because the bulk of its tokens are charged at the lower input rate. A workload that writes a lot from a short prompt — drafting articles, generating code, long-form summaries, synthetic data — is dominated by the higher output rate even though it sends very little in.

There is a second-order effect that catches teams out: output tokens are also the ones you control least precisely. You know exactly how long your input is before you send it, but the model decides how much to generate (up to your max-output cap). A prompt that invites a rambling answer can cost several times what a tightly-scoped one does for identical input. This is why "cap your max output tokens" and "ask for terse answers" are real cost levers, not just style preferences — and why streaming a response does not change the price (you are billed for the tokens generated, not for how they are delivered).

The design takeaway is to think in two budgets, not one. Estimate input tokens (which you can measure precisely from your prompt template, history policy, and retrieval size) and output tokens (which you bound with a max-tokens limit and shape with instructions) separately, price each at the model's respective rate, and add them. Treating "tokens" as one undifferentiated number is the most common way teams misjudge a Bedrock bill — usually underestimating it, because they reason about the short prompt they typed and forget the long answer they asked for.

input vs output tokens · how they differ · 2026

Dimension	Input tokens	Output tokens
What it covers	Prompt + system instruction + history + retrieved context + tool defs	Everything the model generates back
Relative price	Baseline (lower)	Typically 3–5× the input rate
Why	Read in one parallel forward pass	Generated one token at a time (autoregressive)
How predictable	Known exactly before you send	Bounded by your max-output cap; model decides actual length
Main lever	Trim history, shrink retrieved context, prompt caching	Cap max tokens, ask for terse answers
Cheap when	Workload reads a lot, writes a little (classify/extract)	Rarely the cheap side — minimize generated length

Same total token count, very different bills: the input/output split usually matters more than the raw token volume. Both rates are per-model — see the full table in §IV.

count before you ship

IIIHow to estimate token counts for your own text

You do not need to call the API to get a usable cost estimate. With two back-of-envelope rules and a simple formula you can size a Bedrock bill from a prompt template and an expected answer length before you write a line of production code.

For a fast first pass on English text, use either of these equivalent approximations: tokens ≈ characters ÷ 4, or tokens ≈ words × 1.33. So a 500-word document is roughly 665 tokens; a 2,000-character chunk is roughly 500 tokens. These are deliberately rough — they run a little high for simple prose and a little low for code, numbers, and non-English text, which tokenize less efficiently — but they are accurate enough to choose a model and set a budget. When you need precision, count tokens exactly with the model family's own tokenizer (the Bedrock Converse API returns the actual input and output token counts in its response metadata, which is the ground truth for any real workload).

The per-request cost formula then has just two terms, one per direction:

Step 1 — estimate input tokens — Add up everything you send on one call: system prompt + user message + any replayed history + retrieved/RAG context + tool definitions. Convert with characters ÷ 4 or words × 1.33. This is your input token count per request.
Step 2 — estimate output tokens — Decide the typical answer length you expect (or will cap with max-tokens) and convert the same way. A "short answer" is ~50–150 tokens; a paragraph ~200–300; a long structured response 600–1,000+. This is your output token count per request.
Step 3 — price each direction — Per-request cost = (input tokens ÷ 1,000 × input rate per 1K) + (output tokens ÷ 1,000 × output rate per 1K). Use the model's two rates from the §IV table.
Step 4 — scale to volume — Multiply the per-request cost by requests per month. Or work in millions directly: (monthly input tokens ÷ 1,000,000 × input rate per 1M) + (monthly output tokens ÷ 1,000,000 × output rate per 1M).

A quick worked example to anchor the method. Suppose a support assistant on a Claude Haiku-class model sends, per request, an 800-token input (a fixed system prompt plus a short question plus a little history) and generates a 400-token answer. Per request that is 0.8 × $0.00025 + 0.4 × $0.00125 ≈ $0.0002 + $0.0005 = $0.0007, or about $0.70 per thousand conversations. Two things to notice immediately: even though the input is twice the size of the output, the output contributes more of the cost (the 5× rate gap outweighs the 2× volume gap) — and the absolute number is tiny, which is why prototypes feel free and only scale makes Bedrock a real line item.

the estimate in one formula

tokens ≈ characters ÷ 4 (or words × 1.33). Cost per request = (input tokens ÷ 1K × input rate) + (output tokens ÷ 1K × output rate). Multiply by monthly requests. For exact counts, read the token usage the Converse API returns on every call. To model a full mix, see the amazon-bedrock-pricing-calculator sibling.

the per-model table

IVPer-model token costs — input and output, per 1K and per 1M

This is the reference table itself: representative 2026 on-demand token rates across the major model families on Bedrock — Amazon Nova, Claude, Llama, Mistral, Cohere, and Amazon Titan — shown side by side in both per-1,000 and per-1,000,000 tokens, with input and output broken out so the asymmetry is visible at a glance.

Rows run roughly cheapest to most expensive by output rate (since output usually drives cost). The per-1K and per-1M columns are the same number scaled by 1,000 — both are included because AWS and model providers quote sometimes one, sometimes the other, and having both side by side removes a constant source of confusion. The final column is the output-to-input ratio, which makes the 3–5× rule concrete and shows how much "writing" costs relative to "reading" on each model.

Read the table as a ranking and a sanity-check, not an audited price sheet. Foundation-model prices move frequently as providers compete, vary by region, and exclude the prompt-caching and Batch discounts covered in §VI. The spread is the real lesson: from Amazon Nova Micro to a Claude Opus-class model, the output rate ranges roughly $0.14 to $75 per million — a factor of more than 500×. Picking the smallest model that does the job is, by a wide margin, the biggest token-cost decision you make.

representative on-demand bedrock token costs · per 1K and per 1M · input vs output · 2026

Model	Provider	Input / 1K	Output / 1K	Input / 1M	Output / 1M	Output:input
Nova Micro	Amazon	$0.000035	$0.00014	$0.035	$0.14	4×
Nova Lite	Amazon	$0.00006	$0.00024	$0.06	$0.24	4×
Mistral Small	Mistral	$0.0002	$0.0006	$0.20	$0.60	3×
Llama (small, ~8B)	Meta	$0.00022	$0.00072	$0.22	$0.72	~3.3×
Titan Text Lite	Amazon	$0.00015	$0.0002	$0.15	$0.20	~1.3×
Claude Haiku	Anthropic	$0.00025	$0.00125	$0.25	$1.25	5×
Titan Text Express	Amazon	$0.0002	$0.0006	$0.20	$0.60	3×
Cohere Command	Cohere	$0.001	$0.002	$1.00	$2.00	2×
Llama (large, ~70B+)	Meta	$0.00265	$0.0035	$2.65	$3.50	~1.3×
Mistral Large	Mistral	$0.002	$0.006	$2.00	$6.00	3×
Nova Pro	Amazon	$0.0008	$0.0032	$0.80	$3.20	4×
Nova Premier	Amazon	$0.0025	$0.0125	$2.50	$12.50	5×
Claude Sonnet	Anthropic	$0.003	$0.015	$3.00	$15.00	5×
Claude Opus-class	Anthropic	$0.015	$0.075	$15.00	$75.00	5×

Representative 2026 figures for relative comparison only — confirm current rates on the AWS Bedrock pricing page. Output is typically 3–5× input (the open-weight Llama-large and Titan text rows are closer because their input rate is relatively high). Per-1M = per-1K × 1,000. Excludes prompt-caching and Batch discounts (§VI); image/video and embeddings are priced differently (see note in §V). Rates vary by region.

where the money goes

VWhy output tokens quietly dominate most bills

Teams instinctively optimize the prompt — the thing they wrote and can see. But on a large fraction of real workloads, the output tokens are where the money actually goes, and they get less attention precisely because the model, not the engineer, produces them.

The arithmetic is straightforward once you combine the two facts from §II. Output is priced 3–5× higher per token, so even when a request generates fewer output tokens than it consumes in input, output can still be the larger cost line. Take the Haiku example from §III: 800 input tokens and 400 output tokens — twice as many input tokens — yet output costs $0.0005 against input's $0.0002, more than double. The rate gap beats the volume gap. Unless your workload is genuinely input-heavy (long retrieved context, short answers), expect output to be a big share or the majority of the bill.

This flips the usual optimization instinct. Trimming the system prompt by 100 tokens saves you 100 input tokens per call; letting the model write a 500-token answer where 150 would do costs you 350 output tokens per call at 3–5× the rate — roughly 10–17× more impact per token. The highest-leverage token optimization on a generation-heavy workload is almost always making the output shorter: cap max-output tokens, instruct the model to be concise, ask for structured/JSON output instead of prose, and avoid prompting patterns that invite preamble ("Sure! Here is a detailed explanation…") when you only need the answer.

There are important exceptions, and naming them keeps the rule honest. RAG and long-context workloads are input-dominated — when you stuff 3,000–10,000 tokens of retrieved documents into every call and get back a 300-token answer, input is the bill and retrieval tuning (fewer, better chunks) plus prompt caching are the levers. Classification and extraction are input-dominated for the same reason — lots of text in, a label or a few fields out. So the real guidance is: identify which side of each workload is the cost, then optimize that side. Generation-heavy → shrink output. Context-heavy → shrink and cache input. The mistake is optimizing the side that is not the cost.

One more reason output deserves attention: it is the harder number to predict and the easier one to let drift. Input is fixed by your template and policies; output length is a behavior of the model that can change when you swap models, tweak a prompt, or upgrade to a more verbose version. Monitoring average output tokens per request over time is one of the most useful Bedrock cost metrics, because a silent increase there — a new model that "explains its reasoning," a prompt change that loosened the format — shows up directly on the bill.

the counter-intuitive rule

Because output is priced 3–5× input, shortening the answer usually saves more than shortening the prompt — often 10×+ more per token. On generation-heavy workloads, cap max-output tokens and demand concise/structured output first. On RAG and classification (input-dominated), do the opposite: shrink and cache the input. Note: image/video are billed per-image/second and embeddings per input token only — neither follows the input/output split above.

changing the per-token math

VIHow prompt caching and Batch change the per-token cost

The per-token rates in §IV are the on-demand list price. Two mechanisms change the actual rate you pay without changing the model: prompt caching discounts repeated input tokens, and Batch discounts every token on non-interactive jobs. Both are the difference between a list-price bill and a real one.

These stack with the model-choice lever rather than replacing it, and they target different parts of the bill, so the right move depends on which side dominates (per §V). An input-heavy chatbot benefits most from prompt caching (discounting the repeated context it re-sends). A high-volume offline pipeline benefits most from Batch (halving every token on work that can wait). A single product often uses both: interactive traffic served on-demand with caching on the shared prompt, and nightly bulk jobs pushed through Batch. What neither does is change the fundamental unit — you are still paying per token; you are just paying a lower rate per token. The broader set of cost levers and the full pricing-mode comparison live on the amazon-bedrock-pricing page; this page stays on the token math itself.

Prompt caching — a discount on repeated input tokens

Prompt caching attacks the input side specifically. When many requests share a large common prefix — a long fixed system prompt, a big instruction set, a reference document, or few-shot examples — caching lets Bedrock store that prefix so subsequent requests do not pay full input price for it again. Cached input tokens are billed at a steep discount versus normal input tokens (with a smaller one-time charge to write the cache). It changes nothing about output pricing — it is purely an input-token discount — but on workloads where the same context is re-sent thousands of times (chatbots with a long system prompt, RAG with shared instructions, agents with large tool definitions), the input portion of the bill can fall by a large fraction. It only helps when context actually repeats; a workload where every prompt is unique gets no benefit. See the dedicated amazon-bedrock-prompt-caching page for the exact mechanics and cache lifetime.

Batch — ~50% off every token, asynchronously

Batch attacks both sides at once but trades away latency. You submit a large set of requests as a single job (typically a file in S3) and Bedrock processes them in the background, returning results when done. In exchange for giving up real-time responses, both input and output tokens are billed at roughly half the on-demand rate. There is no per-model gymnastics — it is a flat ~50% cut on the same token pricing, applied to bulk, non-interactive work: corpus summarization, classification at scale, enrichment, offline evaluation, embedding a large dataset. For any token-heavy job that does not need an instant answer, Batch is the single easiest way to halve the per-token cost. See the amazon-bedrock-batch-inference page for the job mechanics.

two discounts, two targets

Prompt caching = a steep discount on repeated input tokens (helps input-heavy, repetitive workloads). Batch = ~50% off both input and output for non-interactive jobs. They stack with model choice and with each other across different request paths — pick the one that targets your dominant cost side.

how it becomes $0

VIIHow AWS credits make your token bill $0 to build

Everything above prices tokens if you pay AWS directly. For most startups and many companies the relevant number is different, because Bedrock token spend is fully credit-eligible — every input and output token, plus embeddings and the supporting services, draws down AWS credits before it touches your card.

AWS runs several credit programs specifically to put generative-AI workloads on AWS, and Bedrock token usage counts against them automatically. The relevant pools: AWS Activate (general startup credits, commonly up to $100K for institutionally-funded startups); a dedicated Bedrock / Generative-AI POC pool ($10K–$50K) aimed at proving out a specific GenAI use case; and the competitive Generative AI Accelerator (credit awards up to $1M for a small cohort of AI-first startups). Credits apply against your AWS bill — including every Bedrock input and output token, embeddings, fine-tuning, and the vector store and storage around them — until they run out.

The practical catch is that most of these pools are partner-filed: they are requested through the AWS Partner Network (the ACE program), not a public self-serve form. That is why teams typically route through an AWS partner rather than applying alone — and it is the gap CloudRoute fills. CloudRoute matches you to the right credit pool for your stage and to a vetted AWS DevOps/ML partner who both files the credit application and helps build the workload cost-efficiently (the tiered model routing, the prompt caching, the Batch pipelines described above). The customer pays $0 — AWS funds the credit pool, AWS pays the partner through engagement-funding programs, and the partner pays CloudRoute a routing commission. You never see an invoice.

Put together with the token math on this page, the picture for a startup is simple: estimate your token costs so you understand the shape of the workload, build it cost-efficiently (right-sized model, capped output, caching, Batch), and then have a $25K–$100K credit pool absorb the spend entirely while you find product-market fit — paying real money only once usage, and ideally revenue, has scaled past the credits. Related: the cross-cluster pages on AWS credits for generative-AI startups and Bedrock POC funding cover the credit mechanics in full.

same workload, different model

The same 50M-token workload, priced across models

To make the per-token spread concrete, here is one identical illustrative workload — 50M tokens a month split 60/40 input/output (30M input, 20M output), a typical assistant-style mix — priced on six models from the §IV table. It isolates the single biggest token-cost lever: model choice. Figures are representative 2026 illustrations, not quotes.

Model	Input cost (30M)	Output cost (20M)	Est. monthly total	Output share	vs cheapest
Nova Micro	$1.05	$2.80	~$3.85	73%	1× (baseline)
Nova Lite	$1.80	$4.80	~$6.60	73%	~1.7×
Claude Haiku	$7.50	$25.00	~$32.50	77%	~8×
Nova Pro	$24.00	$64.00	~$88.00	73%	~23×
Claude Sonnet	$90.00	$300.00	~$390.00	77%	~101×
Claude Opus-class	$450.00	$1,500.00	~$1,950.00	77%	~506×

Same 50M tokens (30M in / 20M out) every row — only the model changes, and the bill spans ~$4 to ~$1,950 (a >500× range). Note that output is the majority of the cost in every row despite being the minority of the tokens, which is the §V point made numeric. Representative 2026 figures; confirm current rates on the AWS Bedrock pricing page. Batch (~50% off) and prompt caching would lower every row.

before your token bill is real money

Get AWS credits that cover every Bedrock token — and a partner to build it (you pay $0)

Get matched in 24h →

a recent match

A token bill cut by an order of magnitude — and then funded to $0 — anonymized

inquiry · seed-stage AI writing tool, Berlin

Seed-stage AI writing product, 9 people, generating long-form drafts on a frontier model for every request

Situation: The product was output-heavy by design — users asked for full drafts, so most of every request's tokens were generation, billed at the frontier model's top output rate. With no max-output discipline and a frontier model on every call, the modeled token bill was climbing fast as usage grew, and the team had no spare runway to absorb it during the seed period.

What CloudRoute did: CloudRoute matched them in under 24 hours to an EU AWS partner with GenAI cost-engineering experience. The partner (1) moved the easy 70% of generations — outlines, short rewrites, tone tweaks — onto Nova Lite / Claude Haiku and reserved the frontier model only for full long-form drafts; (2) set sensible max-output caps and switched to structured output where the UI allowed it, cutting average output tokens per request by roughly a third; (3) cached the large shared style-and-instruction prompt; and (4) filed a Bedrock POC credit application plus an Activate application to fund the launch.

Outcome: Modeled token cost fell by roughly an order of magnitude through model-routing and output discipline before any discount — and the remaining spend was fully covered by the approved credits, so the team paid $0 during the build and early launch. CloudRoute's commission was paid by the partner from AWS engagement funding, not by the customer.

token cost cut: ~10× via routing + output caps · credits secured: POC + Activate · out-of-pocket during build: $0

faq

Common questions

How much does a token cost on Amazon Bedrock?

It depends entirely on the model and on whether the token is input or output. Representative 2026 on-demand rates run from about $0.035 per million input / $0.14 per million output on the cheapest models (Amazon Nova Micro) up to about $15 per million input / $75 per million output on a Claude Opus-class model — a spread of more than 500×. Output tokens are typically priced 3–5× higher than input tokens for the same model. Always confirm current rates on the AWS Bedrock pricing page, and remember Batch (~50% off) and prompt caching can lower the effective per-token rate.

What counts as a token in Bedrock?

A token is a sub-word chunk of text from the model's tokenizer — roughly ¾ of an English word, or about 4 characters, so 1,000 tokens ≈ 750 words. Bedrock counts tokens in both directions of every request: input tokens are everything you send (your prompt, the system instruction, any conversation history, retrieved/RAG context, and tool definitions) and output tokens are everything the model generates back. You are billed separately for input and output, and the count resets every request. Code and non-English text use more tokens per word than plain English.

Why are output tokens more expensive than input tokens?

Because generation is the compute-intensive part. Reading your input is a single forward pass the model processes in parallel, but output is autoregressive — the model produces one token at a time, each conditioned on all the previous ones — so each output token costs more to produce. That is why output is typically priced 3–5× the input rate for the same model, and why on most workloads output is the larger share of the bill even when there are fewer output tokens than input tokens.

How do I estimate how many tokens my text will use?

Use tokens ≈ characters ÷ 4, or equivalently words × 1.33, for a quick English estimate — so a 500-word document is roughly 665 tokens. These rules run slightly high for prose and slightly low for code and non-English text. For exact counts, the Bedrock Converse API returns the actual input and output token counts in its response metadata on every call, which is the ground truth. Estimate input and output separately, since they are priced separately.

How do I calculate the token cost of a Bedrock request?

Cost per request = (input tokens ÷ 1,000 × the model's input rate per 1K) + (output tokens ÷ 1,000 × its output rate per 1K), then multiply by requests per month. For example, an 800-token input and 400-token output on a Claude Haiku-class model ($0.00025 / $0.00125 per 1K) is about $0.0007 per request, or ~$0.70 per thousand requests. You can also work in millions directly using the per-1M rates. See the amazon-bedrock-pricing-calculator to model a full mix of models and volumes.

Do input and output tokens cost the same on Bedrock?

No. They are billed at different published rates, and output is the expensive one — typically 3–5× the input rate for the same model. This means a workload that reads a lot and writes a little (classification, extraction, routing) is cheap, while one that writes a lot from a short prompt (long-form generation, code, synthetic data) is dominated by output cost. Always price the two directions separately rather than treating tokens as one number.

How can I reduce Bedrock token costs?

The biggest lever is model choice — route easy requests to a cheap model (Nova Micro/Lite, Claude Haiku) and reserve frontier models for hard ones; the per-token spread is over 500×. After that: cap max-output tokens and ask for concise or structured answers (output is the pricier side, so this saves the most on generation-heavy work); turn on prompt caching to discount repeated input context on chatbots and RAG; move non-interactive bulk jobs to Batch for ~50% off every token; and trim retrieved context and conversation history on input-heavy workloads. The full set of levers is on the amazon-bedrock-pricing page.

Are images and embeddings billed per token on Bedrock too?

Not the same way. Text models are billed per input and output token as described here. Image and video generation (Amazon Nova Canvas/Reel, Stability) are billed per image or per second of video, not per token. Embedding models (Amazon Titan Text Embeddings, Cohere Embed) are billed per input token only — the output vector is not charged — at very low rates, so the cost there comes from volume when you embed a large corpus. Only the text input/output token split on this page follows the 3–5× rule.

Can AWS credits cover Bedrock token costs?

Yes — every Bedrock input and output token is credit-eligible, along with embeddings, fine-tuning, and supporting services, and credits apply automatically against your AWS bill until exhausted. The relevant pools are AWS Activate (up to $100K), a dedicated Bedrock/GenAI POC pool ($10K–$50K), and the GenAI Accelerator (up to $1M for selected startups). These are largely partner-filed via the AWS Partner Network, which is why teams route through a partner. CloudRoute matches you to the right pool and a vetted AWS partner who files the application and builds the workload — customer pays $0, AWS funds it.

Stop counting tokens — get them funded

However your token bill pencils out, AWS credits can cover every input and output token. CloudRoute routes you to the right credit pool (Activate up to $100K, Bedrock POC $10K–$50K, GenAI Accelerator up to $1M) and a vetted AWS partner to build and cost-tune the workload. Customer pays $0.

Get matched in 24h →→ see the AI-team persona detail

matched within< 24h

GenAI credit ceilingup to $1M

cost to you$0