for AWS partners →Fund your Bedrock build with AWS credits →

amazon bedrock models · the full 2026 catalog

Every model on Amazon Bedrock — the complete 2026 catalog.

Q: Which models are available on Amazon Bedrock in 2026?

Bedrock hosts foundation models from eight-plus providers: Anthropic Claude (Haiku/Sonnet/Opus), Amazon Nova (Micro/Lite/Pro/Premier text, Canvas image, Reel video, Act agentic) and Amazon Titan (text + embeddings), Meta Llama, Mistral, Cohere (Command, Embed, Rerank), Stability AI (Stable Diffusion / Stable Image), AI21 (Jamba), and DeepSeek (reasoning models). Exact versions vary by AWS Region and change continuously — confirm the live list under Model access in the Bedrock console.

Q: What is the best model on Amazon Bedrock?

There is no single best model — the catalog is a cost/capability curve and the right choice depends on the task, quality bar, latency budget, and volume. For frontier reasoning, coding, and agents, Anthropic's Claude (Opus/Sonnet) is the most widely deployed; for lowest cost and latency, Amazon Nova; for open-weight fine-tuning freedom, Meta Llama; for embeddings/RAG, Titan or Cohere. Most production systems route across several models. Run a Bedrock model evaluation on your own data to decide.

Q: How do I choose which Bedrock model to use?

Start from the job (classify, extract, summarize, reason, code, generate an image, embed for search), then match it to the cheapest tier that clears your quality bar: small models (Nova Micro/Lite, Claude Haiku) for high-volume easy calls, a workhorse (Claude Sonnet, Nova Pro) for production reasoning, a frontier model (Claude Opus, Nova Premier) only for the hardest steps, and an embedding model (Titan, Cohere) for anything search-shaped. Validate with Bedrock model evaluation on a representative slice of your real traffic before committing.

Q: What context windows do Bedrock models support?

Context windows vary widely by model and are raised over time. As representative orders of magnitude in 2026: Claude around ~200K tokens, AI21 Jamba around ~256K, Llama and Mistral and several Nova and DeepSeek models in the ~128K range, and older Titan Text models smaller (~8K–32K). Some frontier models support very large windows in specific Regions or configurations. Because you pay per input token, a larger window costs more per call — pair big windows with prompt caching. Confirm the current window for your exact model and Region in the documentation.

Q: How are Bedrock models priced?

There is no platform fee. Text models are billed per 1,000 input tokens and per 1,000 output tokens at a rate set per model, with output several times more expensive than input; embedding models are billed per 1,000 input tokens with no output charge; image models bill per image and video models per second. Four levers apply across the catalog: on-demand (default), batch (~50% cheaper, asynchronous), provisioned throughput (reserved capacity), and prompt caching (discount on repeated context). Rates vary by model and Region — see the AWS Bedrock pricing page.

Q: Can I switch between Bedrock models without rewriting my code?

For chat models, yes — almost always. The Converse API gives one consistent request/response schema across providers, so switching from one model to another is typically just changing the modelId string. That is what makes cost-saving model routing (cheap model for easy calls, frontier model for hard ones) a configuration decision rather than a rewrite. Non-conversational modalities like image, video, and embeddings use the lower-level InvokeModel API, where request shapes are model-specific.

Q: Do all Bedrock models keep my data private?

Yes. Across every model in the catalog, your prompts and outputs are not used to train the underlying foundation models and are not shared with the model providers — AWS serves the providers' models on your behalf inside AWS. Content is encrypted in transit and at rest (with optional customer-managed KMS keys), processed only in the AWS Region you call, can be kept off the public internet via VPC endpoints (PrivateLink), and is governed by IAM with full CloudTrail audit logging.

Q: How can I afford to run many models in production?

Two ways. First, control unit cost: route the easy majority of calls to small models, run offline work as batch (~50% off), enable prompt caching for repeated context, and reserve provisioned throughput only at high steady volume. Second, fund the bill with AWS credits — Activate Portfolio (up to $100K), Bedrock/GenAI POC ($10K–$50K), and the GenAI Accelerator (up to $1M). CloudRoute routes you to a vetted AWS partner who files the credit application and can build the multi-model workload with you; AWS funds the credits and the engagement, so you pay $0.

One managed API, every major foundation-model family: Anthropic Claude, Meta Llama, Mistral, Amazon Nova and Titan, Cohere, Stability AI, AI21, and DeepSeek. This is the reference catalog — every provider with its representative models, modality, context window, best-fit use, and relative cost in one scannable table — plus how to choose a model, how to turn it on, and a link into each model's deep page.

Fund your Bedrock build with AWS credits →→ jump to the full catalog table

model providers

modalities

text · image · video · embeddings

shared request schema

Converse API

data used to train base models

none

TL;DR

Amazon Bedrock hosts foundation models from eight-plus providers behind one API: Anthropic (Claude), Meta (Llama), Mistral, Amazon (Nova + Titan), Cohere, Stability AI, AI21 (Jamba), and DeepSeek. They span text/chat, reasoning, image, video, and embeddings — and you reach almost all chat models through the single Converse API, so switching is usually a one-line modelId change.
There is no single "best" model. The catalog is a cost/capability curve: ultra-cheap small models (Nova Micro/Lite, Claude Haiku, Mistral small) for high-volume easy calls; workhorse models (Claude Sonnet, Nova Pro) for production reasoning and agents; frontier models (Claude Opus, Nova Premier) for the hardest steps; embedding models (Titan, Cohere) for RAG and search; and dedicated image (Nova Canvas, Stable Diffusion) and video (Nova Reel) generators. Most production systems route across several.
Exact model versions, context windows, and per-token prices vary by AWS Region and change continuously — always confirm the live list under Model access in the Bedrock console and current rates on the AWS pricing page. GenAI bills scale fast; CloudRoute routes you to AWS credits (Activate Portfolio up to $100K, Bedrock/GenAI POC $10K–$50K, GenAI Accelerator up to $1M) and vetted partners to build it — you pay $0.

orientation

IHow the Bedrock model catalog is organized

Amazon Bedrock's defining feature is breadth: a single, fully-managed API in front of foundation models from eight-plus providers, covering text and chat, reasoning, image generation, video generation, and embeddings. Before the catalog table, it helps to understand the four axes that actually distinguish one model from another.

The first axis is provider. Each provider on Bedrock brings a distinct lineage and set of strengths: Anthropic (the Claude family) for frontier reasoning, coding, and agentic tool use; Amazon with two lines — the newer Nova family tuned for very low cost and latency, and the older Titan family still useful for text and especially embeddings; Meta (Llama) for strong open-weight, freely fine-tunable models; Mistral for fast, price-efficient throughput; Cohere for retrieval, embeddings, and reranking; Stability AI for image generation; AI21 for the long-context Jamba family; and DeepSeek for cost-efficient open reasoning models. You are never locked to one — they all live behind the same authentication, billing, and (for chat) request schema.

The second axis is modality: what kind of input and output the model handles. Most of the catalog is text/chat (some with vision input). A subset is reasoning-tuned for harder multi-step problems. A separate group does image generation (Amazon Nova Canvas, Stability's Stable Diffusion / Stable Image), one does video generation (Amazon Nova Reel), and a distinct class produces embeddings — the numeric vectors that power semantic search and retrieval-augmented generation rather than human-readable text.

The third axis is the context window — how many tokens of input (prompt plus retrieved documents plus conversation history) the model can consider at once. Larger windows let you stuff in whole contracts, codebases, or long chat histories without aggressive chunking, but a bigger window costs more per call because you are paying for every input token. Context windows differ widely across the catalog and are raised over time, so the table gives representative orders of magnitude rather than guarantees.

The fourth axis is cost, which on Bedrock is almost always per token (per 1,000 input tokens and per 1,000 output tokens), set per model, with output priced several times higher than input. Image models bill per image and video models per second of output. Because the cheapest text models are roughly two orders of magnitude cheaper than frontier ones, which model you send each call to is the single biggest driver of a Bedrock bill — which is why this catalog frames every family by relative cost as well as capability.

read the catalog this way

Pick by job, not by brand. Cheap small model for the 90% of easy, high-volume calls (classification, extraction, routing); a workhorse model for production reasoning and agents; a frontier model only for the hardest 10%; an embedding model for anything involving search or RAG; and a dedicated image or video model when you need media. Almost all of it is reachable through the one Converse API.

the full reference

IIThe full model catalog — every family on Bedrock

This is the comprehensive map: every model family on Amazon Bedrock with its provider, representative models, modality, an approximate context-window order of magnitude, best-fit use, and relative cost on a $ (cheapest) → $$$$ (frontier) scale. Exact versions, context windows, and prices vary by Region and change continuously — confirm the live list under Model access in the Bedrock console and current rates on the AWS pricing page.

Two reading notes. First, relative cost is a position on a curve, not a rate: $ means an order of magnitude cheaper than $$$$, and the goal is to show where each family sits so you can route calls deliberately. Second, context windows are representative — they are raised over time and a few frontier models support very large windows in specific Regions or configurations, so treat the column as "roughly this scale," not a contractual ceiling. Each family links to its own deep page where one exists.

amazon bedrock model catalog · representative as of 2026 — verify live availability, context windows, and prices in the Bedrock console / AWS pricing page

Model family	Provider	Modality	Context window (approx.)	Best-fit use	Relative cost
Claude (Haiku · Sonnet · Opus)	Anthropic	Text / chat / vision / tools	~200K tokens	Frontier reasoning, coding, agents, long-context analysis, instruction-following	$ → $$$$
Nova (Micro · Lite · Pro · Premier)	Amazon	Text / chat / vision	~128K–300K tokens	Lowest cost & latency text; tiered from ultra-cheap to frontier	$ → $$$
Nova Canvas	Amazon	Image generation	n/a (prompt → image)	Native image generation, editing, variations	$$ (per image)
Nova Reel	Amazon	Video generation	n/a (prompt → video)	Short-form video generation from text/images	$$$ (per second)
Nova Act	Amazon	Agentic / actions	—	Agentic model for taking actions in software environments	$$
Titan Text	Amazon	Text / chat	~8K–32K tokens	Economical text generation and summarization	$
Titan Embeddings (Text · Multimodal)	Amazon	Embeddings (vectors)	input ~8K tokens	RAG, semantic search, clustering; text + image embeddings	$ (per 1K tokens)
Llama (instruct + smaller variants)	Meta	Text / chat / vision	~128K tokens	Open-weight, freely fine-tunable, self-hostable lineage	$ → $$
Mistral (Large + smaller models)	Mistral AI	Text / chat	~32K–128K tokens	Fast, cost-efficient high-throughput tasks	$ → $$
Command	Cohere	Text / chat	~128K tokens	Enterprise text generation tuned for retrieval workflows	$ → $$
Embed · Rerank	Cohere	Embeddings / rerank	input ~512 tokens/chunk	Enterprise search, retrieval, and result reranking	$ (per 1K tokens)
Stable Diffusion / Stable Image	Stability AI	Image generation	n/a (prompt → image)	Marketing, product, and creative imagery	$$ (per image)
Jamba family	AI21 Labs	Text / chat	~256K tokens	Long-context, hybrid-architecture text generation	$ → $$
DeepSeek reasoning models	DeepSeek	Text / reasoning	~128K tokens	Cost-efficient open reasoning alternatives	$ → $$

Availability differs by Region — a model live in us-east-1 or us-west-2 may not yet be in eu-central-1 or ap-southeast-1, and frontier models often land in US Regions first. Context windows are approximate orders of magnitude that are raised over time; relative cost ($→$$$$) shows position on the price/capability curve, not a rate. Cross-region inference profiles (a related capability) let Bedrock serve a request from one of several Regions within a geography to improve availability. Always confirm the current per-Region list under Model access in the console and live prices at aws.amazon.com/bedrock/pricing.

the catalog sliced by job

IIIThe catalog by modality — text, reasoning, image, video, embeddings

The single table above is sorted by family; in practice you choose by the job in front of you. Here is the same catalog sliced by what you are actually trying to produce, which is usually the faster way to land on a shortlist.

Text & chat (the bulk of the catalog)

Most Bedrock models are conversational text models, and they form a clear tier ladder. Small/cheap: Amazon Nova Micro and Lite, Claude Haiku, smaller Mistral and Llama variants — pennies-per-million-token economics for classification, extraction, routing, and simple drafting. Workhorse: Claude Sonnet, Amazon Nova Pro, Mistral Large, larger Llama — the default for production chat, reasoning, coding, and agents. Frontier: Claude Opus and Amazon Nova Premier for the hardest reasoning you escalate to deliberately. All of these answer through the Converse API with the same request shape.

Reasoning-tuned models

A growing slice of the catalog is tuned to "think" through multi-step problems before answering — Claude's extended-reasoning modes and DeepSeek's reasoning models are representative. They trade higher latency and token use for stronger performance on math, complex coding, planning, and multi-hop analysis. Use them for the genuinely hard problems and keep a cheaper model for the easy majority; reasoning models are not where you want to route a high-volume classification endpoint.

Image & video generation

For pixels rather than tokens, Bedrock offers image generation via Amazon Nova Canvas and Stability AI's Stable Diffusion / Stable Image family, and video generation via Amazon Nova Reel. These bill per image or per second of generated video rather than per token, and you typically call them through the lower-level InvokeModel API rather than Converse, since they are not conversational. They cover marketing creative, product imagery, image editing and variation, and short-form video from text or images.

Embeddings (the engine behind RAG and search)

Embedding models turn text (and, for multimodal embeddings, images) into vectors so you can do semantic search, clustering, deduplication, and retrieval-augmented generation. Amazon Titan Text Embeddings and Titan Multimodal Embeddings, plus Cohere Embed (with Cohere Rerank to reorder results), are the catalog's embedding workhorses. They are cheap and are billed per 1,000 input tokens with no output-token cost, because the output is a vector. Any time you build RAG, semantic search, or recommendations, an embedding model is doing the quiet heavy lifting underneath your chat model.

a decision rule

IVHow to choose a model (and why you usually pick several)

The most common Bedrock question is not "which platform" but "which model on the platform." There is no single right answer because the models sit at different points on a cost/capability curve — but there is a reliable decision procedure.

Start from the job, not the brand. Define the task (classify, extract, summarize, reason, write code, generate an image, embed for search), the quality bar it actually needs, the latency budget, and the volume. Then map those onto the curve: a high-volume, low-difficulty task wants the cheapest model that clears the quality bar; a low-volume, high-difficulty task can justify a frontier model; anything involving search or grounding needs an embedding model regardless of which chat model sits on top.

The second principle is that production systems rarely pick one model — they route. The dominant cost-control pattern sends the easy majority of calls (often 80–95%) to a small, cheap model and escalates only the hard remainder to a workhorse or frontier model, with an embedding model handling retrieval throughout. Because every chat model shares the Converse API, this routing is a configuration decision, not a rewrite: you change the modelId string per call path. A support assistant might run Nova Lite or Claude Haiku for intent classification, Claude Sonnet for the actual grounded answer, Titan or Cohere for retrieval, and reach for Claude Opus or Nova Premier only on escalations.

The third principle is to decide with evidence, not vendor benchmarks. Public leaderboards rarely reflect your data, your prompts, or your latency and cost constraints. Bedrock's built-in model evaluation lets you compare candidate models on your own datasets — using automated metrics or human review — before you commit. Run a short evaluation across two or three candidates from different tiers on a representative slice of your real traffic; the cheapest model that passes your bar is almost always the right production choice, with a more capable model wired in as the escalation path.

Match the model tier to the task difficulty — Cheap small model for easy, high-volume calls; workhorse for production reasoning and agents; frontier only for the hardest steps. Over-provisioning the model is the most common source of avoidable Bedrock spend.
Mind the context window — If you routinely feed whole documents, long histories, or large retrieved contexts, favor a larger-window model (Claude, Jamba, larger Nova) — but remember you pay per input token, so a big window plus a big prompt on every call adds up. Pair large windows with prompt caching.
Pick embeddings for anything search-shaped — RAG, semantic search, clustering, and recommendations are embedding-model jobs (Titan, Cohere Embed). The chat model on top is a separate choice from the embedding model underneath.
Prefer open-weight when you need fine-tuning freedom — Meta Llama and several open models can be fine-tuned freely and have a self-hostable lineage, which matters if portability or deep customization is a requirement.
Use a dedicated media model for pixels — Do not try to coerce a text model into image/video work — route to Nova Canvas / Stable Image for images and Nova Reel for video.
Evaluate on your own data before committing — Run a Bedrock model evaluation across a couple of candidate tiers on representative traffic. Choose the cheapest model that clears the bar and keep a stronger one as the escalation path.

turning a model on

VHow to access a model — Model access, IAM, Regions, and modelId

Models in the catalog are off by default. You explicitly request access per model, govern it with IAM, choose a Region, and then call the model with the shared Converse API (or InvokeModel for non-chat modalities). The catalog is large, but turning any one model on is the same short path.

Access is opt-in by design: AWS wants every model your organization can call to be a deliberate, auditable decision, and several models carry provider end-user license terms you must accept first. The flow has four parts.

Step 1 — Request model access (per model, per Region)

In the Bedrock console, open Model access and request the specific models you want (for example, Claude Sonnet plus Titan Text Embeddings). Most are granted within seconds to a few minutes; some require accepting the provider's EULA. Access is per Region — enabling a model in us-east-1 does not enable it in eu-west-1. This Model access page is also the authoritative live catalog for your account: it shows exactly which models and versions are available to you in that Region right now.

Step 2 — Authorize callers with IAM

Bedrock is a standard AWS service governed by IAM. Grant principals (users, roles, Lambda functions, ECS tasks) actions such as bedrock:InvokeModel, bedrock:InvokeModelWithResponseStream, and bedrock:Converse, scoped where useful to specific model ARNs so a service can call exactly the models it needs and nothing else. All API activity is recorded in CloudTrail, and you can capture full request/response payloads via Bedrock model-invocation logging to S3 or CloudWatch.

Step 3 — Choose a Region (and cross-region inference)

Pick a Region for data-residency, latency, and availability — your prompts and completions are processed in the Region you call, and not every model lives in every Region (frontier models often land in US Regions first). When you hit availability or throughput limits, cross-region inference profiles let a single request be served from one of several Regions within a geography (for example a US or EU profile) without you managing the routing. See the Bedrock overview for the wider setup.

Step 4 — Call the model by modelId

For chat models, use the Converse API: one consistent request/response schema across providers, with multi-turn conversation, system prompts, tool use, and streaming built in. Selecting a model from the catalog is just setting the modelId string — switching from one provider's model to another is typically a one-line change. Use the lower-level InvokeModel for non-conversational modalities (image, video, embeddings) or when you need a provider-specific parameter Converse does not expose. Copy the exact current model IDs from the Bedrock console; they are versioned and Region-specific.

one call structure, the whole text catalog

Because the Converse API normalizes the request and response across providers, the same call works for Claude, Llama, Mistral, Nova, Cohere Command, Jamba, and more — only the modelId changes. That is what makes routing across the catalog (cheap model for easy calls, frontier model for hard ones) a configuration decision rather than a rewrite. Model IDs are versioned and Region-specific — always copy the current ID from the console.

what the catalog costs

VIWhat the models cost — the shape of Bedrock pricing

You do not need a price for every model to reason about cost — you need the shape. Bedrock has no platform fee and no minimum; you pay per use, and the catalog spreads across roughly two orders of magnitude from cheapest to frontier. The figures below are representative as of 2026 to show relative scale, not audited rates — always confirm on the AWS Bedrock pricing page.

Text models are billed per 1,000 input tokens and per 1,000 output tokens, at a rate set by the specific model, with output almost always several times more expensive than input. Embedding models are billed per 1,000 input tokens with no output-token charge (the output is a vector). Image models bill per image and video models per second of generated output. On top of per-call pricing, the same four levers apply across the whole catalog: on-demand (default), batch (asynchronous, typically ~50% cheaper), provisioned throughput (reserved capacity for high steady volume and for serving fine-tuned models), and prompt caching (a steep discount on repeated context). The detail lives on the Bedrock pricing page.

The practical takeaway is the same one that drives model selection: because a small model can be ~50–100× cheaper per token than a frontier model, routing the easy majority of calls to a small model is usually a bigger cost win than any negotiation or discount. The price table below exists to make that spread concrete.

representative on-demand pricing by catalog tier · per 1K tokens, USD · illustrative 2026 ranges — verify on the AWS pricing page

Catalog tier (example)	Input / 1K tokens	Output / 1K tokens	Batch?	Typical role
Nova Micro (ultra-low-cost)	~$0.000035	~$0.00014	Yes (~50% off)	High-volume classification, routing
Nova Lite / Claude Haiku (small)	~$0.0002–$0.0008	~$0.0008–$0.004	Yes (~50% off)	Cheap chat, extraction, drafts
Mistral / Llama (mid-tier)	~$0.001–$0.003	~$0.003–$0.009	Yes (~50% off)	Balanced throughput tasks
Claude Sonnet / Nova Pro (workhorse)	~$0.003	~$0.015	Yes (~50% off)	Production reasoning, coding, agents
Claude Opus / Nova Premier (frontier)	~$0.015	~$0.075	Yes (~50% off)	Hardest reasoning, escalation only
Titan / Cohere embeddings	~$0.0001–$0.0002	n/a (vectors)	Yes	RAG, semantic search, clustering

Deliberately rounded representative ranges to show relative scale, not audited current rates; actual prices vary by exact model version and Region and change over time. Image models bill per image and video models per second. Confirm live pricing at aws.amazon.com/bedrock/pricing. Highest-leverage cost moves: route cheap calls to small models, run offline work as batch (~50% off), and turn on prompt caching for repeated context.

go deeper per model

VIIEach model family, in depth

This page is the catalog hub; each major family has its own reference page with the full version lineup, context windows, pricing, setup, and use-case detail. Use this section as the index into them.

Because the whole catalog shares one access model and one request schema, the deep pages differ mainly in the model lineups, context windows, and prices — the mechanics of enabling and calling them are identical to section V. Start from the family that fits your job, evaluate two or three candidates on your own data, and route accordingly.

Claude on Amazon Bedrock — Anthropic's Claude family — the Haiku/Sonnet/Opus tiers, long-context behavior, tool use and agents, setup, and cost. The most widely deployed models on Bedrock for serious reasoning. See Claude on Amazon Bedrock.
Amazon Nova — Amazon's own foundation-model family — Micro/Lite/Pro/Premier for text, Canvas for images, Reel for video, and Act for agentic actions — engineered for very low cost and latency. See Amazon Nova.
Amazon Titan — Amazon's earlier family — Titan Text and, most importantly, Titan Text and Multimodal Embeddings, the economical default for RAG and semantic search. See Amazon Titan.
The platform itself — How access, the Converse API, the feature suite (Agents, Knowledge Bases, Guardrails, fine-tuning, Flows, evaluation), and security fit together across every model. See Amazon Bedrock — the complete guide.
Pricing across the catalog — The full breakdown of on-demand, batch, provisioned throughput, prompt caching, and customization costs for the models above. See Amazon Bedrock pricing.
Meta Llama, Mistral, Cohere, Stability, AI21, DeepSeek — The remaining providers — open-weight Llama, fast Mistral, Cohere's retrieval and rerank models, Stability's image models, AI21's long-context Jamba, and DeepSeek's cost-efficient reasoning models — are all enabled and called the same way via Model access and the Converse/InvokeModel APIs described in section V.

funding the catalog

VIIIThe cost reality — and how AWS credits fund your Bedrock build

Choosing the right models keeps unit cost sane; the harder problem is the aggregate bill once a real application serves real traffic. This is where the catalog and the funding story meet.

GenAI is cheap per call and expensive in aggregate. A retrieval-augmented assistant that resends a large system prompt and retrieved context on every turn, across thousands of users, can move from a rounding error to five or six figures a month faster than teams expect — especially if every call hits a frontier model. The levers throughout this page are how you keep it sane: route the easy majority to small models, run offline work as batch (~50% off), turn on prompt caching for repeated context, and reserve provisioned throughput only once volume is steady and high.

The other lever is funding the bill with someone else's money — specifically AWS's. AWS runs credit programs designed precisely for teams building generative AI on Bedrock: Activate Portfolio (up to $100K) for institutionally-funded startups, dedicated Bedrock / GenAI proof-of-concept funding ($10K–$50K) for a defined GenAI build, and the competitive Generative AI Accelerator (up to $1M) for AI-first companies. These pools are largely partner-filed and effectively invisible on the public Activate page.

This is exactly what CloudRoute does: we route you to a vetted AWS partner who files the credit application and, if you need hands, who can build the Bedrock workload with you — choosing and wiring the right models from the catalog above. Because AWS funds both the credits and the partner engagement, you pay $0. See AWS credits for generative-AI startups, AWS PoC / Bedrock POC funding, and $100K AWS credits.

pick the right tier

The catalog as a cost/capability curve — which tier for which job

The fastest way to use this catalog is to map your task onto the tier curve and pick the cheapest tier that clears your quality bar, escalating only when you must. Relative cost is $ (cheapest) → $$$$ (frontier); exact rates live on the AWS pricing page.

Tier	Representative models	Relative cost	Reach for it when	Avoid it when
Ultra-cheap / small	Nova Micro/Lite, Claude Haiku, small Mistral/Llama	$	High-volume easy calls: classification, extraction, routing, simple drafts	The task needs genuine multi-step reasoning or careful long-context work
Workhorse	Claude Sonnet, Nova Pro, Mistral Large, larger Llama	$$ → $$$	Production chat, coding, agents, the default for most real features	The call is trivially easy (over-paying) — or genuinely frontier-hard
Frontier	Claude Opus, Nova Premier	$$$$	The hardest reasoning, planning, and code you escalate to deliberately	High volume or latency-sensitive paths — cost and latency add up fast
Reasoning-tuned	Claude extended reasoning, DeepSeek reasoning	$$ → $$$	Math, complex coding, multi-hop analysis worth extra latency/tokens	Simple or high-throughput tasks that do not need step-by-step thinking
Embeddings	Titan Text/Multimodal, Cohere Embed (+ Rerank)	$	Anything search-shaped: RAG, semantic search, clustering, recommendations	You need human-readable text out (that is a chat model's job)
Image generation	Nova Canvas, Stable Diffusion / Stable Image	$$ (per image)	Marketing, product, creative imagery; editing and variations	You need text, structured data, or video
Video generation	Nova Reel	$$$ (per second)	Short-form video from text or images	You need stills, text, or real-time interaction

A production system almost never picks one tier. The dominant pattern: a cheap small model for the 80–95% of easy calls, a workhorse as the default, a frontier model on escalations, and an embedding model for retrieval — all behind the one Converse API. Run a Bedrock model evaluation on your own data before committing to any tier.

building on bedrock?

Get AWS credits to fund your Bedrock workload — and a vetted partner to pick and wire the right models. You pay $0.

Get matched in 24h →

a recent match

A multi-model Bedrock build, funded by AWS credits — anonymized

inquiry · seed-stage b2b document-automation startup, US

Seed-stage B2B SaaS, 9 people, building a document-intake and Q&A product over customers' contracts; net-new to AWS

Situation: The team had picked "an LLM" but not a model strategy. Their prototype sent every request — from simple "what type of document is this?" classification to deep contract analysis — to a single frontier model, and the projected inference bill at launch volume was alarming. They also needed long-context handling for whole contracts, retrieval over a large document corpus, and no idea which Bedrock models to use for which step, with no ML infrastructure and no GPU budget.

What CloudRoute did: Routed within 19 hours to a US-East AWS partner with a GenAI track record. The partner mapped the workload onto the Bedrock catalog rather than one model: Nova Lite for high-volume document-type classification and routing, Titan Text Embeddings behind a Knowledge Base for retrieval over the contract corpus, Claude Sonnet via the Converse API for the grounded contract Q&A, and Claude's long-context handling for whole-document analysis — with prompt caching on the repeated system prompt and a Guardrail for PII. Switching models per step was just changing modelId. In parallel the partner filed a Bedrock/GenAI proof-of-concept credit application and an Activate Portfolio application.

Outcome: GenAI POC credits ($25K) approved in under 2 weeks, Portfolio ($100K) shortly after — the first ~6 months of inference were fully credit-funded. The multi-model routing cut projected inference cost by roughly 70% versus the single-frontier-model prototype, and the product shipped in 5 weeks. CloudRoute's commission was paid by the partner from AWS engagement funding; the customer paid $0.

time-to-match: < 24h · credits secured: $125K · projected inference cost cut ~70% via model routing · cost to customer: $0

faq

Common questions

Which models are available on Amazon Bedrock in 2026?

Bedrock hosts foundation models from eight-plus providers: Anthropic Claude (Haiku/Sonnet/Opus), Amazon Nova (Micro/Lite/Pro/Premier text, Canvas image, Reel video, Act agentic) and Amazon Titan (text + embeddings), Meta Llama, Mistral, Cohere (Command, Embed, Rerank), Stability AI (Stable Diffusion / Stable Image), AI21 (Jamba), and DeepSeek (reasoning models). Exact versions vary by AWS Region and change continuously — confirm the live list under Model access in the Bedrock console.

What is the best model on Amazon Bedrock?

There is no single best model — the catalog is a cost/capability curve and the right choice depends on the task, quality bar, latency budget, and volume. For frontier reasoning, coding, and agents, Anthropic's Claude (Opus/Sonnet) is the most widely deployed; for lowest cost and latency, Amazon Nova; for open-weight fine-tuning freedom, Meta Llama; for embeddings/RAG, Titan or Cohere. Most production systems route across several models. Run a Bedrock model evaluation on your own data to decide.

How do I choose which Bedrock model to use?

Start from the job (classify, extract, summarize, reason, code, generate an image, embed for search), then match it to the cheapest tier that clears your quality bar: small models (Nova Micro/Lite, Claude Haiku) for high-volume easy calls, a workhorse (Claude Sonnet, Nova Pro) for production reasoning, a frontier model (Claude Opus, Nova Premier) only for the hardest steps, and an embedding model (Titan, Cohere) for anything search-shaped. Validate with Bedrock model evaluation on a representative slice of your real traffic before committing.

What context windows do Bedrock models support?

Context windows vary widely by model and are raised over time. As representative orders of magnitude in 2026: Claude around ~200K tokens, AI21 Jamba around ~256K, Llama and Mistral and several Nova and DeepSeek models in the ~128K range, and older Titan Text models smaller (~8K–32K). Some frontier models support very large windows in specific Regions or configurations. Because you pay per input token, a larger window costs more per call — pair big windows with prompt caching. Confirm the current window for your exact model and Region in the documentation.

How are Bedrock models priced?

There is no platform fee. Text models are billed per 1,000 input tokens and per 1,000 output tokens at a rate set per model, with output several times more expensive than input; embedding models are billed per 1,000 input tokens with no output charge; image models bill per image and video models per second. Four levers apply across the catalog: on-demand (default), batch (~50% cheaper, asynchronous), provisioned throughput (reserved capacity), and prompt caching (discount on repeated context). Rates vary by model and Region — see the AWS Bedrock pricing page.

Can I switch between Bedrock models without rewriting my code?

For chat models, yes — almost always. The Converse API gives one consistent request/response schema across providers, so switching from one model to another is typically just changing the modelId string. That is what makes cost-saving model routing (cheap model for easy calls, frontier model for hard ones) a configuration decision rather than a rewrite. Non-conversational modalities like image, video, and embeddings use the lower-level InvokeModel API, where request shapes are model-specific.

Do all Bedrock models keep my data private?

Yes. Across every model in the catalog, your prompts and outputs are not used to train the underlying foundation models and are not shared with the model providers — AWS serves the providers' models on your behalf inside AWS. Content is encrypted in transit and at rest (with optional customer-managed KMS keys), processed only in the AWS Region you call, can be kept off the public internet via VPC endpoints (PrivateLink), and is governed by IAM with full CloudTrail audit logging.

How can I afford to run many models in production?

Two ways. First, control unit cost: route the easy majority of calls to small models, run offline work as batch (~50% off), enable prompt caching for repeated context, and reserve provisioned throughput only at high steady volume. Second, fund the bill with AWS credits — Activate Portfolio (up to $100K), Bedrock/GenAI POC ($10K–$50K), and the GenAI Accelerator (up to $1M). CloudRoute routes you to a vetted AWS partner who files the credit application and can build the multi-model workload with you; AWS funds the credits and the engagement, so you pay $0.

Pick the right models — and let AWS credits pay for them.

CloudRoute routes you to a vetted AWS partner who files your Bedrock/GenAI credit application (Activate Portfolio up to $100K, GenAI POC $10K–$50K, GenAI Accelerator up to $1M) and, if you need hands, picks and wires the right models from the catalog with you. AWS funds the credits and the engagement. You pay $0.

Get matched in 24h →→ see the data & AI persona detail

matched within< 24h

GenAI credit ceilingup to $1M

cost to you$0

Every model on Amazon Bedrock — the complete 2026 catalog.

IHow the Bedrock model catalog is organized

IIThe full model catalog — every family on Bedrock

IIIThe catalog by modality — text, reasoning, image, video, embeddings

Text & chat (the bulk of the catalog)

Reasoning-tuned models

Image & video generation

Embeddings (the engine behind RAG and search)

IVHow to choose a model (and why you usually pick several)

VHow to access a model — Model access, IAM, Regions, and modelId

Step 1 — Request model access (per model, per Region)

Step 2 — Authorize callers with IAM

Step 3 — Choose a Region (and cross-region inference)

Step 4 — Call the model by modelId

VIWhat the models cost — the shape of Bedrock pricing

VIIEach model family, in depth

VIIIThe cost reality — and how AWS credits fund your Bedrock build

The catalog as a cost/capability curve — which tier for which job

A multi-model Bedrock build, funded by AWS credits — anonymized

Common questions

Pick the right models — and let AWS credits pay for them.

Related guides