One managed API, every major foundation-model family: Anthropic Claude, Meta Llama, Mistral, Amazon Nova and Titan, Cohere, Stability AI, AI21, and DeepSeek. This is the reference catalog — every provider with its representative models, modality, context window, best-fit use, and relative cost in one scannable table — plus how to choose a model, how to turn it on, and a link into each model's deep page.
Amazon Bedrock's defining feature is breadth: a single, fully-managed API in front of foundation models from eight-plus providers, covering text and chat, reasoning, image generation, video generation, and embeddings. Before the catalog table, it helps to understand the four axes that actually distinguish one model from another.
The first axis is provider. Each provider on Bedrock brings a distinct lineage and set of strengths: Anthropic (the Claude family) for frontier reasoning, coding, and agentic tool use; Amazon with two lines — the newer Nova family tuned for very low cost and latency, and the older Titan family still useful for text and especially embeddings; Meta (Llama) for strong open-weight, freely fine-tunable models; Mistral for fast, price-efficient throughput; Cohere for retrieval, embeddings, and reranking; Stability AI for image generation; AI21 for the long-context Jamba family; and DeepSeek for cost-efficient open reasoning models. You are never locked to one — they all live behind the same authentication, billing, and (for chat) request schema.
The second axis is modality: what kind of input and output the model handles. Most of the catalog is text/chat (some with vision input). A subset is reasoning-tuned for harder multi-step problems. A separate group does image generation (Amazon Nova Canvas, Stability's Stable Diffusion / Stable Image), one does video generation (Amazon Nova Reel), and a distinct class produces embeddings — the numeric vectors that power semantic search and retrieval-augmented generation rather than human-readable text.
The third axis is the context window — how many tokens of input (prompt plus retrieved documents plus conversation history) the model can consider at once. Larger windows let you stuff in whole contracts, codebases, or long chat histories without aggressive chunking, but a bigger window costs more per call because you are paying for every input token. Context windows differ widely across the catalog and are raised over time, so the table gives representative orders of magnitude rather than guarantees.
The fourth axis is cost, which on Bedrock is almost always per token (per 1,000 input tokens and per 1,000 output tokens), set per model, with output priced several times higher than input. Image models bill per image and video models per second of output. Because the cheapest text models are roughly two orders of magnitude cheaper than frontier ones, which model you send each call to is the single biggest driver of a Bedrock bill — which is why this catalog frames every family by relative cost as well as capability.
Pick by job, not by brand. Cheap small model for the 90% of easy, high-volume calls (classification, extraction, routing); a workhorse model for production reasoning and agents; a frontier model only for the hardest 10%; an embedding model for anything involving search or RAG; and a dedicated image or video model when you need media. Almost all of it is reachable through the one Converse API.
This is the comprehensive map: every model family on Amazon Bedrock with its provider, representative models, modality, an approximate context-window order of magnitude, best-fit use, and relative cost on a $ (cheapest) → $$$$ (frontier) scale. Exact versions, context windows, and prices vary by Region and change continuously — confirm the live list under Model access in the Bedrock console and current rates on the AWS pricing page.
Two reading notes. First, relative cost is a position on a curve, not a rate: $ means an order of magnitude cheaper than $$$$, and the goal is to show where each family sits so you can route calls deliberately. Second, context windows are representative — they are raised over time and a few frontier models support very large windows in specific Regions or configurations, so treat the column as "roughly this scale," not a contractual ceiling. Each family links to its own deep page where one exists.
| Model family | Provider | Modality | Context window (approx.) | Best-fit use | Relative cost |
|---|---|---|---|---|---|
| Claude (Haiku · Sonnet · Opus) | Anthropic | Text / chat / vision / tools | ~200K tokens | Frontier reasoning, coding, agents, long-context analysis, instruction-following | $ → $$$$ |
| Nova (Micro · Lite · Pro · Premier) | Amazon | Text / chat / vision | ~128K–300K tokens | Lowest cost & latency text; tiered from ultra-cheap to frontier | $ → $$$ |
| Nova Canvas | Amazon | Image generation | n/a (prompt → image) | Native image generation, editing, variations | $$ (per image) |
| Nova Reel | Amazon | Video generation | n/a (prompt → video) | Short-form video generation from text/images | $$$ (per second) |
| Nova Act | Amazon | Agentic / actions | — | Agentic model for taking actions in software environments | $$ |
| Titan Text | Amazon | Text / chat | ~8K–32K tokens | Economical text generation and summarization | $ |
| Titan Embeddings (Text · Multimodal) | Amazon | Embeddings (vectors) | input ~8K tokens | RAG, semantic search, clustering; text + image embeddings | $ (per 1K tokens) |
| Llama (instruct + smaller variants) | Meta | Text / chat / vision | ~128K tokens | Open-weight, freely fine-tunable, self-hostable lineage | $ → $$ |
| Mistral (Large + smaller models) | Mistral AI | Text / chat | ~32K–128K tokens | Fast, cost-efficient high-throughput tasks | $ → $$ |
| Command | Cohere | Text / chat | ~128K tokens | Enterprise text generation tuned for retrieval workflows | $ → $$ |
| Embed · Rerank | Cohere | Embeddings / rerank | input ~512 tokens/chunk | Enterprise search, retrieval, and result reranking | $ (per 1K tokens) |
| Stable Diffusion / Stable Image | Stability AI | Image generation | n/a (prompt → image) | Marketing, product, and creative imagery | $$ (per image) |
| Jamba family | AI21 Labs | Text / chat | ~256K tokens | Long-context, hybrid-architecture text generation | $ → $$ |
| DeepSeek reasoning models | DeepSeek | Text / reasoning | ~128K tokens | Cost-efficient open reasoning alternatives | $ → $$ |
The single table above is sorted by family; in practice you choose by the job in front of you. Here is the same catalog sliced by what you are actually trying to produce, which is usually the faster way to land on a shortlist.
Most Bedrock models are conversational text models, and they form a clear tier ladder. Small/cheap: Amazon Nova Micro and Lite, Claude Haiku, smaller Mistral and Llama variants — pennies-per-million-token economics for classification, extraction, routing, and simple drafting. Workhorse: Claude Sonnet, Amazon Nova Pro, Mistral Large, larger Llama — the default for production chat, reasoning, coding, and agents. Frontier: Claude Opus and Amazon Nova Premier for the hardest reasoning you escalate to deliberately. All of these answer through the Converse API with the same request shape.
A growing slice of the catalog is tuned to "think" through multi-step problems before answering — Claude's extended-reasoning modes and DeepSeek's reasoning models are representative. They trade higher latency and token use for stronger performance on math, complex coding, planning, and multi-hop analysis. Use them for the genuinely hard problems and keep a cheaper model for the easy majority; reasoning models are not where you want to route a high-volume classification endpoint.
For pixels rather than tokens, Bedrock offers image generation via Amazon Nova Canvas and Stability AI's Stable Diffusion / Stable Image family, and video generation via Amazon Nova Reel. These bill per image or per second of generated video rather than per token, and you typically call them through the lower-level InvokeModel API rather than Converse, since they are not conversational. They cover marketing creative, product imagery, image editing and variation, and short-form video from text or images.
Embedding models turn text (and, for multimodal embeddings, images) into vectors so you can do semantic search, clustering, deduplication, and retrieval-augmented generation. Amazon Titan Text Embeddings and Titan Multimodal Embeddings, plus Cohere Embed (with Cohere Rerank to reorder results), are the catalog's embedding workhorses. They are cheap and are billed per 1,000 input tokens with no output-token cost, because the output is a vector. Any time you build RAG, semantic search, or recommendations, an embedding model is doing the quiet heavy lifting underneath your chat model.
The most common Bedrock question is not "which platform" but "which model on the platform." There is no single right answer because the models sit at different points on a cost/capability curve — but there is a reliable decision procedure.
Start from the job, not the brand. Define the task (classify, extract, summarize, reason, write code, generate an image, embed for search), the quality bar it actually needs, the latency budget, and the volume. Then map those onto the curve: a high-volume, low-difficulty task wants the cheapest model that clears the quality bar; a low-volume, high-difficulty task can justify a frontier model; anything involving search or grounding needs an embedding model regardless of which chat model sits on top.
The second principle is that production systems rarely pick one model — they route. The dominant cost-control pattern sends the easy majority of calls (often 80–95%) to a small, cheap model and escalates only the hard remainder to a workhorse or frontier model, with an embedding model handling retrieval throughout. Because every chat model shares the Converse API, this routing is a configuration decision, not a rewrite: you change the modelId string per call path. A support assistant might run Nova Lite or Claude Haiku for intent classification, Claude Sonnet for the actual grounded answer, Titan or Cohere for retrieval, and reach for Claude Opus or Nova Premier only on escalations.
The third principle is to decide with evidence, not vendor benchmarks. Public leaderboards rarely reflect your data, your prompts, or your latency and cost constraints. Bedrock's built-in model evaluation lets you compare candidate models on your own datasets — using automated metrics or human review — before you commit. Run a short evaluation across two or three candidates from different tiers on a representative slice of your real traffic; the cheapest model that passes your bar is almost always the right production choice, with a more capable model wired in as the escalation path.
Models in the catalog are off by default. You explicitly request access per model, govern it with IAM, choose a Region, and then call the model with the shared Converse API (or InvokeModel for non-chat modalities). The catalog is large, but turning any one model on is the same short path.
Access is opt-in by design: AWS wants every model your organization can call to be a deliberate, auditable decision, and several models carry provider end-user license terms you must accept first. The flow has four parts.
In the Bedrock console, open Model access and request the specific models you want (for example, Claude Sonnet plus Titan Text Embeddings). Most are granted within seconds to a few minutes; some require accepting the provider's EULA. Access is per Region — enabling a model in us-east-1 does not enable it in eu-west-1. This Model access page is also the authoritative live catalog for your account: it shows exactly which models and versions are available to you in that Region right now.
Bedrock is a standard AWS service governed by IAM. Grant principals (users, roles, Lambda functions, ECS tasks) actions such as bedrock:InvokeModel, bedrock:InvokeModelWithResponseStream, and bedrock:Converse, scoped where useful to specific model ARNs so a service can call exactly the models it needs and nothing else. All API activity is recorded in CloudTrail, and you can capture full request/response payloads via Bedrock model-invocation logging to S3 or CloudWatch.
Pick a Region for data-residency, latency, and availability — your prompts and completions are processed in the Region you call, and not every model lives in every Region (frontier models often land in US Regions first). When you hit availability or throughput limits, cross-region inference profiles let a single request be served from one of several Regions within a geography (for example a US or EU profile) without you managing the routing. See the Bedrock overview for the wider setup.
For chat models, use the Converse API: one consistent request/response schema across providers, with multi-turn conversation, system prompts, tool use, and streaming built in. Selecting a model from the catalog is just setting the modelId string — switching from one provider's model to another is typically a one-line change. Use the lower-level InvokeModel for non-conversational modalities (image, video, embeddings) or when you need a provider-specific parameter Converse does not expose. Copy the exact current model IDs from the Bedrock console; they are versioned and Region-specific.
Because the Converse API normalizes the request and response across providers, the same call works for Claude, Llama, Mistral, Nova, Cohere Command, Jamba, and more — only the modelId changes. That is what makes routing across the catalog (cheap model for easy calls, frontier model for hard ones) a configuration decision rather than a rewrite. Model IDs are versioned and Region-specific — always copy the current ID from the console.
You do not need a price for every model to reason about cost — you need the shape. Bedrock has no platform fee and no minimum; you pay per use, and the catalog spreads across roughly two orders of magnitude from cheapest to frontier. The figures below are representative as of 2026 to show relative scale, not audited rates — always confirm on the AWS Bedrock pricing page.
Text models are billed per 1,000 input tokens and per 1,000 output tokens, at a rate set by the specific model, with output almost always several times more expensive than input. Embedding models are billed per 1,000 input tokens with no output-token charge (the output is a vector). Image models bill per image and video models per second of generated output. On top of per-call pricing, the same four levers apply across the whole catalog: on-demand (default), batch (asynchronous, typically ~50% cheaper), provisioned throughput (reserved capacity for high steady volume and for serving fine-tuned models), and prompt caching (a steep discount on repeated context). The detail lives on the Bedrock pricing page.
The practical takeaway is the same one that drives model selection: because a small model can be ~50–100× cheaper per token than a frontier model, routing the easy majority of calls to a small model is usually a bigger cost win than any negotiation or discount. The price table below exists to make that spread concrete.
| Catalog tier (example) | Input / 1K tokens | Output / 1K tokens | Batch? | Typical role |
|---|---|---|---|---|
| Nova Micro (ultra-low-cost) | ~$0.000035 | ~$0.00014 | Yes (~50% off) | High-volume classification, routing |
| Nova Lite / Claude Haiku (small) | ~$0.0002–$0.0008 | ~$0.0008–$0.004 | Yes (~50% off) | Cheap chat, extraction, drafts |
| Mistral / Llama (mid-tier) | ~$0.001–$0.003 | ~$0.003–$0.009 | Yes (~50% off) | Balanced throughput tasks |
| Claude Sonnet / Nova Pro (workhorse) | ~$0.003 | ~$0.015 | Yes (~50% off) | Production reasoning, coding, agents |
| Claude Opus / Nova Premier (frontier) | ~$0.015 | ~$0.075 | Yes (~50% off) | Hardest reasoning, escalation only |
| Titan / Cohere embeddings | ~$0.0001–$0.0002 | n/a (vectors) | Yes | RAG, semantic search, clustering |
This page is the catalog hub; each major family has its own reference page with the full version lineup, context windows, pricing, setup, and use-case detail. Use this section as the index into them.
Because the whole catalog shares one access model and one request schema, the deep pages differ mainly in the model lineups, context windows, and prices — the mechanics of enabling and calling them are identical to section V. Start from the family that fits your job, evaluate two or three candidates on your own data, and route accordingly.
Choosing the right models keeps unit cost sane; the harder problem is the aggregate bill once a real application serves real traffic. This is where the catalog and the funding story meet.
GenAI is cheap per call and expensive in aggregate. A retrieval-augmented assistant that resends a large system prompt and retrieved context on every turn, across thousands of users, can move from a rounding error to five or six figures a month faster than teams expect — especially if every call hits a frontier model. The levers throughout this page are how you keep it sane: route the easy majority to small models, run offline work as batch (~50% off), turn on prompt caching for repeated context, and reserve provisioned throughput only once volume is steady and high.
The other lever is funding the bill with someone else's money — specifically AWS's. AWS runs credit programs designed precisely for teams building generative AI on Bedrock: Activate Portfolio (up to $100K) for institutionally-funded startups, dedicated Bedrock / GenAI proof-of-concept funding ($10K–$50K) for a defined GenAI build, and the competitive Generative AI Accelerator (up to $1M) for AI-first companies. These pools are largely partner-filed and effectively invisible on the public Activate page.
This is exactly what CloudRoute does: we route you to a vetted AWS partner who files the credit application and, if you need hands, who can build the Bedrock workload with you — choosing and wiring the right models from the catalog above. Because AWS funds both the credits and the partner engagement, you pay $0. See AWS credits for generative-AI startups, AWS PoC / Bedrock POC funding, and $100K AWS credits.
The fastest way to use this catalog is to map your task onto the tier curve and pick the cheapest tier that clears your quality bar, escalating only when you must. Relative cost is $ (cheapest) → $$$$ (frontier); exact rates live on the AWS pricing page.
| Tier | Representative models | Relative cost | Reach for it when | Avoid it when |
|---|---|---|---|---|
| Ultra-cheap / small | Nova Micro/Lite, Claude Haiku, small Mistral/Llama | $ | High-volume easy calls: classification, extraction, routing, simple drafts | The task needs genuine multi-step reasoning or careful long-context work |
| Workhorse | Claude Sonnet, Nova Pro, Mistral Large, larger Llama | $$ → $$$ | Production chat, coding, agents, the default for most real features | The call is trivially easy (over-paying) — or genuinely frontier-hard |
| Frontier | Claude Opus, Nova Premier | $$$$ | The hardest reasoning, planning, and code you escalate to deliberately | High volume or latency-sensitive paths — cost and latency add up fast |
| Reasoning-tuned | Claude extended reasoning, DeepSeek reasoning | $$ → $$$ | Math, complex coding, multi-hop analysis worth extra latency/tokens | Simple or high-throughput tasks that do not need step-by-step thinking |
| Embeddings | Titan Text/Multimodal, Cohere Embed (+ Rerank) | $ | Anything search-shaped: RAG, semantic search, clustering, recommendations | You need human-readable text out (that is a chat model's job) |
| Image generation | Nova Canvas, Stable Diffusion / Stable Image | $$ (per image) | Marketing, product, creative imagery; editing and variations | You need text, structured data, or video |
| Video generation | Nova Reel | $$$ (per second) | Short-form video from text or images | You need stills, text, or real-time interaction |
Situation: The team had picked "an LLM" but not a model strategy. Their prototype sent every request — from simple "what type of document is this?" classification to deep contract analysis — to a single frontier model, and the projected inference bill at launch volume was alarming. They also needed long-context handling for whole contracts, retrieval over a large document corpus, and no idea which Bedrock models to use for which step, with no ML infrastructure and no GPU budget.
What CloudRoute did: Routed within 19 hours to a US-East AWS partner with a GenAI track record. The partner mapped the workload onto the Bedrock catalog rather than one model: Nova Lite for high-volume document-type classification and routing, Titan Text Embeddings behind a Knowledge Base for retrieval over the contract corpus, Claude Sonnet via the Converse API for the grounded contract Q&A, and Claude's long-context handling for whole-document analysis — with prompt caching on the repeated system prompt and a Guardrail for PII. Switching models per step was just changing modelId. In parallel the partner filed a Bedrock/GenAI proof-of-concept credit application and an Activate Portfolio application.
Outcome: GenAI POC credits ($25K) approved in under 2 weeks, Portfolio ($100K) shortly after — the first ~6 months of inference were fully credit-funded. The multi-model routing cut projected inference cost by roughly 70% versus the single-frontier-model prototype, and the product shipped in 5 weeks. CloudRoute's commission was paid by the partner from AWS engagement funding; the customer paid $0.
time-to-match: < 24h · credits secured: $125K · projected inference cost cut ~70% via model routing · cost to customer: $0
CloudRoute routes you to a vetted AWS partner who files your Bedrock/GenAI credit application (Activate Portfolio up to $100K, GenAI POC $10K–$50K, GenAI Accelerator up to $1M) and, if you need hands, picks and wires the right models from the catalog with you. AWS funds the credits and the engagement. You pay $0.