A complete, neutral reference for running Anthropic's Claude models on Amazon Bedrock in 2026: the Claude family (Opus, Sonnet, Haiku) and which model fits which job; why run Claude through Bedrock instead of the Anthropic API direct (AWS-native IAM and VPC, consolidated billing, data residency — and AWS credits that apply); model IDs and how to enable model access; a per-model pricing table; the capabilities that matter (vision, tool use, long context, prompt caching, extended thinking); a minimal Converse API snippet; use cases per model; a brief Claude-vs-Nova-vs-GPT view; and how AWS credits make running Claude $0.
Anthropic's Claude is available natively on Amazon Bedrock — it is one of the foundation-model providers behind Bedrock's single managed API, alongside Amazon's own Nova and Titan, Meta Llama, Mistral, Cohere, and others. On Bedrock the Claude lineup follows a clear three-tier shape, and choosing the right tier is the most important cost-and-quality decision you will make.
The Claude family is organized as a ladder of three tiers, each a deliberate trade-off between intelligence, speed, and cost. As of 2026 the current generation on Bedrock spans: Claude Opus — the most capable tier, built for the hardest reasoning, complex multi-step analysis, and agentic work where quality dominates; Claude Sonnet — the balanced workhorse that handles the large majority of production traffic with strong reasoning at a fraction of Opus cost and latency; and Claude Haiku — the fast, low-cost tier for high-throughput, latency-sensitive, or simpler tasks. The exact version names and identifiers advance over time as Anthropic ships new generations; this page describes the durable tier structure and points you to the model catalog for current IDs.
The practical discipline is the same one that governs all Bedrock cost: match the model to the task. Use Haiku for the easy, high-volume requests; use Sonnet as the default for real work; reserve Opus for the genuinely hard requests where its extra reasoning earns its higher price. Many production systems route across all three: a cheap model triages and handles the bulk, escalating only the hard cases to a stronger one. With the representative rates in the pricing table putting Sonnet at 12 times Haiku's price and Opus at 5 times Sonnet's, moving the easy majority of traffic down a tier can cut Claude spend several-fold, provided the cheaper tier still clears your quality bar.
Because all three are served through the same Bedrock API, switching between them is usually a one-line change to the model ID — which makes the route-and-escalate pattern easy to build and easy to tune. The capabilities (vision, tool use, long context, prompt caching, extended thinking) and the security model are consistent across the family, so you design once and choose the tier per request.
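The route-and-escalate pattern described above can be sketched in a few lines. This is a minimal illustration, not a production router: the `MODEL_IDS` values are placeholders to be replaced from the model catalog, and `call_model` and `is_good_enough` are hypothetical callables you would supply (for example a thin wrapper around Converse and a task-specific quality check).

```python
from dataclasses import dataclass
from typing import Callable

# Placeholder IDs (assumption): read the real values from the Bedrock model catalog.
MODEL_IDS = {
    "haiku": "<haiku-model-id-from-catalog>",
    "sonnet": "<sonnet-model-id-from-catalog>",
    "opus": "<opus-model-id-from-catalog>",
}
ESCALATION_ORDER = ["haiku", "sonnet", "opus"]  # cheapest tier first


@dataclass
class RoutedAnswer:
    tier: str
    text: str


def route_and_escalate(
    prompt: str,
    call_model: Callable[[str, str], str],   # hypothetical: (model_id, prompt) -> reply text
    is_good_enough: Callable[[str], bool],   # hypothetical: task-specific quality check
    start_tier: str = "haiku",
) -> RoutedAnswer:
    """Try the cheapest tier first; escalate only while the answer fails the check."""
    start = ESCALATION_ORDER.index(start_tier)
    answer = RoutedAnswer(tier=start_tier, text="")
    for tier in ESCALATION_ORDER[start:]:
        text = call_model(MODEL_IDS[tier], prompt)
        answer = RoutedAnswer(tier=tier, text=text)
        if is_good_enough(text):
            break  # stop at the first tier that clears the bar
    return answer
```

If no tier passes the check, the function returns the strongest tier's answer; a real system would likely log that case for review.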
One caveat, stated once and meant throughout: exact model version names, model IDs, regional availability, context-window sizes, and per-token prices all change frequently as Anthropic ships new Claude generations and AWS updates Bedrock. The figures and identifiers here are representative as of 2026 to convey the structure and relative cost. Always confirm the current model IDs in the Bedrock model catalog and current rates on the AWS Bedrock pricing page before you build or budget.
Opus = deepest reasoning, highest cost — reserve for hard problems. Sonnet = the balanced workhorse — your sensible default for production. Haiku = fast and cheap — high-volume, latency-sensitive, simpler tasks. Switching between them is a one-line model-ID change, which is why tiered routing is the standard cost pattern.
Claude is available both directly from Anthropic's own API and through Amazon Bedrock. The model is the same Claude either way — so the choice is about everything around the model: security posture, billing, data control, and funding. For teams already on AWS, Bedrock usually wins on all four.
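This is the central question this page exists to answer, and the honest framing is that it is an operational decision, not a model-quality one. You are choosing where Claude runs and how it is governed, billed, and paid for. Running Claude through Bedrock buys you four things over the direct API: AWS-native security (IAM authentication and VPC connectivity), consolidated billing on your existing AWS invoice, data residency within the AWS regions you choose, and eligibility for AWS credits.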
When is the direct API the better pick? If you are not on AWS, want the very newest Claude release the day it ships (new generations sometimes appear on the direct API first), or rely on an Anthropic-specific feature before it lands on Bedrock, going direct can make sense. But for the large population of teams already building on AWS — and especially any startup eligible for AWS credits — Bedrock is usually the stronger home for Claude.
Before you can call Claude on Bedrock, you have to do one small but mandatory thing: request model access in your account. Foundation models on Bedrock are off by default; turning Claude on is a one-time, no-cost step in the console.
Enabling access. In the Bedrock console, open Model access, find the Claude models you want, and request access. For most Claude models this is granted effectively immediately; some models prompt for brief use-case details. There is no charge for enabling access — you only pay when you actually call a model. Access is per-account and per-region, so if you operate in several regions, enable Claude in each one you will call from. This is also where cross-region inference profiles come in: they let Bedrock route your Claude calls across a set of regions for better availability and throughput (see the amazon-bedrock-cross-region-inference sibling).
Model IDs. Every model on Bedrock is invoked by a model ID — a string identifying the provider, model, and version (Claude IDs are namespaced under Anthropic, e.g. an identifier of the shape anthropic.claude-…, with a version suffix). You pass this ID to the API to choose which model and tier answers a request, so moving a request from Haiku to Sonnet to Opus is just a change of model-ID string. Because IDs advance with each Claude generation, do not hard-code a guessed value — read the current ID from the Bedrock model catalog (console) or list it via the API/CLI, and treat it as configuration rather than a literal in your code.
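One way to treat the model ID as configuration is sketched below. The environment-variable naming (`CLAUDE_<TIER>_MODEL_ID`) is an assumption made for illustration; the listing helper assumes a boto3 `bedrock` control-plane client and its `list_foundation_models` call, and returns whatever IDs your account and region actually expose.

```python
import os


def resolve_model_id(tier: str, env=os.environ) -> str:
    """Read the model ID for a tier from configuration instead of a code literal."""
    key = f"CLAUDE_{tier.upper()}_MODEL_ID"  # hypothetical variable name
    model_id = env.get(key)
    if not model_id:
        raise KeyError(f"Set {key} to the current ID from the Bedrock model catalog")
    return model_id


def list_anthropic_model_ids(bedrock_client) -> list[str]:
    """Return the Anthropic model IDs visible to this account and region."""
    response = bedrock_client.list_foundation_models(byProvider="anthropic")
    return sorted(model["modelId"] for model in response["modelSummaries"])


# Usage (requires AWS credentials, so it is left as a comment):
#   import boto3
#   print(list_anthropic_model_ids(boto3.client("bedrock", region_name="us-east-1")))
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub in tests.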
Permissions. The IAM principal making the call needs permission for the relevant Bedrock invoke actions (and, if you use cross-region inference profiles, permission on the profile). A least-privilege policy scoped to the specific Claude model ARNs you intend to use is the recommended posture. Once access is granted and IAM is in place, you are ready to call Claude — the next section shows the minimal request.
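A least-privilege policy of the kind described above might look like the sketch below. It assumes the standard foundation-model ARN shape and the two Bedrock invoke actions; the model ID is the same illustrative placeholder used elsewhere on this page, and cross-region inference profiles would need their own resource entries, which are not shown.

```python
import json


def invoke_policy(region: str, model_ids: list[str]) -> dict:
    """Build an IAM policy that allows invoking only the listed foundation models."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeSelectedClaudeModels",
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                # Foundation-model ARNs carry no account ID, hence the double colon.
                "Resource": [
                    f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
                    for model_id in model_ids
                ],
            }
        ],
    }


# Placeholder model ID (assumption): substitute the value from the model catalog.
policy = invoke_policy("us-east-1", ["anthropic.claude-<tier>-<version>"])
print(json.dumps(policy, indent=2))
```

Scoping `Resource` to explicit model ARNs means that adding a new tier is a deliberate policy change rather than an accident.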
Claude on Bedrock is billed per token: a rate per 1,000 input tokens (everything you send) and a higher rate per 1,000 output tokens (everything Claude generates), with output typically priced several times higher than input. The rate depends entirely on the tier — and the spread across tiers is wide enough that model choice is the dominant cost lever.
The table below gives representative 2026 on-demand rates for the three Claude tiers, shown per 1,000 and per 1,000,000 tokens (the per-million column is simply the per-1K figure × 1,000; providers increasingly quote per-million). Use it to rank the tiers by cost and sanity-check a budget — not as an audited price sheet. Two cost levers sit on top of these rates and are not shown in the table: Batch (submit non-interactive work as an async job for roughly half the on-demand price) and prompt caching (stop re-paying full input price for a repeated prefix like a long system prompt). Both can substantially lower the effective rate — see amazon-bedrock-pricing and amazon-bedrock-prompt-caching.
| Claude tier | Input / 1K | Output / 1K | Input / 1M | Output / 1M | Cost position |
|---|---|---|---|---|---|
| Claude Haiku | $0.00025 | $0.00125 | $0.25 | $1.25 | Cheapest — high-volume / fast |
| Claude Sonnet | $0.003 | $0.015 | $3.00 | $15.00 | Mid — the workhorse default |
| Claude Opus-class | $0.015 | $0.075 | $15.00 | $75.00 | Highest — hardest reasoning |
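To turn the table into a budget sanity check, a small estimator is enough. The rates below are copied from the representative table above, not from a live price sheet, and the Batch multiplier of 0.5 is an assumption standing in for "roughly half"; prompt caching is not modeled.

```python
# Representative USD rates per 1M tokens, copied from the table above.
RATES_PER_MILLION = {
    "haiku": {"input": 0.25, "output": 1.25},
    "sonnet": {"input": 3.00, "output": 15.00},
    "opus": {"input": 15.00, "output": 75.00},
}
BATCH_MULTIPLIER = 0.5  # assumption: "roughly half" of the on-demand price


def estimate_cost(tier: str, input_tokens: int, output_tokens: int, batch: bool = False) -> float:
    """Estimate on-demand (or Batch) cost in USD for a token volume on one tier."""
    rates = RATES_PER_MILLION[tier]
    cost = (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000
    return cost * BATCH_MULTIPLIER if batch else cost


# Same workload priced on each tier: 10M input tokens and 2M output tokens.
for tier in RATES_PER_MILLION:
    print(tier, round(estimate_cost(tier, 10_000_000, 2_000_000), 2))
```

For that workload the estimate comes to about $5 on Haiku, $60 on Sonnet, and $300 on Opus, which is the spread that makes tier choice the dominant lever.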
Claude on Bedrock is not just text-in/text-out. The current generation brings a set of capabilities that materially expand what you can build — and several of them are also cost or quality levers. Availability of any given capability can vary by Claude tier and version, so confirm specifics for your chosen model.
Claude models on Bedrock can accept images alongside text in a request and reason about them — reading charts and diagrams, extracting data from screenshots, interpreting documents and photos, and answering questions about visual content. This turns a large class of document-understanding and visual-QA problems into a single Converse call, no separate OCR or vision pipeline required.
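As a sketch of what an image request looks like, the helper below builds a Converse `messages` list with one image block and one text block. It assumes the boto3 convention of passing raw image bytes (not base64) and the four image formats Converse documents; the file name in the usage comment is hypothetical.

```python
SUPPORTED_FORMATS = {"png", "jpeg", "gif", "webp"}


def image_question(image_bytes: bytes, image_format: str, question: str) -> list[dict]:
    """Build a single-turn Converse message that pairs an image with a question."""
    if image_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported image format: {image_format}")
    return [
        {
            "role": "user",
            "content": [
                {"image": {"format": image_format, "source": {"bytes": image_bytes}}},
                {"text": question},
            ],
        }
    ]


# Usage (requires AWS credentials and a real file, so it is left as a comment):
#   with open("chart.png", "rb") as f:          # hypothetical file
#       messages = image_question(f.read(), "png", "What trend does this chart show?")
#   resp = client.converse(modelId=model_id, messages=messages)
```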
Claude supports tool use: you describe tools (functions, APIs, database queries) and Claude decides when to call them and with what arguments, then incorporates the results into its answer. This is the foundation of agentic systems — letting Claude look things up, take actions, and ground its responses in live data. On Bedrock it is exposed through the Converse API's tool fields and underpins Bedrock Agents.
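The tool-use cycle can be sketched as a short loop over Converse calls. The tool itself (`get_order_status`) and the `handlers` mapping are hypothetical; the request and response shapes (`toolConfig`, `toolUse`, `toolResult`, `stopReason`) follow the Converse API as commonly documented, and each handler is assumed to return a JSON-serializable dict.

```python
TOOL_CONFIG = {
    "tools": [
        {
            "toolSpec": {
                "name": "get_order_status",  # hypothetical tool
                "description": "Look up the shipping status of an order.",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {"order_id": {"type": "string"}},
                        "required": ["order_id"],
                    }
                },
            }
        }
    ]
}


def run_with_tools(client, model_id, user_text, handlers, max_turns=5):
    """Call Converse, run any tools the model requests, and return the final text."""
    messages = [{"role": "user", "content": [{"text": user_text}]}]
    for _ in range(max_turns):
        resp = client.converse(modelId=model_id, messages=messages, toolConfig=TOOL_CONFIG)
        reply = resp["output"]["message"]
        messages.append(reply)  # keep the assistant turn in the history
        if resp["stopReason"] != "tool_use":
            return "".join(block["text"] for block in reply["content"] if "text" in block)
        results = []
        for block in reply["content"]:
            if "toolUse" in block:
                call = block["toolUse"]
                output = handlers[call["name"]](**call["input"])  # must return a dict
                results.append(
                    {"toolResult": {"toolUseId": call["toolUseId"], "content": [{"json": output}]}}
                )
        messages.append({"role": "user", "content": results})  # feed results back
    raise RuntimeError("tool loop did not finish within max_turns")
```

The `max_turns` cap is a design choice to stop a misbehaving loop from running up token spend.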
Claude models offer a large context window — room for very long inputs (lengthy documents, large codebases, extended conversation history, many retrieved chunks) in a single request. Long context simplifies RAG and document workflows because you can fit more relevant material in one call. It is also a cost consideration: input is billed per token, so a big context costs more — which is exactly where prompt caching earns its keep.
When many requests share a large common prefix — a long system prompt, a fixed instruction set, a reference document, or tool definitions — prompt caching lets Bedrock cache that prefix so subsequent requests are not billed full input price for it again. On chatbots and RAG with a large fixed context, this can cut the input portion of the bill by a large fraction. It is one of the most effective Claude cost levers; see the amazon-bedrock-prompt-caching sibling for the mechanics.
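In the Converse API, a cache boundary is typically expressed as a `cachePoint` block placed after the content you want cached. The sketch below assumes your chosen Claude model supports prompt caching on Bedrock and that the prefix meets the model's minimum cacheable length; the `cacheReadInputTokens` and `cacheWriteInputTokens` usage fields are read defensively because they may be absent when caching is not in effect.

```python
def cached_system(prefix_text: str) -> list[dict]:
    """Mark a long, fixed system prompt as a cacheable prefix."""
    return [{"text": prefix_text}, {"cachePoint": {"type": "default"}}]


def build_request(model_id: str, system_prefix: str, user_text: str) -> dict:
    """Build Converse keyword arguments with the system prompt cached."""
    return {
        "modelId": model_id,
        "system": cached_system(system_prefix),
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
    }


def cache_stats(response: dict) -> dict:
    """Summarize how many input tokens were read from or written to the cache."""
    usage = response.get("usage", {})
    return {
        "read": usage.get("cacheReadInputTokens", 0),
        "written": usage.get("cacheWriteInputTokens", 0),
    }


# Usage (requires AWS credentials, so it is left as a comment):
#   resp = client.converse(**build_request(model_id, long_system_prompt, "First question"))
#   print(cache_stats(resp))
```

Only the part before the `cachePoint` is shared across requests, so keep anything that varies per request after it.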
Newer Claude models support extended thinking — an explicit reasoning mode in which the model spends additional internal steps working through a hard problem before answering, improving quality on complex math, multi-step analysis, and difficult coding. You can typically control how much thinking budget to allow. It trades some latency and output cost for accuracy on genuinely hard tasks — best reserved for the requests that need it rather than turned on for everything.
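With Converse, extended thinking is generally enabled through model-specific request fields rather than `inferenceConfig`. The sketch below assumes the Anthropic `thinking` parameter passed via `additionalModelRequestFields` and the `reasoningContent` response blocks; the default budget values are arbitrary illustrations, and supported models and limits should be confirmed for your chosen version.

```python
def thinking_request(model_id: str, user_text: str,
                     budget_tokens: int = 2048, max_tokens: int = 4096) -> dict:
    """Build Converse keyword arguments with a thinking budget enabled."""
    if max_tokens <= budget_tokens:
        raise ValueError("max_tokens must exceed the thinking budget")
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        # Temperature is left unset here; sampling overrides may be restricted with thinking on.
        "inferenceConfig": {"maxTokens": max_tokens},
        "additionalModelRequestFields": {
            "thinking": {"type": "enabled", "budget_tokens": budget_tokens}
        },
    }


def split_reply(message: dict) -> tuple[str, str]:
    """Separate reasoning text from the final answer in an assistant message."""
    reasoning, answer = [], []
    for block in message["content"]:
        if "reasoningContent" in block:
            reasoning.append(block["reasoningContent"].get("reasoningText", {}).get("text", ""))
        elif "text" in block:
            answer.append(block["text"])
    return "".join(reasoning), "".join(answer)
```

Because thinking tokens are billed as output, building the request per call (rather than globally) keeps the feature limited to the requests that need it.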
The recommended way to call Claude (and any chat model) on Bedrock is the Converse API, a single, model-agnostic interface for multi-turn messages, system prompts, tool use, and multimodal input. Because it is model-agnostic, the same code calls Haiku, Sonnet, or Opus by changing only the model ID.
A minimal text request with the AWS SDK looks like the snippet below (Python / boto3). You create a Bedrock Runtime client, call converse with a model ID and a list of messages, and read the reply from the response. Swapping modelId between the Haiku, Sonnet, and Opus IDs is the only change needed to move a request across tiers — which is what makes tiered routing a one-line decision.
    import boto3

    # Bedrock Runtime client; use a region where Claude access is enabled.
    client = boto3.client("bedrock-runtime", region_name="us-east-1")

    resp = client.converse(
        modelId="anthropic.claude-<tier>-<version>",  # from the model catalog
        messages=[
            {"role": "user", "content": [{"text": "Summarize this contract clause: ..."}]}
        ],
        system=[{"text": "You are a concise legal assistant."}],
        inferenceConfig={"maxTokens": 512, "temperature": 0.2},
    )

    # The reply is the first text block of the assistant message.
    print(resp["output"]["message"]["content"][0]["text"])
That is the whole pattern for a basic call. From here you add multi-turn history (append assistant and user messages), tool use (a toolConfig describing your functions, plus a loop that runs each requested tool and sends its result back to the model), vision (image blocks in the message content), and streaming (the converse_stream variant for token-by-token output). The same shape holds throughout: the API surface barely changes as you add capabilities, which is the point of Converse. The exact model ID string must come from the Bedrock model catalog; the placeholder above is illustrative, not a literal value.
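Two of those extensions, multi-turn history and streaming, can be sketched as small helpers. They assume a Bedrock Runtime client is passed in and that `converse_stream` yields events whose text arrives in `contentBlockDelta` entries; the helper names are illustrative, not part of the SDK.

```python
def add_turn(history: list[dict], role: str, text: str) -> list[dict]:
    """Return a new history with one more user or assistant text turn appended."""
    return history + [{"role": role, "content": [{"text": text}]}]


def stream_text(client, model_id: str, messages: list[dict], on_token=None) -> str:
    """Stream a reply, optionally emitting each text fragment, and return the full text."""
    response = client.converse_stream(modelId=model_id, messages=messages)
    parts = []
    for event in response["stream"]:
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        if "text" in delta:
            parts.append(delta["text"])
            if on_token is not None:
                on_token(delta["text"])  # e.g. print the fragment as it arrives
    return "".join(parts)


# Usage (requires AWS credentials, so it is left as a comment):
#   history = add_turn([], "user", "Summarize clause 4.")
#   reply = stream_text(client, model_id, history, on_token=lambda t: print(t, end=""))
#   history = add_turn(history, "assistant", reply)
#   history = add_turn(history, "user", "Now shorten that to one sentence.")
```

`add_turn` returns a new list instead of mutating its argument, which makes it harder to corrupt a shared conversation history by accident.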
The Converse API is model-agnostic: one interface for messages, system prompts, tool use, and images across every Bedrock model. Switching Claude tiers — or swapping Claude for Nova or Llama — is a change to modelId, not a rewrite. Build once, route per request.
The clearest way to think about the family is by mapping common production workloads to the cheapest tier that does them well. Start a request on the smallest tier that clears your quality bar and only escalate when it does not.
Claude is one strong choice among several on Bedrock. A quick, honest orientation versus the two other names people ask about — Amazon's own Nova family, and OpenAI's GPT models — without turning this into a full shootout.
Claude vs Amazon Nova. Nova is Amazon's own foundation-model family on Bedrock (Micro / Lite / Pro / Premier for text, plus Canvas for images and Reel for video), engineered for low cost and low latency. At the cheap end, Nova Micro and Lite undercut even Haiku and are excellent for very high-volume, simple, latency-sensitive work. Claude tends to be the pick when you want the strongest reasoning and the specific Claude behaviour and capability profile, particularly Sonnet and Opus for harder tasks. A common pattern is to mix them: Nova for the cheapest bulk path, Claude for the quality path — trivial to do behind one Converse API. See the amazon-nova sibling.
Claude vs GPT. The Bedrock-specific point is availability: Bedrock's value is its catalog of providers, and which exact third-party frontier models are offered changes over time, by region, and by provider agreement. Where multiple frontier families are available on Bedrock, the right choice is workload-specific — benchmark the candidates on your task and prompts rather than on leaderboard headlines, since relative strengths shift with each generation. The structural advantages of running on Bedrock (IAM/VPC, consolidated billing, data residency, and AWS credits) apply regardless of which model you land on. For a fuller treatment, see amazon-bedrock-vs-openai.
The meta-point: Bedrock lets you defer and revisit this choice cheaply. Because every model sits behind the same API, you can start on Claude, A/B a Nova or other model on part of your traffic, and re-tier as prices and capabilities move — without re-plumbing your application.
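One simple way to A/B a second model on part of your traffic is a deterministic hash split, sketched below. The function name and the choice of SHA-256 bucketing are illustrative assumptions; any stable key (user ID, session ID) works, and the same key always lands on the same model.

```python
import hashlib


def pick_model(request_key: str, control_id: str, candidate_id: str,
               candidate_share: float) -> str:
    """Send a stable fraction of traffic to the candidate model, the rest to control."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform value in [0, 1)
    return candidate_id if bucket < candidate_share else control_id


# Usage: route roughly 10% of users to the candidate; the result feeds Converse's modelId.
#   model_id = pick_model(user_id, claude_model_id, other_model_id, 0.10)
```

Keying on the user rather than the request keeps each user's experience consistent while the comparison runs.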
Everything above prices Claude on Bedrock if you pay AWS directly. For most startups and many companies the relevant number is different — because AWS will frequently fund the build with credits, and Claude usage on Bedrock draws those credits down before it ever touches your card. This is the single tightest fit in CloudRoute's whole offer.
Claude inference on Bedrock is ordinary AWS spend, so it is fully credit-eligible and credits apply automatically against your bill until exhausted — covering Claude tokens, any Batch and prompt-caching usage, plus the supporting services (Knowledge Bases, vector store, S3, logging). The relevant pools: AWS Activate (general startup credits, commonly up to $100K for institutionally-funded startups); a dedicated Bedrock / Generative-AI POC pool ($10K–$50K) aimed at proving out a GenAI use case; and the competitive Generative AI Accelerator (awards up to $1M for a small cohort of AI-first startups). This is precisely the advantage Bedrock has over the direct Anthropic API for a funded startup: credits apply to Claude on Bedrock; they do not apply to the direct API.
The practical mechanic is that most of these pools are partner-filed — requested through the AWS Partner Network (the ACE program), not a public self-serve form — which is why teams route through an AWS partner rather than applying alone. That is the gap CloudRoute fills. CloudRoute matches you to the right credit pool for your stage and to a vetted AWS DevOps/ML partner who both files the credit application and helps build the Claude workload — the tiered model router, the RAG pipeline behind Knowledge Bases, the agent with tool use, prompt caching on the fixed context. The customer pays $0 — AWS funds the credit pool, AWS pays the partner through engagement-funding programs, and the partner pays CloudRoute a routing commission. You never see an invoice.
Put together with the tiered-routing and caching levers above, the picture for a startup is: build on the Claude tier each request actually needs, cache the repeated context, and run the whole thing on a $25K–$100K (or larger) credit pool while you find product-market fit — paying real money only once usage, and ideally revenue, has scaled past the credits. Related: AWS credits for generative-AI startups and Bedrock POC funding for the full credit mechanics.
The core decision in one place: the three Claude tiers compared on intelligence, speed, cost, and the work each is suited to. Match the request to the cheapest tier that clears the bar and escalate from there. Representative 2026 figures for relative comparison, not quotes.
| Tier | Intelligence | Speed | Relative cost (input/1M) | Best for | Avoid for |
|---|---|---|---|---|---|
| Claude Haiku | Good | Fastest | ~$0.25 (lowest) | High-volume, latency-sensitive, simple tasks; tier-1 of a router; Batch | Hard multi-step reasoning |
| Claude Sonnet | Strong | Fast | ~$3 (mid) | The production default: RAG, agents, support, coding, content | Throwaway bulk where Haiku suffices |
| Claude Opus-class | Deepest | Moderate | ~$15 (highest) | Hardest reasoning, complex analysis, high-stakes agentic steps | High-volume simple work (wasteful) |
Situation: The product already used Claude via the Anthropic API direct and the bill was climbing as usage grew — paid out of runway, on a separate vendor invoice, with every request hitting a frontier tier. They were already an AWS customer for the rest of their stack and wanted (a) to bring Claude under their AWS security and billing, and (b) to stop paying for it out of pocket.
What CloudRoute did: CloudRoute matched them in under 24 hours to a US-East AWS partner with GenAI experience. The partner (1) migrated the Claude calls to Bedrock's Converse API — IAM auth, VPC endpoints, one consolidated bill; (2) introduced a tiered router (Haiku for the easy majority, Sonnet for real work, Opus only for the hard cases); (3) turned on prompt caching for the fixed system prompt; and (4) filed a Bedrock POC credit application plus an Activate Portfolio application to fund the workload.
Outcome: Claude now runs on Bedrock under the team's existing AWS IAM and billing, and the tiered router plus caching cut the modeled per-request cost substantially — but the decisive change was that the spend now draws down AWS credits instead of runway, so the team pays $0 during the build and early scale. CloudRoute's commission was paid by the partner from AWS engagement funding, not by the customer.
moved: direct API → Bedrock Converse · pattern: tiered routing + prompt caching · credits secured: POC + Activate · out-of-pocket: $0
The direct Anthropic API bills your card; Claude on Bedrock draws down AWS credits — under your existing IAM, VPC, and billing. CloudRoute routes you to the right credit pool (Activate up to $100K, Bedrock POC $10K–$50K, GenAI Accelerator up to $1M) and a vetted AWS partner who moves Claude onto Bedrock, builds the tiered router, and turns on caching. Customer pays $0.