for AWS partners →Get AWS credits to run Sonnet →

claude sonnet on amazon bedrock · the default workhorse · 2026

Claude Sonnet on Amazon Bedrock — the balanced default for production.

A complete, neutral reference for running Anthropic's Claude Sonnet on Amazon Bedrock in 2026: why Sonnet is the model most production traffic should default to (the quality/speed/cost balance), its model ID and how to enable access, Sonnet's own per-token pricing with the Batch and prompt-caching levers, the exact decision boundary for when to reach up to Opus or down to Haiku, the capabilities that matter on Sonnet (vision, tool use, long context, prompt caching, extended thinking), the production use cases where Sonnet lives, concrete cost tips — and how AWS credits make Sonnet usage $0.

Get AWS credits to run Sonnet →→ when Sonnet vs Opus vs Haiku

role

the default tier

balance

quality · speed · cost

access via

one AWS API

cost with credits

TL;DR

Claude Sonnet is the middle tier of the Claude family on Amazon Bedrock and the model most production systems should default to — it carries the large majority of real work at strong quality, with far lower cost and latency than Opus and meaningfully more capability than Haiku. As a representative 2026 reference point it sits near $3 per million input tokens and $15 per million output tokens on-demand: roughly an order of magnitude cheaper than Opus and an order of magnitude dearer than Haiku.
The whole point of Sonnet is the balance. Reach UP to Opus only for the genuinely hard requests — deep multi-step reasoning, complex analysis, high-stakes agentic steps — where its extra depth earns the higher price; reach DOWN to Haiku for the easy, high-volume, latency-sensitive work where Sonnet is overkill. Sonnet supports the capabilities that matter — vision, tool use, long context, prompt caching, extended thinking — and is the usual reasoning engine behind Bedrock Agents and Knowledge Bases.
On Bedrock, Sonnet usage is ordinary AWS spend, so AWS credits apply to it (unlike the direct Anthropic API). Combine the right tier per request with Batch (~50% off non-interactive work) and prompt caching (stop re-paying for a fixed system prompt) and the effective rate drops further still. AWS credits — Activate up to $100K, Bedrock/GenAI POC $10K–$50K, the GenAI Accelerator up to $1M — cover Sonnet inference entirely; CloudRoute routes you to the credit pool and a vetted AWS partner, so you pay $0.

the model

IWhat Claude Sonnet is — and why it is the default tier

Claude Sonnet is the middle of Anthropic's three-tier Claude family on Amazon Bedrock, sitting between Opus (the most capable, most expensive tier) and Haiku (the fastest, cheapest tier). It is engineered to be the model you reach for first: strong enough for the large majority of production work, fast enough for interactive use, and priced low enough to run at scale.

If you take only one thing from this page: Sonnet is the sensible default. The discipline that governs all Bedrock cost is "match the model to the task," and for most tasks the right match is Sonnet. It clears the quality bar for RAG assistants, customer-support agents, content generation, coding help, document analysis, and the reasoning behind most agents — while costing a fraction of Opus and responding fast enough to sit in a live request path. The two other tiers are the exceptions you reach for deliberately: Opus when a request is genuinely hard, Haiku when a request is genuinely easy and high-volume. Sonnet is the broad middle where most real traffic should live.

Sonnet runs on Bedrock exactly like every other foundation model there — behind Bedrock's single managed Converse API, governed by AWS IAM, callable over private networking with VPC endpoints, encryptable with your own KMS keys, and audited in CloudTrail. Your prompts and responses stay in your AWS account and the region you choose, and your inputs and outputs are not used to train the base model. This page focuses on Sonnet specifically; for the full Claude-on-Bedrock picture — the whole family, the deep "why Bedrock vs the Anthropic API direct" argument, and the shared mechanics — see the claude-on-amazon-bedrock sibling.

The reason Sonnet works so well as a default is that the route-and-escalate pattern is cheap to build on Bedrock. Because switching tiers is a one-line change to the model ID on the same Converse API, you can start every request on Sonnet and either drop the easy ones to Haiku or escalate the hard ones to Opus without rewriting anything. Sonnet anchors that pattern: it is the baseline the cheaper and more expensive tiers are measured against.

One caveat, stated once and meant throughout: exact Claude Sonnet version names, model IDs, regional availability, context-window size, and per-token prices all change as Anthropic ships new Sonnet generations and AWS updates Bedrock. The figures and identifiers here are representative as of 2026 to convey the balance and relative cost. Always confirm the current Sonnet model ID in the Bedrock model catalog and current rates on the AWS Bedrock pricing page before you build or budget.

where Sonnet sits

Opus = deepest reasoning, highest cost — the escalation target for hard problems. Sonnet = the balanced workhorse — your sensible default, where most production traffic belongs. Haiku = fast and cheap — the drop-down target for easy, high-volume work. Switching between them is a one-line model-ID change, so Sonnet-as-default with selective escalation is the standard pattern.

why it wins the middle

IIThe balance: quality, speed, and cost in one tier

Sonnet earns the "workhorse" label because it is the tier where the three things you actually trade off — answer quality, response latency, and price per token — meet at a point that fits the widest range of production workloads. None of the three is the absolute best in the family, and that is exactly the point.

Think of the Claude family as a single dial from cheap-and-fast to deep-and-expensive. Haiku is at one end, Opus at the other, and Sonnet is the deliberately-chosen middle. What makes the middle valuable is that most real work does not need the extreme of either end. A customer-support answer, a RAG response grounded in retrieved documents, a code suggestion, a document summary — these need solid reasoning and reliable instruction-following, which Sonnet provides, but they rarely need Opus-grade depth, and they usually need more nuance than you would trust to the cheapest tier.

Quality — strong, not maximal (and usually enough)

Sonnet delivers strong reasoning, reliable instruction-following, solid coding, and good long-document comprehension. It is not the deepest reasoner in the family — that is Opus — but the gap only matters on genuinely hard requests. For the broad middle of production work, Sonnet's quality clears the bar, and the right move is to spend the Opus premium only where a request actually exceeds Sonnet's reach. The practical test: if Sonnet answers a representative sample of your requests correctly, it is the right default and you escalate the exceptions, rather than paying Opus rates for everything to cover a minority of hard cases.

Speed — fast enough for the live request path

Sonnet is fast — faster than Opus, slower than Haiku — and crucially fast enough to sit in an interactive, user-facing path (chat, support, coding assistance) without feeling sluggish. With streaming (the converse_stream variant of the Converse API) the first tokens arrive quickly, which is what users actually perceive. Where Haiku's lower latency genuinely matters — very high-throughput pipelines, real-time triage at scale — that is a signal to drop those specific requests to Haiku, not to abandon Sonnet for the rest.

Cost — an order of magnitude under Opus

This is the lever that makes Sonnet the default. As a representative 2026 reference, Sonnet sits near $3 per million input tokens and $15 per million output tokens on-demand — roughly 10× cheaper than Opus-class pricing and roughly 10× more expensive than Haiku. Because output is priced several times higher than input across the family, concise outputs save real money on Sonnet just as they do elsewhere. The order-of-magnitude step on each side of Sonnet is the entire economic case for tiered routing: every request you correctly send to Haiku instead of Sonnet, and every request you avoid sending to Opus when Sonnet suffices, compounds across production volume.

getting in

IIIClaude Sonnet model ID and how to enable access

Before you can call Sonnet on Bedrock you have to request model access in your account — foundation models are off by default — and you have to invoke Sonnet by its model ID. Both are quick; neither costs anything until you actually call the model.

Enabling access. In the Bedrock console, open Model access, find Claude Sonnet, and request access. For Sonnet this is typically granted effectively immediately (some Claude models prompt for brief use-case details). There is no charge for enabling access — you pay only when you invoke. Access is per-account and per-region, so enable Sonnet in each region you will call from. If you want Bedrock to spread Sonnet traffic across a set of regions for better availability and throughput, use a cross-region inference profile (see the amazon-bedrock-pricing sibling for how that interacts with rates, and the cross-region-inference page in the cluster for the mechanics).

The Sonnet model ID. Sonnet is invoked by a model ID — a string identifying the provider, model, and version, namespaced under Anthropic (of the shape anthropic.claude-…sonnet… with a version suffix). You pass this ID to the Converse API to select Sonnet for a request; moving that same request to Haiku or Opus is just a change of the model-ID string. Because the ID advances with each Sonnet generation, do not hard-code a guessed value — read the current Sonnet ID from the Bedrock model catalog in the console (or list it via the API/CLI) and treat it as configuration rather than a literal in your code. Storing the three tier IDs (Haiku, Sonnet, Opus) as config is what makes route-and-escalate a one-line decision at runtime.

Permissions. The IAM principal making the call needs permission for the Bedrock invoke actions on the Sonnet model ARN (and, if you use a cross-region inference profile, permission on the profile). A least-privilege policy scoped to the specific Sonnet ARN you intend to call is the recommended posture; widen it to include the Haiku and Opus ARNs only if you actually route across tiers. Once access is granted and IAM is in place, you are ready to call Sonnet.

Open the Bedrock console → Model access → request access to Claude Sonnet (free; usually instant).
Enable Sonnet in each region you will call from; consider a cross-region inference profile for availability.
Get the current Sonnet model ID from the model catalog or via the API — do not hard-code a guessed version string.
Attach an IAM policy granting the Bedrock invoke actions on the specific Sonnet model ARN (least privilege).
Store the Haiku / Sonnet / Opus IDs as config so escalation is a one-line change; you are billed only on invocation.

what it costs

IVClaude Sonnet pricing on Bedrock — and the levers that lower it

Sonnet on Bedrock is billed per token: a rate per million input tokens (everything you send) and a higher rate per million output tokens (everything Sonnet generates), with output priced several times higher than input. The headline on-demand rate is only the starting point — two levers, Batch and prompt caching, can cut the effective rate substantially.

As a representative 2026 reference, Sonnet's on-demand rate sits near $3 per million input tokens and $15 per million output tokens — output about 5× input, which is why output length is a real cost lever. The table below places Sonnet against the tiers on either side so you can see the order-of-magnitude steps that make tiered routing pay; treat it as relative comparison, not an audited price sheet. Beyond the table sit two levers that change Sonnet's effective cost: Batch — submit non-interactive Sonnet work as an asynchronous job for roughly half the on-demand price — and prompt caching — stop re-paying full input price for a repeated prefix such as a long system prompt, instruction set, reference document, or tool definitions. On a Sonnet chatbot or RAG service with a large fixed context, caching can remove a large fraction of the input bill. For the full pricing model, including how Provisioned Throughput and cross-region inference interact with these, see the amazon-bedrock-pricing sibling.

Claude Sonnet vs the tiers on either side · representative on-demand rates · per 1M tokens · 2026

Claude tier	Input / 1M	Output / 1M	Relative to Sonnet	Cost position
Claude Haiku	$0.25	$1.25	~10× cheaper	Drop-down: high-volume / fast
Claude Sonnet	$3.00	$15.00	— (the default)	The workhorse baseline
Claude Opus-class	$15.00	$75.00	~5× dearer	Escalation: hardest reasoning

Representative 2026 figures for relative comparison only — confirm current Sonnet rates on the AWS Bedrock pricing page (they change with each generation and vary by region). Output is typically ~5× input. Batch (~50% off) and prompt caching (discounted repeated context) lower Sonnet's effective rate further. The order-of-magnitude step down to Haiku and up to Opus is the whole economic case for routing.

the three Sonnet cost levers

1) Tier discipline — keep Sonnet as the default; drop easy requests to Haiku, escalate only hard ones to Opus. 2) Batch — run non-interactive Sonnet work async for ~50% off. 3) Prompt caching — cache the fixed prefix (system prompt, reference docs, tool defs) so you stop re-paying input price for it. Stack all three and Sonnet's effective rate drops well below its on-demand sticker.

the core decision

VWhen Sonnet is right — and when to reach up to Opus or down to Haiku

The single most valuable skill in using the Claude family well is knowing the decision boundary around Sonnet: when Sonnet is the right default, when a request is hard enough to justify Opus, and when a request is easy enough to hand to Haiku. Getting this boundary right is what turns the family into a cost-efficient system instead of an expensive default.

Start every request on the assumption that Sonnet is the answer, then look for a specific reason to move off it in either direction. The reasons to move are concrete, not vibes — they are about the nature of the request, not a general preference for "the best model." Below are the boundaries in both directions.

Reach UP to Opus when the request is genuinely hard

Escalate from Sonnet to Opus when a request needs depth Sonnet cannot reliably reach: deep multi-step reasoning (long chains of dependent logic), complex analysis (synthesizing many sources, reconciling conflicting information), hard coding and refactoring (large, intricate changes across a codebase), research-style synthesis, and high-stakes agentic steps where a wrong action is expensive to undo. The test is empirical: if Sonnet's answers on a class of requests are wrong or shallow often enough to matter, that class is an escalation candidate. The discipline is to escalate the class of hard requests, not to promote everything to Opus — Opus is roughly 5× Sonnet's rate, so blanket use throws money at requests that did not need it. Many systems escalate dynamically: try Sonnet, and only on a low-confidence or failed-validation signal retry the same request on Opus. See the claude-opus-on-amazon-bedrock sibling for where Opus earns its premium.

Reach DOWN to Haiku when the request is easy and high-volume

Drop from Sonnet to Haiku when a request is simple enough that Sonnet is overkill and volume or latency make the savings worth it: classification, routing and triage, data extraction from structured-ish text, short-form generation, real-time chat where speed dominates, the cheap first stage of a tiered router, and bulk processing (especially via Batch). The test here is also empirical: if Haiku clears your quality bar on a class of requests, run that class on Haiku and pocket the ~10× saving. The mistake in this direction is leaving high-volume easy work on Sonnet out of caution — at scale that is the single most common source of avoidable Claude spend. See the claude-haiku-on-amazon-bedrock sibling for the easy-and-fast end of the family.

Stay on Sonnet for the broad middle

Everything that is neither genuinely hard nor trivially easy — which is most production work — stays on Sonnet. RAG answers, support replies, content drafts, routine coding help, document analysis, the reasoning inside most agents: this is Sonnet's home. The goal is not to minimize Sonnet usage but to make sure each request is on the right tier; Sonnet ends up carrying the majority precisely because the majority of requests fall in its band.

what it can do

VISonnet capabilities: vision, tool use, long context, prompt caching, extended thinking

Sonnet is not just strong text-in/text-out — it carries the capabilities that make modern GenAI applications possible, and several of them double as cost or quality levers. Availability of any specific capability can vary by Sonnet version, so confirm specifics for the exact Sonnet model you enable.

Vision (multimodal input)

Sonnet can accept images alongside text in a request and reason about them — reading charts and diagrams, extracting fields from screenshots and scanned documents, interpreting photos, and answering questions about visual content. Because Sonnet pairs solid vision with workhorse pricing, it is often the right tier for production document-understanding and visual-QA at volume, where Opus-grade vision would be more than the task needs. This collapses a class of OCR-plus-vision pipelines into a single Converse call.

Tool use (function calling)

Sonnet supports tool use: you describe tools (functions, APIs, database queries) and Sonnet decides when to call them and with what arguments, then folds the results into its answer. This is the foundation of agentic systems, and Sonnet's balance of reasoning and cost makes it the usual default reasoning engine behind Bedrock Agents — capable enough to plan and call tools reliably, cheap enough to run an agent loop (which can issue many model calls per task) without the cost spiraling. On Bedrock it is exposed through the Converse API's tool fields.

Long context

Sonnet offers a large context window — room for long documents, large chunks of a codebase, extended conversation history, and many retrieved passages in a single request. Long context simplifies RAG and document workflows: you can fit more relevant material in one call rather than over-engineering retrieval. It is also a cost dimension, since input is billed per token — a big context costs more on Sonnet just as on any tier, which is exactly where prompt caching earns its keep on Sonnet.

Prompt caching

When many Sonnet requests share a large common prefix — a long system prompt, a fixed instruction set, a reference document, tool definitions — prompt caching lets Bedrock cache that prefix so later requests are not billed full input price for it again. On a Sonnet-powered chatbot or RAG service with a large fixed context, this removes a large fraction of the input bill, and it is one of the most effective ways to lower Sonnet's effective rate. See the amazon-bedrock-pricing sibling for how caching shows up on the bill.

Extended thinking

Newer Sonnet generations support extended thinking — an explicit mode in which the model spends additional internal reasoning steps on a hard problem before answering, lifting quality on difficult math, multi-step analysis, and tricky coding. You can typically control the thinking budget. On Sonnet this is a useful middle path: for a request that is borderline-hard, turning on extended thinking can lift Sonnet over the bar without escalating all the way to Opus — though it trades some latency and output cost, so reserve it for the requests that need it rather than enabling it globally.

where Sonnet lives

VIIProduction use cases — the work Sonnet carries

The clearest way to see why Sonnet is the default is to look at the production workloads that land on it. These are the cases where Sonnet's quality is enough, its latency fits a live path, and its cost is low enough to run at scale — the broad middle of real GenAI applications.

RAG knowledge assistants — Sonnet is the typical default for retrieval-augmented generation: it reasons well over retrieved passages, follows grounding instructions reliably, and handles the large contexts that RAG produces — with prompt caching on the fixed system prompt keeping cost down. The reasoning engine behind most Bedrock Knowledge Bases deployments lives here.
Customer-support agents — Support automation needs solid reasoning, reliable tone and policy-following, and fast-enough responses for a live conversation — Sonnet's exact balance. It is capable enough to use tools (look up an order, check a policy) and cheap enough to handle high ticket volume, with the hard or sensitive cases escalated to Opus or a human.
Coding assistance — For most day-to-day coding help — explaining code, writing functions, fixing bugs, drafting tests — Sonnet clears the bar at a fraction of Opus cost and fast enough to sit in an IDE flow. Reserve Opus for large, intricate refactors and genuinely hard algorithmic work where its depth pays off.
Content generation and transformation — Drafting, rewriting, summarizing, translating, and reformatting at production quality — Sonnet handles the bulk of content workloads well, with concise-output discipline and caching keeping the per-item cost low. Truly throwaway bulk generation can drop to Haiku.
Document analysis and extraction — Summarizing long documents, answering questions over them, and extracting structured data (including from images via vision) — Sonnet's long context and multimodal input make this a single-call job for most documents. Very high-volume, simple extraction can move to Haiku; unusually complex analysis can escalate to Opus.
The reasoning engine inside agents — Most Bedrock Agents run Sonnet as the planner-and-tool-caller: capable enough to decompose a task and call tools reliably, cheap enough that a multi-call agent loop stays affordable. High-stakes individual steps within an agent can be routed to Opus while the loop stays on Sonnet.

spending less

VIIICost tips: getting the most out of Sonnet

Sonnet is already the cost-efficient default, but a handful of concrete practices compound to lower its effective rate substantially. None require a different model — they are about how you call Sonnet and what you route to it.

These are the levers, in roughly the order of impact for a typical production system. The first is about tier discipline around Sonnet; the rest are about calling Sonnet efficiently once a request is correctly on it.

Route, don't default-up — Keep Sonnet as the default and move only deliberately: drop easy, high-volume requests to Haiku (~10× cheaper) and escalate only genuinely hard ones to Opus. The single biggest avoidable cost is leaving easy bulk work on Sonnet (or worse, Opus) when Haiku would clear the bar. Because tiers are a one-line model-ID change, this routing is cheap to build and tune.
Cache the fixed prefix — If your Sonnet requests share a long system prompt, instruction block, reference document, or tool definitions, turn on prompt caching so you stop re-paying full input price for that prefix on every call. On chatbots and RAG with a large fixed context this is often the largest single saving available on Sonnet.
Batch the non-interactive work — Any Sonnet work that does not need an immediate answer — overnight document processing, bulk classification, backfills, evaluations — should go through Batch for roughly half the on-demand price. Reserve real-time (and streaming) Sonnet calls for the requests a user is actually waiting on.
Shorten outputs — Output is priced several times higher than input, so cap and tighten generations: set a sensible maxTokens, ask for concise answers, and avoid having Sonnet restate large inputs. On output-heavy workloads this is a direct, immediate saving.
Trim and structure the input — Long context costs per token. Retrieve only the passages a request actually needs rather than stuffing the window, and reuse a cached prefix for the stable parts. Good retrieval is a cost lever as much as a quality one on Sonnet.
Pay with credits, not runway — The largest lever of all for a startup: Sonnet on Bedrock is credit-eligible AWS spend, so the effective rate during the build can be $0. Combine the routing, caching, and Batch levers above with an AWS credit pool and you run Sonnet at production scale without touching your card — covered next.

how it becomes $0

IXHow AWS credits make running Sonnet $0

Everything above prices Sonnet if you pay AWS directly. For most startups and many companies the relevant number is different — AWS will frequently fund the build with credits, and Sonnet usage on Bedrock draws those credits down before it ever touches your card. This is the same tie-in that makes Bedrock beat the direct Anthropic API for a funded team: credits apply to Sonnet on Bedrock; they do not apply to the direct API.

Sonnet inference on Bedrock is ordinary AWS spend, so it is fully credit-eligible and credits apply automatically against your bill until exhausted — covering Sonnet tokens, any Batch and prompt-caching usage, plus the supporting services (Knowledge Bases, vector store, S3, logging) around it. The relevant pools: AWS Activate (general startup credits, commonly up to $100K for institutionally-funded startups); a dedicated Bedrock / Generative-AI POC pool ($10K–$50K) aimed at proving out a GenAI use case; and the competitive Generative AI Accelerator (awards up to $1M for a small cohort of AI-first startups). For a Sonnet-based product — a RAG assistant, a support agent, a content engine — these pools comfortably cover inference through the build and early scale.

The practical mechanic is that most of these pools are partner-filed — requested through the AWS Partner Network (the ACE program), not a public self-serve form — which is why teams route through an AWS partner rather than applying alone. That is the gap CloudRoute fills. CloudRoute matches you to the right credit pool for your stage and to a vetted AWS DevOps/ML partner who both files the credit application and helps build the Sonnet workload — the tiered router that keeps Sonnet as the default while dropping to Haiku and escalating to Opus, the RAG pipeline behind Knowledge Bases, the agent with tool use, prompt caching on the fixed context. The customer pays $0 — AWS funds the credit pool, AWS pays the partner through engagement-funding programs, and the partner pays CloudRoute a routing commission. You never see an invoice.

Put together with the routing, caching, and Batch levers above, the picture for a startup is: keep Sonnet as the default, route each request to the tier it actually needs, cache the repeated context, and run the whole thing on a $25K–$100K (or larger) credit pool while you find product-market fit — paying real money only once usage, and ideally revenue, has scaled past the credits. Related: AWS credits for generative-AI startups and Bedrock POC funding for the full credit mechanics.

place Sonnet

Claude Sonnet vs Opus vs Haiku on Bedrock — when to use which

The core decision in one place, anchored on Sonnet: the three Claude tiers compared on intelligence, speed, cost, and the work each is suited to. Default to Sonnet; reach up to Opus for hard requests and down to Haiku for easy high-volume ones. Representative 2026 figures for relative comparison, not quotes.

Tier	Intelligence	Speed	Relative cost (input/1M)	Role vs Sonnet	Best for
Claude Haiku	Good	Fastest	~$0.25 (~10× cheaper)	Drop down to it	High-volume, latency-sensitive, simple tasks; tier-1 of a router; Batch
Claude Sonnet	Strong	Fast	~$3 (the default)	The baseline	The production default: RAG, agents, support, coding, content, document analysis
Claude Opus-class	Deepest	Moderate	~$15 (~5× dearer)	Escalate up to it	Hardest reasoning, complex analysis, high-stakes agentic steps

Sonnet is the workhorse default; the order-of-magnitude step down to Haiku and the ~5× step up to Opus are why tiered routing pays. Batch (~50% off) and prompt caching lower every tier further. Switching tiers is a one-line model-ID change on the Converse API — so default-to-Sonnet-with-escalation is cheap to build.

the cost-efficient default, fully funded

Run Sonnet as your default — on AWS credits, not runway — with a partner who builds the router ($0)

Get matched in 24h →

a recent match

Sonnet became the default tier — and the bill went to $0 — anonymized

inquiry · Series-A vertical SaaS, Toronto

Series-A vertical SaaS, 30 people, running a customer-facing assistant on Claude — every request on a frontier tier

Situation: Their in-product assistant (RAG over customer documents plus a few tools) was already built on Claude, but every request — easy classification, routine Q&A, and the occasional hard analysis alike — was hitting the most capable, most expensive tier. The bill was climbing with usage and paid out of runway. They were already an AWS customer for the rest of the stack and wanted to bring the assistant under AWS billing and stop overpaying per request.

What CloudRoute did: CloudRoute matched them in under 24 hours to a US-East AWS partner with GenAI experience. The partner (1) made <strong>Sonnet the default tier</strong> for the assistant; (2) dropped the easy, high-volume requests (classification, routing, short extractions) to Haiku and reserved Opus for the genuinely hard analysis via a confidence-based escalation; (3) turned on prompt caching for the long fixed system prompt and Batched the overnight document processing; and (4) filed a Bedrock POC credit application plus an Activate Portfolio application to fund the workload.

Outcome: With Sonnet as the default plus selective Haiku/Opus routing, caching, and Batch, the modeled per-request cost dropped substantially versus running everything on the top tier — but the decisive change was that the spend now draws down AWS credits instead of runway, so the team pays $0 during the build and early scale. CloudRoute's commission was paid by the partner from AWS engagement funding, not by the customer.

default tier: Sonnet · routing: Haiku for easy, Opus for hard · levers: caching + Batch · credits secured: POC + Activate · out-of-pocket: $0

faq

Common questions

What is Claude Sonnet on Amazon Bedrock?

Claude Sonnet is the middle tier of Anthropic's Claude family on Amazon Bedrock — the balanced workhorse that sits between Opus (the most capable, most expensive tier) and Haiku (the fastest, cheapest tier). It is engineered to be the model most production traffic defaults to: strong reasoning and instruction-following, fast enough for interactive use, and priced low enough to run at scale. As of 2026 it sits near $3 per million input tokens and $15 per million output tokens on-demand. It runs behind Bedrock's Converse API with the same IAM/VPC/KMS/CloudTrail controls as every other Bedrock model.

Why is Sonnet considered the default Claude model?

Because most real work needs neither the extreme depth of Opus nor the bare-minimum capability of the cheapest tier — it needs the balanced middle. Sonnet clears the quality bar for RAG, support agents, coding help, content, and document analysis while costing roughly 10× less than Opus and responding fast enough for a live request path. The discipline is to start every request on Sonnet, then drop the easy ones to Haiku and escalate only the genuinely hard ones to Opus, rather than paying Opus rates for everything.

How much does Claude Sonnet cost on Bedrock?

Representative 2026 on-demand pricing is roughly $3 per million input tokens and $15 per million output tokens — output about 5× input, and roughly 10× cheaper than Opus-class while roughly 10× more expensive than Haiku. Two levers lower the effective rate: Batch (submit non-interactive Sonnet work async for about half the price) and prompt caching (stop re-paying full input price for a repeated prefix like a long system prompt). These are representative figures for relative comparison; confirm current rates on the AWS Bedrock pricing page, as they change with each generation and vary by region.

When should I use Opus instead of Sonnet?

Reach up to Opus when a request is genuinely hard: deep multi-step reasoning, complex analysis synthesizing many sources, large or intricate coding and refactoring, research-style synthesis, and high-stakes agentic steps where a wrong action is costly. The test is empirical — if Sonnet's answers on a class of requests are wrong or shallow often enough to matter, escalate that class. Opus is roughly 5× Sonnet's rate, so escalate the specific hard requests rather than promoting everything; many systems try Sonnet first and only retry on Opus when a confidence or validation check fails.

When should I use Haiku instead of Sonnet?

Drop down to Haiku when a request is simple enough that Sonnet is overkill and volume or latency make the saving worth it: classification, routing and triage, data extraction, short-form generation, real-time chat where speed dominates, the cheap first stage of a router, and bulk processing (especially via Batch). If Haiku clears your quality bar on a class of requests, run that class on Haiku for roughly a 10× saving. Leaving high-volume easy work on Sonnet is the most common source of avoidable Claude spend at scale.

What capabilities does Claude Sonnet support on Bedrock?

Sonnet supports vision (reasoning over images alongside text), tool use / function calling (the basis for agents — Sonnet is the usual reasoning engine behind Bedrock Agents), a large context window for long documents and history, prompt caching (discounting a repeated prefix like a long system prompt), and extended thinking on newer Sonnet generations (an explicit deeper-reasoning mode you can budget). All are exposed through the Converse API. Availability of a specific capability can vary by Sonnet version, so confirm specifics for the exact model you enable.

What is the Claude Sonnet model ID, and how do I enable access?

In the Bedrock console, open Model access, find Claude Sonnet, and request access — free and usually granted immediately. Access is per-account and per-region, so enable Sonnet in each region you call from and consider a cross-region inference profile for availability. Sonnet is invoked by a model ID — a string namespaced under Anthropic of the shape anthropic.claude-…sonnet… with a version suffix. Because IDs advance with each generation, read the current Sonnet ID from the Bedrock model catalog (or list it via the API/CLI) and treat it as configuration rather than hard-coding a guess. Then attach an IAM policy granting the Bedrock invoke actions on the Sonnet ARN.

How do I lower my Claude Sonnet bill on Bedrock?

In order of impact: (1) keep Sonnet as the default but route — drop easy high-volume requests to Haiku and escalate only hard ones to Opus; (2) turn on prompt caching for any fixed prefix (system prompt, reference docs, tool definitions); (3) run non-interactive work through Batch for about half price; (4) shorten outputs, since output is priced several times higher than input; (5) retrieve only the context a request needs. The biggest lever for a startup is paying with AWS credits instead of runway — Sonnet on Bedrock is credit-eligible spend.

Can AWS credits cover Claude Sonnet usage on Bedrock?

Yes — and this is the key advantage over the direct Anthropic API. Sonnet on Bedrock is ordinary AWS spend, so it is fully credit-eligible and credits apply automatically against your bill, covering Sonnet tokens, Batch and prompt-caching usage, and supporting services. The relevant pools are AWS Activate (up to $100K), a Bedrock/GenAI POC pool ($10K–$50K), and the GenAI Accelerator (up to $1M), and they are largely partner-filed via the AWS Partner Network. CloudRoute routes you to the right pool and a vetted AWS partner who files the application and builds the Sonnet workload — customer pays $0, AWS funds it.

Make Sonnet your default — on AWS's budget, not your runway

The direct Anthropic API bills your card; Claude Sonnet on Bedrock draws down AWS credits — under your existing IAM, VPC, and billing. CloudRoute routes you to the right credit pool (Activate up to $100K, Bedrock POC $10K–$50K, GenAI Accelerator up to $1M) and a vetted AWS partner who makes Sonnet the default, builds the Haiku/Opus routing, and turns on caching. Customer pays $0.

Get matched in 24h →→ see the AI-team persona detail

matched within< 24h

GenAI credit ceilingup to $1M

cost to you$0