aws bedrock · the complete 2026 guide

Amazon Bedrock — the complete 2026 guide.

One managed API, every major foundation model — Anthropic Claude, Meta Llama, Mistral, Amazon Nova, Cohere, Stability AI, AI21, DeepSeek — with no servers to run and enterprise privacy by default. This is the full reference: what Bedrock is, the entire model catalog, how access and the Converse API work, the real pricing model, the feature suite (Agents, Knowledge Bases, Guardrails, fine-tuning, Flows, evaluation), security and data residency, and exactly when Bedrock beats SageMaker or calling a model vendor directly.

model providers: 8+ · servers to manage: 0 · batch discount: ~50% · data used to train base models: none
TL;DR
  • Amazon Bedrock is AWS's fully-managed service for calling many foundation models through one API — text, chat, image, video, and embeddings from Anthropic, Meta, Mistral, Amazon (Nova + Titan), Cohere, Stability AI, AI21, and DeepSeek. There are no GPUs to provision; you make an API call and pay per token. Your prompts and outputs are not used to train the base models and stay in your AWS account and region.
  • The platform is more than a model proxy: the Converse API gives one schema across every model, and Agents, Knowledge Bases (managed RAG), Guardrails, fine-tuning, model distillation, Flows, Prompt Management, and model evaluation turn raw model access into production GenAI. Pricing has four levers — on-demand per-token, batch (~50% cheaper), provisioned throughput (reserved capacity), and prompt caching (cuts repeat-context cost).
  • Use Bedrock when you want managed multi-model access with AWS-native security and the least operational overhead; use SageMaker when you need to own training, custom architectures, or non-foundation-model ML; call a vendor API directly only when you specifically want that vendor's newest model the day it ships and don't need AWS data governance. GenAI bills scale fast — CloudRoute routes you to AWS credits (Activate Portfolio up to $100K, Bedrock/GenAI POC $10K–$50K, GenAI Accelerator up to $1M) and vetted partners to build it; you pay $0.
the core idea

I. What Amazon Bedrock is, and why AWS built it

Amazon Bedrock is a fully-managed service that lets you access a broad selection of high-performing foundation models — from Anthropic, Meta, Mistral, Amazon, Cohere, Stability AI, AI21, and DeepSeek — through a single API, without provisioning or managing any infrastructure. You send a prompt, you get a completion, you pay for the tokens. That is the whole surface area.

Before Bedrock, putting a large language model into production on your own meant one of two uncomfortable paths. Either you rented GPU instances, downloaded open weights, and took on the full operational burden of serving — autoscaling, sharding, quantization, the perpetual scarcity of accelerators — or you signed a contract with a single model vendor and routed your most sensitive data out to their API. Bedrock collapses both problems. There are no servers, GPUs, or clusters for you to run: AWS operates the inference fleet behind the API. And because Bedrock runs inside AWS, your data never leaves your control — prompts and completions are not used to train the underlying foundation models, are not shared with the model providers, and stay within your AWS account and the AWS Region you call.

The second reason Bedrock exists is choice without lock-in. No single foundation model is best at everything. Claude is exceptional at long-context reasoning, careful instruction-following, and agentic tool use; Llama gives you strong open-weight performance you can fine-tune freely; Mistral is fast and cost-efficient for high-throughput tasks; Amazon Nova is built for very low cost and latency; Stability and Amazon Nova Canvas generate images; Cohere and Titan produce the embeddings that power search and RAG. Bedrock lets you mix and match all of them behind one authentication model, one billing relationship, and — through the Converse API — one request schema. Switching from one model to another is often a one-line change to a model ID rather than a rewrite.

The third reason is that Bedrock is not just a model gateway — it is an application platform. On top of raw inference, AWS layers managed building blocks that most teams would otherwise have to assemble themselves: Agents that plan and call your APIs, Knowledge Bases that implement retrieval-augmented generation end to end, Guardrails that filter harmful content and redact PII, fine-tuning and distillation to specialize models, Flows to orchestrate multi-step generative workflows visually, Prompt Management to version prompts, and model evaluation to compare candidates on your own data. Each is covered in section V.

Put simply: Bedrock is the managed, secure, multi-model foundation for building generative-AI applications on AWS. If Amazon EC2 abstracted away the data center, Bedrock abstracts away the model-serving stack — you reason about prompts, tokens, and outcomes, not about GPUs.

the one-sentence definition

Amazon Bedrock = a single, fully-managed API for many foundation models (Claude, Llama, Mistral, Nova, Titan, Cohere, Stability, AI21, DeepSeek), with serverless inference, AWS-native security, and a built-in suite for RAG, agents, guardrails, and customization — you never touch a GPU, and your data is never used to train the base models.

every provider on the platform

II. The full model catalog: every provider on Bedrock

Bedrock's defining feature is breadth. The catalog spans text and chat models, image and video generators, and embedding models, from eight-plus providers. The exact model versions available evolve continuously and vary by AWS Region — the table below is a representative map of the catalog as of 2026; always confirm the live list in the Bedrock console under Model access.

A few patterns are worth internalizing before reading the table. Anthropic's Claude family is the most widely deployed set of models on Bedrock for serious enterprise reasoning, coding, and agentic work, with a tiered lineup that trades cost against capability. Amazon's own families come in two lines: the newer Nova family (Micro, Lite, Pro, Premier for text; Canvas for images; Reel for video; plus the agentic Nova Act), engineered for very low price and latency, and the older Titan family, still useful for text and especially for embeddings. Meta Llama gives you strong open-weight models you are free to fine-tune. Mistral targets speed and price efficiency. Cohere is known for retrieval and embeddings; Stability AI for image generation; AI21 for its Jamba long-context models; and DeepSeek brings cost-efficient open reasoning models into the managed catalog.

The strategic point for builders: you do not have to pick one provider for your whole application. A common production pattern routes cheap, high-volume calls (classification, extraction, routing) to a small fast model like Nova Lite, Claude Haiku, or Mistral, and escalates only the hard reasoning steps to a frontier model like Claude Sonnet/Opus or Nova Premier — all through the same Converse API, all on one bill.
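
The routing pattern described above can be sketched in a few lines. This is a minimal illustration, not a production router: the two model IDs are hypothetical placeholders, and the set of task labels treated as "cheap" is an assumption you would tune for your own workload.

```python
from dataclasses import dataclass

# Hypothetical placeholder IDs; copy the exact current IDs from the Bedrock console.
SMALL_MODEL_ID = "example.small-fast-model"
FRONTIER_MODEL_ID = "example.frontier-model"

# Task labels this sketch treats as cheap, high-volume work (an assumption).
CHEAP_TASKS = {"classification", "extraction", "routing"}


@dataclass(frozen=True)
class Route:
    model_id: str
    reason: str


def pick_model(task: str, needs_deep_reasoning: bool = False) -> Route:
    """Send cheap tasks to the small model and escalate everything else."""
    if task in CHEAP_TASKS and not needs_deep_reasoning:
        return Route(SMALL_MODEL_ID, "cheap high-volume task")
    return Route(FRONTIER_MODEL_ID, "hard reasoning step")


print(pick_model("classification"))
print(pick_model("contract-analysis"))
```

Because the Converse request shape is the same for every chat model, the returned model_id can be passed straight into the modelId field of the call; nothing else in the request has to change.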

amazon bedrock model catalog by provider · representative as of 2026 — check the Bedrock console for live availability
Provider | Representative models | Modality | Best-fit use
Anthropic | Claude Opus, Claude Sonnet, Claude Haiku | Text / chat / vision / tools | Frontier reasoning, coding, agents, long-context analysis
Meta | Llama (instruct + smaller variants) | Text / chat / vision | Open-weight, freely fine-tunable, self-hostable lineage
Mistral | Mistral Large + smaller models | Text / chat | Fast, cost-efficient high-throughput tasks
Amazon Nova | Nova Micro / Lite / Pro / Premier; Canvas; Reel; Act | Text / image / video / agentic | Lowest cost + latency; native image & video generation
Amazon Titan | Titan Text; Titan Text Embeddings; Titan Multimodal Embeddings | Text / embeddings | Embeddings for RAG & search; economical text
Cohere | Command (text); Embed (embeddings); Rerank | Text / embeddings / rerank | Enterprise search, retrieval, reranking
Stability AI | Stable Diffusion / Stable Image family | Image generation | Marketing, product, and creative imagery
AI21 Labs | Jamba family | Text / chat | Long-context, hybrid-architecture text generation
DeepSeek | DeepSeek reasoning models | Text / reasoning | Cost-efficient open reasoning models
Availability differs by Region — a model live in us-east-1 or us-west-2 may not yet be in eu-central-1 or ap-southeast-1. Cross-region inference profiles (a related capability) let Bedrock route a request across Regions in a geography to improve availability and throughput. Always verify the current per-Region list in the console.
getting from zero to a first call

III. How access works: model access, IAM, Regions, and the Converse API

Bedrock is off by default. You do not get models the instant you open the console; you explicitly request access per model, govern it with IAM, choose a Region, and then call models with either the modern Converse API or the lower-level InvokeModel API. Here is the path from a fresh account to a first completion.

The reason access is opt-in is governance. AWS wants every model your organization can call to be a deliberate, auditable decision, and several models carry provider end-user license terms you must accept. So enabling Bedrock is a four-part setup.

Step 1 — Enable model access

In the Bedrock console, open Model access and request the specific models you intend to use (for example, Claude Sonnet and Titan Text Embeddings). For most models access is granted within seconds to a few minutes; some require accepting the provider's EULA first. You enable models per Region — granting access in us-east-1 does not grant it in eu-west-1.

Step 2 — Authorize with IAM

Bedrock is a standard AWS service governed by IAM. You grant principals (users, roles, Lambda functions, ECS tasks) actions such as bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream, which also authorize the Converse and ConverseStream operations, and you can scope policies down to specific model ARNs. Least-privilege here means a service can call exactly the models it needs and nothing else. All API activity is logged to CloudTrail, and you can capture full request/response payloads with Bedrock model-invocation logging to S3 or CloudWatch.
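
A least-privilege policy of the kind described above might look like the sketch below, built as a Python dict and printed as JSON. The model IDs are hypothetical placeholders. The sketch lists only bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream, which, as far as the AWS documentation indicates, are also the actions that authorize Converse and ConverseStream calls; verify against the current IAM reference before relying on it.

```python
import json


def bedrock_invoke_policy(region: str, model_ids: list[str]) -> dict:
    """Build an IAM policy document that allows invoking only the listed models."""
    resources = [
        # Foundation-model ARNs carry no account ID segment.
        f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
        for model_id in model_ids
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeApprovedModelsOnly",
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                "Resource": resources,
            }
        ],
    }


# Hypothetical model IDs; copy the exact current IDs from the Bedrock console.
policy = bedrock_invoke_policy("us-east-1", ["example.chat-model", "example.embeddings-model"])
print(json.dumps(policy, indent=2))
```

Attach the resulting document to the role your Lambda function or ECS task assumes. Any model not listed in Resource is denied by default.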

Step 3 — Pick a Region (and consider cross-region inference)

Choose a Region for data-residency, latency, and model-availability reasons — your prompts and completions are processed in the Region you call. Not every model lives in every Region, and frontier models often land first in US Regions. To smooth out capacity and availability, Bedrock offers cross-region inference profiles, which let a single request be served from one of several Regions within a geography (e.g. a US or EU profile) without you managing the routing. Pick the Region first; reach for cross-region inference when you hit availability or throughput limits.

Step 4 — Call a model: Converse vs InvokeModel

Bedrock exposes two ways to call a model. InvokeModel is the original, lower-level API: you send a raw body whose JSON shape is specific to each provider, and you parse a provider-specific response. It gives maximum control but means provider-specific code. The Converse API is the modern, recommended path: one consistent request and response schema across every chat model, with first-class support for multi-turn conversations, system prompts, tool use (function calling), and streaming. With Converse, switching models is usually just changing the modelId string. Use Converse for chat and agentic applications; reach for InvokeModel only for non-conversational modalities (such as image or embedding endpoints) or when you need a provider-specific parameter Converse does not expose.
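
To make the difference concrete, the sketch below builds the request for each API without calling AWS. The InvokeModel body follows the Anthropic Messages shape used on Bedrock, and other providers expect different keys. The helper function names are our own, not part of the SDK.

```python
import json


def invoke_model_request(model_id: str, prompt: str) -> dict:
    """InvokeModel: the body is a JSON string in the provider's own schema."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",  # Anthropic-specific field
        "max_tokens": 512,
        "messages": [{"role": "user", "content": [{"type": "text", "text": prompt}]}],
    }
    return {"modelId": model_id, "body": json.dumps(body)}


def converse_request(model_id: str, prompt: str) -> dict:
    """Converse: the same keys regardless of which chat model sits behind modelId."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 512},
    }


# Hypothetical model ID, used only to show the two shapes side by side.
print(invoke_model_request("example.chat-model", "Hello"))
print(converse_request("example.chat-model", "Hello"))
```

With a bedrock-runtime client these would be sent as client.invoke_model(**request) and client.converse(**request) respectively. Only the first needs rewriting when the provider changes.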

A minimal Converse call in Python (using the AWS SDK, boto3) looks like this:

a minimal Converse API call (python / boto3)

import boto3

# Runtime client for inference calls; credentials come from the standard AWS credential chain.
brt = boto3.client("bedrock-runtime", region_name="us-east-1")

resp = brt.converse(
    modelId="anthropic.claude-sonnet",  # illustrative ID; swap this string to switch models
    messages=[{"role": "user", "content": [{"text": "Summarize our refund policy in 3 bullets."}]}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)

# The reply text is the first content block of the assistant message.
print(resp["output"]["message"]["content"][0]["text"])

The same call structure works for Claude, Llama, Mistral, Nova, Cohere, and more — only modelId changes. Model IDs shown are illustrative; copy the exact current IDs from the Bedrock console.
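
Multi-turn conversations use the same message format: you resend the growing history on every call. The sketch below shows one way to manage that history and to read token usage from a response. It runs offline against a hand-written stand-in dict shaped like a Converse response, and the helper names are our own.

```python
def append_turn(messages: list[dict], role: str, text: str) -> list[dict]:
    """Return a new message list with one more turn in Converse format."""
    return messages + [{"role": role, "content": [{"text": text}]}]


def read_reply(resp: dict) -> tuple[str, int, int]:
    """Extract the reply text and token counts from a Converse response dict."""
    blocks = resp["output"]["message"]["content"]
    text = "".join(block["text"] for block in blocks if "text" in block)
    usage = resp["usage"]
    return text, usage["inputTokens"], usage["outputTokens"]


# Hand-written stand-in for a real response, so the sketch needs no AWS access.
sample_resp = {
    "output": {"message": {"role": "assistant", "content": [{"text": "Placeholder answer."}]}},
    "stopReason": "end_turn",
    "usage": {"inputTokens": 18, "outputTokens": 6, "totalTokens": 24},
}

history = append_turn([], "user", "What is our refund window?")
reply, tokens_in, tokens_out = read_reply(sample_resp)
history = append_turn(history, "assistant", reply)
history = append_turn(history, "user", "Does that include sale items?")
print(len(history), tokens_in, tokens_out)
```

Logging the usage counts per call is the simplest way to see where input tokens accumulate as a conversation grows.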

how you actually pay

IV. The pricing model: on-demand, batch, provisioned throughput, and prompt caching

Bedrock has no platform fee and no minimum. You pay for what you run, and there are four distinct pricing levers — getting them right is the difference between a GenAI bill that is trivial and one that is alarming. Token prices are per-model and per-Region; the figures below are representative as of 2026 and exist to show relative scale — always check the AWS Bedrock pricing page for current rates.

On-Demand is the default. You are billed per 1,000 input tokens and per 1,000 output tokens, at a rate set by the specific model, with no commitment. Output tokens almost always cost several times more than input tokens. Small models (Nova Micro/Lite, Claude Haiku, Mistral small) are roughly one to two orders of magnitude cheaper than frontier models (Claude Opus, Nova Premier) at the representative rates in this guide, which is why model routing matters so much for cost. For image and video models, you pay per image or per second of generated video rather than per token.

Batch inference processes large sets of prompts asynchronously and is typically priced around 50% lower than on-demand. If your workload is not latency-sensitive — overnight document processing, bulk classification, dataset enrichment, embedding a corpus — batch is the single biggest cost lever available and should be the default for offline jobs.
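
The arithmetic behind on-demand and batch pricing is simple enough to sketch. The example below uses the representative workhorse-tier rates quoted in this guide (about $0.003 input and $0.015 output per 1K tokens) and the roughly 50% batch discount. The job size and per-document token counts are assumptions chosen for illustration.

```python
def on_demand_cost(input_tokens: int, output_tokens: int,
                   input_per_1k: float, output_per_1k: float) -> float:
    """On-demand cost in USD: input and output tokens are billed per 1,000, separately."""
    return (input_tokens / 1000) * input_per_1k + (output_tokens / 1000) * output_per_1k


def batch_cost(on_demand_usd: float, discount: float = 0.5) -> float:
    """Batch cost, assuming the roughly 50% discount quoted in this guide."""
    return on_demand_usd * (1 - discount)


# Assumed job: 10,000 documents at about 2,000 input and 300 output tokens each.
docs = 10_000
total_in, total_out = docs * 2_000, docs * 300

live = on_demand_cost(total_in, total_out, input_per_1k=0.003, output_per_1k=0.015)
print(f"on-demand: ${live:,.2f}  batch: ${batch_cost(live):,.2f}")
```

Substitute the current rates for your model and Region from the AWS pricing page. The structure of the calculation stays the same.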

Provisioned Throughput reserves dedicated model capacity, billed hourly (with deeper discounts on 1-month and 6-month commitments) and measured in model units. It guarantees consistent throughput and latency and is required for serving custom (fine-tuned) models. It only makes economic sense at high, steady volume; for spiky or low volume, on-demand is cheaper.

Prompt caching lets you mark stable, repeated context — a long system prompt, a large document, a tool schema — so Bedrock caches it across calls and you are not billed full input-token price to re-process the same tokens every request. For agents and chat apps that resend a big system prompt or document on every turn, cached input tokens are billed at a steep discount, which can cut total input cost substantially. On top of these four levers, expect additional charges for model customization (fine-tuning/distillation training plus storage of the custom model) and for related services your app uses, such as the vector store behind a Knowledge Base.
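
In the Converse API, stable context is marked with a cachePoint block placed after the content to be cached. The sketch below builds such a request without sending it. Support varies by model, and models typically require a minimum prefix length before caching applies, so treat this as a shape to adapt rather than a guaranteed saving. The helper name and the model ID are hypothetical.

```python
def cached_system_request(model_id: str, system_prompt: str, user_text: str) -> dict:
    """Build Converse kwargs with a cache checkpoint after the stable system prompt."""
    return {
        "modelId": model_id,
        "system": [
            {"text": system_prompt},
            # Content before this marker is eligible for caching on supported models.
            {"cachePoint": {"type": "default"}},
        ],
        # Only the per-turn user text changes between calls.
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
    }


request = cached_system_request(
    "example.chat-model",  # hypothetical ID; use the exact ID from the console
    "You are a support assistant. <long, stable policy text goes here>",
    "Can I return a sale item?",
)
print(request["system"][-1])
```

Keep the cached prefix byte-for-byte identical across calls. Any change to the system prompt ahead of the marker means the prefix is processed at full price again.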

representative bedrock on-demand token pricing · per 1K tokens, USD · illustrative 2026 ranges — verify on the AWS pricing page
Model tier (example) | Input / 1K tokens | Output / 1K tokens | Batch? | Typical role
Nova Micro (ultra-low-cost) | ~$0.000035 | ~$0.00014 | Yes (~50% off) | High-volume classification, routing
Nova Lite / Claude Haiku (small) | ~$0.0002–$0.0008 | ~$0.0008–$0.004 | Yes (~50% off) | Cheap chat, extraction, drafts
Mistral / mid-tier | ~$0.001–$0.003 | ~$0.003–$0.009 | Yes (~50% off) | Balanced throughput tasks
Claude Sonnet / Nova Pro (workhorse) | ~$0.003 | ~$0.015 | Yes (~50% off) | Production reasoning, coding, agents
Claude Opus / Nova Premier (frontier) | ~$0.015 | ~$0.075 | Yes (~50% off) | Hardest reasoning, escalation only
Titan / Cohere embeddings | ~$0.0001–$0.0002 | n/a (vectors) | Yes | RAG, semantic search, clustering
Numbers are deliberately rounded representative ranges to show relative scale, not audited current rates; actual prices vary by exact model version and Region and change over time. Image models bill per image and video models per second. Always confirm live pricing at aws.amazon.com/bedrock/pricing. The single highest-leverage cost moves: route cheap calls to small models, run offline work as batch (~50% off), and turn on prompt caching for repeated context.
beyond raw inference

V. The feature suite: what turns model access into production GenAI

The reason teams standardize on Bedrock rather than a bare model API is the managed application layer around the models. Each capability below would otherwise be a project of its own; on Bedrock each is a managed feature you configure rather than build.

These features compose. A typical production assistant uses a Knowledge Base for grounding, wraps the model in Guardrails, exposes actions through an Agent, manages its system prompts via Prompt Management, and was chosen using model evaluation — all on Bedrock, all under one IAM and billing boundary. That composability, not any single model, is the platform's real moat.

  • Agents — Bedrock Agents let a model plan multi-step tasks and call your own APIs and AWS Lambda functions to actually do things — look up an order, file a ticket, run a query — with the orchestration, prompt construction, and tool invocation managed for you. See Amazon Bedrock Agents.
  • Knowledge Bases — A fully-managed retrieval-augmented generation (RAG) pipeline: point it at documents in S3, and Bedrock chunks them, generates embeddings, stores them in a vector database, and at query time retrieves the relevant passages and grounds the model's answer in them — with citations. See Bedrock Knowledge Bases and RAG on AWS.
  • Guardrails — A configurable safety layer that filters harmful content, blocks denied topics, redacts or blocks PII, and adds contextual-grounding checks to reduce hallucination — applied consistently across any model. See Bedrock Guardrails.
  • Fine-tuning & model distillation — Customize a base model on your own labeled data to improve quality on your domain, or distill a large model's behavior into a smaller, cheaper one. Custom models are served via Provisioned Throughput. See Bedrock fine-tuning.
  • Flows — A visual, drag-and-drop builder for chaining prompts, models, Knowledge Bases, Agents, and logic into a single deployable generative workflow — without wiring the orchestration code by hand. See Bedrock Flows.
  • Prompt Management — Create, version, test, and store prompts as first-class resources so prompt changes are tracked and reusable across teams and applications, rather than buried in code.
  • Model evaluation — Compare candidate models on your own datasets using automated metrics or human review, so the choice of model is grounded in measured performance on your tasks rather than vendor benchmarks.
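
As one example of how these features are called from code, the sketch below queries a Knowledge Base through the bedrock-agent-runtime client's retrieve_and_generate operation. The request builder is pure Python and runs offline. The ask function needs boto3, AWS credentials, and an existing Knowledge Base. The Knowledge Base ID and model ARN shown are hypothetical placeholders.

```python
def rag_request(question: str, knowledge_base_id: str, model_arn: str) -> dict:
    """Build kwargs for the retrieve_and_generate call against one Knowledge Base."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    }


def ask(question: str, knowledge_base_id: str, model_arn: str,
        region: str = "us-east-1") -> str:
    """Send the grounded query and return the generated answer text."""
    import boto3  # imported lazily so the request builder works without the SDK

    client = boto3.client("bedrock-agent-runtime", region_name=region)
    resp = client.retrieve_and_generate(**rag_request(question, knowledge_base_id, model_arn))
    return resp["output"]["text"]


# Hypothetical identifiers; substitute the values from your own account.
req = rag_request(
    "What is the refund window?",
    "KB_ID_PLACEHOLDER",
    "arn:aws:bedrock:us-east-1::foundation-model/example.chat-model",
)
print(req["retrieveAndGenerateConfiguration"]["type"])
```

The response also carries citations pointing back to the retrieved passages, which is what lets an assistant show its sources alongside the answer.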
enterprise readiness

VI. Security, privacy, compliance, and data residency

For most enterprises the deciding factor in favor of Bedrock is not a benchmark — it is governance. Bedrock is designed so that adopting generative AI does not mean loosening your data controls.

Your data is yours. Prompts, completions, and any data you submit to Bedrock are not used to train the underlying foundation models and are not shared with the model providers. Anthropic, Meta, Mistral, Cohere and the rest do not see your traffic; AWS serves their models on your behalf inside AWS. Your content is encrypted in transit and at rest, and you can use your own AWS KMS keys for encryption.

It stays in your Region. A request to Bedrock is processed in the AWS Region you call, which is how you satisfy data-residency requirements — keep EU data in an EU Region, and so on. When you use cross-region inference, requests are routed only within a defined geography (for example, EU Regions for an EU profile), preserving the residency boundary you chose.

It plugs into your existing controls. Bedrock is reachable from inside your VPC through AWS PrivateLink (interface VPC endpoints), so traffic need never traverse the public internet. Access is governed by IAM, every call is recorded in CloudTrail, and you can log full invocation payloads for audit. Guardrails add a content- and PII-safety layer on top.

Compliance coverage is broad. Bedrock is included in AWS's major compliance programs and attestations — commonly SOC 1/2/3, ISO 27001, HIPAA eligibility, PCI DSS, FedRAMP and others depending on Region — which lets regulated industries (financial services, healthcare, public sector) build on it within their existing compliance posture. Always confirm the current scope for your specific Region and program in AWS Artifact, since coverage expands over time.

why this matters for the buy decision

The combination of no training on your data, in-Region processing, private VPC networking, IAM + CloudTrail governance, and broad compliance attestations is precisely what lets a bank, hospital, or government agency ship generative AI at all. It is the most common reason teams choose Bedrock over sending sensitive data to a public model endpoint.

decision guidance

VII. Bedrock vs SageMaker vs calling model APIs directly

Bedrock is not the only way to run AI on AWS, and it is not always the right one. The honest decision rule comes down to how much of the ML stack you want to own and how much your data governance matters.

The three options answer three different questions. Bedrock answers "I want to use existing foundation models through a managed, secure API with the least operational overhead." Amazon SageMaker answers "I need to own the ML lifecycle: bring my own model or architecture, run custom training, control the serving infrastructure, or do classical (non-foundation-model) ML." Calling a model vendor's API directly (that is, the vendor's own hosted endpoint rather than AWS) answers "I want this specific vendor's newest model the moment it ships and I am comfortable with my data leaving AWS."

They are complementary, not mutually exclusive. A common architecture uses Bedrock for the GenAI application layer and SageMaker for a custom recommendation, forecasting, or vision model that no foundation model covers — both inside the same AWS account. The detailed head-to-head lives at Bedrock vs SageMaker; the cross-cloud comparisons at Bedrock vs OpenAI and Bedrock vs Azure OpenAI.

decision guide · bedrock vs sagemaker vs direct vendor api
Dimension | Amazon Bedrock | Amazon SageMaker | Direct vendor API
Best for | Using foundation models in apps, fast | Owning the full ML lifecycle / custom models | One vendor's newest model, day one
Infra you manage | None (serverless) | You configure training & endpoints | None (vendor-hosted)
Model choice | 8+ providers, one API | Any model you bring or build | That one vendor only
Data governance | Stays in your AWS account & Region | Stays in your AWS account & Region | Leaves AWS to the vendor
Custom / classical ML | Fine-tune FMs only | Full: any architecture, any ML | No
Time to first call | Minutes | Hours to days | Minutes
Typical buyer | App & product teams | ML / data-science teams | Teams wedded to one model
Rule of thumb: default to Bedrock for generative-AI features; add SageMaker when you must own training or do non-FM ML; reach for a direct vendor API only when a specific just-released model and AWS data governance are genuinely in tension — and even then, that model often arrives on Bedrock shortly after.
first build + the cost reality

VIII. Getting started, and the cost reality (how AWS credits fund it)

Standing up a first Bedrock application is a day, not a quarter. The harder problem is not getting started — it is what happens to the bill once the application is real and traffic grows.

The fast path: enable access to one workhorse model (say Claude Sonnet) and one embeddings model (Titan or Cohere) in your Region, attach an IAM policy scoped to those model ARNs, make a first Converse call, then — if you need grounding — point a Knowledge Base at a folder of documents in S3 and wrap the whole thing in a Guardrail. You now have a grounded, governed assistant without having provisioned a single GPU.

Then the cost reality arrives. GenAI is cheap per call and expensive in aggregate. A retrieval-augmented chat assistant that resends a large system prompt and retrieved context on every turn, serving thousands of users, can move from a rounding error to five or six figures a month faster than teams expect — especially if every call hits a frontier model. The levers from section IV are how you keep it sane: route cheap calls to small models, run offline work as batch (~50% off), turn on prompt caching for repeated context, and reserve Provisioned Throughput only once volume is steady and high. The companion pages Bedrock pricing, prompt caching, and batch inference go deep on each.

The other lever is funding the bill with someone else's money — specifically AWS's. AWS runs credit programs designed precisely for teams building generative AI on Bedrock: Activate Portfolio (up to $100K) for institutionally-funded startups, dedicated Bedrock / GenAI proof-of-concept funding ($10K–$50K) for a defined GenAI build, and the competitive Generative AI Accelerator (up to $1M) for AI-first companies. These pools are largely partner-filed and invisible on the public Activate page. This is exactly what CloudRoute does: we route you to a vetted AWS partner who files the credit application and, if you need hands, who can build the Bedrock workload with you — and because AWS funds both the credits and the partner engagement, you pay $0. See AWS credits for generative-AI startups, AWS PoC / Bedrock POC funding, and $100K AWS credits.

pick the right model

Bedrock model families compared — capability vs cost vs fit

The most common Bedrock decision is not "which platform" but "which model on the platform." This is a scannable map of the major families by where they sit on the capability/cost curve and what they are for. Cost is relative ($ cheapest → $$$$ frontier); exact rates live on the AWS pricing page.

Model family | Provider | Relative cost | Strengths | Reach for it when
Claude (Haiku → Sonnet → Opus) | Anthropic | $ → $$$$ | Reasoning, coding, agents, long context, instruction-following | You need the most reliable reasoning or agentic tool use
Nova (Micro/Lite/Pro/Premier) | Amazon | $ → $$$ | Lowest cost & latency; native image (Canvas) & video (Reel) | You optimize for price/latency or need image/video generation
Llama | Meta | $ → $$ | Strong open weights, freely fine-tunable | You want open-weight flexibility and custom fine-tunes
Mistral | Mistral AI | $ → $$ | Fast, cost-efficient throughput | High-volume tasks where speed and price dominate
Titan / Cohere Embed | Amazon / Cohere | $ | Embeddings + retrieval/rerank | You are building RAG, semantic search, or clustering
Stable Diffusion / Stable Image | Stability AI | $$ (per image) | High-quality image generation | You need creative, product, or marketing imagery
Jamba / DeepSeek | AI21 / DeepSeek | $ → $$ | Long-context; cost-efficient open reasoning | You want long-context or budget reasoning alternatives
A production system rarely picks one. The dominant pattern is a cheap small model for the 90% of easy calls plus a frontier model for the 10% that are hard, with embeddings handling retrieval — all behind the one Converse API. Run a Bedrock model evaluation on your own data before committing.
building on bedrock?
Get AWS credits to fund your Bedrock workload — and a vetted partner to build it. You pay $0.
Get matched in 24h →
a recent match

A Bedrock build, funded by AWS credits — anonymized

inquiry · seed-stage b2b support-automation startup, EU
Seed-stage B2B SaaS, 11 people, building an AI support agent over customer docs; EU data-residency requirement; net-new to AWS

Situation: The team wanted a grounded, governed support assistant — RAG over their own knowledge base, with PII redaction and EU data residency — but had no ML infrastructure, no GPU budget, and a hard requirement that customer data never leave the EU or get used to train a vendor's model. Calling a US-hosted model API directly was a non-starter with their compliance reviewer, and standing up self-hosted inference was out of scope for an 11-person team.

What CloudRoute did: Routed within 20 hours to an EU-Central AWS partner with a GenAI + data-residency track record. The partner architected the workload entirely on Amazon Bedrock: a Knowledge Base over the docs in S3 (vectors and processing kept in eu-central-1), Claude Sonnet via the Converse API for answers, a Guardrail for PII redaction and denied topics, and prompt caching plus model routing (Nova Lite for classification, Sonnet only for hard answers) to control cost. In parallel the partner filed a Bedrock/GenAI proof-of-concept credit application and an Activate Portfolio application.

Outcome: GenAI POC credits ($25K) approved in under 2 weeks, Portfolio ($100K) shortly after — the first ~6 months of Bedrock inference were fully credit-funded. Grounded assistant in production in 5 weeks, all data resident in the EU, no traffic to any model vendor. CloudRoute's commission was paid by the partner from AWS engagement funding; the customer paid $0.

time-to-match: < 24h · credits secured: $125K · data residency: EU-only · cost to customer: $0

faq

Common questions

What is Amazon Bedrock in one sentence?
Amazon Bedrock is AWS's fully-managed service for accessing many foundation models — from Anthropic, Meta, Mistral, Amazon (Nova + Titan), Cohere, Stability AI, AI21, and DeepSeek — through a single API, with no servers to manage, where your prompts and outputs are not used to train the base models and stay in your AWS account and Region.
Which models are available on Amazon Bedrock?
As of 2026 the catalog spans Anthropic Claude (Haiku/Sonnet/Opus), Meta Llama, Mistral, Amazon Nova (Micro/Lite/Pro/Premier text, Canvas image, Reel video, Act agentic) and Amazon Titan (text + embeddings), Cohere (Command, Embed, Rerank), Stability AI (image generation), AI21 (Jamba), and DeepSeek. Exact versions vary by AWS Region and change continuously — confirm the live list under Model access in the Bedrock console.
How is Amazon Bedrock priced?
There is no platform fee. You pay per use across four levers: on-demand per 1,000 input/output tokens (rate set per model), batch (typically ~50% cheaper for asynchronous jobs), provisioned throughput (reserved hourly capacity for high steady volume and for serving custom models), and prompt caching (a steep discount on repeated context). Customization (fine-tuning/distillation) and dependent services like a Knowledge Base vector store cost extra. Token rates differ by model and Region — see the AWS Bedrock pricing page for current figures.
What is the difference between the Converse API and InvokeModel?
InvokeModel is the original lower-level API where the request and response JSON are specific to each provider, giving maximum control at the cost of provider-specific code. The Converse API is the modern, recommended interface: one consistent schema across all chat models, with built-in multi-turn conversation, system prompts, tool use, and streaming, so switching models is usually just changing the modelId. Use Converse for chat and agents; use InvokeModel for non-conversational modalities (image, embeddings) or provider-specific parameters Converse does not expose.
Is my data safe on Amazon Bedrock? Does AWS train on my prompts?
No. AWS does not train the foundation models on your data, and your prompts and outputs are not shared with the model providers. Content is encrypted in transit and at rest (with optional customer-managed KMS keys), is processed in the AWS Region you call (or, if you opt into a cross-region inference profile, within the geography that profile defines), can be kept off the public internet via VPC endpoints (PrivateLink), and is governed by IAM with CloudTrail audit logging. Bedrock is also included in AWS compliance programs such as SOC, ISO 27001, HIPAA eligibility, and PCI DSS depending on Region; check AWS Artifact for current scope.
Bedrock vs SageMaker — what is the difference and when do I use each?
Bedrock is a managed API for using existing foundation models with minimal operational overhead; SageMaker is the full ML platform for building, training, and deploying your own models (including non-foundation-model and classical ML) with full control over training and serving infrastructure. Use Bedrock for generative-AI features you want fast and managed; use SageMaker when you must own the ML lifecycle or run custom architectures. They are complementary and frequently used together in the same account.
Do I need GPUs or any infrastructure to use Bedrock?
No. Bedrock is serverless from your perspective — AWS operates the inference fleet behind the API. You make an API call and pay per token (or per image/second for media models). You only encounter capacity concepts if you choose Provisioned Throughput to reserve dedicated capacity for high steady volume or to serve a fine-tuned model.
How can I reduce the cost of running on Bedrock?
Four moves, in order of impact: (1) route cheap, high-volume calls to small models (Nova Micro/Lite, Claude Haiku) and escalate only hard steps to frontier models; (2) run any latency-tolerant work as batch for roughly 50% off; (3) enable prompt caching so repeated system prompts and documents are not re-billed at full input price every call; (4) reserve Provisioned Throughput only once volume is high and steady. On top of that, AWS credits can fund the bill outright — Activate Portfolio (up to $100K), Bedrock/GenAI POC ($10K–$50K), and the GenAI Accelerator (up to $1M); CloudRoute routes you to a partner who files them, and you pay $0.

Build on Bedrock — and let AWS credits pay for it.

CloudRoute routes you to a vetted AWS partner who files your Bedrock/GenAI credit application (Activate Portfolio up to $100K, GenAI POC $10K–$50K, GenAI Accelerator up to $1M) and, if you need hands, builds the workload with you. AWS funds the credits and the engagement. You pay $0.

matched within: < 24h · GenAI credit ceiling: up to $1M · cost to you: $0