Amazon Bedrock is AWS's fully-managed service for building with generative AI — one API that gives you access to dozens of foundation models from Anthropic, Meta, Mistral, Amazon, Cohere and others, without managing a single GPU and without your data ever training someone else's model. This page explains what that actually means, what you can build, how it compares to ChatGPT, what it costs, and how to start — no prior ML background assumed.
If you only read one paragraph: Amazon Bedrock is a fully-managed AWS service that gives you a single, secure API to access and build on many leading generative-AI foundation models — text, chat, image, and embeddings — without provisioning or managing any of the underlying GPU infrastructure yourself.
Unpacking that sentence in plain terms: a foundation model (FM) is a large AI model trained on enormous amounts of data that can be adapted to many tasks — writing, summarizing, answering questions, classifying, generating images. ChatGPT is powered by foundation models. Claude is a foundation model family. Llama is a foundation model family. Historically, to use one of these in your own product you either called a single vendor's cloud API (locking you to that vendor) or you rented GPUs and ran the model yourself (expensive and operationally hard).
Bedrock is AWS's answer to both problems. It is a marketplace-plus-runtime: AWS has negotiated to host foundation models from many providers, exposes them all through one consistent interface, and runs the serving infrastructure on your behalf. You send text in, you get text (or an image, or a vector) back. You pay for what you use. You never see a server.
The word that does the most work in the definition is "managed." In AWS vocabulary, "managed" means AWS operates the undifferentiated heavy lifting — patching, scaling, capacity, availability — so you operate only the part that is specific to your application. With Bedrock, the heavy lifting is "keep dozens of multi-hundred-billion-parameter models loaded, healthy, and fast across regions." You don't want to do that. Bedrock does it.
It is worth saying what Bedrock is not, because the name gets misused. Bedrock is not a single model — it is the doorway to many. It is not a chatbot product you log into (that's Amazon Q, a separate AWS service). It is not a full machine-learning platform for training models from scratch (that's Amazon SageMaker). Bedrock occupies the middle: more flexible than a single-vendor API, far simpler than running your own ML platform.
Before Bedrock, a team that wanted to ship a generative-AI feature faced four separate, real problems. Bedrock collapses all four into a single managed service. Understanding the four problems is the fastest way to understand why Bedrock is shaped the way it is.
Each of these problems used to require its own decision, its own vendor, and its own ongoing operational cost. Bundling them is the entire value proposition — so it is worth taking each in turn.
The foundation-model landscape moves fast. The "best" model for your task in January may be beaten by a cheaper or smarter one by April. If you hard-wire your application to one provider's API, switching means a code rewrite. Bedrock normalizes the interface: you call the same Converse API shape regardless of whether the model underneath is Claude, Llama, Mistral, or Nova. Swapping models is often a one-line change to a model ID. That optionality is insurance against a market that re-prices and re-ranks itself constantly.
Running a large model yourself means renting GPU instances, loading multi-gigabyte weights, handling cold starts, autoscaling for spiky traffic, and paying for idle capacity between requests. For a team that just wants to call a model a few thousand times a day, that is enormous overhead. Bedrock is serverless from your point of view: there is no instance to start, no cluster to size, no idle cost on the on-demand path. You pay per token of input and output, and AWS handles the fleet.
For most companies the blocker to adopting generative AI is not capability — it is "where does my data go?" Bedrock's answer is explicit and is the single most-cited reason enterprises choose it: your prompts and the model's responses are not used to train the underlying foundation models, they are not shared with the model providers, and they stay within your AWS account and your chosen region. Inference runs inside the AWS security perimeter. You can keep traffic on private networking (VPC endpoints / PrivateLink), encrypt with your own keys (KMS), and inherit the compliance attestations AWS already holds (SOC, ISO, HIPAA-eligibility, and more). This is the difference between "we'd love to use AI" and "legal approved it."
A raw model API is just the engine. A real feature needs retrieval over your own documents, guardrails so it stays on-topic, orchestration so it can take actions, and a way to evaluate quality. Bedrock ships these as first-class managed capabilities — Knowledge Bases (managed retrieval-augmented generation), Guardrails (safety + topic filters), Agents and Flows (orchestration), Prompt Management, and model evaluation — so you assemble an application instead of building plumbing. These are covered in depth in the sibling pages; the point here is that Bedrock is a platform, not just an endpoint.
Bedrock exists so a team can ship generative AI with model choice, zero GPU operations, private data, and built-in application building blocks — four hard problems solved by one managed service billed per token.
You do not need to understand transformers or attention to use Bedrock. The mental model is simpler than people expect: choose a model, send a request, get a response — with optional building blocks layered on when you need them.
Step 1 — choose a model. In the AWS console you browse the model catalog and request access to the families you want (a one-click enablement per provider). Each model has an ID like anthropic.claude-sonnet or meta.llama or amazon.nova-pro. You can enable several and compare them on your own task.
Step 2 — send a request. Your application calls Bedrock's API — most commonly the Converse API, a single unified shape for chat-style interactions that works across providers, or InvokeModel for lower-level access. You pass your prompt (and optionally a system instruction, conversation history, images, or tool definitions). You can stream the response token-by-token for a typing effect, or get it all at once.
Step 3 — get a response and pay for the tokens. The model returns its output. Bedrock meters the input tokens (roughly, the length of what you sent — about 0.75 words per token in English) and the output tokens (the length of what the model generated), and bills you per 1,000 tokens at that model's published rate. Nothing is provisioned; you are charged only for the request you just made.
On top of that core loop, Bedrock offers optional managed capabilities you opt into when needed: connect a Knowledge Base so the model can answer using your private documents (this is RAG — retrieval-augmented generation — and Bedrock manages the vector store, chunking, and retrieval for you); attach a Guardrail to block disallowed topics or filter sensitive data; define an Agent so the model can call your APIs to take real actions; or chain steps together with Flows. Each is additive — you can ship value with just Step 1–3 and add the rest later.
Behind the curtain, AWS keeps these models loaded on high-performance accelerators — NVIDIA GPUs and, for some capacity, AWS's own custom silicon (Trainium for training, Inferentia for inference) — spread across regions for availability. Cross-region inference can automatically route your request to a region with spare capacity to smooth out throughput limits. None of this is your responsibility; it is the "managed" part doing its job.
The abstract pitch ("access foundation models") becomes concrete fast once you see the handful of application shapes teams ship. Almost every real Bedrock project is a variation on one of these six patterns.
These are not hypotheticals — they are the patterns that show up again and again across the teams CloudRoute routes to AWS partners. Each maps to specific Bedrock capabilities, noted so you know which sibling page to read next.
Most teams start with a chatbot or a RAG assistant because it delivers visible value in days, then graduate to agents as they get comfortable. Summarization and embeddings often run quietly in the background of larger products.
Bedrock is broad, but it is not the right tool for every situation. Being honest about fit is part of being a useful reference — here is who it suits, and the cases where another path is better.
Bedrock fits you well if: you are a startup or company that wants to add generative-AI features to a product; you already run on AWS or are happy to; you care about data privacy and compliance; you want freedom to switch between models (especially to use Anthropic's Claude, Meta's Llama, or Amazon's Nova through one interface); and you would rather pay per use than operate GPU infrastructure. This describes the large majority of teams shipping AI today.
Bedrock is overkill or off-target if: you just want a personal chatbot to talk to — then a consumer product like ChatGPT or Claude.ai is simpler. If you need to train a brand-new foundation model from scratch, or you need total control of the serving stack and custom kernels, then Amazon SageMaker (the full ML platform) is the right AWS service, not Bedrock. And if you want a ready-made AI assistant over your company's data with no building at all, Amazon Q Business may get you there faster.
A useful way to place it: consumer AI apps (ChatGPT, Claude.ai) are for end-users; Bedrock is for developers building AI into their own products with private data and model choice; SageMaker is for ML teams who need to build, train, and operate models at the lowest level. Many organizations use more than one — Bedrock for product features, SageMaker for bespoke models, Q for internal productivity.
Role-wise, the people who get the most from Bedrock are application developers and product engineers (it lets them ship AI without becoming ML experts), data and ML engineers (it removes infrastructure toil), and founders/CTOs (it makes AI a per-token operating expense rather than a capital project). You do not need a research team to use it well.
This is the comparison everyone asks for, and the confusion is understandable because the products overlap in people's minds. The simplest framing: ChatGPT is a finished app you talk to; Bedrock is a platform you build on. They are not really competitors — they sit at different layers.
ChatGPT (and Claude.ai, and Gemini) is a consumer/end-user product: you open a website or app, type, and get answers. It is one company's models behind one interface. Perfect for individuals and for ad-hoc work.
The OpenAI API is the closer comparison to Bedrock: it lets developers call OpenAI's models programmatically. But it gives you OpenAI's models only, billed through OpenAI, governed by OpenAI's data terms. Bedrock, by contrast, gives you many providers' models through AWS — including Anthropic's Claude, Meta's Llama, Mistral, Cohere, and Amazon's own Nova and Titan — billed through your existing AWS account, governed by AWS's enterprise data terms, and running inside your AWS security perimeter.
The practical differences that matter to a team choosing between "OpenAI API" and "Bedrock" are: model choice (one vendor vs many, with easy switching), where your data lives and whether it can train the base model (Bedrock keeps it in your AWS account/region and does not train base models on it), billing and procurement (a new vendor relationship vs a line item on the AWS bill you may already have negotiated), and integration (Bedrock inherits your AWS IAM, VPC, KMS, and compliance posture). Note that you can in fact reach OpenAI-family and many other models through Bedrock as the ecosystem expands — the point is Bedrock is the aggregator, not a single lab.
A fair summary: if you are an individual who wants to chat, use ChatGPT or Claude.ai. If you are a developer who is all-in on one lab's models and tooling, the direct API is fine. If you are a company that wants model optionality, enterprise data controls, and one bill inside the cloud you already run on, that is the case Bedrock is built for. The deeper, number-by-number breakdown lives on the dedicated amazon-bedrock-vs-openai page.
| ChatGPT (app) | OpenAI API | Amazon Bedrock | |
|---|---|---|---|
| What it is | Consumer chat app | Developer API, one lab | Managed multi-model API + platform |
| Who it's for | Individuals / ad-hoc | Devs committed to OpenAI | Companies building AI features |
| Model choice | OpenAI only | OpenAI only | Anthropic, Meta, Mistral, Amazon, Cohere, more |
| Your data trains the base model? | Depends on settings | No (API tier) | No — stays in your AWS account/region |
| Billing | Per-seat subscription | OpenAI invoice | On your AWS bill, per token |
| Enterprise controls | Limited | Some | Full AWS IAM / VPC / KMS / compliance |
| Best when | You want to chat | You're all-in on OpenAI | You want choice + AWS-grade governance |
The honest short answer: starting is effectively free, prototyping is cheap, and production is "pay for exactly what you use" — which is wonderful until traffic scales, at which point cost discipline (and credits) matters. Here is the plain-English version; the full per-model price table lives on the amazon-bedrock-pricing page.
Bedrock's main pricing model is on-demand, per token. You are billed separately for input tokens (what you send) and output tokens (what the model generates), at a per-1,000-token rate that depends on the model. Cheaper, faster models (Amazon Nova Micro/Lite, Claude Haiku, smaller Llama and Mistral models) cost a fraction of a cent per 1,000 tokens; the most capable models (Claude Opus-class, large frontier models) cost meaningfully more. The skill is matching model to task — you do not need the most expensive model for summarizing a support ticket.
A token is roughly ¾ of a word in English, so a 1,000-token prompt is about 750 words and a 1,000-token answer is about 750 words. That mental conversion is enough to sanity-check a bill: a chatbot turn might be a few hundred to a couple thousand tokens total, costing a fraction of a cent on a mid-tier model.
There are four ways to pay, and choosing the right one is the biggest lever on cost: On-Demand (no commitment, pay per request — the default for prototypes and variable traffic); Batch (submit a large job and get results back asynchronously for roughly half the on-demand price — ideal for bulk summarization or classification that isn't time-sensitive); Provisioned Throughput (reserve dedicated capacity at an hourly rate for predictable, high-volume, low-latency workloads); and prompt caching (when many requests share a long common prefix — a big system prompt or document — Bedrock can cache it so you don't re-pay full price for the repeated context, cutting cost substantially on the right workloads).
Beyond inference, two other line items can appear: customization (fine-tuning a model on your data has a training cost, and a custom model then incurs a storage/hosting cost to keep available), and embeddings (priced per token, very cheap, but high-volume RAG ingestion adds up). For most teams, none of this is large at the start — a working proof-of-concept commonly lands in the single-digit-to-low-tens-of-dollars range.
Where it gets expensive is scale: a popular consumer feature doing millions of generations a month, or fine-tuning and serving custom models, can move from dollars to thousands of dollars quickly. This is precisely the gap AWS credits are designed to cover — and the reason this page exists alongside CloudRoute's offer. (All figures here are representative as of 2026; always confirm current rates on the AWS Bedrock pricing page, and see the amazon-bedrock-pricing sibling for the full per-model table.)
AWS funds generative-AI builds through credit programs: Activate (up to $100K), a dedicated Bedrock / GenAI POC pool ($10K–$50K), and the GenAI Accelerator (up to $1M for selected startups). CloudRoute routes you to the right pool and a vetted partner to build it — the customer pays $0; AWS funds the engagement and the partner pays CloudRoute.
Getting from "I've heard of Bedrock" to "I'm calling a model" is genuinely a same-afternoon exercise. Here is the shortest honest path, plus the smarter path if you intend to ship something real and want AWS to fund it.
The fastest hands-on path (about an hour):
Prototyping on Bedrock is cheap, but a production GenAI workload — real traffic, fine-tuning, or a serious RAG corpus — is where the bill grows. Before you spend your own money on that, it is worth knowing AWS will frequently fund the build with credits. The Activate program (up to $100K), the Bedrock/GenAI POC pool ($10K–$50K), and the GenAI Accelerator (up to $1M for selected AI-first startups) exist specifically to put generative-AI workloads on AWS.
These credit pools are largely partner-filed — they are requested through the AWS Partner Network, not a public self-serve form — which is why most teams route through a partner. CloudRoute matches you to the right credit pool for your stage and to a vetted AWS DevOps/ML partner who can both file the credit application and help build the Bedrock workload. The customer pays $0; AWS funds the credit pool and the partner pays CloudRoute a routing commission. If you are about to build on Bedrock, start there — it is the difference between a self-funded experiment and an AWS-funded build.
AWS has several generative-AI and ML services and the names blur together. This is the plain-English map: what each one is, who it is for, and when to reach for it instead of Bedrock.
| AWS service | What it is, in plain terms | Reach for it when… | Pricing shape |
|---|---|---|---|
| Amazon Bedrock | One managed API to many foundation models + building blocks (RAG, agents, guardrails) | You're a developer adding GenAI to a product with private data and model choice | Per token (on-demand / batch / provisioned) |
| Amazon Q Developer | An AI coding assistant in your IDE / CLI / console | You want AI help writing and operating code | Per seat + usage |
| Amazon Q Business | A ready-made AI assistant over your company's data — little to no building | You want internal "chat with our docs" without engineering it | Per seat |
| Amazon SageMaker | The full ML platform — build, train, tune, and deploy models at the lowest level | You need to train custom models or control the serving stack | Per compute hour + storage |
| Amazon Nova | Amazon's own fast, low-cost model family (text/image/video) — accessed through Bedrock | You want cheap, low-latency models inside Bedrock | Per token (via Bedrock) |
| Trainium / Inferentia | AWS's custom AI chips, cheaper than GPUs, used via the Neuron SDK on EC2 | You're training/serving at scale and want to cut hardware cost | Per instance hour |
Situation: The team had heard of Bedrock but had no ML experience and were nervous about (a) where customer documents would go and (b) the cost of "AI" before they had revenue to justify it. They wanted a "chat with your contracts" assistant inside their product, grounded in each customer's own document set, with hard guarantees that data stayed private and was not used to train any model.
What CloudRoute did: CloudRoute matched them within 20 hours to an EU-Central AWS partner with a RAG + data-residency track record. The partner confirmed Bedrock's data terms satisfied the privacy requirement (prompts/outputs stay in-account, in-region, not used to train base models), filed a Bedrock POC credit application ($25K) plus an Activate Portfolio application, and architected a Knowledge-Bases RAG pipeline on Claude with a Guardrail to keep answers scoped to the uploaded contracts.
Outcome: A working in-product assistant in 5 weeks, grounded in per-tenant documents with citations. Total Bedrock + supporting AWS spend during the build was fully covered by the $25K POC credit — the team spent $0 of their own runway. They graduated to a larger Activate award as usage grew. CloudRoute's commission was paid by the partner from AWS engagement funding.
time-to-prototype: 5 weeks · founder time: ~7 hours · credits secured: $25K POC + Activate · cost to customer: $0
You bring the idea. CloudRoute routes you to AWS credits (Activate up to $100K, Bedrock POC $10K–$50K, GenAI Accelerator up to $1M) and a vetted AWS partner who builds the Bedrock workload. Customer pays $0 — AWS funds it.