
bedrock runtime · control plane vs data plane · 2026

Amazon Bedrock Runtime — `bedrock` vs `bedrock-runtime`, explained.

Q: What is the difference between `bedrock` and `bedrock-runtime`?

`bedrock` is the control-plane (management) API: list and request model access, and create/manage Knowledge Bases, Guardrails, Agents, custom-model and fine-tuning jobs, batch jobs, provisioned throughput, evaluation, and invocation logging. `bedrock-runtime` is the data-plane (inference) API: InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream — the operations that actually send a prompt to a model and return a completion. Rule of thumb: configuring the platform → bedrock; running inference → bedrock-runtime. Inference operations do not exist on the bedrock client.

Q: Which Bedrock SDK client do I use to call a model?

Use the bedrock-runtime client. In Python that is boto3.client("bedrock-runtime"); in the JS SDK v3 it is BedrockRuntimeClient from @aws-sdk/client-bedrock-runtime; in Java it is BedrockRuntimeClient. Call converse() / Converse for chat (one schema across all chat models, switch by changing modelId) or invoke_model() / InvokeModel for embeddings, image, and video models or provider-specific parameters. The plain bedrock client cannot run inference.

Q: How many Bedrock clients are there, and what are they?

Four, in two pairs. Foundation models: bedrock (control plane — manage models/jobs/throughput) and bedrock-runtime (data plane — InvokeModel/Converse). Agents & Knowledge Bases: bedrock-agent (control plane — create/manage agents, KBs, data sources, ingestion, Flows) and bedrock-agent-runtime (data plane — InvokeAgent, Retrieve, RetrieveAndGenerate). A name ending in -runtime is the request-time client; a name without it is the management client.

Q: Why am I getting "object has no attribute 'invoke_model'" / "'converse'"?

You built a bedrock (control-plane) client and tried to run inference on it. The invoke and converse operations live only on the bedrock-runtime client. Construct boto3.client("bedrock-runtime") (or your SDK's equivalent) and call invoke_model / converse there. The reverse error — list_foundation_models missing — means you tried a management call on the runtime client; use the bedrock client for that.

Q: I enabled the model but still get AccessDenied on invoke. Why?

Model access and IAM are two independent gates. Enabling a model under Model access (a control-plane step) does not grant your principal permission to invoke it. The calling role also needs the runtime IAM action — bedrock:InvokeModel (and bedrock:InvokeModelWithResponseStream if you stream, which is a separate action) — ideally scoped to the model ARN. You must satisfy both gates, in the correct Region, for the call to succeed.

Q: Which IAM permissions does a production app actually need?

Usually only data-plane (runtime) actions, scoped to the specific ARNs it uses: bedrock:InvokeModel on the model ARNs (this one action authorizes both InvokeModel and Converse calls), bedrock:InvokeModelWithResponseStream if you stream (it authorizes both InvokeModelWithResponseStream and ConverseStream), and bedrock:InvokeAgent / bedrock:Retrieve / bedrock:RetrieveAndGenerate on the agent/KB ARNs. It should not hold control actions (CreateKnowledgeBase, CreateGuardrail, CreateModelCustomizationJob, etc.); keep those on a separate admin or infrastructure-as-code deployment role off the request path.

Q: Where do InvokeAgent and Knowledge-Base queries live — runtime or not?

On bedrock-agent-runtime. Building or configuring an agent or Knowledge Base (CreateAgent, CreateKnowledgeBase, CreateDataSource, StartIngestionJob) uses the bedrock-agent control-plane client. Invoking them at request time — InvokeAgent to run an agent, Retrieve to fetch passages, RetrieveAndGenerate to get a grounded answer with citations — uses the bedrock-agent-runtime data-plane client. Mixing these two up is a common source of "operation not found" errors.

Q: How do I keep Bedrock runtime costs under control?

All token spend flows through the data plane, so the levers act there: route cheap, high-volume calls to small models and escalate only hard steps to frontier models; run latency-tolerant work as batch (~50% cheaper); enable prompt caching so repeated system prompts/documents are not re-billed each call; and reserve provisioned throughput only at high steady volume. On top of that, AWS credits can fund the bill outright — Activate Portfolio (up to $100K), Bedrock/GenAI POC ($10K–$50K), and the GenAI Accelerator (up to $1M); CloudRoute routes you to a partner who files them and helps wire the runtime, and you pay $0.

The single most common Bedrock mistake is calling the wrong API. There are two: the `bedrock` control-plane API (manage models, knowledge bases, guardrails, agents, and jobs) and the `bedrock-runtime` data-plane API (InvokeModel and Converse — where inference actually happens). This is the complete reference: what each plane does, which SDK client to use for what (`bedrock`, `bedrock-runtime`, `bedrock-agent`, `bedrock-agent-runtime`), the IAM split between control actions and invoke actions, streaming, worked code, and the gotchas that send "AccessDenied" and "client has no attribute" errors your way.


runtime clients

inference lives in: bedrock-runtime · control plane: bedrock · cost to build: $0 with credits

TL;DR

Amazon Bedrock splits into two API surfaces. The control plane — the `bedrock` service/client — is for management: list and request model access, create and manage Knowledge Bases, Guardrails, Agents, custom-model and fine-tuning jobs, provisioned-throughput, and evaluation. The data plane — the `bedrock-runtime` service/client — is for inference: InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream. If you are sending a prompt and getting a completion, you want bedrock-runtime; if you are configuring the platform, you want bedrock.
There are actually four clients, in two pairs. Foundation models: `bedrock` (manage) + `bedrock-runtime` (invoke). Agents & Knowledge Bases: `bedrock-agent` (build/manage agents, KBs, data sources) + `bedrock-agent-runtime` (invoke them at request time — InvokeAgent, Retrieve, RetrieveAndGenerate). The single most common beginner error is calling `invoke_model` / `converse` on the `bedrock` client (it does not exist there) — those operations live only on `bedrock-runtime`.
The split is also an IAM boundary. Control actions are `bedrock:` permissions like CreateKnowledgeBase, CreateGuardrail, CreateModelCustomizationJob, PutModelInvocationLoggingConfiguration. Runtime actions are the invoke permissions: InvokeModel (which also authorizes Converse), InvokeModelWithResponseStream (which also authorizes ConverseStream), InvokeAgent, Retrieve, and RetrieveAndGenerate, which you can scope to specific model and resource ARNs. A production app role usually needs only the runtime actions; setup/admin roles get the control actions. GenAI bills scale fast, so CloudRoute routes you to AWS credits and a vetted partner who does the wiring; you pay $0.

the core distinction

I. Two APIs, one service: control plane vs data plane

Almost every AWS service splits its API into a control plane (you manage and configure resources) and a data plane (you use them at high volume). Bedrock follows the same pattern, and confusing the two is the first wall most new builders hit. Get this one distinction right and most of the rest of Bedrock falls into place.

The control plane is the management API. It is where you set the platform up and change its configuration: requesting access to foundation models, creating and updating Knowledge Bases, Guardrails, Agents, custom (fine-tuned) models, evaluation jobs, batch-inference jobs, provisioned-throughput model units, and invocation-logging settings. These operations are relatively infrequent, are usually done by an administrator or by infrastructure-as-code, and they create or describe resources. In the AWS SDK this surface is the service simply named bedrock (for example, boto3.client("bedrock")).

The data plane is the runtime API. It is where the actual work happens at request time: you send a prompt to a model and get a completion back. These operations are high-frequency, latency-sensitive, and run on the hot path of your application — every chat message, every RAG answer, every agent step is a data-plane call. In the AWS SDK this surface is the service named bedrock-runtime (for example, boto3.client("bedrock-runtime")). The word "runtime" in the name is the whole point: this is the API you call at runtime to run inference.

Why does AWS separate them at all? Three reasons. Scale and latency: the data plane is engineered for very high request rates and low latency; the control plane is not, and mixing them would compromise both. Security: separating them lets you grant a production application permission to invoke models without giving it permission to reconfigure the platform — least privilege by design (section V). Blast radius: a misbehaving app hammering the data plane cannot accidentally delete a Knowledge Base, because deleting a Knowledge Base is a different API on a different endpoint with different permissions.

A useful mental model: the control plane is the thermostat and wiring — you set it up once and adjust it occasionally; the data plane is the electricity flowing through the wires — it runs constantly while the app is live. You install the wiring with bedrock; you draw current with bedrock-runtime.

the one-line rule

If you are sending a prompt and getting a completion, use bedrock-runtime (InvokeModel / Converse). If you are creating, listing, or configuring models, knowledge bases, guardrails, agents, or jobs, use bedrock. Inference is never on the bedrock client; management is never on the bedrock-runtime client.

four clients, two pairs

II. The four clients, and which one to reach for

Bedrock does not have two clients; it has four. They form two clean pairs: one pair for foundation models (manage / invoke) and one pair for the higher-level agent and knowledge-base capabilities (manage / invoke). Once you see the pairing, choosing the right client becomes mechanical.

The naming is consistent and worth memorizing because it tells you exactly what a client is for. A bare service name (bedrock, bedrock-agent) is a control-plane / build-time client — it manages resources. A name ending in -runtime (bedrock-runtime, bedrock-agent-runtime) is a data-plane / request-time client — it executes them. So the suffix -runtime is your reliable signal: present means "run it," absent means "configure it."
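The suffix rule is simple enough to express as code. The following is a minimal sketch; the helper names `plane_of` and `runtime_counterpart` are made up for illustration and are not part of any AWS SDK.

```python
def plane_of(service_name: str) -> str:
    """Classify a Bedrock SDK service name using the -runtime suffix rule."""
    return "data" if service_name.endswith("-runtime") else "control"


def runtime_counterpart(service_name: str) -> str:
    """Return the request-time client that pairs with a management client."""
    if plane_of(service_name) == "data":
        return service_name
    return f"{service_name}-runtime"


for name in ("bedrock", "bedrock-runtime", "bedrock-agent", "bedrock-agent-runtime"):
    print(f"{name:24s} {plane_of(name)} plane")
```

A check like this can be useful in a wrapper that constructs clients, so a request handler never receives a control-plane client by mistake.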

bedrock — foundation-model control plane

The management API for foundation models and the platform around them. Use it to list available foundation models (ListFoundationModels), manage model access, create and manage custom-model / fine-tuning jobs, batch (model-invocation) jobs, provisioned throughput, model-evaluation jobs, and invocation logging configuration. You do not run inference here.
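As an illustration of a typical control-plane call, the sketch below lists the text-producing models visible to an account. It assumes `bedrock_client` is a `boto3.client("bedrock")` instance passed in by the caller; the helper name `text_model_ids` is hypothetical.

```python
def text_model_ids(bedrock_client) -> list:
    """Return the sorted IDs of foundation models that produce text output.

    bedrock_client is expected to be boto3.client("bedrock"), the
    control-plane client. Nothing is invoked; this only lists metadata.
    """
    resp = bedrock_client.list_foundation_models(byOutputModality="TEXT")
    return sorted(m["modelId"] for m in resp.get("modelSummaries", []))
```

Called as `text_model_ids(boto3.client("bedrock", region_name="us-east-1"))`, it returns the IDs you would then pass as `modelId` to the runtime client.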

bedrock-runtime — foundation-model data plane

The inference API. This is where InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream live — the operations that send a prompt to a model and return tokens. The overwhelming majority of your Bedrock API calls in production go here. If you are building a chat feature, a summarizer, an extraction pipeline, or an embeddings job, this is your client.

bedrock-agent — agents & Knowledge Bases control plane

The management API for the higher-level capabilities. Use it to create and manage Agents (instructions, action groups, aliases, versions), create and manage Knowledge Bases and their data sources, kick off ingestion jobs (chunk + embed your documents), and manage Flows and Prompt Management resources. This is build-time configuration for agents and RAG — not the request-time path.
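Kicking off an ingestion job is a representative build-time call. The sketch below starts a job and polls until it finishes. It assumes `agent_client` is a `boto3.client("bedrock-agent")` instance; the function name, the polling interval, and the IDs you pass in are illustrative, and the status names should be confirmed against the current SDK reference.

```python
import time

# Statuses after which polling should stop (assumed from the SDK reference).
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def run_ingestion(agent_client, kb_id: str, data_source_id: str,
                  poll_seconds: float = 10.0, sleep=time.sleep) -> str:
    """Start an ingestion job and poll until it reaches a terminal status."""
    job = agent_client.start_ingestion_job(
        knowledgeBaseId=kb_id, dataSourceId=data_source_id
    )["ingestionJob"]
    while job["status"] not in TERMINAL_STATUSES:
        sleep(poll_seconds)
        job = agent_client.get_ingestion_job(
            knowledgeBaseId=kb_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
    return job["status"]
```

This belongs in a deployment script or pipeline step, not in the application's request path.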

bedrock-agent-runtime — agents & Knowledge Bases data plane

The request-time API for agents and RAG. Use it to invoke an agent (InvokeAgent), to retrieve passages from a Knowledge Base (Retrieve), and to retrieve-and-generate a grounded answer with citations in one call (RetrieveAndGenerate). When your application asks an agent to do something, or queries your RAG knowledge base for an answer, it calls this client — not bedrock-agent.
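An InvokeAgent response arrives as an event stream rather than a single string, which surprises many first-time callers. The sketch below collects the text chunks. It assumes `agent_runtime_client` is a `boto3.client("bedrock-agent-runtime")` instance; the function name `ask_agent` and all IDs are placeholders.

```python
def ask_agent(agent_runtime_client, agent_id: str, alias_id: str,
              session_id: str, text: str) -> str:
    """Invoke an agent and join the streamed chunks into one answer string."""
    resp = agent_runtime_client.invoke_agent(
        agentId=agent_id,
        agentAliasId=alias_id,
        sessionId=session_id,
        inputText=text,
    )
    parts = []
    for event in resp["completion"]:
        chunk = event.get("chunk")  # other event types (e.g. trace) are skipped
        if chunk and "bytes" in chunk:
            parts.append(chunk["bytes"])
    return b"".join(parts).decode("utf-8")
```

Reusing the same `session_id` across calls lets the agent keep conversational context between turns.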

the four bedrock clients · control plane vs data plane · representative operations as of 2026

SDK client / service	Plane	Scope	Representative operations	You call it…
bedrock	Control	Foundation models + platform	ListFoundationModels, CreateModelCustomizationJob, CreateModelInvocationJob (batch), CreateProvisionedModelThroughput, CreateEvaluationJob, PutModelInvocationLoggingConfiguration	At setup / admin time
bedrock-runtime	Data	Foundation-model inference	InvokeModel, InvokeModelWithResponseStream, Converse, ConverseStream, ApplyGuardrail	On every request (hot path)
bedrock-agent	Control	Agents, Knowledge Bases, Flows	CreateAgent, CreateAgentAlias, CreateKnowledgeBase, CreateDataSource, StartIngestionJob, CreateFlow	At build / config time
bedrock-agent-runtime	Data	Agent + RAG invocation	InvokeAgent, Retrieve, RetrieveAndGenerate, InvokeFlow	On every agent / RAG request

Operation names are representative of the 2026 API and follow AWS's consistent naming; the SDK method casing differs by language (boto3 snake_case invoke_model vs JS InvokeModelCommand). The reliable rule: a client whose name ends in -runtime executes; a client without that suffix configures. The #1 beginner error is calling invoke_model / converse on bedrock — those operations exist only on bedrock-runtime.

inside the data plane

III. The bedrock-runtime call surface: InvokeModel vs Converse, and streaming

Within bedrock-runtime there are really two ways to call a model and two ways to receive the response. Knowing which to pick removes most of the friction of your first integration.

The two ways to call a model are Converse (the modern, recommended path) and InvokeModel (the original, lower-level path). Both live on the same bedrock-runtime client; they differ in how much the API does for you.

Converse / ConverseStream — one schema for every chat model

The Converse operation gives you one consistent request and response shape across every chat-capable model on Bedrock. You pass a modelId, a list of messages (each with a role and content), an optional system prompt, an inferenceConfig (max tokens, temperature, top-p), and optional toolConfig for function/tool use. The response comes back in the same structure no matter which provider served it. The practical payoff: switching models is usually a one-line change to the modelId string — no rewriting of provider-specific request bodies. Use Converse for chat, multi-turn conversation, and agentic tool use; it should be your default. ConverseStream is the same operation with the response delivered incrementally as tokens are generated.
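The request shape described above can be sketched as a small builder plus a response reader. The helper names are hypothetical and the model ID is whatever you copy from the console; only the field names (`modelId`, `messages`, `system`, `inferenceConfig`) come from the Converse schema.

```python
def build_converse_request(model_id: str, user_text: str, system_prompt: str = None,
                           max_tokens: int = 512, temperature: float = 0.2) -> dict:
    """Build keyword arguments for bedrock-runtime converse()."""
    request = {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": temperature},
    }
    if system_prompt:
        request["system"] = [{"text": system_prompt}]
    return request


def first_text(response: dict) -> str:
    """Return the first text block of a Converse response, or an empty string."""
    for block in response["output"]["message"]["content"]:
        if "text" in block:
            return block["text"]
    return ""
```

With a runtime client `brt`, the call is `first_text(brt.converse(**build_converse_request(model_id, "Hello")))`; switching models means changing only the `model_id` argument.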

InvokeModel / InvokeModelWithResponseStream — the lower-level path

The InvokeModel operation sends a raw request body whose JSON shape is specific to each provider, and returns a provider-specific response body that you parse yourself. It gives maximum control and exposes provider-specific parameters that Converse may not surface — but it means provider-specific code, so changing models is more than a one-line edit. Reach for InvokeModel when you need a non-conversational modality (image generation, video, or text embeddings) or a provider-specific knob Converse does not expose. InvokeModelWithResponseStream is the streaming counterpart.
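To make the provider-specific nature concrete, here is a minimal sketch of a text-embedding call through InvokeModel. It assumes an Amazon Titan text-embedding model whose request body takes `inputText` and whose response body contains `embedding`; both the default model ID and that body shape are assumptions to confirm in the model's documentation, and `embed_text` is a hypothetical helper.

```python
import json


def embed_text(runtime_client, text: str,
               model_id: str = "amazon.titan-embed-text-v2:0") -> list:
    """Return an embedding vector for text via bedrock-runtime invoke_model().

    The JSON body is specific to the model family; another provider's
    embedding model would need a different body and response parser.
    """
    resp = runtime_client.invoke_model(
        modelId=model_id,
        body=json.dumps({"inputText": text}),
        contentType="application/json",
        accept="application/json",
    )
    payload = json.loads(resp["body"].read())  # body is a stream of JSON bytes
    return payload["embedding"]
```

Note how the request and response parsing are tied to one model family, which is exactly the cost of the lower-level path.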

Streaming — why it is a separate operation

Streaming is not a flag on the standard call; it is a distinct operation (InvokeModelWithResponseStream / ConverseStream) that returns an event stream you iterate over as chunks arrive, rather than a single response object. You use it whenever you want tokens to appear in a UI as they are generated (the typing-indicator effect) instead of waiting for the whole completion. Note that the IAM permission for streaming is also separate — bedrock:InvokeModelWithResponseStream is a different action from bedrock:InvokeModel — which is a frequent cause of "it works until I turn on streaming, then AccessDenied" (section V).
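Consuming the event stream can be sketched as a generator over the events returned by `converse_stream`. The helper names are hypothetical; the sketch assumes text arrives in `contentBlockDelta` events and ignores other event types such as message start, stop, and metadata.

```python
def stream_text(events):
    """Yield text fragments from a ConverseStream event iterator."""
    for event in events:
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        if "text" in delta:
            yield delta["text"]


def collect_text(events) -> str:
    """Join all streamed fragments into the full completion."""
    return "".join(stream_text(events))
```

In an application you would pass `brt.converse_stream(**request)["stream"]` to `stream_text` and write each fragment to the UI as it arrives.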

ApplyGuardrail — runtime safety without a model call

One more data-plane operation worth knowing: ApplyGuardrail runs a Guardrail (a safety/policy filter you created on the control plane) against arbitrary text on its own, without invoking a model. It lets you screen input before it reaches a model, or screen output from any source, and is how you apply Bedrock's content/PII filtering to text that did not necessarily come from a Bedrock model call.
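A standalone guardrail check might look like the sketch below. It assumes `runtime_client` is a `boto3.client("bedrock-runtime")` instance and that the guardrail ID and version come from a guardrail you already created; `screen_text` is a hypothetical helper name.

```python
def screen_text(runtime_client, guardrail_id: str, guardrail_version: str,
                text: str, source: str = "INPUT"):
    """Run a guardrail over text without invoking a model.

    Returns (intervened, text_to_use): the guardrail's replacement text when
    it intervened and supplied one, otherwise the original text.
    """
    resp = runtime_client.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        source=source,  # "INPUT" for user text, "OUTPUT" for generated text
        content=[{"text": {"text": text}}],
    )
    intervened = resp.get("action") == "GUARDRAIL_INTERVENED"
    outputs = [o["text"] for o in resp.get("outputs", []) if "text" in o]
    return intervened, (outputs[0] if intervened and outputs else text)
```

Calling it with `source="OUTPUT"` screens text produced elsewhere, for example by a model hosted outside Bedrock.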

which inference operation to pick

Default to Converse for any chat or tool-using app — one schema, swap models by changing modelId. Use InvokeModel for embeddings, image, or video models, or a provider-specific parameter Converse does not expose. Add the …Stream variant (and the matching IAM action) when you want token-by-token output in the UI.

worked examples

IV. Worked examples: the two clients side by side

Seeing the same task expressed against each client makes the split concrete. Below: a control-plane call (list models), a data-plane Converse call (run inference), and an agent-runtime call (query a Knowledge Base). All three are illustrative — copy exact model IDs, ARNs, and parameter shapes from the Bedrock console and the current SDK reference.

The pattern to notice across all three: the client you construct determines what you are allowed to do. You cannot call converse() on a bedrock client, and you cannot call list_foundation_models() on a bedrock-runtime client — the methods simply do not exist on the wrong client, which is exactly the "object has no attribute" error new users hit.

Control plane — list available models (bedrock)

A management call: enumerate the foundation models your account can see. This is a bedrock (control-plane) operation — there is no inference here.

Data plane — run inference with Converse (bedrock-runtime)

The hot-path call: send a prompt, get a completion. Note the client name — bedrock-runtime — and that modelId is the only thing you change to switch models.

Agent data plane — RetrieveAndGenerate over a Knowledge Base (bedrock-agent-runtime)

A RAG call: ask a question and get an answer grounded in your Knowledge Base, with citations, in one operation. This is a bedrock-agent-runtime call — the Knowledge Base itself was created earlier with the bedrock-agent control-plane client.

three clients, three jobs (python / boto3, illustrative)

import boto3

# 1) CONTROL PLANE: manage (no inference happens on this client)
bedrock = boto3.client("bedrock", region_name="us-east-1")
models = bedrock.list_foundation_models()
print([m["modelId"] for m in models["modelSummaries"]])

# 2) DATA PLANE: run inference
brt = boto3.client("bedrock-runtime", region_name="us-east-1")
resp = brt.converse(
    modelId="anthropic.claude-sonnet",  # illustrative placeholder; copy a full model ID from the console
    messages=[{"role": "user", "content": [{"text": "Summarize our refund policy."}]}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)
print(resp["output"]["message"]["content"][0]["text"])

# 3) AGENT DATA PLANE: grounded RAG answer
bar = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
ans = bar.retrieve_and_generate(
    input={"text": "What is our SLA for enterprise plans?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB123",  # placeholder Knowledge Base ID
            "modelArn": "...",           # elided here; supply the generating model's ARN
        },
    },
)
print(ans["output"]["text"])  # generated answer; source references are in ans["citations"]

Model IDs, ARNs, and exact parameter shapes are illustrative — confirm current values in the console and SDK reference.
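When you want the retrieved passages themselves, and prefer to do the generation step separately, the Retrieve operation on the same bedrock-agent-runtime client can be used. A minimal sketch, assuming a vector-search Knowledge Base; `top_passages` is a hypothetical helper and the Knowledge Base ID is a placeholder.

```python
def top_passages(agent_runtime_client, kb_id: str, question: str, k: int = 3) -> list:
    """Return (score, text) pairs for the best-matching Knowledge Base passages."""
    resp = agent_runtime_client.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": question},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": k}
        },
    )
    return [
        (result.get("score"), result["content"]["text"])
        for result in resp.get("retrievalResults", [])
    ]
```

The passages can then be placed into a Converse prompt yourself, which gives you control over prompt layout at the price of one extra call.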

the permission boundary

V. IAM implications: control actions vs invoke actions

The control/data split is not just an SDK convenience; it is a security boundary you should deliberately use. The two planes map to two distinct families of IAM actions, and a well-designed setup grants each role only the family it needs.

All Bedrock permissions sit under the bedrock: IAM namespace, but they fall into two functional groups. Control actions create, read, update, or delete resources; examples include bedrock:CreateKnowledgeBase, bedrock:CreateGuardrail, bedrock:CreateAgent, bedrock:CreateModelCustomizationJob, bedrock:CreateModelInvocationJob (batch), bedrock:PutModelInvocationLoggingConfiguration, and the various List* / Get* describe actions. Runtime (invoke) actions execute models and resources: bedrock:InvokeModel, bedrock:InvokeModelWithResponseStream, bedrock:InvokeAgent, bedrock:Retrieve, and bedrock:RetrieveAndGenerate. Converse and ConverseStream have no IAM actions of their own: a Converse call is authorized by bedrock:InvokeModel, and a ConverseStream call by bedrock:InvokeModelWithResponseStream.

The design rule that follows: a production application role should usually hold only the runtime actions, scoped down to the specific model and resource ARNs it uses. Your service that answers chat messages needs bedrock:InvokeModel (the action that authorizes Converse calls) on a couple of model ARNs. It does not need, and should not have, the ability to create a Knowledge Base or delete a Guardrail. The control actions belong to a separate setup/admin role (or your infrastructure-as-code deployment role), which provisions the resources once and is not attached to the request path. This is textbook least privilege, and the API split makes it easy to express.

Three IAM subtleties trip people up repeatedly. First, streaming is a separate action: granting bedrock:InvokeModel does not grant bedrock:InvokeModelWithResponseStream, so an app that works on standard calls can throw AccessDenied the moment you enable streaming. Second, model access is a prerequisite to invocation, not a substitute for IAM: even after you enable a model under Model access (a control-plane step), the calling principal still needs the runtime IAM permission to invoke it — two independent gates. Third, you can and should scope invoke permissions to model ARNs (and agent/knowledge-base ARNs) in the policy Resource field, so a role can call exactly the models it is meant to and nothing else.
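The least-privilege pattern can be sketched as a policy document generated in code. This is a minimal sketch, assuming foundation-model ARNs of the form `arn:aws:bedrock:REGION::foundation-model/MODEL_ID` (no account ID); the helper names and the model ID `example.model-id` are hypothetical. It grants only the invoke action, and adds the streaming action only on request.

```python
import json


def model_arn(region: str, model_id: str) -> str:
    """Build a foundation-model ARN; the account field is intentionally empty."""
    return f"arn:aws:bedrock:{region}::foundation-model/{model_id}"


def app_invoke_policy(region: str, model_ids: list, allow_streaming: bool = False) -> dict:
    """Return an IAM policy allowing invocation of only the listed models."""
    actions = ["bedrock:InvokeModel"]
    if allow_streaming:
        # Streaming is a separate action and must be granted explicitly.
        actions.append("bedrock:InvokeModelWithResponseStream")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "InvokeSelectedModels",
            "Effect": "Allow",
            "Action": actions,
            "Resource": [model_arn(region, m) for m in model_ids],
        }],
    }


print(json.dumps(app_invoke_policy("us-east-1", ["example.model-id"], True), indent=2))
```

Attaching a policy like this to the application role, and nothing from the control-plane families, keeps the request path unable to reconfigure the platform.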

Finally, the control/data planes share the same auditing and networking story. Every call on either plane is recorded in AWS CloudTrail; both planes are reachable privately over VPC endpoints (PrivateLink) so traffic need not touch the public internet; and you can capture full request/response payloads from the data plane with model-invocation logging (a control-plane setting that governs the data plane). The permission split is the lever; CloudTrail and invocation logging are how you verify it is working.

bedrock IAM actions by plane · who should hold them

Action family	Plane	Example actions	Typical holder	Scope to ARNs?
Manage models / platform	Control	bedrock:ListFoundationModels, bedrock:CreateModelCustomizationJob, bedrock:CreateProvisionedModelThroughput, bedrock:PutModelInvocationLoggingConfiguration, bedrock:CreateGuardrail	Admin / IaC deploy role	Where supported
Manage agents / KBs	Control	bedrock:CreateAgent, bedrock:CreateKnowledgeBase, bedrock:CreateDataSource, bedrock:StartIngestionJob	Admin / IaC deploy role	Where supported
Invoke models	Data	bedrock:InvokeModel (also authorizes Converse), bedrock:InvokeModelWithResponseStream (also authorizes ConverseStream)	Production app role	Yes, to model ARNs
Invoke agents / RAG	Data	bedrock:InvokeAgent, bedrock:Retrieve, bedrock:RetrieveAndGenerate	Production app role	Yes, to agent / KB ARNs

Least-privilege pattern: app roles get only the data-plane invoke actions, scoped to the exact model / agent / KB ARNs they use; control actions live on a separate admin or deployment role off the request path. Remember the two independent gates — Model access (control-plane enablement) AND the runtime IAM permission — must both be satisfied to invoke a model. Action names are accurate as of 2026; confirm the current list in the AWS IAM / Bedrock documentation.

where it goes wrong

VI. The errors this confusion causes, and how to read them

Because the control/data split is invisible until you hit it, it produces a recognizable family of errors. Each one points back to the same root cause: an operation, permission, or resource on the wrong plane.

A simple debugging heuristic captures most of these: when something fails, ask "which plane is this operation on, and does this principal have the matching permission on the matching ARN in the matching Region?" Nearly every Bedrock-runtime error resolves to a mismatch in one of those four: plane, permission, ARN, or Region.

"'Bedrock' object has no attribute 'invoke_model'" (or 'converse'): You built a `bedrock` client and tried to run inference on it. Inference operations live only on `bedrock-runtime`. Construct boto3.client("bedrock-runtime") (or the equivalent in your SDK) and call invoke_model / converse there.
"'BedrockRuntime' object has no attribute 'list_foundation_models'": The mirror image: you tried a management call on the runtime client. ListFoundationModels and all CRUD/job operations live on the `bedrock` control-plane client.
AccessDenied on invoke even though the model is "enabled": Model access (a control-plane toggle) and the runtime invoke permission (IAM) are two independent gates. Enabling the model under Model access does not grant bedrock:InvokeModel to your principal; add the runtime IAM permission, scoped to the model ARN.
Works on normal calls, AccessDenied the moment streaming is on: Streaming is a separate IAM action. Grant bedrock:InvokeModelWithResponseStream in addition to bedrock:InvokeModel; it covers both InvokeModelWithResponseStream and ConverseStream.
InvokeAgent / Retrieve "not found" or "no such operation": Agent and Knowledge-Base invocation lives on `bedrock-agent-runtime`, not `bedrock-runtime` and not `bedrock-agent`. The build-time operations (CreateAgent, CreateKnowledgeBase) are on `bedrock-agent`; the request-time ones (InvokeAgent, Retrieve, RetrieveAndGenerate) are on `bedrock-agent-runtime`.
ValidationException / ResourceNotFound by Region: Both planes are regional and model availability differs by Region. A model enabled in us-east-1 is not automatically enabled in eu-central-1, and an agent/KB ARN is Region-specific. Make sure the client Region, the Model-access grant, and the resource ARN all line up, or use a cross-region inference profile where appropriate.
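The wrong-client errors above can be turned into a clearer message with a small guard. This is a sketch only: `OPERATION_HOME` is a hand-written, partial lookup table and `require_operation` is a hypothetical helper, not an SDK feature.

```python
# Partial map from boto3 method name to the service client that provides it.
OPERATION_HOME = {
    "list_foundation_models": "bedrock",
    "invoke_model": "bedrock-runtime",
    "invoke_model_with_response_stream": "bedrock-runtime",
    "converse": "bedrock-runtime",
    "converse_stream": "bedrock-runtime",
    "apply_guardrail": "bedrock-runtime",
    "create_knowledge_base": "bedrock-agent",
    "start_ingestion_job": "bedrock-agent",
    "invoke_agent": "bedrock-agent-runtime",
    "retrieve": "bedrock-agent-runtime",
    "retrieve_and_generate": "bedrock-agent-runtime",
}


def require_operation(client, operation: str):
    """Return client.<operation>, or raise an error naming the right client."""
    method = getattr(client, operation, None)
    if callable(method):
        return method
    home = OPERATION_HOME.get(operation)
    hint = f' Use boto3.client("{home}") instead.' if home else ""
    raise AttributeError(
        f"{type(client).__name__} has no operation {operation!r}.{hint}"
    )
```

For example, `require_operation(client, "converse")(**request)` fails fast with a pointer to bedrock-runtime if `client` turns out to be a control-plane client.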

across languages

VII. The same split across every SDK

The control/data separation is not a boto3 quirk — it is how the Bedrock API is defined, so every AWS SDK exposes the same four services. Only the surface syntax changes.

In Python (boto3) you select the service by string: boto3.client("bedrock"), "bedrock-runtime", "bedrock-agent", "bedrock-agent-runtime", and methods are snake_case (invoke_model, converse, retrieve_and_generate). In the AWS SDK for JavaScript v3 each service is a separate package and client class — @aws-sdk/client-bedrock (BedrockClient), @aws-sdk/client-bedrock-runtime (BedrockRuntimeClient), @aws-sdk/client-bedrock-agent, and @aws-sdk/client-bedrock-agent-runtime — and you send command objects (InvokeModelCommand, ConverseCommand, RetrieveAndGenerateCommand). Java exposes BedrockClient vs BedrockRuntimeClient (and the agent equivalents) as distinct client classes; the AWS CLI mirrors it as aws bedrock … vs aws bedrock-runtime … vs aws bedrock-agent … vs aws bedrock-agent-runtime ….

The takeaway is portable: whatever language you are in, you will construct a separate client per service, and the same four-way split — manage foundation models, invoke foundation models, manage agents/KBs, invoke agents/KBs — applies identically. If you internalize the planes once, the knowledge transfers to every SDK, the CLI, and Terraform/CloudFormation (where control-plane resources are what you declare).

package / client cheat-sheet (JS SDK v3)

Manage models: @aws-sdk/client-bedrock → BedrockClient
Run inference: @aws-sdk/client-bedrock-runtime → BedrockRuntimeClient (InvokeModelCommand, ConverseCommand, …)
Manage agents / KBs: @aws-sdk/client-bedrock-agent → BedrockAgentClient
Invoke agents / RAG: @aws-sdk/client-bedrock-agent-runtime → BedrockAgentRuntimeClient (InvokeAgentCommand, RetrieveAndGenerateCommand, …)

the bigger picture + cost

VIII. Where the runtime fits, and who pays for the inference

The bedrock-runtime API is the engine of every Bedrock application; everything else is configuration around it. Understanding that makes the cost picture — and how to fund it — straightforward.

Because all inference flows through the data plane, that is where essentially all of your token spend is generated. Control-plane operations (creating a Knowledge Base, listing models) are free or trivial; the bill comes from InvokeModel / Converse / InvokeAgent calls multiplied by your traffic. The cost levers therefore all act on the data plane: route cheap, high-volume calls to small models and escalate only hard steps to frontier models; run latency-tolerant work as batch inference (~50% cheaper); turn on prompt caching so a repeated system prompt or document is not re-billed every request; and reserve provisioned throughput only once volume is high and steady. The full model is in Amazon Bedrock pricing; the platform overview is Amazon Bedrock and the plain-English primer is what is Amazon Bedrock.
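The arithmetic behind "calls multiplied by your traffic" can be sketched as follows. The per-1K-token prices and traffic figures in the example are hypothetical placeholders, not AWS prices; substitute the numbers from the current pricing page. The 50% discount mirrors the approximate batch saving mentioned above.

```python
def monthly_cost(requests: int, in_tokens: int, out_tokens: int,
                 price_in_per_1k: float, price_out_per_1k: float,
                 discount: float = 0.0) -> float:
    """Estimate monthly token spend for one workload.

    discount is a fraction between 0 and 1, e.g. 0.5 for a workload
    moved to batch inference at roughly half price.
    """
    per_request = (in_tokens / 1000) * price_in_per_1k \
        + (out_tokens / 1000) * price_out_per_1k
    return requests * per_request * (1 - discount)


# Hypothetical workload: 100,000 requests, 1,000 input and 500 output tokens each.
on_demand = monthly_cost(100_000, 1000, 500, 0.003, 0.015)
as_batch = monthly_cost(100_000, 1000, 500, 0.003, 0.015, discount=0.5)
print(f"on-demand: ${on_demand:,.2f}  batch: ${as_batch:,.2f}")
```

Running the same function with a smaller model's prices shows how much routing cheap calls away from a frontier model changes the total.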

The architecture is also worth saying out loud because it explains the IAM and cost story at once: agents and RAG sit on top of the foundation-model runtime. An InvokeAgent call internally drives multiple model invocations (the orchestration loop), and a RetrieveAndGenerate call does a retrieval plus a generation — so the higher-level data-plane operations are token-heavy and inherit the same cost levers. See Amazon Bedrock Agents, Knowledge Bases, and RAG on AWS for those layers.

Which brings us to the part that is genuinely hard to do well alone: standing up the runtime correctly — the right clients, least-privilege IAM, model access, logging, VPC endpoints, and the cost controls — and then funding the inference bill. AWS runs credit programs built precisely for GenAI builds: Activate Portfolio (up to $100K) for institutionally-funded startups, Bedrock / GenAI proof-of-concept funding ($10K–$50K) for a defined build, and the competitive Generative AI Accelerator (up to $1M). These pools are largely partner-filed and invisible on the public Activate page. This is exactly what CloudRoute does: we route you to a vetted AWS partner who handles the setup with you and files the credit application — and because AWS funds both the credits and the partner engagement, you pay $0. See AWS credits for generative-AI startups, AWS PoC / Bedrock POC funding, and $100K AWS credits.

the decision in one table

`bedrock` vs `bedrock-runtime` vs `bedrock-agent` vs `bedrock-agent-runtime`

The whole page distilled: four clients, what each is for, and the single signal that tells them apart. Keep this open next to your first integration and you will pick the right client every time. The reliable tell is the name — a `-runtime` suffix means "execute," its absence means "configure."

Question you are answering	Use this client	Plane	Signature operations	IAM action family
"Manage models, jobs, throughput, logging"	bedrock	Control	ListFoundationModels, CreateModelCustomizationJob, CreateProvisionedModelThroughput	bedrock:Create* / List* / Put*
"Send a prompt, get a completion"	bedrock-runtime	Data	InvokeModel, Converse, ConverseStream	bedrock:InvokeModel / InvokeModelWithResponseStream
"Create / configure agents, KBs, data sources"	bedrock-agent	Control	CreateAgent, CreateKnowledgeBase, StartIngestionJob	bedrock:CreateAgent / CreateKnowledgeBase
"Ask an agent to act, or query my RAG KB"	bedrock-agent-runtime	Data	InvokeAgent, Retrieve, RetrieveAndGenerate	bedrock:InvokeAgent / Retrieve

Mnemonic: the suffix is the signal — names ending in -runtime are the hot path (run it); names without it are management (set it up). Inference and agent/RAG invocation are never on the bare bedrock / bedrock-agent clients. Production app roles need only the two data-plane rows, scoped to ARNs.

wiring up Bedrock?

Get AWS credits to fund your Bedrock inference — and a vetted partner to wire the clients, IAM, and logging. You pay $0.

Get matched in 24h →

a recent match

A Bedrock runtime integration, funded by AWS credits — anonymized

inquiry · seed-stage vertical-SaaS team, US

Seed-stage vertical B2B SaaS, 9 people, adding an AI assistant + RAG to an existing product; small platform team; net-new to Bedrock

Situation: The team had built a quick proof of concept but kept hitting the control/data-plane wall in production: their service role could enable models in the console but threw AccessDenied on invoke, streaming broke the moment they turned it on, and their RAG calls failed because they were pointing the wrong client at the Knowledge Base. They also had no cost controls and no idea how big the inference bill would get once the assistant shipped to all customers — and no budget to absorb it during the ramp.

What CloudRoute did: Routed within 22 hours to a US AWS partner with a GenAI + platform-engineering track record. The partner separated the clients cleanly (bedrock-runtime for inference on the app role, bedrock-agent-runtime for RAG, bedrock control-plane on a separate IaC deploy role), wrote least-privilege IAM scoped to the exact model and Knowledge-Base ARNs (including the separate streaming action), enabled model-invocation logging and VPC endpoints, and added the cost levers — small-model routing for cheap calls, prompt caching for the repeated system prompt, and batch for the offline enrichment job. In parallel they filed a Bedrock/GenAI proof-of-concept credit application and an Activate Portfolio application.

Outcome: GenAI POC credits ($25K) approved in under 2 weeks; Portfolio ($100K) shortly after — the assistant's first months of inference were fully credit-funded. The AccessDenied and wrong-client errors disappeared once the plane/permission/ARN/Region alignment was fixed, streaming worked end to end, and the team shipped to all customers with a bill they could forecast. CloudRoute's commission was paid by the partner from AWS engagement funding; the customer paid $0.

time-to-match: < 24h · credits secured: $125K · clients separated + IAM scoped · cost to customer: $0

faq

Common questions

What is the difference between `bedrock` and `bedrock-runtime`?

`bedrock` is the control-plane (management) API: list and request model access, and create/manage Knowledge Bases, Guardrails, Agents, custom-model and fine-tuning jobs, batch jobs, provisioned throughput, evaluation, and invocation logging. `bedrock-runtime` is the data-plane (inference) API: InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream — the operations that actually send a prompt to a model and return a completion. Rule of thumb: configuring the platform → bedrock; running inference → bedrock-runtime. Inference operations do not exist on the bedrock client.

Which Bedrock SDK client do I use to call a model?

Use the bedrock-runtime client. In Python that is boto3.client("bedrock-runtime"); in the JS SDK v3 it is BedrockRuntimeClient from @aws-sdk/client-bedrock-runtime; in Java it is BedrockRuntimeClient. Call converse() / Converse for chat (one schema across all chat models, switch by changing modelId) or invoke_model() / InvokeModel for embeddings, image, and video models or provider-specific parameters. The plain bedrock client cannot run inference.

How many Bedrock clients are there, and what are they?

Four, in two pairs. Foundation models: bedrock (control plane — manage models/jobs/throughput) and bedrock-runtime (data plane — InvokeModel/Converse). Agents & Knowledge Bases: bedrock-agent (control plane — create/manage agents, KBs, data sources, ingestion, Flows) and bedrock-agent-runtime (data plane — InvokeAgent, Retrieve, RetrieveAndGenerate). A name ending in -runtime is the request-time client; a name without it is the management client.

Why am I getting "object has no attribute 'invoke_model'" / "'converse'"?

You built a bedrock (control-plane) client and tried to run inference on it. The invoke and converse operations live only on the bedrock-runtime client. Construct boto3.client("bedrock-runtime") (or your SDK's equivalent) and call invoke_model / converse there. The reverse error — list_foundation_models missing — means you tried a management call on the runtime client; use the bedrock client for that.
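One way to surface this mistake at startup instead of on the first request is a small guard that checks the client for the methods you intend to call. This is a sketch under stated assumptions: `require_operations` is a hypothetical helper, not part of boto3, and it relies only on the fact that boto3 clients expose their operations as methods.

```python
def require_operations(client, *method_names):
    """Fail fast with a readable message if a client lacks methods we plan to call."""
    missing = [name for name in method_names if not hasattr(client, name)]
    if missing:
        raise TypeError(
            f"{type(client).__name__} has no {', '.join(missing)}; "
            "inference methods live on the 'bedrock-runtime' client and "
            "management methods on the 'bedrock' client"
        )
    return client


# Assumed usage (needs boto3, credentials, and a Region):
#   import boto3
#   runtime = require_operations(
#       boto3.client("bedrock-runtime"), "converse", "invoke_model"
#   )
```

Passing a `bedrock` client to the same call would raise immediately with a message that names the right client, which is easier to read than an `AttributeError` deep in a request handler.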

I enabled the model but still get AccessDenied on invoke. Why?

Model access and IAM are two independent gates. Enabling a model under Model access (a control-plane step) does not grant your principal permission to invoke it. The calling role also needs the runtime IAM action — bedrock:InvokeModel (and bedrock:InvokeModelWithResponseStream if you stream, which is a separate action) — ideally scoped to the model ARN. You must satisfy both gates, in the correct Region, for the call to succeed.

Which IAM permissions does a production app actually need?

Usually only data-plane (runtime) actions, scoped to the specific ARNs it uses: bedrock:InvokeModel on the model ARNs (this action also authorizes Converse; there is no separate bedrock:Converse action), plus bedrock:InvokeModelWithResponseStream if you stream (it also authorizes ConverseStream), and bedrock:InvokeAgent / bedrock:Retrieve / bedrock:RetrieveAndGenerate on the agent/KB ARNs. It should not hold control actions (CreateKnowledgeBase, CreateGuardrail, CreateModelCustomizationJob, etc.); keep those on a separate admin or infrastructure-as-code deployment role, off the request path.
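A least-privilege runtime policy of this kind can be generated rather than hand-written. The sketch below is illustrative only: `runtime_policy` is a hypothetical helper, the account ID, model ID, and Knowledge Base ID are placeholders, and it covers foundation-model and Knowledge Base ARNs only (inference profiles and agent aliases use other ARN shapes that are not shown). It lists only the Invoke actions because Converse and ConverseStream are authorized by them.

```python
import json


def runtime_policy(region, account_id, model_ids, knowledge_base_ids=(), stream=False):
    """Build an identity policy with data-plane actions only, scoped to ARNs."""
    model_actions = ["bedrock:InvokeModel"]  # also authorizes Converse
    if stream:
        # Streaming is a separate action; it also authorizes ConverseStream.
        model_actions.append("bedrock:InvokeModelWithResponseStream")
    statements = [{
        "Sid": "InvokeModels",
        "Effect": "Allow",
        "Action": model_actions,
        # Foundation-model ARNs leave the account field empty.
        "Resource": [
            f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
            for model_id in model_ids
        ],
    }]
    if knowledge_base_ids:
        statements.append({
            "Sid": "QueryKnowledgeBases",
            "Effect": "Allow",
            "Action": ["bedrock:Retrieve"],
            "Resource": [
                f"arn:aws:bedrock:{region}:{account_id}:knowledge-base/{kb_id}"
                for kb_id in knowledge_base_ids
            ],
        })
    return {"Version": "2012-10-17", "Statement": statements}


# Placeholder identifiers, for illustration only.
policy = runtime_policy(
    "us-east-1", "111122223333", ["example.model-id-v1"], ["KBEXAMPLE01"], stream=True
)
print(json.dumps(policy, indent=2))
```

Note that the policy contains no Create*, Put*, or Delete* actions; those would belong on the separate deployment role.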

Where do InvokeAgent and Knowledge-Base queries live — runtime or not?

On bedrock-agent-runtime. Building or configuring an agent or Knowledge Base (CreateAgent, CreateKnowledgeBase, CreateDataSource, StartIngestionJob) uses the bedrock-agent control-plane client. Invoking them at request time — InvokeAgent to run an agent, Retrieve to fetch passages, RetrieveAndGenerate to get a grounded answer with citations — uses the bedrock-agent-runtime data-plane client. Mixing these two up is a common source of "operation not found" errors.
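For the request-time half, a thin wrapper around Retrieve makes the client requirement explicit. This is a minimal sketch: `top_passages` is a hypothetical helper, the client is passed in so the function is easy to test, and the Knowledge Base ID in the usage comment is a placeholder.

```python
def top_passages(agent_runtime, knowledge_base_id, question, k=3):
    """Call Retrieve on a bedrock-agent-runtime client; return (score, text) pairs."""
    response = agent_runtime.retrieve(
        knowledgeBaseId=knowledge_base_id,
        retrievalQuery={"text": question},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": k}
        },
    )
    return [
        (result.get("score"), result["content"]["text"])
        for result in response["retrievalResults"]
    ]


# Assumed usage (needs boto3, credentials, a Region, and an existing Knowledge Base):
#   import boto3
#   passages = top_passages(
#       boto3.client("bedrock-agent-runtime"), "KBEXAMPLE01", "What is our refund policy?"
#   )
```

Passing a `bedrock-agent` client here would fail, because that client has no `retrieve` method.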

How do I keep Bedrock runtime costs under control?

All token spend flows through the data plane, so the levers act there: route cheap, high-volume calls to small models and escalate only hard steps to frontier models; run latency-tolerant work as batch (~50% cheaper); enable prompt caching so repeated system prompts/documents are not re-billed each call; and reserve provisioned throughput only at high steady volume. On top of that, AWS credits can fund the bill outright — Activate Portfolio (up to $100K), Bedrock/GenAI POC ($10K–$50K), and the GenAI Accelerator (up to $1M); CloudRoute routes you to a partner who files them and helps wire the runtime, and you pay $0.
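To see how much the routing and batch levers move the bill, a back-of-the-envelope estimate is often enough. The sketch below is an assumption-laden illustration: `monthly_cost` is a hypothetical helper, every price is a made-up placeholder rather than a real Bedrock rate, the 0.5 batch discount simply mirrors the "~50% cheaper" figure above, and prompt caching is not modeled.

```python
def monthly_cost(calls, in_tokens, out_tokens,
                 price_in_per_1k, price_out_per_1k, batch_discount=0.0):
    """Estimate spend: calls x per-call token cost, minus an optional batch discount."""
    per_call = (in_tokens / 1000) * price_in_per_1k \
        + (out_tokens / 1000) * price_out_per_1k
    return calls * per_call * (1 - batch_discount)


# Placeholder prices in USD per 1,000 tokens (NOT real rates).
SMALL = {"price_in_per_1k": 0.00025, "price_out_per_1k": 0.00125}
LARGE = {"price_in_per_1k": 0.003, "price_out_per_1k": 0.015}

calls, in_tokens, out_tokens = 1_000_000, 1000, 200

# Everything on the large model versus 90% routed to the small one.
all_large = monthly_cost(calls, in_tokens, out_tokens, **LARGE)
routed = (
    monthly_cost(int(calls * 0.9), in_tokens, out_tokens, **SMALL)
    + monthly_cost(int(calls * 0.1), in_tokens, out_tokens, **LARGE)
)
# The same large-model workload run as batch at the assumed discount.
all_large_batch = monthly_cost(calls, in_tokens, out_tokens, batch_discount=0.5, **LARGE)

print(f"all large: ${all_large:,.2f}")
print(f"routed:    ${routed:,.2f}")
print(f"batch:     ${all_large_batch:,.2f}")
```

Swapping in the current published prices for the models you actually use turns this into a rough forecast for the ramp.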

Wire up Bedrock right — and let AWS credits pay for the inference.

CloudRoute routes you to a vetted AWS partner who sets up the runtime correctly (clients, least-privilege IAM, model access, logging, cost controls) and files your Bedrock/GenAI credit application (Activate Portfolio up to $100K, GenAI POC $10K–$50K, GenAI Accelerator up to $1M). AWS funds the credits and the engagement. You pay $0.

Get matched in 24h →

→ see the data & AI persona detail

matched within: < 24h · GenAI credit ceiling: up to $1M · cost to you: $0

Related guides