The single most common Bedrock mistake is calling the wrong API. There are two planes: the `bedrock` control-plane API (manage models, knowledge bases, guardrails, agents, and jobs) and the `bedrock-runtime` data-plane API (InvokeModel and Converse, where inference actually happens). This is the complete reference: what each plane does, which SDK client to use for what (`bedrock`, `bedrock-runtime`, `bedrock-agent`, `bedrock-agent-runtime`), the IAM split between control actions and invoke actions, streaming, worked code, and the gotchas behind "AccessDenied" and "object has no attribute" errors.
Almost every AWS service splits its API into a control plane (you manage and configure resources) and a data plane (you use them at high volume). Bedrock follows the same pattern, and confusing the two is the first wall most new builders hit. Get this one distinction right and most of the rest of Bedrock falls into place.
The control plane is the management API. It is where you set the platform up and change its configuration: requesting access to foundation models, creating and updating Knowledge Bases, Guardrails, Agents, custom (fine-tuned) models, evaluation jobs, batch-inference jobs, provisioned-throughput model units, and invocation-logging settings. These operations are relatively infrequent, are usually done by an administrator or by infrastructure-as-code, and they create or describe resources. In the AWS SDK this surface is the service simply named bedrock (for example, boto3.client("bedrock")).
The data plane is the runtime API. It is where the actual work happens at request time: you send a prompt to a model and get a completion back. These operations are high-frequency, latency-sensitive, and run on the hot path of your application — every chat message, every RAG answer, every agent step is a data-plane call. In the AWS SDK this surface is the service named bedrock-runtime (for example, boto3.client("bedrock-runtime")). The word "runtime" in the name is the whole point: this is the API you call at runtime to run inference.
Why does AWS separate them at all? Three reasons. Scale and latency: the data plane is engineered for very high request rates and low latency; the control plane is not, and mixing them would compromise both. Security: separating them lets you grant a production application permission to invoke models without giving it permission to reconfigure the platform — least privilege by design (section V). Blast radius: a misbehaving app hammering the data plane cannot accidentally delete a Knowledge Base, because deleting a Knowledge Base is a different API on a different endpoint with different permissions.
A useful mental model: the control plane is the thermostat and wiring — you set it up once and adjust it occasionally; the data plane is the electricity flowing through the wires — it runs constantly while the app is live. You install the wiring with bedrock; you draw current with bedrock-runtime.
If you are sending a prompt and getting a completion, use bedrock-runtime (InvokeModel / Converse). If you are creating, listing, or configuring models, knowledge bases, guardrails, agents, or jobs, use bedrock. Inference is never on the bedrock client; management is never on the bedrock-runtime client.
Bedrock does not have two clients; it has four. They form two clean pairs: one pair for foundation models (manage / invoke) and one pair for the higher-level agent and knowledge-base capabilities (manage / invoke). Once you see the pairing, choosing the right client becomes mechanical.
The naming is consistent and worth memorizing because it tells you exactly what a client is for. A bare service name (bedrock, bedrock-agent) is a control-plane / build-time client — it manages resources. A name ending in -runtime (bedrock-runtime, bedrock-agent-runtime) is a data-plane / request-time client — it executes them. So the suffix -runtime is your reliable signal: present means "run it," absent means "configure it."
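Because the convention is strict, it can be checked mechanically. Below is a minimal sketch that assumes only the four service names above. `plane_of` is a hypothetical helper, not part of boto3 or any AWS SDK:

```python
BEDROCK_SERVICES = ("bedrock", "bedrock-runtime", "bedrock-agent", "bedrock-agent-runtime")


def plane_of(service_name: str) -> str:
    """Return "data" for a -runtime service and "control" otherwise.

    Hypothetical helper: it only encodes the naming convention, nothing more.
    """
    if service_name not in BEDROCK_SERVICES:
        raise ValueError(f"not a Bedrock service name: {service_name!r}")
    return "data" if service_name.endswith("-runtime") else "control"


for name in BEDROCK_SERVICES:
    print(f"{name:<22} -> {plane_of(name)}")
```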
The management API for foundation models and the platform around them. Use it to list available foundation models (ListFoundationModels), manage model access, create and manage custom-model / fine-tuning jobs, batch (model-invocation) jobs, provisioned throughput, model-evaluation jobs, and invocation logging configuration. You do not run inference here.
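Here is a minimal sketch of a typical bedrock (control-plane) call. It assumes a client created with `boto3.client("bedrock")`. `streaming_text_models` is a hypothetical helper. The `byOutputModality` and `byProvider` filters and the `modelSummaries` / `responseStreamingSupported` response fields come from the ListFoundationModels API, but confirm them against the current SDK reference:

```python
def streaming_text_models(bedrock_client, provider=None):
    """Return IDs of text-output models that support response streaming.

    bedrock_client is expected to be boto3.client("bedrock"), the control plane.
    No inference happens here; this only describes what the account can see.
    """
    kwargs = {"byOutputModality": "TEXT"}
    if provider is not None:
        kwargs["byProvider"] = provider  # provider name as shown in the console
    resp = bedrock_client.list_foundation_models(**kwargs)
    return [
        summary["modelId"]
        for summary in resp["modelSummaries"]
        if summary.get("responseStreamingSupported", False)
    ]
```

Calling it with `boto3.client("bedrock", region_name="us-east-1")` shows which models can back a streaming UI before you write any data-plane code.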
The inference API. This is where InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream live — the operations that send a prompt to a model and return tokens. The overwhelming majority of your Bedrock API calls in production go here. If you are building a chat feature, a summarizer, an extraction pipeline, or an embeddings job, this is your client.
The management API for the higher-level capabilities. Use it to create and manage Agents (instructions, action groups, aliases, versions), create and manage Knowledge Bases and their data sources, kick off ingestion jobs (chunk + embed your documents), and manage Flows and Prompt Management resources. This is build-time configuration for agents and RAG — not the request-time path.
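Ingestion is usually the first build-time step teams script. The sketch below assumes a `boto3.client("bedrock-agent")` client and an existing Knowledge Base and data source. `sync_data_source` is hypothetical. StartIngestionJob and GetIngestionJob, with their `ingestionJob` / `status` fields, come from the bedrock-agent API, but verify the status values in the SDK reference. The `sleep` parameter exists only so the loop can be tested without waiting:

```python
import time

# Statuses after which polling stops (verify against the current API reference).
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def sync_data_source(agent_client, kb_id, data_source_id, poll_seconds=10.0, sleep=time.sleep):
    """Start an ingestion job (chunk + embed) and poll until it finishes.

    agent_client is expected to be boto3.client("bedrock-agent"), a control-plane client.
    Returns the final job status string.
    """
    job = agent_client.start_ingestion_job(
        knowledgeBaseId=kb_id, dataSourceId=data_source_id
    )["ingestionJob"]
    while job["status"] not in TERMINAL_STATUSES:
        sleep(poll_seconds)
        job = agent_client.get_ingestion_job(
            knowledgeBaseId=kb_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
    return job["status"]
```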
The request-time API for agents and RAG. Use it to invoke an agent (InvokeAgent), to retrieve passages from a Knowledge Base (Retrieve), and to retrieve-and-generate a grounded answer with citations in one call (RetrieveAndGenerate). When your application asks an agent to do something, or queries your RAG knowledge base for an answer, it calls this client — not bedrock-agent.
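At request time, the same Knowledge Base is queried through the runtime client. The sketch below shows a plain Retrieve call (passages only, no generation) and assumes `boto3.client("bedrock-agent-runtime")`. `top_passages` and its default of five results are illustrative choices:

```python
def top_passages(agent_runtime_client, kb_id, question, k=5):
    """Return (passage_text, score) pairs for a question via Retrieve.

    agent_runtime_client is expected to be boto3.client("bedrock-agent-runtime").
    """
    resp = agent_runtime_client.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": question},
        retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": k}},
    )
    return [(r["content"]["text"], r.get("score")) for r in resp["retrievalResults"]]
```

Retrieve fits when you want to feed passages into your own Converse prompt. RetrieveAndGenerate does the retrieval and the generation in one operation.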
| SDK client / service | Plane | Scope | Representative operations | You call it… |
|---|---|---|---|---|
| bedrock | Control | Foundation models + platform | ListFoundationModels, CreateModelCustomizationJob, CreateModelInvocationJob (batch), CreateProvisionedModelThroughput, CreateEvaluationJob, PutModelInvocationLoggingConfiguration | At setup / admin time |
| bedrock-runtime | Data | Foundation-model inference | InvokeModel, InvokeModelWithResponseStream, Converse, ConverseStream, ApplyGuardrail | On every request (hot path) |
| bedrock-agent | Control | Agents, Knowledge Bases, Flows | CreateAgent, CreateAgentAlias, CreateKnowledgeBase, CreateDataSource, StartIngestionJob, CreateFlow | At build / config time |
| bedrock-agent-runtime | Data | Agent + RAG invocation | InvokeAgent, Retrieve, RetrieveAndGenerate, InvokeFlow | On every agent / RAG request |
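The table also works as a lookup for which client owns an operation. The sketch below relies on boto3's convention of snake_case method names. `OPERATION_CLIENT` transcribes the rows above and is not exhaustive, and `boto3_call` is a hypothetical helper. The naive CamelCase split works for these names; it is not a general reimplementation of botocore's name conversion:

```python
import re

# Operation -> service/client, transcribed from the table above (not exhaustive).
OPERATION_CLIENT = {
    "ListFoundationModels": "bedrock",
    "CreateModelCustomizationJob": "bedrock",
    "CreateModelInvocationJob": "bedrock",
    "CreateProvisionedModelThroughput": "bedrock",
    "CreateEvaluationJob": "bedrock",
    "PutModelInvocationLoggingConfiguration": "bedrock",
    "InvokeModel": "bedrock-runtime",
    "InvokeModelWithResponseStream": "bedrock-runtime",
    "Converse": "bedrock-runtime",
    "ConverseStream": "bedrock-runtime",
    "ApplyGuardrail": "bedrock-runtime",
    "CreateAgent": "bedrock-agent",
    "CreateAgentAlias": "bedrock-agent",
    "CreateKnowledgeBase": "bedrock-agent",
    "CreateDataSource": "bedrock-agent",
    "StartIngestionJob": "bedrock-agent",
    "CreateFlow": "bedrock-agent",
    "InvokeAgent": "bedrock-agent-runtime",
    "Retrieve": "bedrock-agent-runtime",
    "RetrieveAndGenerate": "bedrock-agent-runtime",
    "InvokeFlow": "bedrock-agent-runtime",
}


def boto3_call(operation: str) -> str:
    """Return 'boto3.client("<service>").<method>' for an operation in the table."""
    service = OPERATION_CLIENT[operation]
    # Insert "_" before every interior capital, then lowercase: RetrieveAndGenerate -> retrieve_and_generate
    method = re.sub(r"(?<!^)(?=[A-Z])", "_", operation).lower()
    return f'boto3.client("{service}").{method}'


print(boto3_call("RetrieveAndGenerate"))
```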
Within bedrock-runtime there are really two ways to call a model and two ways to receive the response. Knowing which to pick removes most of the friction of your first integration.
The two ways to call a model are Converse (the modern, recommended path) and InvokeModel (the original, lower-level path). Both live on the same bedrock-runtime client; they differ in how much the API does for you.
The Converse operation gives you one consistent request and response shape across the chat models that support it. You pass a modelId, a list of messages (each with a role and content), an optional system prompt, an inferenceConfig (max tokens, temperature, top-p), and an optional toolConfig for function/tool use. The response comes back in the same structure no matter which provider served it. The practical payoff: switching models is usually a one-line change to the modelId string, with no rewriting of provider-specific request bodies (support for features such as tool use and system prompts still varies by model, so check the per-model feature list). Use Converse for chat, multi-turn conversation, and agentic tool use; it should be your default. ConverseStream is the same operation with the response delivered incrementally as tokens are generated.
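A multi-turn chat with Converse mostly comes down to maintaining the message list. The minimal sketch below assumes `boto3.client("bedrock-runtime")` and a model that accepts system prompts. `ask` is a hypothetical helper, and the model ID passed in is whatever you copy from the console:

```python
def ask(runtime_client, model_id, history, user_text, system_prompt=None,
        max_tokens=512, temperature=0.2):
    """Append a user turn, call Converse, append the reply, and return its text.

    runtime_client is expected to be boto3.client("bedrock-runtime").
    history is a list of Converse messages that grows by two entries per call.
    """
    history.append({"role": "user", "content": [{"text": user_text}]})
    kwargs = {
        "modelId": model_id,  # the only argument that changes when switching models
        "messages": history,
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": temperature},
    }
    if system_prompt:
        kwargs["system"] = [{"text": system_prompt}]
    resp = runtime_client.converse(**kwargs)
    reply = resp["output"]["message"]
    history.append(reply)  # keep the assistant turn so the next call has context
    return "".join(block.get("text", "") for block in reply["content"])
```

To switch providers, pass a different `model_id`. The request construction and response parsing stay the same.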
The InvokeModel operation sends a raw request body whose JSON shape is specific to each provider, and returns a provider-specific response body that you parse yourself. It gives maximum control and exposes provider-specific parameters that Converse may not surface — but it means provider-specific code, so changing models is more than a one-line edit. Reach for InvokeModel when you need a non-conversational modality (image generation, video, or text embeddings) or a provider-specific knob Converse does not expose. InvokeModelWithResponseStream is the streaming counterpart.
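Embeddings are the classic InvokeModel case because the request body is provider-specific. The sketch below assumes the Amazon Titan Text Embeddings V2 request shape (`inputText` in, `embedding` out) and the example model ID `amazon.titan-embed-text-v2:0`. Confirm both in the console, and expect a different body for any other provider:

```python
import json


def embed(runtime_client, text, model_id="amazon.titan-embed-text-v2:0"):
    """Return an embedding vector via InvokeModel.

    runtime_client is expected to be boto3.client("bedrock-runtime").
    The JSON body below follows the Titan text-embeddings shape (an assumption
    to verify); other providers expect different request and response bodies.
    """
    resp = runtime_client.invoke_model(
        modelId=model_id,
        body=json.dumps({"inputText": text}),
        contentType="application/json",
        accept="application/json",
    )
    payload = json.loads(resp["body"].read())  # body is a streaming object; read it once
    return payload["embedding"]
```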
Streaming is not a flag on the standard call; it is a distinct operation (InvokeModelWithResponseStream / ConverseStream) that returns an event stream you iterate over as chunks arrive, rather than a single response object. You use it whenever you want tokens to appear in a UI as they are generated (the typing-indicator effect) instead of waiting for the whole completion. Note that the IAM permission for streaming is also separate — bedrock:InvokeModelWithResponseStream is a different action from bedrock:InvokeModel — which is a frequent cause of "it works until I turn on streaming, then AccessDenied" (section V).
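Consuming ConverseStream means iterating over events instead of reading one response object. The sketch below assumes `boto3.client("bedrock-runtime")`. The `contentBlockDelta` and `messageStop` event names follow the ConverseStream response shape, and `stream_text` / `on_token` are hypothetical names. The calling role also needs bedrock:InvokeModelWithResponseStream:

```python
def stream_text(runtime_client, model_id, user_text,
                on_token=lambda t: print(t, end="", flush=True)):
    """Call ConverseStream, hand each text delta to on_token, return (text, stopReason).

    runtime_client is expected to be boto3.client("bedrock-runtime").
    """
    resp = runtime_client.converse_stream(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": user_text}]}],
    )
    parts, stop_reason = [], None
    for event in resp["stream"]:  # events arrive while the model is still generating
        if "contentBlockDelta" in event:
            text = event["contentBlockDelta"]["delta"].get("text")
            if text:  # tool-use deltas carry no "text" and are skipped here
                parts.append(text)
                on_token(text)  # e.g. push to the UI for the typing effect
        elif "messageStop" in event:
            stop_reason = event["messageStop"]["stopReason"]
    return "".join(parts), stop_reason
```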
One more data-plane operation worth knowing: ApplyGuardrail runs a Guardrail (a safety/policy filter you created on the control plane) against arbitrary text on its own, without invoking a model. It lets you screen input before it reaches a model, or screen output from any source, and is how you apply Bedrock's content/PII filtering to text that did not necessarily come from a Bedrock model call.
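The sketch below screens user input with ApplyGuardrail before any model call. It assumes `boto3.client("bedrock-runtime")` and a guardrail ID and version created earlier on the control plane. `screen_text` is a hypothetical helper, and the `action` / `outputs` response fields should be checked against the current API reference:

```python
def screen_text(runtime_client, guardrail_id, guardrail_version, text, source="INPUT"):
    """Run a Guardrail over text without invoking a model.

    Returns (intervened, text_to_use). When the guardrail intervenes, its output
    text (for example a blocked-message response or masked PII) replaces the input.
    """
    resp = runtime_client.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        source=source,  # "INPUT" for user prompts, "OUTPUT" for text from a model or elsewhere
        content=[{"text": {"text": text}}],
    )
    if resp["action"] == "GUARDRAIL_INTERVENED":
        replaced = "".join(o.get("text", "") for o in resp.get("outputs", []))
        return True, replaced
    return False, text
```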
Default to Converse for any chat or tool-using app — one schema, swap models by changing modelId. Use InvokeModel for embeddings, image, or video models, or a provider-specific parameter Converse does not expose. Add the …Stream variant (and the matching IAM action) when you want token-by-token output in the UI.
Seeing the same task expressed against each client makes the split concrete. Below: a control-plane call (list models), a data-plane Converse call (run inference), and an agent-runtime call (query a Knowledge Base). All three are illustrative — copy exact model IDs, ARNs, and parameter shapes from the Bedrock console and the current SDK reference.
The pattern to notice across all three: the client you construct determines which operations exist. You cannot call converse() on a bedrock client, and you cannot call list_foundation_models() on a bedrock-runtime client. The methods simply do not exist on the wrong client, which is exactly the "object has no attribute" AttributeError new users hit.
A management call: enumerate the foundation models your account can see. This is a bedrock (control-plane) operation — there is no inference here.
The hot-path call: send a prompt, get a completion. Note the client name — bedrock-runtime — and that modelId is the only thing you change to switch models.
A RAG call: ask a question and get an answer grounded in your Knowledge Base, with citations, in one operation. This is a bedrock-agent-runtime call — the Knowledge Base itself was created earlier with the bedrock-agent control-plane client.
```python
import boto3

# 1) CONTROL PLANE: manage (no inference here)
bedrock = boto3.client("bedrock", region_name="us-east-1")
models = bedrock.list_foundation_models()
print([m["modelId"] for m in models["modelSummaries"]])

# 2) DATA PLANE: run inference
brt = boto3.client("bedrock-runtime", region_name="us-east-1")
resp = brt.converse(
    modelId="anthropic.claude-sonnet",  # illustrative ID; swap to switch models
    messages=[{"role": "user", "content": [{"text": "Summarize our refund policy."}]}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)
print(resp["output"]["message"]["content"][0]["text"])

# 3) AGENT DATA PLANE: grounded RAG answer
bar = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
ans = bar.retrieve_and_generate(
    input={"text": "What is our SLA for enterprise plans?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB123",  # illustrative Knowledge Base ID
            "modelArn": "...",  # copy the model ARN from the console
        },
    },
)
print(ans["output"]["text"])          # the grounded answer
print(len(ans.get("citations", [])))  # citations back to the source passages
```
Model IDs, ARNs, and exact parameter shapes are illustrative — confirm current values in the console and SDK reference.
The control/data split is not just an SDK convenience; it is a security boundary you should deliberately use. The two planes map to two distinct families of IAM actions, and a well-designed setup grants each role only the family it needs.
All Bedrock permissions sit under the bedrock: IAM namespace, but they fall into two functional groups. Control actions create, read, update, or delete resources; examples include bedrock:CreateKnowledgeBase, bedrock:CreateGuardrail, bedrock:CreateAgent, bedrock:CreateModelCustomizationJob, bedrock:CreateModelInvocationJob (batch), bedrock:PutModelInvocationLoggingConfiguration, and the various List* / Get* describe actions. Runtime (invoke) actions execute models and resources: bedrock:InvokeModel, bedrock:InvokeModelWithResponseStream, bedrock:InvokeAgent, bedrock:Retrieve, and bedrock:RetrieveAndGenerate. The Converse operations have no IAM actions of their own: Converse is authorized by bedrock:InvokeModel, and ConverseStream by bedrock:InvokeModelWithResponseStream.
The design rule that follows: a production application role should usually hold only the runtime actions, scoped down to the specific model and resource ARNs it uses. Your service that answers chat messages needs bedrock:InvokeModel (which covers its Converse calls) on a couple of model ARNs. It does not need, and should not have, the ability to create a Knowledge Base or delete a Guardrail. The control actions belong to a separate setup/admin role (or your infrastructure-as-code deployment role), which provisions the resources once and is not attached to the request path. This is textbook least privilege, and the API split makes it easy to express.
Three IAM subtleties trip people up repeatedly. First, streaming is a separate action: granting bedrock:InvokeModel does not grant bedrock:InvokeModelWithResponseStream, so an app that works on standard calls can throw AccessDenied the moment you enable streaming. Second, model access is a prerequisite to invocation, not a substitute for IAM: even after you enable a model under Model access (a control-plane step), the calling principal still needs the runtime IAM permission to invoke it — two independent gates. Third, you can and should scope invoke permissions to model ARNs (and agent/knowledge-base ARNs) in the policy Resource field, so a role can call exactly the models it is meant to and nothing else.
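The scoping advice translates into a short identity policy. The sketch below generates one for the production app role and assumes on-demand foundation models invoked directly by model ID. `runtime_policy` is hypothetical. It does not cover two cases automatically: models reached through cross-Region inference profiles also need the inference-profile ARN, and RetrieveAndGenerate typically needs invoke permission on its generating model, so include that model ID in `model_ids`:

```python
import json


def runtime_policy(region, model_ids, kb_arns=(), allow_streaming=True):
    """Build a least-privilege IAM identity policy for a production app role."""
    # Converse is authorized by InvokeModel; ConverseStream by InvokeModelWithResponseStream.
    model_actions = ["bedrock:InvokeModel"]
    if allow_streaming:
        # Separate action: without it, streaming calls fail with AccessDenied.
        model_actions.append("bedrock:InvokeModelWithResponseStream")
    statements = [{
        "Sid": "InvokeSpecificModels",
        "Effect": "Allow",
        "Action": model_actions,
        # Foundation-model ARNs leave the account field empty.
        "Resource": [f"arn:aws:bedrock:{region}::foundation-model/{m}" for m in model_ids],
    }]
    if kb_arns:
        statements.append({
            "Sid": "QuerySpecificKnowledgeBases",
            "Effect": "Allow",
            "Action": ["bedrock:Retrieve", "bedrock:RetrieveAndGenerate"],
            "Resource": list(kb_arns),
        })
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    print(json.dumps(runtime_policy("us-east-1", ["amazon.titan-embed-text-v2:0"]), indent=2))
```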
Finally, the control and data planes share the same auditing and networking story. Calls on both planes can be recorded in AWS CloudTrail, though some runtime operations, such as agent and knowledge-base invocations, are logged as CloudTrail data events that you must enable on the trail. Both planes are reachable privately over VPC endpoints (PrivateLink), so traffic need not touch the public internet. You can also capture full request/response payloads from the data plane with model-invocation logging, a control-plane setting that governs the data plane. The permission split is the lever; CloudTrail and invocation logging are how you verify it is working.
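Turning on invocation logging is itself a single control-plane call. The sketch below assumes S3 delivery only and a bucket whose policy already lets Bedrock write to it. The `loggingConfig` keys follow PutModelInvocationLoggingConfiguration as we understand it, so verify them in the SDK reference. `logging_config` and `enable_invocation_logging` are hypothetical helpers:

```python
def logging_config(bucket, key_prefix="bedrock-invocations/"):
    """Build a loggingConfig that delivers text request/response payloads to S3."""
    return {
        "s3Config": {"bucketName": bucket, "keyPrefix": key_prefix},
        "textDataDeliveryEnabled": True,  # prompts and completions
        "imageDataDeliveryEnabled": False,
        "embeddingDataDeliveryEnabled": False,
    }


def enable_invocation_logging(bedrock_client, bucket, key_prefix="bedrock-invocations/"):
    """Apply the config with a control-plane client, boto3.client("bedrock")."""
    bedrock_client.put_model_invocation_logging_configuration(
        loggingConfig=logging_config(bucket, key_prefix)
    )
```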
| Action family | Plane | Example actions | Typical holder | Scope to ARNs? |
|---|---|---|---|---|
| Manage models / platform | Control | bedrock:ListFoundationModels, bedrock:CreateModelCustomizationJob, bedrock:CreateProvisionedModelThroughput, bedrock:PutModelInvocationLoggingConfiguration | Admin / IaC deploy role | Where supported |
| Manage agents / KBs | Control | bedrock:CreateAgent, bedrock:CreateKnowledgeBase, bedrock:CreateDataSource, bedrock:StartIngestionJob, bedrock:CreateGuardrail | Admin / IaC deploy role | Where supported |
| Invoke models | Data | bedrock:InvokeModel (also authorizes Converse), bedrock:InvokeModelWithResponseStream (also authorizes ConverseStream) | Production app role | Yes, to model ARNs |
| Invoke agents / RAG | Data | bedrock:InvokeAgent, bedrock:Retrieve, bedrock:RetrieveAndGenerate | Production app role | Yes, to agent / KB ARNs |
Because the control/data split is invisible until you hit it, it produces a recognizable family of errors. Each one points back to the same root cause: an operation, permission, or resource on the wrong plane.
A simple debugging heuristic captures most of these: when something fails, ask "which plane is this operation on, and does this principal have the matching permission on the matching ARN in the matching Region?" Nearly every Bedrock-runtime error resolves to a mismatch in one of those four — plane, permission, ARN, or Region.
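The four-way check can be written down as a checklist function. This is a hypothetical, simplified sketch. The operation table is a small subset, and ARN matching is exact (or `*`) rather than IAM's full wildcard semantics. `CallContext` and `diagnose` are invented names:

```python
from dataclasses import dataclass, field

# Subset of operations: (client that exposes it, IAM action that authorizes it).
OPERATIONS = {
    "ListFoundationModels": ("bedrock", "bedrock:ListFoundationModels"),
    "CreateKnowledgeBase": ("bedrock-agent", "bedrock:CreateKnowledgeBase"),
    "InvokeModel": ("bedrock-runtime", "bedrock:InvokeModel"),
    "Converse": ("bedrock-runtime", "bedrock:InvokeModel"),
    "InvokeModelWithResponseStream": ("bedrock-runtime", "bedrock:InvokeModelWithResponseStream"),
    "ConverseStream": ("bedrock-runtime", "bedrock:InvokeModelWithResponseStream"),
    "Retrieve": ("bedrock-agent-runtime", "bedrock:Retrieve"),
    "RetrieveAndGenerate": ("bedrock-agent-runtime", "bedrock:RetrieveAndGenerate"),
}


@dataclass
class CallContext:
    operation: str       # e.g. "ConverseStream"
    client_service: str  # service the client was created for, e.g. "bedrock-runtime"
    client_region: str   # region_name the client was created with
    target_arn: str      # model / Knowledge Base / agent ARN being called
    granted: dict = field(default_factory=dict)  # IAM action -> list of allowed ARNs


def diagnose(ctx: CallContext) -> list:
    """Return the plane / permission / ARN / Region mismatches for one call."""
    problems = []
    service, action = OPERATIONS[ctx.operation]
    if ctx.client_service != service:
        problems.append(f"plane: {ctx.operation} is on {service}, not {ctx.client_service}")
    allowed = ctx.granted.get(action)
    if allowed is None:
        problems.append(f"permission: principal lacks {action}")
    elif "*" not in allowed and ctx.target_arn not in allowed:
        problems.append(f"ARN: {action} is not granted on {ctx.target_arn}")
    resource_region = ctx.target_arn.split(":")[3]  # arn:partition:service:REGION:...
    if resource_region != ctx.client_region:
        problems.append(f"Region: client is in {ctx.client_region}, resource is in {resource_region}")
    return problems
```

Given the streaming case described earlier (InvokeModel granted, streaming not), it reports exactly one problem: the missing bedrock:InvokeModelWithResponseStream.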
The control/data separation is not a boto3 quirk — it is how the Bedrock API is defined, so every AWS SDK exposes the same four services. Only the surface syntax changes.
In Python (boto3) you select the service by string: boto3.client("bedrock"), "bedrock-runtime", "bedrock-agent", "bedrock-agent-runtime", and methods are snake_case (invoke_model, converse, retrieve_and_generate). In the AWS SDK for JavaScript v3 each service is a separate package and client class — @aws-sdk/client-bedrock (BedrockClient), @aws-sdk/client-bedrock-runtime (BedrockRuntimeClient), @aws-sdk/client-bedrock-agent, and @aws-sdk/client-bedrock-agent-runtime — and you send command objects (InvokeModelCommand, ConverseCommand, RetrieveAndGenerateCommand). Java exposes BedrockClient vs BedrockRuntimeClient (and the agent equivalents) as distinct client classes; the AWS CLI mirrors it as aws bedrock … vs aws bedrock-runtime … vs aws bedrock-agent … vs aws bedrock-agent-runtime ….
The takeaway is portable: whatever language you are in, you will construct a separate client per service, and the same four-way split — manage foundation models, invoke foundation models, manage agents/KBs, invoke agents/KBs — applies identically. If you internalize the planes once, the knowledge transfers to every SDK, the CLI, and Terraform/CloudFormation (where control-plane resources are what you declare).
- Manage models: @aws-sdk/client-bedrock → BedrockClient
- Run inference: @aws-sdk/client-bedrock-runtime → BedrockRuntimeClient (InvokeModelCommand, ConverseCommand, …)
- Manage agents / KBs: @aws-sdk/client-bedrock-agent → BedrockAgentClient
- Invoke agents / RAG: @aws-sdk/client-bedrock-agent-runtime → BedrockAgentRuntimeClient (InvokeAgentCommand, RetrieveAndGenerateCommand, …)
The bedrock-runtime API is the engine of every Bedrock application; everything else is configuration around it. Understanding that makes the cost picture — and how to fund it — straightforward.
Because all inference flows through the data plane, that is where essentially all of your token spend is generated. Control-plane operations (creating a Knowledge Base, listing models) carry little or no direct charge, although some of the resources they set up bill on their own: a Knowledge Base's vector store, for example, and the embedding calls its ingestion jobs make. The bill comes from InvokeModel / Converse / InvokeAgent calls multiplied by your traffic. The cost levers therefore all act on the data plane. Route cheap, high-volume calls to small models and escalate only hard steps to frontier models. Run latency-tolerant work as batch inference (~50% cheaper). Turn on prompt caching so a repeated system prompt or document is not re-billed at the full rate on every request. Reserve provisioned throughput only once volume is high and steady. For the full pricing model see Amazon Bedrock pricing; for the platform overview, Amazon Bedrock; and for a plain-English primer, what is Amazon Bedrock.
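Because spend scales with data-plane traffic, a back-of-the-envelope estimate is just tokens times price times volume. The sketch below takes per-1K-token prices as inputs. The function is hypothetical, the prices you pass in must come from the Bedrock pricing page, and `batch_discount=0.5` models the roughly 50% batch saving mentioned above:

```python
def monthly_inference_cost(requests_per_month, input_tokens, output_tokens,
                           price_in_per_1k, price_out_per_1k, batch_discount=0.0):
    """Estimate monthly data-plane spend for one workload.

    input_tokens / output_tokens are per request; prices are per 1,000 tokens.
    Control-plane calls are ignored because they add little or no direct cost.
    """
    per_request = ((input_tokens / 1000) * price_in_per_1k
                   + (output_tokens / 1000) * price_out_per_1k)
    return requests_per_month * per_request * (1 - batch_discount)
```

Run it once with a small model's prices and once with a frontier model's prices for the same traffic. The difference shows what routing is worth before you build it.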
The architecture is also worth saying out loud because it explains the IAM and cost story at once: agents and RAG sit on top of the foundation-model runtime. An InvokeAgent call internally drives multiple model invocations (the orchestration loop), and a RetrieveAndGenerate call does a retrieval plus a generation — so the higher-level data-plane operations are token-heavy and inherit the same cost levers. See Amazon Bedrock Agents, Knowledge Bases, and RAG on AWS for those layers.
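Because InvokeAgent returns its answer as an event stream, the calling code looks like streaming even for a single question. The sketch below assumes `boto3.client("bedrock-agent-runtime")` and an existing agent ID and alias ID. `ask_agent` is hypothetical. It collects only `chunk` events and ignores trace and return-control events:

```python
import uuid


def ask_agent(agent_runtime_client, agent_id, alias_id, text, session_id=None):
    """Invoke a Bedrock agent and return (answer_text, session_id)."""
    session_id = session_id or str(uuid.uuid4())  # reuse it to continue a conversation
    resp = agent_runtime_client.invoke_agent(
        agentId=agent_id,
        agentAliasId=alias_id,
        sessionId=session_id,
        inputText=text,
    )
    parts = []
    # "completion" is an event stream; behind it the agent may run several
    # model invocations (the orchestration loop), each of which consumes tokens.
    for event in resp["completion"]:
        if "chunk" in event:
            parts.append(event["chunk"]["bytes"].decode("utf-8"))
    return "".join(parts), session_id
```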
Which brings us to the part that is genuinely hard to do well alone: standing up the runtime correctly — the right clients, least-privilege IAM, model access, logging, VPC endpoints, and the cost controls — and then funding the inference bill. AWS runs credit programs built precisely for GenAI builds: Activate Portfolio (up to $100K) for institutionally-funded startups, Bedrock / GenAI proof-of-concept funding ($10K–$50K) for a defined build, and the competitive Generative AI Accelerator (up to $1M). These pools are largely partner-filed and invisible on the public Activate page. This is exactly what CloudRoute does: we route you to a vetted AWS partner who handles the setup with you and files the credit application — and because AWS funds both the credits and the partner engagement, you pay $0. See AWS credits for generative-AI startups, AWS PoC / Bedrock POC funding, and $100K AWS credits.
The whole page distilled: four clients, what each is for, and the single signal that tells them apart. Keep this open next to your first integration and you will pick the right client every time. The reliable tell is the name — a `-runtime` suffix means "execute," its absence means "configure."
| Question you are answering | Use this client | Plane | Signature operations | IAM action family |
|---|---|---|---|---|
| "Manage models, jobs, throughput, logging" | bedrock | Control | ListFoundationModels, CreateModelCustomizationJob, CreateProvisionedModelThroughput | bedrock:Create* / List* / Put* |
| "Send a prompt, get a completion" | bedrock-runtime | Data | InvokeModel, Converse, ConverseStream | bedrock:InvokeModel / InvokeModelWithResponseStream |
| "Create / configure agents, KBs, data sources" | bedrock-agent | Control | CreateAgent, CreateKnowledgeBase, StartIngestionJob | bedrock:CreateAgent / CreateKnowledgeBase |
| "Ask an agent to act, or query my RAG KB" | bedrock-agent-runtime | Data | InvokeAgent, Retrieve, RetrieveAndGenerate | bedrock:InvokeAgent / Retrieve |
Situation: The team had built a quick proof of concept but kept hitting the control/data-plane wall in production: their service role could enable models in the console but threw AccessDenied on invoke, streaming broke the moment they turned it on, and their RAG calls failed because they were pointing the wrong client at the Knowledge Base. They also had no cost controls and no idea how big the inference bill would get once the assistant shipped to all customers — and no budget to absorb it during the ramp.
What CloudRoute did: Routed within 22 hours to a US AWS partner with a GenAI + platform-engineering track record. The partner separated the clients cleanly (bedrock-runtime for inference on the app role, bedrock-agent-runtime for RAG, bedrock control-plane on a separate IaC deploy role), wrote least-privilege IAM scoped to the exact model and Knowledge-Base ARNs (including the separate streaming action), enabled model-invocation logging and VPC endpoints, and added the cost levers — small-model routing for cheap calls, prompt caching for the repeated system prompt, and batch for the offline enrichment job. In parallel they filed a Bedrock/GenAI proof-of-concept credit application and an Activate Portfolio application.
Outcome: GenAI POC credits ($25K) approved in under 2 weeks; Portfolio ($100K) shortly after — the assistant's first months of inference were fully credit-funded. The AccessDenied and wrong-client errors disappeared once the plane/permission/ARN/Region alignment was fixed, streaming worked end to end, and the team shipped to all customers with a bill they could forecast. CloudRoute's commission was paid by the partner from AWS engagement funding; the customer paid $0.
time-to-match: < 24h · credits secured: $125K · clients separated + IAM scoped · cost to customer: $0
CloudRoute routes you to a vetted AWS partner who sets up the runtime correctly (clients, least-privilege IAM, model access, logging, cost controls) and files your Bedrock/GenAI credit application (Activate Portfolio up to $100K, GenAI POC $10K–$50K, GenAI Accelerator up to $1M). AWS funds the credits and the engagement. You pay $0.