A complete, neutral reference for building autonomous agents on Amazon Bedrock: what an agent actually is (an LLM that plans and calls your tools), the architecture — action groups backed by Lambda, OpenAPI schemas, the orchestration/ReAct loop, and the prompt templates you can override — plus Knowledge Base RAG, memory and session state, return-of-control, the build → test (trace) → version → alias → deploy lifecycle, observability, cost, the production gotchas nobody warns you about, and when an Agent is the right tool versus Flows or your own orchestration.
A Bedrock Agent is the difference between a model that answers and a model that acts. A plain LLM call returns text. An agent takes a goal, decides which steps and tools are needed to achieve it, executes those steps against real systems, and returns a result — with Bedrock managing the loop in between.
Concretely, an Amazon Bedrock Agent is a managed capability that pairs a foundation model (Claude, Amazon Nova, Llama, etc.) with a set of tools and knowledge sources, plus an orchestration engine that lets the model use them autonomously. You give the agent a natural-language goal ("issue a refund for order 4471 and email the customer"), and the agent figures out the sequence: look up the order, check the refund policy in a knowledge base, call the refund API, then call the email API — observing each result and deciding the next move.
The key word is autonomous, multi-step. The agent is not following a fixed script you wrote. It is reasoning, at each turn, about what to do next given everything it has seen so far. That makes agents the right abstraction for tasks where the path is not known in advance and depends on intermediate results — the opposite of a deterministic pipeline.
What Bedrock manages for you is significant. You do not write the loop that prompts the model, parses its tool-call intent, invokes the function, feeds the result back, and re-prompts. Bedrock does all of that. You declare the pieces (model, instructions, tools, knowledge) and Bedrock runs the orchestration — the plan/act/observe cycle, often described as a ReAct (reason + act) loop. This is the central value proposition: agents move the undifferentiated control-flow plumbing into a managed service.
Agents are part of the broader Bedrock platform alongside Knowledge Bases (managed RAG), Guardrails (a safety/policy layer), Flows (a visual workflow builder for deterministic chains), and the Converse/InvokeModel APIs (raw model calls). An agent typically composes several of these — it is the orchestration layer that ties a model, tools, retrieval, memory, and guardrails into one callable unit you invoke with a single API call.
A raw model call (Converse/InvokeModel) answers a question in one shot. An agent takes a goal, then plans, calls your tools and knowledge bases, observes results, and loops until the task is complete — with Bedrock running the loop. Reach for an agent when the work is multi-step and the path depends on intermediate results.
An agent is assembled from a small number of well-defined parts. Understanding each one — and how the orchestration loop ties them together — is most of what you need to build agents that behave predictably.
At configuration time you define four things: the base model and instructions, one or more action groups, optional associated Knowledge Bases, and (optionally) overrides to the underlying prompt templates. The rest of this section walks through each, then describes the orchestration loop that runs them.
Every agent is backed by one foundation model and a block of instructions — natural-language text that tells the agent its role, what it should and should not do, its tone, and any business rules ("never issue a refund over $500 without escalating"). This is effectively the agent's system prompt and it is the single highest-leverage thing you write: clear, specific instructions are the difference between an agent that stays on task and one that improvises. Model choice matters too — stronger reasoning models follow multi-step plans and tool schemas more reliably, while smaller/faster models cut latency and cost for simpler agents.
An action group is a set of related actions the agent can take — the agent's "hands." Each action group has two halves: (1) a schema describing the available actions, their parameters, and what they return; and (2) an executor that actually runs the action — most commonly an AWS Lambda function. When the model decides to call an action, Bedrock invokes your Lambda with the chosen parameters, gets the result, and feeds it back into the loop. An agent can have multiple action groups (e.g., one for orders, one for shipping, one for notifications), and the model picks the right action across all of them.
The executor does not have to be Lambda. With return-of-control (covered in §V), Bedrock can instead hand the requested action and its parameters back to your application to execute, which is useful when the logic lives outside AWS or must run in your own environment. Either way, the action group is how an agent reaches the real world — querying a database, hitting an internal API, triggering a workflow.
The model only knows an action exists, and how to call it, because of its schema. Bedrock accepts an OpenAPI schema (a standard JSON/YAML description of API operations, paths, parameters, and response shapes) or a simpler function-definition format for each action group. The descriptions in that schema are not boilerplate — the model reads them to decide which action to call and what to pass. Vague or missing descriptions are a leading cause of an agent calling the wrong tool or hallucinating a parameter; precise, well-described schemas are a core reliability lever, every bit as important as the instructions.
This is the engine. When you invoke the agent, Bedrock runs an orchestration loop: it sends the model the user input, the instructions, the available tools, and any conversation/session context. The model reasons about the goal and emits either a final answer or a request to call a tool or query a knowledge base (the "act"). Bedrock executes that action, captures the result (the "observe"), appends it to the working context, and re-prompts the model. The cycle repeats — reason, act, observe — until the model produces a final response. This pattern is commonly called ReAct. You do not implement it; Bedrock does. What you control is what the model can see and do at each step (instructions, tools, knowledge, prompts).
Under the hood, Bedrock uses a set of prompt templates for the distinct stages of the loop — pre-processing (validate/classify the input), orchestration (the main reason/act prompt), knowledge-base response generation, and post-processing (format the final answer). These default templates work out of the box. But Bedrock lets you override any of them, and for production agents this is a real lever: you can tighten how the agent is allowed to reason, inject domain context, change how tool results are summarized, or disable a stage entirely. Most teams start with defaults and override only the orchestration or pre-processing template once they hit a specific behavior they need to control.
| Component | What it is | You provide | Bedrock manages | Primary reliability lever |
|---|---|---|---|---|
| Base model + instructions | The reasoning engine + its system prompt | Model choice + instruction text | Model hosting + invocation | Clear, specific instructions |
| Action group | A set of callable actions (tools) | Lambda (or return-of-control) | Routing the call, feeding back results | Scoping actions narrowly |
| OpenAPI / function schema | The contract describing each action | Schema with rich descriptions | Surfacing tools to the model | Precise parameter + action descriptions |
| Knowledge Base (assoc.) | Managed RAG retrieval source | Data source + vector store | Chunking, embedding, retrieval | Good chunking + relevant corpus |
| Orchestration loop | The ReAct plan/act/observe engine | Nothing (it is managed) | The entire loop | N/A — controlled via the above |
| Prompt templates | Per-stage prompt scaffolding | Optional overrides | Defaults for each stage | Targeted overrides when needed |
| Guardrails (attached) | Safety / policy filter | A guardrail config | Applying it to in/out text | Tuned filters + denied topics |
Beyond tools, three capabilities turn a basic agent into a useful one: retrieval (so it can answer from your data), memory (so it remembers across turns and sessions), and a control hand-off (so your app can run sensitive actions itself).
You can associate one or more Knowledge Bases with an agent. A Bedrock Knowledge Base is managed retrieval-augmented generation: you point it at a data source (commonly documents in S3), and Bedrock chunks the content, embeds it with an embedding model, stores the vectors in a vector store (OpenSearch Serverless, Aurora/pgvector, and other supported options), and exposes retrieval. When associated with an agent, the orchestration loop can — at any step — query the knowledge base for relevant passages and ground its reasoning or its answer in them. This is how an agent answers "what is our refund policy for EU customers?" with your actual policy rather than a guess.
The division of labor: action groups are for doing (call an API, change state), knowledge bases are for knowing (retrieve facts). A typical agent uses both — retrieving policy from a KB, then taking action via a tool. See the amazon-bedrock-knowledge-bases sibling for the retrieval mechanics in depth, and rag-on-aws for the broader RAG architecture on AWS.
Within a single conversation, the agent maintains session state — the running context of that interaction (what the user said, what tools returned). You invoke the agent with a session identifier, and Bedrock keeps the turn-by-turn context tied to it, so the agent remembers what was discussed earlier in the same session and you can also pass session attributes (e.g., a logged-in customer ID) that persist for the conversation.
Bedrock Agents also support memory across sessions: the agent can retain a summary of prior conversations for a given user/memory ID so that a returning user does not start from scratch. This longer-term memory is a configurable feature with its own retention controls. Used well, it is the difference between an assistant that re-asks the same questions every time and one that remembers context — but it also has privacy and cost implications (you are storing and re-injecting user context), so it should be enabled deliberately, not by default.
Sometimes you do not want Bedrock to call a Lambda directly — the action might need to run inside your own backend, touch a system AWS cannot reach, or require a human approval step. Return-of-control handles this: instead of executing an action group via Lambda, Bedrock returns the chosen action and its parameters to your application in the InvokeAgent response. Your code runs the action however it likes, then sends the result back to the agent to continue the loop. This keeps the model's planning inside Bedrock while keeping execution — and sensitive logic, credentials, or approvals — under your control. It is the standard pattern for actions that must not be fully automated, or that live outside AWS.
Action groups let the agent do things (call tools / change state). Knowledge Bases let it know things (retrieve from your data). Memory + session state let it remember (within and across conversations). Return-of-control lets your app, not Bedrock, execute a chosen action. Most real agents use all four.
Bedrock gives agents a proper software lifecycle: you build and iterate against a draft, test with a step-by-step trace, snapshot a version, and route traffic to it through an alias — so deploying a new agent version is a pointer change, not a redeploy.
You create an agent in the Bedrock console, via the API/SDK, or with infrastructure-as-code (CloudFormation/CDK/Terraform — the right choice for anything production). You pick the model, write the instructions, define action groups (attaching each to a Lambda and an OpenAPI/function schema), associate any knowledge bases, attach a guardrail, and configure memory. Before an agent can be tested, you prepare it — Bedrock compiles your configuration into a working DRAFT version you can invoke immediately in a test window.
The agent trace is the feature you will live in while developing. When you invoke an agent with trace enabled, Bedrock returns a structured, step-by-step record of everything the orchestration did: the model's reasoning (its "rationale") at each step, which action it decided to call and with what parameters, what the Lambda or knowledge base returned, and how that fed the next step. This makes the otherwise-opaque loop fully inspectable. When an agent misbehaves — calls the wrong tool, loops, or answers without retrieving — the trace shows you exactly where and why, so you can fix the instruction, the schema description, or the KB rather than guessing. Test in the console for fast iteration, then script test cases against the API for regression coverage.
When the draft behaves, you cut a numbered version — an immutable snapshot of the entire agent configuration (model, instructions, action groups, prompts). Versions never change, which is what makes them safe to run in production. You then create an alias — a named, movable pointer (e.g., prod, staging) that points at a specific version. Your application always calls the alias, never a raw version number.
This indirection is what makes deployment clean. To ship a new agent, you prepare a new draft, cut version N+1, test it (often behind a staging alias), and then repoint the prod alias from version N to N+1. The application code does not change; traffic moves the instant the alias moves, and rolling back is just repointing the alias to the previous version. Aliases also carry the provisioned throughput configuration if you reserve capacity for the agent's model.
In production your application calls the InvokeAgent API (via the AWS SDK) with the agent ID, the alias ID, a session ID, and the user input; the response streams back the agent's output (and, if enabled, the trace). There is no server to manage — the agent is fully managed Bedrock infrastructure. The deployable unit is the (agent, alias) pair, and promoting a change is the alias repoint described above.
| Stage | What you do | Artifact | Used for |
|---|---|---|---|
| Build + prepare | Configure model, instructions, action groups, KBs; prepare | DRAFT version | Immediate testing/iteration |
| Test (trace) | Invoke with trace; inspect reasoning + tool calls | Trace output | Debugging behavior |
| Version | Snapshot the working config | Immutable version N | Stable, reproducible config |
| Alias | Point a named alias at a version | Alias (e.g. prod) | The thing your app calls |
| Deploy / roll back | Repoint the alias to a new/old version | Updated alias target | Zero-code-change release + rollback |
Once an agent is live, two questions dominate: can you see what it is doing, and what does it cost? Both have concrete answers on Bedrock.
The trace is your development microscope; in production you also want aggregate visibility. Bedrock integrates with Amazon CloudWatch for metrics (invocations, latency, errors) and supports model-invocation logging that captures request/response details to CloudWatch Logs or S3 for audit and debugging. Because every action group call is a Lambda invocation, you also get the full Lambda observability surface — CloudWatch logs and metrics, and AWS X-Ray tracing — for the tool side of the agent. For production agents, the standard setup is: enable model-invocation logging, capture the trace on a sampled or error-only basis (it is verbose and adds payload), alarm on Lambda errors and latency, and watch token consumption as a cost signal.
Bedrock does not charge a separate per-agent fee. An agent's cost is the sum of what it consumes: (1) foundation-model tokens for every step of the orchestration loop — this is usually the dominant cost; (2) Lambda invocations and duration for each action call; (3) the Knowledge Base costs (embedding the corpus, the vector store, and the retrieval/model tokens at query time); and (4) Guardrails, billed on the text evaluated. The thing to internalize is that agents are token-heavy: a single user request can trigger several model calls, each re-sending the instructions, tool schemas, and accumulated context. A multi-step agent task can therefore cost many times a single chat completion.
The cost levers follow directly: pick a model that is good enough rather than the largest (a smaller model in the loop multiplies its savings across every step); keep instructions and schemas tight so each prompt is smaller; use prompt caching for the large fixed parts of the prompt (instructions, tool definitions) that repeat on every step — this is especially impactful for agents; cap the number of orchestration steps where possible; and scope knowledge-base retrieval so you are not stuffing huge context into every call. Representative model token rates are on the amazon-bedrock-pricing sibling — and remember all of it is AWS-credit-eligible (see §VIII).
One agent task = many model calls (one per orchestration step), each re-sending the instructions, tool schemas, and accumulated results. That is the cost. Prompt caching on the fixed instruction/schema prefix and choosing the smallest model that does the job are the two biggest levers — both compound across every step of the loop.
Agents that demo well can struggle in production for predictable reasons. Here are the failure modes that bite teams most, and the mitigations.
For any production agent: (1) attach a Guardrail; (2) treat tool/KB content as untrusted (prompt-injection surface); (3) keep high-impact actions behind return-of-control or approval; (4) least-privilege every action Lambda's IAM role. An agent that can act on your systems is a security boundary, not just a feature.
Agents are powerful but not always the right abstraction. Bedrock offers a spectrum — from fully managed autonomy (Agents) to managed-but-deterministic (Flows) to roll-your-own — and matching the tool to the problem saves cost, latency, and debugging pain.
The deciding question is how much the path is known in advance. If the steps are fixed and deterministic ("extract fields → classify → route → respond"), you do not want a model re-deciding the path every time — that adds latency, cost, and non-determinism for no benefit. If the path genuinely depends on reasoning over intermediate results and a variable set of tools, an agent earns its keep. And if you need fine-grained control over the loop, custom logging, or a framework-specific pattern, hand-rolling the orchestration (with the Converse API plus your own code or a library) gives you total control at the cost of building and maintaining the plumbing Bedrock would otherwise manage.
The task is multi-step and the path is not fixed — it depends on intermediate results; the agent needs to choose among several tools/APIs dynamically; you want managed orchestration, memory, and KB integration without building the loop; and you can tolerate the latency/non-determinism of model-driven planning. Customer-support agents, operational copilots that query and act on internal systems, and research/triage assistants are classic fits.
The workflow is deterministic and known — a defined sequence or branching graph of steps (prompt → model → condition → another model → tool). Flows is a visual builder that chains prompts, models, knowledge bases, Lambdas, and conditions into an explicit graph you design. You get predictability, lower latency (no per-step re-planning), easier debugging, and lower cost. Choose Flows when you can draw the pipeline on a whiteboard. See the amazon-bedrock-flows sibling.
You need maximum control — bespoke control flow, deep integration with an existing framework, custom retry/caching/routing logic, or behavior the managed agent does not expose. You build the loop yourself on top of the Converse API (which supports tool use / function calling directly) and your own code or an orchestration library. You own everything, including the maintenance. This is the right call for teams with specific, non-standard requirements and the engineering capacity to support them — and many production stacks mix all three: Flows for the deterministic spine, an Agent for the open-ended sub-task, and custom code at the edges.
The three ways to orchestrate multi-step generative-AI work on AWS, side by side. The right choice is mostly a function of how deterministic the path is and how much control you need versus how much plumbing you want to own.
| Dimension | Bedrock Agents | Bedrock Flows | Custom orchestration |
|---|---|---|---|
| Best for | Open-ended, multi-step tasks; dynamic tool choice | Deterministic, known workflows | Bespoke control / framework integration |
| Path decided by | The model, at runtime (ReAct loop) | You, at design time (explicit graph) | Your code |
| Who runs the loop | Bedrock (managed) | Bedrock (managed) | You (Converse API + your code) |
| Predictability | Lower (non-deterministic) | High (deterministic) | Whatever you build |
| Latency | Higher (per-step re-planning) | Lower (no re-planning) | Depends on your design |
| Build effort | Low — declare components | Low–medium — design the graph | High — build + maintain plumbing |
| Control | Medium (prompts/templates) | Medium–high (explicit nodes) | Total |
Situation: The team wanted a customer-support agent that could do more than answer FAQs — it needed to look up a customer's subscription, check usage against their plan, issue plan changes and refunds within policy, and answer from their help-center docs. They had prototyped a single Claude call with a giant prompt, but it could not take actions, hallucinated policy, and they were nervous about it doing anything customer-facing without a safety layer. They also did not want to fund the inference out of a runway earmarked for hiring.
What CloudRoute did: CloudRoute matched them in under 24 hours to an EU-Central AWS partner with Bedrock agent experience. The partner built a Bedrock Agent: Claude as the base model with tight instructions; three action groups (billing, subscription, notifications) each backed by a least-privileged Lambda with an OpenAPI schema; an associated Knowledge Base over the help-center docs for grounded policy answers; refunds above a threshold routed via return-of-control to a human-approval step; and a Guardrail attached to block prompt-injection and PII leakage. They iterated against the trace, cut a version, and shipped behind a prod alias. The partner also filed a Bedrock POC credit application plus an Activate Portfolio application to fund it.
Outcome: The agent resolved a large share of tier-1 tickets end-to-end within policy, with the trace giving the team confidence in every step it took. Inference, Lambda, the knowledge base, and Guardrails were fully covered by the approved AWS credits, so the build and early production ran at $0 out of pocket. CloudRoute's commission was paid by the partner from AWS engagement funding, not by the customer.
agent: model + 3 action groups + KB + guardrail · high-risk actions behind approval · credits: POC + Activate · out-of-pocket: $0
Whatever your agent would cost to build and run on Bedrock, AWS credits can cover it. CloudRoute routes you to the right credit pool (Activate up to $100K, Bedrock POC $10K–$50K, GenAI Accelerator up to $1M) and a vetted AWS partner who builds the agent — action groups, knowledge bases, guardrails, the lot. Customer pays $0.