A practical, copy-paste quickstart for developers: the prerequisites, installing the AWS CLI and boto3, wiring up credentials and a least-privilege IAM policy, enabling model access in the console, then making your first Bedrock Converse API call — with full code for handling the response and switching to streaming. No GPUs to provision, no model-serving stack to run. Roughly fifteen minutes from an empty AWS account to a working completion.
Bedrock setup has a short list of prerequisites, and getting them straight up front saves the two most common stumbles: calling a Region where the model is not enabled, and using credentials that lack Bedrock permissions. None of this requires infrastructure — there are no GPUs, instances, or clusters to stand up.
At a high level you need five things, and the rest of this page walks through each in order. First, an AWS account with permission to manage IAM and Bedrock (your own account, or an admin who can grant you access). Second, the AWS CLI v2 installed locally, used to store credentials and sanity-check that they work. Third, a set of programmatic credentials — ideally a short-lived role via IAM Identity Center / SSO, or an IAM user with an access key for a pure quickstart — carrying a least-privilege Bedrock policy. Fourth, model access requested in the Bedrock console for the specific models you intend to call, in the Region you intend to call them; Bedrock ships with every model off by default. Fifth, a runtime — this guide uses Python with the AWS SDK, boto3, but the same flow maps directly to the JavaScript, Java, Go, and .NET SDKs.
A few decisions are worth making deliberately before you touch a keyboard. Choose a Region first. Frontier models frequently land in US Regions (us-east-1, us-west-2) before they appear in eu-central-1 or ap-southeast-1, and your prompts and completions are processed in the Region you call — so pick a Region that has the model you want and satisfies any data-residency requirement you have. Decide which model you are starting with. A sensible default for a quickstart is one cheap, fast chat model (for example a small Claude or Amazon Nova model) so your first iterations cost pennies; you can swap the model ID later without rewriting anything. Avoid root credentials. Never use your AWS account root user for day-to-day API calls — create an IAM identity scoped to exactly what you need, which the next sections cover.
If your very first call throws AccessDeniedException, it is almost always one of two things: (1) the IAM principal lacks bedrock:InvokeModel, the IAM action that authorizes both InvokeModel and Converse calls (streaming needs bedrock:InvokeModelWithResponseStream), or (2) you have not requested model access for that model in that Region. Both are fixed in minutes; see sections III and IV. A ValidationException about the model usually means a wrong or stale modelId; copy the exact current ID from the console.
Two installs get you ready: the AWS CLI v2 (to hold and verify credentials) and the boto3 SDK (to make the calls from Python). Both are a single command on every major platform.
Install the AWS CLI v2 using the method for your OS, then confirm the version. On macOS the simplest route is Homebrew; on Linux the bundled installer; on Windows the MSI from the AWS docs. The CLI is what reads and writes the shared credentials file that boto3 (and every other AWS SDK) picks up automatically, so installing it now means you do not hard-code keys in your code later.
Then create an isolated Python environment and install boto3, the AWS SDK for Python. Bedrock is exposed through two clients in boto3: bedrock (the control plane — listing models, managing custom models, configuring features) and bedrock-runtime (the data plane — actually invoking models with Converse / InvokeModel). For making completions you want bedrock-runtime; this is the single most common point of confusion in setup, so internalize it now: you call models on the bedrock-runtime client, not the bedrock client.
Pick the install command for your platform, then verify it resolves. A successful aws --version printing aws-cli/2.x confirms the CLI is on your PATH and ready to configure.
Keeping boto3 in a per-project virtual environment avoids version clashes and makes the project reproducible. boto3 pulls in botocore, which carries the Bedrock service models, so a recent boto3 is all you need for the Converse API.
# macOS (Homebrew)
brew install awscli
# Linux (x86_64)
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o awscliv2.zip
unzip awscliv2.zip && sudo ./aws/install
# verify the CLI
aws --version # expect: aws-cli/2.x ...
# Python env + boto3
python3 -m venv .venv && source .venv/bin/activate
pip install --upgrade boto3
python -c "import boto3, botocore; print(boto3.__version__)"
Bedrock is a standard AWS service governed by IAM. Setup here is two parts: give boto3 a way to find credentials, and make sure those credentials carry exactly the Bedrock permissions you need and nothing more. Do this once and every SDK on the machine inherits it.
There are two good ways to provide credentials, and one you should avoid. The avoid-it option is pasting an access key into your source code — it leaks. The simplest quickstart option is an IAM user with an access key, stored via aws configure, which writes a profile into the shared credentials file at ~/.aws/credentials; boto3 reads it automatically with no code. The production-grade option is short-lived credentials from IAM Identity Center (SSO) via aws configure sso or an assumed role, so nothing long-lived sits on disk. For a first call either works; for anything beyond a sandbox, prefer SSO/roles.
However you authenticate, the principal needs a Bedrock policy. The least-privilege starting point grants only the runtime actions you will use: bedrock:InvokeModel, which authorizes both InvokeModel and Converse calls, and bedrock:InvokeModelWithResponseStream, which authorizes their streaming counterparts (ConverseStream included). There is no separate bedrock:Converse action. Optionally add the read-only control-plane action bedrock:ListFoundationModels so you can enumerate available models. You can scope Resource down to specific model ARNs to restrict exactly which models a service may call; the example below starts permissive on resource for a quickstart and notes where to tighten it.
Every Bedrock call is recorded in CloudTrail, and you can additionally enable Bedrock model-invocation logging to capture full request and response payloads to S3 or CloudWatch — useful for debugging and audit, and worth turning on early. After you set credentials, verify they resolve before involving Bedrock at all: aws sts get-caller-identity should print your account and principal ARN. If that fails, fix credentials first; it has nothing to do with Bedrock yet.
Option A (IAM user with an access key): Create an IAM user, attach the least-privilege Bedrock policy below, generate an access key, then run aws configure and paste the key, secret, default Region, and output format. boto3 will pick up the default profile automatically.
Option B (IAM Identity Center / SSO): Run aws configure sso, authenticate in the browser, and select the account and permission set that carries the Bedrock policy. This issues short-lived credentials that refresh automatically, so no long-lived secret sits on disk. Pass the profile to boto3 with a named session or the AWS_PROFILE environment variable.
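As a rough sketch of the "named session" route, the helper below builds a bedrock-runtime client from a profile. The helper name `runtime_client`, the `session_factory` parameter, and the profile name `bedrock-dev` are all illustrative assumptions, not part of boto3 or of this guide's setup; only `boto3.Session(profile_name=..., region_name=...)` and `session.client(...)` are boto3 calls.

```python
def runtime_client(profile_name=None, region_name="us-east-1", session_factory=None):
    """Return a bedrock-runtime client for a named profile.

    profile_name=None falls back to boto3's default credential chain,
    which also honours the AWS_PROFILE environment variable.
    session_factory is a hypothetical hook so the helper can be exercised
    without boto3 or real credentials; by default it is boto3.Session.
    """
    if session_factory is None:
        import boto3  # imported lazily so this module loads without boto3 installed
        session_factory = boto3.Session
    session = session_factory(profile_name=profile_name, region_name=region_name)
    return session.client("bedrock-runtime")
```

Typical use would be `brt = runtime_client("bedrock-dev")` after `aws sso login --profile bedrock-dev`; with no argument the helper behaves like the plain `boto3.client("bedrock-runtime", ...)` call used elsewhere on this page.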
# Least-privilege Bedrock policy (JSON). Resource "*" is permissive for a quickstart; tighten it to specific model ARNs in production.
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "bedrock:InvokeModel",
      "bedrock:InvokeModelWithResponseStream"
    ],
    "Resource": "*"
  }]
}
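Tightening the Resource element might look like the sketch below, which generates a policy restricted to named foundation models. It assumes the foundation-model ARN form `arn:aws:bedrock:<region>::foundation-model/<model-id>` and uses a placeholder model ID; `scoped_bedrock_policy` is a hypothetical helper. It lists only `bedrock:InvokeModel` and `bedrock:InvokeModelWithResponseStream`, on the understanding that these are the actions that authorize Converse and ConverseStream as well.

```python
import json


def scoped_bedrock_policy(region, model_ids):
    """Build an IAM policy document limited to the given foundation models."""
    resources = [
        f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
        for model_id in model_ids
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream",
            ],
            "Resource": resources,
        }],
    }


# "example.model-id-v1:0" is a placeholder, not a real model ID.
policy = scoped_bedrock_policy("us-east-1", ["example.model-id-v1:0"])
print(json.dumps(policy, indent=2))
```

If you call models through a cross-region inference profile, the profile has its own ARN that would need to be allowed too; check the console for the exact value rather than guessing it.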
# Configure + verify (Option A)
aws configure # paste key, secret, region (e.g. us-east-1), json
aws sts get-caller-identity # prints your account + principal ARN
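The same credential check can be run from Python before any Bedrock code executes. This is a minimal sketch: `whoami` is a hypothetical helper name, and it relies only on the STS `get_caller_identity` call and its `Account` and `Arn` response fields.

```python
def whoami(sts_client):
    """Return (account_id, principal_arn) for the credentials the client resolved."""
    identity = sts_client.get_caller_identity()
    return identity["Account"], identity["Arn"]
```

In a real script you would call it as `whoami(boto3.client("sts"))` at startup and fail fast if it raises, since a credentials problem at this point has nothing to do with Bedrock.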
This is the step most setup guides skip and most first calls trip over. Bedrock does not give you any model the moment you open the console — access is opt-in, requested per model and per Region. Until you enable a model, every call to it returns AccessDeniedException, regardless of your IAM policy.
The reason access is gated is governance: AWS wants every model your organization can call to be a deliberate, auditable choice, and several models carry provider end-user license terms you must accept first. To enable a model, open the Bedrock console, switch to the Region you plan to use (top-right Region selector — this matters, access is per-Region), and go to Model access in the left navigation. Select the specific models you intend to call — for a quickstart, one small chat model is enough — and submit the request. For most models access is granted within seconds to a couple of minutes; models that require accepting a EULA prompt you to do so before granting.
Two facts are worth burning in. First, access is per-Region: enabling Claude in us-east-1 does not enable it in eu-west-1, so if you switch Regions later, request access again there. Second, availability varies by Region: a model you see in us-east-1 may not yet be offered in eu-central-1 or ap-southeast-1 at all. If a model you want is not listed in your chosen Region, either pick a Region that has it or use a cross-region inference profile (a related capability that lets one request be served from one of several Regions within a geography). You can list the models offered in a Region from code with ListFoundationModels on the bedrock (control-plane) client; treat the console's Model access page as the source of truth for whether access has actually been granted.
Once the status for your chosen model reads Access granted, copy its exact model ID from the console — model IDs are specific strings (and some calls expect an inference-profile ID rather than a bare model ID). Using the precise current ID is what prevents the ValidationException that comes from a guessed or outdated identifier. With access granted and the ID in hand, you are ready to make a call.
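To see which model IDs a Region offers without opening the console, something like the following sketch works against the control-plane client. `text_model_ids` is a hypothetical helper; it uses the `list_foundation_models` call with its `byOutputModality` and `byProvider` filters. As far as I know the listing reports what the Region offers rather than whether your account has been granted access, so treat it as a way to find exact IDs, not as an access check.

```python
def text_model_ids(bedrock_client, provider=None):
    """Return sorted IDs of text-output models offered in the client's Region."""
    kwargs = {"byOutputModality": "TEXT"}
    if provider:
        kwargs["byProvider"] = provider
    response = bedrock_client.list_foundation_models(**kwargs)
    return sorted(model["modelId"] for model in response["modelSummaries"])
```

Call it with the control-plane client, for example `text_model_ids(boto3.client("bedrock", region_name="us-east-1"))`; this needs the optional bedrock:ListFoundationModels permission mentioned earlier.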
With credentials resolving and model access granted, a working completion is about ten lines of Python. Use the Converse API, not the older InvokeModel — Converse gives you one consistent request and response schema across every chat model, so the same code works for Claude, Nova, Llama, Mistral, and Cohere with only the modelId changed.
You make completions on the bedrock-runtime client. The converse call takes three things you care about at the start: a modelId (the exact string you copied from the console), a messages list (each message has a role of user or assistant and a content list of blocks — a text block is {"text": "..."}), and an inferenceConfig (generation settings such as maxTokens and temperature). You can also pass a system prompt as a top-level argument to steer behavior. That uniform shape is the whole point of Converse: there is no provider-specific request body to assemble.
The example below is a complete, runnable first call. It creates the runtime client bound to your Region, sends one user message, and prints the model's text. The only line you ever change to switch models is modelId — everything else stays identical, which is what makes model evaluation and cost-driven model routing trivial later. Region and credentials are picked up from the environment / shared config you set in section III, so there are no secrets in the code.
To continue a conversation, append the model's reply (the output.message object) to your messages list and add the next user message, then call converse again. Bedrock is stateless between calls — you resend the running message history each turn, which is exactly the pattern that makes prompt caching worthwhile once histories grow long.
import boto3

# Region and credentials come from the environment / shared config; no secrets in code.
brt = boto3.client("bedrock-runtime", region_name="us-east-1")

resp = brt.converse(
    modelId="anthropic.claude-haiku",  # placeholder: paste the EXACT id from the console; swap to switch models
    system=[{"text": "You are a concise assistant."}],
    messages=[
        {"role": "user", "content": [{"text": "Give me three uses for Amazon Bedrock."}]}
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)
print(resp["output"]["message"]["content"][0]["text"])
# The same call works for Nova, Llama, Mistral, Cohere; only modelId changes.
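The multi-turn pattern described above (resend the running history each turn) can be sketched as a small helper. `ask` is a hypothetical name, and `client` is assumed to be a bedrock-runtime client like the `brt` object in the first call; the history is only updated after a successful call so a failed request does not leave a dangling user message.

```python
def ask(client, model_id, history, user_text, max_tokens=512):
    """Send one user turn, then record both sides of the exchange in history."""
    user_message = {"role": "user", "content": [{"text": user_text}]}
    response = client.converse(
        modelId=model_id,
        messages=history + [user_message],  # full history is resent every turn
        inferenceConfig={"maxTokens": max_tokens},
    )
    reply = response["output"]["message"]
    history.extend([user_message, reply])
    # Join every text block; non-text blocks (for example tool use) are skipped.
    return "".join(block.get("text", "") for block in reply["content"])
```

Start with `history = []` and call `ask(brt, model_id, history, "...")` once per turn; the list grows by two messages each time, which is why token counts (and cost) rise with conversation length.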
A robust setup reads more than just the text: it checks why the model stopped, tracks token usage for cost, switches to streaming for responsive UIs, and handles the handful of errors Bedrock actually throws. All of this is in the same Converse response shape.
The converse response is a structured object. The text lives at output.message.content[0].text. The stopReason tells you why generation ended — end_turn (the model finished), max_tokens (you hit your limit, the answer may be truncated, so raise maxTokens), or tool_use (the model wants to call a tool you defined). Crucially for cost, usage reports inputTokens, outputTokens, and totalTokens for the call — log these from day one, because they are how you reason about and forecast spend. A response that ends with max_tokens is the most common "why is my output cut off?" surprise in early setup.
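One way to read those fields defensively is sketched below. `summarize` is a hypothetical helper; it joins every text block rather than assuming the text is always at index 0, and flags truncation when stopReason is max_tokens.

```python
def summarize(response):
    """Pull text, stop reason, and token counts out of a converse() response dict."""
    blocks = response["output"]["message"]["content"]
    text = "".join(block["text"] for block in blocks if "text" in block)
    usage = response.get("usage", {})
    stop_reason = response.get("stopReason")
    return {
        "text": text,
        "stop_reason": stop_reason,
        "truncated": stop_reason == "max_tokens",  # answer was cut off; raise maxTokens
        "input_tokens": usage.get("inputTokens", 0),
        "output_tokens": usage.get("outputTokens", 0),
    }
```

Logging the returned dict (minus the text, if it is sensitive) on every call gives you the per-request token record that cost forecasting depends on.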
For anything user-facing, switch to streaming so tokens render as they are generated instead of after a multi-second wait. Streaming is the same request via converse_stream; the response carries a stream of events, and you iterate it, pulling text from the contentBlockDelta events as they arrive. The request arguments are the same as for the non-streaming call; only the method name and the way you read the response change, so adding streaming to a working first call is a tiny diff. (The matching IAM action is bedrock:InvokeModelWithResponseStream, which the section III policy already grants.)
Finally, handle the small set of errors Bedrock surfaces so failures are legible rather than cryptic. The four you will meet most are AccessDeniedException (missing IAM permission or model access not enabled — see sections III/IV), ValidationException (a bad request, most often a wrong or stale modelId), ThrottlingException (you exceeded the model's request/token rate — back off and retry, or move steady high volume to Provisioned Throughput), and ModelTimeoutException / ModelErrorException (a transient model-side issue — retry with backoff). Wrapping calls in a try/except and retrying throttles and timeouts with exponential backoff is the difference between a demo and something that survives real traffic.
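The retry advice above can be sketched as a small wrapper. This is one possible shape, not the SDK's own mechanism: `call_with_backoff`, `error_code`, and the `RETRYABLE` set are hypothetical names, and the code assumes exceptions shaped like botocore's ClientError, whose `response["Error"]["Code"]` carries the error name.

```python
import random
import time

# Error codes treated as transient; everything else is raised immediately.
RETRYABLE = {"ThrottlingException", "ModelTimeoutException", "ModelErrorException"}


def error_code(exc):
    """Read the AWS error code from a ClientError-style exception, if present."""
    response = getattr(exc, "response", None)
    if isinstance(response, dict):
        return response.get("Error", {}).get("Code")
    return None


def call_with_backoff(fn, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying transient errors with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if error_code(exc) not in RETRYABLE or attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))
```

Usage would look like `call_with_backoff(lambda: brt.converse(modelId=..., messages=...))`. boto3 also has built-in retry settings (a `botocore.config.Config` with a `retries` dict passed to the client), which may be enough on its own for many workloads.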
# Streaming: same request, iterate the event stream (reuses the brt client from the first call)
stream = brt.converse_stream(
    modelId="anthropic.claude-haiku",  # placeholder: use the exact id from the console
    messages=[{"role": "user", "content": [{"text": "Explain Bedrock in 5 lines."}]}],
    inferenceConfig={"maxTokens": 512},
)["stream"]
for event in stream:
    if "contentBlockDelta" in event:
        # .get() because tool-use deltas carry no "text" key
        print(event["contentBlockDelta"]["delta"].get("text", ""), end="", flush=True)
# Non-streaming: inspect stopReason + token usage for cost
print(resp["stopReason"]) # end_turn | max_tokens | tool_use
print(resp["usage"]) # {inputTokens, outputTokens, totalTokens}
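When streaming, the stop reason and token usage arrive as their own events rather than as top-level response keys. The sketch below gathers them while still emitting text as it arrives; `collect_stream` is a hypothetical helper, and it assumes the event names `contentBlockDelta`, `messageStop`, and `metadata` from the ConverseStream event stream.

```python
def collect_stream(events, on_text=None):
    """Consume a converse_stream event iterator; return (text, stop_reason, usage)."""
    parts, stop_reason, usage = [], None, {}
    for event in events:
        if "contentBlockDelta" in event:
            chunk = event["contentBlockDelta"]["delta"].get("text")
            if chunk:  # tool-use deltas carry no "text" key
                parts.append(chunk)
                if on_text:
                    on_text(chunk)  # e.g. push the chunk to the UI immediately
        elif "messageStop" in event:
            stop_reason = event["messageStop"].get("stopReason")
        elif "metadata" in event:
            usage = event["metadata"].get("usage", {})
    return "".join(parts), stop_reason, usage
```

Pass it the `["stream"]` value returned by `converse_stream`, with `on_text=lambda t: print(t, end="", flush=True)` to reproduce the live printing above while also capturing usage for cost logging.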
A working Converse call is the foundation, not the finish line. The path from here splits into capability (what you build on top) and economics (what it costs and how to fund it). Both matter from the first week.
On capability, Bedrock turns raw inference into applications through managed features you configure rather than build. The most common next moves: add tool use (function calling) so the model can call your code, then graduate to Agents for multi-step task orchestration; point a Knowledge Base at documents in S3 for managed retrieval-augmented generation with citations; wrap the model in Guardrails to filter harmful content and redact PII; and use model evaluation to choose the right model on your own data instead of vendor benchmarks. Deep dives live at Bedrock Agents, Knowledge Bases, RAG on AWS, and Guardrails. For the full platform map, the flagship reference is Amazon Bedrock — the complete guide, with model-specific setup at Claude on Bedrock and access mechanics at how to access Amazon Bedrock.
On economics, Bedrock is cheap per call and expensive in aggregate. A quickstart costs pennies; a retrieval-augmented assistant that resends a large system prompt and retrieved context on every turn, serving thousands of users, can move into five or six figures a month faster than teams expect — especially if every call hits a frontier model. The levers are concrete: route cheap, high-volume calls to small models and escalate only hard steps to frontier models; run latency-tolerant work as batch (~50% cheaper); enable prompt caching so repeated context is not re-billed at full price every turn; and reserve Provisioned Throughput only once volume is high and steady. The API mechanics live at the Bedrock API reference and the numbers at the pricing breakdown in the main guide.
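The "cheap per call, expensive in aggregate" arithmetic is easy to make concrete from the usage field. The sketch below uses placeholder per-1,000-token prices that are NOT real Bedrock rates; `estimate_cost` is a hypothetical helper, and you would substitute the current prices for your model and Region.

```python
def estimate_cost(usage, input_price_per_1k, output_price_per_1k):
    """Estimate the USD cost of one call from a Converse usage dict."""
    return (
        usage["inputTokens"] / 1000 * input_price_per_1k
        + usage["outputTokens"] / 1000 * output_price_per_1k
    )


# Placeholder prices, NOT real Bedrock rates: look up your model's current pricing.
usage = {"inputTokens": 2000, "outputTokens": 500, "totalTokens": 2500}
cost = estimate_cost(usage, input_price_per_1k=0.001, output_price_per_1k=0.005)
print(f"${cost:.4f} per call, ${cost * 100_000:.2f} per 100,000 calls")
```

With these made-up numbers a call costs under half a cent, yet 100,000 such calls cost hundreds of dollars, which is the gap that routing, batch, and caching are meant to close.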
The other lever is funding the bill with AWS's money rather than your own. AWS runs credit programs built precisely for teams standing up generative AI on Bedrock — dedicated Bedrock / GenAI proof-of-concept funding ($10K–$50K) for a defined build, Activate Portfolio (up to $100K) for institutionally-funded startups, and the competitive Generative AI Accelerator (up to $1M) for AI-first companies. These pools are largely partner-filed and invisible on the public Activate page. This is exactly what CloudRoute does: we route you to a vetted AWS partner who files the credit application and, if you want hands, builds the Bedrock workload with you — and because AWS funds both the credits and the partner engagement, you pay $0. See AWS credits for generative-AI startups, AWS PoC / Bedrock POC funding, and $100K AWS credits.
| Phase | What you do | Tooling | Done when |
|---|---|---|---|
| Prereqs | Account, pick Region + starter model | AWS console | Region chosen, model decided |
| Install | AWS CLI v2 + boto3 | CLI, pip, venv | aws --version + import boto3 work |
| Credentials | IAM creds + least-privilege policy | aws configure / SSO, IAM | aws sts get-caller-identity prints your ARN |
| Model access | Request access per-model, per-Region | Bedrock console | Status reads "Access granted" |
| First call | converse() on bedrock-runtime | boto3 | You print a completion |
| Harden | Streaming, usage logging, error handling | boto3, CloudTrail | Streams, retries throttles, logs tokens |
| Scale | Routing, batch, caching, credits | Bedrock features + AWS credits | Cost controlled and/or credit-funded |
During setup the first real choice is which API to call. For a chat-style first call the answer is almost always Converse: it removes provider-specific request bodies and makes switching models a one-line change. The table below compares the three options at a glance.
| Dimension | Converse API (recommended) | ConverseStream | InvokeModel (low-level) |
|---|---|---|---|
| Request/response shape | One schema across all chat models | One schema, streamed | Provider-specific JSON per model |
| Switching models | Change modelId only | Change modelId only | Rewrite the body per provider |
| Multi-turn + system prompt | First-class | First-class | You assemble it manually |
| Tool use (function calling) | Built-in | Built-in | Provider-specific |
| Streaming output | No (use ConverseStream) | Yes | Via InvokeModelWithResponseStream |
| Best for at setup | Your first chat call + most apps | Responsive / user-facing UIs | Image & embedding models, edge params |
| IAM action | bedrock:InvokeModel | bedrock:InvokeModelWithResponseStream | bedrock:InvokeModel |
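To make the "provider-specific JSON" row concrete, here is a sketch of the same kind of request through InvokeModel for an Anthropic model. It assumes the Anthropic Messages body format (`anthropic_version`, `max_tokens`, `messages`) that Anthropic models on Bedrock accept; other providers expect different bodies, which is the portability cost Converse removes. `invoke_anthropic` is a hypothetical helper name.

```python
import json


def invoke_anthropic(client, model_id, prompt, max_tokens=512):
    """Call an Anthropic model through InvokeModel using its native JSON body."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": prompt}]}
        ],
    }
    response = client.invoke_model(modelId=model_id, body=json.dumps(body))
    payload = json.loads(response["body"].read())  # body is a streaming object
    return "".join(
        block["text"] for block in payload["content"] if block.get("type") == "text"
    )
```

Compare this with the Converse call earlier: the body layout, the field names, and the response parsing here are all specific to one provider and would need rewriting for another.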
Situation: The team had a working Bedrock quickstart running on one engineer's laptop — a single Converse call against a small model — but no path from there to something they could ship. They needed proper IAM (not a personal access key), a Region decision, streaming for the IDE plugin, retrieval over a customer's repo, and a handle on cost before they pointed real traffic at it. With two backend engineers and no ML or AWS specialist, the gap between "first call works" and "production-ready" was the blocker, and they had no budget to burn on inference while they figured it out.
What CloudRoute did: Routed within 19 hours to a US-East AWS partner with a GenAI + developer-tools track record. The partner hardened the setup on Amazon Bedrock: a least-privilege IAM role scoped to specific model ARNs (replacing the laptop access key), ConverseStream wired into the IDE plugin, a Knowledge Base over the repo in S3 for grounded code review, Guardrails for secret/PII redaction, and model routing (a small model for triage, a frontier model only for the hard review steps) with prompt caching on the shared system prompt. In parallel the partner filed a Bedrock/GenAI proof-of-concept credit application and an Activate Portfolio application.
Outcome: GenAI POC credits ($25K) approved in under two weeks and Portfolio ($100K) shortly after — the first several months of inference were fully credit-funded while the product found traction. Streaming code-review assistant in private beta in 4 weeks, cost per review down roughly 60% from routing plus caching versus the all-frontier-model prototype. CloudRoute's commission was paid by the partner from AWS engagement funding; the customer paid $0.
time-to-match: < 24h · credits secured: $125K · cost/review: ~60% lower · cost to customer: $0