A complete, neutral reference for the safety and policy layer on Amazon Bedrock: what Guardrails do, every filter type — content filters (hate, insults, sexual, violence, misconduct, and prompt-attack), denied topics, word and profanity filters, sensitive-information/PII detection and redaction, and contextual grounding + relevance for hallucination control — how to configure and version them, how to apply one guardrail to any model (and to Agents) via the ApplyGuardrail API, how to test, the HIPAA/PCI/SOC 2 compliance angle, what it costs, the real limitations, and how AWS credits make it $0 to build.
Foundation models are powerful and, left alone, unconstrained — they will answer questions you do not want answered, repeat sensitive data, and occasionally make things up. Guardrails are the configurable policy layer that makes a model's behavior safe and predictable enough to put in front of real users.
An Amazon Bedrock Guardrail is a named, configurable set of safety and policy controls that Bedrock applies to model interactions. You define the policy — what content to block, which topics are off-limits, what words to filter, what sensitive data to detect and redact, and how strictly to ground answers in source material — and Bedrock evaluates both the user input (the prompt) and the model output (the response) against it. Anything that violates the policy is blocked (replaced with a message you configure) or, for sensitive data, redacted.
The critical architectural point is that a guardrail is independent of the model. It is not a property of Claude or Nova or Llama; it is a separate object you create once and then apply. That means the same guardrail enforces the same policy no matter which model a request goes to — and you can swap the underlying model without re-implementing your safety rules. It also means the guardrail is a single, auditable place where your AI policy lives, which is exactly what a compliance or security reviewer wants to see.
Guardrails address several distinct risks at once: harmful content (hate, violence, sexual, etc.), off-topic or out-of-scope use (a banking assistant being steered into giving medical advice), prompt-injection / jailbreak attempts, sensitive-data leakage (PII in either direction), and hallucination (answers not grounded in the provided source). Most production deployments need several of these, and a single guardrail can enforce all of them simultaneously.
A guardrail is also the safety boundary for higher-level Bedrock capabilities. You can attach one to a Bedrock Agent (so an autonomous agent that takes actions and reads untrusted documents stays in policy) and use it alongside Knowledge Bases (so retrieved content and generated answers are screened and grounded). See the amazon-bedrock-agents and amazon-bedrock-knowledge-bases siblings.
A Bedrock Guardrail is a model-independent policy object: define your safety rules once (content filters, denied topics, word filters, PII detection/redaction, grounding checks) and Bedrock enforces them on every input and output — for any model, for Agents, and even for models outside Bedrock via the ApplyGuardrail API.
A guardrail is built from several independent policy types, each targeting a different risk. You enable the ones you need and tune their strength. Understanding what each catches — and what it does not — is the core of configuring a guardrail well.
There are five main policy families, plus the prompt-attack control that lives within content filters. They compose: a single guardrail can run content filters, a denied-topics list, word filters, PII detection, and grounding checks all at once, on both the prompt and the response.
Content filters detect and block harmful content across categories — typically hate, insults, sexual, violence, and misconduct — plus a dedicated prompt-attack filter (covered below). For each category you set a strength (e.g., none / low / medium / high): higher strength blocks more aggressively but risks false positives on legitimate content, lower strength is more permissive. Filters apply independently to the input (catching a user trying to elicit harmful output) and the output (catching the model producing it). Tuning these thresholds to your domain is the main calibration task — a gaming community and a children's-education app want very different settings.
A specific content-filter category targets prompt attacks: attempts to jailbreak the model or inject instructions that override your system prompt ("ignore your instructions and…"). This applies to user input and is one of the most important controls for anything public-facing, and essential for Agents, which additionally read tool outputs and retrieved documents that may carry injected instructions. It is not a complete defense on its own — prompt injection is an evolving threat — but it raises the bar materially and should be on for any untrusted-input scenario.
A denied topic is a subject you define in natural language (a short name plus a description, optionally with example phrases) that the guardrail will refuse to engage with. This is how you keep an assistant on-mission: a banking chatbot can deny "investment advice" and "medical advice"; an internal HR bot can deny "legal advice." When a request or response touches a denied topic, the guardrail blocks it and returns your configured refusal message. Denied topics are about scope and policy, distinct from content filters' focus on harm — together they cover both "this is harmful" and "this is simply not something we do."
Word filters block specific terms. There are two flavors: a built-in profanity filter (a managed list Bedrock maintains) you can toggle on, and a custom word/phrase list you supply — competitor names, internal codenames, slurs specific to your domain, or any exact strings you never want surfaced. This is the bluntest, most deterministic control (it is literal matching, not model judgment), which makes it predictable and cheap but limited to terms you can enumerate. It complements the model-judgment filters rather than replacing them.
This policy detects sensitive information — personally identifiable information such as names, emails, phone numbers, addresses, Social Security / national-ID numbers, credit-card numbers, and more — in both input and output. For each type you choose an action: block the interaction entirely, or redact/mask the value (replacing it with a placeholder like {PHONE}) so the conversation can continue without exposing the data. You can also define custom patterns via regular expressions for organization-specific identifiers (account numbers, policy IDs, MRNs). This is the control most directly tied to privacy regulation — it prevents users from leaking PII into prompts and prevents the model from emitting it in responses.
The grounding policy is the hallucination control, and it is especially important for RAG. Contextual grounding checks whether the model's answer is actually supported by the source material provided (e.g., the passages a Knowledge Base retrieved); a low grounding score means the model is asserting things the source does not support. Relevance checks whether the answer is actually responsive to the user's question. You set a threshold for each; answers that fall below it can be blocked or flagged. This turns "the model made something up" from an invisible failure into an enforceable policy — you are no longer trusting the model to be grounded, you are measuring it and acting on the score.
| Filter type | What it catches | Applies to | Action | How it decides | Primary use |
|---|---|---|---|---|---|
| Content filters | Hate, insults, sexual, violence, misconduct | Input + output | Block | Model judgment (tunable strength) | Harmful content |
| Prompt-attack filter | Jailbreaks / prompt injection | Input | Block | Model judgment | Untrusted input, Agents |
| Denied topics | Out-of-scope subjects you define | Input + output | Block | Model judgment vs your descriptions | Keeping the assistant on-scope |
| Word / profanity filters | Exact terms + managed profanity list | Input + output | Block | Literal matching | Banned exact terms |
| Sensitive info (PII) | PII + custom regex patterns | Input + output | Block or redact | Detection + your regex | Privacy / data-leak prevention |
| Contextual grounding + relevance | Ungrounded / off-question answers | Output (vs source) | Block or flag (threshold) | Grounding + relevance scoring | Hallucination control (RAG) |
A guardrail is configured once and then versioned like any production artifact, so you can iterate safely and roll back. The configuration is deliberately declarative — you describe the policy, not the enforcement code.
You create a guardrail in the Bedrock console, via the API/SDK, or with infrastructure-as-code (CloudFormation/CDK/Terraform — the right choice for production, since the guardrail is a compliance artifact that belongs in version control). In the configuration you enable the policy types you need (content filters with per-category strengths, denied topics with descriptions, word/profanity lists, PII types with block-or-redact actions and any custom regex, grounding/relevance thresholds) and you set the blocked-message text that users see when the input or the output is blocked — separately for each direction.
Guardrails are versioned. As you edit, you work against a working draft; when the policy is right, you publish an immutable, numbered version. Versions never change, which is what makes them safe to run in production and auditable after the fact. To apply a guardrail at inference time you reference a specific guardrail ID and version, so updating your policy is a deliberate act — you publish a new version and point your application or agent at it — and rolling back is simply referencing the previous version. This versioning is also why guardrails make good evidence in an audit: you can show exactly which policy was in force at a given time.
Because a guardrail is a model-independent, versioned policy object, the production pattern is to define it in infrastructure-as-code, publish numbered versions, and reference a specific version at inference time. Your AI safety policy becomes a reviewable, auditable, roll-back-able artifact — not settings buried in a console.
The reach of Guardrails is what makes them strategically useful: one policy can protect direct model calls, autonomous agents, and even models that do not run on Bedrock at all. There are three ways to apply a guardrail.
The most common pattern: when you call a model through the Converse or InvokeModel API, you pass a guardrail ID and version as parameters. Bedrock evaluates the input against the guardrail before the model sees it and evaluates the output before returning it to you. Because the guardrail is model-independent, the same guardrail works across every foundation model on Bedrock — Claude, Amazon Nova, Llama, Mistral, and the rest — so you can change models without touching your safety policy.
You can attach a guardrail directly to a Bedrock Agent, so every interaction the agent has — user input, and the model output at each step of its orchestration loop — is screened. This is important precisely because agents are higher-risk: they take real actions and read untrusted tool outputs and retrieved documents that can carry prompt-injection. The attached guardrail enforces content, topic, PII, and prompt-attack policy throughout the agent's operation, making it a non-optional control for any production agent.
The ApplyGuardrail API decouples the guardrail from any specific model call entirely. You send a piece of text (input or output) to ApplyGuardrail and get back the guardrail's assessment — whether it was blocked, what was redacted, which policies triggered — without invoking a model. This is powerful for two reasons. First, you can screen content at any point in your own pipeline (validate user input before doing anything, screen content from another system, post-process an answer). Second, and notably, it lets you apply Bedrock Guardrails to models that are not hosted on Bedrock — a self-hosted open-weight model, a third-party API, or a model on SageMaker — by calling ApplyGuardrail on the text before and after you call that other model. Your safety policy becomes portable across your entire AI stack, not just the Bedrock-hosted parts.
| Method | How | Evaluates | Works with non-Bedrock models? | Best for |
|---|---|---|---|---|
| Inline on Converse / InvokeModel | Pass guardrail ID + version on the call | Input + output of that call | No | Standard model calls on Bedrock |
| Attached to an Agent | Configure the guardrail on the agent | Every agent input + output | No (agent runs on Bedrock) | Production agents |
| ApplyGuardrail API | Send text to the API directly | Any text you submit | Yes — model-agnostic | Custom pipelines, external/self-hosted models |
A guardrail is a control, and controls have to be tested and evidenced. Bedrock supports iterating on a guardrail before you ship it, and the resulting policy maps cleanly onto common regulatory and audit requirements.
Before relying on a guardrail, you test it. The Bedrock console provides a test window where you can submit sample inputs (and responses) against a draft guardrail and see exactly which policies trigger, what gets blocked, and what gets redacted — without wiring it into your application. The disciplined approach is to build a test set of representative and adversarial examples — benign requests that must pass, harmful requests that must block, jailbreak attempts, PII-laden inputs, off-topic questions — and run them against the guardrail as you tune thresholds, then keep that set as a regression suite (scriptable via the API, including ApplyGuardrail). The goal is the right balance: strict enough to catch real violations, loose enough not to block legitimate use. Both failure modes (false negatives that let violations through, false positives that frustrate users) are real, and only testing reveals where your thresholds sit.
Guardrails are frequently the control that makes a generative-AI feature deployable in a regulated context. The mapping is direct: PII detection and redaction supports privacy and data-minimization obligations and is central to handling protected health information under HIPAA and cardholder data under PCI DSS — you can detect and redact PHI or card numbers before they are stored, logged, or returned. Denied topics and content filters support acceptable-use and safety obligations. Versioning provides the auditable evidence a SOC 2 assessor expects — you can demonstrate which policy was in force and that it was enforced on every interaction.
Two honest caveats. First, a guardrail is a technical control, not a compliance certification — meeting HIPAA/PCI/SOC 2 is an organizational program (BAAs, scoping, access controls, logging, audits) of which a guardrail is one part, not the whole. Bedrock itself is offered with the AWS compliance posture (including services in scope for HIPAA eligibility and SOC reports — confirm current scope on AWS's compliance pages), but you still have to architect and operate the rest. Second, no automated filter is perfect; treat the guardrail as defense-in-depth alongside data handling, IAM least-privilege, and logging — not as a single point you can fully outsource the risk to.
A reviewer wants three things a guardrail provides: a single place the AI policy is defined, evidence it was enforced on every input and output, and a version history showing which policy was in force when. PII redaction maps to HIPAA/PCI; denied topics and content filters map to acceptable-use; versioning maps to SOC 2 change-control. The guardrail is a control, not a certificate — it is one part of the program.
Guardrails are inexpensive relative to the model they protect, but they are not free and not infallible. Here is the honest picture of both the bill and the boundaries.
Guardrails are billed on the text they evaluate, on top of the underlying model token cost — there is no flat platform fee. Pricing is generally per unit of text assessed, and it varies by policy type: the model-judgment policies (content filters, denied topics, contextual grounding) are priced per amount of text processed, while word filters and PII detection are typically cheaper or, in some cases, included. The contextual-grounding check tends to be the most notable line because it evaluates the answer against the source material. In practice the guardrail is usually a small fraction of total cost — the model tokens dominate — but it does scale with traffic and with how many policies you enable, so for very high volume it is worth measuring. As always, figures move; confirm current rates on the AWS Bedrock pricing page, and see the amazon-bedrock-pricing sibling for the model-token side.
Guardrails are strong but bounded, and deploying them well means knowing the edges:
Not every prototype needs a guardrail, but most things that face users or touch regulated data do. Here is a practical view of when to add one and how to start.
You almost certainly need a guardrail when the application is customer- or public-facing (you cannot control what users send), when it handles personal or regulated data (PII, PHI, cardholder data), when it is an autonomous agent (higher blast radius and untrusted inputs), or when you operate in a regulated industry and need auditable controls. You can often skip it for a purely internal, low-stakes prototype with trusted users and no sensitive data — though even then, grounding checks can be worth it for quality.
A sensible starting configuration for a customer-facing assistant: enable content filters at medium strength across categories and turn the prompt-attack filter on; add denied topics for anything clearly out of scope; turn on PII detection set to redact (not block) for common identifiers so conversations keep flowing while data is masked, with custom regex for your own identifier formats; and for any RAG-backed answers, enable contextual grounding and relevance with a moderate threshold. Then run your adversarial test set, watch for false positives in real traffic, and adjust strengths from there. Start slightly stricter and loosen with evidence — it is easier to relax a guardrail than to explain a leak.
Customer-facing assistant, day one: content filters at medium + prompt-attack on, denied topics for out-of-scope subjects, PII detection set to redact (plus custom regex for your IDs), and contextual grounding + relevance for any RAG answers. Test against an adversarial set, then tune with real-traffic evidence. Version it and reference the version explicitly.
A scannable summary of every guardrail policy type: what it catches, what it does to violations, how it makes the decision, and whether it leans on model judgment or literal matching. Use it to decide which policies your workload needs.
| Filter type | Catches | Action on violation | Decision method | Tunable? | Most important for |
|---|---|---|---|---|---|
| Content filters | Hate, insults, sexual, violence, misconduct | Block | Model judgment | Yes — per-category strength | Any user-facing app |
| Prompt-attack filter | Jailbreaks / prompt injection | Block | Model judgment | Yes (strength) | Public input + Agents |
| Denied topics | Out-of-scope subjects you define | Block | Model judgment vs descriptions | Yes — your definitions | Scoped assistants |
| Word / profanity filters | Exact terms + managed profanity list | Block | Literal matching | Yes — your word list | Banned exact terms |
| Sensitive info (PII) | PII + custom regex identifiers | Block or redact | Detection + regex | Yes — per type, regex | Privacy / HIPAA / PCI |
| Contextual grounding + relevance | Ungrounded / off-question answers | Block or flag (threshold) | Grounding + relevance scoring | Yes — thresholds | RAG / hallucination control |
Situation: The team wanted a patient-facing assistant that answered questions from their clinical knowledge base, but they were building in a HIPAA-conscious environment and could not ship anything that might emit PHI, drift into giving diagnoses (out of scope and a liability), or hallucinate medical claims not supported by their vetted content. They also needed something they could show a security reviewer, and they did not want to spend scarce seed runway on inference while still pre-revenue.
What CloudRoute did: CloudRoute matched them in under 24 hours to an AWS partner with healthcare and Bedrock experience. The partner built the assistant on a Knowledge Base over the clinical content and put a single Bedrock Guardrail in front of it: PII/PHI detection set to redact (with custom regex for medical-record numbers); denied topics covering "diagnosis" and "treatment recommendation" to keep it informational; content filters and the prompt-attack filter on; and contextual grounding + relevance thresholds so any answer not supported by the retrieved clinical passages was blocked. They built an adversarial test set, tuned thresholds in the console, versioned the guardrail in infrastructure-as-code, and referenced the version explicitly. The partner also filed a Bedrock POC credit application plus an Activate application to fund the build.
Outcome: The assistant answered only from grounded clinical content, redacted PHI in both directions, refused out-of-scope medical advice, and gave the team a single versioned policy artifact to put in front of their security reviewer. Inference, the knowledge base, and Guardrails were fully covered by the approved AWS credits, so the build ran at $0 out of pocket. CloudRoute's commission was paid by the partner from AWS engagement funding, not by the customer.
guardrail: PII-redact + denied topics + grounding + prompt-attack · versioned in IaC · credits: POC + Activate · out-of-pocket: $0
Whatever your guarded, compliant GenAI workload would cost on Bedrock, AWS credits can cover it. CloudRoute routes you to the right credit pool (Activate up to $100K, Bedrock POC $10K–$50K, GenAI Accelerator up to $1M) and a vetted AWS partner who builds it right — guardrails, PII redaction, grounding, the audit trail. Customer pays $0.