ai content moderation on aws · the 2026 build guide

AI content moderation on AWS (2026).

Moderating user-generated content — text, images, and video — is a pipeline problem, not a single API call. This is the full build guide: the reference architecture, which AWS service does what (Amazon Bedrock Guardrails plus LLM classification for text, Amazon Rekognition for image and video, Amazon Comprehend for PII), the pre-publish vs asynchronous decision, severity tiers and the actions they trigger, the human-review queue and appeals flow, policy configuration, the latency budget, and what it actually costs at scale.

modalities
text · image · video
core services
Guardrails · Rekognition · Comprehend
moderation modes
pre-publish + async
credits to fund it
up to $100K
TL;DR
  • Content moderation on AWS is a multi-service pipeline, not one product. For text, you combine Amazon Bedrock Guardrails (configurable content filters, denied topics, word filters) with an LLM classifier on Bedrock for nuanced, policy-aware judgment. For images and video, Amazon Rekognition Content Moderation returns a hierarchical taxonomy of labels with confidence scores. For PII, Amazon Comprehend detects and lets you redact personal data in text.
  • The central design choice is pre-publish (synchronous) vs asynchronous moderation. Pre-publish blocks content before it goes live — required for high-risk surfaces (a dating app, a children's product) but it adds latency to the post action. Async moderates after publish on an event-driven queue (S3 → EventBridge → Lambda) — lower friction, cheaper, right for lower-risk surfaces, but bad content is briefly visible. Most real systems run both: synchronous on the riskiest fields, async everywhere else.
  • The hard parts are policy design, severity tiers, and the human-review loop — not the API wiring. You map each detection to an action (allow / redact / hold-for-review / auto-reject), route borderline cases to a human queue (Amazon A2I or your own), and give users an appeals path. GenAI inference and per-image/minute moderation bills scale with volume; CloudRoute routes you to AWS credits (Activate Portfolio up to $100K, Bedrock/GenAI POC $10K–$50K, GenAI Accelerator up to $1M) and a vetted partner who builds the pipeline — you pay $0.
the core idea

IWhat "content moderation on AWS" actually means

Any product that lets users post text, upload images, or stream video inherits a moderation obligation: keep harmful, illegal, off-policy, and privacy-violating content off your surface without burying your team in manual review. On AWS this is assembled from a few managed services, each handling a different modality, behind a pipeline you design.

Moderation is fundamentally a classification-then-action problem. For every piece of user-generated content (UGC) you have to answer two questions: is this a policy violation, and how severe? and then what do we do about it — allow, redact, hold for a human, or reject? The first question is where AWS's AI services earn their place; the second is policy and workflow you own. Conflating the two is the most common design mistake — teams wire up a detection API, get a confidence score back, and then have nowhere to put the borderline cases.

No single AWS service moderates everything, and that is by design — the modalities are genuinely different problems. Text needs language understanding and policy nuance (sarcasm, context, coded language), which is why it pairs Amazon Bedrock Guardrails (a configurable policy layer) with an LLM classifier on Bedrock. Images and video need computer vision, which is Amazon Rekognition Content Moderation. Personal data hiding inside any text needs entity detection, which is Amazon Comprehend. A real product almost always needs at least two of these, and the architecture is the glue that turns three detection services into one coherent moderation decision.

It is worth being explicit about what moderation is not here. It is not the same as output safety on your own generative AI — that is what Bedrock Guardrails does for model responses, covered in the amazon-bedrock-guardrails sibling. The overlap is real (Guardrails is useful for both), but moderating user-submitted content at scale adds problems a generation-time guardrail does not face: images and video, asynchronous high-volume queues, human-review backlogs, appeals, and audit trails for trust-and-safety reporting. This page is about that broader UGC pipeline.

On AWS, every stage of that pipeline maps to a managed service, which is why it is a common place to build trust-and-safety infrastructure: you get the detection models, the event plumbing (S3, EventBridge, SQS, Lambda, Step Functions), the human-review tooling (Augmented AI), and the storage/audit layer in one account. The next section walks the full reference architecture.

the one-sentence definition

AI content moderation on AWS = classify every piece of user-generated content (text with Bedrock Guardrails + an LLM, images/video with Rekognition, PII with Comprehend), map each result to a severity tier and an action (allow / redact / hold-for-review / reject), and route borderline cases to a human queue with an appeals path.

who does what

IIThe four building blocks — Bedrock Guardrails, an LLM classifier, Rekognition, and Comprehend

Before the architecture, it helps to know exactly what each service contributes, what it returns, and where its judgment is strong or weak. These four are the moderation core; everything else in the pipeline is plumbing around them.

The split is by modality and by the kind of judgment required. Two services handle text (one for fast policy enforcement, one for nuanced classification), one handles vision, and one handles personal data. They compose: a single uploaded post with a caption and an image can pass through all four.

Amazon Bedrock Guardrails — fast, configurable text policy

Bedrock Guardrails is a configurable safety/policy layer: content filters across categories (hate, insults, sexual, violence, misconduct) each with a tunable strength, denied topics you define in natural language, word and profanity filters (a managed list plus your own exact terms), and sensitive-information detection. In a moderation pipeline its role is the first, deterministic-ish pass on text: it is fast, cheap relative to a full LLM call, and catches the clear-cut cases (explicit slurs, obvious harmful categories, banned terms) without you writing a prompt. You can run it standalone — without invoking a generation model — via the ApplyGuardrail API, which is exactly how you use it to screen incoming user text rather than a model's output. See the amazon-bedrock-guardrails sibling for the full filter breakdown.

An LLM classifier on Bedrock — nuanced, policy-aware judgment

Guardrails catch the obvious; a lot of harmful content is not obvious. Context-dependent harassment, coded hate speech, scams, self-harm content, spam, and "is this on-policy for our specific community" are judgment calls. The pattern is to send the text (and the relevant policy) to a foundation model on Bedrock — Claude, Amazon Nova, Llama, or Mistral — with a structured prompt: "Given this community policy, classify this content into {allow, review, reject}, assign a severity, name the violated rule, and return JSON." This LLM-as-a-moderator step is where your actual policy lives, encoded in the prompt, and it handles the nuance no fixed filter can. It is more expensive and slower than Guardrails, so you typically run it only on content Guardrails did not already clear or reject outright. The two together — Guardrails for the cheap clear-cut pass, an LLM for the nuanced remainder — is the standard text-moderation shape on AWS.

Amazon Rekognition Content Moderation — images and video

Amazon Rekognition is AWS's managed computer-vision service, and its Content Moderation feature detects inappropriate, unwanted, or offensive content in images and video. It returns a hierarchical taxonomy of moderation labels — top-level categories (e.g. Explicit, Non-Explicit Nudity, Violence, Visually Disturbing, Drugs & Tobacco, Alcohol, Hate Symbols, Gambling) each with more specific sub-labels — and a confidence score per label. For images you call DetectModerationLabels synchronously; for video you start an asynchronous StartContentModeration job (Rekognition Video) that processes frames over time and returns labels with timestamps. You set a minimum confidence threshold, and you decide which labels (and at which confidence) trigger which action. Crucially, the label taxonomy lets you be selective — an alcohol brand's community may allow the Alcohol label while still rejecting Explicit content.

Amazon Comprehend — PII detection and redaction in text

Amazon Comprehend is AWS's NLP service, and for moderation its relevant capability is PII detection: it finds personally identifiable information in text — names, emails, phone numbers, addresses, Social Security / national-ID numbers, credit-card numbers, bank accounts, and more — and returns the entity types and their locations, so you can redact or mask them. This matters in UGC for two reasons: users doxx each other (posting someone's private data is a policy and often legal violation), and users leak their own sensitive data that you do not want to store or display. Comprehend can run synchronously for real-time redaction or as an async batch job over large volumes. Bedrock Guardrails also does PII detection, so on text you can use either; Comprehend is the specialist when PII is the primary concern or when you are moderating text outside a Bedrock call.

aws moderation services · what each does · representative as of 2026
ServiceModalityWhat it returnsSync / asyncDecision stylePrimary role in the pipeline
Bedrock Guardrails (ApplyGuardrail)TextBlock/allow + which policy triggered + redactionsSynchronousConfigurable filters (tunable strength)Fast first-pass on the clear-cut cases
LLM classifier on BedrockTextStructured label + severity + rule + rationale (JSON)Synchronous (stream or not)Policy-aware model judgment (your prompt)Nuanced classification of the gray zone
Rekognition Content ModerationImage + videoHierarchical moderation labels + confidence (+ timestamps for video)Image: sync · Video: async jobVision model + your confidence thresholdAll visual moderation
Amazon ComprehendTextPII entity types + locations (to redact/mask)Sync or async batchNLP entity detectionPII / doxxing detection + redaction
These compose: one post with a caption + image can hit Guardrails + an LLM + Comprehend (text) and Rekognition (image). Guardrails and Comprehend both detect PII in text — pick one per path. Rekognition is the only one of the four that handles images and video.
end to end

IIIThe reference moderation architecture on AWS

A production moderation system is an event-driven pipeline: content arrives, it is classified by the relevant service(s), a decision engine maps the results to an action, and anything uncertain goes to humans. Understanding each stage is what lets you debug a system that is letting bad content through or over-blocking good content.

It helps to think of the pipeline as four logical stages: intake → classify → decide → act, with a human-review loop hanging off the "decide" stage for borderline cases. The same logical stages run whether you moderate synchronously (pre-publish) or asynchronously (post-publish); only the triggering and the latency budget change, which is the subject of the next section.

Intake — where content enters and how it is triggered

Text typically enters through your application API and is moderated inline (a call to Guardrails / an LLM before the content is committed) or dropped onto a queue. Images and video almost always land in an Amazon S3 bucket first (direct-to-S3 upload via a presigned URL), which is convenient because an S3 object-created event can fire the rest of the pipeline automatically. The two common triggers are a synchronous API call (your app calls the moderation step and waits) or an event (S3 → Amazon EventBridge or S3 notificationsAWS Lambda), which is the backbone of the asynchronous path.

Classify — fan out to the right service(s)

An orchestrator — a Lambda function for simple flows, or AWS Step Functions when there are multiple steps, retries, and branching — routes each item to the relevant detector(s): text to Guardrails and, if needed, the LLM classifier and Comprehend; images and video to Rekognition. Step Functions is the right tool once the flow has real structure (run Guardrails first, only call the expensive LLM on what Guardrails did not resolve, run Rekognition and Comprehend in parallel for a post that has both an image and a caption), because it gives you visual orchestration, built-in retries, and per-execution audit history for free.

Decide — the severity-and-action engine

The detectors return scores and labels; the decision engine (your code, usually in the orchestrating Lambda/Step Function) maps them to an action using your policy: thresholds per label, severity tiers, and the action each tier triggers (covered in detail in the next two sections). This is the heart of the system and the part no AWS service provides for you — it encodes your risk tolerance. Its output is a single decision per item: allow, redact-and-allow, hold-for-human-review, or reject.

Act + review — enforce the decision and loop in humans

On allow, the content publishes (or stays published). On redact, the PII or banned terms are masked and then it publishes. On reject, it is blocked and the user is notified. On hold-for-review, the item goes to a human-review queueAmazon Augmented AI (A2I) provides a managed human-review workflow that integrates with Rekognition and Comprehend, or you can route to your own internal moderation tool (an SQS queue feeding a review UI). Every decision is written to a durable store (DynamoDB for state, S3 for the audit log) so you have a complete trail for trust-and-safety reporting and appeals.

the moderation pipeline stages mapped to AWS services · representative as of 2026
StageWhat it doesTypical AWS services
IntakeContent enters; trigger the pipelineAPI Gateway / your app · S3 (uploads) · EventBridge / S3 notifications
OrchestrateRoute each item to the right detectorsLambda (simple) · Step Functions (multi-step, retries, branching)
Classify — textPolicy + nuanced classification + PIIBedrock Guardrails (ApplyGuardrail) · Bedrock LLM · Comprehend
Classify — image/videoVisual moderation labels + confidenceRekognition Content Moderation (DetectModerationLabels / StartContentModeration)
DecideMap scores/labels → severity tier → actionYour logic in Lambda / Step Functions
Human reviewRoute borderline items to a humanAmazon A2I (managed) · or SQS → your review UI
Act + recordEnforce decision; store state + audit trailDynamoDB (state) · S3 (audit log) · SNS/SQS (notify)
Synchronous (pre-publish) and asynchronous (post-publish) pipelines run the same stages — the difference is whether intake blocks on the decision. Step Functions is the natural orchestrator once the flow branches or needs retries; a single Lambda suffices for one detector.
the central decision

IVPre-publish (synchronous) vs asynchronous moderation

The first real architectural decision is not which model or which threshold — it is whether you moderate content <em>before</em> it is visible (synchronous, pre-publish) or <em>after</em> (asynchronous, post-publish). This single choice determines your latency budget, your cost, and your risk exposure, and most mature systems end up running both.

The honest framing: match the mode to the risk of the surface, not to a blanket rule. Pre-publish everything and you add latency and cost to every action and frustrate users; async everything and harmful content — including content that is illegal or dangerous — is briefly live for everyone to see. Neither extreme is right for a real product. You segment your surfaces by risk and apply the appropriate mode to each.

Pre-publish (synchronous) — block before it goes live

In the synchronous path, your application calls the moderation step and waits for the decision before the content becomes visible. A post is held until Guardrails / the LLM clears it; an uploaded image is not shown until Rekognition returns. This is the right choice for high-risk surfaces where even brief exposure is unacceptable: a profile photo on a dating or children's app, a public broadcast, a first message between strangers, anything where the harm of one visible violation is severe (legal exposure, user safety, brand-ending). The cost is latency — you are adding a model call (tens to low hundreds of milliseconds for Guardrails, more for an LLM, and image moderation on top) to the critical path of the user's action — and higher per-item cost, because you moderate everything up front, including the large majority that is benign.

Asynchronous — moderate after publish, on a queue

In the asynchronous path, content publishes immediately and moderation runs after on an event-driven queue: the upload lands in S3, an event fires a Lambda/Step Function, the detectors run, and if the content turns out to violate policy it is taken down retroactively (and the user actioned). This is the right choice for lower-risk, high-volume surfaces — comments on an established community, a second-degree feed, bulk back-catalog scanning — where the friction of pre-publish is not worth it and a short visibility window for the rare violation is tolerable. It is lower-latency for the user (their action is instant) and often cheaper to operate (you can batch, use lower-priority compute, and the async Rekognition Video and Comprehend batch APIs are built for it). The trade is the visibility window: bad content is live until the pipeline catches it, which for the worst categories is unacceptable — which is exactly why you do not use async for the riskiest surfaces.

The hybrid most systems actually run

In practice, mature moderation is a hybrid. You run synchronous moderation on the few highest-risk fields and surfaces (avatars, public-facing first posts, anything legally sensitive) and asynchronous moderation on the high-volume long tail. You can also tier within the synchronous path: run the cheap, fast Guardrails check inline (it is fast enough), and if it returns "uncertain," publish optimistically but enqueue the expensive LLM/image check async, taking the item down if it fails. This gives you the safety of pre-publish where it matters and the cost/latency of async everywhere else — the best of both, at the price of a more complex pipeline.

the pragmatic rule

Segment surfaces by risk. Synchronous (pre-publish) for anything where one visible violation is unacceptable — avatars, first contact, public broadcasts, regulated/illegal categories. Asynchronous for the high-volume long tail where instant posting matters and a short visibility window for rare violations is tolerable. Most systems run both, and tier the fast cheap check inline with the expensive check deferred.

turning scores into actions

VSeverity tiers and the actions they trigger

Detection gives you scores and labels; moderation requires decisions. The bridge is a <strong>severity-tier model</strong>: you bucket every possible detection into tiers, and each tier maps to exactly one automated action. This is the policy heart of the system, and getting the tiers and thresholds right is most of the work.

The reason you need tiers rather than a single block/allow line is that confidence and severity are continuous and your tolerance is not uniform. A 99%-confidence Explicit label and a 55%-confidence "Visually Disturbing" label should not get the same treatment; neither should an unambiguous slur and a borderline-sarcastic insult. Tiers let you act decisively on the clear cases and route the genuinely uncertain ones to humans, instead of forcing a binary on inherently graded signals. A common four-tier model:

  • Tier 0 — Allow — No violation detected, or detected below your action threshold. Content publishes (or stays published). The large majority of UGC lands here, which is why the cost of moderating it must be low.
  • Tier 1 — Redact-and-allow — The content is fine except for something removable — PII (Comprehend / Guardrails redacts it), a banned word (word filter masks it). You mask the offending span and publish. The user often does not even need to know.
  • Tier 2 — Hold for human review — The signal is in the uncertain band — moderate confidence on a serious label, a nuanced LLM "review" verdict, a context-dependent call. The item is withheld (or, in async, flagged) and routed to the human-review queue. This is the tier that protects you from both false positives and false negatives.
  • Tier 3 — Auto-reject — High-confidence detection of a clear, severe violation (high-confidence Explicit imagery, an unambiguous slur, a denied-topic hit on a zero-tolerance category). The content is blocked automatically and the user is notified; severe or repeat cases may also trigger account action and, where legally required, reporting.

Two principles make tiering work in practice. First, the thresholds are per-label, not global — your auto-reject confidence for Explicit imagery should be different from your auto-reject confidence for Alcohol, and some labels may have no auto-reject tier at all (always route to human). Second, start strict on the worst categories and loosen with evidence: it is easier to relax a threshold after observing false positives than to explain why dangerous content was visible. Calibrate the thresholds against a labeled sample of your real content and revisit them as the human-review queue tells you where the tiers are mis-set.

a four-tier severity model · representative mapping · 2026
TierTriggerActionHuman involved?User notified?Example
0 — AllowNo violation / below action thresholdPublishNoNoBenign comment or photo
1 — Redact-and-allowRemovable issue only (PII, banned term)Mask span, then publishNoSometimesA phone number in a comment
2 — Hold for reviewUncertain band / nuanced verdictWithhold or flag → human queueYesOn final decisionPossible harassment in context
3 — Auto-rejectHigh-confidence severe violationBlock + notify (+ account action)Optional (sampled audit)YesHigh-confidence explicit image
Thresholds are per-label, not global — auto-reject confidence for explicit imagery differs from alcohol or "visually disturbing," and some labels should never auto-reject (always route to Tier 2). Tier 2 is the pressure-relief valve that keeps both false positives and false negatives in check.
humans in the loop

VIThe human-review queue and the appeals flow

Automation handles the clear cases; humans handle the rest. A moderation system without a human path is either dangerously permissive or maddeningly over-strict. Two human workflows matter: the <strong>review queue</strong> for Tier-2 items, and the <strong>appeals flow</strong> for users who think an automated decision was wrong.

The human-review queue

Tier-2 items — the uncertain band — flow to a queue where a human moderator makes the call. On AWS, Amazon Augmented AI (A2I) provides a managed human-review workflow: it integrates directly with Rekognition and Comprehend, lets you define activation conditions (e.g. "send to a human when the moderation confidence is between 50% and 90%"), supplies a review UI, and routes work to a private workforce (your own team), a vendor workforce, or Amazon Mechanical Turk. Alternatively, many teams build their own: the orchestrator drops uncertain items onto an SQS queue that feeds an internal moderation console. Either way, the design goals are the same — prioritize by risk (a possible child-safety issue jumps the queue), give reviewers context (the content, the detector scores, the user's history, the specific policy in question), capture the decision and reason (for audit and for training data), and track queue depth and latency so the backlog never becomes the bottleneck that makes async moderation effectively never-happens.

The appeals flow

Automated moderation will get things wrong — false positives are inevitable — so a credible system gives users a way to contest a decision. The appeals flow is a second human-review path with a different entry point: a user whose content was rejected (or account actioned) requests review, the item re-enters a queue (usually with elevated priority and shown to a different reviewer than any who already touched it), and a human upholds or overturns the decision. Two things make appeals work: transparency (tell the user which policy their content was actioned under, so the appeal is informed rather than a shot in the dark) and a clean audit trail (every state transition — detected, actioned, appealed, re-reviewed, final — recorded, because appeals are exactly where you need to reconstruct what happened and why). Increasingly this is not optional: regulatory regimes (for example, online-safety and platform-transparency rules) expect platforms to offer notice and an appeals mechanism, so the appeals flow is both good product and a compliance requirement.

The feedback loop — human decisions improve the system

Human decisions are not just dispositions; they are labeled data. Every Tier-2 call and every overturned appeal is a signal about where your thresholds and your LLM-classifier prompt are mis-calibrated. The mature pattern feeds these back: track the rate at which humans overturn auto-decisions (a rising overturn rate on a label means the threshold is wrong), use confirmed violations and confirmed false-positives to refine the classifier prompt or policy, and keep a labeled regression set so a prompt or threshold change can be measured rather than guessed at. The human queue, in other words, is also your evaluation and improvement engine.

what a credible review system needs

Tier-2 items routed to a queue (Amazon A2I or your own SQS-fed console), prioritized by risk, with reviewers given full context (content + scores + user history + policy). A separate appeals path with a different reviewer and elevated priority. A complete audit trail of every state transition. And a feedback loop that turns human decisions into threshold and prompt improvements. Track queue depth and overturn rate as first-class metrics.

shipping it for real

VIIPolicy configuration, the latency budget, and cost at scale

Three operational realities separate a moderation demo from a production system: how you encode and version your policy, whether the synchronous path fits your latency budget, and whether the bill survives real volume. Each has a concrete AWS answer.

Policy configuration — encode it, version it

Your moderation policy is spread across several configurable artifacts, and the discipline is to treat all of them as versioned code. The Bedrock Guardrail is a versioned object — define it in infrastructure-as-code and reference a specific version. The LLM-classifier prompt (which encodes your community standards in natural language) belongs in version control and should be evaluated against a labeled set on every change. The Rekognition confidence thresholds and the label-to-tier mapping are configuration that should live in code or a config store, not be hard-coded or hand-set in a console. The payoff is the same as any change-control: you can roll back a policy change that caused a spike in false positives, and you can show an auditor exactly which policy was in force when a given decision was made. A vague policy spread across un-versioned console settings is the most common reason a moderation system drifts.

The latency budget (synchronous path)

For pre-publish moderation, latency is on the critical path of a user action, so you have to budget it. Rough shapes, representative as of 2026 (measure your own): a Guardrails text check is fast — typically tens to low-hundreds of milliseconds; an LLM classification call is the slow step — hundreds of milliseconds to a couple of seconds depending on model and content length; Rekognition image moderation is sub-second per image; Rekognition video is inherently asynchronous (it processes over the clip's duration) so it never belongs in a synchronous budget. The levers to stay in budget: run the cheap fast Guardrails check first and only call the LLM on what it does not resolve; pick a smaller/faster model (e.g. Amazon Nova Micro/Lite) for the classifier when nuance allows; run independent checks (image + caption) in parallel; and for anything that cannot meet the budget synchronously, push it to the async path and publish optimistically. If the synchronous budget cannot be met, that is itself a signal to move that surface to async.

Cost at scale

Moderation cost scales with volume across a few independent line items, and the surprise is usually the LLM-classifier and image volume rather than Guardrails. Bedrock Guardrails bills on the text evaluated; the LLM classifier bills on input+output tokens per call (the biggest text lever — which is why you gate it behind Guardrails and use a small model); Rekognition bills per image analyzed and per minute of video processed; Comprehend bills per unit of text. The levers at scale: moderate async and batch where you can (cheaper than per-call synchronous, and Bedrock batch inference is roughly half price for offline classification); gate the expensive steps (LLM only on the gray zone, not on everything Guardrails already cleared or rejected); down-sample where defensible (you may moderate every image but only a sample of low-risk text); and tier model choice (a cheap model for the first classification pass, a frontier model only for escalations). Figures move — confirm current rates on the AWS pricing pages for Bedrock, Rekognition, and Comprehend, and see the amazon-bedrock-pricing sibling for the model-token side.

moderation cost lines on aws · representative shape as of 2026 — check the AWS pricing pages for current rates
Cost lineBilled onTypically the cost driver?Main lever to control it
Bedrock GuardrailsText evaluated (per unit)Usually smallIt is already the cheap first pass — keep it first
LLM classifier (Bedrock)Input + output tokens per callOften the largest text lineGate behind Guardrails; small model; batch offline (~½ price)
Rekognition imagePer image analyzedLarge at high upload volumeModerate every image but cache/skip duplicates
Rekognition videoPer minute of video processedLarge for video-heavy productsSample frames / segments where policy allows; async only
Comprehend (PII)Per unit of textUsually modestBatch async; or use Guardrails PII on the Bedrock path instead
The biggest savings come from gating the expensive steps (run the cheap Guardrails check first, call the LLM only on the gray zone), batching async workloads (~50% on Bedrock batch), and choosing a small classification model. Vision cost scales with raw upload volume, so deduplicate where you can.
the build, in order

VIIIA step-by-step build outline

Here is the fastest credible path from zero to a working text-and-image moderation pipeline on AWS, structured so you can ship a synchronous text check first and layer in images, async, and human review as you go.

  • Step 1 — Write the policy first — Before any code, write down what you allow, what you redact, what you reject, and what goes to a human — per category. This document becomes your LLM-classifier prompt, your Rekognition label-to-tier mapping, and your Guardrail config. Skipping this is why most moderation builds thrash.
  • Step 2 — Stand up the text fast-pass (Guardrails) — Create a Bedrock Guardrail (content filters at a sensible strength, denied topics for your zero-tolerance subjects, word/profanity filters, PII detection set to redact). Call it via the ApplyGuardrail API on incoming user text. This alone catches the clear-cut cases synchronously.
  • Step 3 — Add the LLM classifier for the gray zone — For text Guardrails did not resolve, call a Bedrock model (start with a small one — Nova or Claude Haiku-class) with a structured prompt that returns JSON: {decision, severity, violated_rule, rationale}. Encode your policy in the prompt. This is your nuanced text judgment.
  • Step 4 — Add image moderation (Rekognition) — Have clients upload images to S3 via presigned URLs. Call Rekognition DetectModerationLabels (sync, for pre-publish) or wire an S3 event → Lambda (async). Map the returned labels + confidence to your severity tiers. For video, use StartContentModeration (async) and handle the completion event.
  • Step 5 — Build the decision engine + severity tiers — In your orchestrator (Lambda for one detector, Step Functions for several), combine the detector outputs and apply your tier mapping to produce a single action per item: allow / redact-and-allow / hold-for-review / reject. Keep thresholds in config, not hard-coded.
  • Step 6 — Wire the human-review queue — Route Tier-2 (hold-for-review) items to Amazon A2I with activation conditions on confidence, or to an SQS queue feeding your own console. Give reviewers full context, capture their decision + reason, and prioritize by risk. Track queue depth and latency.
  • Step 7 — Add appeals, audit trail, and the feedback loop — Let users contest rejections into a separate review path (different reviewer, elevated priority). Record every state transition to DynamoDB/S3 for audit and transparency reporting. Track the human overturn rate per label and feed confirmed decisions back to tune thresholds and the classifier prompt.
  • Step 8 — Choose sync vs async per surface, then load-test — Apply synchronous moderation to your high-risk surfaces and async to the long tail. Measure the synchronous latency budget under load and the cost per 1,000 items at projected volume. Move anything that blows the latency or cost budget to async.
text vs image moderation, side by side

Text moderation vs image/video moderation on AWS

The two modalities are different enough that they shape the whole architecture — different services, different sync/async defaults, different cost units, different failure modes. This comparison is the one that decides how you split your pipeline.

DimensionText moderationImage / video moderation
Primary AWS service(s)Bedrock Guardrails + an LLM classifier on Bedrock (+ Comprehend for PII)Amazon Rekognition Content Moderation
What it returnsBlock/allow + a structured {decision, severity, rule} verdict; PII entitiesHierarchical moderation labels + confidence (+ timestamps for video)
Decision styleConfigurable filters + policy-aware model judgment (your prompt)Vision model output + your per-label confidence thresholds
Sync vs async defaultSynchronous-friendly (Guardrails is fast; LLM is the slow part)Image: synchronous · Video: inherently asynchronous (job-based)
Latency shapeGuardrails: tens–hundreds ms · LLM: hundreds ms–secondsImage: sub-second · Video: processes over the clip duration
Cost unitPer unit of text (Guardrails / Comprehend) + per token (LLM)Per image · per minute of video
Hardest partNuance, context, coded language, sarcasm — prompt + policy designThreshold calibration per label; benign-context false positives
Common pitfallRunning the expensive LLM on everything instead of gating behind GuardrailsOne global threshold for all labels; not deduplicating uploads
Most products need both columns. The pipeline runs them in parallel for content that has both text and an image, and merges the two verdicts into a single severity-tier decision. PII detection in text can come from either Comprehend or Bedrock Guardrails — pick one per path.
building trust & safety for real?
Have a vetted AWS partner build your moderation pipeline — and let AWS credits pay for it
Get matched in 24h →
a recent match

A UGC marketplace moderating text + photos at launch — anonymized

inquiry · seed-stage consumer marketplace, user-generated listings, US
Seed-stage consumer marketplace, 16 people, user-posted listings with free-text descriptions and up to 8 photos each, scaling toward thousands of new listings a day

Situation: Every listing combined free-text and multiple images, and the team needed to keep explicit imagery, harassment, scams, and contact-info doxxing off the marketplace before it damaged trust — without hiring a moderation team on day one. They wanted high-risk fields (the primary listing photo and the first public message) checked pre-publish, but could not afford to add seconds of latency to every post or to run an expensive model on all of the benign long tail. They also had no human-review or appeals story, and the projected Bedrock + Rekognition bill at their target volume made the founder hesitate to start.

What CloudRoute did: CloudRoute matched them in under 24 hours to an AWS partner with a trust-and-safety / GenAI track record. The partner built an event-driven pipeline orchestrated with Step Functions: incoming text hit a Bedrock Guardrail (content filters + denied topics + profanity + PII-redact) as a fast synchronous first pass, with a small Bedrock model (Nova-class) classifying only the gray-zone text into {allow, review, reject} against the marketplace policy; photos uploaded to S3 and were run through Rekognition Content Moderation, synchronously for the primary photo and asynchronously (S3 → EventBridge → Lambda) for the rest; Amazon Comprehend backstopped PII/contact-info detection. A four-tier severity engine mapped results to allow / redact / hold / reject, Tier-2 items routed to Amazon A2I for human review with confidence-based activation, and an appeals path plus a full DynamoDB/S3 audit trail were wired in. The partner filed a Bedrock POC credit application plus an Activate application to fund the build and the first months of inference.

Outcome: The marketplace launched with explicit imagery and clear-cut text violations auto-rejected, PII and contact info redacted, the genuinely borderline cases flowing to a small human queue instead of slipping through, and users able to appeal. High-risk fields were moderated pre-publish without adding latency to the benign majority (gated behind the fast Guardrails pass and the async photo path). The build and the early inference ran entirely on the approved AWS credits, so it cost $0 out of pocket. CloudRoute's commission was paid by the partner from AWS engagement funding, not by the customer.

pipeline: Guardrails + Nova classifier + Rekognition + Comprehend + A2I · sync on high-risk, async on the tail · four-tier severity + appeals · credits: POC + Activate · out-of-pocket: $0

faq

Common questions

What is the best way to do AI content moderation on AWS?
There is no single service — you build a pipeline from the ones that match your modalities. For text, combine Amazon Bedrock Guardrails (a fast, configurable filter pass for the clear-cut cases) with a foundation-model classifier on Bedrock (Claude, Nova, Llama, or Mistral) for nuanced, policy-aware judgment, plus Amazon Comprehend for PII. For images and video, use Amazon Rekognition Content Moderation. An orchestrator (Lambda or Step Functions) routes each item to the right detectors, a decision engine maps the results to a severity tier and an action (allow / redact / hold-for-review / reject), and borderline cases go to a human-review queue (Amazon A2I or your own) with an appeals path. Most systems run high-risk surfaces synchronously (pre-publish) and the long tail asynchronously.
Which AWS service moderates images and video?
Amazon Rekognition, via its Content Moderation feature. For images you call DetectModerationLabels synchronously and get back a hierarchical taxonomy of moderation labels (top-level categories like Explicit, Violence, Drugs & Tobacco, Hate Symbols, each with finer sub-labels) and a confidence score per label. For video you start an asynchronous StartContentModeration job that processes the clip over time and returns labels with timestamps. You set a minimum confidence threshold and decide which labels at which confidence trigger which action — the taxonomy lets you be selective (for example, allowing an Alcohol label while still rejecting Explicit content).
How do Bedrock Guardrails and an LLM classifier work together for text moderation?
They are a two-stage pass. Bedrock Guardrails runs first as the fast, cheap, configurable filter — content filters, denied topics, word/profanity filters, PII detection — and resolves the clear-cut cases (obvious slurs, banned terms, explicit categories, PII to redact) without you writing a prompt. You run Guardrails standalone on user text via the ApplyGuardrail API. Whatever Guardrails does not definitively allow or reject — the nuanced gray zone (context-dependent harassment, coded language, scams) — you pass to a foundation model on Bedrock with a structured prompt that encodes your community policy and returns a JSON verdict (decision, severity, violated rule). Gating the expensive LLM behind the cheap Guardrails pass is both more accurate and much cheaper than running the model on everything.
Should I moderate content before or after it is published?
Match the mode to the risk of the surface. Pre-publish (synchronous) moderation blocks content before it is visible — it is the right choice for high-risk surfaces where even brief exposure is unacceptable (avatars, first contact between strangers, public broadcasts, regulated or illegal categories), at the cost of added latency on the user action and higher per-item cost (you moderate everything up front). Asynchronous moderation publishes immediately and checks afterward on an event-driven queue (S3 → EventBridge → Lambda) — lower friction and cheaper, right for lower-risk high-volume surfaces, but harmful content is briefly visible until caught. Most mature systems run both: synchronous on the few riskiest fields, async on the long tail, often with the fast Guardrails check inline and the expensive LLM/image check deferred.
How do I handle borderline cases and human review on AWS?
Route them with a severity-tier model. Detections that fall in an uncertain confidence band, or that an LLM classifier marks "review," become Tier-2 "hold for human review" items. On AWS, Amazon Augmented AI (A2I) gives you a managed human-review workflow that integrates with Rekognition and Comprehend, lets you set activation conditions on confidence (for example, send to a human between 50% and 90%), and routes work to your own team or a vendor workforce. Alternatively you build your own with an SQS queue feeding an internal console. Either way, prioritize the queue by risk, give reviewers full context (content, scores, user history, the policy), capture the decision and reason for audit, and track queue depth so the backlog does not become the bottleneck.
Do I need an appeals process, and how does it work?
Yes — automated moderation produces false positives, and a credible system (increasingly, a legally required one under online-safety and platform-transparency regimes) lets users contest decisions. The appeals flow is a second human-review path: a user whose content was rejected requests review, the item re-enters a queue with elevated priority and is shown to a different reviewer than any who already handled it, and a human upholds or overturns the decision. Make it work with transparency (tell the user which policy their content was actioned under) and a complete audit trail (every state transition recorded — detected, actioned, appealed, re-reviewed, final). Overturned appeals are also valuable labeled data for re-calibrating your thresholds and classifier prompt.
How can I detect and redact PII in user-generated content on AWS?
Two services do this for text. Amazon Comprehend has dedicated PII detection that finds names, emails, phone numbers, addresses, national-ID and credit-card numbers, bank accounts, and more, returning the entity types and locations so you can mask or redact them — it runs synchronously for real-time redaction or as an async batch job. Amazon Bedrock Guardrails also detects sensitive information (PII) and can block or redact it, plus supports custom regex for organization-specific identifiers. If you are already moderating text through a Bedrock call, use Guardrails; if PII is the primary concern or you are processing large text volumes outside Bedrock, Comprehend is the specialist. Both matter for UGC because users post other people's private data (doxxing) and leak their own.
What does AI content moderation on AWS cost at scale, and can AWS credits cover it?
Cost scales with volume across independent lines: Bedrock Guardrails (per unit of text — usually small), the LLM classifier (per input+output token — often the largest text line, which is why you gate it behind Guardrails and use a small model), Rekognition (per image and per minute of video — large for media-heavy products), and Comprehend (per unit of text). The biggest levers are gating the expensive steps, batching async workloads (Bedrock batch inference is roughly half price), choosing a small classification model, and deduplicating uploads. Figures are representative as of 2026 — check the AWS pricing pages for Bedrock, Rekognition, and Comprehend for current rates. The whole workload is AWS-credit-eligible: CloudRoute routes you to the right credit pool (Activate up to $100K, Bedrock/GenAI POC $10K–$50K, GenAI Accelerator up to $1M) and a vetted AWS partner who builds the pipeline, so it costs $0 to build.

Build your content-moderation pipeline on AWS — funded by AWS credits

CloudRoute routes you to a vetted AWS trust-and-safety / GenAI partner who designs and ships the whole pipeline — Bedrock Guardrails and an LLM classifier for text, Rekognition for image and video, Comprehend for PII, the severity engine, the human-review queue, and appeals. AWS credits fund the build and the inference. You pay $0.

matched within< 24h
credits to fund itup to $100K
cost to you$0
AI content moderation on AWS (2026) — the full build guide · CloudRoute