Moderating user-generated content — text, images, and video — is a pipeline problem, not a single API call. This is the full build guide: the reference architecture, which AWS service does what (Amazon Bedrock Guardrails plus LLM classification for text, Amazon Rekognition for image and video, Amazon Comprehend for PII), the pre-publish vs asynchronous decision, severity tiers and the actions they trigger, the human-review queue and appeals flow, policy configuration, the latency budget, and what it actually costs at scale.
Any product that lets users post text, upload images, or stream video inherits a moderation obligation: keep harmful, illegal, off-policy, and privacy-violating content off your surface without burying your team in manual review. On AWS this is assembled from a few managed services, each handling a different modality, behind a pipeline you design.
Moderation is fundamentally a classification-then-action problem. For every piece of user-generated content (UGC) you have to answer two questions: is this a policy violation, and how severe? and then what do we do about it — allow, redact, hold for a human, or reject? The first question is where AWS's AI services earn their place; the second is policy and workflow you own. Conflating the two is the most common design mistake — teams wire up a detection API, get a confidence score back, and then have nowhere to put the borderline cases.
No single AWS service moderates everything, and that is by design — the modalities are genuinely different problems. Text needs language understanding and policy nuance (sarcasm, context, coded language), which is why it pairs Amazon Bedrock Guardrails (a configurable policy layer) with an LLM classifier on Bedrock. Images and video need computer vision, which is Amazon Rekognition Content Moderation. Personal data hiding inside any text needs entity detection, which is Amazon Comprehend. A real product almost always needs at least two of these, and the architecture is the glue that turns three detection services into one coherent moderation decision.
It is worth being explicit about what moderation is not here. It is not the same as output safety on your own generative AI — that is what Bedrock Guardrails does for model responses, covered in the amazon-bedrock-guardrails sibling. The overlap is real (Guardrails is useful for both), but moderating user-submitted content at scale adds problems a generation-time guardrail does not face: images and video, asynchronous high-volume queues, human-review backlogs, appeals, and audit trails for trust-and-safety reporting. This page is about that broader UGC pipeline.
On AWS, every stage of that pipeline maps to a managed service, which is why it is a common place to build trust-and-safety infrastructure: you get the detection models, the event plumbing (S3, EventBridge, SQS, Lambda, Step Functions), the human-review tooling (Augmented AI), and the storage/audit layer in one account. The next section walks the full reference architecture.
AI content moderation on AWS = classify every piece of user-generated content (text with Bedrock Guardrails + an LLM, images/video with Rekognition, PII with Comprehend), map each result to a severity tier and an action (allow / redact / hold-for-review / reject), and route borderline cases to a human queue with an appeals path.
Before the architecture, it helps to know exactly what each service contributes, what it returns, and where its judgment is strong or weak. These four are the moderation core; everything else in the pipeline is plumbing around them.
The split is by modality and by the kind of judgment required. Two services handle text (one for fast policy enforcement, one for nuanced classification), one handles vision, and one handles personal data. They compose: a single uploaded post with a caption and an image can pass through all four.
Bedrock Guardrails is a configurable safety/policy layer: content filters across categories (hate, insults, sexual, violence, misconduct) each with a tunable strength, denied topics you define in natural language, word and profanity filters (a managed list plus your own exact terms), and sensitive-information detection. In a moderation pipeline its role is the first, deterministic-ish pass on text: it is fast, cheap relative to a full LLM call, and catches the clear-cut cases (explicit slurs, obvious harmful categories, banned terms) without you writing a prompt. You can run it standalone — without invoking a generation model — via the ApplyGuardrail API, which is exactly how you use it to screen incoming user text rather than a model's output. See the amazon-bedrock-guardrails sibling for the full filter breakdown.
Guardrails catch the obvious; a lot of harmful content is not obvious. Context-dependent harassment, coded hate speech, scams, self-harm content, spam, and "is this on-policy for our specific community" are judgment calls. The pattern is to send the text (and the relevant policy) to a foundation model on Bedrock — Claude, Amazon Nova, Llama, or Mistral — with a structured prompt: "Given this community policy, classify this content into {allow, review, reject}, assign a severity, name the violated rule, and return JSON." This LLM-as-a-moderator step is where your actual policy lives, encoded in the prompt, and it handles the nuance no fixed filter can. It is more expensive and slower than Guardrails, so you typically run it only on content Guardrails did not already clear or reject outright. The two together — Guardrails for the cheap clear-cut pass, an LLM for the nuanced remainder — is the standard text-moderation shape on AWS.
Amazon Rekognition is AWS's managed computer-vision service, and its Content Moderation feature detects inappropriate, unwanted, or offensive content in images and video. It returns a hierarchical taxonomy of moderation labels — top-level categories (e.g. Explicit, Non-Explicit Nudity, Violence, Visually Disturbing, Drugs & Tobacco, Alcohol, Hate Symbols, Gambling) each with more specific sub-labels — and a confidence score per label. For images you call DetectModerationLabels synchronously; for video you start an asynchronous StartContentModeration job (Rekognition Video) that processes frames over time and returns labels with timestamps. You set a minimum confidence threshold, and you decide which labels (and at which confidence) trigger which action. Crucially, the label taxonomy lets you be selective — an alcohol brand's community may allow the Alcohol label while still rejecting Explicit content.
Amazon Comprehend is AWS's NLP service, and for moderation its relevant capability is PII detection: it finds personally identifiable information in text — names, emails, phone numbers, addresses, Social Security / national-ID numbers, credit-card numbers, bank accounts, and more — and returns the entity types and their locations, so you can redact or mask them. This matters in UGC for two reasons: users doxx each other (posting someone's private data is a policy and often legal violation), and users leak their own sensitive data that you do not want to store or display. Comprehend can run synchronously for real-time redaction or as an async batch job over large volumes. Bedrock Guardrails also does PII detection, so on text you can use either; Comprehend is the specialist when PII is the primary concern or when you are moderating text outside a Bedrock call.
| Service | Modality | What it returns | Sync / async | Decision style | Primary role in the pipeline |
|---|---|---|---|---|---|
| Bedrock Guardrails (ApplyGuardrail) | Text | Block/allow + which policy triggered + redactions | Synchronous | Configurable filters (tunable strength) | Fast first-pass on the clear-cut cases |
| LLM classifier on Bedrock | Text | Structured label + severity + rule + rationale (JSON) | Synchronous (stream or not) | Policy-aware model judgment (your prompt) | Nuanced classification of the gray zone |
| Rekognition Content Moderation | Image + video | Hierarchical moderation labels + confidence (+ timestamps for video) | Image: sync · Video: async job | Vision model + your confidence threshold | All visual moderation |
| Amazon Comprehend | Text | PII entity types + locations (to redact/mask) | Sync or async batch | NLP entity detection | PII / doxxing detection + redaction |
A production moderation system is an event-driven pipeline: content arrives, it is classified by the relevant service(s), a decision engine maps the results to an action, and anything uncertain goes to humans. Understanding each stage is what lets you debug a system that is letting bad content through or over-blocking good content.
It helps to think of the pipeline as four logical stages: intake → classify → decide → act, with a human-review loop hanging off the "decide" stage for borderline cases. The same logical stages run whether you moderate synchronously (pre-publish) or asynchronously (post-publish); only the triggering and the latency budget change, which is the subject of the next section.
Text typically enters through your application API and is moderated inline (a call to Guardrails / an LLM before the content is committed) or dropped onto a queue. Images and video almost always land in an Amazon S3 bucket first (direct-to-S3 upload via a presigned URL), which is convenient because an S3 object-created event can fire the rest of the pipeline automatically. The two common triggers are a synchronous API call (your app calls the moderation step and waits) or an event (S3 → Amazon EventBridge or S3 notifications → AWS Lambda), which is the backbone of the asynchronous path.
An orchestrator — a Lambda function for simple flows, or AWS Step Functions when there are multiple steps, retries, and branching — routes each item to the relevant detector(s): text to Guardrails and, if needed, the LLM classifier and Comprehend; images and video to Rekognition. Step Functions is the right tool once the flow has real structure (run Guardrails first, only call the expensive LLM on what Guardrails did not resolve, run Rekognition and Comprehend in parallel for a post that has both an image and a caption), because it gives you visual orchestration, built-in retries, and per-execution audit history for free.
The detectors return scores and labels; the decision engine (your code, usually in the orchestrating Lambda/Step Function) maps them to an action using your policy: thresholds per label, severity tiers, and the action each tier triggers (covered in detail in the next two sections). This is the heart of the system and the part no AWS service provides for you — it encodes your risk tolerance. Its output is a single decision per item: allow, redact-and-allow, hold-for-human-review, or reject.
On allow, the content publishes (or stays published). On redact, the PII or banned terms are masked and then it publishes. On reject, it is blocked and the user is notified. On hold-for-review, the item goes to a human-review queue — Amazon Augmented AI (A2I) provides a managed human-review workflow that integrates with Rekognition and Comprehend, or you can route to your own internal moderation tool (an SQS queue feeding a review UI). Every decision is written to a durable store (DynamoDB for state, S3 for the audit log) so you have a complete trail for trust-and-safety reporting and appeals.
| Stage | What it does | Typical AWS services |
|---|---|---|
| Intake | Content enters; trigger the pipeline | API Gateway / your app · S3 (uploads) · EventBridge / S3 notifications |
| Orchestrate | Route each item to the right detectors | Lambda (simple) · Step Functions (multi-step, retries, branching) |
| Classify — text | Policy + nuanced classification + PII | Bedrock Guardrails (ApplyGuardrail) · Bedrock LLM · Comprehend |
| Classify — image/video | Visual moderation labels + confidence | Rekognition Content Moderation (DetectModerationLabels / StartContentModeration) |
| Decide | Map scores/labels → severity tier → action | Your logic in Lambda / Step Functions |
| Human review | Route borderline items to a human | Amazon A2I (managed) · or SQS → your review UI |
| Act + record | Enforce decision; store state + audit trail | DynamoDB (state) · S3 (audit log) · SNS/SQS (notify) |
The first real architectural decision is not which model or which threshold — it is whether you moderate content <em>before</em> it is visible (synchronous, pre-publish) or <em>after</em> (asynchronous, post-publish). This single choice determines your latency budget, your cost, and your risk exposure, and most mature systems end up running both.
The honest framing: match the mode to the risk of the surface, not to a blanket rule. Pre-publish everything and you add latency and cost to every action and frustrate users; async everything and harmful content — including content that is illegal or dangerous — is briefly live for everyone to see. Neither extreme is right for a real product. You segment your surfaces by risk and apply the appropriate mode to each.
In the synchronous path, your application calls the moderation step and waits for the decision before the content becomes visible. A post is held until Guardrails / the LLM clears it; an uploaded image is not shown until Rekognition returns. This is the right choice for high-risk surfaces where even brief exposure is unacceptable: a profile photo on a dating or children's app, a public broadcast, a first message between strangers, anything where the harm of one visible violation is severe (legal exposure, user safety, brand-ending). The cost is latency — you are adding a model call (tens to low hundreds of milliseconds for Guardrails, more for an LLM, and image moderation on top) to the critical path of the user's action — and higher per-item cost, because you moderate everything up front, including the large majority that is benign.
In the asynchronous path, content publishes immediately and moderation runs after on an event-driven queue: the upload lands in S3, an event fires a Lambda/Step Function, the detectors run, and if the content turns out to violate policy it is taken down retroactively (and the user actioned). This is the right choice for lower-risk, high-volume surfaces — comments on an established community, a second-degree feed, bulk back-catalog scanning — where the friction of pre-publish is not worth it and a short visibility window for the rare violation is tolerable. It is lower-latency for the user (their action is instant) and often cheaper to operate (you can batch, use lower-priority compute, and the async Rekognition Video and Comprehend batch APIs are built for it). The trade is the visibility window: bad content is live until the pipeline catches it, which for the worst categories is unacceptable — which is exactly why you do not use async for the riskiest surfaces.
In practice, mature moderation is a hybrid. You run synchronous moderation on the few highest-risk fields and surfaces (avatars, public-facing first posts, anything legally sensitive) and asynchronous moderation on the high-volume long tail. You can also tier within the synchronous path: run the cheap, fast Guardrails check inline (it is fast enough), and if it returns "uncertain," publish optimistically but enqueue the expensive LLM/image check async, taking the item down if it fails. This gives you the safety of pre-publish where it matters and the cost/latency of async everywhere else — the best of both, at the price of a more complex pipeline.
Segment surfaces by risk. Synchronous (pre-publish) for anything where one visible violation is unacceptable — avatars, first contact, public broadcasts, regulated/illegal categories. Asynchronous for the high-volume long tail where instant posting matters and a short visibility window for rare violations is tolerable. Most systems run both, and tier the fast cheap check inline with the expensive check deferred.
Detection gives you scores and labels; moderation requires decisions. The bridge is a <strong>severity-tier model</strong>: you bucket every possible detection into tiers, and each tier maps to exactly one automated action. This is the policy heart of the system, and getting the tiers and thresholds right is most of the work.
The reason you need tiers rather than a single block/allow line is that confidence and severity are continuous and your tolerance is not uniform. A 99%-confidence Explicit label and a 55%-confidence "Visually Disturbing" label should not get the same treatment; neither should an unambiguous slur and a borderline-sarcastic insult. Tiers let you act decisively on the clear cases and route the genuinely uncertain ones to humans, instead of forcing a binary on inherently graded signals. A common four-tier model:
Two principles make tiering work in practice. First, the thresholds are per-label, not global — your auto-reject confidence for Explicit imagery should be different from your auto-reject confidence for Alcohol, and some labels may have no auto-reject tier at all (always route to human). Second, start strict on the worst categories and loosen with evidence: it is easier to relax a threshold after observing false positives than to explain why dangerous content was visible. Calibrate the thresholds against a labeled sample of your real content and revisit them as the human-review queue tells you where the tiers are mis-set.
| Tier | Trigger | Action | Human involved? | User notified? | Example |
|---|---|---|---|---|---|
| 0 — Allow | No violation / below action threshold | Publish | No | No | Benign comment or photo |
| 1 — Redact-and-allow | Removable issue only (PII, banned term) | Mask span, then publish | No | Sometimes | A phone number in a comment |
| 2 — Hold for review | Uncertain band / nuanced verdict | Withhold or flag → human queue | Yes | On final decision | Possible harassment in context |
| 3 — Auto-reject | High-confidence severe violation | Block + notify (+ account action) | Optional (sampled audit) | Yes | High-confidence explicit image |
Automation handles the clear cases; humans handle the rest. A moderation system without a human path is either dangerously permissive or maddeningly over-strict. Two human workflows matter: the <strong>review queue</strong> for Tier-2 items, and the <strong>appeals flow</strong> for users who think an automated decision was wrong.
Tier-2 items — the uncertain band — flow to a queue where a human moderator makes the call. On AWS, Amazon Augmented AI (A2I) provides a managed human-review workflow: it integrates directly with Rekognition and Comprehend, lets you define activation conditions (e.g. "send to a human when the moderation confidence is between 50% and 90%"), supplies a review UI, and routes work to a private workforce (your own team), a vendor workforce, or Amazon Mechanical Turk. Alternatively, many teams build their own: the orchestrator drops uncertain items onto an SQS queue that feeds an internal moderation console. Either way, the design goals are the same — prioritize by risk (a possible child-safety issue jumps the queue), give reviewers context (the content, the detector scores, the user's history, the specific policy in question), capture the decision and reason (for audit and for training data), and track queue depth and latency so the backlog never becomes the bottleneck that makes async moderation effectively never-happens.
Automated moderation will get things wrong — false positives are inevitable — so a credible system gives users a way to contest a decision. The appeals flow is a second human-review path with a different entry point: a user whose content was rejected (or account actioned) requests review, the item re-enters a queue (usually with elevated priority and shown to a different reviewer than any who already touched it), and a human upholds or overturns the decision. Two things make appeals work: transparency (tell the user which policy their content was actioned under, so the appeal is informed rather than a shot in the dark) and a clean audit trail (every state transition — detected, actioned, appealed, re-reviewed, final — recorded, because appeals are exactly where you need to reconstruct what happened and why). Increasingly this is not optional: regulatory regimes (for example, online-safety and platform-transparency rules) expect platforms to offer notice and an appeals mechanism, so the appeals flow is both good product and a compliance requirement.
Human decisions are not just dispositions; they are labeled data. Every Tier-2 call and every overturned appeal is a signal about where your thresholds and your LLM-classifier prompt are mis-calibrated. The mature pattern feeds these back: track the rate at which humans overturn auto-decisions (a rising overturn rate on a label means the threshold is wrong), use confirmed violations and confirmed false-positives to refine the classifier prompt or policy, and keep a labeled regression set so a prompt or threshold change can be measured rather than guessed at. The human queue, in other words, is also your evaluation and improvement engine.
Tier-2 items routed to a queue (Amazon A2I or your own SQS-fed console), prioritized by risk, with reviewers given full context (content + scores + user history + policy). A separate appeals path with a different reviewer and elevated priority. A complete audit trail of every state transition. And a feedback loop that turns human decisions into threshold and prompt improvements. Track queue depth and overturn rate as first-class metrics.
Three operational realities separate a moderation demo from a production system: how you encode and version your policy, whether the synchronous path fits your latency budget, and whether the bill survives real volume. Each has a concrete AWS answer.
Your moderation policy is spread across several configurable artifacts, and the discipline is to treat all of them as versioned code. The Bedrock Guardrail is a versioned object — define it in infrastructure-as-code and reference a specific version. The LLM-classifier prompt (which encodes your community standards in natural language) belongs in version control and should be evaluated against a labeled set on every change. The Rekognition confidence thresholds and the label-to-tier mapping are configuration that should live in code or a config store, not be hard-coded or hand-set in a console. The payoff is the same as any change-control: you can roll back a policy change that caused a spike in false positives, and you can show an auditor exactly which policy was in force when a given decision was made. A vague policy spread across un-versioned console settings is the most common reason a moderation system drifts.
For pre-publish moderation, latency is on the critical path of a user action, so you have to budget it. Rough shapes, representative as of 2026 (measure your own): a Guardrails text check is fast — typically tens to low-hundreds of milliseconds; an LLM classification call is the slow step — hundreds of milliseconds to a couple of seconds depending on model and content length; Rekognition image moderation is sub-second per image; Rekognition video is inherently asynchronous (it processes over the clip's duration) so it never belongs in a synchronous budget. The levers to stay in budget: run the cheap fast Guardrails check first and only call the LLM on what it does not resolve; pick a smaller/faster model (e.g. Amazon Nova Micro/Lite) for the classifier when nuance allows; run independent checks (image + caption) in parallel; and for anything that cannot meet the budget synchronously, push it to the async path and publish optimistically. If the synchronous budget cannot be met, that is itself a signal to move that surface to async.
Moderation cost scales with volume across a few independent line items, and the surprise is usually the LLM-classifier and image volume rather than Guardrails. Bedrock Guardrails bills on the text evaluated; the LLM classifier bills on input+output tokens per call (the biggest text lever — which is why you gate it behind Guardrails and use a small model); Rekognition bills per image analyzed and per minute of video processed; Comprehend bills per unit of text. The levers at scale: moderate async and batch where you can (cheaper than per-call synchronous, and Bedrock batch inference is roughly half price for offline classification); gate the expensive steps (LLM only on the gray zone, not on everything Guardrails already cleared or rejected); down-sample where defensible (you may moderate every image but only a sample of low-risk text); and tier model choice (a cheap model for the first classification pass, a frontier model only for escalations). Figures move — confirm current rates on the AWS pricing pages for Bedrock, Rekognition, and Comprehend, and see the amazon-bedrock-pricing sibling for the model-token side.
| Cost line | Billed on | Typically the cost driver? | Main lever to control it |
|---|---|---|---|
| Bedrock Guardrails | Text evaluated (per unit) | Usually small | It is already the cheap first pass — keep it first |
| LLM classifier (Bedrock) | Input + output tokens per call | Often the largest text line | Gate behind Guardrails; small model; batch offline (~½ price) |
| Rekognition image | Per image analyzed | Large at high upload volume | Moderate every image but cache/skip duplicates |
| Rekognition video | Per minute of video processed | Large for video-heavy products | Sample frames / segments where policy allows; async only |
| Comprehend (PII) | Per unit of text | Usually modest | Batch async; or use Guardrails PII on the Bedrock path instead |
Here is the fastest credible path from zero to a working text-and-image moderation pipeline on AWS, structured so you can ship a synchronous text check first and layer in images, async, and human review as you go.
The two modalities are different enough that they shape the whole architecture — different services, different sync/async defaults, different cost units, different failure modes. This comparison is the one that decides how you split your pipeline.
| Dimension | Text moderation | Image / video moderation |
|---|---|---|
| Primary AWS service(s) | Bedrock Guardrails + an LLM classifier on Bedrock (+ Comprehend for PII) | Amazon Rekognition Content Moderation |
| What it returns | Block/allow + a structured {decision, severity, rule} verdict; PII entities | Hierarchical moderation labels + confidence (+ timestamps for video) |
| Decision style | Configurable filters + policy-aware model judgment (your prompt) | Vision model output + your per-label confidence thresholds |
| Sync vs async default | Synchronous-friendly (Guardrails is fast; LLM is the slow part) | Image: synchronous · Video: inherently asynchronous (job-based) |
| Latency shape | Guardrails: tens–hundreds ms · LLM: hundreds ms–seconds | Image: sub-second · Video: processes over the clip duration |
| Cost unit | Per unit of text (Guardrails / Comprehend) + per token (LLM) | Per image · per minute of video |
| Hardest part | Nuance, context, coded language, sarcasm — prompt + policy design | Threshold calibration per label; benign-context false positives |
| Common pitfall | Running the expensive LLM on everything instead of gating behind Guardrails | One global threshold for all labels; not deduplicating uploads |
Situation: Every listing combined free-text and multiple images, and the team needed to keep explicit imagery, harassment, scams, and contact-info doxxing off the marketplace before it damaged trust — without hiring a moderation team on day one. They wanted high-risk fields (the primary listing photo and the first public message) checked pre-publish, but could not afford to add seconds of latency to every post or to run an expensive model on all of the benign long tail. They also had no human-review or appeals story, and the projected Bedrock + Rekognition bill at their target volume made the founder hesitate to start.
What CloudRoute did: CloudRoute matched them in under 24 hours to an AWS partner with a trust-and-safety / GenAI track record. The partner built an event-driven pipeline orchestrated with Step Functions: incoming text hit a Bedrock Guardrail (content filters + denied topics + profanity + PII-redact) as a fast synchronous first pass, with a small Bedrock model (Nova-class) classifying only the gray-zone text into {allow, review, reject} against the marketplace policy; photos uploaded to S3 and were run through Rekognition Content Moderation, synchronously for the primary photo and asynchronously (S3 → EventBridge → Lambda) for the rest; Amazon Comprehend backstopped PII/contact-info detection. A four-tier severity engine mapped results to allow / redact / hold / reject, Tier-2 items routed to Amazon A2I for human review with confidence-based activation, and an appeals path plus a full DynamoDB/S3 audit trail were wired in. The partner filed a Bedrock POC credit application plus an Activate application to fund the build and the first months of inference.
Outcome: The marketplace launched with explicit imagery and clear-cut text violations auto-rejected, PII and contact info redacted, the genuinely borderline cases flowing to a small human queue instead of slipping through, and users able to appeal. High-risk fields were moderated pre-publish without adding latency to the benign majority (gated behind the fast Guardrails pass and the async photo path). The build and the early inference ran entirely on the approved AWS credits, so it cost $0 out of pocket. CloudRoute's commission was paid by the partner from AWS engagement funding, not by the customer.
pipeline: Guardrails + Nova classifier + Rekognition + Comprehend + A2I · sync on high-risk, async on the tail · four-tier severity + appeals · credits: POC + Activate · out-of-pocket: $0
CloudRoute routes you to a vetted AWS trust-and-safety / GenAI partner who designs and ships the whole pipeline — Bedrock Guardrails and an LLM classifier for text, Rekognition for image and video, Comprehend for PII, the severity engine, the human-review queue, and appeals. AWS credits fund the build and the inference. You pay $0.