AWS gives you two translation engines, and most of the cost and quality outcome rides on picking the right one for each job. This is the full how-to: Amazon Translate (purpose-built neural machine translation — fast, cheap, 75+ languages) versus an LLM on Amazon Bedrock (Claude, Nova, Llama — context-aware, tone-controlled, document-aware), exactly when each one wins, the hybrid pattern that uses both, how to lock quality with custom terminology and glossaries and human review, how to translate millions of strings in batch for a fraction of the cost, the reference architecture end to end, and how to localize a real multilingual app.
Translation on AWS is not one service. There are two fundamentally different engines, and almost every cost overrun or quality complaint traces back to using the wrong one for the job at hand.
Amazon Translate is a purpose-built neural machine translation (NMT) service. It does exactly one thing — convert text from a source language to a target language — and it does it fast, cheaply, and at scale across 75+ languages and thousands of language pairs. You call a TranslateText API for real-time work, or submit an asynchronous batch job over a folder of documents in Amazon S3. It supports custom terminology (force specific terms to translate a fixed way), parallel-data customization (Active Custom Translation, which adapts output to your own example translations), automatic source-language detection, and formality and profanity-masking controls on supported pairs.
An LLM on Amazon Bedrock — Claude, Amazon Nova, Llama, Mistral and others — is a general foundation model that translates as one of many things it can do. You do not call a translate endpoint; you write a prompt: "Translate the following from English to German. Maintain a formal tone, keep the brand name unchanged, use this glossary, and preserve the Markdown formatting." Because the model reasons over the whole input, it can handle context across sentences, choose register and tone, follow a domain glossary expressed in natural language, resolve ambiguity from surrounding text, translate inside structured formats (HTML, Markdown, JSON, XLIFF) without breaking the markup, and explain or flag uncertain choices.
The two engines sit at different points on the same trade-off curve. Amazon Translate optimizes for throughput, latency and cost — it is the right tool when you have a lot of text, you need it now or cheaply, and the content is relatively plain. A Bedrock LLM optimizes for nuance and control — it is the right tool when the quality of a smaller, higher-stakes set of translations depends on context, tone, terminology adherence, or document structure that a sentence-level NMT engine does not see.
This is not Translate-versus-Bedrock as a company-wide religious choice. The mature pattern is to route each job to the engine that fits it, and often to combine them. The next section is the decision rule; the section after that is the hybrid pattern that uses both.
Use Amazon Translate when volume, speed and cost dominate and the text is plain; use a Bedrock LLM when context, tone, formality, glossary nuance or document structure decide the outcome; use a hybrid when you have both at once — Translate for the bulk, an LLM for the hard slice.
The choice is per job, driven by the content and the stakes — not by the company. Five questions settle almost every case: how much text, how fast, how cheap, how nuanced, and how structured.
Amazon Translate wins on the first three axes (volume, latency, cost); a Bedrock LLM wins on the last two (nuance, structure). Map your job against the axes and the engine usually picks itself.
Volume is high and cost matters. Translating millions of support tickets, product reviews, user-generated content, or a large content catalogue is where Translate's per-character pricing and batch mode crush LLM token costs — often by an order of magnitude or more.
Latency matters. Real-time use cases — live chat translation, in-app translate buttons, instant ticket routing — need a fast, low-latency call. Amazon Translate returns in milliseconds; a frontier LLM is slower per request.
The text is relatively plain and self-contained. Sentence-level content where each segment can be translated without deep cross-document context (chat messages, short reviews, UI strings with adequate context, notifications) is squarely Translate's sweet spot — especially with a custom-terminology glossary applied.
You want a managed, deterministic service. Translate is a single-purpose API with predictable behaviour, simple scaling, and customization via parallel data — no prompt engineering, no model selection.
Context across the document changes the right answer. Pronoun resolution, ambiguous terms, callbacks to earlier sentences, and consistency of a chosen translation across a long document are things an LLM sees and a sentence-by-sentence NMT engine can miss.
Tone, formality and brand voice matter. Marketing copy, brand messaging, and customer-facing content where register (formal vs informal "you", e.g. Sie/du, vous/tu), idiom, and house style are part of the deliverable — the LLM takes a natural-language style brief and a glossary in the same prompt.
The content is structured or mixed. Translating inside Markdown, HTML, JSON, XLIFF or code-adjacent text while leaving tags, placeholders ({{name}}, %s), URLs and code untouched is something an instructed LLM does well and a naive pipeline mangles.
Domain nuance and glossary fidelity are critical. Legal, medical, and financial text where a mistranslated term has real consequences benefits from an LLM that can be told the domain, handed a glossary, and asked to flag low-confidence passages for human review.
The language is low-resource or the task is more than translation. When you also need transcreation (adapt, don't literally translate), summarise-then-translate, or translate-plus-explain, the LLM does it in one pass.
Per job, ask: would a fluent bilingual clerk handle this fine (high-volume, plain, fast) → Amazon Translate; or does it need a bilingual copywriter / domain expert who reads the whole thing and cares about tone and terms (nuanced, structured, high-stakes) → Bedrock LLM. When a corpus contains both, split it and route each part.
The highest-value production systems rarely pick one engine for everything. They use Amazon Translate for the economical bulk and a Bedrock LLM for the slice where nuance pays — getting most of the cost advantage of NMT and most of the quality advantage of an LLM.
There are three established ways to combine the engines. They are not mutually exclusive; a large localization system often uses all three on different content types.
Run the whole corpus through Amazon Translate for a fast, cheap first pass, then send the output (plus the source) to a Bedrock LLM with an instruction to polish: fix tone, enforce the glossary, smooth idiom, and adapt register. This is sometimes called MT post-editing done by a model instead of a human. You pay full Translate cost on everything but only LLM cost on a refinement pass — and because the LLM is editing rather than translating from scratch, you can often use a smaller, cheaper model and shorter outputs.
A variant routes only the segments most likely to need help to the LLM: marketing strings, long passages, or anything where a quality classifier or simple heuristic (length, presence of idiom, low Translate confidence) flags risk. The plain majority ships straight from Translate untouched.
Classify each piece of content before translating and send it to the engine that fits: support tickets, UGC, logs and chat → Amazon Translate; marketing pages, legal clauses, product names and brand copy → Bedrock LLM. The routing key can be as simple as the content's source system or a metadata tag, or as rich as an LLM/classifier deciding per item. This keeps the expensive engine off the 90% of volume that does not need it.
Use Amazon Translate to produce the translation and a Bedrock LLM as an automated reviewer: ask the model to score the translation for accuracy, fluency, terminology adherence and tone, and to flag or correct only where it falls short. The model becomes an LLM-as-a-judge over MT output, cheaply triaging which translations are good enough to ship and which need correction or human review — far cheaper than human-reviewing everything.
A pure-LLM pipeline over millions of strings is expensive and slow; a pure-Translate pipeline misses tone and nuance on the content where it matters most. The hybrid — Translate for the bulk, an LLM for the hard slice (refine, route, or judge) — captures ~90% of the cost advantage and ~90% of the quality advantage at once. Most serious localization on AWS lands here.
Translation quality is not something you hope for; it is something you engineer. Four levers do most of the work: a glossary the engine must obey, customization on your own data, a confidence-aware human-in-the-loop, and an evaluation set you score on every change.
Every brand has terms that must translate a fixed way (or not at all): product names, feature names, legal terms, units, and house-style choices. On Amazon Translate this is custom terminology — you upload a CSV/TMX term list and Translate forces those mappings on every job, in real-time and batch. On a Bedrock LLM the same glossary goes into the prompt ("Use exactly these translations for these terms; never translate the brand name"), and because the model reads context it can apply terms more flexibly (right inflection, right case) than a literal find-replace. A shared, version-controlled glossary feeding both engines is what keeps a hybrid pipeline consistent.
If you have past human translations, use them. Amazon Translate's Active Custom Translation (parallel-data customization) adapts output toward your example translations at request time without training a bespoke model — point it at parallel data in S3 and translations shift toward your style and terminology. With a Bedrock LLM, the analogue is few-shot prompting (include several of your best source→target examples in the prompt) or light fine-tuning for a consistent house style at scale. Either way, your existing translation memory is an asset — feed it in rather than starting cold.
No automated translation is perfect, and the right move is not to human-review everything — it is to review the slice where errors are expensive. Tag content by risk (a legal disclaimer is high-stakes; a forum post is not), auto-ship the low-risk majority, and route the high-risk minority to human reviewers. Amazon Augmented AI (A2I) provides a managed human-review workflow you can wire into the pipeline; a confidence signal (an LLM judge's score, Translate output heuristics, or back-translation agreement) decides what gets escalated. This is how you get near-human quality on the content that matters without paying for human review on everything.
Build a fixed evaluation set: a few hundred representative source segments with reference translations (and ideally human ratings). Score every pipeline change so you know whether a new glossary, a model swap, or a chunking tweak actually helped. Use both automatic metrics (BLEU, chrF, COMET for reference-based scoring) and an LLM-as-a-judge on Bedrock for accuracy/fluency/terminology, and keep a small human spot-check because automatic metrics miss domain-specific errors. The discipline mirrors any ML system: a golden set, automated scoring, and a number that moves when you turn a knob.
In order of leverage: (1) a shared glossary both engines obey · (2) customization on your own parallel data / few-shot examples · (3) a confidence signal that routes the risky slice to human review (A2I) · (4) a fixed evaluation set scored on every change. Most "the translations are wrong" complaints are a missing glossary or no review routing — not the engine.
Translation bills scale directly with volume, so two things decide the number: which engine you run and whether you run the bulk in batch. Get both right and large-scale localization is surprisingly affordable; get them wrong and an LLM pass over millions of strings is eye-watering.
For bulk work, asynchronous batch is the default, not an optimization. Amazon Translate offers an async batch API that translates a whole folder of documents in S3 in one job — ideal for back-catalogues, document sets, and nightly content syncs. Amazon Bedrock offers batch inference (submit a large job, get results back asynchronously) at roughly 50% of on-demand token price — the right way to run any LLM translation that does not need a real-time answer. Reserve real-time/on-demand calls for genuinely interactive translation.
The figures below are representative as of 2026 to show the shape of the bill, not a quote — always check the AWS pricing page (and per-model Bedrock pricing) for current rates. The headline: Amazon Translate is priced per character and is dramatically cheaper per unit of text; a Bedrock LLM is priced per token and buys you nuance at a higher unit cost. Batch and prompt caching are the main levers on the LLM side.
Route by need: send the plain majority to Amazon Translate and reserve the LLM for the slice that needs it — the single biggest saving. Batch everything that can wait: Translate async batch and Bedrock batch inference (~50% off). Cache: Bedrock prompt caching means the static system prompt and glossary are not re-billed on every call. Right-size the model: a smaller Nova/Claude tier handles refinement and judging cheaply; reserve a frontier model for genuinely hard passages. Don't re-translate: cache results and maintain a translation memory so unchanged strings are never paid for twice.
| Engine / mode | Priced by | Relative unit cost | Latency | Best for |
|---|---|---|---|---|
| Amazon Translate — real-time | Per character (~$15 / million chars) | Lowest | Milliseconds | Live chat, in-app translate, ticket routing |
| Amazon Translate — async batch | Per character (same rate, bulk job) | Lowest | Minutes–hours (job) | Back-catalogues, document sets, bulk content |
| Amazon Translate — custom (ACT) | Per character (higher tier) + parallel data | Low–moderate | Milliseconds | On-brand bulk translation with your own examples |
| Bedrock LLM — on-demand | Per input + output token, per model | Higher | Seconds | Interactive, nuanced, structured, high-stakes |
| Bedrock LLM — batch inference | Per token, ~50% of on-demand | Moderate | Async (job) | Bulk nuanced translation that can wait |
A production translation pipeline is a small set of managed services wired together. The same skeleton serves both real-time and bulk paths; you choose the engine per route and add the quality controls from section IV.
It helps to separate the real-time path (a user clicks "translate", a chat message arrives) from the bulk path (a folder of documents, a nightly content sync). They share glossary, evaluation, and storage; they differ in how work is triggered and which engine mode they call.
1. Request in. A client calls an Amazon API Gateway endpoint backed by AWS Lambda (or your existing service). The request carries the text, source/target languages, and content type.
2. Route. Lambda decides the engine from the content type and stakes: plain/short → Amazon Translate TranslateText (with custom terminology applied); nuanced/structured/high-stakes → a Bedrock LLM with the glossary and style brief in the prompt.
3. Glossary + post-process. Apply the shared glossary (custom terminology on Translate, or in-prompt on Bedrock), restore any placeholders/markup, and optionally run an LLM judge for a confidence score.
4. Escalate or return. Low-confidence high-stakes items go to an Amazon A2I human-review queue; everything else returns to the caller. Cache the result (e.g. DynamoDB / ElastiCache keyed on source+languages) so repeats are free.
1. Land in S3. Source documents/strings arrive in an S3 bucket (a content export, a CMS sync, an upload). Parse non-text formats (PDF/Word/HTML) to clean text first — Amazon Textract for scanned PDFs and tables.
2. Trigger a batch job. An S3 event (or a schedule via EventBridge) kicks off an Amazon Translate async batch job over the folder, or a Bedrock batch inference job for the nuanced subset, orchestrated with AWS Step Functions.
3. Glossary, review, store. Apply the glossary, route flagged segments to A2I, write translated output back to S3 (and into your translation memory / CMS). Score a sample against the golden set.
4. Publish. Push approved translations to the destination — a localized CMS, an app's string catalogue, or a data store — with each translation versioned and traceable to its source.
| Concern | Real-time path | Bulk path |
|---|---|---|
| Entry | API Gateway + Lambda | S3 drop + EventBridge / Step Functions |
| Plain text | Amazon Translate (TranslateText) | Amazon Translate (async batch) |
| Nuanced / structured | Bedrock LLM (on-demand) | Bedrock LLM (batch inference, ~50% off) |
| Glossary | Custom terminology / in-prompt | Custom terminology / in-prompt |
| Document parsing | — | Amazon Textract (scanned/structured) |
| Human review | A2I on flagged items | A2I on flagged segments |
| Storage / cache | DynamoDB / ElastiCache + S3 | S3 + translation memory / CMS |
The most common reason teams build translation on AWS is to localize a product into many languages without a manual translation agency for every release. Here is the practical flow, and where each engine fits.
App localization is mostly about strings with structure — UI labels, marketing pages, emails, and docs — full of placeholders, plurals, and formatting that must survive translation. That is why localization is a textbook hybrid: Amazon Translate for the high-volume, low-risk strings, and a Bedrock LLM for the brand-facing, structured, or nuance-heavy ones.
Don't translate your whole app with a frontier LLM — it is slow and expensive and most strings don't need it. Externalize strings, route the bulk through Amazon Translate with a glossary, send only the brand-facing and structured strings to a Bedrock LLM, protect placeholders, and keep a translation memory so each release only pays for what changed.
Most translation projects on AWS fail in the same handful of ways. None is exotic; each has a concrete fix that the reference architecture above already accounts for.
This is the comparison that decides each translation job. Read it as "default to Amazon Translate for plain bulk; reach for a Bedrock LLM when a row in the right column is what the job hinges on; combine them when both are true."
| Dimension | Amazon Translate (NMT) | Bedrock LLM (Claude / Nova / Llama) |
|---|---|---|
| What it is | Purpose-built neural machine translation API | General foundation model, prompted to translate |
| Languages | 75+ languages, thousands of pairs | Broad; varies by model (often strongest on high-resource) |
| Cost | Per character (~$15 / M chars) — lowest | Per token, per model — higher (batch ~50% off) |
| Latency | Milliseconds — real-time friendly | Seconds — slower per request |
| Context awareness | Largely sentence/segment level | Whole-document context, ambiguity resolution |
| Tone / formality / brand voice | Limited (some formality controls) | Strong — natural-language style brief in the prompt |
| Glossary / terminology | Custom terminology (CSV/TMX) | Glossary in-prompt, applied with context |
| Customization on your data | Active Custom Translation (parallel data) | Few-shot examples or fine-tuning |
| Structured content (HTML/MD/JSON) | Needs pre/post-processing | Handles markup + placeholders when instructed |
| Beyond translation (transcreate, summarize+translate) | No | Yes, in one pass |
| Best for | High-volume, fast, cheap, plain text | Nuanced, structured, high-stakes, brand-facing |
Situation: Needed to launch the product, marketing site, and support flow in 12 languages on a tight timeline. A manual translation agency quote was far over budget and too slow for continuous releases, while a first in-house attempt that pushed everything through a single LLM was expensive, slow on the support volume, and kept mangling placeholders in the UI strings. Brand and legal copy also read flat and off-tone. The one engineer who could build a real pipeline was committed to the core product, and the projected Bedrock + Translate bill made the founder hesitate to start.
What CloudRoute did: Routed within 24 hours to an EU-region AWS partner with a GenAI/ML and localization track record. The partner built a hybrid pipeline in eu-central-1: Amazon Translate (async batch, with a shared custom-terminology glossary) for the 40k UI strings and the high-volume support inbox; a Bedrock LLM (Claude, batch inference, placeholder-protected, with a style brief and glossary in-prompt) for the marketing site and legal copy; Amazon A2I human review on the brand and legal slice; an LLM-as-a-judge confidence score routing what got escalated; a translation memory so each release only paid for changed strings; and a 300-segment golden set scored with COMET plus an LLM judge. The whole engagement was funded by AWS credits the partner filed for — Activate Portfolio plus a Bedrock POC allocation.
Outcome: All 12 languages live in under 6 weeks, with continuous localization wired into CI so new strings auto-translate on every release. Support-inbox translation ran in real time at a fraction of the all-LLM cost; brand and legal copy cleared the team's tone bar after human review; placeholder breakage went to zero. The build and the first months of translation/inference ran on AWS credits — the customer paid $0. CloudRoute's commission was paid by the partner from AWS engagement funding.
engagement window: ~6 weeks · founder time: ~7 hours · stack: Amazon Translate (batch + custom terminology) + Bedrock LLM (Claude, batch) + A2I + translation memory · cost to customer: $0
CloudRoute routes you to a vetted AWS GenAI/ML partner who designs and ships the pipeline — Amazon Translate, a Bedrock LLM, or the hybrid; glossaries and custom terminology; batch for the bulk; human-in-the-loop where it counts; and continuous localization. AWS credits fund the build and the translation/inference. You pay $0.