bedrock fine-tuning · customization · 2026

Amazon Bedrock fine-tuning — the full 2026 guide.

A complete, neutral reference for customizing models on Amazon Bedrock: when to fine-tune versus use RAG, prompt engineering, continued pre-training, or model distillation; which models support fine-tuning; how to format and prepare JSONL training data; how to run a fine-tuning job; the Provisioned Throughput requirement to host a custom model (and the standing cost it implies); how to evaluate the result; and what the whole thing costs. Plus how AWS credits fund the training and the hosting so the build costs you $0.

training cost
one-time, $-hundreds
hosting a custom model
Provisioned Throughput
data format
JSONL
cost with credits
$0
TL;DR
  • Fine-tuning is one of five ways to adapt a model on Bedrock — and usually not the first one to reach for. The ladder is: prompt engineering → RAG (Knowledge Bases) → fine-tuning → continued pre-training → model distillation. Most teams should exhaust prompt engineering and RAG before fine-tuning, because those change behaviour with no training cost and no standing hosting bill.
  • Fine-tuning teaches a base model a style, format, or narrow skill from labelled examples — supplied as a JSONL file in S3 (prompt/completion pairs). You run a managed training job, get a private custom model in your account, then evaluate it against the base model. The training itself is a modest one-time charge (commonly tens to low-hundreds of dollars for typical datasets).
  • The cost that surprises everyone is hosting: serving a fine-tuned model on Bedrock requires Provisioned Throughput — a flat hourly charge that runs continuously while the model is deployed, whether or not you send it traffic. That standing cost often dwarfs the training. AWS credits (Activate up to $100K, Bedrock/GenAI POC $10K–$50K, GenAI Accelerator up to $1M) cover both training and hosting — CloudRoute routes you to the credit pool and a vetted AWS partner, so you pay $0.
definition

IWhat fine-tuning on Amazon Bedrock actually is

Fine-tuning takes a pre-trained foundation model and continues training it on your own labelled examples so it learns a specific style, format, or narrow task better than the base model does out of the box. On Bedrock the whole process is managed — you supply data, AWS runs the training, and you get a private custom model that lives in your own account.

A foundation model arrives knowing a great deal about language in general but nothing about your domain conventions, your output format, your tone of voice, or the specific way you want a task done. Fine-tuning closes that gap by showing the model many examples of the input it will see and the output you want, and nudging its weights toward producing that output. The result is a custom model: a private copy, derived from the base model, that behaves the way your examples taught it to.

On Bedrock this is delivered as a managed service. You do not provision GPUs, write a training loop, or manage checkpoints. You upload a training dataset to Amazon S3, point a fine-tuning job at a supported base model, set a few hyperparameters, and Bedrock handles the rest. When the job finishes, the custom model appears in your account — and, crucially, your training data and the resulting model stay private to your AWS account and region; they are not used to train the base model or shared with the model provider. That privacy posture is one of the main reasons teams fine-tune on Bedrock rather than on a general-purpose API.

It is important to separate fine-tuning from two things it is often confused with. It is not retrieval-augmented generation (RAG): RAG gives a model new facts at request time by retrieving documents and putting them in the prompt, while fine-tuning changes the model's behaviour and is poor at teaching fresh facts. And it is not the same as continued pre-training: fine-tuning learns from labelled prompt/response pairs (supervised), whereas continued pre-training learns from large volumes of unlabelled domain text. The next two sections lay out the full menu and how to choose.

One caveat, stated once and meant throughout: exact dollar figures, the precise list of fine-tunable models, and feature availability change frequently on Bedrock. The numbers here are representative as of 2026 to convey relative cost and the shape of the work. Always confirm the current fine-tunable model list and pricing on the official AWS Bedrock documentation and pricing pages before committing.

the one-line definition

Fine-tuning = continue training a base model on your labelled examples to change how it behaves (style, format, narrow skill). It does not reliably teach new facts — that is RAG's job. The output is a private custom model in your account, which then needs Provisioned Throughput to serve.

the menu

IIThe five ways to customize a model on Bedrock

Fine-tuning is one tool among five. Picking the wrong one is the most common and most expensive mistake teams make — fine-tuning to add facts (should have been RAG), or fine-tuning to fix a prompt (should have been prompt engineering). Here is the full menu, cheapest and lightest first.

Read these as a ladder. In most projects you climb it: start with prompt engineering, add RAG when the model needs your facts, and only fine-tune when behaviour still is not right after both. Continued pre-training and distillation are specialist rungs you reach for in specific situations, not defaults.

1. Prompt engineering — change behaviour with words, not weights

The lightest lever: improve the instructions, add few-shot examples, structure the system prompt, and constrain the output format. No training, no custom model, no hosting cost — you are simply asking the base model more precisely. With modern models, careful prompting plus a handful of in-context examples solves a surprising share of "we need to fine-tune" requests. Always exhaust this first; it is free to iterate and instant to deploy. On Bedrock, Prompt Management and the Converse API's system/tool fields are the tools here, and prompt caching keeps a long fixed prompt cheap.

2. RAG — give the model your facts at request time

Retrieval-augmented generation retrieves relevant chunks from your own documents and inserts them into the prompt so the model answers from your knowledge. This is the correct tool whenever the problem is "the model does not know our content / our docs / our latest data." On Bedrock, Knowledge Bases provide managed RAG (ingestion, chunking, embeddings, a vector store, and retrieval) so you do not build the pipeline yourself. RAG adds no training cost and no standing hosting bill — you pay for embeddings, the vector store, and the extra input tokens of the retrieved context. See the rag-on-aws and amazon-bedrock-knowledge-bases siblings.

3. Fine-tuning — teach a style, format, or narrow skill

Supervised fine-tuning trains the base model on labelled prompt/completion pairs to lock in a behaviour: a consistent JSON output shape, a brand voice, a domain-specific classification, a structured extraction the base model keeps getting subtly wrong. It is the right tool when you have good labelled examples of the exact task and prompting alone cannot make the behaviour reliable enough. It carries a one-time training cost and — the part to plan for — a standing hosting cost via Provisioned Throughput. The rest of this page is mostly about this rung.

4. Continued pre-training — adapt to a whole domain from raw text

Continued pre-training feeds the model large volumes of unlabelled domain text (legal corpora, medical literature, internal documentation) so it absorbs the vocabulary, patterns, and register of a specialist field. Unlike fine-tuning it does not need labelled input/output pairs — just a lot of representative text. It is heavier and more expensive than fine-tuning and is worth it when a domain's language is genuinely different from general text and you have a large corpus. Many teams pair it with a subsequent fine-tune (first teach the domain, then teach the task).

5. Model distillation — make a small model imitate a big one

Distillation uses a large, capable "teacher" model to generate high-quality outputs that train a smaller, cheaper "student" model to mimic it on a specific task. The payoff is at inference time: you get close to the big model's quality on a narrow task at the small model's cost and latency. Bedrock offers managed distillation that can even synthesize training data from your prompts. This is the right tool for a high-volume, narrow workload where frontier-model quality is needed but frontier-model per-token cost is not affordable at scale.

the decision

IIIWhich customization method should you use?

The single most useful thing on this page is a clear decision rule. Match the method to the <em>kind</em> of problem you have — not to which technique sounds most sophisticated. Fine-tuning is rarely the first answer and almost never the only one.

Diagnose by asking what is actually wrong with the base model's output, then read across to the right tool:

  • The model does not know our facts / content / latest data → RAG — If answers are wrong because the model lacks your knowledge (your docs, your product, recent events), retrieval is the fix. Fine-tuning is bad at teaching facts and goes stale the moment your data changes. Use Knowledge Bases.
  • The instructions or examples are not good enough yet → prompt engineering — If the model can clearly do the task but does it inconsistently, improve the prompt and add few-shot examples first. This is free and instant. Most "we need fine-tuning" tickets die here.
  • We need a consistent style / format / narrow skill, and prompting is not reliable enough → fine-tuning — If you have labelled examples and you need the behaviour locked in (a strict output schema, a brand voice, a niche classification), fine-tune. This is the genuine fine-tuning use case.
  • The whole domain language is alien to the base model, and we have a big text corpus → continued pre-training — If the field's vocabulary and register are far from general text (specialist legal, clinical, scientific) and you have lots of unlabelled domain text, continued pre-training adapts the base. Often followed by a fine-tune.
  • We need big-model quality on a narrow task but cannot afford big-model cost at scale → model distillation — High-volume, narrow workloads where a frontier model is too expensive per token: distill its behaviour into a small, cheap student model.
  • Combine them — the methods are not mutually exclusive — A common production shape is RAG for facts + light fine-tuning for output format + prompt caching for cost. Reach for the lightest combination that meets the bar, not the heaviest single technique.
the rule of thumb

Climb the ladder: prompt engineering → RAG → fine-tuning → continued pre-training → distillation. Each rung up costs more and adds operational weight. Fine-tune only when the cheaper rungs cannot make the behaviour reliable — and never to add facts (that is RAG).

methods at a glance

IVThe five methods, compared on what matters

The same five methods, lined up against the dimensions that drive the decision: what each one actually changes, whether it needs labelled data, what it costs to run, and whether it leaves you with a standing hosting bill.

bedrock customization methods compared · 2026
MethodWhat it changesData neededTeaches new facts?Run costStanding hosting cost?
Prompt engineeringBehaviour, via instructionsNone (maybe few-shot examples)Only what you paste inFree to iterateNo
RAG (Knowledge Bases)Facts available at request timeYour documents (unlabelled)Yes — your retrieved contentEmbeddings + vector store + extra input tokensNo (pay-per-use)
Fine-tuningBehaviour: style, format, narrow skillLabelled prompt/completion pairs (JSONL)Poorly — not its jobOne-time training chargeYes — Provisioned Throughput
Continued pre-trainingDomain language / registerLarge unlabelled domain corpusSomewhat (absorbs domain text)Higher one-time training chargeYes — Provisioned Throughput
Model distillationSmall model imitates a big onePrompts (teacher can synthesize)Inherits teacher's behaviourTraining charge (teacher + student)Depends on how the student is served
The standing-hosting column is the one teams overlook. Prompt engineering and RAG have no standing model-hosting bill; fine-tuning and continued pre-training produce a custom model that must be served on Provisioned Throughput, an hourly charge that runs whether or not you use it (see §VII). Representative for 2026 — confirm specifics on the AWS Bedrock docs.
what you can tune

VWhich models support fine-tuning on Bedrock

Not every model on Bedrock is fine-tunable, and the set changes as providers add and retire support. The durable rule: fine-tuning support is most common on Amazon's own models and several open-weight families; for some third-party frontier models, customization is offered through different mechanisms or not at all.

As a practical 2026 guide, fine-tuning (and in several cases continued pre-training) has typically been available for Amazon's own models — the Amazon Nova family and the Amazon Titan text models — and for open-weight families such as Meta Llama and Cohere models. Support for these is the most stable bet. Amazon-built models also tend to expose both supervised fine-tuning and continued pre-training, which is why they are a frequent starting point when a team specifically wants to own a customized model.

For some third-party frontier models, the provider may not expose weight-level fine-tuning on Bedrock at all, or may offer customization only through a separate managed path. In those cases the right move is usually to get the behaviour you need through prompting, RAG, and (where supported) distillation rather than classic fine-tuning. If your plan depends on tuning a specific named model, verify it is on the current fine-tunable list before you design around it — this is the most common place a customization plan breaks.

There is also a capacity dimension. Because a fine-tuned model is served on Provisioned Throughput (covered in §VII), the model you choose to tune affects the hosting cost: a small, efficient base model costs less per hour to host than a large one. For high-volume narrow tasks this is a reason to tune (or distill into) a smaller model rather than a frontier one — you get the customized behaviour and a far cheaper standing bill.

check the list first

The fine-tunable model set changes. Amazon Nova, Amazon Titan, and open-weight families (Llama, Cohere) have been the most reliable to fine-tune on Bedrock; some frontier models offer customization only via other paths. Confirm your target model is currently fine-tunable on the AWS docs before building your plan around it.

training data

VITraining data: format, preparation, and JSONL

Fine-tuning quality is mostly a data problem. The model can only learn what your examples demonstrate, so the format and the curation of the dataset matter more than any hyperparameter. Bedrock expects training data as a JSONL file in Amazon S3 — one labelled example per line.

The format is JSONL — "JSON Lines" — a plain-text file where each line is a single, self-contained JSON object describing one training example. For supervised fine-tuning of a text model, each line is a prompt/completion pair: the input the model will see and the exact output you want it to produce. Conceptually each line looks like {"prompt": "<the input text>", "completion": "<the desired output>"} (the exact field names and structure depend on the model and the task — chat-formatted models use a messages-style schema; check the AWS docs for the schema your chosen model expects). You upload the file (and usually a separate validation file) to an S3 bucket in the same region as the job.

The examples must mirror production. The single biggest determinant of a good fine-tune is that the training prompts look like the prompts the model will actually receive, and the completions look exactly like the output you want back — same format, same length profile, same tone. If production prompts include a system instruction and retrieved context, your training examples should too. A clean dataset of a few hundred to a few thousand high-quality, representative pairs typically beats a much larger noisy one.

Curation and hygiene matter. De-duplicate examples, remove contradictions (two near-identical prompts with different desired outputs confuse the model), balance the classes or formats you care about, and strip anything you would not want the model to imitate. Hold out a validation set the model does not train on so you can measure generalization rather than memorization. And because the data leaves your hands into a training process, scrub or tokenize PII and secrets you do not want baked into a model.

Practically, getting from "raw logs / spreadsheets / documents" to a clean JSONL dataset is where most of the human effort in a fine-tuning project goes. It is also exactly the kind of work a vetted AWS ML partner does efficiently — and, because the engagement is credit-funded, without the customer paying for it (see §IX).

  • One JSON object per line; no commas between lines; UTF-8 encoded.
  • A prompt/completion (or messages) pair per line, matching the schema your chosen model expects.
  • Training prompts that look like real production prompts — including any system instruction and context.
  • A separate validation file held out from training to measure generalization.
  • De-duplicated, contradiction-free, class-balanced examples; PII and secrets removed.
  • Files uploaded to an Amazon S3 bucket in the same region as the fine-tuning job.
the job + the catch

VIIRunning a fine-tuning job — and the Provisioned Throughput catch

With a clean dataset in S3, running the job itself is straightforward. The part that determines whether fine-tuning is economically sensible is what happens after the job succeeds: to actually serve your custom model, Bedrock requires you to buy Provisioned Throughput — a standing hourly cost.

Running the job

In the Bedrock console (or via API/SDK) you create a model customization job: choose the base model, point it at your training (and validation) data in S3, name the output custom model, set a few hyperparameters — typically the number of epochs (passes over the data), learning-rate multiplier, and batch size — and give the job an IAM role that can read your S3 data and write the result. Bedrock provisions the training infrastructure, runs the job, and reports training and (if you supplied validation data) validation loss metrics when it finishes. Jobs run from minutes to hours depending on dataset size and epochs. The output is a private custom model registered in your account.

The Provisioned Throughput requirement — the big cost implication

Here is the part to internalize before you start: you cannot call a fine-tuned model on the cheap on-demand, per-token path the way you call base models. To serve a custom model, Bedrock requires Provisioned Throughput — you purchase dedicated model capacity (measured in "model units") and pay a flat hourly rate for it continuously, the entire time the model is deployed, regardless of how many requests you send. A custom model sitting idle on Provisioned Throughput still bills every hour.

The consequence is stark: the fine-tuning training charge might be tens or low-hundreds of dollars one time, but hosting the result can cost far more per month than that — and far more than the equivalent on-demand inference would have cost on a base model. This single fact flips the economics of most casual fine-tuning ideas. It is the reason the honest default is: do not fine-tune-and-host unless the volume and quality gains clearly justify a standing hourly bill. For low or spiky traffic, a base model with good prompting and RAG is almost always cheaper overall.

Where fine-tuning does pay off, it is usually high, steady volume on a narrow task: at that point you are running enough traffic that (a) the quality/consistency gain is worth a lot and (b) the reserved capacity is busy enough to be efficient. If you can commit to a 1- or 6-month Provisioned Throughput term, the hourly rate drops, improving the math further. See the amazon-bedrock-provisioned-throughput sibling for the capacity mechanics, and amazon-bedrock-pricing for how PT sits among the four pricing modes.

the cost that surprises everyone

Training a custom model is a small one-time charge. Hosting it is the real cost: a fine-tuned model can only be served on Provisioned Throughput, a flat hourly charge that accrues 24/7 whether or not the model is used. Budget for the standing hosting bill — not just the training — and only fine-tune-and-host for high, steady volume.

did it work?

VIIIEvaluating the tuned model — and when it is worth it

A finished fine-tune is a hypothesis, not a result. Before you put a standing Provisioned Throughput bill behind it, prove it actually beats the base model on your task — and that the improvement justifies the cost and the operational weight.

Start with the training and validation loss the job reports: validation loss falling alongside training loss is a healthy sign; training loss falling while validation loss rises means it is overfitting (memorizing rather than generalizing) — usually a cue to reduce epochs or get more/cleaner data. But loss numbers are only a proxy. The real test is task performance on a held-out evaluation set the model has never seen.

Run a head-to-head: the same prompts through the base model and through your custom model, scored on the metric that matters for your task — exact-match or schema-validity for structured extraction, a rubric or LLM-as-judge score for style/quality, accuracy/F1 for classification. Bedrock's model evaluation tooling can run automated and human-in-the-loop evaluations to make this systematic. The bar to clear is not "is the custom model good" — it is "is it enough better than the base model (with good prompting and RAG) to justify a standing hosting cost."

That framing is the honest test of whether fine-tuning is worth it at all. Tally the full cost: one-time training + ongoing Provisioned Throughput hosting + the human effort to build and maintain the dataset (a fine-tune drifts as your task evolves and may need re-training). Set it against the measured quality and consistency gain over the cheaper alternatives. Fine-tuning wins cleanly when the task is narrow, stable, high-volume, and format/behaviour-sensitive; it loses when traffic is low or spiky, the task keeps changing, or the real gap was missing facts (RAG) or a weak prompt (prompt engineering) all along.

  • Check loss curves first — Falling validation loss with training loss = healthy. Rising validation loss = overfitting; cut epochs or improve data.
  • Evaluate on a held-out set — Same unseen prompts through base vs custom model, scored on a task-appropriate metric. Use Bedrock model evaluation to systematize it.
  • Compare against the cheap alternatives — The bar is beating a base model with good prompting and RAG — not beating nothing.
  • Tally the full cost — Training + standing PT hosting + dataset maintenance. Fine-tuning is worth it for narrow, stable, high-volume, format-sensitive tasks; rarely otherwise.
how it becomes $0

IXHow AWS credits fund the training — and the hosting

Everything above prices fine-tuning if you pay AWS directly. For most startups and many companies the relevant number is different, because AWS will frequently fund the work with credits — and both the one-time training charge and the ongoing Provisioned Throughput hosting draw those credits down before they ever touch your card.

Fine-tuning, continued pre-training, custom-model hosting on Provisioned Throughput, the embeddings and vector store behind any RAG you pair with it, and the S3 storage for your datasets are all credit-eligible, and AWS credits apply automatically against your bill until exhausted. The relevant pools are AWS Activate (general startup credits, commonly up to $100K for institutionally-funded startups), a dedicated Bedrock / Generative-AI POC pool ($10K–$50K) aimed specifically at proving out a GenAI use case — which is exactly what a fine-tuning experiment is — and the competitive Generative AI Accelerator (awards up to $1M for a small cohort of AI-first startups).

This matters more for fine-tuning than for plain inference precisely because of the standing hosting cost. The line item that makes teams hesitate to fine-tune — Provisioned Throughput running 24/7 — is fully covered by credits during the build and proof-out period. That changes the calculus: you can stand up a custom model, run a proper head-to-head evaluation against the base model, and only commit real money to hosting once you have proven the gain and the volume justify it.

The practical mechanic is that most of these pools are partner-filed — requested through the AWS Partner Network (the ACE program), not a public self-serve form — which is why teams route through an AWS partner rather than applying alone. That is the gap CloudRoute fills. CloudRoute matches you to the right credit pool for your stage and to a vetted AWS ML partner who both files the credit application and does the work: curating the JSONL dataset, running the fine-tuning job, setting up evaluation, and deciding honestly whether fine-tuning or a cheaper RAG/prompt approach is the right answer. The customer pays $0 — AWS funds the credit pool, AWS pays the partner through engagement-funding programs, and the partner pays CloudRoute a routing commission. You never see an invoice. Related: AWS credits for generative-AI startups and Bedrock POC funding.

pick the right tool

Fine-tuning vs RAG vs prompt engineering vs continued pre-training vs distillation

The headline decision, on one screen. Match the row to the problem you actually have. The pattern to notice: the lighter tools (prompt engineering, RAG) have no standing model-hosting bill, while fine-tuning and continued pre-training do. Representative 2026 guidance, not quotes.

MethodBest when…EffortAdds facts?Changes behaviour?Cost shapeReach for it…
Prompt engineeringThe prompt/examples just are not good enough yetLowestOnly inlineYes (via instructions)Free to iterate; no hostingFirst, always
RAG (Knowledge Bases)The model lacks your facts / docs / latest dataLow–mediumYesNoEmbeddings + vector store + tokens; no model hostingSecond, for any knowledge gap
Fine-tuningNeed a locked-in style/format/narrow skill, prompting unreliableMediumNoYesOne-time training + standing PT hostingThird, with good labelled data
Continued pre-trainingWhole domain language is alien; large unlabelled corpusHighSomewhatYes (domain register)Higher one-time training + standing PT hostingSpecialist domains
Model distillationNeed big-model quality at small-model cost, high volumeMedium–highInherits teacherYes (mimics teacher)Training; cheap student inferenceHigh-volume narrow tasks
These combine. A common production stack is RAG for facts + a light fine-tune for output format + prompt caching for cost. Reach for the lightest combination that clears your quality bar — and remember fine-tuning/continued pre-training add a standing Provisioned Throughput hosting bill (§VII).
before you commit to a standing hosting bill
Get AWS credits that cover training AND Provisioned Throughput hosting — and a partner to build it (you pay $0)
Get matched in 24h →
a recent match

A fine-tune that turned out to be a RAG problem — built on $0 — anonymized

inquiry · Series-A legal-tech SaaS, Berlin
Series-A legal-tech SaaS, 20 people, building a contract-analysis assistant on AWS

Situation: The team was convinced they needed to fine-tune a model on their contract corpus because the base model "did not know their clauses." They had budgeted for a fine-tuning project and were worried about the cost of training and hosting a custom model — and about spending scarce runway to find out whether it would even work.

What CloudRoute did: CloudRoute matched them in under 24 hours to a German AWS ML partner. On a discovery call the partner diagnosed the real problem: the model lacked <em>facts</em> (the clauses), not the right <em>behaviour</em> — a RAG problem, not a fine-tuning one. The partner built a Bedrock Knowledge Base over the contract corpus (managed RAG), then added a small fine-tune purely to lock the output into the firm's structured review format, hosted on a single Provisioned Throughput unit. They filed a Bedrock POC credit application plus an Activate Portfolio application to fund the whole build.

Outcome: The assistant shipped with RAG carrying the facts and a narrow fine-tune carrying the format — far cheaper and more accurate than fine-tuning everything would have been. Training, the PT hosting, embeddings, and the vector store were all covered by the approved credits, so the team paid $0 during the build and proof-out. CloudRoute's commission was paid by the partner from AWS engagement funding, not by the customer.

method: RAG for facts + narrow fine-tune for format · credits secured: POC + Activate · out-of-pocket during build: $0

faq

Common questions

What is fine-tuning on Amazon Bedrock?
Fine-tuning continues training a base foundation model on your own labelled examples (prompt/completion pairs supplied as a JSONL file in S3) so it learns a specific style, format, or narrow skill better than the base model does. On Bedrock the process is fully managed — you supply data and AWS runs the training — and the resulting private custom model stays in your account and region; your data is not used to train the base model. It changes how the model behaves, not what facts it knows (that is RAG).
When should I fine-tune versus use RAG or prompt engineering?
Use prompt engineering first — it is free and instant and solves a lot. Use RAG (Bedrock Knowledge Bases) when the problem is missing facts/knowledge: the model needs your documents or latest data. Use fine-tuning when you need a consistent style, output format, or narrow skill locked in and prompting alone is not reliable enough, and you have good labelled examples. Fine-tuning is poor at adding facts and goes stale when your data changes, so it is rarely the first tool to reach for.
What is the difference between fine-tuning and continued pre-training?
Fine-tuning is supervised — it learns from labelled prompt/completion pairs to change behaviour on a specific task. Continued pre-training is unsupervised — it feeds the model large volumes of unlabelled domain text so it absorbs a specialist field's vocabulary and register. Fine-tuning is lighter and task-focused; continued pre-training is heavier and domain-focused, and is worth it when a field's language is genuinely far from general text. Teams sometimes do both: continued pre-training to learn the domain, then fine-tuning to learn the task.
Which models can I fine-tune on Bedrock?
The set changes, but as of 2026 fine-tuning (and often continued pre-training) has been most reliably available for Amazon's own models (the Amazon Nova family and Amazon Titan text models) and several open-weight families (Meta Llama, Cohere). Some third-party frontier models do not expose weight-level fine-tuning on Bedrock or offer customization only through other mechanisms. Always confirm your target model is on the current fine-tunable list in the AWS Bedrock documentation before designing around it.
What format does Bedrock fine-tuning training data need to be in?
JSONL ("JSON Lines") — a plain-text file with one self-contained JSON object per line, uploaded to Amazon S3 in the same region as the job. For supervised fine-tuning each line is a prompt/completion pair (chat-formatted models use a messages-style schema); the exact field names depend on the model, so check the AWS docs for your chosen model's schema. The examples should mirror real production prompts and outputs, and you should hold out a separate validation file to measure generalization.
How much does fine-tuning on Bedrock cost?
There are two costs. (1) A one-time training charge based on the volume of training data processed (commonly priced per 1,000 training tokens × the number of epochs) — typically tens to low-hundreds of dollars for typical datasets. (2) The cost most teams miss: hosting. Serving a fine-tuned model requires Provisioned Throughput, a flat hourly charge that accrues continuously while the model is deployed, whether or not it is used. The hosting usually dwarfs the training. Figures are representative for 2026 — confirm current rates on the AWS Bedrock pricing page.
Why does a fine-tuned model require Provisioned Throughput?
Custom (fine-tuned) models cannot be served on the shared on-demand, per-token path that base models use. To call a custom model on Bedrock you must purchase Provisioned Throughput — dedicated model capacity billed at a flat hourly rate for as long as the model is deployed, regardless of traffic. This is the single biggest economic consideration in fine-tuning: a custom model sitting idle still bills every hour, so fine-tuning only makes sense for high, steady volume on a narrow task. A 1- or 6-month commitment lowers the hourly rate.
Is fine-tuning worth it — when does it pay off?
Fine-tuning pays off when a task is narrow, stable, high-volume, and behaviour/format-sensitive — enough traffic that the quality and consistency gains are valuable and the reserved Provisioned Throughput capacity stays busy. It does not pay off for low or spiky traffic (the standing hosting cost is wasted), for tasks that keep changing (the fine-tune drifts and needs re-training), or when the real gap was missing facts (use RAG) or a weak prompt (use prompt engineering). Always evaluate the custom model head-to-head against a base model with good prompting before committing to hosting.
Can AWS credits cover fine-tuning and the custom-model hosting?
Yes — fine-tuning, continued pre-training, custom-model hosting on Provisioned Throughput, plus the embeddings, vector store, and S3 storage around it are all credit-eligible, and credits apply automatically against your AWS bill. The relevant pools are AWS Activate (up to $100K), a Bedrock/GenAI POC pool ($10K–$50K), and the GenAI Accelerator (up to $1M). These are largely partner-filed via the AWS Partner Network. CloudRoute routes you to the right pool and a vetted AWS ML partner who files the application and does the work (dataset prep, training, evaluation) — customer pays $0, AWS funds it.

Fine-tune on AWS's budget, not your runway

Training is cheap; hosting a custom model on Provisioned Throughput is the standing cost that makes teams hesitate. AWS credits cover both. CloudRoute routes you to the right credit pool (Activate up to $100K, Bedrock POC $10K–$50K, GenAI Accelerator up to $1M) and a vetted AWS ML partner who preps the data, runs the job, and tells you honestly whether to fine-tune at all. Customer pays $0.

matched within< 24h
GenAI credit ceilingup to $1M
cost to you$0
Amazon Bedrock fine-tuning — the full 2026 guide · CloudRoute