fine-tune an llm on aws · the 2026 how-to

How to fine-tune an LLM on AWS (2026).

A neutral, end-to-end reference for fine-tuning a large language model on AWS — across all three paths. First the decision that should come before any of them: fine-tune vs RAG vs prompt engineering. Then the three ways to actually do it — Amazon Bedrock fine-tuning (fully managed), Amazon SageMaker training (full control), and SageMaker JumpStart (the guided middle) — with how data prep works, the workflow on each path, the GPU and AWS Trainium cost of the training run, how you host the tuned model afterwards (Bedrock Provisioned Throughput vs a SageMaker endpoint), how to evaluate it, and a decision table across all three. Plus how AWS credits fund the GPU/Trainium training and the hosting so the build costs you $0.

3 paths to a tuned LLM · training silicon: GPU or Trainium · the real cost: hosting, not training · cost with credits: $0
TL;DR
  • There are three ways to fine-tune an LLM on AWS, and choosing among them is most of the decision. Amazon Bedrock fine-tuning is fully managed (you supply a JSONL dataset; AWS runs everything) — fastest, least control. Amazon SageMaker training is full control (your own training script on GPU or Trainium instances) — most flexible, most work. SageMaker JumpStart is the guided middle — pre-built recipes that fine-tune popular open models with minimal code.
  • Before any of them, run the cheaper decision: most "we need to fine-tune" problems are actually a RAG problem (the model lacks your facts) or a prompt problem (the instructions are weak). Fine-tuning teaches behaviour — a style, a format, a narrow skill — not facts, and it goes stale when your data changes. Exhaust prompt engineering and RAG first; fine-tune only when you have good labelled examples and a behaviour that prompting cannot make reliable.
  • The cost that surprises everyone is not the training run — it is hosting the result. A Bedrock custom model can only be served on Provisioned Throughput (a flat hourly charge, 24/7); a SageMaker-tuned model runs on an always-on endpoint (the instance bills by the hour). Both usually dwarf the one-time GPU/Trainium training cost. AWS credits (Activate up to $100K, Bedrock/GenAI POC $10K–$50K, GenAI Accelerator up to $1M) cover training and hosting — CloudRoute routes you to the credit pool and a vetted AWS ML partner, so you pay $0.
the prior decision

I. Fine-tuning an LLM: what it is, and when you actually need it

Fine-tuning continues training a pre-trained LLM on your own labelled examples so it learns a specific style, format, or narrow skill better than the base model does out of the box. The most valuable thing to settle before picking an AWS path is whether you need fine-tuning at all — because most teams who think they do actually have a RAG or prompt problem.

A foundation model arrives fluent in language in general but knowing nothing about your output format, your tone, your domain conventions, or the precise way you want a task done. Fine-tuning closes that gap by showing the model many examples of the input it will see and the output you want, and nudging its weights toward producing that output. The result is a custom model: a private derivative of the base model that behaves the way your examples taught it to. Critically, fine-tuning changes how the model behaves, not what facts it knows.

That distinction drives the whole decision. If your model gives wrong answers because it does not know your content — your docs, your product, last night's data — that is a facts problem, and the fix is retrieval-augmented generation (RAG): retrieve the relevant passages and put them in the prompt at request time. Fine-tuning is poor at teaching facts and goes stale the moment your data changes. If the model can clearly do the task but does it inconsistently, that is usually a prompt problem — better instructions and a few in-context examples — which is free and instant to iterate. Only when you have good labelled examples and a behaviour that prompting still cannot make reliable does fine-tuning become the right tool.

When fine-tuning is right, it tends to be for a narrow, stable, behaviour-sensitive task: a strict JSON output schema the base model keeps subtly breaking, a consistent brand voice, a domain-specific classification or extraction. Those are the cases where locking the behaviour into the weights pays off. The sibling page rag-on-aws covers the retrieval path in depth; the rest of this page assumes you have decided to fine-tune and focuses on how to do it on AWS, across the three paths.

One caveat, stated once and meant throughout: exact dollar figures, the precise list of fine-tunable models, the available training instances, and feature details change frequently across Bedrock and SageMaker. Every number here is representative as of 2026 to convey relative cost and the shape of the work. Always confirm the current model list, instance types, and pricing on the official AWS documentation and pricing pages before committing.

the decision in one line

Missing facts → RAG. Weak instructions → prompt engineering. A behaviour (style/format/narrow skill) that prompting cannot make reliable, and you have good labelled examples → fine-tune. Climb the ladder; fine-tune last, and never to add facts.
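The ladder can be written as a small decision function. It is a hypothetical sketch: the `Symptoms` fields and the returned labels are illustrative names, not part of any AWS API.

```python
from dataclasses import dataclass


@dataclass
class Symptoms:
    missing_facts: bool               # model lacks your docs, product, or latest data
    prompt_exhausted: bool            # better instructions and few-shot examples tried
    behaviour_still_unreliable: bool  # style/format/skill still inconsistent after that
    has_labelled_examples: bool       # a clean set of input/output pairs exists


def choose_approach(s: Symptoms) -> str:
    """Return the next rung to climb: prompt first, RAG for facts, fine-tune last."""
    if s.missing_facts:
        return "rag"  # never fine-tune to add facts
    if not s.prompt_exhausted or not s.behaviour_still_unreliable:
        return "prompt_engineering"  # keep iterating, or prompting already works
    if s.has_labelled_examples:
        return "fine_tune"
    return "collect_examples"  # fine-tuning is blocked until good labelled data exists
```

It returns only the next rung. A production system can end up using several (RAG for facts plus a light fine-tune for format), so rerun the check after each fix.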

the three ways

II. The three paths to a fine-tuned LLM on AWS

AWS gives you three distinct ways to fine-tune an LLM, trading control for convenience. Picking the right one for your team and task is most of the work — the wrong choice means either fighting a managed service that cannot do what you need, or hand-building infrastructure you did not have to. Here they are, most-managed first.

The honest framing: start as managed as your requirements allow, and move toward full control only when a specific need forces it. Most teams should try Bedrock fine-tuning or a JumpStart recipe before writing a custom SageMaker training job, because those reach a working custom model in a fraction of the time. Conversely, teams with a hard requirement — a model Bedrock cannot tune, a custom training objective, or full ownership of the artifact — will be happier going to SageMaker from the start than fighting the managed path.

Path A — Amazon Bedrock fine-tuning (fully managed)

The least-effort path. You upload a JSONL training dataset to Amazon S3, point a Bedrock model customization job at a supported base model, set a few hyperparameters, and AWS runs the entire training process on managed infrastructure; you never see a GPU, a training loop, or a checkpoint. The output is a private custom model in your account. Your training data and the resulting model stay in your account and Region and are not used to train the base model. Choose this when: your target model is on Bedrock's fine-tunable list (Amazon Nova, Amazon Titan, and third-party families such as Meta Llama and Cohere Command have been the most reliable), and you want the shortest path to a working custom model. The trade-off is the least control, and serving the result requires Provisioned Throughput (covered in §VI). The amazon-bedrock-fine-tuning sibling is the deep dive.

Path B — Amazon SageMaker training (full control)

The most flexible path. You write (or adapt) a training script, typically using Hugging Face Transformers, PyTorch, and parameter-efficient methods like LoRA/QLoRA, and run it as a SageMaker training job on GPU instances (the ml.p and ml.g families) or AWS Trainium instances (ml.trn, via the Neuron SDK). You control the model, the objective, the data pipeline, the distributed-training strategy, and every hyperparameter; you own the resulting weights as an artifact in S3. Choose this when: you need a model or technique Bedrock does not offer, full ownership of the weights, a custom training objective, or large-scale distributed training. The trade-off is real ML engineering effort. The amazon-sagemaker sibling covers the platform; aws-trainium covers the cheaper-than-GPU training silicon.

Path C — SageMaker JumpStart (the guided middle)

The middle path: most of SageMaker's flexibility, far less of the boilerplate. JumpStart is a hub of pre-trained open models (Llama, Mistral, Falcon, and many more) with built-in fine-tuning recipes. You select a model, point the recipe at your dataset, set a handful of parameters, and JumpStart handles the training script, the instance selection, and the deployment scaffolding for you. You get a SageMaker endpoint hosting your tuned open model without writing the training loop yourself. Choose this when: the model you want is a popular open one available in JumpStart, and you want more control and model choice than Bedrock without hand-writing a training job. It is the pragmatic default for fine-tuning open-weight LLMs on AWS.

the rule of thumb

Need it managed and your model is on Bedrock's list → Bedrock fine-tuning. Want an open model with minimal code → SageMaker JumpStart. Need full control, custom techniques, or to own the weights → SageMaker training. Move toward control only when a concrete requirement forces it.

paths at a glance

III. The three paths, compared on what matters

The same three paths, lined up against the dimensions that actually drive the choice: how much you build, how much control and model choice you get, what silicon the training runs on, and — the part teams overlook — how you host the result and what that standing cost looks like.

fine-tuning an llm on aws — three paths compared · representative as of 2026
| Dimension | Bedrock fine-tuning (managed) | SageMaker JumpStart (guided) | SageMaker training (full control) |
| --- | --- | --- | --- |
| Effort | Lowest: upload JSONL, run a job | Low–medium: pick a recipe, set params | Highest: write/adapt a training script |
| Control | Minimal (a few hyperparameters) | Moderate (recipe parameters) | Total (objective, data, distribution) |
| Model choice | Bedrock fine-tunable list (Nova, Titan, Llama, Cohere…) | JumpStart open models (Llama, Mistral, Falcon…) | Any open model you can train |
| Training silicon | Managed (abstracted away) | GPU or Trainium (ml.p/ml.g/ml.trn) | GPU or Trainium, your choice (ml.p/ml.g/ml.trn) |
| Methods | Managed SFT (+ continued pre-training on some models) | Recipe-driven (often LoRA/QLoRA) | Anything: full SFT, LoRA/QLoRA, custom |
| Who owns the weights | Private custom model in Bedrock | Model artifact in your S3 + endpoint | Model artifact in your S3, fully yours |
| How you host it | Provisioned Throughput (hourly) | SageMaker endpoint (hourly instance) | SageMaker endpoint (hourly instance) |
| Best for | Fastest path; managed models | Open models with minimal code | Custom needs, full ownership, scale |
All three produce a fine-tuned model; they differ in how much you build and how you serve the result. The hosting row is the one that drives ongoing cost (§VI): a Bedrock custom model needs Provisioned Throughput; a SageMaker/JumpStart model runs on an always-on endpoint. Representative for 2026 — confirm models, instances, and pricing on the AWS docs.
the data

IV. Data preparation: the part that actually determines quality

Fine-tuning quality is mostly a data problem, and this is true on all three paths. The model can only learn what your examples demonstrate, so curation and format matter more than any hyperparameter. The shape differs slightly by path, but the discipline is identical.

The format. On Bedrock, training data is a JSONL file ("JSON Lines") in Amazon S3 — one self-contained JSON object per line, each a prompt/completion pair for supervised fine-tuning (chat-formatted models use a messages-style schema; the exact field names depend on the model). On SageMaker training and JumpStart, the format is whatever your script or the chosen recipe expects — commonly JSONL or CSV with an instruction/response (or chat) structure, also staged in S3. In all cases each example pairs the input the model will see with the exact output you want it to produce.
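A minimal sketch of producing that file in Python, assuming a model whose customization schema uses plain `prompt`/`completion` fields (chat models need the messages-style schema instead, so check your model's documented format). The helper name is hypothetical. `json.dumps` escapes embedded newlines, which is what keeps each example on exactly one line.

```python
import json
from typing import Iterable, Tuple


def to_prompt_completion_jsonl(pairs: Iterable[Tuple[str, str]]) -> str:
    """Serialize (prompt, completion) pairs as JSON Lines: one JSON object per line."""
    lines = []
    for prompt, completion in pairs:
        # ensure_ascii=False keeps non-ASCII text readable; "\n" inside strings is escaped
        record = {"prompt": prompt, "completion": completion}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"
```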

The examples must mirror production. The single biggest determinant of a good fine-tune is that the training prompts look like the prompts the model will actually receive, and the completions look exactly like the output you want back — same format, same length profile, same tone. If production prompts include a system instruction and retrieved context, your training examples should too. A clean dataset of a few hundred to a few thousand high-quality, representative pairs typically beats a much larger noisy one.

Curation and hygiene. De-duplicate examples, remove contradictions (two near-identical prompts with different desired outputs confuse the model), balance the classes or formats you care about, and strip anything you would not want the model to imitate. Always hold out a validation set the model does not train on, so you can measure generalization rather than memorization. And because the data feeds a training process that bakes patterns into weights, scrub or mask any PII and secrets you do not want learned. This work is path-independent: it matters exactly as much whether you fine-tune on Bedrock, JumpStart, or a custom SageMaker job.
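The hygiene steps can be sketched as one hypothetical `curate` helper. It is illustrative, not a complete pipeline: the normalization rule, the email-only PII regex, and the hash-based 10% hold-out are assumptions you would tune for your own data.

```python
import hashlib
import re
from collections import defaultdict

# Illustrative PII pattern (emails only); a real scrubber needs far more than this
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def normalize(text):
    """Lower-case and collapse whitespace so near-identical prompts compare equal."""
    return " ".join(text.lower().split())


def curate(pairs, holdout_fraction=0.1):
    """De-duplicate, drop contradictions, mask emails, then split train/validation."""
    completions_by_prompt = defaultdict(set)
    for prompt, completion in pairs:
        completions_by_prompt[normalize(prompt)].add(completion.strip())

    kept, seen = [], set()
    for prompt, completion in pairs:
        key = normalize(prompt)
        if len(completions_by_prompt[key]) > 1:
            continue  # same prompt, conflicting outputs: drop every copy
        if key in seen:
            continue  # duplicate of an example already kept
        seen.add(key)
        kept.append((EMAIL.sub("<EMAIL>", prompt), EMAIL.sub("<EMAIL>", completion)))

    train, validation = [], []
    for prompt, completion in kept:
        # Hash-based bucket in [0, 100): the same prompt always lands in the same split
        bucket = int(hashlib.sha256(prompt.encode("utf-8")).hexdigest(), 16) % 100
        target = validation if bucket < holdout_fraction * 100 else train
        target.append((prompt, completion))
    return train, validation
```

Hashing the prompt rather than shuffling keeps the split stable across reruns, so an example never drifts from validation into training when you add data later.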

Practically, getting from raw logs, spreadsheets, and documents to a clean training set is where most of the human effort in any fine-tuning project goes — far more than running the job itself. It is also exactly the kind of work a vetted AWS ML partner does efficiently, and, because the engagement is credit-funded, without the customer paying for it (see §VIII).

  • Bedrock: JSONL prompt/completion (or messages) pairs in S3, matching your chosen model's schema.
  • SageMaker / JumpStart: JSONL or CSV instruction/response (or chat) data in S3, matching the recipe or script.
  • Training examples that mirror real production prompts — including any system instruction and context.
  • A separate validation set held out from training, to measure generalization not memorization.
  • De-duplicated, contradiction-free, class-balanced examples; PII and secrets removed.
  • Everything staged in an Amazon S3 bucket in the same Region as the training job.
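Before uploading, a quick lint of the JSONL catches the most common failures (blank lines, broken JSON, missing fields). This is a hypothetical checker assuming the same `prompt`/`completion` fields; swap `required` for your model's schema.

```python
import json


def validate_jsonl(text, required=("prompt", "completion")):
    """Return human-readable problems; an empty list means every line looks well-formed."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            problems.append(f"line {lineno}: blank line")
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {lineno}: expected a JSON object")
            continue
        for field in required:
            value = record.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"line {lineno}: missing or empty '{field}'")
    return problems
```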
the workflow on each path

V. Running the fine-tune: the workflow on each path

With a clean dataset in S3, the act of running the fine-tune differs by path: a managed job on Bedrock, a recipe on JumpStart, or a training job you script on SageMaker. Here is what each looks like, end to end.

Bedrock fine-tuning — a managed customization job

In the Bedrock console (or via the API/SDK) you create a model customization job: choose the base model, point it at your training (and validation) data in S3, name the output custom model, set a few hyperparameters (typically epochs, i.e. passes over the data, plus learning rate and batch size, with the exact set varying by model), and supply an IAM role that can read your S3 data and write the result. Bedrock provisions the training infrastructure, runs the job, and reports training and validation loss when it finishes. Jobs run from minutes to hours depending on dataset size and epochs. The output is a private custom model registered in your account, ready to evaluate. No infrastructure to manage.
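The same job via the AWS SDK for Python (boto3), as a sketch. The request keys follow the Bedrock `CreateModelCustomizationJob` API; the hyperparameter names and values shown are assumptions (they differ per base model), and the job name, role ARN, model identifier, and bucket paths are placeholders.

```python
def customization_job_request(job_name, custom_model_name, role_arn, base_model_id,
                              train_s3_uri, validation_s3_uri, output_s3_uri,
                              hyperparameters=None):
    """Keyword arguments for boto3's bedrock.create_model_customization_job()."""
    return {
        "jobName": job_name,
        "customModelName": custom_model_name,
        "roleArn": role_arn,  # IAM role Bedrock assumes to read training data and write output
        "baseModelIdentifier": base_model_id,
        "customizationType": "FINE_TUNING",
        "trainingDataConfig": {"s3Uri": train_s3_uri},
        "validationDataConfig": {"validators": [{"s3Uri": validation_s3_uri}]},
        "outputDataConfig": {"s3Uri": output_s3_uri},
        # Assumed names and values; valid keys and ranges are model-specific, passed as strings
        "hyperParameters": hyperparameters
        or {"epochCount": "2", "batchSize": "1", "learningRate": "0.00001"},
    }


def submit_and_check(request, region="us-east-1"):
    """Start the job and return (job ARN, current status); needs boto3 and AWS credentials."""
    import boto3  # lazy import so the request builder runs without the AWS SDK installed

    bedrock = boto3.client("bedrock", region_name=region)
    job_arn = bedrock.create_model_customization_job(**request)["jobArn"]
    status = bedrock.get_model_customization_job(jobIdentifier=job_arn)["status"]
    return job_arn, status
```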

SageMaker JumpStart — a recipe-driven fine-tune

In SageMaker Studio's JumpStart hub you select a fine-tunable open model (e.g. a Llama or Mistral variant), point the built-in fine-tuning recipe at your dataset in S3, choose a training instance (a GPU ml.g/ml.p or a Trainium ml.trn type, depending on the recipe), and set the exposed parameters — epochs, learning rate, and usually whether to use a parameter-efficient method like LoRA. JumpStart supplies the training script and orchestration; you launch the job and it produces a tuned model artifact plus the scaffolding to deploy it to a SageMaker endpoint. You get most of SageMaker's flexibility without writing the training loop.
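Programmatically, a JumpStart fine-tune can be driven from the SageMaker Python SDK's `JumpStartEstimator`. A sketch under assumptions: the model ID, the instance type, and the recipe hyperparameter names (`instruction_tuned`, `epoch`, `learning_rate`, `lora_r`) are taken to match a Llama text-generation recipe and may differ for other models, so check the recipe's documented defaults.

```python
def llama_recipe_hyperparameters(epochs=3, learning_rate=1e-4, lora_r=8):
    """Recipe parameters as strings; names assumed from the Llama text-generation recipes."""
    return {
        "instruction_tuned": "True",  # dataset holds instruction/response pairs
        "epoch": str(epochs),
        "learning_rate": str(learning_rate),
        "lora_r": str(lora_r),  # LoRA adapter rank
    }


def fine_tune_and_deploy(train_s3_uri,
                         model_id="meta-textgeneration-llama-3-8b",
                         instance_type="ml.g5.12xlarge"):
    """Launch the JumpStart recipe, then deploy (needs the sagemaker SDK and AWS credentials)."""
    from sagemaker.jumpstart.estimator import JumpStartEstimator  # lazy import: SDK optional

    estimator = JumpStartEstimator(
        model_id=model_id,
        instance_type=instance_type,
        environment={"accept_eula": "true"},  # gated models such as Llama require EULA acceptance
    )
    estimator.set_hyperparameters(**llama_recipe_hyperparameters())
    estimator.fit({"training": train_s3_uri})  # JumpStart supplies the training script
    # deploy() creates a real-time endpoint that bills hourly until you delete it
    return estimator.deploy()
```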

SageMaker training — your own training job

For full control you write or adapt a training script (Hugging Face Transformers + PyTorch is the common stack), choose the instance type and count — GPU (ml.p/ml.g) or Trainium (ml.trn) — and launch a SageMaker training job that spins the cluster up, runs your script against your S3 data, writes the model artifact back to S3, and tears the cluster down. You decide the technique: full supervised fine-tuning or, far more commonly for LLMs, a parameter-efficient method (LoRA / QLoRA) that trains a small set of adapter weights instead of the whole model — dramatically cheaper in memory and compute, and the standard way to fine-tune large open models affordably. For multi-GPU or multi-node runs you configure distributed training. The resulting weights are entirely yours.
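A sketch of launching such a job with the SageMaker Python SDK's Hugging Face estimator. Assumptions: `train.py` and the `scripts` folder are your own (hypothetical) LoRA/QLoRA training code, the framework versions must match a Hugging Face container AWS actually publishes, and the hyperparameter names are whatever your script parses. It also turns on Managed Spot Training to cut the hourly rate.

```python
def estimator_kwargs(role_arn, model_id, instance_type="ml.g5.2xlarge",
                     instance_count=1, max_run_hours=6, spot=True):
    """Keyword arguments for sagemaker.huggingface.HuggingFace (values are illustrative)."""
    kwargs = {
        "entry_point": "train.py",         # hypothetical LoRA/QLoRA script you write
        "source_dir": "scripts",           # hypothetical local folder holding train.py
        "role": role_arn,
        "instance_type": instance_type,    # GPU here; Trainium (ml.trn) needs a Neuron-enabled image
        "instance_count": instance_count,
        "transformers_version": "4.36.0",  # must match an available Hugging Face DLC
        "pytorch_version": "2.1.0",
        "py_version": "py310",
        # Passed to train.py as command-line arguments; the names are defined by your script
        "hyperparameters": {"model_id": model_id, "epochs": 3, "lora_r": 16, "learning_rate": 2e-4},
        "max_run": max_run_hours * 3600,
    }
    if spot:
        kwargs["use_spot_instances"] = True          # SageMaker Managed Spot Training
        kwargs["max_wait"] = kwargs["max_run"] * 2   # must be >= max_run; time allowed for capacity
    return kwargs


def launch(train_s3_uri, **kwargs):
    """Run the job and return the S3 URI of the model artifact (needs the sagemaker SDK)."""
    from sagemaker.huggingface import HuggingFace  # lazy import: SDK optional for kwargs building

    estimator = HuggingFace(**estimator_kwargs(**kwargs))
    estimator.fit({"train": train_s3_uri})  # cluster spins up, runs train.py, tears down
    return estimator.model_data
```

For long spot jobs, also set `checkpoint_s3_uri` and have your script save checkpoints there, so an interrupted job resumes instead of restarting.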

LoRA / QLoRA — why it matters

Full fine-tuning updates every weight in the model — expensive in GPU memory and time for a large LLM. Parameter-efficient fine-tuning (LoRA, and quantized QLoRA) trains a small set of adapter weights instead, cutting training cost and memory by a large factor while keeping most of the quality. On SageMaker and JumpStart it is the default way to fine-tune large open models on modest hardware; it is a big reason the GPU/Trainium bill for a fine-tune is often far smaller than people expect.
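The saving is easy to see with back-of-envelope arithmetic. This sketch counts only the trainable parameters for one weight matrix and the storage of the frozen base weights; it ignores activations, gradients, optimizer state, and quantization overhead, so treat it as an illustration of scale rather than a memory planner. The 4096 × 4096 shape, rank 8, and 7B model size are illustrative choices.

```python
def full_update_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when fine-tuning one weight matrix W (d_out x d_in) directly."""
    return d_out * d_in


def lora_update_params(d_out: int, d_in: int, rank: int) -> int:
    """LoRA freezes W and learns B (d_out x rank) and A (rank x d_in); the update is B @ A."""
    return rank * (d_out + d_in)


def frozen_weight_gb(n_params: float, bits_per_param: int) -> float:
    """Storage for frozen base weights: e.g. 16-bit for plain LoRA, 4-bit under QLoRA."""
    return n_params * bits_per_param / 8 / 1e9
```

For a 4096 × 4096 projection at rank 8, LoRA trains 65,536 parameters instead of 16,777,216 (1/256 of them); holding a 7B-parameter base in 4-bit (QLoRA) takes about 3.5 GB versus about 14 GB at 16-bit.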

the two costs

VI. The cost: GPU/Trainium training vs hosting the tuned model

Fine-tuning has two very different costs, and confusing them is the most common budgeting mistake. The training run is a one-time charge driven by the silicon and how long it runs. Hosting the result is an ongoing charge that, on every path, usually dwarfs the training.

The training run — GPU or Trainium, billed by the hour

A fine-tuning job runs on accelerated compute, and you pay for the instance-time it consumes. On SageMaker and JumpStart you choose the instance and pay its hourly rate for the duration of the job: GPU instances (the ml.p family — high-end NVIDIA for the largest jobs; the ml.g family — cheaper GPUs for smaller fine-tunes) or AWS Trainium instances (ml.trn), AWS's custom training silicon, which is typically meaningfully cheaper per unit of training throughput than comparable GPUs and is accessed via the Neuron SDK (well-supported by JumpStart recipes and Hugging Face). On Bedrock the training infrastructure is abstracted away — you are billed for the customization itself (commonly priced per 1,000 training tokens × epochs) rather than for instance-hours.

The encouraging part: for most fine-tunes, especially LoRA/QLoRA on an open model or a managed Bedrock job on a typical dataset, the training run is a modest one-time cost, frequently tens to low-hundreds of dollars. Two levers cut it further: parameter-efficient methods (LoRA/QLoRA) slash the compute needed, and Trainium instances or SageMaker Managed Spot Training (which runs jobs on spare EC2 capacity at a discount) lower the effective hourly rate. Training is rarely the line item that makes or breaks the economics.
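The two billing shapes reduce to simple arithmetic. Every rate below is a made-up placeholder, not an AWS price, and the 60% spot discount is an assumption; plug in current figures from the pricing pages.

```python
def sagemaker_training_cost(hourly_rate_usd, hours, instance_count=1, spot_discount=0.0):
    """Instance-time billing: rate x hours x instances, less any spot discount."""
    return round(hourly_rate_usd * hours * instance_count * (1 - spot_discount), 2)


def bedrock_customization_cost(dataset_tokens, epochs, usd_per_1k_tokens):
    """Token billing: every epoch re-reads the dataset, priced per 1,000 tokens."""
    return round(dataset_tokens * epochs / 1000 * usd_per_1k_tokens, 2)
```

For example, a hypothetical $5/hour instance for 3 hours costs $15 at the full rate or $6 with the assumed spot discount; a 2-million-token dataset trained for 2 epochs at a hypothetical $0.008 per 1,000 tokens costs $32.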

Hosting the tuned model — the cost that surprises everyone

The real cost is serving the model, and the mechanism differs by path. A Bedrock custom model cannot be called on the cheap on-demand per-token path that base models use — to serve it you must buy Provisioned Throughput: dedicated capacity billed at a flat hourly rate, continuously, the entire time the model is deployed, regardless of traffic. A SageMaker or JumpStart tuned model is deployed to a real-time endpoint — one or more instances (often a GPU ml.g/ml.p) that bill by the hour for as long as the endpoint is up, whether or not requests arrive. Either way, a tuned model sitting idle still bills every hour.
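On Bedrock, standing up and tearing down that capacity are explicit API calls, which is also how you stop the meter. A boto3 sketch: the request keys follow `CreateProvisionedModelThroughput`, while the names, ARN, and single model unit are placeholders.

```python
COMMITMENTS = (None, "OneMonth", "SixMonths")  # None = no commitment, billed hourly


def provisioned_throughput_request(name, custom_model_arn, model_units=1, commitment=None):
    """Keyword arguments for bedrock.create_provisioned_model_throughput()."""
    if commitment not in COMMITMENTS:
        raise ValueError(f"commitment must be one of {COMMITMENTS}")
    request = {"provisionedModelName": name, "modelId": custom_model_arn, "modelUnits": model_units}
    if commitment is not None:
        request["commitmentDuration"] = commitment
    return request


def start_hosting(request, region="us-east-1"):
    """Buy the capacity; invoke the tuned model with the returned ARN as its modelId."""
    import boto3  # lazy import so the request builder runs without the AWS SDK installed

    bedrock = boto3.client("bedrock", region_name=region)
    return bedrock.create_provisioned_model_throughput(**request)["provisionedModelArn"]


def stop_hosting(provisioned_model_arn, region="us-east-1"):
    """A no-commitment purchase keeps billing every hour until it is deleted."""
    import boto3

    bedrock = boto3.client("bedrock", region_name=region)
    bedrock.delete_provisioned_model_throughput(provisionedModelId=provisioned_model_arn)
```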

The consequence is stark: the training might be a few hundred dollars once, but hosting the result can cost far more than that per month, and far more than equivalent on-demand inference on a base model would have. This single fact flips the economics of most casual fine-tuning ideas. The honest default: do not fine-tune-and-host unless the volume and quality gains clearly justify a standing hourly bill. For low or spiky traffic, a base model with good prompting and RAG is almost always cheaper overall. Where fine-tuning does pay off, it is high, steady volume on a narrow task: enough traffic that the quality gain is worth a lot and the reserved capacity stays busy. (Bedrock 1- or 6-month Provisioned Throughput commitments, and SageMaker Savings Plans or endpoint auto-scaling, lower the hosting bill.) The amazon-bedrock-provisioned-throughput sibling covers the Bedrock side; amazon-sagemaker-pricing covers endpoints.
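A rough break-even check makes the trade-off concrete: the monthly request volume at which on-demand base-model spend would equal the standing hosting bill. All rates are hypothetical placeholders, 730 is the average number of hours in a month, and the sketch ignores that a tuned model may need shorter prompts.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def monthly_hosting_cost(hourly_rate_usd, units=1):
    """Standing bill for PT or an always-on endpoint: every hour, busy or idle."""
    return hourly_rate_usd * units * HOURS_PER_MONTH


def monthly_on_demand_cost(requests, tokens_per_request, usd_per_1k_tokens):
    """Per-token bill for the same traffic on a base model: pay only for what you send."""
    return requests * tokens_per_request / 1000 * usd_per_1k_tokens


def break_even_requests(hourly_rate_usd, tokens_per_request, usd_per_1k_tokens, units=1):
    """Monthly requests at which on-demand spend equals the standing hosting bill."""
    cost_per_request = tokens_per_request / 1000 * usd_per_1k_tokens
    return monthly_hosting_cost(hourly_rate_usd, units) / cost_per_request
```

With a hypothetical $20/hour hosting charge, a base model at $0.002 per 1,000 tokens, and about 1,000 tokens per request, break-even lands near 7.3 million requests a month; below that volume, on-demand inference on the base model is the cheaper option.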

the cost that surprises everyone

Training a tuned LLM is a small one-time charge — smaller still with LoRA/QLoRA and Trainium. Hosting it is the real cost: Bedrock custom models need Provisioned Throughput; SageMaker/JumpStart models need an always-on endpoint — both flat hourly charges that accrue 24/7 whether or not the model is used. Budget for the standing hosting bill, and only fine-tune-and-host for high, steady volume.

did it work?

VII. Evaluating the tuned model, before you pay to host it

A finished fine-tune is a hypothesis, not a result. Before you put a standing hosting bill behind it on any path, prove it actually beats the base model on your task — and that the improvement justifies the cost and the operational weight.

Start with the training and validation loss the job reports (Bedrock and SageMaker both surface these): validation loss falling alongside training loss is healthy; training loss falling while validation loss rises means overfitting (memorizing rather than generalizing) — a cue to reduce epochs or get more/cleaner data. But loss is only a proxy. The real test is task performance on a held-out evaluation set the model has never seen.
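The loss-curve rule of thumb as a hypothetical helper over the per-epoch (or per-checkpoint) losses the job reports. The three-point window and the labels are assumptions; real curves are noisy, so look at the plot too.

```python
def diagnose(train_loss, val_loss, window=3):
    """Classify the recent trend of two loss curves (lists ordered oldest to newest)."""
    if len(train_loss) < window or len(val_loss) < window:
        return "insufficient_data"
    train_change = train_loss[-1] - train_loss[-window]
    val_change = val_loss[-1] - val_loss[-window]
    if train_change < 0 and val_change > 0:
        return "overfitting"  # memorizing: cut epochs or get more/cleaner data
    if train_change < 0:
        return "healthy"  # validation loss falling (or flat) alongside training loss
    return "inconclusive"  # training loss not falling; the rule of thumb does not apply
```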

Run a head-to-head: the same prompts through the base model and through your tuned model, scored on the metric that matters for your task — exact-match or schema-validity for structured extraction, a rubric or LLM-as-judge score for style/quality, accuracy/F1 for classification. On the Bedrock path, Bedrock model evaluation can run automated and human-in-the-loop evaluations; on the SageMaker path, SageMaker Clarify / model-evaluation tooling and open-source frameworks do the same. The bar to clear is not "is the tuned model good" — it is "is it enough better than the base model (with good prompting and RAG) to justify a standing hosting cost."
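A minimal harness for the head-to-head, assuming you have already collected each model's outputs for the same held-out prompts and that the task emits JSON with known required keys. Schema-validity and exact-match (compared as parsed JSON, so key order does not matter) are two of the metrics named above; the helper names and example fields are hypothetical.

```python
import json


def _parse(text):
    """Parse model output as JSON; None if it is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


def score(outputs, references, required_keys):
    """Fraction of outputs that are schema-valid, and that exactly match the reference."""
    n = len(references)
    parsed = [_parse(o) for o in outputs]
    valid = sum(isinstance(p, dict) and all(k in p for k in required_keys) for p in parsed)
    exact = sum(p is not None and p == _parse(r) for p, r in zip(parsed, references))
    return {"schema_validity": valid / n, "exact_match": exact / n}


def head_to_head(base_outputs, tuned_outputs, references, required_keys):
    """Per-metric gain of the tuned model over the base model (positive = tuned is better)."""
    base = score(base_outputs, references, required_keys)
    tuned = score(tuned_outputs, references, required_keys)
    return {metric: tuned[metric] - base[metric] for metric in base}
```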

That framing is the honest test of whether fine-tuning was worth it at all. Tally the full cost: one-time GPU/Trainium (or Bedrock) training + ongoing hosting (Provisioned Throughput or endpoint) + the human effort to build and maintain the dataset (a fine-tune drifts as the task evolves and may need re-training). Set it against the measured gain over the cheaper alternatives. Fine-tuning wins cleanly when the task is narrow, stable, high-volume, and format/behaviour-sensitive; it loses when traffic is low or spiky, the task keeps changing, or the real gap was missing facts (RAG) or a weak prompt (prompt engineering) all along.

  • Check loss curves first — Falling validation loss with training loss = healthy. Rising validation loss = overfitting; cut epochs or improve data.
  • Evaluate on a held-out set — Same unseen prompts through base vs tuned model, scored on a task-appropriate metric. Use Bedrock model evaluation or SageMaker Clarify to systematize it.
  • Compare against the cheap alternatives — The bar is beating a base model with good prompting and RAG — not beating nothing.
  • Tally the full cost before hosting — Training + standing hosting (PT or endpoint) + dataset maintenance. Fine-tuning is worth it for narrow, stable, high-volume tasks; rarely otherwise.
how it becomes $0

VIII. How AWS credits fund the GPU/Trainium training, and the hosting

Everything above prices a fine-tune if you pay AWS directly. For most startups and many companies the relevant number is different, because AWS will frequently fund the work with credits — and the GPU/Trainium training run, the Bedrock customization charge, and the ongoing hosting all draw those credits down before they ever touch your card.

Across all three paths the cost lines are credit-eligible: Bedrock model customization and Provisioned Throughput hosting; SageMaker training jobs on GPU or Trainium instances and the real-time endpoints that host the result; the S3 storage for your datasets and artifacts; and the embeddings and vector store behind any RAG you pair with it. AWS credits apply automatically against your bill until exhausted. The relevant pools are AWS Activate (general startup credits, commonly up to $100K for institutionally-funded startups), a dedicated Bedrock / Generative-AI POC pool ($10K–$50K) aimed at proving out a GenAI use case — which is exactly what a fine-tuning experiment is — and the competitive Generative AI Accelerator (awards up to $1M for a small cohort of AI-first startups).

This matters more for fine-tuning than for plain inference precisely because of the standing hosting cost. The line item that makes teams hesitate — Provisioned Throughput or an always-on endpoint running 24/7 — is fully covered by credits during the build and proof-out period. That changes the calculus: you can run the GPU/Trainium training, stand up the tuned model, run a proper head-to-head evaluation against the base, and only commit real money to hosting once you have proven the gain and the volume justify it.

The practical mechanic is that most of these pools are partner-filed — requested through the AWS Partner Network (the ACE program), not a public self-serve form — which is why teams route through an AWS partner rather than applying alone. That is the gap CloudRoute fills. CloudRoute matches you to the right credit pool for your stage and to a vetted AWS ML partner who both files the credit application and does the work: choosing the path (Bedrock vs JumpStart vs SageMaker), curating the dataset, running the fine-tune on GPU or Trainium, setting up evaluation, and deciding honestly whether fine-tuning — or a cheaper RAG/prompt approach — is the right answer at all. The customer pays $0 — AWS funds the credit pool, AWS pays the partner through engagement-funding programs, and the partner pays CloudRoute a routing commission. You never see an invoice. Related: AWS credits for generative-AI startups and Bedrock POC funding.

before you pick a path

Fine-tune vs RAG vs prompt — and which fine-tuning path if you do

Two decisions on one screen. First (top three rows): should you fine-tune at all, or is this really a RAG or prompt problem? Then (bottom three rows): if fine-tuning is genuinely right, which of the three AWS paths fits. Representative 2026 guidance, not quotes.

| Approach | Best when… | Effort | Teaches facts? | Changes behaviour? | Ongoing cost shape |
| --- | --- | --- | --- | --- | --- |
| Prompt engineering | The prompt/examples just are not good enough yet | Lowest | Only inline | Yes (via instructions) | None: free to iterate, no hosting |
| RAG (Knowledge Bases) | The model lacks your facts / docs / latest data | Low–medium | Yes | No | Embeddings + vector store + tokens; no model hosting |
| Fine-tuning (any path below) | Need a locked-in style/format/skill; prompting unreliable; good labelled data | Medium–high | No | Yes | One-time training + standing hosting |
| Bedrock fine-tuning | Fastest path; model is on Bedrock's list; want it managed | Low | No | Yes | Training charge + Provisioned Throughput (hourly) |
| SageMaker JumpStart | Popular open model; want control with minimal code | Medium | No | Yes | GPU/Trainium training + endpoint (hourly) |
| SageMaker training | Custom technique/model; full ownership; scale | High | No | Yes | GPU/Trainium training + endpoint (hourly) |
Exhaust prompt engineering and RAG before fine-tuning — they have no standing model-hosting bill. If you do fine-tune, every path leaves you with an ongoing hosting cost (Provisioned Throughput on Bedrock; an always-on endpoint on SageMaker/JumpStart). LoRA/QLoRA and Trainium cut the training cost; AWS credits cover both training and hosting (§VIII).
before you commit to a standing hosting bill
Get AWS credits that cover GPU/Trainium training AND the hosting — and a partner who picks the right path and builds it (you pay $0)
a recent match

A SageMaker LoRA fine-tune on Trainium — built on $0 — anonymized

inquiry · Series-A developer-tools startup, Toronto
Series-A developer-tools startup, 22 people, building an AI code-review assistant on AWS

Situation: The team needed an open LLM that produced code-review comments in their exact house format and rubric — a behaviour problem prompting had not made reliable. They wanted to fine-tune an open model (not a Bedrock-only one), owned end to end, but had no ML-platform engineer, were nervous about GPU training cost, and worried a standing hosting bill would burn runway before they knew it even worked.

What CloudRoute did: CloudRoute matched them in under 24 hours to a North-American AWS ML partner. The partner diagnosed it as a genuine behaviour/format problem (not facts), chose the SageMaker path for full ownership of an open model, and fine-tuned it with QLoRA on Trainium (ml.trn) instances to keep the training run cheap. They paired it with a small Bedrock Knowledge Base so the assistant could cite the team's style guide, deployed the tuned model to a SageMaker endpoint with auto-scaling, and built an evaluation harness scoring schema-validity and a review rubric against the base model. They filed a Bedrock/GenAI POC credit application plus an Activate Portfolio application to fund the whole build.

Outcome: The tuned model cleared the team's rubric and schema-validity bar head-to-head against the base; QLoRA-on-Trainium kept the training run to a small one-time cost. The training, the endpoint hosting through the proof-out period, S3, and the RAG embeddings were all covered by the approved credits, so the team paid $0 during the build. CloudRoute's commission was paid by the partner from AWS engagement funding, not by the customer.

path: SageMaker · method: QLoRA on Trainium + small RAG · credits secured: POC + Activate · out-of-pocket during build: $0

faq

Common questions

How do you fine-tune an LLM on AWS?
There are three paths. (1) Amazon Bedrock fine-tuning — fully managed: upload a JSONL dataset to S3, run a model customization job against a supported base model, and AWS handles the training; you get a private custom model. (2) SageMaker JumpStart — the guided middle: pick a popular open model, point a built-in fine-tuning recipe at your data, and it runs the training and deploys an endpoint with minimal code. (3) SageMaker training — full control: write a training script (often LoRA/QLoRA with Hugging Face) and run it as a training job on GPU or Trainium instances, owning the resulting weights. Choose based on how much control and model choice you need versus how much you want to build.
Should I fine-tune an LLM, or use RAG or prompt engineering instead?
Fine-tune last. Use prompt engineering first — it is free and instant and solves a lot of "the model is inconsistent" problems. Use RAG (Bedrock Knowledge Bases) when the issue is missing facts: the model needs your documents or latest data — fine-tuning is poor at teaching facts and goes stale when your data changes. Fine-tune only when you need a consistent style, output format, or narrow skill locked in, prompting alone is not reliable enough, and you have good labelled examples. Many production systems combine them: RAG for facts, a light fine-tune for format.
Bedrock fine-tuning vs SageMaker fine-tuning — which should I use?
Use Bedrock fine-tuning when your target model is on Bedrock's fine-tunable list (Amazon Nova, Titan, and third-party families such as Meta Llama and Cohere Command have been the most reliable) and you want the fastest, fully managed path: no infrastructure, just a JSONL file and a job. Use SageMaker (a custom training job, or a JumpStart recipe) when you need a model or technique Bedrock does not offer, full ownership of the weights, custom training objectives, or large-scale distributed training. SageMaker JumpStart is the middle ground: open-model fine-tuning with pre-built recipes and far less code than a hand-written training job.
What is SageMaker JumpStart, and when is it the right path?
JumpStart is a SageMaker hub of pre-trained open models (Llama, Mistral, Falcon, and many more) with built-in fine-tuning recipes. You select a model, point the recipe at your dataset in S3, set a few parameters (epochs, learning rate, often LoRA), and JumpStart supplies the training script, picks instances, and scaffolds deployment to a SageMaker endpoint. It is the right path when the model you want is a popular open one and you want more control and model choice than Bedrock without hand-writing a training job — the pragmatic default for fine-tuning open-weight LLMs on AWS.
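With the SageMaker Python SDK, a JumpStart fine-tune is a `JumpStartEstimator` pointed at a model ID and an S3 training channel. The sketch below keeps the SDK import inside the launch function so the hyperparameter helper runs without it. The hyperparameter names (`epoch`, `learning_rate`, `lora_r`) are assumptions based on what some JumpStart Llama recipes expose. Names and defaults vary per model, so list them with `sagemaker.hyperparameters.retrieve_default` before relying on them.

```python
from typing import Dict, Optional


def jumpstart_hyperparameters(
    epochs: int, learning_rate: float, lora_r: Optional[int] = 8
) -> Dict[str, str]:
    """Build a JumpStart hyperparameter dict (names are assumptions; verify per model)."""
    if epochs < 1:
        raise ValueError("epochs must be at least 1")
    hp = {"epoch": str(epochs), "learning_rate": str(learning_rate)}
    if lora_r is not None:
        hp["lora_r"] = str(lora_r)  # LoRA rank; omit to keep the recipe default
    return hp


def launch_fine_tune(model_id: str, train_s3_uri: str, hp: Dict[str, str]):
    """Start a JumpStart training job; needs the sagemaker SDK and AWS credentials."""
    # Imported here so the helper above works without the SDK installed.
    from sagemaker.jumpstart.estimator import JumpStartEstimator

    estimator = JumpStartEstimator(
        model_id=model_id,
        hyperparameters=hp,
        environment={"accept_eula": "true"},  # gated models such as Llama require this
    )
    estimator.fit({"training": train_s3_uri})  # blocks until the training job finishes
    return estimator  # estimator.deploy() then stands up the SageMaker endpoint
```

A call might look like `launch_fine_tune("meta-textgeneration-llama-3-8b", "s3://your-bucket/train/", jumpstart_hyperparameters(3, 2e-4))`, where the model ID and S3 URI are placeholders to replace with the JumpStart model you pick and your own data location.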
What instances and silicon does LLM fine-tuning use — GPU or Trainium?
On SageMaker and JumpStart you run training on accelerated instances: NVIDIA GPU families (ml.p for the largest jobs, ml.g for smaller, cheaper fine-tunes) or AWS Trainium (ml.trn), AWS's custom training chips, programmed through the Neuron SDK. Trainium is typically meaningfully cheaper per unit of training throughput than comparable GPUs and is well supported by JumpStart recipes and Hugging Face. On Bedrock the training infrastructure is abstracted away: you do not pick instances, and you are billed for the customization itself. Parameter-efficient methods (LoRA/QLoRA) and SageMaker managed Spot training, which runs jobs on spare EC2 Spot capacity, cut the training bill further.
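Because GPU and Trainium instances differ in both hourly price and throughput, the fair comparison is cost per unit of training work rather than the hourly rate. A minimal sketch, where the prices, throughputs, and Spot discount are hypothetical placeholders you would replace with your own benchmark and the current SageMaker pricing page:

```python
from typing import Dict, Tuple


def cost_per_million_tokens(
    hourly_price_usd: float, tokens_per_second: float, spot_discount: float = 0.0
) -> float:
    """Dollars per million training tokens for one instance configuration.

    spot_discount is the fractional saving from managed Spot training
    (0.0 = on-demand); it is an input you supply, not a fixed AWS rate.
    """
    if hourly_price_usd < 0 or tokens_per_second <= 0 or not 0 <= spot_discount < 1:
        raise ValueError("invalid price, throughput, or discount")
    effective_price = hourly_price_usd * (1 - spot_discount)
    tokens_per_hour = tokens_per_second * 3600
    return effective_price / tokens_per_hour * 1_000_000


def cheapest(options: Dict[str, Tuple[float, float]]) -> str:
    """Pick the (hourly price, tokens/sec) option with the lowest cost per million tokens."""
    return min(options, key=lambda name: cost_per_million_tokens(*options[name]))
```

The design point is that a lower hourly rate only wins if throughput does not drop by a larger factor, which is why benchmarking tokens per second on your own job matters more than the price list.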
How much does it cost to fine-tune an LLM on AWS?
Two costs. (1) The training run — a one-time charge: on SageMaker/JumpStart you pay the GPU or Trainium instance hourly rate for the job's duration; on Bedrock you pay per training token × epochs. With LoRA/QLoRA on an open model, or a typical Bedrock job, this is often only tens to low-hundreds of dollars. (2) The cost most teams miss — hosting: a Bedrock custom model requires Provisioned Throughput (a flat hourly charge, 24/7), and a SageMaker/JumpStart model runs on an always-on endpoint (hourly instance cost). Hosting usually dwarfs the training. Figures are representative for 2026 — confirm current rates on the AWS pricing pages.
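The training-run arithmetic for the two billing models is simple enough to sanity-check before you launch. The prices in the sketch are hypothetical placeholders, not 2026 AWS rates: Bedrock customization is quoted per 1,000 tokens processed (dataset tokens × epochs), and SageMaker training is the instance's hourly rate × instance count × job duration.

```python
def bedrock_training_cost(dataset_tokens: int, epochs: int, price_per_1k_tokens: float) -> float:
    """One-time Bedrock customization charge: tokens processed across all epochs."""
    tokens_processed = dataset_tokens * epochs
    return tokens_processed / 1000 * price_per_1k_tokens


def sagemaker_training_cost(hourly_price: float, instance_count: int, hours: float) -> float:
    """One-time SageMaker or JumpStart training-job charge."""
    return hourly_price * instance_count * hours
```

For example, a 2M-token dataset run for 3 epochs at a placeholder $0.008 per 1K tokens comes to $48, and a single placeholder $4/hour instance running 6.5 hours comes to $26. Both are one-time numbers; set them against the monthly hosting figure, which recurs.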
How do I host a fine-tuned model on AWS after training?
It depends on the path. A Bedrock custom model generally cannot use the cheaper on-demand, per-token path: you buy Provisioned Throughput, dedicated capacity billed at a flat hourly rate for as long as the model is deployed, regardless of traffic (check whether your base model has since gained on-demand inference for custom models). A SageMaker- or JumpStart-tuned model is deployed to a real-time endpoint: one or more instances (often GPU) that bill by the hour while the endpoint is up. Both are standing costs that accrue even when idle, so they usually dominate the total cost of a fine-tune. Bedrock 1- and 6-month commitments, SageMaker Savings Plans, and endpoint auto-scaling reduce the hosting bill, but dedicated hosting only pays off at high, steady volume.
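Hosting is the line that recurs, so it is worth projecting per month. A sketch with hypothetical hourly rates, using the 730-hours-per-month convention common in AWS cost estimates. The scheduled variant models a SageMaker endpoint whose auto-scaling policy drops to a smaller floor off-peak; Bedrock Provisioned Throughput is a fixed purchase that does not scale this way, so use the always-on function for it.

```python
HOURS_PER_MONTH = 730  # convention used in AWS monthly cost estimates


def always_on_monthly(hourly_rate: float, units: int = 1) -> float:
    """Provisioned Throughput or a fixed-size endpoint: billed every hour, idle or not."""
    return hourly_rate * units * HOURS_PER_MONTH


def scheduled_endpoint_monthly(
    hourly_rate: float,
    peak_instances: int,
    floor_instances: int,
    peak_hours_per_day: float,
) -> float:
    """Endpoint that auto-scales between a peak size and a floor each day."""
    if not 0 <= peak_hours_per_day <= 24 or floor_instances > peak_instances:
        raise ValueError("invalid schedule")
    off_peak_hours = 24 - peak_hours_per_day
    instance_hours_per_day = (
        peak_instances * peak_hours_per_day + floor_instances * off_peak_hours
    )
    days_per_month = HOURS_PER_MONTH / 24
    return hourly_rate * instance_hours_per_day * days_per_month
```

At a placeholder $20/hour, one always-on unit is $14,600 a month, which is why hosting rather than training usually decides whether a fine-tune is worth it.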
What is LoRA / QLoRA and why is it used for fine-tuning LLMs on AWS?
LoRA (Low-Rank Adaptation) and its quantized variant QLoRA are parameter-efficient fine-tuning methods. Instead of updating every weight in a large model, LoRA freezes the base weights and trains a small pair of low-rank adapter matrices for selected layers; QLoRA does the same on top of a base model quantized to 4 bits, cutting memory further. This reduces GPU memory and training time by a large factor while keeping most of the quality, which makes it practical to fine-tune large open LLMs on modest hardware. On SageMaker and JumpStart, LoRA/QLoRA is the standard, cost-effective way to fine-tune open models, and it is a big reason the training run is often far cheaper than teams expect. (Bedrock's managed fine-tuning abstracts the method away.)
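The saving comes from the shape of the update. For a frozen weight matrix of size d_out × d_in, LoRA learns ΔW = B·A with B of size d_out × r and A of size r × d_in, so it trains r·(d_out + d_in) values instead of d_out·d_in. The helper below counts both for a single matrix; the 4096 × 4096 shape and rank 8 used in the example are illustrative, not tied to a specific model.

```python
from typing import Tuple


def lora_param_counts(d_out: int, d_in: int, rank: int) -> Tuple[int, int]:
    """Return (full fine-tune params, LoRA adapter params) for one weight matrix.

    LoRA keeps W (d_out x d_in) frozen and trains B (d_out x rank) and
    A (rank x d_in); the learned update is B @ A, added to W at inference.
    """
    if not 1 <= rank <= min(d_out, d_in):
        raise ValueError("rank must be between 1 and min(d_out, d_in)")
    full = d_out * d_in
    adapter = rank * (d_out + d_in)
    return full, adapter
```

For a 4096 × 4096 projection at rank 8 this gives 16,777,216 versus 65,536 trainable values, 1/256 of the full count. QLoRA additionally stores the frozen W in 4-bit form, which shrinks memory but not the adapter size.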
How do I evaluate a fine-tuned LLM to know if it was worth it?
First check loss curves: validation loss falling with training loss is healthy; rising validation loss means overfitting. Then run a head-to-head on a held-out set the model never saw — the same prompts through the base model and your tuned model, scored on a task-appropriate metric (schema-validity or exact-match for extraction, a rubric/LLM-as-judge for style, accuracy/F1 for classification). Use Bedrock model evaluation or SageMaker Clarify to systematize it. The bar is beating a base model with good prompting and RAG — by enough to justify the standing hosting cost. Tally training + hosting + dataset maintenance against the measured gain.
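The two checks in this answer, loss curves and a held-out head-to-head, can be scripted in a few lines. A minimal sketch using exact-match and JSON validity as the metrics, which suit an extraction task; a style task would need a rubric or LLM-as-judge instead, and JSON parsing is only a proxy for full schema validation. The function names are hypothetical.

```python
import json
from typing import Optional, Sequence


def first_overfit_epoch(val_losses: Sequence[float]) -> Optional[int]:
    """Index of the first epoch whose validation loss rises above its best so far."""
    best = float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss > best:
            return epoch  # validation got worse: likely overfitting from here
        best = loss
    return None


def exact_match(preds: Sequence[str], refs: Sequence[str]) -> float:
    """Share of predictions that equal the reference after trimming whitespace."""
    if not refs or len(preds) != len(refs):
        raise ValueError("need equal-length, non-empty prediction and reference lists")
    return sum(p.strip() == r.strip() for p, r in zip(preds, refs)) / len(refs)


def json_valid_rate(preds: Sequence[str]) -> float:
    """Share of predictions that parse as JSON (a proxy for schema validity)."""
    if not preds:
        raise ValueError("no predictions")
    valid = 0
    for p in preds:
        try:
            json.loads(p)
            valid += 1
        except json.JSONDecodeError:
            pass
    return valid / len(preds)
```

Run the same held-out prompts through the base model (with your best prompt and RAG) and the tuned model, score both with the same functions, and treat the gap as the number you weigh against the hosting bill.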
Can AWS credits cover the GPU/Trainium training and the hosting?
Yes — across all three paths the costs are credit-eligible: Bedrock customization and Provisioned Throughput; SageMaker training jobs on GPU or Trainium and the endpoints that host the result; S3 storage; and the embeddings/vector store behind any RAG you add. Credits apply automatically against your bill. The pools are AWS Activate (up to $100K), a Bedrock/GenAI POC pool ($10K–$50K), and the GenAI Accelerator (up to $1M), and they are largely partner-filed via the AWS Partner Network. CloudRoute routes you to the right pool and a vetted AWS ML partner who files the application and does the work — choosing the path, prepping data, running the fine-tune, and evaluating it. Customer pays $0; AWS funds it.

Fine-tune your LLM on AWS's budget, not your runway

Training on GPU or Trainium is cheap; hosting a tuned model — Provisioned Throughput on Bedrock or an always-on SageMaker endpoint — is the standing cost that makes teams hesitate. AWS credits cover both. CloudRoute routes you to the right credit pool (Activate up to $100K, Bedrock POC $10K–$50K, GenAI Accelerator up to $1M) and a vetted AWS ML partner who picks the path (Bedrock, JumpStart, or SageMaker), preps the data, runs the fine-tune, and tells you honestly whether to fine-tune at all. Customer pays $0.

matched within: < 24h
GenAI credit ceiling: up to $1M
cost to you: $0