JumpStart is the model hub inside SageMaker: a catalogue of hundreds of pretrained foundation and task-specific models — Llama, Mistral, Falcon, Stable Diffusion, and more — plus one-click solution templates. It is the fastest way to deploy an open model to your own SageMaker endpoint, or to fine-tune one on your data. This guide covers what JumpStart is, how deployment and fine-tuning work, how it differs from Amazon Bedrock (self-managed endpoint vs fully-managed API), what it costs, when to use it, and how AWS credits fund the endpoints.
SageMaker JumpStart is the model hub and solution catalogue inside Amazon SageMaker — a curated library of hundreds of pretrained models and prebuilt solution templates you can deploy or fine-tune in a few clicks, rather than building from a blank notebook.
The cleanest one-line definition: JumpStart is the in-platform hub that lets you take a pretrained foundation or task model and turn it into a running SageMaker endpoint — or a fine-tuned model — without writing the deployment or training boilerplate yourself. It is the accelerator layer of SageMaker: where a raw training job assumes you bring your own model and code, JumpStart hands you a vetted model, a tested container, and sensible default configuration so the first deploy takes minutes instead of days.
JumpStart contains three kinds of content. First, foundation models — large language and multimodal models such as Meta's Llama family, Mistral and Mixtral, Falcon, and image models like Stable Diffusion, drawn from open-weights providers and proprietary partners. Second, task-specific models — hundreds of smaller models for image classification, object detection, text classification, named-entity recognition, tabular regression, and similar, many built on PyTorch, TensorFlow, and Hugging Face. Third, solution templates — prebuilt, end-to-end reference architectures (for example fraud detection, demand forecasting, churn prediction, document understanding) that deploy a whole CloudFormation-backed stack, not just a single model.
Everything in JumpStart is reached from SageMaker Studio — JumpStart is a panel inside the Studio IDE — or programmatically through the SageMaker Python SDK. In Studio you browse a card-based catalogue, read each model's description, licence, and supported instance types, and click to deploy or fine-tune. Through the SDK you reference the same models by a stable model ID and drive deployment and tuning from code, which is how teams move a JumpStart prototype into a repeatable pipeline.
The point of JumpStart is speed without lock-out. The models are real, the endpoint is a normal SageMaker endpoint you fully own, and the fine-tuning produces an ordinary model artifact in your S3 bucket. You are not in a walled garden — you are getting a curated starting point on top of the full SageMaker platform, and you can drop down to raw training jobs and custom containers whenever the prebuilt path stops fitting.
JumpStart is a capability inside SageMaker, not a separate service. Think of the layering as: SageMaker is the end-to-end ML platform; Studio is its IDE; JumpStart is the model-and-solution hub you open from Studio to deploy or fine-tune a pretrained model fast. The endpoint it creates is a standard SageMaker endpoint, billed and managed like any other.
JumpStart is easiest to understand by what it offers. The catalogue breaks into foundation models, task-specific models, and full solution templates — each a different level of "prebuilt."
You will not use every category on every project. A team that wants a chat model deploys one foundation model; a computer-vision team fine-tunes a task model; a team that wants a complete reference architecture deploys a solution template. Knowing the full catalogue tells you which starting point is closest to your goal.
The defining property of a JumpStart open model is that it runs on your SageMaker endpoint — your account, your chosen instance, your VPC, your data path. Nothing leaves to a shared multi-tenant API. That control is the main reason teams pick JumpStart over a managed API when they need to own the deployment.
The headline JumpStart workflow is deployment: take an open model such as Llama or Mistral and stand it up as a live SageMaker endpoint you can call. Here is what actually happens, and the choices that matter.
When you deploy a JumpStart model, SageMaker pulls a pre-built inference container with the serving stack, fetches the model weights, provisions the instance type you select, loads the model, and exposes a real-time endpoint you invoke like any other SageMaker endpoint. From Studio this is a few clicks; from the SDK it is a few lines referencing the model ID. The result is a standard endpoint, the same object covered on the SageMaker endpoints page, so everything you know about real-time, serverless, and asynchronous modes applies.
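As a rough illustration of invoking the endpoint like any other SageMaker endpoint, the sketch below calls it through boto3's `sagemaker-runtime` client. The endpoint name and the `inputs`/`parameters` payload shape are assumptions; the real request schema depends on the model's container and is described on its model card.

```python
import json


def build_invoke_request(endpoint_name: str, prompt: str, max_new_tokens: int = 128) -> dict:
    """Build keyword arguments for sagemaker-runtime's invoke_endpoint call.

    The {"inputs": ..., "parameters": ...} body is an assumed schema that many
    text-generation containers accept; check the model card for the real one.
    """
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Body": json.dumps(payload),
    }


def invoke(endpoint_name: str, prompt: str):
    """Send one prompt to a live endpoint (needs AWS credentials and a running endpoint)."""
    import boto3  # imported lazily so the request builder works without boto3 installed

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(**build_invoke_request(endpoint_name, prompt))
    # The response body is a stream; decode it as JSON.
    return json.loads(response["Body"].read())
```

Keeping the request builder separate from the network call makes the payload easy to inspect and unit-test before any endpoint exists.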
The single biggest decision is instance type. A large language model needs an accelerated instance with enough GPU memory to hold the weights; JumpStart's model card lists the supported and recommended instances for each model, and an undersized instance simply will not load the model. Bigger models need bigger (or multiple) GPUs, and that choice drives both whether the model runs at all and what it costs per hour. Some models can also run on AWS's own inference silicon (Inferentia) via the Neuron stack for a lower cost-per-inference than general-purpose GPUs.
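A back-of-the-envelope check can show why an undersized instance will not load a model. The sketch below estimates weight memory as parameter count times bytes per parameter and compares it with accelerator memory. The 20% headroom is a hypothetical allowance, not an AWS figure, and the model card's recommended instance should always take precedence.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bytes_per_param


def fits_on_accelerator(params_billions, bytes_per_param, accelerator_memory_gb, headroom=0.2):
    """Return True if weights plus an assumed headroom fit in accelerator memory.

    headroom is a hypothetical allowance for KV cache, activations and runtime
    overhead; real requirements depend on batch size and context length.
    """
    needed = weight_memory_gb(params_billions, bytes_per_param) * (1 + headroom)
    return needed <= accelerator_memory_gb


# 7B parameters at 16-bit precision (2 bytes each): about 14 GB of weights.
print(fits_on_accelerator(7, 2, 24))     # True on a 24 GB accelerator
# 70B parameters at 16-bit precision: about 140 GB, far beyond one 24 GB device.
print(fits_on_accelerator(70, 2, 24))    # False
```

The same arithmetic shows the effect of quantization: halving the bytes per parameter halves the weight footprint, which can move a model onto a smaller instance.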
The second decision is endpoint mode. A real-time endpoint is always on and bills per instance-hour whether or not requests arrive: the right choice for steady, latency-sensitive traffic, and the classic source of surprise cost when left idle. For spiky or intermittent traffic, deploying behind serverless or asynchronous inference (where supported for the model size) lets the endpoint scale toward zero so you are not paying for idle capacity. Large LLMs generally exceed serverless limits and need an instance-backed endpoint (real-time, or asynchronous if the workload tolerates queued responses), which is why their hosting cost needs scoping up front.
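For an asynchronous endpoint, scaling toward zero is configured through Application Auto Scaling. The sketch below builds the arguments for `register_scalable_target` with a minimum capacity of 0. The endpoint name is hypothetical and the variant name `AllTraffic` is an assumption; use whatever your endpoint configuration defines. A scaling policy (not shown) is still needed to bring instances back when requests queue up.

```python
def scale_to_zero_target(endpoint_name: str, variant_name: str = "AllTraffic",
                         max_instances: int = 1) -> dict:
    """Arguments for Application Auto Scaling's register_scalable_target call."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": 0,  # lets an asynchronous endpoint drop to zero instances when idle
        "MaxCapacity": max_instances,
    }


def register(endpoint_name: str) -> None:
    """Register the scalable target (needs boto3, AWS credentials and an existing endpoint)."""
    import boto3  # lazy import keeps the argument builder usable offline

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**scale_to_zero_target(endpoint_name))
```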
For very large models, JumpStart supports optimized serving — large-model-inference containers that shard a model across multiple GPUs, plus options for quantization and compilation that reduce the memory footprint and improve throughput. These let you host a model that would not fit on a single accelerator, or serve a given model more cheaply, without writing the distributed-serving plumbing yourself.
In practice the flow is: open JumpStart in Studio, find the model, read its card for licence and recommended instance, and click deploy (or call JumpStartModel(...).deploy(...) in the SDK). SageMaker provisions the instance, pulls the container, loads the weights, and returns an endpoint name in a handful of minutes for smaller models, longer for very large ones.
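The SDK path can be sketched as follows, assuming the `sagemaker` package is installed and an execution role is available. The function names and the request payload are illustrative, and `accept_eula` only matters for models whose licence requires explicit acceptance.

```python
def deploy_jumpstart_model(model_id: str, instance_type: str, accept_eula: bool = False):
    """Deploy a JumpStart model to a real-time endpoint and return its predictor.

    Requires the sagemaker package, AWS credentials and an execution role.
    This creates billable infrastructure.
    """
    from sagemaker.jumpstart.model import JumpStartModel  # lazy import

    model = JumpStartModel(model_id=model_id, instance_type=instance_type)
    return model.deploy(accept_eula=accept_eula)


def smoke_test(predictor, prompt: str):
    """Send one request, then delete the endpoint so it stops billing."""
    try:
        # Assumed payload shape for a text-generation model; check the model card.
        return predictor.predict({"inputs": prompt, "parameters": {"max_new_tokens": 64}})
    finally:
        predictor.delete_endpoint()
```

A call might look like `smoke_test(deploy_jumpstart_model("huggingface-llm-mistral-7b-instruct", "ml.g5.2xlarge"), "Hello")`, where both the model ID and the instance type are examples to be checked against the catalogue. The `finally` block is a deliberate design choice: the endpoint is removed even if the test request fails.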
You then invoke the endpoint with your prompts or inputs, validate quality and latency, and — critically — delete the endpoint when you are done testing. A forgotten GPU-backed real-time endpoint is the most common way to run up an unexpected SageMaker bill. Once the model is proven, you wrap deployment in a Pipeline so it is repeatable and versioned.
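One way to catch forgotten endpoints is a small audit script. The sketch below lists endpoints with boto3 and flags any that have been in service longer than a threshold. The 24-hour cutoff in the usage line is an arbitrary assumption, and the script only reports; it does not delete anything.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


def stale_endpoints(summaries: Iterable[dict], max_age_hours: float,
                    now: Optional[datetime] = None) -> List[str]:
    """Names of InService endpoints created more than max_age_hours ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=max_age_hours)
    return [
        s["EndpointName"]
        for s in summaries
        if s["EndpointStatus"] == "InService" and s["CreationTime"] < cutoff
    ]


def list_endpoint_summaries() -> List[dict]:
    """Fetch endpoint summaries from the account (needs boto3 and AWS credentials)."""
    import boto3  # lazy import so the filter above can be tested offline

    paginator = boto3.client("sagemaker").get_paginator("list_endpoints")
    return [e for page in paginator.paginate() for e in page["Endpoints"]]
```

Running `stale_endpoints(list_endpoint_summaries(), max_age_hours=24)` on a schedule gives a list of candidates to review and delete.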
Beyond deploying a model as-is, JumpStart lets you fine-tune many of its models on your own dataset — adapting an open foundation or task model to your domain without writing the training loop.
Fine-tuning in JumpStart follows the same few-clicks-or-few-lines pattern as deployment. You point a fine-tuning-enabled model at your training data in S3, set a small number of hyperparameters (epochs, learning rate, and for LLMs often the adapter configuration), and choose a training instance. SageMaker runs a managed training job behind the scenes, writes the fine-tuned model artifact back to S3, and you deploy that artifact to an endpoint exactly as you would the base model. You never assemble the training script yourself; JumpStart supplies a tested one for each supported model.
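In SDK terms, the fine-tuning flow can be sketched with `JumpStartEstimator`. The channel name `training` and the helper names are assumptions; hyperparameter names differ per model, so take them from the model card rather than from this sketch.

```python
def training_channels(train_s3_uri: str) -> dict:
    """Map the S3 dataset to the 'training' channel after a basic URI check."""
    if not train_s3_uri.startswith("s3://"):
        raise ValueError("training data must be an s3:// URI")
    return {"training": train_s3_uri}


def fine_tune(model_id: str, train_s3_uri: str, instance_type: str, hyperparameters: dict):
    """Run a managed fine-tuning job and return the fitted estimator.

    Requires the sagemaker package, AWS credentials and an execution role.
    The job is billed for the time the training instance runs.
    """
    from sagemaker.jumpstart.estimator import JumpStartEstimator  # lazy import

    estimator = JumpStartEstimator(
        model_id=model_id,
        instance_type=instance_type,
        hyperparameters=hyperparameters,  # model-specific names, e.g. epochs and learning rate
    )
    estimator.fit(training_channels(train_s3_uri))
    return estimator
```

Calling `deploy()` on the returned estimator hosts the fine-tuned artifact on an endpoint, mirroring the base-model deployment.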
Two broad fine-tuning styles show up. Parameter-efficient fine-tuning (commonly LoRA-style adapter tuning) trains a small set of additional weights on top of a frozen base model — much cheaper and faster, ideal for adapting an LLM's tone, format, or domain knowledge on a modest dataset. Full fine-tuning updates the base weights themselves — more expensive and data-hungry, used when you need to shift the model more deeply. JumpStart's model cards indicate which style each model supports.
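The cost gap between the two styles follows from simple arithmetic. In LoRA, a frozen weight matrix of shape d_out x d_in gets two small trainable matrices of shapes d_out x r and r x d_in, so the adapter adds r * (d_out + d_in) parameters. The sketch below compares that with the full matrix; the 4096 x 4096 shape and rank 8 are illustrative values, not taken from any specific JumpStart model.

```python
def lora_added_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_out + d_in)


def trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Adapter parameters as a fraction of the frozen matrix they sit beside."""
    return lora_added_params(d_out, d_in, rank) / (d_out * d_in)


print(lora_added_params(4096, 4096, 8))   # 65536
print(trainable_fraction(4096, 4096, 8))  # 0.00390625, i.e. under 0.4% of the matrix
```

Training well under one percent of the weights per adapted matrix is why parameter-efficient tuning needs far less accelerator memory and time than updating the base weights.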
For task-specific models the same mechanism does transfer learning: you take a model pretrained on a large generic dataset (say image classification on a broad corpus) and fine-tune it on your labelled images, which typically needs far less data and compute than training from scratch. This is the bread-and-butter use of JumpStart for classical computer-vision and NLP teams.
The cost shape of fine-tuning is the same as any SageMaker training job: you pay for the training instance for the seconds the job runs, then it disappears. Parameter-efficient tuning of an LLM on a small dataset can be inexpensive; full fine-tuning of a large model on a large dataset can be substantial. Spot instances and right-sized instance selection are the main levers, and the resulting artifact then carries the ongoing hosting cost once deployed.
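The per-second billing model reduces to a one-line estimate. The sketch below multiplies an hourly rate by the job's runtime and instance count, with an optional Spot discount. The rates and discount used in any call are hypothetical placeholders; look up current prices before budgeting.

```python
def training_cost(hourly_rate: float, runtime_seconds: float,
                  instance_count: int = 1, spot_discount: float = 0.0) -> float:
    """Estimated cost of one training job.

    hourly_rate:   price per instance-hour (placeholder, not a quoted AWS rate)
    spot_discount: fraction saved by using Spot capacity, between 0 and 1 (assumed)
    """
    billable_hours = runtime_seconds / 3600 * instance_count
    return hourly_rate * billable_hours * (1 - spot_discount)


# A 90-minute job on one instance at a hypothetical 10.00 per hour.
print(training_cost(10.0, 90 * 60))  # 15.0
```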
Fine-tuning is not always the right first move. If you mainly need the model to use your facts, retrieval-augmented generation (RAG) is often cheaper and easier to keep current than fine-tuning. If you need it to follow a format or tone, parameter-efficient fine-tuning fits well. Start with prompting, add RAG for knowledge, and fine-tune when prompting and RAG genuinely fall short.
This is the question most teams arrive with, because JumpStart and Bedrock can both get you to "an open model answering prompts." The difference is who manages the infrastructure — and it changes cost, control, and operations.
SageMaker JumpStart gives you a self-managed endpoint. When you deploy Llama or Mistral from JumpStart, the model runs on an instance you chose, in your account and VPC, that you keep running and scale. You control the instance type, the serving configuration, the networking, and the deployment lifecycle. You can fine-tune deeply, inspect the model, and run it in fully isolated environments. The trade-off is that you own the operations: you pick the GPU, you manage scaling, and you pay per instance-hour for as long as the endpoint exists — including while it sits idle.
Amazon Bedrock gives you a fully-managed, serverless API. You call a foundation model — Anthropic's Claude, Meta's Llama, Mistral, Amazon's Nova and Titan, Cohere, and more — through one consistent interface, and pay per token. There is no instance, no container, no GPU to manage, and nothing to keep running between calls; if you send no requests, you pay nothing for inference. Bedrock also layers on Agents, Knowledge Bases (managed RAG), Guardrails, and managed fine-tuning. The trade-off is less low-level control: you use the models and customization options Bedrock exposes, not an arbitrary open-weights checkpoint on an instance you hand-tuned.
A subtle point that confuses people: some of the same model families appear in both. Llama and Mistral are available through Bedrock as a managed API and deployable via JumpStart to your own endpoint. The model can be the same; the operational model is not. Through Bedrock you call it and pay per token; through JumpStart you host it and pay per instance-hour. Which is cheaper depends entirely on utilization — see the cost section.
The deciding questions: Do you need a specific open-weights checkpoint, deep control over the serving environment, or full network isolation? If yes, JumpStart's self-managed endpoint is the right tool. Do you want the simplest path, no infrastructure to run, and pay-per-use economics for variable traffic? If yes, Bedrock's managed API is shorter and cheaper to start. And the two are not mutually exclusive — a team might call Claude on Bedrock for its main assistant while hosting a fine-tuned open model on a JumpStart endpoint for a specialized task. The dedicated Bedrock vs SageMaker page goes deeper on the platform-level comparison.
If your answer to "do I need to host and control the model myself?" is no, start with Bedrock (managed API, per token, scales to zero). If it is yes — a specific open checkpoint, deep control, full isolation — use JumpStart (your endpoint, your instance, per instance-hour). Many teams use both.
JumpStart itself adds no licence fee. You pay normal SageMaker rates for the compute it provisions — and because a JumpStart endpoint is real infrastructure, the bill behaves very differently from a per-token API.
There are three cost lines. Inference (hosting) is the big one: a deployed model runs on an instance billed per instance-hour, and for a real-time endpoint that meter runs 24/7 regardless of traffic. Fine-tuning (training) is the spiky one: you pay for the training instance only for the seconds the job runs. Supporting costs are the small ones: S3 storage for model artifacts and data, plus any data transfer. The dominant lever on all of it is GPU instance choice — a large accelerated instance can cost many times a small one per hour.
The economic difference from Bedrock is worth stating plainly. Bedrock bills per token, so cost tracks usage and falls to zero when idle. A JumpStart real-time endpoint bills per instance-hour, so cost is roughly flat regardless of how many requests you serve. That means a self-hosted JumpStart endpoint can be cheaper than Bedrock at high, steady utilization (you are amortizing a fixed instance cost over heavy traffic), and much more expensive at low or bursty utilization (you pay for idle GPU time). The break-even depends on the specific model and instance — model the two against your real traffic before committing.
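The break-even can be modelled in a few lines. The sketch below compares a flat monthly endpoint cost with a per-token bill and solves for the monthly token volume at which they match. All prices are hypothetical placeholders, a single blended per-token price is assumed, and the model ignores whether one instance can actually serve the break-even volume.

```python
HOURS_PER_MONTH = 730  # approximate average number of hours in a month


def monthly_endpoint_cost(hourly_rate: float, instance_count: int = 1,
                          hours: float = HOURS_PER_MONTH) -> float:
    """Flat cost of an always-on real-time endpoint for one month."""
    return hourly_rate * instance_count * hours


def monthly_token_cost(tokens_per_month: float, price_per_1k_tokens: float) -> float:
    """Usage-based cost of a per-token API for one month."""
    return tokens_per_month / 1000 * price_per_1k_tokens


def break_even_tokens(hourly_rate: float, price_per_1k_tokens: float,
                      instance_count: int = 1) -> float:
    """Monthly token volume at which both options cost the same."""
    return monthly_endpoint_cost(hourly_rate, instance_count) / price_per_1k_tokens * 1000


# Hypothetical: 2.00 per instance-hour versus 0.001 per 1,000 tokens.
print(monthly_endpoint_cost(2.0))  # 1460.0 per month, whether or not requests arrive
```

Below the break-even volume the per-token API is cheaper; above it, the fixed endpoint cost is spread over enough traffic to win, provided the instance can sustain the load.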
The levers to control JumpStart hosting cost: pick the smallest instance that holds the model and meets your latency target; use serverless or asynchronous inference for intermittent traffic so the endpoint can scale toward zero; consider Inferentia instances for a lower cost-per-inference on supported models; apply quantization/compilation via the large-model-inference containers to fit a smaller instance; use SageMaker Savings Plans for steady, predictable hosting; and — most important — delete endpoints you are not using. For fine-tuning, Spot instances and right-sizing are the main savings.
For exact, current per-instance rates, worked hosting and fine-tuning examples, the Inferentia and Savings Plans math, and the full set of cost levers, see the dedicated SageMaker pricing breakdown — and verify live rates on the AWS pricing page, since GPU pricing in particular moves. The representative figures here are directional as of 2026, not a quote.
JumpStart is the right tool in a specific band: when you want a pretrained model fast but still need to own the deployment. Knowing the edges of that band saves wasted setup.
JumpStart shines when you want the speed of a prebuilt model with the control of your own endpoint.
Self-hosted model endpoints and fine-tuning runs are exactly the spend AWS credit programs are designed to absorb. A funded team can host open models, fine-tune them, and run experiments on credits rather than burning cash on GPU hours, which is where CloudRoute fits.
Going from zero to a deployed open model is a short, well-trodden path. Here is the realistic sequence for a team's first JumpStart project.
1 · Set up the SageMaker domain. In the AWS console, create a SageMaker domain and user profile with an IAM execution role that can read your S3 data and write artifacts. This is the one-time setup that opens Studio — the same prerequisite as any SageMaker work.
2 · Open JumpStart in Studio and pick a model. Launch Studio, open the JumpStart panel, and browse the catalogue. Read the model card for the licence, the recommended instance type, and whether fine-tuning is supported. For a first run, choose a smaller model so the instance is cheap and loads quickly.
3 · Deploy to an endpoint and test. Click deploy (or call the SDK), wait for the endpoint to come up, and invoke it with real prompts or inputs. Validate quality and latency. For a first deployment prefer a modest instance or serverless (where the model size allows) so a forgotten endpoint cannot quietly run up cost.
4 · Fine-tune if needed. If the base model is close but not domain-specific, point a fine-tuning-enabled JumpStart model at your S3 dataset, set the few hyperparameters, and run the managed training job. Deploy the resulting artifact the same way you deployed the base model.
5 · Productionize. Once it works, move from clicks to code: drive deployment and tuning from the SageMaker SDK, wrap them in a Pipeline, register the model, and add Model Monitor. This turns a JumpStart prototype into a repeatable, governed system.
Cost discipline from day one: delete test endpoints the moment you are done, right-size instances, use Spot for fine-tuning, and prefer serverless or batch over always-on real-time until traffic justifies it. The single most common JumpStart cost mistake is an idle GPU endpoint left running after an experiment.
The clearest way to choose: line up the two on the dimensions that drive the decision. JumpStart optimizes for control and ownership of an open model; Bedrock optimizes for zero-ops convenience and pay-per-use economics.
| Dimension | SageMaker JumpStart | Amazon Bedrock |
|---|---|---|
| What it is | Model hub — deploy/fine-tune open models to your own endpoint | Managed API to call foundation models |
| You manage infrastructure? | Yes — you pick the instance, VPC, scaling | No — fully managed, serverless |
| Where the model runs | Your SageMaker endpoint, your account | AWS-managed, multi-tenant API |
| Pricing basis | Per instance-hour (hosting) + training | Per token (on-demand / batch / provisioned) |
| Cost when idle | Still billed (real-time endpoint runs 24/7) | Zero (pay per token) |
| Best cost at | High, steady utilization | Low or bursty utilization |
| Control / customization | Full — instance, container, deep fine-tune | Managed fine-tune + Agents, KBs, Guardrails |
| Best for | Owning/fine-tuning a specific open model | Fastest path to a GenAI feature, no ops |
Situation: For data-residency and client-confidentiality reasons, the startup needed the model running inside its own VPC, not on a shared multi-tenant API, so a fully-managed API alone did not satisfy the requirement. The team chose to deploy an open model (a Mistral variant) via SageMaker JumpStart to a self-managed endpoint and fine-tune it on their annotated clause dataset. GPU hosting for the always-on endpoint plus the fine-tuning runs was projected at ~$7K/month, which the seed budget could not absorb during the build.
What CloudRoute did: Routed within 19 hours to a UK partner with an ML / SageMaker track record. The partner filed an Activate Portfolio application for general AWS infrastructure plus a Bedrock/GenAI PoC application for the GenAI workload, and advised parameter-efficient (LoRA) fine-tuning to keep training cost down and an Inferentia-backed endpoint to lower cost-per-inference versus a general-purpose GPU.
Outcome: Credits approved within 16 days, covering the JumpStart fine-tuning runs and the self-managed endpoint. The team fine-tuned and shipped the contract-analysis model inside their own VPC on credits, kept full control of the deployment for their compliance story, and cut projected inference cost with the Inferentia + LoRA combination. CloudRoute's commission was paid by the partner from AWS engagement funding — the startup paid $0.
matched in: < 24h · credits secured: 6-figure · inference cost cut: meaningful · cost to customer: $0
CloudRoute connects ML and data-science teams with vetted AWS partners who deploy and fine-tune open models on SageMaker JumpStart and file the credit applications that fund the endpoints and training runs. Customer pays $0 — AWS funds it.