Amazon SageMaker · the complete 2026 guide

Amazon SageMaker — AWS's end-to-end machine-learning platform, explained.

SageMaker is the platform AWS gives data-science and ML teams to build, train, tune, deploy, and operate models — from a Jupyter notebook all the way to a versioned, monitored production endpoint. This guide covers every major component, the full ML lifecycle on SageMaker, how it differs from Amazon Bedrock (and when you use both), what it costs, and how to get AWS credits to fund your training and hosting.


platform scope: build → deploy · endpoint modes: 4 · lifecycle stages: 7 · credits to fund it: up to $1M

TL;DR

Amazon SageMaker is AWS's fully-managed, end-to-end machine-learning platform. It spans the whole lifecycle — data labeling and preparation, notebook-based experimentation in Studio, distributed training jobs, hyperparameter tuning, model registry and governance, four kinds of inference endpoints, and MLOps automation via Pipelines. You bring the model and the data; SageMaker manages the infrastructure.
SageMaker and Amazon Bedrock solve different problems. Bedrock is a managed API to call existing foundation models (Claude, Llama, Nova, etc.) — you never touch infrastructure. SageMaker gives you full control to train, fine-tune deeply, and deploy your own models on your own instances. Many teams use both: Bedrock for generative-AI features, SageMaker for custom/classical ML.
You pay only for the compute, storage, and managed features you use — there is no SageMaker licence fee. The dominant cost is instance-time for training and for always-on endpoints, so GPU choice and endpoint mode drive the bill. AWS credit programs (Activate up to $100K, Bedrock/GenAI PoC $10K–$50K, GenAI Accelerator up to $1M) can fund both training runs and 24/7 hosting — CloudRoute routes you to the partner who files them; you pay $0.

definition

I · What Amazon SageMaker actually is

Amazon SageMaker is a fully-managed service that covers the entire machine-learning lifecycle on AWS — building, training, tuning, deploying, and operating models — without you having to provision, patch, or scale the underlying servers yourself.

The cleanest one-line definition: SageMaker is the managed platform you use to take a machine-learning model from a blank notebook to a monitored production endpoint, with AWS handling the infrastructure at every step. Where a raw EC2 GPU instance gives you a bare box you must configure, secure, and babysit, SageMaker gives you managed primitives — a training job, an endpoint, a pipeline — that spin the right compute up, run your code, and tear it back down.

It is deliberately broad. SageMaker is not one feature; it is a suite of roughly a dozen capabilities under one umbrella, sharing one IAM model, one billing surface, and one console (SageMaker Studio). That breadth is the point: a data-science team can do experimentation, large-scale distributed training, model governance, real-time serving, and batch scoring without leaving the platform or stitching together five separate tools.

It is also model-agnostic and framework-agnostic. SageMaker runs PyTorch, TensorFlow, JAX, Hugging Face transformers, XGBoost, scikit-learn, and custom containers equally well. It is used for everything from classical tabular ML (fraud scoring, churn, demand forecasting) to deep learning (computer vision, recommendation) to training and serving large language and foundation models. The same platform that trains a gradient-boosted tree model also fine-tunes a multi-billion-parameter transformer.

A useful mental model: Bedrock is "AI as an API call"; SageMaker is "the full ML factory." If you want to call a foundation model someone else trained, Bedrock is the shorter path. If you need to train, fine-tune deeply, or serve your own model with control over the instance, the container, and the scaling behaviour, SageMaker is the tool. We unpack that distinction in detail in section V.

the rename you may have seen

In late 2024 AWS expanded the brand to Amazon SageMaker as a unified platform that also folds in data, analytics, and SQL tooling, with the original ML capability now positioned as SageMaker AI inside it. For practical purposes — and throughout this guide — "SageMaker" means the end-to-end ML capability (Studio, training, endpoints, pipelines). Check the AWS console for the exact current product nesting in your account.

the toolbox

II · The key components, one by one

SageMaker's breadth is easiest to understand component by component. Each one maps to a stage of ML work; together they cover the lifecycle. Here are the parts you will actually touch.

You will not use every component on every project — a team serving one classical model may only touch Studio, a training job, and a real-time endpoint. But knowing the full toolbox tells you what is available when a project grows from a prototype into a governed production system.

SageMaker Studio (the IDE) — The web-based integrated development environment that is the front door to everything else. Jupyter notebooks, a code editor, experiment tracking, a visual pipeline view, and one-click access to training and deployment. Studio is where data scientists live day to day.
JumpStart (the model + solution hub) — A catalogue of hundreds of pre-trained, open and proprietary foundation models and built-in solution templates you can deploy or fine-tune in a few clicks — Llama, Mistral, Falcon, Stable Diffusion, and many task-specific models. The fastest way to go from "I want to try this model" to a running endpoint inside SageMaker.
Training jobs — Managed, ephemeral compute for model training. You specify the instance type and count, the container/framework, the data location in S3, and the hyperparameters; SageMaker provisions the cluster, runs the job, writes the model artifact back to S3, and shuts the cluster down. Supports distributed training across many GPUs, Spot instances for cheaper training, and warm pools to cut start-up latency.
Automatic Model Tuning (hyperparameter optimization) — Runs many training jobs in parallel across a search space (Bayesian, grid, random, or Hyperband) to find the best hyperparameters automatically, instead of hand-tuning.
Inference endpoints — Four modes for serving predictions — real-time, serverless, asynchronous, and batch transform. Choosing among them is the single biggest cost-and-latency decision in serving; section IV covers each.
Pipelines (MLOps / CI-CD for ML) — A purpose-built workflow orchestrator that chains the steps of an ML workflow — preprocess, train, evaluate, conditionally register, deploy — into a repeatable, versioned DAG. The backbone of MLOps on SageMaker.
Feature Store — A managed repository for ML features with both an online store (low-latency reads for real-time inference) and an offline store (for training), so the exact same feature definitions are used at training time and serving time — eliminating training/serving skew.
Model Registry — A versioned catalogue of trained models with approval status, lineage, and metadata. Promotes governance: a model moves from "pending" to "approved" before a pipeline is allowed to deploy it.
Ground Truth (data labeling) — Managed data-labeling workflows — human annotators (your own, a vendor workforce, or Mechanical Turk) plus automated/active-learning labeling to cut cost. Turns raw data into labeled training sets.
Data Wrangler — A visual data-preparation tool inside Studio for importing, exploring, transforming, and feature-engineering tabular data with little or no code, exportable directly into a training pipeline.
Clarify — Bias detection and explainability. Measures dataset and model bias across sensitive attributes and produces feature-importance explanations (SHAP-based) for predictions — increasingly required for regulated use cases.
Model Monitor — Watches deployed endpoints for data-quality drift, model-quality drift, bias drift, and feature-attribution drift, and alerts you when live traffic diverges from the training distribution.
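
The Automatic Model Tuning entry in the list above can be illustrated with a toy random search, one of the strategies it names. This is a minimal sketch of the idea only, not SageMaker's tuner: `toy_validation_score`, the search-space bounds, and the function names are hypothetical stand-ins for a real training job and its reported metric, and the trials here run one after another rather than in parallel.

```python
import random

# Hypothetical search space: hyperparameter name -> (low, high) bounds.
SEARCH_SPACE = {"learning_rate": (0.001, 0.3), "subsample": (0.5, 1.0)}


def toy_validation_score(params):
    """Stand-in for "run a training job and read its metric" (higher is better).

    The peak at learning_rate=0.1, subsample=0.8 is invented for illustration.
    """
    return -((params["learning_rate"] - 0.1) ** 2) - ((params["subsample"] - 0.8) ** 2)


def random_search(space, objective, n_trials, seed=0):
    """Sample n_trials points uniformly from the space and keep the best one."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(SEARCH_SPACE, toy_validation_score, n_trials=50)
print(best_params, best_score)
```

In a real tuning job each call to the objective would be a full training run, which is why the choice of strategy and the number of trials both show up directly on the bill.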

how they fit together

A mature setup looks like: Ground Truth labels data → Data Wrangler + Feature Store prepare features → a training job (tuned by Automatic Model Tuning) produces an artifact → Clarify checks bias → the model lands in the Model Registry → a Pipeline deploys an approved version to an endpoint → Model Monitor watches it in production. All orchestrated from Studio.
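
The chain above is a dependency graph, which is the shape Pipelines works with. The sketch below models it with Python's standard-library `graphlib` (Python 3.9+) to show how an execution order falls out of the dependencies. It is an illustration under assumptions, not the SageMaker Pipelines SDK: the step names are hypothetical labels for the stages described above.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it can start.
# Step names are hypothetical labels for the stages described in the text.
PIPELINE = {
    "prepare_features": {"label_data"},
    "train": {"prepare_features"},
    "check_bias": {"train"},
    "register_model": {"check_bias"},
    "deploy": {"register_model"},
    "monitor": {"deploy"},
}


def execution_order(dag):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


print(execution_order(PIPELINE))
```

Because the example is a straight chain, there is only one valid order; a real workflow with parallel branches would have several.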

the workflow

III · The machine-learning lifecycle on SageMaker

The components above map onto a repeatable lifecycle. Walking the seven stages in order is the clearest way to see how a model gets from idea to production — and where each tool plugs in.

Every ML project moves through roughly the same arc. SageMaker's design mirrors that arc, which is why teams adopt the whole platform rather than picking one piece: each stage hands off cleanly to the next.

1 · Prepare & label the data

Raw data lands in Amazon S3 (often via a data lake or feature pipeline). Ground Truth produces labels where you need supervised data; Data Wrangler cleans and transforms tabular inputs; the Feature Store records the resulting features so they are reusable and consistent between training and serving. Most real-world ML time is spent here, not in modeling.

2 · Experiment & build in Studio

A data scientist opens a notebook in Studio, pulls a candidate model from JumpStart or writes one in PyTorch/TensorFlow, and iterates on a small sample. SageMaker Experiments tracks each run's parameters and metrics so results are comparable rather than lost in notebook cells.

3 · Train at scale

When the approach looks promising, the work graduates from the notebook to a managed training job on the right instance type — often a GPU instance, sometimes a multi-node distributed cluster for large models. Automatic Model Tuning sweeps hyperparameters. Training runs are ephemeral: you pay for the seconds the cluster exists, then it disappears.
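
To make "a managed training job" concrete, here is a hedged sketch that assembles the request a team might pass to boto3's SageMaker `create_training_job` call. It only builds a dictionary and never contacts AWS. The helper name, the default instance type, the 30 GB volume, and the doubled Spot wait time are illustrative assumptions; check the field names against the current API reference before relying on them.

```python
def build_training_job_request(job_name, image_uri, role_arn, train_s3_uri,
                               output_s3_uri, instance_type="ml.m5.xlarge",
                               instance_count=1, hyperparameters=None,
                               use_spot=False, max_runtime_s=3600):
    """Assemble keyword arguments for a create_training_job call (sketch)."""
    request = {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,      # framework or custom container
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,                 # IAM execution role
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 30,            # assumed size for illustration
        },
        # Hyperparameter values are sent as strings.
        "HyperParameters": {k: str(v) for k, v in (hyperparameters or {}).items()},
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_s},
    }
    if use_spot:
        request["EnableManagedSpotTraining"] = True
        # Spot jobs also need a total wait budget at least as long as the runtime.
        request["StoppingCondition"]["MaxWaitTimeInSeconds"] = 2 * max_runtime_s
    return request


# All identifiers below are placeholders, not real resources.
request = build_training_job_request(
    job_name="demo-xgboost-001",
    image_uri="<training-image-uri>",
    role_arn="<execution-role-arn>",
    train_s3_uri="s3://example-bucket/train/",
    output_s3_uri="s3://example-bucket/output/",
    hyperparameters={"max_depth": 5},
    use_spot=True,
)
print(request["ResourceConfig"], request["StoppingCondition"])
# With credentials configured, the request could be submitted with:
# boto3.client("sagemaker").create_training_job(**request)
```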

4 · Evaluate & check for bias

The trained artifact is evaluated against a hold-out set; Clarify measures bias and produces explainability reports. A pipeline can gate on these metrics — only models that clear an accuracy/bias threshold proceed.
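
The gating idea can be sketched in a few lines. This is an illustration, not Clarify's or Pipelines' API: the metric names (`accuracy`, `bias_gap`), the thresholds, and the step names are all hypothetical, with `bias_gap` standing in for a difference in positive-prediction rate between two groups.

```python
def passes_gate(metrics, min_accuracy=0.90, max_bias_gap=0.05):
    """Return True only if the model clears both thresholds."""
    accurate_enough = metrics["accuracy"] >= min_accuracy
    fair_enough = abs(metrics["bias_gap"]) <= max_bias_gap
    return accurate_enough and fair_enough


def next_step(metrics):
    """Decide which branch of the pipeline runs after evaluation."""
    return "register_model" if passes_gate(metrics) else "stop_pipeline"


print(next_step({"accuracy": 0.93, "bias_gap": 0.02}))  # register_model
print(next_step({"accuracy": 0.95, "bias_gap": 0.12}))  # stop_pipeline
```

The second model is more accurate but fails on bias, which is the case a single accuracy threshold would miss.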

5 · Register & govern

Approved models are versioned in the Model Registry with lineage (which data, which code, which hyperparameters produced this artifact). This is the governance checkpoint: a human or an automated rule approves a model version before deployment.
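
A small in-memory sketch shows what the approval checkpoint buys you: deployment code only ever sees versions that were explicitly approved. The status strings mirror the approval states the Model Registry uses (`PendingManualApproval`, `Approved`, `Rejected`); the `ModelVersion` class, the helper, and the S3 paths are hypothetical and stand in for the real registry.

```python
from dataclasses import dataclass


@dataclass
class ModelVersion:
    version: int
    artifact_uri: str
    status: str = "PendingManualApproval"  # new versions start unapproved


def latest_approved(versions):
    """Return the highest-numbered approved version, or None if there is none."""
    approved = [v for v in versions if v.status == "Approved"]
    return max(approved, key=lambda v: v.version) if approved else None


versions = [
    ModelVersion(1, "s3://example-bucket/models/v1/model.tar.gz", "Approved"),
    ModelVersion(2, "s3://example-bucket/models/v2/model.tar.gz", "Rejected"),
    ModelVersion(3, "s3://example-bucket/models/v3/model.tar.gz"),
]
print(latest_approved(versions).version)  # 1
```

Version 3 is newer but still pending, so version 1 remains the deployable one until someone approves its successor.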

6 · Deploy to an endpoint

The approved model is deployed in one of four inference modes (real-time, serverless, asynchronous, or batch transform) depending on traffic shape and latency needs. The first three are endpoints; batch transform runs as a transient job with no persistent endpoint. SageMaker handles the container, the auto-scaling, and the load balancing.

7 · Monitor & retrain

Model Monitor watches the live endpoint for drift; when accuracy degrades or the input distribution shifts, a Pipeline can automatically kick off retraining — closing the loop. This is what "MLOps" means in practice: the lifecycle is automated and repeatable, not a one-off manual deploy.
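
As a rough illustration of the drift-then-retrain loop, the sketch below compares a live feature sample against a training baseline and flags a retrain when the mean has moved too far. This is a deliberately simple heuristic chosen for the example, not the statistics Model Monitor actually computes; the threshold of 2.0 and the sample values are made up.

```python
from statistics import mean, stdev


def drift_score(baseline, live):
    """Shift of the live mean, measured in baseline standard deviations."""
    spread = stdev(baseline)  # needs at least two baseline values
    if spread == 0:
        return 0.0 if mean(live) == mean(baseline) else float("inf")
    return abs(mean(live) - mean(baseline)) / spread


def should_retrain(baseline, live, threshold=2.0):
    """Flag a retrain when the live data has drifted past the threshold."""
    return drift_score(baseline, live) > threshold


baseline = [10.0, 11.0, 9.0, 10.5, 9.5]
print(should_retrain(baseline, [10.2, 9.8, 10.1]))   # False: similar distribution
print(should_retrain(baseline, [15.0, 16.0, 14.5]))  # True: clearly shifted
```

In an automated setup, a `True` here is the signal that would start the retraining pipeline.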

serving

IV · The four ways to serve a model — and when to use each

Inference is where most production cost and latency live. SageMaker offers four distinct deployment modes, and picking the wrong one is the most common way to overspend. Here is each, plainly.

The decision turns on two questions: how predictable is your traffic, and how fast does each prediction need to come back? Match those to the four modes below.

Real-time endpoints — A persistent endpoint on always-on instances behind an auto-scaling group, returning predictions in milliseconds. Best for steady, latency-sensitive online traffic (a live recommendation API, a fraud check in the checkout flow). You pay for the instances 24/7 whether or not requests arrive, so this is the mode that costs the most while idle and the one to scope most carefully.
Serverless inference — SageMaker provisions and scales compute automatically per request and scales to zero when idle, so you pay only for compute used during inference. Best for intermittent or spiky traffic where an always-on endpoint would sit idle. Trade-off: occasional cold-start latency on the first request after a quiet period.
Asynchronous inference — Requests are queued and processed in the background; results are written to S3, and you can optionally be notified through Amazon SNS when they are ready. Built for large payloads (big images/documents) or long-running inferences where a synchronous response is not required. Can scale the endpoint to zero between bursts.
Batch transform — No persistent endpoint at all. You point a transient job at a dataset in S3, SageMaker spins up compute, scores every record, writes results back to S3, and tears the compute down. Best for offline, scheduled scoring of large datasets (nightly churn scores for the whole user base) where no online latency is needed — and usually the cheapest path for bulk scoring.

SageMaker inference modes · choosing the right one (2026)

Mode	Traffic shape	Latency	Scales to zero?	Billing basis	Typical use
Real-time	Steady, online	Milliseconds	No (always-on)	Per instance-hour, 24/7	Live API, fraud check
Serverless	Spiky / intermittent	Ms (cold-start risk)	Yes	Per inference compute used	Bursty internal apps
Asynchronous	Large payloads, bursts	Seconds–minutes	Yes	Per instance-time while busy	Big docs/images, long inferences
Batch transform	Offline, scheduled	N/A (not online)	N/A (transient job)	Per job instance-time	Nightly bulk scoring

Rule of thumb: steady online traffic → real-time; spiky online → serverless; big or slow inferences → asynchronous; whole-dataset offline scoring → batch transform. The full cost mechanics of each are in the SageMaker pricing breakdown.
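
The rule of thumb translates directly into a small decision function. The function name and its three boolean inputs are hypothetical, and the precedence between "big or slow" and "steady" is an assumption (here large or slow inferences win), since the rule above does not rank them.

```python
def choose_inference_mode(online, steady_traffic=False, large_or_slow=False):
    """Map traffic shape to one of the four SageMaker inference modes."""
    if not online:
        return "batch transform"   # whole-dataset offline scoring
    if large_or_slow:
        return "asynchronous"      # big payloads or long-running inferences
    if steady_traffic:
        return "real-time"         # steady, latency-sensitive online traffic
    return "serverless"            # spiky or intermittent online traffic


print(choose_inference_mode(online=True, steady_traffic=True))   # real-time
print(choose_inference_mode(online=True))                        # serverless
print(choose_inference_mode(online=True, large_or_slow=True))    # asynchronous
print(choose_inference_mode(online=False))                       # batch transform
```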

the key distinction

V · SageMaker vs Amazon Bedrock — managed FM API vs full ML control

This is the most common question teams arrive with, and the answer is not "either/or." SageMaker and Bedrock sit at different points on the control-vs-convenience spectrum, and a large share of teams run both.

Amazon Bedrock is a fully-managed API for calling existing foundation models (Anthropic's Claude, Meta's Llama, Mistral, Amazon's own Nova and Titan, Cohere, Stability AI, AI21, DeepSeek) through one consistent interface, with enterprise privacy: your prompts and data are not used to train the base models and stay in your account and region. You never see an instance, a container, or a GPU. You send tokens, you get tokens back, you pay per token. Bedrock also layers Agents, Knowledge Bases (managed RAG), Guardrails, fine-tuning, and Flows on top of those models.

Amazon SageMaker gives you the full ML stack and full control. You choose the model (including your own from-scratch architectures), the framework, the instance type, the container, the training regime, and the scaling behaviour. You can train a model that does not exist anywhere else, fine-tune deeply (not just adapter tuning), and serve it exactly how you want. With that control comes more responsibility: you own the instance selection, the scaling configuration, and the operational tuning.

The deciding question is usually: does a foundation model that already does what you need exist on Bedrock? If yes — you want a chat assistant, a summarizer, a RAG system over your docs, a coding helper — Bedrock is the shorter, cheaper-to-start path; you are calling a model, not running infrastructure. If no — you have a proprietary model, a classical-ML problem (tabular fraud/forecasting/recommendation), a need to fine-tune weights deeply, or strict control requirements over the serving environment — SageMaker is the right tool.

And the two genuinely complement each other. A common architecture: Bedrock powers the generative-AI features (the customer-facing assistant, the document Q&A), while SageMaker trains and serves the company's proprietary models (the recommendation engine, the demand forecaster, a fine-tuned domain model). You can also deploy open foundation models from SageMaker JumpStart when you want full control over an open-weights model rather than calling it through Bedrock. The comparison table below lays the two side by side; the dedicated Bedrock vs SageMaker page goes deeper.

the one-sentence test

If your answer to "do I need to train or deeply control the model myself?" is no, start with Bedrock. If it is yes, you need SageMaker. Plenty of teams answer "yes for some workloads, no for others" — and run both.

cost shape

VI · Pricing overview — what drives the SageMaker bill

There is no licence fee for SageMaker. You pay for the underlying compute, storage, and managed features you actually use, billed per second for compute. Understanding the shape of the bill matters more than memorizing rates.

The cost is dominated by two things: training compute (the instance-seconds your training jobs consume, which spike then disappear) and inference compute (the instances behind your endpoints, which — for real-time endpoints — run continuously). GPU instance choice is the single biggest lever on both: a high-end accelerator can cost many times what a CPU or smaller GPU instance does per hour.
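
A little arithmetic makes the shape of the bill visible. The hourly rates below are hypothetical round numbers chosen for easy math, not AWS list prices, and the function names are illustrative; substitute live rates from the AWS pricing page.

```python
HOURS_PER_MONTH = 730  # average month: 8,760 hours per year / 12


def always_on_monthly_cost(hourly_rate, instance_count=1):
    """Cost of instances that run all month, busy or idle."""
    return hourly_rate * instance_count * HOURS_PER_MONTH


def training_job_cost(hourly_rate, instance_count, runtime_seconds):
    """Per-second billing: pay only for the seconds the cluster exists."""
    return hourly_rate * instance_count * runtime_seconds / 3600


# Hypothetical rates in USD per instance-hour (NOT real AWS prices).
GPU_RATE = 4.00
CPU_RATE = 0.25

print(always_on_monthly_cost(GPU_RATE))      # 2920.0 for one always-on GPU endpoint
print(always_on_monthly_cost(CPU_RATE))      # 182.5 for one always-on CPU endpoint
print(training_job_cost(GPU_RATE, 4, 5400))  # 24.0 for a 90-minute, 4-GPU-instance job
```

Under these assumed rates, one idle GPU endpoint for a month costs more than a hundred of those training runs, which is why endpoint mode and instance choice dominate the bill.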

Secondary costs include Studio/notebook compute while a data scientist is working, storage (S3 for data and model artifacts, plus any provisioned volumes), Feature Store reads/writes and storage, Data Wrangler processing, Ground Truth labeling, and data processing/transfer. Each is modest next to the compute line, but they add up at scale.

AWS offers SageMaker Savings Plans: commit to a steady dollar-per-hour of usage for one or three years in exchange for a meaningful discount versus on-demand, covering Studio, training, and real-time inference usage. For training specifically, Spot instances can cut compute cost substantially in exchange for interruptibility; managed Spot training can resume an interrupted job from checkpoints, provided your training script saves them. And the endpoint mode you pick (section IV) changes the bill more than almost anything else: an always-on real-time endpoint that sits mostly idle is the classic source of surprise SageMaker spend.

For exact, current per-instance and per-feature rates, two worked examples (training a model; hosting an endpoint 24/7), an instance-and-GPU cost table, the Savings Plans math, and the cost-optimization levers, see the dedicated SageMaker pricing breakdown — and verify live rates on the AWS pricing page, since GPU pricing in particular moves.

fit

VII · Who SageMaker is for — and who should look elsewhere

SageMaker is built for teams that own models, not just teams that call them. Knowing whether that is you saves a lot of wasted setup.

SageMaker is squarely aimed at data-science and ML engineering teams who need to build, train, and operate models as a core part of what they ship. The honest fit assessment:

A strong fit if you train or fine-tune your own models — Custom architectures, proprietary models, deep fine-tuning of open-weights models, or classical ML (tabular fraud, churn, forecasting, recommendation) all live naturally on SageMaker.
A strong fit if you need MLOps and governance at scale — When you are running many models in production and need versioning, lineage, automated retraining, drift monitoring, and bias/explainability checks, the Pipelines + Model Registry + Clarify + Model Monitor stack is purpose-built for it.
A strong fit if you need control over the serving environment — Specific instance types, custom containers, particular scaling behaviour, or AI-silicon (Trainium for training, Inferentia for inference) to cut cost — SageMaker exposes all of it.
Probably overkill if you only want to call a foundation model — If you want a chat assistant, summarizer, or RAG system over an existing FM and have no need to train anything, Amazon Bedrock is simpler and faster to start. Reach for SageMaker when you outgrow "just calling a model."
Not the right layer for a non-technical team — SageMaker assumes ML/engineering skills. A business team that wants an out-of-the-box GenAI assistant over company data is better served by Amazon Q Business than by standing up SageMaker.

the credits angle for ML teams

Training runs and 24/7 endpoints are exactly the kind of spend AWS credit programs are designed to absorb. A funded ML team can run experiments, train models, and host endpoints on credits rather than burning cash, which is where CloudRoute fits.

first steps

VIII · Getting started on SageMaker

Going from zero to a deployed model is a short, well-trodden path. Here is the realistic sequence for a team's first project.

1 · Set up the SageMaker domain. In the AWS console, create a SageMaker domain (the account-level container for Studio) and a user profile, with an IAM execution role that can read your S3 data and write artifacts. This is a one-time setup.

2 · Open Studio and get data in S3. Launch SageMaker Studio, open a notebook, and point it at your training data in S3. For a first project, JumpStart gives you a working model in a few clicks so you can see the end-to-end flow before writing custom code.

3 · Run a training job. Move from in-notebook experimentation to a managed training job: specify the instance type (start small — a single GPU or even CPU for a first run), the framework container, and the data location. SageMaker provisions, trains, writes the artifact to S3, and tears down.

4 · Deploy an endpoint. Deploy the trained model. For a first deployment, serverless inference is a low-risk choice — it scales to zero, so a forgotten test endpoint will not quietly run up cost the way an always-on real-time endpoint would.

5 · Add governance as you grow. Once the prototype works, wrap it in a Pipeline, register the model, and add Model Monitor. This is the step that turns a notebook experiment into a maintainable production system.
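
Step 4 above suggests serverless inference for a first deployment. As a hedged sketch, the helper below assembles the request a team might pass to boto3's SageMaker `create_endpoint_config` call with a `ServerlessConfig` block. It only builds and validates a dictionary and never contacts AWS. The helper name, the defaults, and the resource names are hypothetical, and the list of allowed memory sizes is an assumption to verify against the current API reference.

```python
# Memory sizes assumed valid for serverless endpoints (1 GB steps up to 6 GB).
ALLOWED_MEMORY_MB = {1024, 2048, 3072, 4096, 5120, 6144}


def build_serverless_endpoint_config(config_name, model_name,
                                     memory_mb=2048, max_concurrency=5):
    """Assemble keyword arguments for a serverless endpoint config (sketch)."""
    if memory_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"memory_mb must be one of {sorted(ALLOWED_MEMORY_MB)}")
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            # No instance type or count: compute is provisioned per request.
            "ServerlessConfig": {
                "MemorySizeInMB": memory_mb,
                "MaxConcurrency": max_concurrency,
            },
        }],
    }


# Placeholder names, not real resources.
config = build_serverless_endpoint_config("demo-serverless-config", "demo-model")
print(config["ProductionVariants"][0]["ServerlessConfig"])
# With credentials configured, the config could be created with:
# boto3.client("sagemaker").create_endpoint_config(**config)
```

The absence of an instance type in the variant is the practical difference from a real-time endpoint: there is nothing left running to forget about.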

Cost discipline from day one: shut down idle Studio apps and test endpoints, use Spot for training, and prefer serverless/batch over always-on real-time until traffic justifies it. The single most common SageMaker cost mistake is an idle real-time endpoint left running after an experiment.

side by side

SageMaker vs Amazon Bedrock — the decision table

The clearest way to choose: line up the two on the dimensions that actually drive the decision. Bedrock optimizes for convenience and speed-to-first-call; SageMaker optimizes for control and ownership.

Dimension	Amazon Bedrock	Amazon SageMaker
What it is	Managed API to existing foundation models	End-to-end platform to build/train/deploy your own models
You manage infrastructure?	No — fully managed, serverless	Yes — you choose instances, containers, scaling
Train a model from scratch?	No	Yes
Fine-tune?	Yes (managed, on supported models)	Yes (full, deep — any framework)
Classical / tabular ML?	No (foundation models only)	Yes (XGBoost, scikit-learn, etc.)
Pricing basis	Per token (on-demand / batch / provisioned)	Per instance-second (compute) + storage + features
Time to first result	Minutes (one API call)	Hours–days (set up domain, train, deploy)
Best for	GenAI features over existing FMs	Custom models, deep control, MLOps, classical ML

Not mutually exclusive. A common production pattern runs Bedrock for generative-AI features and SageMaker for proprietary/classical models in the same account. See the dedicated Bedrock vs SageMaker page for the deep comparison.

training and hosting add up fast

Fund your SageMaker training and endpoints with AWS credits — pay $0


a recent match

A SageMaker build, credit-funded — anonymized

inquiry · seed-stage logistics-AI, Singapore

Seed-stage logistics-AI startup, 9 people, building a proprietary delivery-time prediction model on AWS

Situation: Their core product was a custom demand-and-ETA forecasting model — not something an off-the-shelf foundation model could do, so Bedrock alone was not enough. They needed SageMaker for training and serving, plus a small Bedrock-powered assistant for customer support. Training GPU runs and an always-on real-time endpoint were projected at ~$6K/month, which the seed budget could not absorb during the build.

What CloudRoute did: Routed within 20 hours to an APAC partner with an ML / SageMaker track record. The partner filed an Activate Portfolio application for general AWS infrastructure plus a Bedrock/GenAI PoC application for the support-assistant workload, and advised splitting serving into batch transform for nightly bulk ETA scoring plus a small serverless endpoint for live lookups — cutting the always-on real-time cost.

Outcome: Credits approved within 15 days, covering the SageMaker training runs, the Feature Store, and the endpoints. The team trained and shipped the forecasting model on credits, ran the Bedrock support assistant alongside it, and re-architected serving (batch + serverless) to roughly halve projected monthly inference cost. CloudRoute's commission was paid by the partner from AWS engagement funding — the startup paid $0.

matched in: < 24h · credits secured: 6-figure · serving cost cut: ~50% · cost to customer: $0

faq

Common questions

What is Amazon SageMaker in one sentence?

Amazon SageMaker is AWS's fully-managed, end-to-end machine-learning platform: it lets data-science and ML teams build, train, tune, deploy, and operate models — from notebook experimentation in SageMaker Studio through distributed training jobs to versioned, monitored production endpoints — with AWS managing the underlying infrastructure at every step.

What is the difference between SageMaker and Amazon Bedrock?

Bedrock is a managed API for calling existing foundation models (Claude, Llama, Nova, Mistral, and more) — you never touch infrastructure and you pay per token. SageMaker is a full platform for building, training, and deploying your own models on your own instances, billed per instance-second. Use Bedrock when an existing foundation model already does what you need; use SageMaker when you need to train, fine-tune deeply, run classical/tabular ML, or control the serving environment. Many teams use both.

Does SageMaker cost money to use even if I am not training anything?

There is no licence fee, but you pay for any compute that is running. The classic source of unexpected cost is an always-on real-time endpoint or an idle Studio app left running; both bill for every second they are up, whether or not you send requests. Serverless inference scales to zero, and batch transform runs only for the duration of a job, so neither bills when idle. Always shut down test endpoints and notebook apps you are not using.

What are the four types of SageMaker inference endpoints?

Real-time (persistent, always-on, millisecond latency for steady online traffic); serverless (scales to zero, pay per inference, for spiky traffic, with occasional cold starts); asynchronous (queued, for large payloads or long-running inferences, can scale to zero); and batch transform (a transient job that scores a whole dataset in S3 with no persistent endpoint, for offline bulk scoring). Choosing the right one is the biggest cost-and-latency lever in serving.

Can I run open-source or Hugging Face models on SageMaker?

Yes. SageMaker is framework-agnostic — it runs PyTorch, TensorFlow, JAX, Hugging Face transformers, XGBoost, scikit-learn, and custom containers. SageMaker JumpStart provides hundreds of pre-trained open and proprietary models (Llama, Mistral, Falcon, Stable Diffusion, and more) that you can deploy or fine-tune in a few clicks.

What is SageMaker Pipelines and why does it matter?

Pipelines is SageMaker's purpose-built workflow orchestrator — effectively CI/CD for machine learning. It chains the steps of an ML workflow (preprocess, train, evaluate, conditionally register, deploy) into a repeatable, versioned DAG. Combined with the Model Registry, Clarify, and Model Monitor, it is the backbone of MLOps on SageMaker: it turns a one-off notebook experiment into an automated, governed, retrainable production system.

Do I need SageMaker if I only want to build a chatbot or RAG app?