Beyond the core serverless providers, Bedrock Marketplace puts 100-plus specialized and emerging foundation models — domain-tuned, multilingual, embedding, vision, and frontier-research models — behind the same Bedrock API and tooling. The catch and the point: most are served from a managed endpoint you deploy onto an instance, not from the pay-per-token serverless pool. This is the full reference: serverless vs. marketplace, how discovery works, the deploy-and-invoke flow, endpoint/instance billing, when a marketplace model beats a core model, and the security and governance that carry across.
Amazon Bedrock Marketplace is a catalog inside Bedrock that gives you access to over 100 specialized and emerging foundation models — beyond the handful of core providers offered as always-on serverless endpoints — and lets you deploy and call them through the same Bedrock API, console, and security controls you already use.
The core Bedrock catalog is broad but curated: a set of provider families (Anthropic Claude, Meta Llama, Mistral, Amazon Nova and Titan, Cohere, Stability AI, AI21, DeepSeek) served as fully-managed, pay-per-token serverless endpoints. That covers the large majority of production needs. But it is not the whole frontier. There are hundreds of high-quality models in the world — vertical-domain models trained on medical, legal, financial, or scientific corpora; strong multilingual models for languages the core set under-serves; specialized embedding and reranking models; vision and document-understanding models; and brand-new research releases that appear weeks or months before (or instead of) landing in the serverless pool. The Marketplace exists so you can use those models without leaving Bedrock.
The problem it solves is the one that used to force teams off-platform. Before Bedrock Marketplace, reaching a specialized model that AWS did not offer serverless meant one of two awkward paths: stand the model up yourself on raw GPU instances and own the entire serving stack (autoscaling, containers, drivers, the perpetual scarcity of accelerators), or send your data to a third-party model API outside AWS and lose your data-governance boundary. Marketplace collapses both. AWS handles the container, the model artifact, and the deployment plumbing; you choose an instance type and click deploy. And because the model runs inside your AWS account and Region, your prompts and outputs stay within your governance boundary — governed by IAM, logged to CloudTrail, the same as any core Bedrock model.
The mental model that matters: Bedrock Marketplace widens the catalog; it does not change the front door. Discovery happens in the Bedrock console, access is governed by Bedrock and IAM, and compatible models answer through the same InvokeModel (and, where supported, Converse) APIs. What changes is underneath — how the model is served and how you pay for it. Put precisely: the core serverless catalog is "models AWS runs for everyone, billed per token"; the Marketplace is "100-plus more models AWS makes deployable into your account, billed per instance-hour." Both are Bedrock; you move between them by choosing which kind of model your task warrants.
Amazon Bedrock Marketplace = a catalog of 100+ specialized and emerging foundation models you discover in the Bedrock console and deploy to a managed endpoint in your own account, reachable through the same Bedrock APIs and the same IAM + CloudTrail governance as core serverless models — but billed per instance-hour for as long as the endpoint runs, not per token.
Almost every decision about the Marketplace comes back to one structural difference: a core serverless model is an always-on shared endpoint you pay for by the token, while a marketplace model is usually a dedicated endpoint you deploy onto an instance and pay for by the hour. Internalize this and the rest of the page follows.
With a core serverless model, there is nothing to provision. AWS runs the inference fleet; you call the model by its model ID and are billed per 1,000 input and output tokens. Capacity is elastic and shared, idle costs nothing, and a model you do not call costs you exactly zero. This is the model behind most of the core Bedrock catalog and the reason "serverless" is the default mental picture of Bedrock.
With a marketplace model served via a managed endpoint, you make a deployment decision first. You pick the model, choose an instance type (a GPU-backed accelerated-compute instance sized to the model) and an instance count, and Bedrock provisions a dedicated endpoint that loads the model and stays running until you delete it. You then invoke that endpoint through the Bedrock API. The endpoint gives you dedicated, predictable capacity — but it bills per instance-hour the entire time it is up, whether it is serving one request a minute or none at all. The economics resemble Amazon SageMaker real-time endpoints more than serverless tokens, which is the single most important thing to understand before you deploy one.
A useful nuance: the line is not perfectly binary. A subset of Marketplace models is also offered in a serverless (pay-per-token) mode, and AWS keeps broadening which models can run serverless. So the first thing to check on any marketplace model is whether it runs serverless in your Region — if it does, you may get the wider catalog without taking on endpoint economics at all. If it does not, you are trading per-token simplicity for access to a model the serverless pool lacks, and you accept the per-hour, capacity-managed billing that comes with it.
The practical consequence shows up in utilization. A per-token model is efficient at any volume, including spiky and near-zero. A per-hour endpoint is only efficient when kept reasonably busy: at high steady throughput it can be cheaper per request than tokens, but at low or bursty volume you pay for idle capacity. That is why "do I have enough sustained traffic to justify an endpoint?" is as central to the marketplace decision as "is this model better for my task?"
| Dimension | Core serverless model | Marketplace model (managed endpoint) |
|---|---|---|
| Catalog scope | Curated core providers (Claude, Llama, Mistral, Nova, Titan, Cohere, Stability, AI21, DeepSeek) | 100+ additional specialized & emerging models |
| What you provision | Nothing — shared, elastic fleet | A dedicated endpoint on an instance type you choose |
| Billing basis | Per 1,000 input / output tokens | Per instance-hour while the endpoint is running |
| Idle cost | Zero — uncalled model costs nothing | Non-zero — the endpoint bills even when idle |
| Scales well at | Any volume, including bursty / near-zero | High, steady throughput that keeps the endpoint busy |
| Time to first call | Seconds (after Model access) | Minutes (endpoint must deploy and warm) |
| Closest analogue | Pay-per-use API | A SageMaker real-time endpoint |
| API surface | InvokeModel / Converse (where supported) | InvokeModel against the endpoint; Converse for compatible models |
A hundred-plus models is only useful if you can find the right one. Discovery happens in the Bedrock console, where marketplace models are presented alongside the core catalog and filterable by the attributes that actually narrow a shortlist — provider, modality, task, and deployment option.
In the Bedrock console, the model catalog surfaces both the core providers and the Marketplace in one place, so you browse the wider set the same way you browse the core one. You filter by provider, by modality (text, vision, embeddings, image), by task or domain, and — critically — by deployment option (serverless vs. deploy-to-endpoint), so you can immediately see whether a candidate can run pay-per-token or will require an instance. Each model has a detail page with its description, the provider, supported instance types, applicable end-user license terms, and the deployment paths available to you.
Two filters do most of the work. The first is deployment option: if a model offers serverless, you can trial it for the cost of a few tokens before committing to anything; if it is endpoint-only, your evaluation has a per-hour cost the moment you deploy, so you plan the trial accordingly. The second is modality and task: a marketplace model usually earns its place by being specialized, so you are typically searching for a specific capability — a clinical or legal language model, a particular multilingual or code model, a strong reranker, a document-vision model — rather than a general-purpose chat model the core catalog already covers well.
Evaluation should be empirical, not brochure-driven. The same discipline that applies to choosing among core models applies doubly here: a specialized model's advantage is real only on the tasks it was specialized for, so confirm it on your data. Run a candidate against a representative slice of your real traffic and compare it head-to-head with the best core serverless model for the same job, weighing quality, latency, and — because an endpoint bills per hour — total cost at your expected utilization, not just per-request cost. Bedrock's model-evaluation tooling helps make that comparison structured rather than anecdotal.
Using a marketplace model that requires an endpoint is a short, repeatable path: subscribe, deploy to an endpoint on an instance you choose, invoke it through the Bedrock API, then manage (and eventually delete) the endpoint so it stops billing. Here is the flow end to end.
The whole flow is designed so the only genuinely new step versus serverless is provisioning the endpoint; access, authorization, and invocation reuse the Bedrock mechanics you already know.
From the model's detail page in the Bedrock console, you subscribe to the model and accept any provider end-user license terms. This is the equivalent of requesting Model access for a core model: a deliberate, auditable opt-in that records your acceptance of the model's license. Subscription itself does not start a meter — billing begins when you deploy an endpoint (or, for serverless-capable models, when you invoke).
You choose an instance type — a GPU-backed accelerated-compute instance sized to the model — and an instance count, then deploy. Bedrock provisions a dedicated endpoint, pulls the model container and weights, and warms it; this takes minutes, not seconds, because real hardware is being allocated and a large model is being loaded. From this moment the endpoint bills per instance-hour until you delete it. Right-sizing the instance matters: too small and the model will not fit or will be slow; too large and you over-pay every hour. The model detail page lists the supported and recommended instances.
Once the endpoint is live, you call it through the Bedrock runtime API. For models compatible with the unified schema you can use the Converse API, so the same request/response shape and tooling (system prompts, multi-turn, tool use, streaming) you use for core models carries over; otherwise you use InvokeModel with the model's own request body. The key difference from a serverless call is that you are targeting your deployed endpoint rather than a shared serverless model ID — but the SDK, the auth, and the surrounding code look the same.
Authorize callers with IAM exactly as you would for any Bedrock model, scoping permissions to the specific endpoint where useful. Monitor utilization and latency in CloudWatch, and remember the cardinal rule of endpoint economics: an endpoint you are not using is still billing you. Delete endpoints you no longer need, and for intermittent workloads consider deploying on demand and tearing down between batches rather than leaving an endpoint idle. CloudTrail records the deploy, invoke, and delete actions for audit.
A serverless model that you stop calling costs $0. A marketplace endpoint that you stop calling keeps billing per instance-hour until you delete it. The most common marketplace cost mistake is leaving an endpoint running after a trial or between bursts of traffic. Treat endpoints as live infrastructure: size them deliberately, watch utilization, and tear them down when idle.
Marketplace billing is where the platform feels least like serverless and most like running infrastructure. The dominant cost is the instance-hour, the dominant lever is utilization, and the figures below are representative as of 2026 to show relative shape — always confirm current instance options and rates on the AWS pricing pages.
The headline number for an endpoint-deployed model is the per-instance-hour rate of the accelerated-compute instance it runs on, multiplied by the number of instances, multiplied by the hours the endpoint is up. Some models also carry a separate software/usage charge from the model provider layered on top of the underlying compute, surfaced through AWS Marketplace; the model detail page makes the all-in rate explicit before you deploy. Because the meter runs on wall-clock time rather than tokens, your effective cost per request is entirely a function of how busy you keep the endpoint.
That makes the cost comparison against a core serverless model a utilization question, not a sticker-price one. At high, steady throughput, a dedicated endpoint can be cheaper per request than per-token pricing — you are buying capacity wholesale and using all of it. At low or bursty volume, the same endpoint is far more expensive than serverless, because you pay full freight for the idle hours between requests. There is a crossover point for every workload, and finding it (rough traffic profile × endpoint hourly cost vs. expected tokens × per-token price) is the core of the marketplace cost decision.
A few practices keep endpoint bills sane. Prefer a serverless deployment option when the model offers one and your volume is not high and steady — you avoid endpoint economics entirely. Right-size the instance to the model so you are neither paying for unused accelerator nor throttling throughput. Tear down idle endpoints and, for intermittent jobs, deploy-then-delete around the work. And remember the dependent costs that surround any GenAI app — the vector store behind a Knowledge Base, data transfer, logging — apply here just as they do to core Bedrock.
| Cost component | How it is charged | Driven by | Lever to control it |
|---|---|---|---|
| Instance-hours (the main cost) | Per accelerated instance, per hour, while the endpoint runs | Instance type × count × uptime | Right-size the instance; tear down when idle; deploy-then-delete for bursts |
| Provider software / usage charge | Per-hour or per-request surcharge from the model provider (some models) | The specific model's licensing | Compare all-in rate on the model detail page before deploying |
| Serverless option (where offered) | Per 1,000 input / output tokens | Tokens processed | Prefer it for low / bursty volume — no idle cost |
| Dependent services | Standard AWS rates (vector store, data transfer, logging) | Surrounding architecture | Same optimizations as any Bedrock app |
The Marketplace widens what is possible, but a wider catalog is not a reason to leave the core serverless pool. The honest rule is narrow: choose a marketplace model only when a specialized or newly-released model clears a real quality bar your core options cannot — and your traffic can keep an endpoint busy.
Default to a core serverless model for the broad middle of GenAI work: general reasoning, chat, summarization, extraction, coding, and agents, especially at bursty or modest volume where per-token billing and zero idle cost are exactly what you want. The core catalog is deliberately strong across these jobs, and the operational simplicity of serverless is a real, ongoing advantage you give up the moment you take on an endpoint.
Reach into the Marketplace when one of a few specific conditions holds. Specialization: a vertical-domain model (clinical, legal, financial, scientific) or a specific multilingual, embedding, reranking, or vision model materially outperforms every core option on your task. Recency: a just-released research model you need is not yet in the serverless pool, and the Marketplace is the fastest compliant way to use it inside AWS. Control or portability: you need a particular open model you intend to standardize on, with dedicated capacity and predictable latency. In each case, the second test still applies — you need enough sustained throughput (or a tolerance for deploy-then-delete) to make per-hour billing rational.
When neither specialization nor recency nor control is in play, the Marketplace is usually the wrong tool, because you would be taking on endpoint economics and operational overhead to run a model a core serverless option already handles well. And if the marketplace model you want offers a serverless deployment option, prefer that — you get the wider catalog without the idle-cost downside. For the broader build-vs-buy framing across the platform, see Bedrock vs SageMaker; for the full serverless catalog you are comparing against, see the Bedrock model catalog.
| Your situation | Lean core serverless | Lean marketplace (endpoint) |
|---|---|---|
| General reasoning / chat / coding / agents | Yes — the core catalog is strong here | Rarely |
| Bursty, low, or unpredictable volume | Yes — zero idle cost | No — endpoints bill while idle |
| Need a vertical-domain or specialized model | Only if a core model clears the bar | Yes — this is the marketplace's reason to exist |
| Need a brand-new model not yet serverless | No (it is not there yet) | Yes — fastest compliant in-AWS access |
| High, steady throughput on one model | Viable, but endpoint may be cheaper per request | Yes — dedicated capacity can win on unit cost |
| Want least operational overhead | Yes — nothing to provision | No — you manage an endpoint |
| Model offers a serverless option | Use it — wider catalog, no idle cost | Only if you also need dedicated capacity |
The reason to reach a specialized model through Bedrock Marketplace rather than a third-party API is governance. A marketplace model runs inside your AWS account and Region and inherits the same controls as the core catalog — so widening the model set does not widen your data-exposure surface.
Your data stays in your account and Region. A marketplace model deployed to an endpoint runs on instances inside your AWS account, in the Region you choose, and your prompts and outputs are processed there rather than being shipped to an external provider API. That is the central security advantage over calling a specialized model's own hosted endpoint outside AWS: the data-residency and data-control boundary you rely on for core Bedrock applies to the wider catalog too.
It plugs into your existing controls. Access is governed by IAM, scoped where useful to the specific endpoint and actions a service needs. Endpoints can be reached privately via VPC networking so traffic need not traverse the public internet. Every deploy, invoke, and delete is recorded in CloudTrail, and you can layer Bedrock Guardrails on top for content filtering and PII handling the same way you would for a core model. See Bedrock Guardrails for that safety layer.
Mind the differences that do come with self-served models. Two honest caveats. First, licensing: each marketplace model carries its own provider end-user license terms that you accept at subscription — read them, because permitted uses vary across models. Second, shared responsibility for capacity and patching of the model layer: a dedicated endpoint is more your infrastructure than a shared serverless model is, so monitoring its utilization, sizing, and lifecycle is on you in a way that a fully-managed serverless model is not. None of this changes the data-governance boundary; it just means an endpoint is operationally closer to running a service than to calling an API.
The point of the Marketplace is to get a specialized model without trading away governance. Because the model runs in your account and Region, under IAM + CloudTrail, behind private VPC networking, and optionally wrapped in Guardrails, a regulated team can adopt a vertical-domain or emerging model inside its existing compliance posture — the same reason it chose Bedrock over a public model endpoint in the first place. The trade you accept is operational: an endpoint is infrastructure you size and manage, and it carries the model's own license.
The 100-plus models are not random; they cluster into categories that the curated serverless core deliberately leaves room for. These are representative examples of the kinds of models the Marketplace is for — the catalog evolves continuously, so treat them as categories to look for rather than a fixed list.
Across the categories below, the pattern is the same: each is a place where a specialized or newer model can beat a general-purpose core model on a specific job. Browse the Marketplace by these categories when the core catalog under-serves the exact capability you need.
Two of these categories sit next to capabilities elsewhere in Bedrock, and it is worth keeping them distinct. If you have your own weights (a model you trained or heavily fine-tuned outside Bedrock) and want to serve them through the Bedrock API, that is Custom Model Import, not the Marketplace — the Marketplace is for third-party and provider models you discover and deploy, whereas Custom Model Import is for bringing your own. And if you mostly need to specialize a core model on your data rather than adopt a different one, Bedrock fine-tuning is often the simpler path. The Marketplace is specifically the answer to "I need a different model than the core catalog offers, and I want it inside Bedrock."
Endpoints make the marketplace powerful and make its bill less forgiving than serverless. This is where the catalog and the funding story meet.
The Marketplace inverts serverless cost in one important way: idle is expensive. A trial that left a large GPU endpoint running over a weekend, or a production endpoint sized for peak but mostly idle, can turn a modest experiment into a meaningful bill — precisely because you pay per instance-hour regardless of traffic. The levers throughout this page keep it sane: prefer a serverless deployment option when one exists, right-size the instance, tear down idle endpoints, and deploy-then-delete around intermittent work.
The other lever is funding the bill with someone else's money — specifically AWS's. AWS runs credit programs designed precisely for teams building generative AI on Bedrock, and they apply to marketplace endpoints and serverless calls alike: Activate Portfolio (up to $100K) for institutionally-funded startups, dedicated Bedrock / GenAI proof-of-concept funding ($10K–$50K) for a defined GenAI build, and the competitive Generative AI Accelerator (up to $1M) for AI-first companies. These pools are largely partner-filed and effectively invisible on the public Activate page.
This is exactly what CloudRoute does: we route you to a vetted AWS partner who files the credit application and, if you need hands, who can build the Bedrock workload with you — including choosing between a core serverless model and a marketplace endpoint, right-sizing the instance, and wiring the deploy-and-invoke flow above. Because AWS funds both the credits and the partner engagement, you pay $0. See AWS credits for generative-AI startups, AWS PoC / Bedrock POC funding, and $100K AWS credits.
The fastest way to use the Marketplace well is to map your job onto the two serving paths and pick the one whose economics and catalog fit. Cost shorthand: serverless is per-token with zero idle cost; an endpoint is per instance-hour whether busy or not. Exact rates live on the AWS pricing pages.
| Path | What you get | Billing | Reach for it when | Avoid it when |
|---|---|---|---|---|
| Core serverless model | Curated core providers, fully managed, nothing to provision | Per 1,000 input / output tokens | General reasoning, chat, coding, agents; bursty or modest volume; least overhead | A specialized model genuinely beats every core option on your task |
| Marketplace — serverless option | A wider-catalog model that also offers pay-per-token | Per 1,000 input / output tokens | You need a marketplace model but volume is low / bursty — best of both | The model only runs as an endpoint (then weigh utilization) |
| Marketplace — managed endpoint | 100+ specialized & emerging models, dedicated capacity | Per instance-hour while running | Specialized / vertical / brand-new model at high steady throughput | Bursty, low volume, or you want zero idle cost |
| Custom model import | Serve your own weights through the Bedrock API | Per instance-hour (capacity-based) | You bring a model you trained or heavily fine-tuned yourself | You just need a different third-party model (use the Marketplace) |
Situation: The team had benchmarked core serverless models and found a clinical-domain language model from the Marketplace meaningfully more accurate on their medical-terminology extraction — but they had never run a managed endpoint, had no GPU experience, and were rightly worried about (a) HIPAA-grade data residency if they used a third-party clinical-model API outside AWS, and (b) a runaway bill from leaving a GPU endpoint idle. They also needed retrieval over a large corpus of guidelines and had no ML infrastructure or budget for it.
What CloudRoute did: Routed within 19 hours to a US-East AWS partner with a healthcare + Bedrock track record. The partner kept the architecture inside Bedrock: the clinical Marketplace model deployed to a right-sized managed endpoint in the team's account for the specialized extraction, a core serverless model (via the Converse API) for general summarization and routing so only the extraction calls hit the per-hour endpoint, Titan embeddings behind a Knowledge Base for guideline retrieval, and a Guardrail for PII handling — all data resident in-Region, nothing sent to an external model API. They set the extraction endpoint to deploy-then-delete around nightly batch runs to avoid idle-hour cost, and used model evaluation to confirm the domain model's edge on real notes. In parallel the partner filed a Bedrock/GenAI proof-of-concept credit application and an Activate Portfolio application.
Outcome: GenAI POC credits ($25K) approved in under 2 weeks, Portfolio ($100K) shortly after — the first ~6 months of inference, including the endpoint hours, were fully credit-funded. Routing only the specialized extraction to the endpoint (and tearing it down between batches) kept endpoint spend to a fraction of a naive always-on deployment, and the product shipped in 6 weeks with all data resident in-Region. CloudRoute's commission was paid by the partner from AWS engagement funding; the customer paid $0.
time-to-match: < 24h · credits secured: $125K · specialized model in-account & in-Region · endpoint run only for nightly batches · cost to customer: $0
CloudRoute routes you to a vetted AWS partner who files your Bedrock/GenAI credit application (Activate Portfolio up to $100K, GenAI POC $10K–$50K, GenAI Accelerator up to $1M) and, if you need hands, decides between a core serverless model and a marketplace endpoint, right-sizes the instance, and wires the deploy-and-invoke flow with you. AWS funds the credits and the engagement. You pay $0.