amazon bedrock marketplace · the 2026 reference

Amazon Bedrock Marketplace — 100+ specialized models, one Bedrock API.

Beyond the core serverless providers, Bedrock Marketplace puts 100-plus specialized and emerging foundation models — domain-tuned, multilingual, embedding, vision, and frontier-research models — behind the same Bedrock API and tooling. The catch and the point: most are served from a managed endpoint you deploy onto an instance, not from the pay-per-token serverless pool. This is the full reference: serverless vs. marketplace, how discovery works, the deploy-and-invoke flow, endpoint/instance billing, when a marketplace model beats a core model, and the security and governance that carry across.

models in the marketplace
100+
serving mode
managed endpoint
billing basis
per instance-hour
same API & governance
Bedrock
TL;DR
  • Amazon Bedrock Marketplace is a catalog of 100-plus specialized and emerging foundation models that you reach through Bedrock — the same console, the same IAM and CloudTrail governance, the same InvokeModel/Converse surface for compatible models — but that sit beyond the core serverless providers. It is how you get a domain-tuned, multilingual, vision, embedding, or frontier-research model that is not in the always-on serverless pool.
  • The defining difference is the serving and billing model. Core Bedrock serverless models are pay-per-token with nothing to provision. Most marketplace models are deployed to a managed endpoint on an instance (or fleet of instances) that you choose, and you pay per instance-hour for as long as that endpoint is running — closer to SageMaker economics than to serverless tokens. A small subset of marketplace models is also offered serverless.
  • Reach for a marketplace model when a specialized or newly-released model genuinely outperforms a core model on your task — a vertical-domain model, a specific multilingual or embedding model, a brand-new research release — and you have steady enough traffic to keep an endpoint busy. Default to a core serverless model for general reasoning, bursty or low volume, and the least operational overhead. Endpoints bill while idle, so GenAI costs scale fast; CloudRoute routes you to AWS credits (Activate Portfolio up to $100K, Bedrock/GenAI POC $10K–$50K, GenAI Accelerator up to $1M) and vetted partners to build it — you pay $0.
the core idea

IWhat Amazon Bedrock Marketplace is — and the problem it solves

Amazon Bedrock Marketplace is a catalog inside Bedrock that gives you access to over 100 specialized and emerging foundation models — beyond the handful of core providers offered as always-on serverless endpoints — and lets you deploy and call them through the same Bedrock API, console, and security controls you already use.

The core Bedrock catalog is broad but curated: a set of provider families (Anthropic Claude, Meta Llama, Mistral, Amazon Nova and Titan, Cohere, Stability AI, AI21, DeepSeek) served as fully-managed, pay-per-token serverless endpoints. That covers the large majority of production needs. But it is not the whole frontier. There are hundreds of high-quality models in the world — vertical-domain models trained on medical, legal, financial, or scientific corpora; strong multilingual models for languages the core set under-serves; specialized embedding and reranking models; vision and document-understanding models; and brand-new research releases that appear weeks or months before (or instead of) landing in the serverless pool. The Marketplace exists so you can use those models without leaving Bedrock.

The problem it solves is the one that used to force teams off-platform. Before Bedrock Marketplace, reaching a specialized model that AWS did not offer serverless meant one of two awkward paths: stand the model up yourself on raw GPU instances and own the entire serving stack (autoscaling, containers, drivers, the perpetual scarcity of accelerators), or send your data to a third-party model API outside AWS and lose your data-governance boundary. Marketplace collapses both. AWS handles the container, the model artifact, and the deployment plumbing; you choose an instance type and click deploy. And because the model runs inside your AWS account and Region, your prompts and outputs stay within your governance boundary — governed by IAM, logged to CloudTrail, the same as any core Bedrock model.

The mental model that matters: Bedrock Marketplace widens the catalog; it does not change the front door. Discovery happens in the Bedrock console, access is governed by Bedrock and IAM, and compatible models answer through the same InvokeModel (and, where supported, Converse) APIs. What changes is underneath — how the model is served and how you pay for it. Put precisely: the core serverless catalog is "models AWS runs for everyone, billed per token"; the Marketplace is "100-plus more models AWS makes deployable into your account, billed per instance-hour." Both are Bedrock; you move between them by choosing which kind of model your task warrants.

the one-sentence definition

Amazon Bedrock Marketplace = a catalog of 100+ specialized and emerging foundation models you discover in the Bedrock console and deploy to a managed endpoint in your own account, reachable through the same Bedrock APIs and the same IAM + CloudTrail governance as core serverless models — but billed per instance-hour for as long as the endpoint runs, not per token.

the distinction that matters most

IIServerless vs. marketplace — deploy-to-an-endpoint vs. pay-per-token

Almost every decision about the Marketplace comes back to one structural difference: a core serverless model is an always-on shared endpoint you pay for by the token, while a marketplace model is usually a dedicated endpoint you deploy onto an instance and pay for by the hour. Internalize this and the rest of the page follows.

With a core serverless model, there is nothing to provision. AWS runs the inference fleet; you call the model by its model ID and are billed per 1,000 input and output tokens. Capacity is elastic and shared, idle costs nothing, and a model you do not call costs you exactly zero. This is the model behind most of the core Bedrock catalog and the reason "serverless" is the default mental picture of Bedrock.

With a marketplace model served via a managed endpoint, you make a deployment decision first. You pick the model, choose an instance type (a GPU-backed accelerated-compute instance sized to the model) and an instance count, and Bedrock provisions a dedicated endpoint that loads the model and stays running until you delete it. You then invoke that endpoint through the Bedrock API. The endpoint gives you dedicated, predictable capacity — but it bills per instance-hour the entire time it is up, whether it is serving one request a minute or none at all. The economics resemble Amazon SageMaker real-time endpoints more than serverless tokens, which is the single most important thing to understand before you deploy one.

A useful nuance: the line is not perfectly binary. A subset of Marketplace models is also offered in a serverless (pay-per-token) mode, and AWS keeps broadening which models can run serverless. So the first thing to check on any marketplace model is whether it runs serverless in your Region — if it does, you may get the wider catalog without taking on endpoint economics at all. If it does not, you are trading per-token simplicity for access to a model the serverless pool lacks, and you accept the per-hour, capacity-managed billing that comes with it.

The practical consequence shows up in utilization. A per-token model is efficient at any volume, including spiky and near-zero. A per-hour endpoint is only efficient when kept reasonably busy: at high steady throughput it can be cheaper per request than tokens, but at low or bursty volume you pay for idle capacity. That is why "do I have enough sustained traffic to justify an endpoint?" is as central to the marketplace decision as "is this model better for my task?"

core serverless model vs. marketplace (endpoint) model · representative as of 2026 — verify in the Bedrock console
DimensionCore serverless modelMarketplace model (managed endpoint)
Catalog scopeCurated core providers (Claude, Llama, Mistral, Nova, Titan, Cohere, Stability, AI21, DeepSeek)100+ additional specialized & emerging models
What you provisionNothing — shared, elastic fleetA dedicated endpoint on an instance type you choose
Billing basisPer 1,000 input / output tokensPer instance-hour while the endpoint is running
Idle costZero — uncalled model costs nothingNon-zero — the endpoint bills even when idle
Scales well atAny volume, including bursty / near-zeroHigh, steady throughput that keeps the endpoint busy
Time to first callSeconds (after Model access)Minutes (endpoint must deploy and warm)
Closest analoguePay-per-use APIA SageMaker real-time endpoint
API surfaceInvokeModel / Converse (where supported)InvokeModel against the endpoint; Converse for compatible models
A subset of marketplace models is also available serverless (pay-per-token), and AWS keeps expanding which models can run serverless — always check the model's available deployment options and Region in the Bedrock console first. Instance availability and per-hour rates vary by Region; confirm current options and prices on the AWS pricing pages.
finding the right model

IIIDiscovery — how the catalog is organized and how to evaluate a model

A hundred-plus models is only useful if you can find the right one. Discovery happens in the Bedrock console, where marketplace models are presented alongside the core catalog and filterable by the attributes that actually narrow a shortlist — provider, modality, task, and deployment option.

In the Bedrock console, the model catalog surfaces both the core providers and the Marketplace in one place, so you browse the wider set the same way you browse the core one. You filter by provider, by modality (text, vision, embeddings, image), by task or domain, and — critically — by deployment option (serverless vs. deploy-to-endpoint), so you can immediately see whether a candidate can run pay-per-token or will require an instance. Each model has a detail page with its description, the provider, supported instance types, applicable end-user license terms, and the deployment paths available to you.

Two filters do most of the work. The first is deployment option: if a model offers serverless, you can trial it for the cost of a few tokens before committing to anything; if it is endpoint-only, your evaluation has a per-hour cost the moment you deploy, so you plan the trial accordingly. The second is modality and task: a marketplace model usually earns its place by being specialized, so you are typically searching for a specific capability — a clinical or legal language model, a particular multilingual or code model, a strong reranker, a document-vision model — rather than a general-purpose chat model the core catalog already covers well.

Evaluation should be empirical, not brochure-driven. The same discipline that applies to choosing among core models applies doubly here: a specialized model's advantage is real only on the tasks it was specialized for, so confirm it on your data. Run a candidate against a representative slice of your real traffic and compare it head-to-head with the best core serverless model for the same job, weighing quality, latency, and — because an endpoint bills per hour — total cost at your expected utilization, not just per-request cost. Bedrock's model-evaluation tooling helps make that comparison structured rather than anecdotal.

  • Filter by deployment option first — Serverless-capable models can be trialed for pennies; endpoint-only models cost per hour from the moment you deploy. Knowing which you are dealing with shapes the entire evaluation plan.
  • Search by capability, not brand — Marketplace models earn their place by specialization — a vertical-domain, multilingual, embedding, reranking, or vision model. Start from the exact capability the core catalog under-serves.
  • Check supported instance types and license terms — The model detail page lists the instances it can run on (which sets your per-hour cost) and any provider end-user license terms you must accept before deploying.
  • Benchmark against the best core model for the job — A marketplace model only makes sense if it beats the strongest core serverless option on your task by enough to justify endpoint economics. Always compare, never assume.
  • Evaluate on your own representative traffic — Use Bedrock model evaluation on a real slice of your data, scoring quality, latency, and total cost at your expected utilization — not vendor benchmarks.
zero to a first completion

IVThe deploy-and-invoke flow — from catalog to a managed endpoint

Using a marketplace model that requires an endpoint is a short, repeatable path: subscribe, deploy to an endpoint on an instance you choose, invoke it through the Bedrock API, then manage (and eventually delete) the endpoint so it stops billing. Here is the flow end to end.

The whole flow is designed so the only genuinely new step versus serverless is provisioning the endpoint; access, authorization, and invocation reuse the Bedrock mechanics you already know.

Step 1 — Subscribe and accept terms

From the model's detail page in the Bedrock console, you subscribe to the model and accept any provider end-user license terms. This is the equivalent of requesting Model access for a core model: a deliberate, auditable opt-in that records your acceptance of the model's license. Subscription itself does not start a meter — billing begins when you deploy an endpoint (or, for serverless-capable models, when you invoke).

Step 2 — Deploy to a managed endpoint

You choose an instance type — a GPU-backed accelerated-compute instance sized to the model — and an instance count, then deploy. Bedrock provisions a dedicated endpoint, pulls the model container and weights, and warms it; this takes minutes, not seconds, because real hardware is being allocated and a large model is being loaded. From this moment the endpoint bills per instance-hour until you delete it. Right-sizing the instance matters: too small and the model will not fit or will be slow; too large and you over-pay every hour. The model detail page lists the supported and recommended instances.

Step 3 — Invoke through the Bedrock API

Once the endpoint is live, you call it through the Bedrock runtime API. For models compatible with the unified schema you can use the Converse API, so the same request/response shape and tooling (system prompts, multi-turn, tool use, streaming) you use for core models carries over; otherwise you use InvokeModel with the model's own request body. The key difference from a serverless call is that you are targeting your deployed endpoint rather than a shared serverless model ID — but the SDK, the auth, and the surrounding code look the same.

Step 4 — Authorize, monitor, and tear down

Authorize callers with IAM exactly as you would for any Bedrock model, scoping permissions to the specific endpoint where useful. Monitor utilization and latency in CloudWatch, and remember the cardinal rule of endpoint economics: an endpoint you are not using is still billing you. Delete endpoints you no longer need, and for intermittent workloads consider deploying on demand and tearing down between batches rather than leaving an endpoint idle. CloudTrail records the deploy, invoke, and delete actions for audit.

the rule that prevents surprise bills

A serverless model that you stop calling costs $0. A marketplace endpoint that you stop calling keeps billing per instance-hour until you delete it. The most common marketplace cost mistake is leaving an endpoint running after a trial or between bursts of traffic. Treat endpoints as live infrastructure: size them deliberately, watch utilization, and tear them down when idle.

how you actually pay

VBilling — endpoint and instance-based economics

Marketplace billing is where the platform feels least like serverless and most like running infrastructure. The dominant cost is the instance-hour, the dominant lever is utilization, and the figures below are representative as of 2026 to show relative shape — always confirm current instance options and rates on the AWS pricing pages.

The headline number for an endpoint-deployed model is the per-instance-hour rate of the accelerated-compute instance it runs on, multiplied by the number of instances, multiplied by the hours the endpoint is up. Some models also carry a separate software/usage charge from the model provider layered on top of the underlying compute, surfaced through AWS Marketplace; the model detail page makes the all-in rate explicit before you deploy. Because the meter runs on wall-clock time rather than tokens, your effective cost per request is entirely a function of how busy you keep the endpoint.

That makes the cost comparison against a core serverless model a utilization question, not a sticker-price one. At high, steady throughput, a dedicated endpoint can be cheaper per request than per-token pricing — you are buying capacity wholesale and using all of it. At low or bursty volume, the same endpoint is far more expensive than serverless, because you pay full freight for the idle hours between requests. There is a crossover point for every workload, and finding it (rough traffic profile × endpoint hourly cost vs. expected tokens × per-token price) is the core of the marketplace cost decision.

A few practices keep endpoint bills sane. Prefer a serverless deployment option when the model offers one and your volume is not high and steady — you avoid endpoint economics entirely. Right-size the instance to the model so you are neither paying for unused accelerator nor throttling throughput. Tear down idle endpoints and, for intermittent jobs, deploy-then-delete around the work. And remember the dependent costs that surround any GenAI app — the vector store behind a Knowledge Base, data transfer, logging — apply here just as they do to core Bedrock.

marketplace endpoint billing shape · illustrative 2026 ranges — verify live instance options and rates on the AWS pricing pages
Cost componentHow it is chargedDriven byLever to control it
Instance-hours (the main cost)Per accelerated instance, per hour, while the endpoint runsInstance type × count × uptimeRight-size the instance; tear down when idle; deploy-then-delete for bursts
Provider software / usage chargePer-hour or per-request surcharge from the model provider (some models)The specific model's licensingCompare all-in rate on the model detail page before deploying
Serverless option (where offered)Per 1,000 input / output tokensTokens processedPrefer it for low / bursty volume — no idle cost
Dependent servicesStandard AWS rates (vector store, data transfer, logging)Surrounding architectureSame optimizations as any Bedrock app
Endpoint pricing is per instance-hour, not per token, so cost is dominated by how busy you keep the endpoint. Instance availability, recommended sizes, and rates vary by Region and change over time — confirm current options and prices in the Bedrock console and on the AWS pricing pages. Where a model offers a serverless option, that path bills per token with zero idle cost.
the buy decision

VIWhen a marketplace model beats a core Bedrock model

The Marketplace widens what is possible, but a wider catalog is not a reason to leave the core serverless pool. The honest rule is narrow: choose a marketplace model only when a specialized or newly-released model clears a real quality bar your core options cannot — and your traffic can keep an endpoint busy.

Default to a core serverless model for the broad middle of GenAI work: general reasoning, chat, summarization, extraction, coding, and agents, especially at bursty or modest volume where per-token billing and zero idle cost are exactly what you want. The core catalog is deliberately strong across these jobs, and the operational simplicity of serverless is a real, ongoing advantage you give up the moment you take on an endpoint.

Reach into the Marketplace when one of a few specific conditions holds. Specialization: a vertical-domain model (clinical, legal, financial, scientific) or a specific multilingual, embedding, reranking, or vision model materially outperforms every core option on your task. Recency: a just-released research model you need is not yet in the serverless pool, and the Marketplace is the fastest compliant way to use it inside AWS. Control or portability: you need a particular open model you intend to standardize on, with dedicated capacity and predictable latency. In each case, the second test still applies — you need enough sustained throughput (or a tolerance for deploy-then-delete) to make per-hour billing rational.

When neither specialization nor recency nor control is in play, the Marketplace is usually the wrong tool, because you would be taking on endpoint economics and operational overhead to run a model a core serverless option already handles well. And if the marketplace model you want offers a serverless deployment option, prefer that — you get the wider catalog without the idle-cost downside. For the broader build-vs-buy framing across the platform, see Bedrock vs SageMaker; for the full serverless catalog you are comparing against, see the Bedrock model catalog.

decision guide · core serverless model vs. bedrock marketplace model
Your situationLean core serverlessLean marketplace (endpoint)
General reasoning / chat / coding / agentsYes — the core catalog is strong hereRarely
Bursty, low, or unpredictable volumeYes — zero idle costNo — endpoints bill while idle
Need a vertical-domain or specialized modelOnly if a core model clears the barYes — this is the marketplace's reason to exist
Need a brand-new model not yet serverlessNo (it is not there yet)Yes — fastest compliant in-AWS access
High, steady throughput on one modelViable, but endpoint may be cheaper per requestYes — dedicated capacity can win on unit cost
Want least operational overheadYes — nothing to provisionNo — you manage an endpoint
Model offers a serverless optionUse it — wider catalog, no idle costOnly if you also need dedicated capacity
Two tests gate every marketplace choice: (1) does a specialized or newly-released model beat the best core serverless option on your task, and (2) can your traffic keep an endpoint busy (or tolerate deploy-then-delete)? If both are not "yes," default to core serverless. Prefer a model's serverless deployment option whenever it offers one.
what carries across

VIISecurity and governance — the Bedrock boundary still holds

The reason to reach a specialized model through Bedrock Marketplace rather than a third-party API is governance. A marketplace model runs inside your AWS account and Region and inherits the same controls as the core catalog — so widening the model set does not widen your data-exposure surface.

Your data stays in your account and Region. A marketplace model deployed to an endpoint runs on instances inside your AWS account, in the Region you choose, and your prompts and outputs are processed there rather than being shipped to an external provider API. That is the central security advantage over calling a specialized model's own hosted endpoint outside AWS: the data-residency and data-control boundary you rely on for core Bedrock applies to the wider catalog too.

It plugs into your existing controls. Access is governed by IAM, scoped where useful to the specific endpoint and actions a service needs. Endpoints can be reached privately via VPC networking so traffic need not traverse the public internet. Every deploy, invoke, and delete is recorded in CloudTrail, and you can layer Bedrock Guardrails on top for content filtering and PII handling the same way you would for a core model. See Bedrock Guardrails for that safety layer.

Mind the differences that do come with self-served models. Two honest caveats. First, licensing: each marketplace model carries its own provider end-user license terms that you accept at subscription — read them, because permitted uses vary across models. Second, shared responsibility for capacity and patching of the model layer: a dedicated endpoint is more your infrastructure than a shared serverless model is, so monitoring its utilization, sizing, and lifecycle is on you in a way that a fully-managed serverless model is not. None of this changes the data-governance boundary; it just means an endpoint is operationally closer to running a service than to calling an API.

why this matters for the buy decision

The point of the Marketplace is to get a specialized model without trading away governance. Because the model runs in your account and Region, under IAM + CloudTrail, behind private VPC networking, and optionally wrapped in Guardrails, a regulated team can adopt a vertical-domain or emerging model inside its existing compliance posture — the same reason it chose Bedrock over a public model endpoint in the first place. The trade you accept is operational: an endpoint is infrastructure you size and manage, and it carries the model's own license.

what you actually find there

VIIIModel categories — examples of what the marketplace adds

The 100-plus models are not random; they cluster into categories that the curated serverless core deliberately leaves room for. These are representative examples of the kinds of models the Marketplace is for — the catalog evolves continuously, so treat them as categories to look for rather than a fixed list.

Across the categories below, the pattern is the same: each is a place where a specialized or newer model can beat a general-purpose core model on a specific job. Browse the Marketplace by these categories when the core catalog under-serves the exact capability you need.

  • Vertical-domain language models — Models pre-trained or tuned on a specific field — clinical and biomedical, legal, financial, or scientific text — that can outperform a general model on in-domain terminology, reasoning, and formatting. The classic reason to leave the serverless core.
  • Multilingual & region-specific models — Models with deep coverage of languages or scripts the core set under-serves, useful when you operate in markets a general-purpose model handles only adequately.
  • Specialized embedding & reranking models — Task- or domain-specific embedding models and rerankers that can lift retrieval quality in a RAG pipeline beyond a general embedding model — the quiet workhorses of search-shaped systems.
  • Code & developer-tooling models — Models specialized for code generation, completion, review, or a particular language ecosystem, where a code-tuned model can beat a general chat model on developer tasks.
  • Vision & document-understanding models — Image-understanding, OCR-adjacent, and document-layout models for extracting structure from scans, forms, and complex documents.
  • Frontier & emerging research models — Newly-released models — including strong open-weight reasoning models — that you want to evaluate or run inside AWS before, or instead of, they appear in the serverless pool.
  • Open-weight models you intend to standardize on — Specific open models you want to run with dedicated capacity, predictable latency, and a clear license, rather than depending on a shared serverless endpoint. For deeper customization, contrast with custom model import and fine-tuning.

Two of these categories sit next to capabilities elsewhere in Bedrock, and it is worth keeping them distinct. If you have your own weights (a model you trained or heavily fine-tuned outside Bedrock) and want to serve them through the Bedrock API, that is Custom Model Import, not the Marketplace — the Marketplace is for third-party and provider models you discover and deploy, whereas Custom Model Import is for bringing your own. And if you mostly need to specialize a core model on your data rather than adopt a different one, Bedrock fine-tuning is often the simpler path. The Marketplace is specifically the answer to "I need a different model than the core catalog offers, and I want it inside Bedrock."

funding the build

IXThe cost reality — and how AWS credits fund your Bedrock build

Endpoints make the marketplace powerful and make its bill less forgiving than serverless. This is where the catalog and the funding story meet.

The Marketplace inverts serverless cost in one important way: idle is expensive. A trial that left a large GPU endpoint running over a weekend, or a production endpoint sized for peak but mostly idle, can turn a modest experiment into a meaningful bill — precisely because you pay per instance-hour regardless of traffic. The levers throughout this page keep it sane: prefer a serverless deployment option when one exists, right-size the instance, tear down idle endpoints, and deploy-then-delete around intermittent work.

The other lever is funding the bill with someone else's money — specifically AWS's. AWS runs credit programs designed precisely for teams building generative AI on Bedrock, and they apply to marketplace endpoints and serverless calls alike: Activate Portfolio (up to $100K) for institutionally-funded startups, dedicated Bedrock / GenAI proof-of-concept funding ($10K–$50K) for a defined GenAI build, and the competitive Generative AI Accelerator (up to $1M) for AI-first companies. These pools are largely partner-filed and effectively invisible on the public Activate page.

This is exactly what CloudRoute does: we route you to a vetted AWS partner who files the credit application and, if you need hands, who can build the Bedrock workload with you — including choosing between a core serverless model and a marketplace endpoint, right-sizing the instance, and wiring the deploy-and-invoke flow above. Because AWS funds both the credits and the partner engagement, you pay $0. See AWS credits for generative-AI startups, AWS PoC / Bedrock POC funding, and $100K AWS credits.

pick the right path

Serverless core vs. marketplace endpoint — which path for which job

The fastest way to use the Marketplace well is to map your job onto the two serving paths and pick the one whose economics and catalog fit. Cost shorthand: serverless is per-token with zero idle cost; an endpoint is per instance-hour whether busy or not. Exact rates live on the AWS pricing pages.

PathWhat you getBillingReach for it whenAvoid it when
Core serverless modelCurated core providers, fully managed, nothing to provisionPer 1,000 input / output tokensGeneral reasoning, chat, coding, agents; bursty or modest volume; least overheadA specialized model genuinely beats every core option on your task
Marketplace — serverless optionA wider-catalog model that also offers pay-per-tokenPer 1,000 input / output tokensYou need a marketplace model but volume is low / bursty — best of bothThe model only runs as an endpoint (then weigh utilization)
Marketplace — managed endpoint100+ specialized & emerging models, dedicated capacityPer instance-hour while runningSpecialized / vertical / brand-new model at high steady throughputBursty, low volume, or you want zero idle cost
Custom model importServe your own weights through the Bedrock APIPer instance-hour (capacity-based)You bring a model you trained or heavily fine-tuned yourselfYou just need a different third-party model (use the Marketplace)
Two tests gate the endpoint paths: does the model beat the best core serverless option on your task, and can your traffic keep an endpoint busy (or tolerate deploy-then-delete)? Prefer a serverless deployment option whenever one exists. Custom Model Import is for your own weights; the Marketplace is for third-party and provider models. Run a Bedrock model evaluation on your own data before committing to any path.
building on bedrock?
Get AWS credits to fund your Bedrock workload — and a vetted partner to choose serverless vs. a marketplace endpoint and wire it. You pay $0.
Get matched in 24h →
a recent match

A specialized-model Bedrock build, funded by AWS credits — anonymized

inquiry · seed-stage healthcare-NLP startup, US
Seed-stage B2B healthcare SaaS, 12 people, extracting structured findings from clinical notes; HIPAA-conscious; net-new to AWS

Situation: The team had benchmarked core serverless models and found a clinical-domain language model from the Marketplace meaningfully more accurate on their medical-terminology extraction — but they had never run a managed endpoint, had no GPU experience, and were rightly worried about (a) HIPAA-grade data residency if they used a third-party clinical-model API outside AWS, and (b) a runaway bill from leaving a GPU endpoint idle. They also needed retrieval over a large corpus of guidelines and had no ML infrastructure or budget for it.

What CloudRoute did: Routed within 19 hours to a US-East AWS partner with a healthcare + Bedrock track record. The partner kept the architecture inside Bedrock: the clinical Marketplace model deployed to a right-sized managed endpoint in the team's account for the specialized extraction, a core serverless model (via the Converse API) for general summarization and routing so only the extraction calls hit the per-hour endpoint, Titan embeddings behind a Knowledge Base for guideline retrieval, and a Guardrail for PII handling — all data resident in-Region, nothing sent to an external model API. They set the extraction endpoint to deploy-then-delete around nightly batch runs to avoid idle-hour cost, and used model evaluation to confirm the domain model's edge on real notes. In parallel the partner filed a Bedrock/GenAI proof-of-concept credit application and an Activate Portfolio application.

Outcome: GenAI POC credits ($25K) approved in under 2 weeks, Portfolio ($100K) shortly after — the first ~6 months of inference, including the endpoint hours, were fully credit-funded. Routing only the specialized extraction to the endpoint (and tearing it down between batches) kept endpoint spend to a fraction of a naive always-on deployment, and the product shipped in 6 weeks with all data resident in-Region. CloudRoute's commission was paid by the partner from AWS engagement funding; the customer paid $0.

time-to-match: < 24h · credits secured: $125K · specialized model in-account & in-Region · endpoint run only for nightly batches · cost to customer: $0

faq

Common questions

What is Amazon Bedrock Marketplace?
Amazon Bedrock Marketplace is a catalog of 100-plus specialized and emerging foundation models — beyond the core serverless providers — that you discover in the Bedrock console and reach through the same Bedrock APIs and the same IAM + CloudTrail governance. Most marketplace models are deployed to a managed endpoint that runs on an instance in your own AWS account and bills per instance-hour, rather than being served pay-per-token like the core serverless catalog. A subset is also available serverless.
How is Bedrock Marketplace different from the core Bedrock models?
Two ways: catalog and serving. The core catalog is a curated set of providers (Claude, Llama, Mistral, Nova, Titan, Cohere, Stability, AI21, DeepSeek) served as always-on, pay-per-token serverless endpoints with nothing to provision and zero idle cost. The Marketplace adds 100-plus more specialized and emerging models, most of which you deploy to a dedicated managed endpoint on an instance you choose and pay for per instance-hour for as long as it runs. Same Bedrock front door and governance; different serving and billing underneath.
How are Bedrock Marketplace models priced?
For models served via a managed endpoint, you pay per instance-hour for the accelerated-compute instance(s) the endpoint runs on, for the entire time the endpoint is up — busy or idle. Some models add a provider software/usage charge on top, shown on the model detail page. Models that also offer a serverless option bill per 1,000 input/output tokens with no idle cost. Because endpoint cost is wall-clock-based, your effective cost per request depends entirely on utilization. Instance options and rates vary by Region — confirm current figures on the AWS pricing pages.
When should I use a marketplace model instead of a core Bedrock model?
Use a marketplace model only when a specialized or newly-released model clears a quality bar your core serverless options cannot — a vertical-domain (clinical, legal, financial, scientific) model, a specific multilingual, embedding, reranking, or vision model, or a brand-new research release not yet in the serverless pool — and your traffic can keep an endpoint reasonably busy (or you can deploy-then-delete around batches). For general reasoning, chat, coding, and agents at bursty or modest volume, default to a core serverless model. If the marketplace model offers a serverless option, prefer that.
Do I have to manage infrastructure to use a marketplace model?
More than with serverless, but far less than rolling your own. AWS handles the container, the model artifact, and the deployment plumbing — you choose an instance type and count and click deploy, and Bedrock provisions and warms a dedicated endpoint in minutes. From there it behaves like a SageMaker real-time endpoint: you monitor utilization and latency, right-size it, and delete it when idle. The key operational rule is that an endpoint bills per instance-hour until you delete it, so idle endpoints cost real money.
Is my data private when I use a Bedrock Marketplace model?
Yes. A marketplace model deployed to an endpoint runs on instances inside your own AWS account and Region, so your prompts and outputs are processed there rather than being sent to an external provider API — that is the core advantage over calling a specialized model's own hosted endpoint outside AWS. Access is governed by IAM, endpoints can be reached privately over your VPC, every action is logged to CloudTrail, and you can wrap the model in Bedrock Guardrails. Each model also carries its own provider license terms you accept at subscription.
Can I call a marketplace model with the same Converse API as core models?
For models compatible with the unified schema, yes — you can use the Converse API, so the same request/response shape and tooling (system prompts, multi-turn, tool use, streaming) carry over, with the main difference being that you target your deployed endpoint. Models that are not Converse-compatible use the lower-level InvokeModel API with the model's own request body. Check the model's detail page in the Bedrock console for its supported APIs and deployment options.
What is the difference between Bedrock Marketplace and Custom Model Import?
The Marketplace is for discovering and deploying third-party and provider models that AWS makes available — you pick from the catalog and deploy to an endpoint. Custom Model Import is for bringing your own weights — a model you trained or heavily fine-tuned outside Bedrock — and serving them through the Bedrock API. Use the Marketplace when you need a different existing model; use Custom Model Import when you already have the model artifact. If you mainly need to specialize a core model on your data, Bedrock fine-tuning is often the simpler path.
How can I afford to run a marketplace endpoint in production?
Two ways. First, control the bill: prefer a serverless deployment option when the model offers one, right-size the instance, tear down idle endpoints, and deploy-then-delete around intermittent work so you are not paying for idle hours. Second, fund it with AWS credits — Activate Portfolio (up to $100K), Bedrock/GenAI POC ($10K–$50K), and the GenAI Accelerator (up to $1M) apply to marketplace endpoints and serverless calls alike. CloudRoute routes you to a vetted AWS partner who files the credit application and can build the workload — choosing serverless vs. endpoint and right-sizing it — so you pay $0.

Use the right model — core or marketplace — and let AWS credits pay for it.

CloudRoute routes you to a vetted AWS partner who files your Bedrock/GenAI credit application (Activate Portfolio up to $100K, GenAI POC $10K–$50K, GenAI Accelerator up to $1M) and, if you need hands, decides between a core serverless model and a marketplace endpoint, right-sizes the instance, and wires the deploy-and-invoke flow with you. AWS funds the credits and the engagement. You pay $0.

matched within< 24h
GenAI credit ceilingup to $1M
cost to you$0
Amazon Bedrock Marketplace — 100+ models, endpoints & cost (2026) · CloudRoute