for AWS partners →Run Bedrock on AWS credits →

bedrock vs openai cost · 2026

Bedrock vs OpenAI cost — the real per-token math (2026).

A cost-deep, neutral comparison of running generative AI on Amazon Bedrock versus the OpenAI API. We map comparable models (Claude Sonnet/Haiku and Amazon Nova against GPT tiers), put representative per-1M-token rates in side-by-side tables, work three total-cost scenarios end to end — a support chatbot, a RAG knowledge assistant, and an overnight batch job — compare the discount levers (Batch, prompt caching, committed/provisioned capacity) on each side, surface the hidden costs people forget, and cover the one structural cost difference that is easy to miss: Bedrock spend is creditable with AWS credits, OpenAI API spend generally is not. All rates are representative of 2026 — confirm live prices before you standardize.

Run Bedrock on AWS credits →→ jump to the cost table

cost driver #1

model choice

biggest lever

caching + batch

per-tier gap

small

creditable?

Bedrock only

TL;DR

At a fixed model tier, Amazon Bedrock and the OpenAI API land in a broadly similar per-token ballpark — both bill per input/output token, so the platform itself is rarely the deciding cost factor. What actually moves the bill by 10–20× is which model tier you run (a small model can be one to two orders of magnitude cheaper than a frontier one) and how aggressively you trim tokens with prompt caching, RAG, and batch.
The honest comparison is model-to-model: map Claude Haiku / Amazon Nova Lite against a small GPT tier, Claude Sonnet against a mid/flagship GPT tier, and price the specific models you would actually use at your real token volumes. Both platforms offer the same shape of discounts — a ~50% batch tier and prompt caching on each, plus Bedrock Provisioned Throughput and OpenAI committed/scale options for steady high volume.
One cost difference is structural rather than per-token: Amazon Bedrock usage is AWS spend, so it is fundable with AWS credits — Activate up to $100K, a Bedrock/GenAI PoC pool of $10K–$50K, the GenAI Accelerator up to $1M — whereas OpenAI API spend is billed by OpenAI and is not covered by AWS credits. For an eligible startup that effectively makes early Bedrock inference $0 out of pocket. CloudRoute routes you to a vetted AWS partner and gets the credits; customer pays $0, AWS funds it.

how the bill is built

IHow each platform charges — the shape of the bill

Before any numbers, get the billing model straight, because the two are more similar than most "X vs Y pricing" posts admit. Both Amazon Bedrock and the OpenAI API are usage-priced per token, metered separately for input and output, with the rate set by the specific model — so the structure is comparable and the differences live in model choice, discount levers, and a handful of surrounding costs.

Both bill per token, input and output priced separately. You pay for tokens in (your prompt: system instructions, retrieved context, conversation history, the user message) and tokens out (the model's response), at a per-model rate usually quoted per 1,000 or per 1,000,000 tokens. Output tokens are typically priced higher than input tokens on both platforms — often 3–5× — which is why long, verbose responses cost disproportionately more than long prompts.

On-demand is the default on both. Call the API, pay for what you used, no commitment. Amazon Bedrock calls this On-Demand; OpenAI's standard API is the same pay-as-you-go shape. This is where almost everyone starts, and where the per-token rates below apply.

Both add the same families of discount lever. A batch tier for non-urgent, asynchronous work at roughly half the on-demand rate (Bedrock Batch ~50% off; OpenAI's Batch API similarly discounted). Prompt caching to avoid re-billing the same repeated context (long system prompts, fixed instructions, reused documents) on every call — available on both. And a steady-capacity option for predictable high volume — Bedrock Provisioned Throughput (reserve model units, billed hourly/monthly) and OpenAI's committed/scale arrangements.

Where the bills diverge is mostly outside the per-token rate. Bedrock, being an AWS service, can carry adjacent AWS charges for the architecture around the model (Knowledge Bases storage and the vector store, fine-tuning/customization and custom-model storage, the Lambda/compute and data transfer in your app). OpenAI bundles more of its higher-level features into its own pricing and adds its own line items (for example, hosted retrieval/file storage and certain tool calls). Section VI breaks these "hidden costs" down on both sides. And one difference is not a fee at all but who you pay — which, as Section VII shows, is what makes Bedrock creditable and OpenAI not.

the one-sentence cost model

On either platform, monthly cost ≈ (input tokens × input rate) + (output tokens × output rate) for the model you pick, minus what caching and batch remove, plus the surrounding architecture (storage, retrieval, compute, transfer). Get the model tier and the token volume right and you have ~90% of the bill.

comparable models

IIMapping comparable models: Claude and Nova vs GPT tiers

You cannot compare "Bedrock vs OpenAI" on cost without first deciding which models you are actually putting head to head — comparing a frontier model on one side to a small model on the other is the most common way these analyses go wrong. The fair method is to line up models by capability tier and price like-for-like.

On the OpenAI side you have a single provider's ladder: a small/cheap tier for high-throughput and simple tasks, a mid/standard tier that balances cost and capability, and a flagship/reasoning tier for the hardest work, plus specialized embedding, image, and audio models. On the Bedrock side you have many providers' ladders in one catalog — the relevant ones for a cost comparison are usually Anthropic's Claude family (a fast/cheap Haiku-class tier and a balanced Sonnet-class tier, with a top-end Opus-class tier for the hardest tasks) and Amazon's Nova family (Micro / Lite / Pro — explicitly positioned as low-cost, low-latency), alongside open-weight options like Llama and Mistral.

The practical mapping most teams use: put a small GPT tier against Claude Haiku or Amazon Nova Lite/Micro; put a mid/standard GPT tier against Claude Sonnet; and put a flagship/reasoning GPT tier against Claude Opus-class. Nova Micro and the smallest open-weight models often sit a notch below the cheapest GPT tier on price, which is exactly why high-volume, latency-sensitive workloads frequently route to Nova on Bedrock.

The candid caveat: the frontier ranking changes with every release, and "comparable" means comparable on your evaluation set, not on a leaderboard. The right way to decide is to take the two or three models you would realistically deploy, run them on your own prompts and quality bar, and only then compare their token rates. The table below lines the tiers up so you know which rates to put next to which.

capability-tier mapping · OpenAI ladder vs Bedrock models · 2026 (for like-for-like pricing)

Capability tier	OpenAI (single provider)	Bedrock — Anthropic Claude	Bedrock — Amazon Nova	Typical use
Ultra-low-cost / high-volume	Smallest GPT tier	—	Nova Micro	Classification, routing, simple extraction at scale
Small / efficient	Small GPT tier	Claude Haiku	Nova Lite	Chat, summarization, RAG answer-gen, high QPS
Mid / balanced	Mid / standard GPT tier	Claude Sonnet	Nova Pro	Most production assistants, agents, complex RAG
Flagship / hardest reasoning	Flagship / reasoning GPT tier	Claude Opus-class	—	Hard reasoning, code, multi-step analysis
Embeddings (retrieval)	OpenAI embedding model	via Cohere / Titan on Bedrock	Titan Embeddings	Vectorizing docs for RAG

Mapping is by capability band for fair cost comparison, not an exact-equivalence claim — verify current model quality on your own eval set. Bedrock also offers Llama, Mistral, AI21, Stability, and DeepSeek; the Claude and Nova rows are the ones most often weighed directly against GPT tiers on cost.

per-token rates

IIIPer-token price tables (representative 2026 rates)

Here are representative per-1M-token rates by tier, so you can see the shape of the comparison. These are illustrative figures for reasoning about cost — NOT live quotes. Prices in this category change frequently; confirm current per-model rates on the AWS Bedrock pricing page and the OpenAI pricing page before you commit.

Read these as ranges, not gospel. The headline pattern that holds across the category: output tokens cost several times more than input tokens, and each step up the capability ladder multiplies the rate — often 4–6× per step. The gap between Bedrock and OpenAI at the same tier is usually small relative to the gap between tiers, which is the whole point: you save far more by right-sizing the model than by switching platforms.

The two tables below give on-demand rates by tier, then the discounted rates once you apply the two biggest levers (batch and caching). Use them to bracket your own numbers, then plug your real token volumes into the worked scenarios in Section IV.

On-demand rates by tier (representative)

These representative on-demand rates show the tier-to-tier multipliers that dominate the bill. Within a tier, Bedrock (e.g., a Claude or Nova model) and OpenAI (the comparable GPT tier) sit close enough that model choice and token discipline matter far more than the platform label.

representative on-demand rates · $ per 1M tokens · 2026 illustrative, NOT live quotes

Capability tier	Bedrock input $/1M	Bedrock output $/1M	OpenAI input $/1M	OpenAI output $/1M	Notes
Ultra-low-cost (Nova Micro / smallest GPT)	~$0.04	~$0.14	~$0.10	~$0.40	Cheapest tier; Nova Micro often undercuts
Small / efficient (Haiku / Nova Lite / small GPT)	~$0.25	~$1.25	~$0.15	~$0.60	High-volume workhorse tier
Mid / balanced (Sonnet / Nova Pro / mid GPT)	~$3.00	~$15.00	~$2.50	~$10.00	Most production assistants live here
Flagship / reasoning (Opus-class / flagship GPT)	~$15.00	~$75.00	~$10.00	~$40.00	Reserve for the hardest tasks
Embeddings ($/1M input)	~$0.10	n/a	~$0.13	n/a	One-time per doc + per query

ILLUSTRATIVE 2026 placeholders to show tier multipliers and rough parity — not current prices, and exact figures vary by model, version, and region. Confirm on the AWS Bedrock and OpenAI pricing pages. The cross-platform gap within a tier is small next to the 4–6× jump between tiers.

discount levers

IVThe discount levers: batch, caching, and committed capacity

On-demand rates are the sticker price. What you actually pay depends on three levers, and both platforms offer all three in similar shape — so the question is less "which platform discounts" and more "which levers fit your workload." Applied well, these cut a real bill by 50–90% without changing the model.

Batch (~50% off) — for anything not real-time. If a job can tolerate minutes-to-hours of latency (overnight document processing, bulk classification, dataset labeling, embeddings backfills), submit it through the batch path and pay roughly half the on-demand rate. Amazon Bedrock offers Batch inference at ~50% off on-demand; OpenAI offers a Batch API at a comparable discount with a turnaround window. For batch-heavy workloads this is the single largest lever and it is near-parity across the two.

Prompt caching — for repeated context. If every call repeats the same large prefix — a long system prompt, fixed few-shot examples, a reused policy document, a stable RAG context — caching lets you avoid paying full input price for those repeated tokens on subsequent calls. Both Bedrock and OpenAI support prompt caching; the savings scale with how much of your input is repeated and how often. For chat assistants with big static system prompts, caching commonly removes a large fraction of input cost. (CloudRoute's sibling reference page on Bedrock prompt caching goes deeper on the mechanics.)

Committed / provisioned capacity — for steady high volume. When throughput is large and predictable, reserving capacity beats per-token. Amazon Bedrock Provisioned Throughput reserves model units billed hourly or monthly (with term commitments lowering the effective rate); OpenAI offers committed/scale arrangements for high-volume customers. These only pay off above a utilization threshold — below it, on-demand is cheaper — so model your real duty cycle before committing. This lever is most relevant to mature, high-traffic products, not early-stage ones.

cost-lever availability · Amazon Bedrock vs OpenAI API · 2026

Cost lever	What it does	Amazon Bedrock	OpenAI API	Best for
On-demand	Pay per token, no commitment	Yes (default)	Yes (default)	Variable / early-stage traffic
Batch (~50% off)	Async jobs at ~half rate	Yes — Batch inference	Yes — Batch API	Overnight / bulk / non-urgent
Prompt caching	Stop re-billing repeated context	Yes	Yes	Big static system prompts, reused docs
Committed / provisioned	Reserve capacity for steady load	Yes — Provisioned Throughput	Committed / scale tiers	High, predictable volume
Model right-sizing	Route each task to cheapest adequate model	Easier (many models, one API)	Within OpenAI lineup	Every workload

Both platforms offer the same families of lever; the cross-platform difference is small. Bedrock's multi-provider catalog makes per-task model right-sizing (the biggest lever of all) easier; OpenAI lets you right-size within its own ladder.

worked total cost

VThree worked total-cost scenarios at scale

Per-token rates only become decisions when you push real volume through them. Below are three end-to-end worked examples — a support chatbot, a RAG knowledge assistant, and an overnight batch job — using the representative mid/small rates from Section III. The arithmetic is the same on both platforms; what changes is the model you pick and how you apply the levers. Numbers are illustrative, to show the method.

For each scenario the recipe is identical: estimate monthly input and output tokens, multiply by the per-1M rate for the chosen tier, then subtract what batch or caching removes. Do this on the specific models you would actually deploy and you have a defensible budget on either platform.

Scenario A — Support chatbot (real-time, 100K conversations/month)

Assume 100,000 conversations/month, each averaging 2,000 input tokens (system prompt + a little context + user turns) and 500 output tokens. That is 200M input + 50M output tokens/month. On a small/efficient tier at ~$0.25 input / ~$1.25 output per 1M (Claude Haiku / Nova Lite class), the bill is (200 × $0.25) + (50 × $1.25) = $50 + $62.50 = ~$113/month. Move to a mid tier at ~$3 / ~$15 and the same traffic is (200 × $3) + (50 × $15) = $600 + $750 = ~$1,350/month — roughly 12× more for the same conversations. Now apply prompt caching to the big static system prompt: if it cuts effective input cost by ~60%, the mid-tier input drops from $600 toward ~$240, taking the bill to ~$990/month. The platform barely matters here; tier and caching dominate.

Scenario B — RAG knowledge assistant (retrieval + generation)

A RAG assistant has two cost components people often forget to add together: embeddings (a one-time cost to vectorize your corpus, plus a tiny cost per query to embed the question) and generation (the answer, which now includes retrieved context in the input). Say a 5M-token corpus embedded once at ~$0.10/1M = ~$0.50 one-time (trivial), then 50,000 queries/month, each pulling 3,000 tokens of retrieved context + 300 token question as input and producing 400 output tokens. That is (3,300 × 50K) = 165M input + (400 × 50K) = 20M output per month. On the mid tier (~$3/~$15): (165 × $3) + (20 × $15) = $495 + $300 = ~$795/month, plus a few dollars of query-embedding cost. The retrieved-context tokens are the bulk of the bill — which is why tighter retrieval (fewer, more relevant chunks) and caching of any stable context are the highest-leverage cost moves in RAG. On Bedrock you can also lean on managed Knowledge Bases (with its own vector-store/storage cost) instead of self-hosting; on OpenAI you can use hosted retrieval (with its own storage line item). Either way, the generation token math above is what dominates.

Scenario C — Overnight batch job (asynchronous, bulk)

Batch is where the cheapest total cost lives. Say you must process 2M documents/month (classification + extraction), each 1,500 input tokens and 200 output tokens: 3,000M (3B) input + 400M output. On an ultra-low-cost tier at ~$0.04 input / ~$0.14 output per 1M (Nova Micro / smallest GPT class), on-demand is (3,000 × $0.04) + (400 × $0.14) = $120 + $56 = ~$176/month. Run it through the batch path at ~50% off and it falls to ~$88/month for 2M documents. The lesson: pairing the cheapest adequate model with the batch tier compounds two levers, and it is near-identical on both platforms — Bedrock Batch and the OpenAI Batch API both roughly halve the rate. If the same job ran on a mid tier on-demand instead, it would be 30–40× more expensive, which is the recurring theme of this whole page.

what the three scenarios prove

Across chatbot, RAG, and batch, the cross-platform difference at a fixed tier is in the single-digit-percent range, while model tier swings cost 10–40× and batch + caching cut 50–90% off what is left. Decide Bedrock vs OpenAI on fit (governance, model choice, ecosystem, and — below — creditability), not on a per-token price war you will mostly tie.

the costs people forget

VIHidden costs on each side

The per-token rate is the visible number; a real bill has more line items. These are the costs that surprise teams after they ship — listed honestly for both platforms so a budget reflects total cost of ownership, not just the model rate.

Output-token blowout (both) — Output is priced several times higher than input, and verbose models or unbounded responses quietly multiply cost. Capping max output tokens and prompting for concise answers is a real, free saving on either platform.
Retrieved context in RAG (both) — Every RAG call re-pays input cost for the chunks you stuff in. Loose retrieval (too many chunks, whole documents) is one of the largest avoidable costs; tight retrieval plus caching of stable context cuts it sharply.
Bedrock — surrounding AWS services — Knowledge Bases adds vector-store + storage cost; fine-tuning/customization adds training + custom-model storage (and, for Provisioned Throughput, an hourly reservation); your app's Lambda/EC2 compute, S3, and data transfer are separate AWS line items. These are normal AWS charges, but they belong in the total.
OpenAI — platform line items — Higher-level features carry their own charges: hosted retrieval/file storage, certain built-in tools, fine-tuning (training + usage), and realtime/audio modalities priced separately from text. Bundled convenience, but still line items to budget.
Egress / networking (mostly Bedrock-adjacent) — Calling a model inside your AWS region over private networking avoids public-internet egress; cross-region or external calls can add data-transfer cost. Co-locating inference with your app is a small but real saving.
Idle provisioned/committed capacity (both) — Reserved capacity (Bedrock Provisioned Throughput, OpenAI committed tiers) is billed whether or not you use it. Under-utilized reservations are a classic overspend — model the duty cycle before committing.
Re-tuning and evaluation (both, one-time) — Switching models or platforms costs engineering time to re-tune prompts and re-run evals. It is a one-time cost, but it is real and worth naming when comparing a migration's payback.

the structural cost difference

VIIThe cost difference that is not per-token: creditability

Everything above is roughly symmetric — both platforms tie at a fixed tier and offer the same levers. There is one asymmetry that does not show up in any rate card and can dominate the real out-of-pocket cost for an early-stage company: who you pay, and therefore whether credits apply.

Amazon Bedrock usage is AWS spend. Bedrock is an AWS service billed on your AWS account, so Bedrock token costs draw down AWS credits exactly like EC2, S3, or RDS. If you hold an AWS Activate credit balance (up to $100K), a Bedrock/GenAI PoC pool ($10K–$50K), or a GenAI Accelerator award (up to $1M), your Bedrock inference is paid from those credits until they are exhausted — which for an eligible startup can make early GenAI inference effectively $0 out of pocket.

OpenAI API usage is billed by OpenAI. It is a separate vendor relationship paid on OpenAI's own billing, so AWS credits do not apply to it. Your OpenAI spend is real cash from day one regardless of any AWS credit balance you hold. (OpenAI runs its own startup/credit programs from time to time, but those are separate from, and not stackable with, AWS credits.)

Why this can outweigh the per-token comparison. Suppose your GenAI inference will run ~$1,500/month in year one. On OpenAI that is ~$18K of real cash over the year. On Bedrock, funded by an Activate balance, the same workload can be $0 out of pocket until the credits run out — at which point you are paying Bedrock's (competitive, per-tier-comparable) rates anyway. So even if a specific OpenAI model were marginally cheaper per token, the creditable path can be dramatically cheaper in actual cash for the period that matters most to a young company. This is the cost lever that the rate cards never show.

The honest framing: creditability is a timing-and-cash advantage, not a permanent per-token discount — once credits are spent, Bedrock is priced on its merits like any platform. But for a startup deciding where to run inference in the first 12–24 months, "fundable with AWS credits" is frequently the largest single factor in total cost.

the creditability summary

Bedrock spend = AWS spend = creditable with AWS credits (Activate up to $100K · Bedrock/GenAI PoC $10K–$50K · GenAI Accelerator up to $1M). OpenAI API spend is billed by OpenAI and is not covered by AWS credits. For an eligible startup, that can make early Bedrock inference $0 out of pocket while OpenAI is cash from day one. CloudRoute routes you to a vetted AWS partner and gets the credits — customer pays $0; AWS funds it.

the cost verdict

VIIICost verdict — and how to switch on credits

A cost comparison has to end with a plain answer. Here it is, then the practical path if the answer points you to Bedrock.

The verdict, without hedging

On raw per-token cost at a fixed capability tier, Bedrock and OpenAI are close enough to call a tie — single-digit-percent differences that flip model to model and change with every release, so do not pick a platform on the rate card alone. The two factors that genuinely move money are (1) right-sizing the model (10–40× swings — and Bedrock's multi-provider catalog, with Claude and Nova next to open-weight options, makes per-task right-sizing easier) and (2) applying batch + caching (50–90% off, near-parity on both). The factor that can dominate real cash for an eligible startup is (3) creditability — Bedrock spend is fundable with AWS credits; OpenAI spend is not.

So: if you are an early-stage or AWS-aligned team and you qualify for AWS credits, Bedrock is usually the cheaper place to run inference in practice — not because its tokens are inherently cheaper, but because credits can take your out-of-pocket cost to $0 for the period that matters, on top of per-tier parity and easy model right-sizing. If you have no AWS credit path and your team is deeply standardized on OpenAI's lineup, the per-token economics are close enough that cost alone will not force a move — decide on fit. (Our general, non-cost decision page weighs governance, privacy, regions, ecosystem, and lock-in.)

If Bedrock wins on cost — the switch

Moving (or adding) inference to Bedrock to capture per-tier parity, easy right-sizing, and credit funding is a well-trodden, usually-modest migration: enable the target models (e.g., Claude Sonnet/Haiku or Nova) in your AWS regions; swap the OpenAI client for Bedrock's Converse API (the concepts map closely); re-tune prompts and re-run your eval set; map any hosted retrieval to Knowledge Bases or keep your own RAG; then A/B on real traffic and cut over. The one-time re-tuning cost is real but small against the ongoing savings — especially once credits are funding the bill.

how CloudRoute fits the cost switch

If the cost math points you to Bedrock, CloudRoute routes you to a vetted AWS partner who has done OpenAI → Bedrock migrations and gets AWS credits to fund both the migration and the inference (Activate up to $100K, Bedrock/GenAI PoC $10K–$50K, GenAI Accelerator up to $1M). The partner handles model enablement, the API swap, prompt re-tuning, and right-sizing the model per workload to minimize spend. Customer pays $0 — AWS funds the engagement and the partner pays CloudRoute the routing commission.

side by side

Bedrock vs OpenAI on cost — the decision table

One scannable view of the dimensions that actually move a GenAI bill. Rates and model lists are representative of 2026 — confirm on each vendor's live pricing pages; this category moves fast.

Cost dimension	Amazon Bedrock	OpenAI API	Who wins
Billing model	Per token (input/output), per model	Per token (input/output), per model	Tie — same shape
Per-token rate at a fixed tier	Competitive (Claude / Nova / open-weight)	Competitive (GPT ladder)	Tie (single-digit-% gaps)
Cheapest small/volume tier	Nova Micro / Lite often undercut	Smallest GPT tier	Often Bedrock (Nova)
Model right-sizing (biggest lever)	Many providers behind one API	Within OpenAI lineup	Bedrock (more choice)
Batch discount (~50%)	Yes — Batch inference	Yes — Batch API	Tie
Prompt caching	Yes	Yes	Tie
Committed / reserved capacity	Provisioned Throughput	Committed / scale tiers	Tie
Surrounding/hidden costs	Adjacent AWS services (KB, storage, transfer)	Platform line items (retrieval, tools)	Tie — both have them
Fundable with AWS credits	Yes — it is AWS spend	No — billed by OpenAI	Bedrock (structural)
Out-of-pocket for credit-eligible startup	Can be $0 until credits exhaust	Cash from day one	Bedrock
Cheapest in practice	Usually, when credits apply + right-sized	Competitive without a credit path	Depends on credit eligibility

Representative as of 2026; verify per-model rates, batch/caching specifics, and program eligibility on the AWS Bedrock and OpenAI pricing pages. The recurring theme: per-tier cost is a near-tie, so model right-sizing, batch+caching, and credit eligibility decide the real bill.

cost pointing you to Bedrock?

Run Bedrock on AWS credits — vetted partner, $0 to you

Get matched in 24h →

a recent match

A cost-driven OpenAI → Bedrock move — anonymized

inquiry · seed+ AI-native SaaS, 16 people, US, ~$1.8K/mo OpenAI spend

Seed-plus AI-native B2B SaaS, ~16 people, AWS-native backend, ran all inference on the OpenAI API

Situation: Their product (an AI research assistant with chat + RAG + nightly bulk enrichment) worked well on OpenAI, but inference had grown to ~$1,800/month of real cash and was rising with usage — a meaningful line item at their stage. Two things bothered the founders: most of that spend was on a mid/flagship tier for tasks a cheaper model could handle, and none of it was creditable because it was billed by OpenAI even though their whole stack was on AWS with an unused Activate eligibility. They wanted lower spend without losing quality, and they wanted the bill to draw on AWS credits.

What CloudRoute did: CloudRoute routed them within 24 hours to a US AWS Advanced partner experienced in cost-focused OpenAI → Bedrock migrations. The partner mapped each workload to the cheapest adequate model — chat and RAG answer-gen to Claude Haiku / Nova Lite, the hard-reasoning path kept on Claude Sonnet, and the nightly enrichment moved to the cheapest tier run through Bedrock Batch (~50% off). They swapped the OpenAI client for the Converse API, added prompt caching on the large static system prompt, and re-ran the eval set to confirm quality held. In parallel they filed an AWS Activate application plus a Bedrock/GenAI PoC credit request so the new Bedrock spend would be credit-funded.

Outcome: Right-sizing plus batch and caching cut the modeled monthly inference cost by roughly two-thirds versus the old all-mid-tier OpenAI setup, and the remaining Bedrock spend was paid from AWS credits — taking out-of-pocket inference toward $0 for the credit-funded period. Quality held on the eval set after prompt re-tuning. CloudRoute's commission was paid by the partner from AWS engagement funding — the customer paid $0 for the routing.

engagement window: ~4 weeks · eng time: ~14 hours · modeled cost cut: ~2/3 · credits secured: Activate + GenAI PoC · cost to customer: $0

faq

Common questions

Is Amazon Bedrock cheaper than the OpenAI API?

At a fixed capability tier the two are close — both bill per input/output token, so platform-to-platform differences are usually single-digit-percent and flip from model to model. What actually decides the bill is which model tier you run (a small model can be 10–20× cheaper per token than a flagship one) and how you apply batch (~50% off) and prompt caching. There is one structural difference, though: Bedrock spend is AWS spend, so it is fundable with AWS credits, while OpenAI API spend is not. For a credit-eligible startup that can make Bedrock dramatically cheaper in actual out-of-pocket cash even when per-token rates are comparable. Price the specific models you would deploy at your real volumes on each vendor's current pricing page.

How do I compare Claude or Nova on Bedrock against GPT on cost?

Compare by capability tier, like-for-like. Map a small GPT tier against Claude Haiku or Amazon Nova Lite/Micro, a mid/standard GPT tier against Claude Sonnet (or Nova Pro), and a flagship/reasoning GPT tier against a Claude Opus-class model. Then take the two or three models you would realistically deploy, run them on your own evaluation set to confirm quality is comparable, and only then put their per-1M-token rates side by side. Comparing across tiers (e.g., a frontier model on one side to a small model on the other) is the most common way these cost analyses go wrong.

What is the biggest lever to cut GenAI cost on either platform?

Model right-sizing — routing each task to the cheapest model that still meets your quality bar. Moving high-volume, simple work (chat, summarization, RAG answer-gen, classification) off a flagship model and onto a small/efficient tier typically cuts cost 10–40× for that workload. After that, apply batch (~50% off) for anything non-urgent and prompt caching for repeated context (big system prompts, reused documents), which together remove another 50–90% of what is left. Both platforms support all of these; Bedrock's multi-provider catalog makes per-task right-sizing easier because you can reach Claude, Nova, and open-weight models behind one API.

Do both Bedrock and OpenAI offer batch and prompt caching discounts?

Yes, and in similar shape. Both offer a batch tier at roughly 50% off on-demand for asynchronous, non-urgent jobs (Amazon Bedrock Batch inference and the OpenAI Batch API), and both support prompt caching to avoid re-billing repeated input context on every call. Both also offer a steady-capacity option for predictable high volume — Bedrock Provisioned Throughput (reserved model units billed hourly/monthly) and OpenAI committed/scale tiers. Because these levers are near-parity across the two platforms, they rarely decide Bedrock vs OpenAI on cost; they decide how cheaply you run on whichever you pick.

Can AWS credits pay for OpenAI API usage?

No. AWS credits (Activate, Bedrock/GenAI PoC, GenAI Accelerator) apply to AWS spend, and the OpenAI API is billed by OpenAI as a separate vendor, so AWS credits do not cover it. Amazon Bedrock, by contrast, is an AWS service billed on your AWS account, so Bedrock token costs draw down AWS credits exactly like EC2 or S3. This is why an eligible startup can run early inference on Bedrock at effectively $0 out of pocket while the same workload on OpenAI is real cash from day one. OpenAI runs its own occasional startup-credit programs, but those are separate from and not stackable with AWS credits.

What hidden costs should I budget for beyond the token rate?

On both platforms: output-token blowout (output is priced several times higher than input, so cap and tighten responses), and retrieved-context cost in RAG (loose retrieval re-pays input cost for every chunk). On Bedrock specifically: adjacent AWS charges for the architecture around the model — Knowledge Bases vector store and storage, fine-tuning training and custom-model storage, your app's compute, and data transfer. On OpenAI specifically: platform line items such as hosted retrieval/file storage, certain built-in tools, fine-tuning, and separately-priced realtime/audio modalities. And on both: idle reserved/committed capacity if you over-provision. Put these in the total cost of ownership, not just the per-token rate.

How much does it cost to run a support chatbot on Bedrock vs OpenAI?

Using illustrative rates: a chatbot at 100,000 conversations/month with ~2,000 input and ~500 output tokens each (200M input + 50M output) costs about $113/month on a small/efficient tier (~$0.25 input / ~$1.25 output per 1M) and about $1,350/month on a mid tier (~$3 / ~$15) — same traffic, ~12× from tier alone. Prompt caching on the static system prompt can shave a large fraction of the input cost. The figures are essentially the same on Bedrock and OpenAI at a comparable tier; the platform is not the cost driver, the model tier and caching are. Plug your real token volumes and current per-model rates into that formula to budget.

How does CloudRoute reduce my Bedrock vs OpenAI cost?

Two ways. First, the partner CloudRoute routes you to right-sizes each workload to the cheapest adequate model and applies batch and caching, which is where the real savings live — typically cutting a naive all-mid-tier setup by a large fraction. Second, because Bedrock is AWS spend, the partner files for AWS credits (Activate up to $100K, Bedrock/GenAI PoC $10K–$50K, GenAI Accelerator up to $1M) so the remaining Bedrock bill is credit-funded — taking out-of-pocket inference toward $0 for the funded period. You pay $0 for the routing: AWS funds the engagement and the partner pays CloudRoute a commission, so there is no invoice on your side.

Cheaper inference, funded: run Bedrock on AWS credits

If the cost math points to Bedrock — per-tier parity, easy model right-sizing, and spend that AWS credits can cover — CloudRoute routes you to a vetted AWS partner who right-sizes the workload and funds it with credits. Customer pays $0.

Get matched in 24h →→ see the data & AI persona detail

matched within< 24h

credit ceilingup to $1M

cost to you$0