A cost-deep, neutral comparison of running generative AI on Amazon Bedrock versus the OpenAI API. We map comparable models (Claude Sonnet/Haiku and Amazon Nova against GPT tiers), put representative per-1M-token rates in side-by-side tables, work three total-cost scenarios end to end — a support chatbot, a RAG knowledge assistant, and an overnight batch job — compare the discount levers (Batch, prompt caching, committed/provisioned capacity) on each side, surface the hidden costs people forget, and cover the one structural cost difference that is easy to miss: Bedrock spend is creditable with AWS credits, OpenAI API spend generally is not. All rates are representative of 2026 — confirm live prices before you standardize.
Before any numbers, get the billing model straight, because the two are more similar than most "X vs Y pricing" posts admit. Both Amazon Bedrock and the OpenAI API are usage-priced per token, metered separately for input and output, with the rate set by the specific model — so the structure is comparable and the differences live in model choice, discount levers, and a handful of surrounding costs.
Both bill per token, input and output priced separately. You pay for tokens in (your prompt: system instructions, retrieved context, conversation history, the user message) and tokens out (the model's response), at a per-model rate usually quoted per 1,000 or per 1,000,000 tokens. Output tokens are typically priced higher than input tokens on both platforms — often 3–5× — which is why long, verbose responses cost disproportionately more than long prompts.
On-demand is the default on both. Call the API, pay for what you used, no commitment. Amazon Bedrock calls this On-Demand; OpenAI's standard API is the same pay-as-you-go shape. This is where almost everyone starts, and where the per-token rates below apply.
Both add the same families of discount lever. A batch tier for non-urgent, asynchronous work at roughly half the on-demand rate (Bedrock Batch ~50% off; OpenAI's Batch API similarly discounted). Prompt caching to avoid re-billing the same repeated context (long system prompts, fixed instructions, reused documents) on every call — available on both. And a steady-capacity option for predictable high volume — Bedrock Provisioned Throughput (reserve model units, billed hourly/monthly) and OpenAI's committed/scale arrangements.
Where the bills diverge is mostly outside the per-token rate. Bedrock, being an AWS service, can carry adjacent AWS charges for the architecture around the model (Knowledge Bases storage and the vector store, fine-tuning/customization and custom-model storage, the Lambda/compute and data transfer in your app). OpenAI bundles more of its higher-level features into its own pricing and adds its own line items (for example, hosted retrieval/file storage and certain tool calls). Section VI breaks these "hidden costs" down on both sides. And one difference is not a fee at all but who you pay — which, as Section VII shows, is what makes Bedrock creditable and OpenAI not.
On either platform, monthly cost ≈ (input tokens × input rate) + (output tokens × output rate) for the model you pick, minus what caching and batch remove, plus the surrounding architecture (storage, retrieval, compute, transfer). Get the model tier and the token volume right and you have ~90% of the bill.
You cannot compare "Bedrock vs OpenAI" on cost without first deciding which models you are actually putting head to head — comparing a frontier model on one side to a small model on the other is the most common way these analyses go wrong. The fair method is to line up models by capability tier and price like-for-like.
On the OpenAI side you have a single provider's ladder: a small/cheap tier for high-throughput and simple tasks, a mid/standard tier that balances cost and capability, and a flagship/reasoning tier for the hardest work, plus specialized embedding, image, and audio models. On the Bedrock side you have many providers' ladders in one catalog — the relevant ones for a cost comparison are usually Anthropic's Claude family (a fast/cheap Haiku-class tier and a balanced Sonnet-class tier, with a top-end Opus-class tier for the hardest tasks) and Amazon's Nova family (Micro / Lite / Pro — explicitly positioned as low-cost, low-latency), alongside open-weight options like Llama and Mistral.
The practical mapping most teams use: put a small GPT tier against Claude Haiku or Amazon Nova Lite/Micro; put a mid/standard GPT tier against Claude Sonnet; and put a flagship/reasoning GPT tier against Claude Opus-class. Nova Micro and the smallest open-weight models often sit a notch below the cheapest GPT tier on price, which is exactly why high-volume, latency-sensitive workloads frequently route to Nova on Bedrock.
The candid caveat: the frontier ranking changes with every release, and "comparable" means comparable on your evaluation set, not on a leaderboard. The right way to decide is to take the two or three models you would realistically deploy, run them on your own prompts and quality bar, and only then compare their token rates. The table below lines the tiers up so you know which rates to put next to which.
| Capability tier | OpenAI (single provider) | Bedrock — Anthropic Claude | Bedrock — Amazon Nova | Typical use |
|---|---|---|---|---|
| Ultra-low-cost / high-volume | Smallest GPT tier | — | Nova Micro | Classification, routing, simple extraction at scale |
| Small / efficient | Small GPT tier | Claude Haiku | Nova Lite | Chat, summarization, RAG answer-gen, high QPS |
| Mid / balanced | Mid / standard GPT tier | Claude Sonnet | Nova Pro | Most production assistants, agents, complex RAG |
| Flagship / hardest reasoning | Flagship / reasoning GPT tier | Claude Opus-class | — | Hard reasoning, code, multi-step analysis |
| Embeddings (retrieval) | OpenAI embedding model | via Cohere / Titan on Bedrock | Titan Embeddings | Vectorizing docs for RAG |
Here are representative per-1M-token rates by tier, so you can see the shape of the comparison. These are illustrative figures for reasoning about cost — NOT live quotes. Prices in this category change frequently; confirm current per-model rates on the AWS Bedrock pricing page and the OpenAI pricing page before you commit.
Read these as ranges, not gospel. The headline pattern that holds across the category: output tokens cost several times more than input tokens, and each step up the capability ladder multiplies the rate — often 4–6× per step. The gap between Bedrock and OpenAI at the same tier is usually small relative to the gap between tiers, which is the whole point: you save far more by right-sizing the model than by switching platforms.
The two tables below give on-demand rates by tier, then the discounted rates once you apply the two biggest levers (batch and caching). Use them to bracket your own numbers, then plug your real token volumes into the worked scenarios in Section IV.
These representative on-demand rates show the tier-to-tier multipliers that dominate the bill. Within a tier, Bedrock (e.g., a Claude or Nova model) and OpenAI (the comparable GPT tier) sit close enough that model choice and token discipline matter far more than the platform label.
| Capability tier | Bedrock input $/1M | Bedrock output $/1M | OpenAI input $/1M | OpenAI output $/1M | Notes |
|---|---|---|---|---|---|
| Ultra-low-cost (Nova Micro / smallest GPT) | ~$0.04 | ~$0.14 | ~$0.10 | ~$0.40 | Cheapest tier; Nova Micro often undercuts |
| Small / efficient (Haiku / Nova Lite / small GPT) | ~$0.25 | ~$1.25 | ~$0.15 | ~$0.60 | High-volume workhorse tier |
| Mid / balanced (Sonnet / Nova Pro / mid GPT) | ~$3.00 | ~$15.00 | ~$2.50 | ~$10.00 | Most production assistants live here |
| Flagship / reasoning (Opus-class / flagship GPT) | ~$15.00 | ~$75.00 | ~$10.00 | ~$40.00 | Reserve for the hardest tasks |
| Embeddings ($/1M input) | ~$0.10 | n/a | ~$0.13 | n/a | One-time per doc + per query |
On-demand rates are the sticker price. What you actually pay depends on three levers, and both platforms offer all three in similar shape — so the question is less "which platform discounts" and more "which levers fit your workload." Applied well, these cut a real bill by 50–90% without changing the model.
Batch (~50% off) — for anything not real-time. If a job can tolerate minutes-to-hours of latency (overnight document processing, bulk classification, dataset labeling, embeddings backfills), submit it through the batch path and pay roughly half the on-demand rate. Amazon Bedrock offers Batch inference at ~50% off on-demand; OpenAI offers a Batch API at a comparable discount with a turnaround window. For batch-heavy workloads this is the single largest lever and it is near-parity across the two.
Prompt caching — for repeated context. If every call repeats the same large prefix — a long system prompt, fixed few-shot examples, a reused policy document, a stable RAG context — caching lets you avoid paying full input price for those repeated tokens on subsequent calls. Both Bedrock and OpenAI support prompt caching; the savings scale with how much of your input is repeated and how often. For chat assistants with big static system prompts, caching commonly removes a large fraction of input cost. (CloudRoute's sibling reference page on Bedrock prompt caching goes deeper on the mechanics.)
Committed / provisioned capacity — for steady high volume. When throughput is large and predictable, reserving capacity beats per-token. Amazon Bedrock Provisioned Throughput reserves model units billed hourly or monthly (with term commitments lowering the effective rate); OpenAI offers committed/scale arrangements for high-volume customers. These only pay off above a utilization threshold — below it, on-demand is cheaper — so model your real duty cycle before committing. This lever is most relevant to mature, high-traffic products, not early-stage ones.
| Cost lever | What it does | Amazon Bedrock | OpenAI API | Best for |
|---|---|---|---|---|
| On-demand | Pay per token, no commitment | Yes (default) | Yes (default) | Variable / early-stage traffic |
| Batch (~50% off) | Async jobs at ~half rate | Yes — Batch inference | Yes — Batch API | Overnight / bulk / non-urgent |
| Prompt caching | Stop re-billing repeated context | Yes | Yes | Big static system prompts, reused docs |
| Committed / provisioned | Reserve capacity for steady load | Yes — Provisioned Throughput | Committed / scale tiers | High, predictable volume |
| Model right-sizing | Route each task to cheapest adequate model | Easier (many models, one API) | Within OpenAI lineup | Every workload |
Per-token rates only become decisions when you push real volume through them. Below are three end-to-end worked examples — a support chatbot, a RAG knowledge assistant, and an overnight batch job — using the representative mid/small rates from Section III. The arithmetic is the same on both platforms; what changes is the model you pick and how you apply the levers. Numbers are illustrative, to show the method.
For each scenario the recipe is identical: estimate monthly input and output tokens, multiply by the per-1M rate for the chosen tier, then subtract what batch or caching removes. Do this on the specific models you would actually deploy and you have a defensible budget on either platform.
Assume 100,000 conversations/month, each averaging 2,000 input tokens (system prompt + a little context + user turns) and 500 output tokens. That is 200M input + 50M output tokens/month. On a small/efficient tier at ~$0.25 input / ~$1.25 output per 1M (Claude Haiku / Nova Lite class), the bill is (200 × $0.25) + (50 × $1.25) = $50 + $62.50 = ~$113/month. Move to a mid tier at ~$3 / ~$15 and the same traffic is (200 × $3) + (50 × $15) = $600 + $750 = ~$1,350/month — roughly 12× more for the same conversations. Now apply prompt caching to the big static system prompt: if it cuts effective input cost by ~60%, the mid-tier input drops from $600 toward ~$240, taking the bill to ~$990/month. The platform barely matters here; tier and caching dominate.
A RAG assistant has two cost components people often forget to add together: embeddings (a one-time cost to vectorize your corpus, plus a tiny cost per query to embed the question) and generation (the answer, which now includes retrieved context in the input). Say a 5M-token corpus embedded once at ~$0.10/1M = ~$0.50 one-time (trivial), then 50,000 queries/month, each pulling 3,000 tokens of retrieved context + 300 token question as input and producing 400 output tokens. That is (3,300 × 50K) = 165M input + (400 × 50K) = 20M output per month. On the mid tier (~$3/~$15): (165 × $3) + (20 × $15) = $495 + $300 = ~$795/month, plus a few dollars of query-embedding cost. The retrieved-context tokens are the bulk of the bill — which is why tighter retrieval (fewer, more relevant chunks) and caching of any stable context are the highest-leverage cost moves in RAG. On Bedrock you can also lean on managed Knowledge Bases (with its own vector-store/storage cost) instead of self-hosting; on OpenAI you can use hosted retrieval (with its own storage line item). Either way, the generation token math above is what dominates.
Batch is where the cheapest total cost lives. Say you must process 2M documents/month (classification + extraction), each 1,500 input tokens and 200 output tokens: 3,000M (3B) input + 400M output. On an ultra-low-cost tier at ~$0.04 input / ~$0.14 output per 1M (Nova Micro / smallest GPT class), on-demand is (3,000 × $0.04) + (400 × $0.14) = $120 + $56 = ~$176/month. Run it through the batch path at ~50% off and it falls to ~$88/month for 2M documents. The lesson: pairing the cheapest adequate model with the batch tier compounds two levers, and it is near-identical on both platforms — Bedrock Batch and the OpenAI Batch API both roughly halve the rate. If the same job ran on a mid tier on-demand instead, it would be 30–40× more expensive, which is the recurring theme of this whole page.
Across chatbot, RAG, and batch, the cross-platform difference at a fixed tier is in the single-digit-percent range, while model tier swings cost 10–40× and batch + caching cut 50–90% off what is left. Decide Bedrock vs OpenAI on fit (governance, model choice, ecosystem, and — below — creditability), not on a per-token price war you will mostly tie.
Everything above is roughly symmetric — both platforms tie at a fixed tier and offer the same levers. There is one asymmetry that does not show up in any rate card and can dominate the real out-of-pocket cost for an early-stage company: who you pay, and therefore whether credits apply.
Amazon Bedrock usage is AWS spend. Bedrock is an AWS service billed on your AWS account, so Bedrock token costs draw down AWS credits exactly like EC2, S3, or RDS. If you hold an AWS Activate credit balance (up to $100K), a Bedrock/GenAI PoC pool ($10K–$50K), or a GenAI Accelerator award (up to $1M), your Bedrock inference is paid from those credits until they are exhausted — which for an eligible startup can make early GenAI inference effectively $0 out of pocket.
OpenAI API usage is billed by OpenAI. It is a separate vendor relationship paid on OpenAI's own billing, so AWS credits do not apply to it. Your OpenAI spend is real cash from day one regardless of any AWS credit balance you hold. (OpenAI runs its own startup/credit programs from time to time, but those are separate from, and not stackable with, AWS credits.)
Why this can outweigh the per-token comparison. Suppose your GenAI inference will run ~$1,500/month in year one. On OpenAI that is ~$18K of real cash over the year. On Bedrock, funded by an Activate balance, the same workload can be $0 out of pocket until the credits run out — at which point you are paying Bedrock's (competitive, per-tier-comparable) rates anyway. So even if a specific OpenAI model were marginally cheaper per token, the creditable path can be dramatically cheaper in actual cash for the period that matters most to a young company. This is the cost lever that the rate cards never show.
The honest framing: creditability is a timing-and-cash advantage, not a permanent per-token discount — once credits are spent, Bedrock is priced on its merits like any platform. But for a startup deciding where to run inference in the first 12–24 months, "fundable with AWS credits" is frequently the largest single factor in total cost.
Bedrock spend = AWS spend = creditable with AWS credits (Activate up to $100K · Bedrock/GenAI PoC $10K–$50K · GenAI Accelerator up to $1M). OpenAI API spend is billed by OpenAI and is not covered by AWS credits. For an eligible startup, that can make early Bedrock inference $0 out of pocket while OpenAI is cash from day one. CloudRoute routes you to a vetted AWS partner and gets the credits — customer pays $0; AWS funds it.
A cost comparison has to end with a plain answer. Here it is, then the practical path if the answer points you to Bedrock.
On raw per-token cost at a fixed capability tier, Bedrock and OpenAI are close enough to call a tie — single-digit-percent differences that flip model to model and change with every release, so do not pick a platform on the rate card alone. The two factors that genuinely move money are (1) right-sizing the model (10–40× swings — and Bedrock's multi-provider catalog, with Claude and Nova next to open-weight options, makes per-task right-sizing easier) and (2) applying batch + caching (50–90% off, near-parity on both). The factor that can dominate real cash for an eligible startup is (3) creditability — Bedrock spend is fundable with AWS credits; OpenAI spend is not.
So: if you are an early-stage or AWS-aligned team and you qualify for AWS credits, Bedrock is usually the cheaper place to run inference in practice — not because its tokens are inherently cheaper, but because credits can take your out-of-pocket cost to $0 for the period that matters, on top of per-tier parity and easy model right-sizing. If you have no AWS credit path and your team is deeply standardized on OpenAI's lineup, the per-token economics are close enough that cost alone will not force a move — decide on fit. (Our general, non-cost decision page weighs governance, privacy, regions, ecosystem, and lock-in.)
Moving (or adding) inference to Bedrock to capture per-tier parity, easy right-sizing, and credit funding is a well-trodden, usually-modest migration: enable the target models (e.g., Claude Sonnet/Haiku or Nova) in your AWS regions; swap the OpenAI client for Bedrock's Converse API (the concepts map closely); re-tune prompts and re-run your eval set; map any hosted retrieval to Knowledge Bases or keep your own RAG; then A/B on real traffic and cut over. The one-time re-tuning cost is real but small against the ongoing savings — especially once credits are funding the bill.
If the cost math points you to Bedrock, CloudRoute routes you to a vetted AWS partner who has done OpenAI → Bedrock migrations and gets AWS credits to fund both the migration and the inference (Activate up to $100K, Bedrock/GenAI PoC $10K–$50K, GenAI Accelerator up to $1M). The partner handles model enablement, the API swap, prompt re-tuning, and right-sizing the model per workload to minimize spend. Customer pays $0 — AWS funds the engagement and the partner pays CloudRoute the routing commission.
One scannable view of the dimensions that actually move a GenAI bill. Rates and model lists are representative of 2026 — confirm on each vendor's live pricing pages; this category moves fast.
| Cost dimension | Amazon Bedrock | OpenAI API | Who wins |
|---|---|---|---|
| Billing model | Per token (input/output), per model | Per token (input/output), per model | Tie — same shape |
| Per-token rate at a fixed tier | Competitive (Claude / Nova / open-weight) | Competitive (GPT ladder) | Tie (single-digit-% gaps) |
| Cheapest small/volume tier | Nova Micro / Lite often undercut | Smallest GPT tier | Often Bedrock (Nova) |
| Model right-sizing (biggest lever) | Many providers behind one API | Within OpenAI lineup | Bedrock (more choice) |
| Batch discount (~50%) | Yes — Batch inference | Yes — Batch API | Tie |
| Prompt caching | Yes | Yes | Tie |
| Committed / reserved capacity | Provisioned Throughput | Committed / scale tiers | Tie |
| Surrounding/hidden costs | Adjacent AWS services (KB, storage, transfer) | Platform line items (retrieval, tools) | Tie — both have them |
| Fundable with AWS credits | Yes — it is AWS spend | No — billed by OpenAI | Bedrock (structural) |
| Out-of-pocket for credit-eligible startup | Can be $0 until credits exhaust | Cash from day one | Bedrock |
| Cheapest in practice | Usually, when credits apply + right-sized | Competitive without a credit path | Depends on credit eligibility |
Situation: Their product (an AI research assistant with chat + RAG + nightly bulk enrichment) worked well on OpenAI, but inference had grown to ~$1,800/month of real cash and was rising with usage — a meaningful line item at their stage. Two things bothered the founders: most of that spend was on a mid/flagship tier for tasks a cheaper model could handle, and none of it was creditable because it was billed by OpenAI even though their whole stack was on AWS with an unused Activate eligibility. They wanted lower spend without losing quality, and they wanted the bill to draw on AWS credits.
What CloudRoute did: CloudRoute routed them within 24 hours to a US AWS Advanced partner experienced in cost-focused OpenAI → Bedrock migrations. The partner mapped each workload to the cheapest adequate model — chat and RAG answer-gen to Claude Haiku / Nova Lite, the hard-reasoning path kept on Claude Sonnet, and the nightly enrichment moved to the cheapest tier run through Bedrock Batch (~50% off). They swapped the OpenAI client for the Converse API, added prompt caching on the large static system prompt, and re-ran the eval set to confirm quality held. In parallel they filed an AWS Activate application plus a Bedrock/GenAI PoC credit request so the new Bedrock spend would be credit-funded.
Outcome: Right-sizing plus batch and caching cut the modeled monthly inference cost by roughly two-thirds versus the old all-mid-tier OpenAI setup, and the remaining Bedrock spend was paid from AWS credits — taking out-of-pocket inference toward $0 for the credit-funded period. Quality held on the eval set after prompt re-tuning. CloudRoute's commission was paid by the partner from AWS engagement funding — the customer paid $0 for the routing.
engagement window: ~4 weeks · eng time: ~14 hours · modeled cost cut: ~2/3 · credits secured: Activate + GenAI PoC · cost to customer: $0
If the cost math points to Bedrock — per-tier parity, easy model right-sizing, and spend that AWS credits can cover — CloudRoute routes you to a vetted AWS partner who right-sizes the workload and funds it with credits. Customer pays $0.