
ai21 jamba on amazon bedrock · architecture, 256K context, pricing · 2026

AI21 Jamba on Amazon Bedrock — the long-context hybrid model.

A complete, neutral reference for running AI21 Labs' Jamba models on Amazon Bedrock in 2026: the hybrid SSM-Transformer (Mamba) plus Mixture-of-Experts architecture and why it changes the long-context economics; the headline 256K-token context window that is Jamba's differentiator; the model IDs (Jamba 1.5 Mini and Large) and how to enable access; per-model pricing; where Jamba is strong (long-document processing, RAG over large corpora, structured JSON output and tool use); a clear decision on when to pick Jamba versus Claude or Llama for long context; a minimal Converse API call; and how AWS credits make running Jamba $0.


context window

256K tokens

architecture

SSM + Transformer + MoE

models

Jamba 1.5 Mini · Large

cost with credits

$0

TL;DR

AI21 Labs' Jamba runs natively on Amazon Bedrock as one of the providers behind Bedrock's single API. Its headline differentiator is a very long 256K-token context window — among the largest on Bedrock — which makes it a natural fit for long-document processing and RAG over large corpora in a single call.
Jamba's edge comes from a hybrid architecture: it interleaves Mamba-style state-space (SSM) layers with Transformer attention layers and adds a Mixture-of-Experts (MoE), so it keeps memory use and per-token cost much flatter as context grows than a pure-Transformer model. The family on Bedrock is Jamba 1.5 Mini (fast, cheap) and Jamba 1.5 Large (more capable), both with the 256K window, native structured JSON output, and tool use.
Pricing is per-token and per-model: Mini is the low-cost tier, Large is the higher-quality tier, and the long context plus flatter cost curve is the reason to reach for Jamba over a pricier frontier model on big-context workloads. AWS credits (Activate up to $100K, Bedrock/GenAI POC $10K–$50K, GenAI Accelerator up to $1M) can cover Jamba inference until they are exhausted. CloudRoute routes you to the credit pool and a vetted AWS partner, so you pay $0.

the models

I. What AI21 Jamba is, and where it sits on Bedrock

Jamba is AI21 Labs' family of foundation models, available natively on Amazon Bedrock as one of the model providers behind Bedrock's single managed API — alongside Anthropic's Claude, Amazon's own Nova and Titan, Meta Llama, Mistral, Cohere, and others. What sets Jamba apart from almost everything else in that catalog is not a benchmark headline; it is the architecture, and the very long context window that architecture makes affordable.

AI21 Labs is an established foundation-model lab, and Jamba is its production model line built around a single thesis: that a hybrid architecture can serve very long context far more cheaply than a conventional Transformer. On Bedrock the family is offered as two members — Jamba 1.5 Mini, the smaller, faster, lower-cost model, and Jamba 1.5 Large, the larger, more capable model — and both expose the same headline feature: a 256K-token context window, among the largest available on Bedrock. That is roughly the size of a long book or several hundred pages of dense documents in one request.

Practically, that means Jamba is the model you reach for when the problem is shaped like a lot of text: a stack of contracts to compare, a quarter of support transcripts to summarize, a large codebase or specification to reason over, or a RAG application that wants to stuff many retrieved chunks into a single prompt rather than aggressively trimming them. The same job is possible on other Bedrock models, but Jamba is engineered so that the long-context case stays fast and cost-controlled rather than ballooning, which is the whole point of the next section.

Both Jamba models on Bedrock are instruction-tuned for following directions, support structured output (notably native JSON, useful for downstream parsing), and support tool use / function calling for agentic and grounded workflows. They are also multilingual. All of this is reached through the same Bedrock API surface and the same IAM/VPC controls as every other model, so adopting Jamba is an integration change, not a platform change.

One caveat, stated once and meant throughout: exact model version names, model IDs, regional availability, context-window sizes, and per-token prices all change as AI21 ships new Jamba generations and AWS updates Bedrock. The figures and identifiers here are representative as of 2026 to convey structure and relative cost. Always confirm the current model IDs in the Bedrock model catalog and current rates on the AWS Bedrock pricing page before you build or budget.

the one-line version

Jamba = a hybrid SSM-Transformer + MoE model with a 256K-token context window. Two members on Bedrock — Jamba 1.5 Mini (fast, cheap) and Jamba 1.5 Large (more capable). Reach for it when the job is long-document processing or RAG over a large corpus and you want long context without a frontier-model price.

why it is different

II. The hybrid architecture: SSM (Mamba) + Transformer + MoE

Almost every other model in the Bedrock catalog is a pure Transformer. Jamba is not, and the difference is the reason its long context is practical rather than a headline. Understanding the architecture at a high level tells you exactly when Jamba is the right tool.

A standard Transformer uses self-attention, where every token attends to every other token. That is what makes Transformers so capable — but it is also why long context is expensive: attention cost grows roughly with the square of the sequence length, and the memory needed to hold the running state (the KV cache) grows with length too. Double the context and the attention work rises about fourfold. On very long inputs this is what makes a pure-Transformer model slow, memory-hungry, and costly — and why many models cap context well below 256K.

State-space models (SSMs) — the family popularized by the Mamba architecture — take a different approach. Instead of all-pairs attention, an SSM processes the sequence with a recurrence whose cost grows linearly with length and whose memory footprint stays roughly constant as context grows. That makes SSMs dramatically more efficient on long sequences. The trade-off is that pure SSMs can be weaker than attention at certain tasks that need precise, content-based lookup across the whole context (for example, pulling an exact fact from far back in the input).
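The two growth rates can be checked with back-of-the-envelope arithmetic. This sketch assumes attention work is proportional to n² and SSM work to n, ignoring all constant factors; it illustrates the scaling argument and is not a cost model of any real Jamba layer.

```python
def attention_work(n_tokens: int) -> int:
    """Relative self-attention work: every token attends to every token, ~n^2."""
    return n_tokens * n_tokens


def ssm_work(n_tokens: int) -> int:
    """Relative SSM work: one fixed-size recurrence step per token, ~n."""
    return n_tokens


def growth(work_fn, n_from: int, n_to: int) -> float:
    """How many times more work a layer does when context grows from n_from to n_to."""
    return work_fn(n_to) / work_fn(n_from)


if __name__ == "__main__":
    for n_from, n_to in [(4_000, 8_000), (4_000, 256_000)]:
        print(f"{n_from:>7,} -> {n_to:>7,} tokens: "
              f"attention x{growth(attention_work, n_from, n_to):,.0f}, "
              f"SSM x{growth(ssm_work, n_from, n_to):,.0f}")
```

Doubling from 4,000 to 8,000 tokens multiplies attention work by 4 and SSM work by 2; stretching from 4,000 to 256,000 tokens multiplies them by 4,096 and 64 respectively.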

Jamba's design is to interleave both: most layers are efficient Mamba/SSM layers, with Transformer attention layers placed at intervals to recover the precise-recall and in-context-learning strengths that attention provides. The result is a model that keeps much of the SSM efficiency on long context while keeping much of the Transformer quality — a deliberate hybrid rather than a compromise. This is the structural reason Jamba can offer a 256K window with a flatter cost-and-latency curve than a same-size pure Transformer would.
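To make the memory side of the argument concrete, here is a rough sketch that lays out a hypothetical hybrid stack and counts its KV cache. It assumes that only attention layers hold a KV cache (SSM layers keep a fixed-size state, ignored here), that there is one attention layer per eight layers (AI21 has described a ratio of this kind for Jamba, but treat it as an assumption), and made-up dimensions: 32 layers, 8 KV heads of size 128, 2-byte values.

```python
def layer_schedule(n_layers: int, attn_every: int = 8) -> list[str]:
    """Hypothetical hybrid stack: one attention layer per block of `attn_every`, the rest SSM."""
    return ["attention" if i % attn_every == 0 else "ssm" for i in range(n_layers)]


def kv_cache_bytes(schedule: list[str], seq_len: int, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    """Keys + values (factor 2) for every attention layer and every cached token.
    SSM layers are not counted: their state does not grow with seq_len."""
    n_attn = schedule.count("attention")
    return 2 * n_attn * n_kv_heads * head_dim * seq_len * bytes_per_value


if __name__ == "__main__":
    seq_len = 256_000
    pure = ["attention"] * 32
    hybrid = layer_schedule(32)
    for name, stack in [("pure Transformer", pure), ("hybrid", hybrid)]:
        print(f"{name}: {stack.count('attention')} attention layers, "
              f"KV cache ~{kv_cache_bytes(stack, seq_len) / 1e9:.1f} GB at {seq_len:,} tokens")
```

With these illustrative numbers the pure stack needs about 33.6 GB of KV cache at 256,000 tokens and the hybrid about 4.2 GB, an 8× reduction that comes entirely from having fewer attention layers.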

On top of the hybrid stack, Jamba uses a Mixture-of-Experts (MoE) design. In an MoE, the model has many "expert" sub-networks but a router activates only a small subset for any given token, so the model has a large total parameter count (for capability) while only a fraction of those parameters do work on each token (for efficiency). The net effect across the whole design: a large, capable model that is comparatively economical to run — especially on the long-context workloads it is built for.
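A minimal sketch of the routing idea, assuming 16 stand-in experts and top-2 routing (both numbers chosen for illustration; consult AI21's documentation for Jamba's actual expert configuration). Each "expert" here is a toy function rather than a neural network; the point is that the router scores every expert but only k of them run for a given token.

```python
import math


def top_k_route(router_logits: list[float], k: int = 2) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their scores among themselves."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return [(i, exps[i] / total) for i in top]


def moe_forward(x: float, router_logits: list[float], experts, k: int = 2) -> float:
    """Run only the selected experts on x and mix their outputs by routing weight."""
    return sum(weight * experts[i](x) for i, weight in top_k_route(router_logits, k))


if __name__ == "__main__":
    n_experts = 16
    # Toy experts: expert i just scales its input by (i + 1).
    experts = [lambda x, s=i + 1: s * x for i in range(n_experts)]
    logits = [0.1 * i for i in range(n_experts)]  # expert 15 scores highest, then 14
    routes = top_k_route(logits, k=2)
    print("active experts:", [i for i, _ in routes], f"({len(routes)} of {n_experts} run for this token)")
    print("mixed output:", moe_forward(1.0, logits, experts, k=2))
```

Only the two selected experts are evaluated, which is the sense in which an MoE model has a large total parameter count but a much smaller active one per token.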

why this matters for your bill

Pure-Transformer attention cost scales roughly with the square of context length; an SSM scales about linearly and keeps memory roughly flat. By interleaving SSM layers with attention layers (and adding MoE), Jamba keeps long context far cheaper and faster than a same-size Transformer — which is exactly why the 256K window is usable in production rather than just on a spec sheet.

the differentiator

III. The 256K context window: Jamba's headline feature

Context window — the amount of text a model can consider in a single request — is the single number that most often decides whether Jamba is the right model. At 256K tokens, Jamba sits among the longest-context options on Bedrock, and on big-input workloads that window is the difference between one clean call and a brittle chunking pipeline.

To make 256K tokens concrete: a token is roughly three-quarters of a word in English, so 256K tokens is on the order of 180,000–200,000 words — comparable to a long novel, or several hundred pages of contracts, filings, transcripts, logs, or documentation. Everything you place in that window is available to the model at once, with no need to pre-summarize or drop material to make it fit.
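The conversion is simple arithmetic; a small sketch makes its assumptions explicit. The 0.75 words-per-token ratio is the rule of thumb from the text, and ~500 words per dense page is a hypothetical figure chosen for illustration.

```python
WORDS_PER_TOKEN = 0.75  # rough English rule of thumb; varies by tokenizer and language


def approx_words(n_tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(n_tokens * words_per_token)


def approx_pages(n_tokens: int, words_per_page: int = 500) -> int:
    """Estimate pages, assuming a hypothetical ~500 words per dense page."""
    return round(approx_words(n_tokens) / words_per_page)


if __name__ == "__main__":
    window = 256_000
    print(f"{window:,} tokens ~ {approx_words(window):,} words ~ {approx_pages(window):,} pages")
```

That prints `256,000 tokens ~ 192,000 words ~ 384 pages`. If 256K is read as 256 × 1,024 = 262,144 tokens instead, `approx_words` gives 196,608; either way the estimate lands inside the 180,000–200,000-word range quoted above.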

Why does that matter so much? Two reasons. First, it removes engineering you would otherwise have to build. When a document is larger than the context window, you must split it, process the pieces separately, and stitch the results back together — a map-reduce pattern that loses cross-references, double-counts, and is fiddly to get right. A 256K window lets a large document, or a large set of retrieved chunks, go into the model whole, so cross-document reasoning ("does clause 14 in contract A conflict with the indemnity in contract B?") works in a single call.
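The decision between one whole-document call and a split-and-stitch pipeline comes down to a token budget. This sketch assumes a 256,000-token window, a reserved allowance for instructions and output, and a crude four-characters-per-token estimate; that heuristic is a stand-in, not Jamba's actual tokenizer.

```python
CONTEXT_WINDOW = 256_000  # tokens; confirm the current limit for your model in the Bedrock catalog


def estimate_tokens(text: str) -> int:
    """Crude heuristic (~4 characters per English token); not Jamba's real tokenizer."""
    return max(1, len(text) // 4)


def plan_calls(doc_tokens: int, prompt_overhead: int, max_output: int,
               window: int = CONTEXT_WINDOW) -> int:
    """Return how many model calls a document needs: 1 if it fits whole, else the chunk count."""
    budget = window - prompt_overhead - max_output  # tokens left for document text per call
    if budget <= 0:
        raise ValueError("instructions plus reserved output already exceed the window")
    return -(-doc_tokens // budget)  # ceiling division


if __name__ == "__main__":
    print("256K window:", plan_calls(200_000, 2_000, 4_000), "call(s)")
    print("32K window: ", plan_calls(200_000, 2_000, 4_000, window=32_000), "call(s)")
```

A 200,000-token contract bundle is one call against a 256K window; against a hypothetical 32K window the same bundle needs 8 chunks, and each chunk is analyzed without sight of the other seven.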

Second, it changes how you build RAG. Retrieval-augmented generation works by fetching relevant chunks from a knowledge base and feeding them to the model as context. With a small window you can afford only a handful of chunks, so retrieval quality has to be near-perfect or the answer is missing the relevant passage. A 256K window lets you pass far more retrieved context — more chunks, longer chunks, more documents — which makes the whole pipeline more forgiving of imperfect retrieval and better at questions whose answer is spread across many sources. (See the rag-on-aws sibling for the full pattern.)
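One simple way to use a bigger window in RAG is to pack retrieved chunks greedily by relevance until the context budget is spent. This is a sketch, not Bedrock's retrieval logic: the `Chunk` type is hypothetical, and token counts are assumed to be precomputed.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float  # retrieval relevance; higher is better
    tokens: int   # token count, assumed precomputed


def pack_chunks(chunks: list[Chunk], budget_tokens: int) -> list[Chunk]:
    """Greedily keep the highest-scoring chunks that still fit in the token budget."""
    packed, used = [], 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        if used + chunk.tokens <= budget_tokens:
            packed.append(chunk)
            used += chunk.tokens
    return packed


if __name__ == "__main__":
    retrieved = [Chunk(f"chunk {i}", score=i / 300, tokens=800) for i in range(300)]
    for budget in (8_000, 200_000):
        print(f"budget {budget:,} tokens -> {len(pack_chunks(retrieved, budget))} chunks")
```

With 800-token chunks, an 8,000-token budget keeps only the top 10 results, while a 200,000-token budget keeps 250, so a passage that retrieval ranked 50th still reaches the model.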

The honest counterpoint, stated plainly: a long window is a capability, not a free lunch. Input is billed per token, so a 256K-token prompt costs far more than a 4K-token one on the same model — filling the window every call is expensive and often unnecessary. And research across the field shows models can attend less reliably to the deep middle of a very long context than to its start and end, so retrieval and good prompt structure still matter even when everything fits. The right discipline is to use the long window when the task genuinely needs it, and to lean on prompt caching for any large fixed prefix (see amazon-bedrock-prompt-caching) so you are not re-paying for the same context on every request.
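The per-token billing point is easy to quantify. This sketch uses the representative Jamba 1.5 Mini input rate from the pricing section ($0.20 per million input tokens, to be confirmed against current AWS pricing); the 10,000-calls volume is a hypothetical figure.

```python
MINI_INPUT_PER_M = 0.20  # representative 2026 Jamba 1.5 Mini input rate (USD per 1M tokens)


def input_cost(tokens: int, usd_per_million: float) -> float:
    """On-demand input cost in USD for one request."""
    return tokens / 1_000_000 * usd_per_million


if __name__ == "__main__":
    full = input_cost(256_000, MINI_INPUT_PER_M)
    small = input_cost(4_000, MINI_INPUT_PER_M)
    print(f"256K prompt: ${full:.4f}  4K prompt: ${small:.4f}  ratio: {full / small:.0f}x")
    # Hypothetical volume of 10,000 calls, to show how the gap compounds.
    print(f"per 10,000 calls: ${full * 10_000:,.2f} vs ${small * 10_000:,.2f}")
```

A full-window prompt costs 64× a 4K prompt on the same model ($0.0512 versus $0.0008 here, or $512 versus $8 across 10,000 calls), which is why the window should be filled only when the task needs it.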

getting in

IV. Model IDs and how to enable access

Before you can call Jamba on Bedrock, you do one small but mandatory thing: request model access in your account. Foundation models on Bedrock are off by default; turning Jamba on is a one-time, no-cost step in the console.

Enabling access. In the Bedrock console, open Model access, find the AI21 Jamba models you want, and request access. For most models this is granted effectively immediately; some prompt for brief use-case details. There is no charge for enabling access — you only pay when you actually call a model. Access is per-account and per-region, so if you operate in several regions, enable Jamba in each region you will call from. Where you need extra availability or throughput, cross-region inference profiles can route calls across a set of regions (see the amazon-bedrock-cross-region-inference sibling).

Model IDs. Every model on Bedrock is invoked by a model ID — a string identifying the provider, model, and version. AI21's models are namespaced under AI21, so Jamba IDs are of the shape ai21.jamba-… (for example, an identifier for Jamba 1.5 Mini versus Jamba 1.5 Large, each with a version suffix). You pass this ID to the API to choose which Jamba model answers a request, so moving a workload from Mini to Large is a change of model-ID string. Because IDs advance with each generation, do not hard-code a guessed value — read the current ID from the Bedrock model catalog (console) or list it via the API/CLI, and treat it as configuration rather than a literal in your code.
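Reading the current ID from the API rather than guessing it can be sketched with the Bedrock control-plane client, which exposes `list_foundation_models`. This assumes AWS credentials are configured for the target region. The prefix filter simply follows the `ai21.jamba-…` shape described above, and `boto3` is imported inside the function so that the pure filter can run without the AWS SDK installed.

```python
def jamba_model_ids(model_summaries: list[dict]) -> list[str]:
    """Filter Bedrock model summaries down to AI21 Jamba model IDs (shape ai21.jamba-…)."""
    return sorted(m["modelId"] for m in model_summaries if m["modelId"].startswith("ai21.jamba"))


def list_jamba_model_ids(region: str = "us-east-1") -> list[str]:
    """Ask Bedrock which Jamba models exist in a region (requires AWS credentials)."""
    import boto3  # lazy import so jamba_model_ids works without the AWS SDK

    bedrock = boto3.client("bedrock", region_name=region)  # control plane, not "bedrock-runtime"
    return jamba_model_ids(bedrock.list_foundation_models()["modelSummaries"])
```

Calling `list_jamba_model_ids()` prints nothing by itself; loop over its result to see the IDs available in that region, then copy the one you need into configuration.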

Permissions. The IAM principal making the call needs permission for the relevant Bedrock invoke actions (and, if you use cross-region inference profiles, permission on the profile). A least-privilege policy scoped to the specific Jamba model ARNs you intend to use is the recommended posture. Once access is granted and IAM is in place, you are ready to call Jamba — the Converse snippet later in this page shows the minimal request.
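A least-privilege policy of the kind described can be generated rather than hand-written. This sketch scopes `bedrock:InvokeModel` (which authorizes Converse) and `bedrock:InvokeModelWithResponseStream` (which authorizes ConverseStream) to foundation-model ARNs. The model ID in the example is a placeholder. Cross-region inference profiles need additional permissions on the profile ARN, which this sketch does not cover.

```python
import json


def jamba_invoke_policy(model_ids: list[str], regions: list[str]) -> dict:
    """Least-privilege IAM policy letting a principal invoke only the listed foundation models."""
    resources = [
        f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
        for region in regions
        for model_id in model_ids
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "InvokeJambaOnly",
            "Effect": "Allow",
            # Converse is authorized by InvokeModel; ConverseStream by InvokeModelWithResponseStream.
            "Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
            "Resource": resources,
        }],
    }


if __name__ == "__main__":
    # Placeholder ID: paste the real one from the Bedrock model catalog.
    print(json.dumps(jamba_invoke_policy(["ai21.jamba-<your-model-id>"], ["us-east-1"]), indent=2))
```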

Open the Bedrock console → Model access → request access to the AI21 Jamba models you need (free; usually instant).
Enable access in each region you will call from; consider a cross-region inference profile for availability and throughput.
Get the current model ID (shape ai21.jamba-…) from the model catalog or via the API — do not hard-code a guessed version string.
Attach an IAM policy granting the Bedrock invoke actions on the specific Jamba model ARNs (least privilege).
You are billed only on invocation — enabling access costs nothing.

what it costs

V. Jamba on Bedrock: per-model pricing

Jamba on Bedrock is billed per token: a rate per 1,000 input tokens (everything you send, including the long context) and a higher rate per 1,000 output tokens (everything Jamba generates). Mini is the low-cost tier; Large is the higher-quality, higher-price tier. With long context the input side dominates the bill, so the per-input-token rate matters most.

The table gives representative 2026 on-demand rates for the two Jamba models, shown per 1,000 and per 1,000,000 tokens (the per-million column is the per-1K figure × 1,000; providers increasingly quote per-million). Use it to rank the models and sanity-check a budget, not as an audited price sheet. Two cost levers sit on top of these rates and are not in the table: Batch (submit non-interactive work as an async job for roughly half the on-demand price, ideal for bulk long-document processing) and prompt caching (stop re-paying full input price for a repeated prefix such as a fixed instruction block or a reference document). Both matter a great deal precisely because Jamba's workloads tend to be large-input, but Bedrock supports each lever only for specific models and regions, so confirm availability for the Jamba model you use. See amazon-bedrock-pricing and amazon-bedrock-prompt-caching.

representative on-demand Jamba-on-Bedrock pricing · per 1K and per 1M tokens · 2026

Jamba model	Context	Input / 1K	Output / 1K	Input / 1M	Output / 1M	Cost position
Jamba 1.5 Mini	256K	$0.0002	$0.0004	$0.20	$0.40	Low — fast, high-volume, cheap long context
Jamba 1.5 Large	256K	$0.002	$0.008	$2.00	$8.00	Mid — higher quality for harder long-context work

Representative 2026 figures for relative comparison only — confirm current rates on the AWS Bedrock pricing page (they change with each generation and vary by region). With long context the input side usually dominates the bill, so Mini's low input rate is what makes large-document and big-RAG workloads economical. Batch (~50% off) and prompt caching lower the effective rate further. Large is roughly 10× Mini on input — so route the easy long-context bulk to Mini and reserve Large for the hard cases.
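A per-request estimator built on the table's representative rates makes the input-dominates point concrete. The dictionary keys are informal labels, not Bedrock model IDs. The 50% batch factor approximates "roughly half" and is not a quote, and the 200K-in / 2K-out job is a hypothetical long-context workload.

```python
# Representative 2026 on-demand rates (USD per 1M tokens) from the table; confirm current rates.
RATES = {
    "jamba-1.5-mini": {"input": 0.20, "output": 0.40},
    "jamba-1.5-large": {"input": 2.00, "output": 8.00},
}
BATCH_DISCOUNT = 0.5  # "roughly half the on-demand price"; an approximation


def request_cost(model: str, input_tokens: int, output_tokens: int, batch: bool = False) -> float:
    """Estimated USD cost of one request at the representative rates."""
    r = RATES[model]
    cost = input_tokens / 1e6 * r["input"] + output_tokens / 1e6 * r["output"]
    return cost * (BATCH_DISCOUNT if batch else 1.0)


if __name__ == "__main__":
    # Hypothetical long-context job: a 200K-token document in, a 2K-token summary out.
    for model in RATES:
        on_demand = request_cost(model, 200_000, 2_000)
        batched = request_cost(model, 200_000, 2_000, batch=True)
        print(f"{model}: ${on_demand:.4f} on demand, ${batched:.4f} batch")
```

For this job, Mini comes to about $0.0408 and Large to about $0.416 per request, and input accounts for roughly 98% and 96% of those totals respectively.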

what it is good at

VI. Where Jamba is strong: long docs, big-corpus RAG, structured output, tools

Jamba is not trying to be the strongest general-purpose frontier model. It is engineered to win a specific, common, and expensive class of work: tasks that are large in input. Mapped to concrete capabilities, here is where it is the right pick.

Long-document processing

This is the home-turf use case. Summarizing, analyzing, or answering questions over a single very large document — a 200-page contract, a financial filing, a research dossier, a long deposition transcript — fits the 256K window without chunking. Because the whole document is in context, the model can resolve cross-references and reason about the document as a coherent whole rather than as disconnected fragments, and the flat-ish long-context cost curve keeps the per-document price sensible even at scale (especially on Mini, and especially via Batch for bulk jobs).

RAG over a large corpus

For retrieval-augmented generation, the long window lets you pass many more retrieved chunks into a single call than a short-context model allows — more documents, longer passages, more of the knowledge base per question. That makes the pipeline more robust to imperfect retrieval and better at questions whose answer is distributed across many sources. Jamba pairs naturally with Bedrock Knowledge Bases (managed RAG) as the generation model behind a large retrieval set. (See amazon-bedrock-knowledge-bases and rag-on-aws.)
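Pairing Jamba with a Knowledge Base can be sketched as two steps: first a retrieval-only call through the `bedrock-agent-runtime` client's `retrieve` operation, then one Converse request carrying many passages. The knowledge-base ID is hypothetical, as is the choice of 50 results, and the prompt wording is illustrative. `boto3` is imported lazily so the message builder can be used on its own.

```python
def retrieve_passages(kb_id: str, question: str, n_results: int = 50,
                      region: str = "us-east-1") -> list[str]:
    """Fetch passages from a Bedrock Knowledge Base (retrieval only, no generation)."""
    import boto3  # lazy import so build_rag_messages works without the AWS SDK

    agent_rt = boto3.client("bedrock-agent-runtime", region_name=region)
    resp = agent_rt.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": question},
        retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": n_results}},
    )
    return [r["content"]["text"] for r in resp["retrievalResults"]]


def build_rag_messages(question: str, passages: list[str]) -> list[dict]:
    """Put many retrieved passages and the question into one Converse user message."""
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    prompt = ("Answer using only the numbered passages below, citing them by number.\n\n"
              f"{context}\n\nQuestion: {question}")
    return [{"role": "user", "content": [{"text": prompt}]}]
```

The resulting list goes straight into `client.converse(modelId=..., messages=...)` with a Jamba model ID. Because the question sits after the passages, the model reads all of the evidence before it reaches the request.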

Structured output (native JSON)

Jamba models support structured output, notably the ability to return valid JSON conforming to a shape you specify. That is exactly what you want when the model's output feeds another system — extraction pipelines, form-filling, populating a database, or any step where you parse the response programmatically. Reliable JSON output removes the brittle "ask for JSON and hope" post-processing that otherwise surrounds LLM integrations, and it pairs well with the long-document case (extract structured fields from a big unstructured document in one pass).
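Even with structured-output support, a pipeline that writes to a database usually validates what it receives. This is a defensive sketch using only the standard library. The contract fields are a hypothetical extraction schema, and the reply text is assumed to be the model's full response.

```python
import json

# Hypothetical extraction schema: field name -> expected Python type.
CONTRACT_FIELDS = {"party_a": str, "party_b": str, "governing_law": str, "term_months": int}


def parse_extraction(model_text: str, fields: dict = CONTRACT_FIELDS) -> dict:
    """Parse the model's reply as JSON and check required fields and types before use downstream."""
    data = json.loads(model_text)  # json.JSONDecodeError (a ValueError) on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in fields if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    wrong = [name for name, typ in fields.items() if not isinstance(data[name], typ)]
    if wrong:
        raise ValueError(f"wrong types for: {wrong}")
    return data
```

A `ValueError` here is the signal to retry the request or route the document for review, rather than letting a half-formed record reach the database.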

Tool use / function calling

Jamba supports tool use: you describe tools (functions, APIs, queries) and the model decides when to call them and with what arguments, then folds the results into its answer. This is the basis of agentic and grounded workflows — letting the model look things up, take actions, and ground responses in live data — and on Bedrock it is exposed through the Converse API's tool fields, so a Jamba-backed agent is built the same way as any other Bedrock agent. (See amazon-bedrock-agents.)
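The tool-use loop on Converse can be sketched as follows. A `toolSpec` with a JSON schema is sent in `toolConfig`. While `stopReason` is `"tool_use"`, each `toolUse` block is executed locally and answered with a `toolResult` block. The `lookup_clause` tool is hypothetical, and `client` is any object with a `converse` method, such as a boto3 `bedrock-runtime` client.

```python
# Hypothetical local tool the model may call; a real one might query a clause database.
def lookup_clause(contract_id: str, clause: str) -> dict:
    return {"contract_id": contract_id, "clause": clause, "text": "(clause text would be fetched here)"}


TOOLS = {"lookup_clause": lookup_clause}

TOOL_CONFIG = {"tools": [{"toolSpec": {
    "name": "lookup_clause",
    "description": "Fetch the text of one clause from one contract.",
    "inputSchema": {"json": {
        "type": "object",
        "properties": {"contract_id": {"type": "string"}, "clause": {"type": "string"}},
        "required": ["contract_id", "clause"],
    }},
}}]}


def tool_results_message(assistant_message: dict) -> dict:
    """Run every toolUse block in an assistant turn and package the outputs as a toolResult turn."""
    results = []
    for block in assistant_message["content"]:
        if "toolUse" in block:
            call = block["toolUse"]
            output = TOOLS[call["name"]](**call["input"])
            results.append({"toolResult": {"toolUseId": call["toolUseId"], "content": [{"json": output}]}})
    return {"role": "user", "content": results}


def run_with_tools(client, model_id: str, messages: list[dict], max_turns: int = 5) -> dict:
    """Call Converse until the model stops requesting tools; appends to `messages` in place."""
    for _ in range(max_turns):
        resp = client.converse(modelId=model_id, messages=messages, toolConfig=TOOL_CONFIG)
        message = resp["output"]["message"]
        messages.append(message)
        if resp["stopReason"] != "tool_use":
            return message
        messages.append(tool_results_message(message))
    raise RuntimeError("tool loop did not finish within max_turns")
```

The `max_turns` cap is a design choice that stops a model which keeps requesting tools from looping indefinitely.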

choosing the model

VII. When to pick Jamba vs Claude vs Llama for long context

Jamba is one of several Bedrock models that can handle long inputs. The honest framing is that this is a workload-and-budget decision, not a "which model is best" decision — and the right answer depends on what you are optimizing for on the long-context job in front of you.

Pick Jamba when the workload is large-input and cost-sensitive. Its reason to exist is efficient long context: if you are routinely sending very large prompts — long documents, big RAG contexts, bulk processing — and you want the 256K window without paying a frontier-model rate for every token, Jamba (especially Mini) is the natural fit. The hybrid SSM-Transformer + MoE design is precisely what keeps that long-context path fast and economical. It is the value pick for "a lot of text, at scale."

Pick Claude when the long-context job also needs top-tier reasoning. Claude on Bedrock offers a large context window and the strongest reasoning, vision, extended thinking, and a deep capability profile — so for a long-context task that is also genuinely hard (intricate multi-document analysis, nuanced synthesis, high-stakes agentic steps over long inputs), Claude (Sonnet or Opus) is often worth its higher per-token price. Use Claude when quality on the hard part dominates; use Jamba when efficient throughput over large inputs dominates. Many teams run both behind one Converse API and route accordingly. (See claude-on-amazon-bedrock.)

Pick Llama when you want open-weight flexibility and a strong general model. Meta's Llama models on Bedrock are capable, widely supported open-weight models with competitive pricing and good general performance; some generations offer large context too. Choose Llama when you value the open-weight ecosystem, want portability across environments, or already standardize on it, recognizing that its long-context cost curve is conventional-Transformer-shaped rather than SSM-efficient, so on the very largest inputs Jamba's architectural edge shows. (See amazon-bedrock-models for the full provider line-up.)

The meta-point, true across this whole cluster: because every model sits behind the same Bedrock API, this is not a one-way door. Start with whichever fits, benchmark the candidates on your own documents and prompts — long-context behavior in particular varies by task in ways leaderboards do not capture — and re-tier as prices and capabilities move, without re-plumbing your application. The comparison table below puts the three side by side.
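Routing per request behind one Converse call site can be sketched as a small policy function. The three-tier policy mirrors the guidance in this section but is a toy, and the environment-variable names are hypothetical. Real model IDs come from the Bedrock catalog.

```python
import os

# Model IDs are configuration; the environment variable names here are hypothetical.
MODEL_IDS = {
    "jamba-mini": os.environ.get("JAMBA_MINI_MODEL_ID", "<jamba-mini-id>"),
    "jamba-large": os.environ.get("JAMBA_LARGE_MODEL_ID", "<jamba-large-id>"),
    "claude": os.environ.get("CLAUDE_MODEL_ID", "<claude-id>"),
}


def choose_tier(hard: bool, reasoning_critical: bool) -> str:
    """Toy policy: bulk long-context work to Jamba Mini, harder long-context work to
    Jamba Large, and work where reasoning quality dominates to Claude."""
    if reasoning_critical:
        return "claude"
    return "jamba-large" if hard else "jamba-mini"


def answer(client, messages: list[dict], hard: bool = False, reasoning_critical: bool = False) -> str:
    """Same Converse call for every model; only modelId changes."""
    model_id = MODEL_IDS[choose_tier(hard, reasoning_critical)]
    resp = client.converse(modelId=model_id, messages=messages)
    return resp["output"]["message"]["content"][0]["text"]
```

Because the call site never changes, a benchmark run on your own documents can reassign tiers simply by editing `choose_tier` or the configured IDs.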

calling it

VIII. A minimal Converse API call

The recommended way to call Jamba (and any chat model) on Bedrock is the Converse API: a single, model-agnostic interface for multi-turn messages, system prompts, tool use, and structured output. Because it is model-agnostic, the same code calls Jamba Mini or Large by changing only the model ID, and it can call Claude or Llama the same way.

A minimal text request with the AWS SDK looks like the snippet below (Python / boto3). You create a Bedrock Runtime client, call converse with a model ID and a list of messages, and read the reply from the response. Swapping modelId between the Jamba Mini and Large IDs is the only change needed to move a request between the two tiers — and the same call shape would target Claude or Llama instead.

import os
import boto3

# Treat the model ID as configuration: copy the current Jamba ID (shape ai21.jamba-…)
# from the Bedrock model catalog into JAMBA_MODEL_ID (an illustrative variable name).
model_id = os.environ["JAMBA_MODEL_ID"]

client = boto3.client("bedrock-runtime", region_name="us-east-1")
resp = client.converse(
    modelId=model_id,  # the only change needed to move between Mini and Large
    messages=[{"role": "user", "content": [{"text": "Summarize the indemnity terms across these contracts: ..."}]}],
    system=[{"text": "You are a precise legal-analysis assistant. Answer only from the documents provided."}],
    inferenceConfig={"maxTokens": 1024, "temperature": 0.2},
)
# The reply text is in the first content block of the assistant message.
print(resp["output"]["message"]["content"][0]["text"])

That is the whole pattern for a basic call. For Jamba's signature workloads you extend it the same model-agnostic way: place a long document or many retrieved chunks in the message content to exploit the 256K window; add tool use (a toolConfig describing your functions, with a multi-step loop that feeds results back) for agents; request structured JSON through your prompt and schema for extraction pipelines; and use streaming (the converse_stream variant) for token-by-token output on long generations. The API surface barely changes as you add capabilities, which is the point of Converse. The exact model ID must come from the Bedrock model catalog; the snippet reads it from an environment variable rather than hard-coding a guessed string.
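For the streaming variant, `converse_stream` returns an event stream under the `"stream"` key, and text arrives in `contentBlockDelta` events. This is a minimal sketch that prints and collects those deltas. `client` is assumed to be a boto3 `bedrock-runtime` client, and `model_id` comes from configuration as in the snippet above.

```python
def collect_stream_text(events) -> str:
    """Concatenate text deltas from a ConverseStream event stream, printing them as they arrive."""
    parts = []
    for event in events:
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        if "text" in delta:
            print(delta["text"], end="", flush=True)
            parts.append(delta["text"])
    print()
    return "".join(parts)


def stream_reply(client, model_id: str, messages: list[dict]) -> str:
    """Call converse_stream and read its event stream token by token."""
    resp = client.converse_stream(modelId=model_id, messages=messages,
                                  inferenceConfig={"maxTokens": 1024})
    return collect_stream_text(resp["stream"])
```

Other event types (`messageStart`, `messageStop`, `metadata`) are skipped here. The `metadata` event is where token usage arrives, if you want to log cost per request.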

why Converse

The Converse API is model-agnostic: one interface for messages, system prompts, tool use, and structured output across every Bedrock model. Switching Jamba Mini ↔ Large — or swapping Jamba for Claude or Llama to compare them on your long-context task — is a change to modelId, not a rewrite. Build once, route per request.

how it becomes $0

IX. How AWS credits make running Jamba $0

Everything above prices Jamba on Bedrock if you pay AWS directly. For most startups and many companies the relevant number is different — because AWS will frequently fund the build with credits, and Jamba usage on Bedrock draws those credits down before it ever touches your card. Long-context workloads are large-input by nature, so this matters even more here than on a lightweight model.

Jamba inference on Bedrock is ordinary AWS spend, so it is fully credit-eligible and credits apply automatically against your bill until exhausted — covering Jamba input and output tokens, any Batch and prompt-caching usage, plus the supporting services a long-context app leans on (Knowledge Bases, the vector store behind RAG, S3 for the documents, logging). That is significant precisely because the headline 256K-context use cases consume a lot of input tokens: the credit pool absorbs exactly the spend that would otherwise grow fastest. The relevant pools: AWS Activate (general startup credits, commonly up to $100K for institutionally-funded startups); a dedicated Bedrock / Generative-AI POC pool ($10K–$50K) aimed at proving out a GenAI use case; and the competitive Generative AI Accelerator (awards up to $1M for a small cohort of AI-first startups).

The practical mechanic is that most of these pools are partner-filed — requested through the AWS Partner Network (the ACE program), not a public self-serve form — which is why teams route through an AWS partner rather than applying alone. That is the gap CloudRoute fills. CloudRoute matches you to the right credit pool for your stage and to a vetted AWS DevOps/ML partner who both files the credit application and helps build the Jamba workload — the long-document pipeline, the large-context RAG over Knowledge Bases, the structured-extraction step, the tool-using agent, prompt caching on the fixed prefix. The customer pays $0 — AWS funds the credit pool, AWS pays the partner through engagement-funding programs, and the partner pays CloudRoute a routing commission. You never see an invoice.

Put together with Jamba's own cost advantages — efficient long context, Mini for the cheap bulk path, Batch and caching on top — the picture for a startup is: build the long-document or big-RAG product on the model tier each request actually needs, cache the repeated context, and run the whole thing on a $25K–$100K (or larger) credit pool while you find product-market fit — paying real money only once usage, and ideally revenue, has scaled past the credits. Related: AWS credits for generative-AI startups and Bedrock POC funding for the full credit mechanics.

long context, three ways

Jamba vs Claude vs Llama for long context on Bedrock

The core decision in one place: three Bedrock options for long-input work, compared on context, architecture, the cost shape that matters at scale, and the job each suits. Match the workload to the model that optimizes what you actually care about. Representative 2026 figures for relative comparison, not quotes.

Model	Context window	Architecture	Long-context cost shape	Best for	Reach for it when
AI21 Jamba (Mini / Large)	256K (very long)	Hybrid SSM (Mamba) + Transformer + MoE	Flattest — SSM efficiency keeps big inputs cheap	Long-document processing, big-corpus RAG, structured extraction	A lot of text, at scale, cost-sensitive
Anthropic Claude (Sonnet / Opus)	Large	Transformer (frontier)	Higher per token, but top reasoning	Long-context work that is also genuinely hard	Quality on the hard part dominates
Meta Llama	Large (varies by gen)	Transformer (open-weight)	Conventional Transformer scaling	Open-weight flexibility, strong general use	You value open weights / portability

All three sit behind the same Bedrock Converse API, so switching to benchmark them on your own documents is a model-ID change. Jamba's edge is the architecture: on the very largest inputs its SSM layers keep cost and latency flatter than a same-size pure Transformer, which is why it is the value pick for high-volume long-context work. Pick Claude when reasoning quality dominates; pick Llama for the open-weight ecosystem. On any of them, AWS credits cover the spend.

long context, on AWS's budget

Credits cover Jamba's big-input workloads on Bedrock — get the pool + a partner to build it ($0)


a recent match

A document-heavy RAG product moved onto Jamba — and onto $0 — anonymized

inquiry · Series-A legal-tech SaaS, London

Series-A legal-tech SaaS, 21 people, building contract-analysis over very large document sets

Situation: The product had to read and cross-reference large contract bundles — often 150–300 pages — and answer questions whose answers spanned multiple documents. On their existing short-context frontier model they were forced into an aggressive chunk-and-stitch pipeline that lost cross-references and was expensive per query, and the inference bill was climbing out of runway as usage grew. They were already an AWS customer and wanted long context, lower cost, and to stop paying for it out of pocket.

What CloudRoute did: CloudRoute matched them in under 24 hours to an EU-West AWS partner with GenAI and RAG experience. The partner (1) moved the long-document and RAG generation onto AI21 Jamba via Bedrock's Converse API to use the 256K window, letting whole contract bundles and far more retrieved context go in per call; (2) routed the easy, high-volume queries to Jamba 1.5 Mini and reserved Jamba 1.5 Large for the hard multi-document analyses; (3) wired structured JSON output into the extraction step and prompt caching onto the fixed instruction prefix; and (4) filed a Bedrock POC credit application plus an Activate Portfolio application to fund the workload.

Outcome: The chunk-and-stitch pipeline was retired in favor of whole-document calls, cross-reference accuracy improved, and the Mini/Large split plus caching cut the modeled per-query cost substantially — but the decisive change was that the spend now draws down AWS credits instead of runway, so the team pays $0 during the build and early scale. CloudRoute's commission was paid by the partner from AWS engagement funding, not by the customer.

window used: 256K · pattern: whole-doc + big-RAG, Mini/Large split, JSON + caching · credits secured: POC + Activate · out-of-pocket: $0

faq

Common questions

What is AI21 Jamba on Amazon Bedrock?

Jamba is AI21 Labs' family of foundation models, available natively on Amazon Bedrock as one of the providers behind Bedrock's single managed API (alongside Anthropic Claude, Amazon Nova and Titan, Meta Llama, Mistral, Cohere, and others). Its defining feature is a very long 256K-token context window, made practical by a hybrid architecture. On Bedrock the family is Jamba 1.5 Mini (fast, low-cost) and Jamba 1.5 Large (more capable), both supporting structured JSON output and tool use. You enable access per account and region in the Bedrock console.

What makes Jamba's architecture different from a normal Transformer?

Jamba is a hybrid. Instead of being a pure Transformer (which uses self-attention whose cost grows roughly with the square of sequence length), it interleaves efficient state-space (SSM / Mamba) layers — whose cost grows about linearly and whose memory stays roughly flat as context grows — with Transformer attention layers that recover precise recall and in-context learning. It also adds a Mixture-of-Experts (MoE), activating only a fraction of its parameters per token. The result is a large, capable model that keeps long context far cheaper and faster than a same-size pure Transformer.

How big is Jamba's context window, and why does it matter?

Jamba on Bedrock offers a 256K-token context window — on the order of 180,000–200,000 words, or several hundred pages — among the largest on Bedrock. It matters because it lets a whole large document, or a large set of retrieved RAG chunks, go into the model in one call: no brittle chunk-and-stitch pipeline, cross-document reasoning in a single request, and RAG that is far more forgiving of imperfect retrieval because you can pass much more context. Input is billed per token, though, so fill the window only when the task needs it and use prompt caching for any fixed prefix.

What are the Jamba models and model IDs on Bedrock?

The family on Bedrock is Jamba 1.5 Mini (the smaller, faster, lower-cost model) and Jamba 1.5 Large (the larger, more capable model), both with the 256K context window. Each is invoked by a model ID namespaced under AI21 — of the shape ai21.jamba-… with a version suffix — which you pass to the API to pick the model. Because IDs advance with each generation, read the current value from the Bedrock model catalog in the console or list it via the API/CLI rather than hard-coding a guess; treat it as configuration.

How much does Jamba cost on Bedrock?

It is billed per token, per model: representative 2026 on-demand rates are roughly $0.20 / $0.40 per million input/output tokens for Jamba 1.5 Mini and about $2 / $8 per million for Jamba 1.5 Large. With long context the input side usually dominates, so Mini's low input rate is what makes large-document and big-RAG workloads economical; Large is the higher-quality option for harder cases. Batch (~50% off) and prompt caching lower the effective rate further. These are representative figures for relative comparison — confirm current rates on the AWS Bedrock pricing page, as they change with each generation and vary by region.

When should I pick Jamba over Claude or Llama for long context?

Pick Jamba when the workload is large-input and cost-sensitive — long-document processing, big-corpus RAG, bulk extraction — where you want the 256K window without a frontier-model rate; its hybrid SSM-Transformer + MoE design keeps that path fast and economical. Pick Claude when the long-context job is also genuinely hard and top-tier reasoning quality dominates (it is worth its higher price there). Pick Llama when you value the open-weight ecosystem and portability. Since all sit behind the same Converse API, benchmark them on your own documents and route accordingly.

Does Jamba support structured output and tool use?

Yes. Jamba models on Bedrock support structured output — notably returning valid JSON to a shape you specify — which is ideal when the response feeds an extraction pipeline, form-filling, or a database, removing brittle post-processing. They also support tool use / function calling: you describe tools and the model decides when to call them and with what arguments, then folds the results into its answer. On Bedrock both are reached through the Converse API, so a Jamba-backed agent or extractor is built the same way as for any other Bedrock model.

Can AWS credits cover Jamba usage on Bedrock?

Yes. Jamba on Bedrock is ordinary AWS spend, so it is fully credit-eligible and credits apply automatically against your bill — covering Jamba input and output tokens (which add up fast on long-context jobs), Batch and prompt-caching usage, and supporting services like Knowledge Bases, the vector store, and S3. The relevant pools are AWS Activate (up to $100K), a Bedrock/GenAI POC pool ($10K–$50K), and the GenAI Accelerator (up to $1M), most of which are partner-filed via the AWS Partner Network. CloudRoute routes you to the right pool and a vetted AWS partner who files the application and builds the Jamba workload — customer pays $0, AWS funds it.

Run long context on AWS's budget, not your runway

Jamba's 256K window and SSM-efficient architecture make big-document and large-RAG workloads affordable — and on Bedrock the spend draws down AWS credits instead of your card, under your existing IAM, VPC, and billing. CloudRoute routes you to the right credit pool (Activate up to $100K, Bedrock POC $10K–$50K, GenAI Accelerator up to $1M) and a vetted AWS partner who builds the Jamba long-document or RAG pipeline, splits traffic across Mini and Large, and turns on caching. Customer pays $0.


matched within < 24h

GenAI credit ceiling up to $1M

cost to you $0