bedrock opensearch vector search · the default vector store · 2026

Bedrock + OpenSearch Serverless — the default vector store, demystified.

A complete, neutral reference for using Amazon OpenSearch Serverless as the vector store behind Amazon Bedrock RAG in 2026: what it is, how it holds vectors (collection → vector index → k-NN), the exact setup whether Bedrock auto-creates it or you build it by hand, why Knowledge Bases pick it by default, the OCU cost model and the redundancy minimum that surprises everyone, how to tune the k-NN engine and dimensions, and how it stacks up against Aurora pgvector and Pinecone — plus how AWS credits make the whole build $0.

what it is: serverless vector DB
KB default store: OpenSearch Serverless
cost unit: OCU
cost with credits: $0
TL;DR
  • Amazon OpenSearch Serverless is the default vector store behind Amazon Bedrock Knowledge Bases, and a common choice for DIY Bedrock RAG. It stores each embedding as a k-NN (k-nearest-neighbour) vector field inside a vector index, inside a "vector search" collection — fully managed, auto-scaling, no servers. When you create a Knowledge Base and accept the default, Bedrock provisions the collection, the vector index, and the field mappings for you in a click.
  • The cost model is the gotcha. OpenSearch Serverless bills by OpenSearch Compute Units (OCUs) — separate pools for indexing and search — plus S3-backed storage, and it charges a standing baseline whether or not you are querying. A production "vector search" collection defaults to redundancy across Availability Zones, which sets a minimum OCU floor; that floor is why a tiny prototype can still cost real money each month. You can lower the floor (e.g. a dev/test collection without standby redundancy), but you cannot make the baseline zero.
  • Tuning lives in the k-NN field: the engine (FAISS, Lucene, or nmslib), the algorithm (HNSW by default, or IVF for huge corpora), the distance metric, the vector dimension (must match your embeddings model), and the HNSW parameters (m, ef_construction, ef_search) that trade recall against memory and latency.
  • When OpenSearch is right: you want native hybrid (vector + BM25) in one engine, AWS-managed search, or you are already on OpenSearch. When it is not: a tiny/bursty workload where Aurora pgvector is cheaper, or a vector-only need where Pinecone is simpler.
  • Either way, AWS credits (Activate up to $100K, a Bedrock/GenAI POC pool $10K–$50K, the GenAI Accelerator up to $1M) cover the OCU bill. CloudRoute routes you to the pool and a vetted partner, so you pay $0.
the concept

I. What "Bedrock OpenSearch vector search" actually means

The phrase bundles two services. Amazon Bedrock is the managed foundation-model layer that runs your RAG; Amazon OpenSearch Serverless is the vector database that holds the embeddings Bedrock searches. "Bedrock OpenSearch vector search" is the pattern where OpenSearch Serverless is the vector store sitting behind Bedrock retrieval — the default arrangement for Bedrock Knowledge Bases.

A quick refresher on why a vector store exists at all. In retrieval-augmented generation, every chunk of your documents is turned into an embedding — a list of a few hundred to a couple of thousand numbers that captures the chunk's meaning, so that semantically similar text lands near it in vector space. At question time, the question is embedded the same way and the system finds the chunks whose vectors are nearest to the question vector. That "find the nearest vectors, fast, across millions of them" job is exactly what a vector store does, and it is the component Bedrock does not abstract away: you choose it, you can see it, and you pay for it directly.
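To make the "find the nearest vectors" job concrete, here is a minimal sketch of exact (brute-force) nearest-neighbour search by cosine similarity. The three-dimensional vectors and chunk texts are made up for illustration; real embeddings have hundreds or thousands of dimensions, and a vector store replaces this full scan with an approximate index.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_chunks(query_vec, chunks, k=2):
    """Exact top-k: score every chunk against the query, keep the k best."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Toy "embeddings" (hypothetical values, not produced by any real model).
chunks = [
    ("refund policy", [0.9, 0.1, 0.0]),
    ("shipping times", [0.1, 0.9, 0.0]),
    ("returns and refunds", [0.8, 0.2, 0.1]),
]
query = [1.0, 0.0, 0.0]
top = nearest_chunks(query, chunks, k=2)
print(top)
```

The scan touches every vector, which is fine for three chunks and unworkable for millions; that gap is what the k-NN index described later closes.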

Amazon OpenSearch Serverless is the on-demand, auto-scaling flavour of Amazon OpenSearch Service (AWS's managed service for OpenSearch, the open-source search engine forked from Elasticsearch). Instead of provisioning and sizing a cluster of nodes, you create a collection and AWS runs the underlying capacity for you, scaling it up and down with load. OpenSearch has long supported a k-NN (k-nearest-neighbour) plugin for vector search, and the serverless form exposes a dedicated "vector search" collection type tuned for exactly this workload: storing embeddings and answering approximate-nearest-neighbour queries.

Put the two together and the shape is simple. Bedrock owns the model work — embedding your chunks at ingest, embedding each query, and (in Knowledge Bases) running the whole retrieve-and-generate loop. OpenSearch Serverless owns the storage and search — holding every vector plus its source text and metadata, and returning the closest matches when Bedrock queries it. The connection between them is a vector index inside the collection, with field mappings that tell OpenSearch which field is the k-NN vector, what dimension it is, and how to search it.

One thing worth saying up front, because it frames the rest of this page: OpenSearch Serverless is powerful and is the path of least resistance on AWS, but it is not the cheapest option at small scale. Its serverless capacity carries a standing baseline cost. That trade — managed, scalable, hybrid-capable, but with a cost floor — is the single most important thing to understand before you pick it, and §V covers it in full.

the one-sentence definition

Amazon OpenSearch Serverless is a fully-managed, auto-scaling vector database that, as a "vector search" collection, stores your Bedrock embeddings as k-NN vector fields and answers approximate-nearest-neighbour queries — making it the default vector store behind Bedrock Knowledge Bases and a common store for DIY Bedrock RAG.

the data model

II. How OpenSearch Serverless holds vectors: collection, index, k-NN field

Before the setup steps make sense, it helps to know the three nested objects OpenSearch Serverless uses to store vectors. Everything you configure — and everything you pay for — hangs off these three.

From outermost to innermost, the structure is collection → index → k-NN field. A Knowledge Base maps onto exactly this: one collection holds one (or more) vector index, and the index has one k-NN field where the embeddings live, plus companion fields for the chunk text and its metadata.

The collection (vector search type)

A collection is the top-level container and the unit of capacity, security, and billing in OpenSearch Serverless. When you create one for RAG you choose the "vector search" collection type (the other types are "time series" and "search"), which configures it for the vector workload. The collection is governed by three policies you must have in place for it to work: an encryption policy (a KMS key — AWS-owned or your own), a network policy (public access or access via a VPC endpoint), and one or more data access policies (which IAM principals — including the Bedrock Knowledge Base service role — may read and write which indexes). Missing or mis-scoped policies are the most common reason a setup "succeeds" but then fails to ingest or query.
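As an illustration of the data access policy, the sketch below builds the policy document as JSON. The collection name, account ID, and role name are hypothetical placeholders, and the `aoss:` permission names are the ones the service is understood to use; verify them against the current AWS documentation. The code only builds the document; submitting it would need an AWS SDK or CLI call that is not shown here.

```python
import json

def build_data_access_policy(collection_name, principal_arns):
    """Build an OpenSearch Serverless data access policy document (JSON string).

    Grants the listed IAM principals access to the collection and to every
    index inside it. Narrow the index pattern for tighter scoping.
    """
    policy = [
        {
            "Description": "Read/write access for the Knowledge Base role",
            "Rules": [
                {
                    "ResourceType": "collection",
                    "Resource": [f"collection/{collection_name}"],
                    "Permission": [
                        "aoss:DescribeCollectionItems",
                        "aoss:CreateCollectionItems",
                        "aoss:UpdateCollectionItems",
                    ],
                },
                {
                    "ResourceType": "index",
                    "Resource": [f"index/{collection_name}/*"],
                    "Permission": [
                        "aoss:CreateIndex",
                        "aoss:DescribeIndex",
                        "aoss:UpdateIndex",
                        "aoss:ReadDocument",
                        "aoss:WriteDocument",
                    ],
                },
            ],
            "Principal": list(principal_arns),
        }
    ]
    return json.dumps(policy, indent=2)

# Hypothetical names, for illustration only.
policy_json = build_data_access_policy(
    "kb-vectors",
    ["arn:aws:iam::123456789012:role/ExampleKnowledgeBaseRole"],
)
print(policy_json)
```

If the Knowledge Base role is missing from `Principal`, or the index rule does not cover the index it writes to, setup can appear to succeed while ingestion fails, which is the failure mode described above.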

The vector index and its field mappings

Inside the collection lives a vector index — the searchable structure that actually holds documents. Its mappings define the fields. The one that matters most is the k-NN vector field (type knn_vector): you declare its dimension (which must exactly equal the output dimension of your embeddings model), the distance metric (cosine, Euclidean/L2, or dot product), and the engine/algorithm settings covered in §VI. Alongside it the index carries a text field holding the original chunk so retrieval can return the source passage, and a metadata field holding the JSON metadata used for filtering and citations. The index must also have index.knn enabled so the k-NN engine builds an approximate-nearest-neighbour structure rather than scanning every vector.
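A minimal sketch of what such an index definition can look like, written as the Python dict you would send as the create-index request body. The field names (`embedding`, `chunk_text`, `chunk_metadata`) and the HNSW parameter values are illustrative assumptions, not the names Bedrock generates; the `space_type` values an engine accepts vary by engine and version, so `l2` is used here as a conservative choice.

```python
def build_vector_index_body(
    dimension,
    vector_field="embedding",        # hypothetical field name
    text_field="chunk_text",         # hypothetical field name
    metadata_field="chunk_metadata", # hypothetical field name
    space_type="l2",
):
    """Return an index body with one knn_vector field plus text and metadata."""
    return {
        # index.knn must be on so an approximate-nearest-neighbour
        # structure is built instead of scanning every vector.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                vector_field: {
                    "type": "knn_vector",
                    "dimension": dimension,  # must equal the model's output size
                    "method": {
                        "name": "hnsw",
                        "engine": "faiss",
                        "space_type": space_type,
                        # Illustrative values, not recommendations.
                        "parameters": {"m": 16, "ef_construction": 512},
                    },
                },
                # Original chunk, returned as the source passage.
                text_field: {"type": "text"},
                # Metadata stored as text here; not indexed for search.
                metadata_field: {"type": "text", "index": False},
            }
        },
    }

index_body = build_vector_index_body(dimension=1024)
print(index_body["mappings"]["properties"]["embedding"]["dimension"])
```

The dimension is the one value that cannot be guessed: it has to come from the embeddings model you chose.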

How a query flows through it

At query time the path is: Bedrock embeds the question, then issues a k-NN query against the index ("return the k documents whose vector field is nearest to this query vector," optionally with a metadata filter). OpenSearch walks its approximate-nearest-neighbour graph, returns the top matches with their text, metadata, and a similarity score, and Bedrock assembles them into the prompt. Because OpenSearch is a full search engine, the same index can also answer a BM25 keyword query, which is what makes native hybrid search (vector + keyword in one engine) possible — a genuine advantage over vector-only stores, covered in §VI and §VII.
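The two query shapes mentioned above can be sketched as request bodies. This is a minimal illustration that only builds the JSON-like dicts; the field names (`embedding`, `chunk_text`, `team`) are hypothetical, and support for a filter inside the `knn` clause depends on the engine and version in use.

```python
def build_knn_query(query_vector, k=5, vector_field="embedding", metadata_filter=None):
    """k-NN query: return the k documents nearest to query_vector."""
    knn_clause = {"vector": list(query_vector), "k": k}
    if metadata_filter is not None:
        # Optional filter applied as part of the nearest-neighbour search.
        knn_clause["filter"] = metadata_filter
    return {"size": k, "query": {"knn": {vector_field: knn_clause}}}

def build_keyword_query(question, size=5, text_field="chunk_text"):
    """Keyword (BM25-scored) query against the chunk text in the same index."""
    return {"size": size, "query": {"match": {text_field: question}}}

vector_query = build_knn_query(
    [0.12, -0.03, 0.44],                          # toy query embedding
    k=3,
    metadata_filter={"term": {"team": "support"}},  # hypothetical metadata field
)
keyword_query = build_keyword_query("refund policy for annual plans")
print(vector_query)
print(keyword_query)
```

Both bodies target the same index, which is the property that makes single-engine hybrid retrieval possible.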

the nesting in one line

Collection (vector-search type, the billing + security unit) → vector index (the searchable structure, with field mappings) → k-NN vector field (type knn_vector, with a dimension matching your embeddings model, a distance metric, and engine/algorithm settings). Get the dimension and the access policy right and most setup problems disappear.

getting it running

III. Setup: Bedrock auto-create vs building the collection by hand

There are two ways the collection and index come into existence: Bedrock creates them for you (the quickstart, and what most teams should use), or you create them yourself first and point a Knowledge Base — or a DIY pipeline — at them. They produce the same end state; they differ in how much control you keep.

Path A — let Bedrock auto-create it (quickstart)

When you create a Knowledge Base in the Bedrock console and accept the default vector store, Bedrock provisions a new OpenSearch Serverless vector collection, creates the vector index with the correct field mappings, wires the encryption, network, and data-access policies (granting the Knowledge Base service role read/write), and sets the k-NN field's dimension to match the embeddings model you chose. It is genuinely one decision — "Quick create a new vector store" — and you get a working store in a couple of minutes with nothing to configure. This is the right path for the large majority of builds: it removes the part of setup most likely to be misconfigured (policies and field mappings).

Path B — create the collection yourself first (control)

If you need control (a specific collection name, your own KMS key, a VPC-only network policy, a non-default k-NN engine or HNSW parameters, or a DIY pipeline rather than a Knowledge Base), you create the pieces yourself. The order is:

  (1) Create a vector-search collection.
  (2) Attach an encryption policy, a network policy, and a data-access policy that grants the Bedrock Knowledge Base role (or your application's IAM role) the needed actions on the collection and index.
  (3) Create the vector index with a knn_vector field of the right dimension, a distance metric, your chosen engine/algorithm, plus the text and metadata fields.
  (4) When creating the Knowledge Base, choose "use an existing vector store" and supply the collection ARN, index name, and the names of the vector / text / metadata fields.

For a DIY pipeline you instead write and query the index directly via the OpenSearch API or SDK.
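Step (4) of the bring-your-own path amounts to handing Bedrock four names and an ARN. The sketch below builds that storage configuration as a dict in the shape the Knowledge Base creation API is understood to expect; treat the key names as something to confirm in the current Bedrock API reference. The ARN, index name, and field names are hypothetical placeholders.

```python
def build_storage_configuration(collection_arn, index_name,
                                vector_field, text_field, metadata_field):
    """Describe an existing OpenSearch Serverless index to a Knowledge Base."""
    return {
        "type": "OPENSEARCH_SERVERLESS",
        "opensearchServerlessConfiguration": {
            "collectionArn": collection_arn,
            "vectorIndexName": index_name,
            # These names must match the index mappings exactly.
            "fieldMapping": {
                "vectorField": vector_field,
                "textField": text_field,
                "metadataField": metadata_field,
            },
        },
    }

# Placeholder values for illustration only.
storage_config = build_storage_configuration(
    collection_arn="arn:aws:aoss:us-east-1:123456789012:collection/examplecollectionid",
    index_name="kb-index",
    vector_field="embedding",
    text_field="chunk_text",
    metadata_field="chunk_metadata",
)
print(storage_config)
```

Keeping the field names in one place, shared with the code that creates the index, is a simple way to avoid the field-name mismatch described under the setup snags.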

Common setup snags

Three issues cause most failed setups. Dimension mismatch: the k-NN field dimension must equal the embeddings model's output dimension exactly (e.g. 1,024 for Titan Text Embeddings v2 at its default size, 1,536 for the v1 generation) — a mismatch fails ingestion. Access-policy gaps: the Bedrock service role must appear in a data-access policy with read/write on the index, and the network policy must allow Bedrock to reach the collection — miss either and ingestion or query silently fails. Field-name mismatch: when you bring your own index, the vector, text, and metadata field names you give Bedrock must match the mappings you created. The auto-create path exists precisely because it eliminates all three.
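Two of these three snags (dimension and field names) can be caught before any AWS call with a small offline check. The sketch below is a hypothetical helper, not part of any AWS SDK: it compares an index body against the embedding dimension you expect and the field names you plan to give Bedrock. Access-policy gaps cannot be detected this way.

```python
def preflight_check(index_body, expected_dimension, field_mapping):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    properties = index_body.get("mappings", {}).get("properties", {})

    # Field-name mismatch: every name given to Bedrock must exist in the mappings.
    for role, name in field_mapping.items():
        if name not in properties:
            problems.append(f"{role} '{name}' is not defined in the index mappings")

    # Dimension mismatch: the knn_vector dimension must equal the model's output.
    vector_def = properties.get(field_mapping.get("vectorField"))
    if vector_def is not None:
        if vector_def.get("type") != "knn_vector":
            problems.append("vector field is not of type knn_vector")
        elif vector_def.get("dimension") != expected_dimension:
            problems.append(
                f"index dimension {vector_def.get('dimension')} "
                f"does not match model dimension {expected_dimension}"
            )
    return problems

# Example with two deliberate mistakes (all names are illustrative).
index_body = {
    "mappings": {
        "properties": {
            "embedding": {"type": "knn_vector", "dimension": 1536},
            "chunk_text": {"type": "text"},
            "chunk_metadata": {"type": "text"},
        }
    }
}
field_mapping = {
    "vectorField": "embedding",
    "textField": "chunk_text",
    "metadataField": "metadata",  # wrong: the index calls it chunk_metadata
}
problems = preflight_check(index_body, expected_dimension=1024, field_mapping=field_mapping)
for problem in problems:
    print(problem)
```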

which path to pick

Use auto-create (Path A) unless you have a concrete reason not to — it removes the three most error-prone steps (policies, field mappings, dimension). Use bring-your-own (Path B) when you need a specific KMS key, a VPC-only network policy, a non-default k-NN engine/algorithm, a shared collection, or you are building a DIY pipeline that talks to OpenSearch directly.

why it is the default

IV. Why Bedrock Knowledge Bases use OpenSearch Serverless by default

Knowledge Bases supports several vector stores — OpenSearch Serverless, Aurora PostgreSQL with pgvector, Pinecone, Redis Enterprise Cloud, and Neptune Analytics — yet OpenSearch Serverless is the one offered as the default, one-click option. The reasons are mostly about removing friction, and they are worth understanding so you know when to override the default.

The first reason is zero setup. OpenSearch Serverless is the only supported store Bedrock can fully provision for you on the spot — collection, index, mappings, and policies — without you touching another service or creating an external account. For a managed-RAG product whose whole pitch is "point it at your data and go," a vector store that needs no prior provisioning is the natural default.

The second is that it is fully managed and AWS-native. There are no nodes to size, no version upgrades to run, and it auto-scales with load, so the managed promise of Knowledge Bases extends cleanly to the storage layer. It also lives inside your account and Region, inherits IAM and KMS, and keeps your data within AWS — which matters for the security posture most enterprises expect from Bedrock.

The third is capability. OpenSearch is a full search engine, so a single index can serve both vector (k-NN) and keyword (BM25) search, making native hybrid retrieval available without a second system. It scales to large corpora, supports metadata filtering for precision and access control, and gives a single, consistent place for both the bulk corpus and any keyword-heavy lookups. For a default that has to be "good enough for most production," that breadth is the point.

The honest counterpoint — and the reason the default is not always right — is cost at small scale. Because the default vector-search collection is provisioned for redundancy and carries a standing OCU baseline, a brand-new Knowledge Base with a few hundred documents still incurs a non-trivial monthly minimum even when idle. For a tiny prototype or a bursty internal tool, that floor can be larger than the rest of the bill combined — which is exactly why teams watching cost often switch the default to Aurora pgvector. The next section is the full cost picture.

the cost model (the gotcha)

V. The OCU cost model, and the redundancy minimum nobody expects

OpenSearch Serverless is the line item that surprises teams, so it is worth getting exactly right. There is no per-query price; you pay for capacity measured in OpenSearch Compute Units (OCUs) plus storage, and there is a standing minimum you pay even at zero traffic. Figures here are representative as of 2026 to show the shape of the bill — check the AWS OpenSearch Service pricing page for current rates.

Capacity is measured in OpenSearch Compute Units (OCUs) — a bundle of compute and memory. Crucially there are two separate OCU pools: one for indexing (writing and embedding ingestion) and one for search (serving queries). Each is billed per OCU-hour, and serverless scales the number of OCUs in each pool up and down with load. On top of compute you pay for managed storage (vectors and indexes are persisted to S3-backed storage, billed per GB-month) and the usual data-transfer and KMS costs. Because OCUs are billed by the hour they are running, the bill is driven by how much capacity is kept warm, not by a per-request charge.

Here is the part that catches people. By default, an OpenSearch Serverless collection is provisioned for redundancy — capacity is spread across multiple Availability Zones with standby, so a production collection has a minimum OCU floor that runs continuously (a baseline for indexing and a baseline for search, kept warm even when nothing is happening). That floor is the "redundancy minimum": you are paying for a small always-on amount of capacity in two pools across AZs, around the clock, regardless of whether a single query was served. For a large corpus this baseline is a rounding error; for a tiny prototype it can be the dominant cost — and it is the most common reason a developer is startled by the first month's OpenSearch bill.

You have levers, but none make the baseline zero. You can create a collection without standby redundancy (a "development/test" posture) to roughly halve the OCU floor — appropriate for non-production workloads, at the cost of the HA guarantee. You can consolidate multiple indexes into fewer collections so you pay one baseline instead of several (the floor is per collection, so a handful of small Knowledge Bases each in its own auto-created collection multiplies the minimum). You can keep the corpus and the returned context tight so search OCUs do not scale up unnecessarily. And you can choose a smaller embedding dimension to cut storage and the memory footprint of the index. But the structural fact remains: OpenSearch Serverless has a standing cost, and if your workload is small or bursty, Aurora Serverless v2 with pgvector — which can scale its capacity much closer to zero when idle — is frequently cheaper.
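The arithmetic behind the floor is simple enough to sketch. Every number below is an assumption for illustration: the floor of 2 OCUs with redundancy and 1 OCU without, the price of 0.24 per OCU-hour, and 730 hours per month are placeholders, not quoted rates, and the per-collection multiplication follows the framing in this section. Substitute the current figures from the AWS pricing page.

```python
HOURS_PER_MONTH = 730  # common approximation for one month

def monthly_ocu_floor(floor_ocus, price_per_ocu_hour, collections=1):
    """Standing monthly cost of always-on OCUs, before storage or scaling."""
    return floor_ocus * price_per_ocu_hour * HOURS_PER_MONTH * collections

# Illustrative placeholders only; check current AWS pricing.
ASSUMED_PRICE = 0.24       # currency units per OCU-hour (assumption)
REDUNDANT_FLOOR = 2.0      # OCUs kept warm with standby (assumption)
NO_STANDBY_FLOOR = 1.0     # OCUs kept warm without standby (assumption)

redundant = monthly_ocu_floor(REDUNDANT_FLOOR, ASSUMED_PRICE)
dev_test = monthly_ocu_floor(NO_STANDBY_FLOOR, ASSUMED_PRICE)
three_collections = monthly_ocu_floor(REDUNDANT_FLOOR, ASSUMED_PRICE, collections=3)

print(f"redundant floor:        {redundant:.2f} per month")
print(f"no-standby floor:       {dev_test:.2f} per month")
print(f"three idle collections: {three_collections:.2f} per month")
```

The point of the sketch is the shape, not the numbers: the floor does not depend on query volume, dropping standby roughly halves it, and separate collections multiply it.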

The way to think about it: OpenSearch Serverless is priced like always-on managed capacity, not like a per-request API. That is great when you have steady, meaningful query volume and want hybrid search and zero ops — the baseline amortizes and the managed scaling earns its keep. It is poor value when you have a handful of documents and a query every few minutes, where you are mostly paying the redundancy floor to sit idle. Match the store to the workload shape, not to the default.

opensearch serverless cost model for bedrock vector search · representative shape as of 2026 — check the AWS pricing page for current rates
Cost component | Unit | When you pay | The gotcha / lever
Indexing OCUs | Per OCU-hour | Ingestion / re-embedding + a warm baseline | A minimum floor runs continuously, even idle
Search OCUs | Per OCU-hour | Serving queries + a warm baseline | Separate pool with its own continuous minimum
Redundancy minimum | Floor across the two pools | Always (default = AZ-redundant) | The surprise line; halve it with a no-standby dev collection
Managed storage | Per GB-month (S3-backed) | Continuously, with corpus size | Smaller embedding dimension = less storage
Per collection, not per index | Baseline × number of collections | Always | Many tiny KBs = many baselines; consolidate
Data transfer / KMS | Standard AWS rates | With usage | Usually minor relative to OCUs
There is no per-query price — you pay for warm capacity (OCUs) + storage. The standing redundancy minimum across the two OCU pools is the line that surprises teams; at small scale it can dominate the bill, which is when Aurora Serverless v2 + pgvector is often cheaper. Representative as of 2026 — confirm on the AWS OpenSearch Service pricing page.
the cost gotcha in one line

OpenSearch Serverless bills warm OCU capacity (two pools: indexing + search) + storage, and the default AZ-redundant collection carries a standing minimum you pay even at zero traffic. Great for steady volume; expensive for a tiny prototype — where pgvector on Aurora Serverless v2 is usually cheaper. All of it is AWS-credit-eligible.

the knobs

VI. Tuning: k-NN engine, algorithm, dimensions, and HNSW parameters

Most teams accept the defaults and are fine. But OpenSearch exposes real knobs on the k-NN field, and when recall, latency, memory, or cost matter, these are the levers. They live on the vector field's mapping, so the important ones are set at index-creation time.

Engine — FAISS, Lucene, or nmslib

The k-NN field has an engine that implements the approximate-nearest-neighbour index. The common choices are FAISS (Meta's library, with broad algorithm support including both HNSW and IVF, vector quantization to shrink memory, and the usual pick for large vector workloads), Lucene (the engine built into OpenSearch, with no extra native library, HNSW support, clean integration with filtering, and a fine default for many corpora), and nmslib (the original HNSW implementation, still available but generally superseded by FAISS for new builds). These are the engines of the OpenSearch k-NN plugin in general; OpenSearch Serverless vector search collections have supported a narrower set (at the time of writing, FAISS with HNSW), so check the AWS documentation for what your collection type accepts before planning around Lucene or IVF. For most Bedrock RAG, the auto-created index uses a sensible default; reach for FAISS when you need quantization or IVF at scale.

Algorithm — HNSW (default) vs IVF

HNSW (Hierarchical Navigable Small World) is the default approximate-nearest-neighbour algorithm and the right choice for the overwhelming majority of RAG corpora: it gives excellent recall at low query latency, with the trade-off that the graph lives largely in memory (so memory scales with vector count and dimension). IVF (Inverted File index) partitions vectors into clusters and searches only the nearest clusters; it uses less memory and can be the better fit for very large corpora (tens of millions of vectors and up) where HNSW's memory footprint becomes the binding constraint, at some cost to recall/latency tuning effort. Rule of thumb: stay on HNSW until memory cost forces you to evaluate IVF (often paired with FAISS quantization).

Dimension and distance metric

The dimension of the k-NN field must equal your embeddings model's output dimension and is fixed for the life of the index — change the embeddings model (or its dimension) and you re-create the index and re-embed. Where the model supports it, a smaller dimension (e.g. choosing 512 or 256 on a model that allows it) cuts storage and the in-memory index size, speeding search and lowering cost at a modest recall cost — a real lever at scale. The distance metric (cosine similarity, Euclidean/L2, or inner/dot product) should match what your embeddings model was trained for; cosine is the common default for text embeddings.
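To see how much the dimension lever matters, the sketch below applies the rule-of-thumb memory estimate for an HNSW graph given in the OpenSearch k-NN documentation, roughly 1.1 × (4 × dimension + 8 × m) bytes per vector. It is an approximation for planning, not a measurement, and the corpus size of one million vectors is an arbitrary example.

```python
def hnsw_memory_bytes(num_vectors, dimension, m=16):
    """Rule-of-thumb HNSW memory: 1.1 * (4 * dimension + 8 * m) bytes per vector."""
    return 1.1 * (4 * dimension + 8 * m) * num_vectors

def to_gib(num_bytes):
    """Convert bytes to gibibytes."""
    return num_bytes / (1024 ** 3)

NUM_VECTORS = 1_000_000  # arbitrary example corpus size

for dim in (1024, 512, 256):
    estimate = hnsw_memory_bytes(NUM_VECTORS, dim)
    print(f"dimension {dim}: about {to_gib(estimate):.2f} GiB")
```

Because the 4 × dimension term dominates, cutting the dimension shrinks the estimate almost proportionally, while lowering m has a much smaller effect.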

HNSW parameters — m, ef_construction, ef_search

Three HNSW parameters trade recall against memory and speed. m (the number of bi-directional links per graph node) raises recall and memory as it grows — typical values sit in the 16–48 range. ef_construction (how wide the search is while building the graph) improves index quality at the cost of slower indexing; a higher value builds a better graph once. ef_search (how wide the search is at query time) trades recall for latency on each query and can be tuned without rebuilding. The practical approach: leave the defaults until an evaluation set shows a recall gap, then raise ef_search first (cheap, no rebuild), then m/ef_construction (requires re-indexing) if you need more.
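"An evaluation set shows a recall gap" can be made measurable with recall@k: for each test question, compare the IDs the approximate index returned with the IDs an exact search returns. The sketch below is a generic illustration with made-up document IDs; how you obtain the exact results (for example, a brute-force scan over a sample) is up to you.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k that the approximate top-k also returned."""
    exact_top = set(exact_ids[:k])
    if not exact_top:
        return 0.0
    hits = sum(1 for doc_id in approx_ids[:k] if doc_id in exact_top)
    return hits / len(exact_top)

def mean_recall(eval_set, k):
    """Average recall@k over (approx_ids, exact_ids) pairs."""
    return sum(recall_at_k(a, e, k) for a, e in eval_set) / len(eval_set)

# Made-up results for two evaluation questions.
eval_set = [
    (["a", "b", "x"], ["a", "b", "c"]),  # approximate search missed "c"
    (["d", "e", "f"], ["d", "e", "f"]),  # approximate search matched exactly
]
score = mean_recall(eval_set, k=3)
print(f"mean recall@3 = {score:.3f}")
```

Re-run the same measurement after each parameter change: if raising ef_search closes the gap, there is no need to pay for a rebuild with a larger m or ef_construction.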

opensearch k-NN tuning knobs for bedrock vector search · 2026
Knob | Default-ish choice | Raise it to… | Cost of raising | Set at
Engine | Lucene or FAISS | Get quantization / IVF (FAISS) | Native library footprint | Index creation
Algorithm | HNSW | Cut memory at huge scale (IVF) | Recall/latency tuning effort | Index creation
Dimension | Match embeddings model | (Lower it) cut storage + memory | Modest recall loss | Index creation (fixed after)
Distance metric | Cosine (for text) | Match the model's training | - | Index creation
m (HNSW) | 16 | Higher recall | More memory | Index creation
ef_construction | Engine default | Better graph quality | Slower indexing | Index creation
ef_search | Engine default | Higher recall per query | Higher query latency | Query time (no rebuild)
Most Bedrock RAG runs fine on auto-created defaults. When an evaluation set shows a recall gap, raise ef_search first (no rebuild), then m / ef_construction (requires re-indexing). Reach for FAISS + IVF + quantization only when memory cost at scale forces it. Dimension and metric are effectively permanent for an index — choose with the embeddings model.
the comparison

VII. OpenSearch Serverless vs Aurora pgvector vs Pinecone

These are the three vector stores most teams weigh for Bedrock RAG. They are all valid; the right answer depends on the shape of your workload, what you already run, and how cost-sensitive you are at your current scale. Here is the honest trade-off, the same dimensions a real architecture review uses.

Amazon OpenSearch Serverless is the AWS-native default. Its standout advantages are zero setup behind Bedrock (auto-created), native hybrid search (vector + BM25 in one engine, which routinely beats vector-only retrieval on real corpora), full management with auto-scaling, and clean scale to large corpora. Its weakness is the standing OCU baseline covered in §V — it is rarely the cheapest option at small or bursty scale, and you pay for redundant capacity even when idle.

Amazon Aurora PostgreSQL with pgvector is the pragmatic choice when you already run Postgres or want to minimize both new infrastructure and cost at low volume. Vectors live in the same database as your relational data, so you can filter with SQL predicates and join to business tables in one query, and Aurora Serverless v2 can scale capacity down close to zero when idle — which is exactly why it is frequently the cheapest store for a prototype or a low-traffic internal tool. The trade-offs: hybrid search is not as turnkey as OpenSearch's single-engine BM25+vector, and a purpose-built vector engine pulls ahead at extreme scale or very high query concurrency. It is a fully supported Knowledge Bases store, so you can pick it from the same dropdown that offers OpenSearch.

Pinecone is a managed, vector-native database (third-party, selectable in Knowledge Bases and available via the AWS Marketplace). Because it does one thing, it offers strong vector performance, serverless scaling, and rich metadata filtering with minimal tuning — attractive when vector search is your core workload and you want a specialist rather than a general engine, or when your team already standardizes on it. The trade-offs are that it is a separate vendor (your vectors leave AWS-native services; Marketplace billing can route through your AWS invoice) and that, being vector-only, you bring your own keyword layer if you want hybrid search.

A useful way to collapse the decision: if you want it to just work behind Bedrock, want native hybrid, and have steady volume, use OpenSearch Serverless. If you already run Postgres or you are cost-sensitive at small/bursty scale, use Aurora pgvector. If vector search is your whole workload and you want a zero-tuning specialist (or you already use it), use Pinecone. All three are first-class Knowledge Bases options — and the supported list grows, so confirm current support in the AWS Bedrock docs.

the honest call

VIII. When OpenSearch Serverless is the right call (and when it is not)

The default is not automatically the right choice. Here is the honest decision guide — the situations where OpenSearch Serverless clearly earns its baseline, and the situations where another store is the better answer.

Reach for OpenSearch Serverless when one or more of these is true:

  • You want native hybrid search — Vector + BM25 keyword in one engine, with score fusion, is OpenSearch's signature advantage. Hybrid retrieval catches exact terms, product names, and acronyms that pure-vector search blurs — and on real corpora it usually beats vector-only. If hybrid matters, OpenSearch is the natural pick.
  • You want fully-managed, AWS-native search with zero setup — It is the one store Bedrock auto-creates end to end, it auto-scales, there are no nodes to size or upgrade, and it stays inside your account/Region under IAM and KMS. For a managed-RAG build that values low ops, that is the point.
  • You have steady, meaningful query volume — The standing OCU baseline amortizes when traffic is real and continuous. At production query volumes the redundancy floor is a rounding error and the managed auto-scaling earns its keep.
  • Your corpus is large, or you already run OpenSearch — It scales cleanly to large vector counts, and if you already operate OpenSearch for logs or search, reusing the engine (and the team's know-how) is sensible.
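The "score fusion" mentioned in the hybrid bullet can be illustrated generically. The sketch below min-max normalizes each result list and takes a weighted average, which is one common way to combine vector and keyword scores; it is an illustration of the idea under that assumption, not a description of how OpenSearch implements it, and the scores and document IDs are made up.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} dict to the 0..1 range."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - low) / (high - low) for doc, s in scores.items()}

def fuse(vector_scores, keyword_scores, vector_weight=0.5):
    """Rank documents by a weighted average of normalized scores."""
    v = min_max_normalize(vector_scores)
    kw = min_max_normalize(keyword_scores)
    fused = {
        doc: vector_weight * v.get(doc, 0.0) + (1 - vector_weight) * kw.get(doc, 0.0)
        for doc in set(v) | set(kw)
    }
    return sorted(fused, key=lambda doc: fused[doc], reverse=True)

# Made-up scores: similarity from k-NN, BM25-style scores from keyword search.
vector_scores = {"doc_a": 0.92, "doc_b": 0.90, "doc_c": 0.40}
keyword_scores = {"doc_b": 12.0, "doc_d": 9.0, "doc_c": 3.0}
ranking = fuse(vector_scores, keyword_scores)
print(ranking)
```

In this toy example `doc_b` ranks first because both retrievers agree on it, and `doc_d`, which vector search missed entirely, still surfaces through its keyword match.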

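Choose a different store instead when:

  • The workload is tiny or bursty: the standing OCU baseline dominates the bill, and Aurora Serverless v2 with pgvector, which can scale much closer to zero when idle, is usually cheaper.
  • You already run Postgres: keeping vectors next to your relational data lets you filter with SQL predicates and join to business tables, so Aurora pgvector is the natural fit.
  • Vector search is your whole workload: Pinecone is a vector-native specialist that needs minimal tuning, provided you accept a third-party service and bring your own keyword layer for hybrid search.
  • Your data is graph-shaped: Neptune Analytics is the supported store to evaluate.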

the rule of thumb

Default to OpenSearch Serverless when you want native hybrid search, zero-setup managed search, and have steady volume. Override to Aurora pgvector when the workload is tiny/bursty or you already run Postgres (the usual cost-driven switch), to Pinecone for a zero-tuning vector-only specialist, or to Neptune Analytics for graph-shaped data. Match the store to the workload, not to the default.

vector store comparison

OpenSearch Serverless vs Aurora pgvector vs Pinecone — side by side

The three stores most teams weigh for Bedrock RAG, on the dimensions that actually drive the decision. All three are first-class Bedrock Knowledge Bases options. Cost notes are representative as of 2026 — confirm current pricing on the relevant AWS or vendor pricing page.

Dimension | OpenSearch Serverless | Aurora PostgreSQL (pgvector) | Pinecone
Managed by | AWS (Bedrock can auto-create) | AWS (you run Aurora) | Pinecone (third-party)
Setup behind Bedrock | Lowest: one-click auto-create | Low–medium (you run Aurora) | Medium: external account / Marketplace
Cost shape | Warm OCUs (2 pools) + storage; standing baseline | Aurora Serverless v2 ACUs; scales near zero idle | Pinecone pricing (serverless / pods)
Cheapest at small / bursty scale? | No: redundancy floor dominates | Usually yes | Varies (serverless tier helps)
Hybrid search (vector + keyword) | Native: one engine, BM25 + k-NN | Vector + SQL filters (BYO keyword) | Vector + metadata filters (BYO keyword)
Metadata / SQL filtering | Metadata filtering | Full SQL predicates + joins | Rich metadata filtering
Data stays AWS-native | Yes (your account/Region) | Yes (your account/Region) | No: third-party service
Pick it when | Native hybrid, zero-setup, steady volume | On Postgres / cost-sensitive / small-bursty | Vector-only specialist / already use it
Default to OpenSearch Serverless for native hybrid search and zero-setup managed search at steady volume. Switch to Aurora pgvector to minimize cost at small/bursty scale or reuse Postgres (the most common override). Choose Pinecone for a zero-tuning vector-native specialist. All three are selectable as the Knowledge Bases vector store — the supported list grows, so check the AWS Bedrock docs.
a recent match

A RAG assistant whose OpenSearch bill was the blocker — built on $0 — anonymized

inquiry · Series-A B2B SaaS, internal knowledge assistant, London
Series-A B2B SaaS, 22 people, ~15,000 internal docs across Confluence + S3, building an internal "ask our docs" assistant

Situation: The team had a working Bedrock Knowledge Bases prototype but stalled on the vector store. They had accepted the default and auto-created an OpenSearch Serverless collection per Knowledge Base while experimenting — and the first month's OpenSearch bill, driven by the standing redundancy minimum across several near-idle collections, was far larger than the inference cost and spooked the founder. They could not tell whether OpenSearch was wrong for them or whether they had simply set it up badly, and the one engineer who understood it was fully committed to the core product. They did not want to keep burning runway on a vector database while still proving the assistant out.

What CloudRoute did: CloudRoute matched them in under 24 hours to a UK AWS partner with a GenAI/ML track record. The partner did the architecture call most teams skip: confirmed OpenSearch was actually the right store (they wanted native hybrid search and had steady internal query volume), then fixed the setup — consolidated the scattered per-KB collections into a single vector-search collection to pay one baseline instead of several, right-sized the embedding dimension to cut storage and index memory, and kept retrieval tight so search OCUs did not scale needlessly. They built it on Knowledge Bases with hybrid retrieval (k-NN + BM25) into the OpenSearch index, with metadata filtering for per-team scoping. In parallel, the partner filed a Bedrock POC credit application plus an Activate Portfolio application to fund the whole thing.

Outcome: A grounded, cited internal assistant went live in under three weeks with the OpenSearch bill brought down to a single justified baseline — and the entire cost stack (OCUs, storage, embeddings, RetrieveAndGenerate inference) was covered by the approved credits, so the team paid $0 during the build and early rollout. CloudRoute's commission was paid by the partner from AWS engagement funding, not by the customer.

corpus: ~15k docs · store: OpenSearch Serverless (consolidated, hybrid) · time to live: < 3 weeks · credits secured: POC + Activate · out-of-pocket during build: $0

faq

Common questions

What is the default vector store for Amazon Bedrock Knowledge Bases?
Amazon OpenSearch Serverless is the default. When you create a Knowledge Base in the Bedrock console and accept the default, Bedrock provisions a new OpenSearch Serverless "vector search" collection, creates the vector index with the correct field mappings, wires the encryption, network, and data-access policies (granting the Knowledge Base service role read/write), and sets the k-NN field dimension to match your embeddings model, all in one click. Other supported stores include Aurora PostgreSQL with pgvector, Pinecone, Redis Enterprise Cloud, MongoDB Atlas, and Neptune Analytics, but OpenSearch Serverless is the zero-setup default: the console's quick-create path provisions it end to end without further decisions from you.
How does OpenSearch Serverless store Bedrock vectors?
In a nested structure: collection → vector index → k-NN field. You create a "vector search" collection (the unit of capacity, security, and billing), and inside it a vector index whose mappings define a knn_vector field. That field declares the dimension (which must equal your embeddings model's output dimension), the distance metric (cosine, Euclidean/L2, or dot product), and the engine/algorithm. Alongside the vector field, the index carries a text field (the original chunk, so retrieval can return the source) and a metadata field (JSON used for filtering and citations). The index must have index.knn enabled so OpenSearch builds an approximate-nearest-neighbour structure instead of scanning every vector.
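To make the nesting concrete, here is a minimal sketch of an index-creation body with the three fields described above. The field names (`embedding`, `chunk_text`, `chunk_metadata`), the dimension of 1024, and the `faiss`/`hnsw`/`l2` choices are illustrative assumptions, not values taken from this page; use your own embeddings model's dimension and your preferred metric.

```python
import json


def build_vector_index_body(dimension: int,
                            vector_field: str = "embedding",
                            text_field: str = "chunk_text",
                            metadata_field: str = "chunk_metadata") -> dict:
    """Return an index body with one knn_vector field plus text and metadata fields.

    All field names here are hypothetical placeholders.
    """
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    return {
        # index.knn must be enabled so an approximate-nearest-neighbour graph is built
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                vector_field: {
                    "type": "knn_vector",
                    "dimension": dimension,  # must equal the embeddings model output size
                    "method": {
                        "name": "hnsw",      # assumed algorithm
                        "engine": "faiss",   # assumed engine
                        "space_type": "l2",  # assumed metric (Euclidean)
                    },
                },
                # original chunk text, returned to the caller as the source passage
                text_field: {"type": "text"},
                # serialized metadata, stored but not analysed for full-text search
                metadata_field: {"type": "text", "index": False},
            }
        },
    }


body = build_vector_index_body(dimension=1024)
print(json.dumps(body, indent=2))
```

The resulting dictionary is what you would send as the request body when creating the index by hand; the auto-create path builds an equivalent mapping for you.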
How is OpenSearch Serverless priced for vector search, and why is the bill higher than expected?
There is no per-query price. You pay for capacity in OpenSearch Compute Units (OCUs), in two separate pools (one for indexing, one for search) that are each billed per OCU-hour, plus S3-backed managed storage per GB-month and standard data-transfer and KMS charges. The surprise is the redundancy minimum: a default collection is provisioned across Availability Zones with standby, so a small amount of capacity in both OCU pools runs continuously even at zero traffic. For a large corpus that baseline is negligible; for a tiny prototype it can dominate the bill. Levers: create a development collection without standby redundancy (roughly halves the floor), consolidate many small indexes into fewer collections (the floor is per collection), keep retrieval tight, and use a smaller embedding dimension. None of these makes the baseline zero; for tiny or bursty workloads, Aurora Serverless v2 with pgvector is often cheaper. Figures are representative as of 2026; check the AWS OpenSearch Service pricing page.
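The standing baseline is simple arithmetic, sketched below. Every number is a placeholder assumption: the per-OCU-hour price, the minimum OCU counts for the redundant and development cases, and the 730-hour month are illustrative, and the per-collection multiplier follows this page's framing of the floor. Substitute current values from the AWS pricing page before relying on the output.

```python
HOURS_PER_MONTH = 730  # average hours in a month, a common billing approximation


def monthly_ocu_baseline(min_indexing_ocus: float,
                         min_search_ocus: float,
                         price_per_ocu_hour: float,
                         collection_count: int = 1) -> float:
    """Standing monthly cost of the minimum OCUs, before traffic or storage."""
    if collection_count < 1:
        raise ValueError("collection_count must be at least 1")
    ocus_per_collection = min_indexing_ocus + min_search_ocus
    return ocus_per_collection * price_per_ocu_hour * HOURS_PER_MONTH * collection_count


# Illustrative inputs only, not quoted AWS prices.
ASSUMED_PRICE = 0.24  # hypothetical USD per OCU-hour

redundant = monthly_ocu_baseline(1.0, 1.0, ASSUMED_PRICE)   # assumed floor with standby
dev_test = monthly_ocu_baseline(0.5, 0.5, ASSUMED_PRICE)    # assumed floor without standby
scattered = monthly_ocu_baseline(1.0, 1.0, ASSUMED_PRICE, collection_count=4)

print(f"redundant collection:      ${redundant:.2f}/month")
print(f"dev/test collection:       ${dev_test:.2f}/month")
print(f"four redundant collections: ${scattered:.2f}/month")
```

Under these assumptions the development collection costs half as much as the redundant one, and several idle collections multiply the floor, which is why consolidation is listed as a lever.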
What is an OCU in OpenSearch Serverless?
An OpenSearch Compute Unit (OCU) is the unit of capacity — a bundle of compute and memory — that OpenSearch Serverless bills by the hour. There are two independent OCU pools: indexing OCUs (for writing/embedding ingestion) and search OCUs (for serving queries). Serverless scales the number of OCUs in each pool up and down with load, but because the default AZ-redundant collection keeps a minimum number of OCUs warm in each pool around the clock, you pay a standing baseline regardless of traffic. That continuous floor is the main driver of OpenSearch Serverless cost at small scale.
Should I let Bedrock auto-create the OpenSearch collection or build it myself?
Use auto-create (the quickstart) unless you have a concrete reason not to. Bedrock provisions the collection, the vector index with correct field mappings, the encryption/network/data-access policies, and the matching k-NN dimension in one step — which removes the three most common setup failures (dimension mismatch, access-policy gaps, and field-name mismatch). Build it yourself when you need a specific collection name or KMS key, a VPC-only network policy, a non-default k-NN engine or HNSW parameters, a shared collection across Knowledge Bases, or you are wiring a DIY pipeline that talks to OpenSearch directly. In the bring-your-own case you then choose "use an existing vector store" and supply the collection ARN, index name, and the vector/text/metadata field names.
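For the bring-your-own path, the values you supply map onto a storage configuration like the one sketched below, shaped for the `storageConfiguration` argument of the boto3 `bedrock-agent` `create_knowledge_base` call. The ARN, index name, and field names are hypothetical placeholders; the field names must match whatever you declared in your index mapping.

```python
def build_storage_configuration(collection_arn: str,
                                index_name: str,
                                vector_field: str,
                                text_field: str,
                                metadata_field: str) -> dict:
    """Build the storage configuration for an existing OpenSearch Serverless index."""
    if not collection_arn.startswith("arn:"):
        raise ValueError("collection_arn must be a full ARN")
    return {
        "type": "OPENSEARCH_SERVERLESS",
        "opensearchServerlessConfiguration": {
            "collectionArn": collection_arn,
            "vectorIndexName": index_name,
            "fieldMapping": {
                "vectorField": vector_field,      # the knn_vector field
                "textField": text_field,          # the original chunk text
                "metadataField": metadata_field,  # metadata used for filtering and citations
            },
        },
    }


# Placeholder values for illustration only.
storage_config = build_storage_configuration(
    collection_arn="arn:aws:aoss:us-east-1:111122223333:collection/EXAMPLE_ID",
    index_name="docs-index",
    vector_field="embedding",
    text_field="chunk_text",
    metadata_field="chunk_metadata",
)
print(storage_config)
```

Keeping this in one helper makes the field-name mismatch failure easier to avoid, because the same names can be shared with the code that creates the index.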
How do I tune OpenSearch k-NN for better recall or lower cost?
The knobs live on the k-NN field. Engine: FAISS (broad algorithms, supports quantization and IVF — pick it at scale), Lucene (built in, no extra library, a fine default), or nmslib (legacy). Algorithm: HNSW by default (best recall at low latency, memory-heavy) or IVF for very large corpora where HNSW memory becomes the constraint. Dimension: must match your embeddings model and is fixed per index; a smaller dimension (where the model supports it) cuts storage and memory at a modest recall cost. Distance metric: cosine for most text embeddings. HNSW parameters: m (links per node — more recall, more memory), ef_construction (graph build quality — slower indexing), and ef_search (query-time breadth — more recall, more latency; tunable without rebuilding). Practical order: leave defaults, then raise ef_search first (no rebuild), then m/ef_construction (requires re-indexing) only if an evaluation set shows a recall gap.
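The "evaluation set" step can be as small as the sketch below, which measures how many of the exact top-k neighbours an approximate search returns. The document IDs and the two result lists standing in for a low and a high `ef_search` setting are made-up illustrations; in practice the exact list would come from a brute-force search and the approximate lists from your index.

```python
def ann_recall_at_k(approx_ids: list[str], exact_ids: list[str], k: int) -> float:
    """Fraction of the exact top-k neighbours that the approximate search also returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    exact_top = set(exact_ids[:k])
    if not exact_top:
        raise ValueError("exact_ids must contain at least one result")
    approx_top = set(approx_ids[:k])
    return len(approx_top & exact_top) / len(exact_top)


def mean_recall(runs: list[tuple[list[str], list[str]]], k: int) -> float:
    """Average recall@k over (approximate, exact) result pairs, one pair per query."""
    if not runs:
        raise ValueError("runs must not be empty")
    return sum(ann_recall_at_k(approx, exact, k) for approx, exact in runs) / len(runs)


# Hypothetical results for one query at two ef_search settings.
exact = ["a", "b", "c", "d"]
low_ef_search = ["a", "c", "x", "y"]
high_ef_search = ["a", "b", "c", "z"]

print("recall@4, low ef_search: ", ann_recall_at_k(low_ef_search, exact, k=4))
print("recall@4, high ef_search:", ann_recall_at_k(high_ef_search, exact, k=4))
```

If raising `ef_search` closes the gap on a representative query set, there is no need to touch `m` or `ef_construction` and pay for a re-index.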
OpenSearch Serverless vs Aurora pgvector vs Pinecone — which should I use for Bedrock RAG?
All three are first-class Bedrock Knowledge Bases stores. Use OpenSearch Serverless when you want native hybrid search (vector + BM25 in one engine), fully-managed AWS-native search with zero setup, and you have steady query volume — its weakness is the standing OCU baseline that makes it pricey at tiny scale. Use Aurora PostgreSQL with pgvector when you already run Postgres or you are cost-sensitive at small/bursty scale: vectors sit beside relational data (SQL filters and joins), and Aurora Serverless v2 scales near zero when idle, so it is frequently the cheapest option — the most common reason teams override the OpenSearch default. Use Pinecone when vector search is your whole workload and you want a zero-tuning vector-native specialist, or your team already standardizes on it — the trade-off is it is a third-party service so your vectors leave AWS-native stores, and you bring your own keyword layer for hybrid.
When is OpenSearch Serverless NOT the right vector store for Bedrock?
When your workload is tiny or bursty, the standing redundancy minimum means you mostly pay for idle capacity — Aurora Serverless v2 with pgvector scales much closer to zero and is usually cheaper, which is the most common reason to switch the default. Also choose differently if you already run Postgres and want SQL-joined metadata (pgvector), if vector search is your entire workload and you want a zero-tuning specialist (Pinecone), or if your data is graph-shaped and relationships improve answers (Neptune Analytics for GraphRAG). Match the store to the workload shape rather than accepting the default — OpenSearch shines at steady volume with hybrid search, not at a handful of documents queried occasionally.
Does OpenSearch Serverless support hybrid (vector + keyword) search for RAG?
Yes — and it is its signature advantage. Because OpenSearch is a full search engine, a single index can serve both k-NN vector search and BM25 keyword search, and you can fuse the scores into one hybrid result. Hybrid retrieval routinely beats pure-vector retrieval on real corpora because it catches exact terms, product names, acronyms, and codes that embeddings sometimes blur, while still capturing semantic matches. Vector-only stores (e.g. Pinecone) require you to add a separate keyword layer to get the same effect, and Aurora pgvector gives you vector plus SQL filtering rather than a tuned BM25 engine. If native hybrid search matters to your retrieval quality, OpenSearch Serverless is the natural pick.
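When retrieval goes through Bedrock Knowledge Bases, hybrid search is requested in the retrieval configuration rather than in a hand-written OpenSearch query. The sketch below builds that configuration in the shape used by the `retrievalConfiguration` argument of the Bedrock agent runtime `retrieve` call; the `team` metadata key and its value are hypothetical and would need to exist in your ingested metadata.

```python
from typing import Optional


def build_retrieval_configuration(top_k: int,
                                  hybrid: bool = True,
                                  team: Optional[str] = None) -> dict:
    """Build a retrieval configuration requesting hybrid or semantic-only search."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    vector_config: dict = {
        "numberOfResults": top_k,
        # HYBRID combines vector and keyword matching; SEMANTIC is vector-only
        "overrideSearchType": "HYBRID" if hybrid else "SEMANTIC",
    }
    if team is not None:
        # restrict results to chunks whose metadata carries the given team value
        vector_config["filter"] = {"equals": {"key": "team", "value": team}}
    return {"vectorSearchConfiguration": vector_config}


retrieval_config = build_retrieval_configuration(top_k=5, team="platform")
print(retrieval_config)
```

A small `top_k` together with a metadata filter is also one way to keep retrieval tight, which limits how hard the search OCU pool has to work.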

Build Bedrock RAG on OpenSearch — funded

Whatever the vector store would cost — the OpenSearch OCU baseline, storage, embeddings, and inference — AWS credits can cover it. CloudRoute routes you to the right credit pool (Activate up to $100K, Bedrock POC $10K–$50K, GenAI Accelerator up to $1M) and a vetted AWS partner to size the collection, tune the k-NN index, pick the right store for your workload, and ship the Bedrock retrieval. Customer pays $0.

matched within: < 24h
GenAI credit ceiling: up to $1M
cost to you: $0