A complete, neutral reference for using Amazon OpenSearch Serverless as the vector store behind Amazon Bedrock RAG in 2026: what it is, how it holds vectors (collection → vector index → k-NN), the exact setup whether Bedrock auto-creates it or you build it by hand, why Knowledge Bases pick it by default, the OCU cost model and the redundancy minimum that surprises everyone, how to tune the k-NN engine and dimensions, and how it stacks up against Aurora pgvector and Pinecone — plus how AWS credits make the whole build $0.
The phrase bundles two services. Amazon Bedrock is the managed foundation-model layer that runs your RAG; Amazon OpenSearch Serverless is the vector database that holds the embeddings Bedrock searches. "Bedrock OpenSearch vector search" is the pattern where OpenSearch Serverless is the vector store sitting behind Bedrock retrieval — the default arrangement for Bedrock Knowledge Bases.
A quick refresher on why a vector store exists at all. In retrieval-augmented generation, every chunk of your documents is turned into an embedding — a list of a few hundred to a couple of thousand numbers that captures the chunk's meaning, so that semantically similar text lands near it in vector space. At question time, the question is embedded the same way and the system finds the chunks whose vectors are nearest to the question vector. That "find the nearest vectors, fast, across millions of them" job is exactly what a vector store does, and it is the component Bedrock does not abstract away: you choose it, you can see it, and you pay for it directly.
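To make the "nearest vectors" idea concrete, here is a minimal sketch of the brute-force version of that job: score every chunk against the question by cosine similarity and keep the top few. The three-number vectors and chunk labels are made up for illustration; real embeddings have hundreds of dimensions, and a vector store replaces this full scan with an approximate index.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_chunks(query_vec, chunks, k=2):
    """Return the text of the k chunks most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Toy "embeddings" (hypothetical values, 3 dimensions instead of ~1,000).
chunks = [
    ("refund policy", [0.9, 0.1, 0.0]),
    ("shipping times", [0.1, 0.9, 0.0]),
    ("returns and refunds", [0.8, 0.2, 0.1]),
]
top = nearest_chunks([1.0, 0.0, 0.0], chunks, k=2)
print(top)
```

The scan above touches every vector on every query, which is exactly the cost an approximate-nearest-neighbour index is built to avoid.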
Amazon OpenSearch Serverless is the on-demand, auto-scaling flavour of Amazon OpenSearch Service (AWS's managed service for OpenSearch, the open-source fork of Elasticsearch). Instead of provisioning and sizing a cluster of nodes, you create a collection and AWS runs the underlying capacity for you, scaling it up and down with load. OpenSearch has long supported a k-NN (k-nearest-neighbour) plugin for vector search, and the serverless form exposes a dedicated "vector search" collection type tuned for exactly this workload: storing embeddings and answering approximate-nearest-neighbour queries.
Put the two together and the shape is simple. Bedrock owns the model work — embedding your chunks at ingest, embedding each query, and (in Knowledge Bases) running the whole retrieve-and-generate loop. OpenSearch Serverless owns the storage and search — holding every vector plus its source text and metadata, and returning the closest matches when Bedrock queries it. The connection between them is a vector index inside the collection, with field mappings that tell OpenSearch which field is the k-NN vector, what dimension it is, and how to search it.
One thing worth saying up front, because it frames the rest of this page: OpenSearch Serverless is powerful and is the path of least resistance on AWS, but it is not the cheapest option at small scale. Its serverless capacity carries a standing baseline cost. That trade — managed, scalable, hybrid-capable, but with a cost floor — is the single most important thing to understand before you pick it, and §V covers it in full.
Amazon OpenSearch Serverless is a fully managed, auto-scaling vector database. Created as a "vector search" collection, it stores your Bedrock embeddings as k-NN vector fields and answers approximate-nearest-neighbour queries, which makes it the default vector store behind Bedrock Knowledge Bases and a common store for DIY Bedrock RAG.
Before the setup steps make sense, it helps to know the three nested objects OpenSearch Serverless uses to store vectors. Everything you configure — and everything you pay for — hangs off these three.
From outermost to innermost, the structure is collection → index → k-NN field. A Knowledge Base maps onto exactly this: one collection holds one or more vector indexes, and each index has one k-NN field where the embeddings live, plus companion fields for the chunk text and its metadata.
A collection is the top-level container and the unit of capacity, security, and billing in OpenSearch Serverless. When you create one for RAG you choose the "vector search" collection type (the other types are "time series" and "search"), which configures it for the vector workload. The collection is governed by three policies you must have in place for it to work: an encryption policy (a KMS key — AWS-owned or your own), a network policy (public access or access via a VPC endpoint), and one or more data access policies (which IAM principals — including the Bedrock Knowledge Base service role — may read and write which indexes). Missing or mis-scoped policies are the most common reason a setup "succeeds" but then fails to ingest or query.
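As a rough sketch of what those three policies look like, the function below builds the policy documents as plain Python dictionaries. The collection name, index pattern, and role ARN are hypothetical placeholders, the permission list is one plausible read/write set rather than a required minimum, and the public network policy is chosen only to keep the example short. Each document would be serialized with `json.dumps` before being passed to the OpenSearch Serverless API.

```python
import json

def build_policies(collection, index_pattern, principal_arns):
    """Build encryption, network, and data-access policy documents."""
    collection_resource = [f"collection/{collection}"]
    encryption = {
        "Rules": [{"ResourceType": "collection", "Resource": collection_resource}],
        "AWSOwnedKey": True,  # assumption: AWS-owned key rather than your own KMS key
    }
    network = [{
        "Rules": [{"ResourceType": "collection", "Resource": collection_resource}],
        "AllowFromPublic": True,  # assumption: public access, not a VPC endpoint
    }]
    data_access = [{
        "Rules": [
            {
                "ResourceType": "collection",
                "Resource": collection_resource,
                "Permission": [
                    "aoss:DescribeCollectionItems",
                    "aoss:CreateCollectionItems",
                    "aoss:UpdateCollectionItems",
                ],
            },
            {
                "ResourceType": "index",
                "Resource": [f"index/{collection}/{index_pattern}"],
                "Permission": [
                    "aoss:CreateIndex",
                    "aoss:DescribeIndex",
                    "aoss:UpdateIndex",
                    "aoss:ReadDocument",
                    "aoss:WriteDocument",
                ],
            },
        ],
        # Must include the Bedrock Knowledge Base service role.
        "Principal": list(principal_arns),
    }]
    return {"encryption": encryption, "network": network, "data_access": data_access}

# Hypothetical names for illustration only.
policies = build_policies(
    "rag-demo", "*",
    ["arn:aws:iam::111122223333:role/ExampleKnowledgeBaseRole"],
)
print(json.dumps(policies["data_access"], indent=2))
```

Keeping the three documents in one builder makes it harder to scope one policy to a collection name the others do not mention, which is a typical way a setup appears to succeed and then fails at ingestion.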
Inside the collection lives a vector index — the searchable structure that actually holds documents. Its mappings define the fields. The one that matters most is the k-NN vector field (type knn_vector): you declare its dimension (which must exactly equal the output dimension of your embeddings model), the distance metric (cosine, Euclidean/L2, or dot product), and the engine/algorithm settings covered in §VI. Alongside it the index carries a text field holding the original chunk so retrieval can return the source passage, and a metadata field holding the JSON metadata used for filtering and citations. The index must also have index.knn enabled so the k-NN engine builds an approximate-nearest-neighbour structure rather than scanning every vector.
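The mapping described above can be sketched as an index-creation body. This is an illustrative shape, not the exact index Bedrock auto-creates: the field names (`embedding`, `chunk_text`, `chunk_metadata`) are hypothetical, and the FAISS engine with HNSW and an `l2` space type is an assumed choice. The function only builds the request body; sending it would go through an OpenSearch client, which is not shown.

```python
def build_vector_index_body(dimension,
                            vector_field="embedding",
                            text_field="chunk_text",
                            metadata_field="chunk_metadata",
                            space_type="l2"):
    """Build the settings + mappings body for a k-NN vector index."""
    if not isinstance(dimension, int) or dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    return {
        # index.knn must be enabled so an approximate index is built.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                vector_field: {
                    "type": "knn_vector",
                    # Must equal the embeddings model's output dimension.
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": "faiss",        # assumed engine choice
                        "space_type": space_type, # assumed distance metric
                    },
                },
                # Original chunk text, returned with each match.
                text_field: {"type": "text"},
                # Raw JSON metadata, stored but not indexed for search.
                metadata_field: {"type": "text", "index": False},
            }
        },
    }

index_body = build_vector_index_body(1024)
print(index_body["mappings"]["properties"]["embedding"])
```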
At query time the path is: Bedrock embeds the question, then issues a k-NN query against the index ("return the k documents whose vector field is nearest to this query vector," optionally with a metadata filter). OpenSearch walks its approximate-nearest-neighbour graph, returns the top matches with their text, metadata, and a similarity score, and Bedrock assembles them into the prompt. Because OpenSearch is a full search engine, the same index can also answer a BM25 keyword query, which is what makes native hybrid search (vector + keyword in one engine) possible — a genuine advantage over vector-only stores, covered in §VI and §VII.
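The query Bedrock issues on your behalf has roughly the following shape in the OpenSearch query DSL. This is a sketch for a DIY pipeline: the vector field name `embedding` and the `team` filter field are hypothetical, and it assumes the index exposes that metadata key as a filterable field.

```python
def build_knn_query(query_vector, k=5, vector_field="embedding",
                    metadata_filter=None):
    """Build a k-NN search body, optionally restricted by a metadata filter."""
    knn_clause = {"vector": list(query_vector), "k": k}
    if metadata_filter is not None:
        # The filter is applied as part of the k-NN search itself.
        knn_clause["filter"] = metadata_filter
    return {"size": k, "query": {"knn": {vector_field: knn_clause}}}

# A 4-number vector stands in for a real query embedding.
query_body = build_knn_query(
    [0.12, -0.03, 0.88, 0.41],
    k=3,
    metadata_filter={"term": {"team": "finance"}},
)
print(query_body)
```

Each hit in the response carries the stored text, metadata, and a score, which is what gets assembled into the prompt.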
Collection (vector-search type, the billing + security unit) → vector index (the searchable structure, with field mappings) → k-NN vector field (type knn_vector, with a dimension matching your embeddings model, a distance metric, and engine/algorithm settings). Get the dimension and the access policy right and most setup problems disappear.
There are two ways the collection and index come into existence: Bedrock creates them for you (the quickstart, and what most teams should use), or you create them yourself first and point a Knowledge Base — or a DIY pipeline — at them. They produce the same end state; they differ in how much control you keep.
When you create a Knowledge Base in the Bedrock console and accept the default vector store, Bedrock provisions a new OpenSearch Serverless vector collection, creates the vector index with the correct field mappings, wires the encryption, network, and data-access policies (granting the Knowledge Base service role read/write), and sets the k-NN field's dimension to match the embeddings model you chose. It is genuinely one decision — "Quick create a new vector store" — and you get a working store in a couple of minutes with nothing to configure. This is the right path for the large majority of builds: it removes the part of setup most likely to be misconfigured (policies and field mappings).
If you need control — a specific collection name, your own KMS key, a VPC-only network policy, a non-default k-NN engine or HNSW parameters, or you are wiring a DIY pipeline rather than a Knowledge Base — you create the pieces yourself. The order is: (1) create a vector-search collection; (2) attach an encryption policy, a network policy, and a data-access policy that grants the Bedrock Knowledge Base role (or your application's IAM role) the needed actions on the collection and index; (3) create the vector index with a knn_vector field of the right dimension, a distance metric, your chosen engine/algorithm, plus the text and metadata fields; (4) when creating the Knowledge Base, choose "use an existing vector store" and supply the collection ARN, index name, and the names of the vector / text / metadata fields. For a DIY pipeline you instead write and query the index directly via the OpenSearch API or SDK.
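For step (4), the information you supply maps onto a storage configuration like the one sketched below, shaped after the Bedrock `CreateKnowledgeBase` request as I understand it. The ARN, index name, and field names are hypothetical placeholders; check the current API reference before relying on the exact keys.

```python
def build_storage_configuration(collection_arn, index_name,
                                vector_field, text_field, metadata_field):
    """Describe an existing OpenSearch Serverless index to a Knowledge Base."""
    return {
        "type": "OPENSEARCH_SERVERLESS",
        "opensearchServerlessConfiguration": {
            "collectionArn": collection_arn,
            "vectorIndexName": index_name,
            # These names must match the index mappings created in step (3).
            "fieldMapping": {
                "vectorField": vector_field,
                "textField": text_field,
                "metadataField": metadata_field,
            },
        },
    }

# Hypothetical identifiers for illustration only.
storage_config = build_storage_configuration(
    "arn:aws:aoss:us-east-1:111122223333:collection/examplecollectionid",
    "rag-demo-index",
    "embedding", "chunk_text", "chunk_metadata",
)
print(storage_config)
```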
Three issues cause most failed setups. Dimension mismatch: the k-NN field dimension must equal the embeddings model's output dimension exactly (e.g. 1,024 for Titan Text Embeddings v2 at its default size, 1,536 for the v1 generation) — a mismatch fails ingestion. Access-policy gaps: the Bedrock service role must appear in a data-access policy with read/write on the index, and the network policy must allow Bedrock to reach the collection — miss either and ingestion or query silently fails. Field-name mismatch: when you bring your own index, the vector, text, and metadata field names you give Bedrock must match the mappings you created. The auto-create path exists precisely because it eliminates all three.
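A small pre-flight check can catch these three issues before you create the Knowledge Base. The sketch below is a hypothetical helper, not an AWS tool: it compares an index body, a field mapping, the model's dimension, and the principals listed in a data-access policy, all passed in as plain Python values.

```python
def find_setup_problems(index_body, field_mapping, model_dimension,
                        policy_principals, kb_role_arn):
    """Return a list of problems found; an empty list means none detected."""
    problems = []
    properties = index_body.get("mappings", {}).get("properties", {})

    # Field-name mismatch: every mapped name must exist in the index.
    for role in ("vectorField", "textField", "metadataField"):
        name = field_mapping.get(role)
        if name not in properties:
            problems.append(f"{role} '{name}' is not defined in the index mappings")

    # Dimension mismatch: the k-NN field must match the model exactly.
    vector_props = properties.get(field_mapping.get("vectorField"), {})
    declared = vector_props.get("dimension")
    if declared is not None and declared != model_dimension:
        problems.append(
            f"index dimension {declared} != model dimension {model_dimension}"
        )

    # Access-policy gap: the service role must be a listed principal.
    if kb_role_arn not in policy_principals:
        problems.append(f"role {kb_role_arn} is missing from the data-access policy")
    return problems

# Deliberately broken example: wrong dimension, wrong metadata name, no principal.
bad_index = {"mappings": {"properties": {
    "embedding": {"type": "knn_vector", "dimension": 1536},
    "chunk_text": {"type": "text"},
}}}
bad_mapping = {"vectorField": "embedding", "textField": "chunk_text",
               "metadataField": "meta"}
issues = find_setup_problems(bad_index, bad_mapping, 1024, [],
                             "arn:aws:iam::111122223333:role/ExampleKnowledgeBaseRole")
for issue in issues:
    print(issue)
```

The check does not cover the network policy, which has to be verified against the collection itself.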
Use auto-create (Path A) unless you have a concrete reason not to — it removes the three most error-prone steps (policies, field mappings, dimension). Use bring-your-own (Path B) when you need a specific KMS key, a VPC-only network policy, a non-default k-NN engine/algorithm, a shared collection, or you are building a DIY pipeline that talks to OpenSearch directly.
Knowledge Bases supports several vector stores — OpenSearch Serverless, Aurora PostgreSQL with pgvector, Pinecone, Redis Enterprise Cloud, and Neptune Analytics — yet OpenSearch Serverless is the one offered as the default, one-click option. The reasons are mostly about removing friction, and they are worth understanding so you know when to override the default.
The first reason is zero setup. OpenSearch Serverless is the store Bedrock provisions for you on the spot by default (collection, index, mappings, and policies) without you touching another service or creating an external account. For a managed-RAG product whose whole pitch is "point it at your data and go," a vector store that needs no prior provisioning is the natural default.
The second is that it is fully managed and AWS-native. There are no nodes to size, no version upgrades to run, and it auto-scales with load, so the managed promise of Knowledge Bases extends cleanly to the storage layer. It also lives inside your account and Region, inherits IAM and KMS, and keeps your data within AWS — which matters for the security posture most enterprises expect from Bedrock.
The third is capability. OpenSearch is a full search engine, so a single index can serve both vector (k-NN) and keyword (BM25) search, making native hybrid retrieval available without a second system. It scales to large corpora, supports metadata filtering for precision and access control, and gives a single, consistent place for both the bulk corpus and any keyword-heavy lookups. For a default that has to be "good enough for most production," that breadth is the point.
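On the Bedrock side, hybrid retrieval and metadata filtering are requested through the retrieval configuration passed to the Knowledge Bases `Retrieve` or `RetrieveAndGenerate` call. The sketch below builds that configuration as a plain dictionary; the `team` metadata key is a hypothetical example, and the key names follow the API as I understand it, so verify them against the current reference.

```python
def build_retrieval_configuration(number_of_results=5, hybrid=True, team=None):
    """Build a Knowledge Bases retrieval configuration."""
    vector_search = {
        "numberOfResults": number_of_results,
        # HYBRID combines vector and keyword search; SEMANTIC is vector-only.
        "overrideSearchType": "HYBRID" if hybrid else "SEMANTIC",
    }
    if team is not None:
        # Restrict results to chunks whose metadata has team == <team>.
        vector_search["filter"] = {"equals": {"key": "team", "value": team}}
    return {"vectorSearchConfiguration": vector_search}

retrieval_config = build_retrieval_configuration(number_of_results=4, team="finance")
print(retrieval_config)
```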
The honest counterpoint — and the reason the default is not always right — is cost at small scale. Because the default vector-search collection is provisioned for redundancy and carries a standing OCU baseline, a brand-new Knowledge Base with a few hundred documents still incurs a non-trivial monthly minimum even when idle. For a tiny prototype or a bursty internal tool, that floor can be larger than the rest of the bill combined — which is exactly why teams watching cost often switch the default to Aurora pgvector. The next section is the full cost picture.
OpenSearch Serverless is the line item that surprises teams, so it is worth getting exactly right. There is no per-query price; you pay for capacity measured in OpenSearch Compute Units (OCUs) plus storage, and there is a standing minimum you pay even at zero traffic. Figures here are representative as of 2026 to show the shape of the bill — check the AWS OpenSearch Service pricing page for current rates.
Capacity is measured in OpenSearch Compute Units (OCUs), each a bundle of compute and memory. Crucially, there are two separate OCU pools: one for indexing (writing documents and their vectors during ingestion) and one for search (serving queries). Each is billed per OCU-hour, and serverless scales the number of OCUs in each pool up and down with load. On top of compute you pay for managed storage (vectors and indexes are persisted to S3-backed storage, billed per GB-month) and the usual data-transfer and KMS costs. Because OCUs are billed for every hour they run, the bill is driven by how much capacity is kept warm, not by a per-request charge.
Here is the part that catches people. By default, an OpenSearch Serverless collection is provisioned for redundancy — capacity is spread across multiple Availability Zones with standby, so a production collection has a minimum OCU floor that runs continuously (a baseline for indexing and a baseline for search, kept warm even when nothing is happening). That floor is the "redundancy minimum": you are paying for a small always-on amount of capacity in two pools across AZs, around the clock, regardless of whether a single query was served. For a large corpus this baseline is a rounding error; for a tiny prototype it can be the dominant cost — and it is the most common reason a developer is startled by the first month's OpenSearch bill.
You have levers, but none make the baseline zero. You can create a collection without standby redundancy (a "development/test" posture) to roughly halve the OCU floor — appropriate for non-production workloads, at the cost of the HA guarantee. You can consolidate multiple indexes into fewer collections so you pay one baseline instead of several (the floor is per collection, so a handful of small Knowledge Bases each in its own auto-created collection multiplies the minimum). You can keep the corpus and the returned context tight so search OCUs do not scale up unnecessarily. And you can choose a smaller embedding dimension to cut storage and the memory footprint of the index. But the structural fact remains: OpenSearch Serverless has a standing cost, and if your workload is small or bursty, Aurora Serverless v2 with pgvector — which can scale its capacity much closer to zero when idle — is frequently cheaper.
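The arithmetic behind the floor and the two biggest levers can be sketched in a few lines. Every number below is a placeholder assumption, not a quoted price: the hourly OCU rate, the 2-OCU redundant floor, and the 1-OCU no-standby floor are illustrative inputs, and the floor is modelled per collection as described above. Substitute the figures from the current pricing page.

```python
HOURS_PER_MONTH = 730  # common approximation for an average month

def monthly_baseline_cost(collections, floor_ocus_per_collection,
                          price_per_ocu_hour):
    """Idle cost of keeping the minimum OCUs warm, before storage or traffic."""
    return (collections * floor_ocus_per_collection
            * price_per_ocu_hour * HOURS_PER_MONTH)

# Placeholder inputs (assumptions for illustration only).
ASSUMED_PRICE_PER_OCU_HOUR = 0.24
ASSUMED_REDUNDANT_FLOOR = 2.0   # indexing + search baseline with standby
ASSUMED_NO_STANDBY_FLOOR = 1.0  # roughly half, for dev/test collections

scattered = monthly_baseline_cost(4, ASSUMED_REDUNDANT_FLOOR, ASSUMED_PRICE_PER_OCU_HOUR)
consolidated = monthly_baseline_cost(1, ASSUMED_REDUNDANT_FLOOR, ASSUMED_PRICE_PER_OCU_HOUR)
dev_test = monthly_baseline_cost(1, ASSUMED_NO_STANDBY_FLOOR, ASSUMED_PRICE_PER_OCU_HOUR)

print(f"4 redundant collections: ${scattered:,.2f}/month")
print(f"1 redundant collection:  ${consolidated:,.2f}/month")
print(f"1 no-standby collection: ${dev_test:,.2f}/month")
```

Whatever rate you plug in, the ratios are the point: under this model four idle collections cost four times one, and dropping standby roughly halves what remains.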
The way to think about it: OpenSearch Serverless is priced like always-on managed capacity, not like a per-request API. That is great when you have steady, meaningful query volume and want hybrid search and zero ops — the baseline amortizes and the managed scaling earns its keep. It is poor value when you have a handful of documents and a query every few minutes, where you are mostly paying the redundancy floor to sit idle. Match the store to the workload shape, not to the default.
| Cost component | Unit | When you pay | The gotcha / lever |
|---|---|---|---|
| Indexing OCUs | Per OCU-hour | Ingestion / re-embedding + a warm baseline | A minimum floor runs continuously, even idle |
| Search OCUs | Per OCU-hour | Serving queries + a warm baseline | Separate pool with its own continuous minimum |
| Redundancy minimum | Floor across the two pools | Always (default = AZ-redundant) | The surprise line; halve it with a no-standby dev collection |
| Managed storage | Per GB-month (S3-backed) | Continuously, with corpus size | Smaller embedding dimension = less storage |
| Per collection, not per index | Baseline × number of collections | Always | Many tiny KBs = many baselines; consolidate |
| Data transfer / KMS | Standard AWS rates | With usage | Usually minor relative to OCUs |
OpenSearch Serverless bills warm OCU capacity (two pools: indexing + search) + storage, and the default AZ-redundant collection carries a standing minimum you pay even at zero traffic. Great for steady volume; expensive for a tiny prototype — where pgvector on Aurora Serverless v2 is usually cheaper. All of it is AWS-credit-eligible.
Most teams accept the defaults and are fine. But OpenSearch exposes real knobs on the k-NN field, and when recall, latency, memory, or cost matter, these are the levers. They live on the vector field's mapping, so the important ones are set at index-creation time.
The k-NN field has an engine that implements the approximate-nearest-neighbour index. The common choices in OpenSearch are FAISS (Meta's library, with broad algorithm support including both HNSW and IVF, vector quantization to shrink memory, and the usual pick for large vector workloads), Lucene (the engine built into OpenSearch, which needs no extra native library, supports HNSW, integrates cleanly with filtering, and is a fine default for many corpora), and nmslib (the original HNSW implementation, still available but generally superseded by FAISS for new builds). OpenSearch Serverless vector search collections may support only a subset of these engines and algorithms, so confirm the current list in the AWS documentation before planning around Lucene, nmslib, or IVF. For most Bedrock RAG, the auto-created index uses a sensible default; reach for FAISS when you need quantization or IVF at scale.
HNSW (Hierarchical Navigable Small World) is the default approximate-nearest-neighbour algorithm and the right choice for the overwhelming majority of RAG corpora: it gives excellent recall at low query latency, with the trade-off that the graph lives largely in memory (so memory scales with vector count and dimension). IVF (Inverted File index) partitions vectors into clusters and searches only the nearest clusters; it uses less memory and can be the better fit for very large corpora (tens of millions of vectors and up) where HNSW's memory footprint becomes the binding constraint, at some cost to recall/latency tuning effort. Rule of thumb: stay on HNSW until memory cost forces you to evaluate IVF (often paired with FAISS quantization).
The dimension of the k-NN field must equal your embeddings model's output dimension and is fixed for the life of the index — change the embeddings model (or its dimension) and you re-create the index and re-embed. Where the model supports it, a smaller dimension (e.g. choosing 512 or 256 on a model that allows it) cuts storage and the in-memory index size, speeding search and lowering cost at a modest recall cost — a real lever at scale. The distance metric (cosine similarity, Euclidean/L2, or inner/dot product) should match what your embeddings model was trained for; cosine is the common default for text embeddings.
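To see how much a smaller dimension saves, the sketch below applies the rule-of-thumb HNSW memory estimate from the OpenSearch k-NN documentation, roughly 1.1 × (4 × dimension + 8 × m) bytes per vector, where m is the number of graph links per node. Treat it as a planning estimate under that assumption, not a measurement; the one-million-vector corpus is a made-up example.

```python
def hnsw_memory_bytes(num_vectors, dimension, m=16):
    """Estimate in-memory HNSW index size for float32 vectors.

    4 bytes per dimension for the vector, about 8 bytes per graph link,
    and a 10% overhead factor.
    """
    return 1.1 * (4 * dimension + 8 * m) * num_vectors

def to_gib(num_bytes):
    """Convert bytes to gibibytes."""
    return num_bytes / (1024 ** 3)

corpus = 1_000_000  # hypothetical number of chunks
for dim in (1024, 512, 256):
    estimate = hnsw_memory_bytes(corpus, dim)
    print(f"dimension {dim}: ~{to_gib(estimate):.2f} GiB")
```

Because the vector term dominates the graph term at these sizes, halving the dimension cuts the estimate by a little less than half.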
Three HNSW parameters trade recall against memory and speed. m (the number of bi-directional links per graph node) raises recall and memory as it grows — typical values sit in the 16–48 range. ef_construction (how wide the search is while building the graph) improves index quality at the cost of slower indexing; a higher value builds a better graph once. ef_search (how wide the search is at query time) trades recall for latency on each query and can be tuned without rebuilding. The practical approach: leave the defaults until an evaluation set shows a recall gap, then raise ef_search first (cheap, no rebuild), then m/ef_construction (requires re-indexing) if you need more.
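The "evaluation set shows a recall gap" step needs a number to look at. One common choice is recall@k: for each test question, compare the IDs the approximate index returned with the IDs an exact (brute-force) search returns, and average the overlap. The sketch below assumes you already have both ID lists per question; the document IDs are invented.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k that the approximate search also returned."""
    top_exact = set(exact_ids[:k])
    if not top_exact:
        return 0.0
    hits = sum(1 for doc_id in approx_ids[:k] if doc_id in top_exact)
    return hits / len(top_exact)

def mean_recall_at_k(results, k):
    """Average recall@k over (approx_ids, exact_ids) pairs, one per question."""
    if not results:
        return 0.0
    return sum(recall_at_k(a, e, k) for a, e in results) / len(results)

# Two hypothetical evaluation questions.
evaluation = [
    (["d1", "d2", "d9"], ["d1", "d2", "d3"]),  # 2 of 3 found
    (["d4", "d5", "d6"], ["d6", "d5", "d4"]),  # all 3 found, order ignored
]
score = mean_recall_at_k(evaluation, k=3)
print(f"mean recall@3 = {score:.3f}")
```

Re-running the same evaluation after each parameter change shows whether raising ef_search closed the gap before you commit to a re-index for m or ef_construction.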
| Knob | Default-ish choice | Raise it to… | Cost of raising | Set at |
|---|---|---|---|---|
| Engine | Lucene or FAISS | Get quantization / IVF (FAISS) | Native library footprint | Index creation |
| Algorithm | HNSW | Cut memory at huge scale (IVF) | Recall/latency tuning effort | Index creation |
| Dimension | Match embeddings model | (Lower it) cut storage + memory | Modest recall loss | Index creation (fixed after) |
| Distance metric | Cosine (for text) | Match the model's training | — | Index creation |
| m (HNSW) | 16 | Higher recall | More memory | Index creation |
| ef_construction | Engine default | Better graph quality | Slower indexing | Index creation |
| ef_search | Engine default | Higher recall per query | Higher query latency | Query time (no rebuild) |
These are the three vector stores most teams weigh for Bedrock RAG. They are all valid; the right answer depends on the shape of your workload, what you already run, and how cost-sensitive you are at your current scale. Here is the honest trade-off, the same dimensions a real architecture review uses.
Amazon OpenSearch Serverless is the AWS-native default. Its standout advantages are zero setup behind Bedrock (auto-created), native hybrid search (vector + BM25 in one engine, which routinely beats vector-only retrieval on real corpora), full management with auto-scaling, and clean scale to large corpora. Its weakness is the standing OCU baseline covered in §V — it is rarely the cheapest option at small or bursty scale, and you pay for redundant capacity even when idle.
Amazon Aurora PostgreSQL with pgvector is the pragmatic choice when you already run Postgres or want to minimize both new infrastructure and cost at low volume. Vectors live in the same database as your relational data, so you can filter with SQL predicates and join to business tables in one query, and Aurora Serverless v2 can scale capacity down close to zero when idle — which is exactly why it is frequently the cheapest store for a prototype or a low-traffic internal tool. The trade-offs: hybrid search is not as turnkey as OpenSearch's single-engine BM25+vector, and a purpose-built vector engine pulls ahead at extreme scale or very high query concurrency. It is a fully supported Knowledge Bases store, so you can pick it from the same dropdown that offers OpenSearch.
Pinecone is a managed, vector-native database (third-party, selectable in Knowledge Bases and available via the AWS Marketplace). Because it does one thing, it offers strong vector performance, serverless scaling, and rich metadata filtering with minimal tuning — attractive when vector search is your core workload and you want a specialist rather than a general engine, or when your team already standardizes on it. The trade-offs are that it is a separate vendor (your vectors leave AWS-native services; Marketplace billing can route through your AWS invoice) and that, being vector-only, you bring your own keyword layer if you want hybrid search.
A useful way to collapse the decision: if you want it to just work behind Bedrock, want native hybrid, and have steady volume, use OpenSearch Serverless. If you already run Postgres or you are cost-sensitive at small/bursty scale, use Aurora pgvector. If vector search is your whole workload and you want a zero-tuning specialist (or you already use it), use Pinecone. All three are first-class Knowledge Bases options — and the supported list grows, so confirm current support in the AWS Bedrock docs.
The default is not automatically the right choice. Here is the honest decision guide — the situations where OpenSearch Serverless clearly earns its baseline, and the situations where another store is the better answer.
Reach for OpenSearch Serverless when one or more of these is true:
Choose a different store instead when:
Default to OpenSearch Serverless when you want native hybrid search, zero-setup managed search, and have steady volume. Override to Aurora pgvector when the workload is tiny/bursty or you already run Postgres (the usual cost-driven switch), to Pinecone for a zero-tuning vector-only specialist, or to Neptune Analytics for graph-shaped data. Match the store to the workload, not to the default.
The three stores most teams weigh for Bedrock RAG, on the dimensions that actually drive the decision. All three are first-class Bedrock Knowledge Bases options. Cost notes are representative as of 2026 — confirm current pricing on the relevant AWS or vendor pricing page.
| Dimension | OpenSearch Serverless | Aurora PostgreSQL (pgvector) | Pinecone |
|---|---|---|---|
| Managed by | AWS (Bedrock can auto-create) | AWS (you run Aurora) | Pinecone (third-party) |
| Setup behind Bedrock | Lowest — one-click auto-create | Low–medium (you run Aurora) | Medium — external account / Marketplace |
| Cost shape | Warm OCUs (2 pools) + storage; standing baseline | Aurora Serverless v2 ACUs — scales near zero idle | Pinecone pricing (serverless / pods) |
| Cheapest at small / bursty scale? | No — redundancy floor dominates | Usually yes | Varies (serverless tier helps) |
| Hybrid search (vector + keyword) | Native — one engine, BM25 + k-NN | Vector + SQL filters (BYO keyword) | Vector + metadata filters (BYO keyword) |
| Metadata / SQL filtering | Metadata filtering | Full SQL predicates + joins | Rich metadata filtering |
| Data stays AWS-native | Yes (your account/Region) | Yes (your account/Region) | No — third-party service |
| Pick it when | Native hybrid, zero-setup, steady volume | On Postgres / cost-sensitive / small-bursty | Vector-only specialist / already use it |
Situation: The team had a working Bedrock Knowledge Bases prototype but stalled on the vector store. They had accepted the default and auto-created an OpenSearch Serverless collection per Knowledge Base while experimenting — and the first month's OpenSearch bill, driven by the standing redundancy minimum across several near-idle collections, was far larger than the inference cost and spooked the founder. They could not tell whether OpenSearch was wrong for them or whether they had simply set it up badly, and the one engineer who understood it was fully committed to the core product. They did not want to keep burning runway on a vector database while still proving the assistant out.
What CloudRoute did: CloudRoute matched them in under 24 hours to a UK AWS partner with a GenAI/ML track record. The partner did the architecture call most teams skip: confirmed OpenSearch was actually the right store (they wanted native hybrid search and had steady internal query volume), then fixed the setup — consolidated the scattered per-KB collections into a single vector-search collection to pay one baseline instead of several, right-sized the embedding dimension to cut storage and index memory, and kept retrieval tight so search OCUs did not scale needlessly. They built it on Knowledge Bases with hybrid retrieval (k-NN + BM25) into the OpenSearch index, with metadata filtering for per-team scoping. In parallel, the partner filed a Bedrock POC credit application plus an Activate Portfolio application to fund the whole thing.
Outcome: A grounded, cited internal assistant went live in under three weeks with the OpenSearch bill brought down to a single justified baseline — and the entire cost stack (OCUs, storage, embeddings, RetrieveAndGenerate inference) was covered by the approved credits, so the team paid $0 during the build and early rollout. CloudRoute's commission was paid by the partner from AWS engagement funding, not by the customer.
corpus: ~15k docs · store: OpenSearch Serverless (consolidated, hybrid) · time to live: < 3 weeks · credits secured: POC + Activate · out-of-pocket during build: $0
Whatever the vector store would cost — the OpenSearch OCU baseline, storage, embeddings, and inference — AWS credits can cover it. CloudRoute routes you to the right credit pool (Activate up to $100K, Bedrock POC $10K–$50K, GenAI Accelerator up to $1M) and a vetted AWS partner to size the collection, tune the k-NN index, pick the right store for your workload, and ship the Bedrock retrieval. Customer pays $0.