A neutral, numbers-first walkthrough of the six real options — OpenSearch Serverless, Aurora PostgreSQL with pgvector, Pinecone, Redis / MemoryDB, Neptune Analytics, and S3 Vectors — scored on cost (including the OpenSearch Serverless OCU-minimum trap), latency, scale, metadata filtering, operational burden, and how each plugs into Bedrock Knowledge Bases. Ends in a decision table by corpus size and query volume.
Before comparing six products, fix the mental model. A vector database does one core job, and once you understand that job, the differences between the options stop looking like a feature checklist and start looking like what they really are: different points on a cost, latency, and operational-burden curve.
A vector database stores embeddings: high-dimensional numeric arrays (commonly 256 to 1,536 dimensions, depending on the embedding model) that represent the meaning of a chunk of text, an image, or other content. Its core job is approximate nearest-neighbor (ANN) search. Given a query vector, it returns the k most similar stored vectors, quickly, without brute-force comparison against every vector in the corpus. Almost every implementation does this with either a graph-based index called HNSW (Hierarchical Navigable Small World) or a cluster-based IVF (inverted-file) index. The algorithms are largely commoditized. What differs is how each product packages, prices, and operates them.
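To make "approximate" concrete, here is the exact search that an ANN index avoids: a minimal Python sketch that scores every stored vector by cosine similarity and keeps the top k. The three-dimensional vectors and document IDs are made up for illustration. Real embeddings have hundreds of dimensions, and a production store would replace this linear scan with an HNSW or IVF index.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query: list[float], corpus: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Brute-force nearest neighbours: score every stored vector, keep the k best.

    Cost is O(N * d) per query. An ANN index (HNSW, IVF) avoids this by
    visiting only a small fraction of the corpus, at the price of
    occasionally missing a true neighbour.
    """
    scored = ((doc_id, cosine(query, vec)) for doc_id, vec in corpus.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


corpus = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "returns-howto": [0.8, 0.2, 0.1],
}
for doc_id, score in exact_top_k([1.0, 0.0, 0.0], corpus, k=2):
    print(doc_id, round(score, 3))
```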
In a retrieval-augmented generation (RAG) system, the vector database is the retrieval half. You embed your documents once (offline), store the vectors, and at query time you embed the user question, ask the vector store for the most similar chunks, and pass those chunks to the model as grounding context. The vector store never talks to the language model directly — it is plumbing that decides which of your documents the model gets to read. That makes it both load-bearing (bad retrieval is the number-one cause of bad RAG answers) and, in cost terms, a line item that runs 24/7 whether or not anyone is querying.
Two properties beyond raw similarity search separate a toy from a production store. First, metadata filtering: the ability to constrain a search to vectors matching structured attributes — tenant ID, document type, date range, access-control labels — ideally as a pre-filter (narrow the candidate set before the ANN search) rather than a post-filter (search first, then throw away results), because post-filtering can return too few results or none. Multi-tenant SaaS, permissioned document search, and any "search only my org's data" requirement live or die on filtering quality. Second, hybrid search: combining dense vector similarity with sparse keyword (BM25) matching, which materially improves recall on queries with exact terms, product codes, names, or acronyms that pure semantic search fumbles.
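The difference between pre-filtering and post-filtering is easiest to see on a tiny example. This sketch hard-codes similarity scores, standing in for a real ANN search, and models post-filtering as "take the top `ann_k` candidates, then drop other tenants' rows." When one tenant's documents dominate the top of the ranking, the other tenant gets nothing back.

```python
import heapq

# (doc_id, tenant_id, similarity to the query). The scores are hard-coded
# stand-ins for what a real ANN search would compute.
CANDIDATES = [
    ("a1", "acme", 0.95),
    ("a2", "acme", 0.93),
    ("a3", "acme", 0.91),
    ("a4", "acme", 0.90),
    ("g1", "globex", 0.62),
    ("g2", "globex", 0.55),
]


def post_filter(tenant: str, k: int, ann_k: int) -> list[str]:
    """Search first: take the ann_k most similar rows, then drop other tenants."""
    top = heapq.nlargest(ann_k, CANDIDATES, key=lambda row: row[2])
    return [doc for doc, owner, _ in top if owner == tenant][:k]


def pre_filter(tenant: str, k: int) -> list[str]:
    """Filter first: rank only the rows this tenant is allowed to see."""
    allowed = [row for row in CANDIDATES if row[1] == tenant]
    return [doc for doc, _, _ in heapq.nlargest(k, allowed, key=lambda row: row[2])]


print(post_filter("globex", k=2, ann_k=3))  # [] because the top 3 overall all belong to acme
print(pre_filter("globex", k=2))            # ['g1', 'g2']
```

Raising `ann_k` hides the problem but does not remove it. Pre-filtering, or a store that applies the filter during index traversal, is the structural fix.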
Hold those two properties — filtering and hybrid search — alongside three cost-and-ops realities, and you have the entire decision. The realities: a vector index is memory-hungry (HNSW graphs largely live in RAM, so cost scales with corpus size whether you query or not); it usually runs continuously (idle cost is real); and someone has to operate it (provisioning, scaling, patching, backups) unless you pay a managed service to do that for you. Every option below is a different bargain across those three. The rest of this guide prices each bargain.
These are the vector stores a team building on AWS actually chooses between in 2026. Each gets an honest profile: what it is, where it shines, where it hurts, and the cost mechanic that matters. Read this section for the shape of each option; the next sections price and rank them.
A note on scope: this guide covers the stores that integrate cleanly into an AWS-native RAG stack and that Amazon Bedrock Knowledge Bases can use as a backend (with one deliberate exception, Redis, noted below). Self-hosting a dedicated vector engine like Milvus, Weaviate, Qdrant, or Chroma on EC2/EKS is a viable seventh path, but it trades managed simplicity for full operational ownership and is justified by the same narrow conditions as any self-host decision — extreme scale economics or a specific feature need — so it is mentioned but not the focus.
What it is: the serverless tier of Amazon OpenSearch Service. Its vector engine combines ANN search (HNSW, backed by engines such as FAISS) with full-text (BM25) search in one system, so it natively supports hybrid search and rich metadata filtering. It is the default vector store that Bedrock Knowledge Bases provisions if you let it pick.
Where it shines: scale and hybrid search. When your corpus is large (millions of vectors), you need keyword + vector retrieval together, you want sophisticated filtering and aggregations, and you would rather not run a cluster, OpenSearch Serverless is the strong choice. It autoscales capacity for you.
Where it hurts (the OCU trap): it bills in OpenSearch Compute Units (OCUs) and enforces a floor of 2 OCUs for a vector search collection, one for indexing and one for search, even if your index holds only a thousand vectors. At roughly $0.24 per OCU-hour, that floor is about $0.48/hour. Over a ~730-hour month, that comes to roughly $175 per OCU, or about $350/month for the pair, before you store any meaningful data and before storage charges. A redundant production configuration pushes the OCU count higher, and doubling it to 4 OCUs puts the floor near $700/month. This is the single most common cost surprise in AWS RAG. OpenSearch Serverless is a great large-scale option and a terrible small-POC option.
What it is: Amazon Aurora PostgreSQL (or Amazon RDS for PostgreSQL) with the open-source pgvector extension, which adds a vector column type and HNSW/IVFFlat indexes to ordinary Postgres. Your embeddings live in a table next to your relational data, and metadata filtering is just a SQL WHERE clause.
Where it shines: teams that already run Postgres, small-to-mid corpora, and anyone who wants vectors and transactional/relational data in one system with one set of operational tools. Filtering is first-class SQL, joins to your existing tables are trivial, and there is no new datastore to learn. Aurora Serverless v2 scales capacity (in ACUs) with load and can scale down when idle, which keeps small-workload cost low. Bedrock Knowledge Bases supports Aurora as a backend.
Where it hurts: at very large scale (tens of millions of vectors and high QPS) a general-purpose relational engine is not as specialized as OpenSearch or a dedicated vector DB — index build times grow, memory pressure rises, and you may tune and shard harder than you would like. pgvector is excellent up to the low millions; beyond that, evaluate whether you have outgrown it.
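As the Aurora profile says, filtering in pgvector is an ordinary WHERE clause. The sketch below keeps the SQL in Python strings so that it runs without a database. The table name, column names, and 1,024-dimension size are hypothetical, and the `%(name)s` placeholders follow psycopg's named-parameter style. `<=>` is pgvector's cosine-distance operator, and `vector_cosine_ops` is the matching HNSW operator class.

```python
# Hypothetical schema: one row per chunk, with tenant and document-type metadata.
DDL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id         bigserial PRIMARY KEY,
    tenant_id  text NOT NULL,
    doc_type   text NOT NULL,
    body       text NOT NULL,
    embedding  vector(1024) NOT NULL
);
CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw
    ON chunks USING hnsw (embedding vector_cosine_ops);
CREATE INDEX IF NOT EXISTS chunks_tenant_idx ON chunks (tenant_id);
"""

# Metadata filtering is plain SQL; ORDER BY distance drives the vector search.
TOP_K_FOR_TENANT = """
SELECT id, body, embedding <=> %(query_vec)s::vector AS distance
FROM chunks
WHERE tenant_id = %(tenant_id)s AND doc_type = ANY(%(doc_types)s)
ORDER BY embedding <=> %(query_vec)s::vector
LIMIT %(k)s;
"""


def to_pgvector_literal(vec: list[float]) -> str:
    """Format a Python list in pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def query_params(query_vec: list[float], tenant_id: str, doc_types: list[str], k: int = 5) -> dict:
    """Named parameters for TOP_K_FOR_TENANT (e.g. cursor.execute(TOP_K_FOR_TENANT, params))."""
    return {
        "query_vec": to_pgvector_literal(query_vec),
        "tenant_id": tenant_id,
        "doc_types": list(doc_types),
        "k": k,
    }


print(query_params([0.1, 0.2], "acme", ["faq", "policy"], k=3))
```

One caveat is worth checking in the pgvector documentation. With approximate indexes such as HNSW, the WHERE clause applies to the candidates that the index scan returns, so a highly selective tenant filter can return fewer than k rows. Recent pgvector releases add iterative index scans to address this.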
What it is: a fully managed, purpose-built vector database delivered as SaaS, available through AWS Marketplace and deployable in AWS regions. It is the "I never want to think about vector-database operations" option, and Bedrock Knowledge Bases supports it as a backend.
Where it shines: developer experience and zero ops. Its serverless tier separates storage from compute and can scale to very large corpora with usage-based pricing, so you are not paying for idle capacity the way a fixed OCU floor charges you. For teams that value speed-to-ship and have no appetite for database administration, it is compelling.
Where it hurts: it is a third-party vendor, not an AWS-native service, so it sits outside your AWS bill, your consolidated AWS commitments and credits, your IAM model, and (depending on configuration) your VPC boundary — which can matter for procurement, data-residency, and security reviews. You are also subject to its pricing model rather than AWS's. The convenience is real; so is the dependency.
What it is: vector similarity search inside an in-memory data store. Amazon MemoryDB is a Redis-compatible, durable, in-memory database with native vector search; Amazon ElastiCache (Redis OSS) also supports vector search. Everything lives in RAM.
Where it shines: latency. When you need single-digit-millisecond retrieval — real-time recommendations, semantic caching, low-latency personalization in a hot request path — an in-memory store is hard to beat. If you already run Redis/MemoryDB for caching or session state, adding vector search reuses infrastructure you operate.
Where it hurts: cost per vector. RAM is the most expensive place to store data, so holding a large corpus entirely in memory is expensive, and it scales with corpus size regardless of query rate. It is a precision instrument for latency-critical, often smaller or hot-subset workloads — not the economical home for a large general RAG corpus. Note: it is not a first-class Bedrock Knowledge Bases backend the way OpenSearch, Aurora, Pinecone, and Neptune Analytics are, so RAG-on-Knowledge-Bases teams typically choose it only for custom retrieval paths.
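A back-of-envelope calculation shows why RAM-resident vectors get expensive. Raw float32 storage is vectors × dimensions × 4 bytes. The 1.5× overhead factor for the HNSW graph and metadata is an assumption for illustration, so measure your real footprint before sizing nodes.

```python
def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Bytes for the vectors alone, stored as float32 (4 bytes per value)."""
    return num_vectors * dims * bytes_per_value


def estimated_ram_gib(num_vectors: int, dims: int, index_overhead: float = 1.5) -> float:
    """Vectors plus an ASSUMED multiplier for HNSW graph links and metadata."""
    return raw_vector_bytes(num_vectors, dims) * index_overhead / 2**30


for n in (1_000_000, 10_000_000):
    print(f"{n:,} x 1024-dim vectors: ~{estimated_ram_gib(n, 1024):.1f} GiB")
# 1,000,000 x 1024-dim vectors: ~5.7 GiB
# 10,000,000 x 1024-dim vectors: ~57.2 GiB
```

The footprint grows linearly with the corpus, and you pay for it around the clock whether or not anyone queries.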
What it is: the analytics engine of Amazon Neptune (AWS's graph database) with built-in vector search, so you can combine graph traversal with vector similarity in one query. Bedrock Knowledge Bases supports it, and it is the backbone of GraphRAG patterns.
Where it shines: when relationships matter as much as similarity. GraphRAG — retrieving not just semantically similar chunks but the entities and relationships connected to them — improves answer quality on highly interconnected knowledge (org structures, regulations, supply chains, knowledge graphs). If your data is genuinely a graph and you want retrieval that exploits that, Neptune Analytics is the specialized fit.
Where it hurts: it is a specialized, heavier-weight engine, not a general-purpose vector store, and it carries graph-database cost and complexity. For plain "find similar chunks" RAG with no relationship structure, it is overkill — you are paying for a graph engine you are not using.
What it is: the newest entrant — native vector storage and similarity search built directly into Amazon S3 via "vector buckets," with sub-second query latency at object-storage prices. It brings ANN search to the cheapest durable storage AWS offers, and integrates with Bedrock Knowledge Bases and OpenSearch.
Where it shines: cost at scale for latency-tolerant workloads. Because vectors sit in S3 rather than in RAM or a running cluster, the at-rest cost is dramatically lower and there is effectively no idle-compute floor — you pay for storage and queries. For very large corpora, archival or infrequently queried embeddings, or cost-sensitive RAG where sub-second (not single-digit-millisecond) latency is acceptable, it changes the economics. A common pattern: keep hot vectors in OpenSearch and tier cold vectors to S3 Vectors.
Where it hurts: it trades latency and feature richness for price. It targets sub-second, not in-memory-grade, latency, and it is younger than the alternatives, so the most demanding real-time and advanced-hybrid-search workloads still point to OpenSearch or Redis. Validate its filtering and latency against your specific workload before standardizing on it.
OpenSearch Serverless for scale + hybrid search (mind the OCU floor: ~$350/mo at the 2-OCU minimum, roughly double with redundancy). Aurora pgvector for Postgres shops and small-to-mid corpora. S3 Vectors for huge, cost-sensitive, latency-tolerant corpora. Pinecone to buy your way out of database ops. Redis/MemoryDB when single-digit-ms latency is non-negotiable. Neptune Analytics when relationships matter as much as similarity (GraphRAG).
Cost is where vector-database choices go wrong most often, because the pricing models are not comparable apples-to-apples: one bills fixed compute units, one bills database capacity that scales with load, one bills storage plus queries, one bills as a SaaS subscription. Translate them all into "what does this cost for my corpus and my traffic" and the decision usually makes itself.
The structural divide is between floor-based and usage-based pricing. A floor-based store (OpenSearch Serverless, with its 2-OCU minimum) costs roughly the same whether it holds a thousand vectors or a million, as long as traffic stays low, so the floor dominates small workloads. A usage-based store (S3 Vectors; Aurora Serverless v2 scaling down; Pinecone serverless) lets cost track actual storage and queries. Small workloads stay cheap, and you pay more only as you grow. The single biggest cost mistake in AWS RAG is putting a small or early-stage workload on a floor-based store.
OpenSearch Serverless bills in OpenSearch Compute Units. A vector search collection requires a minimum of 2 OCUs (one for indexing, one for search). You are billed for that minimum continuously, 24/7, regardless of how little data you hold or how few queries you run. At an on-demand rate on the order of $0.24 per OCU-hour, two OCUs cost roughly $0.48/hour. Over a ~730-hour month, that is about $175 per OCU, or roughly $350/month as a practical floor, before storage and before any redundancy. A standard production deployment with redundancy provisions additional OCUs and raises the floor further. At 4 OCUs, it reaches roughly $700/month.
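The floor arithmetic is worth checking yourself, because the rate varies by region. This sketch assumes a 730-hour month and the ~$0.24/OCU-hour rate quoted above. The usage-based function is a hypothetical model with placeholder prices, not published AWS or Pinecone rates. It shows why such a bill tracks data and traffic instead of sitting on a floor.

```python
HOURS_PER_MONTH = 730  # 8,760 hours per year / 12


def ocu_floor_monthly(ocus: float, rate_per_ocu_hour: float = 0.24) -> float:
    """Always-on compute cost of an OpenSearch Serverless OCU floor, before storage."""
    return ocus * rate_per_ocu_hour * HOURS_PER_MONTH


def usage_monthly(gb_stored: float, queries: int,
                  price_per_gb_month: float, price_per_1k_queries: float) -> float:
    """HYPOTHETICAL usage-based bill: storage plus queries, no compute floor.

    Both prices are placeholders; fill them in from the vendor's current price list.
    """
    return gb_stored * price_per_gb_month + (queries / 1000) * price_per_1k_queries


print(f"1 OCU:  ${ocu_floor_monthly(1):,.0f}/month")  # 1 OCU:  $175/month
print(f"2 OCUs: ${ocu_floor_monthly(2):,.0f}/month")  # 2 OCUs: $350/month
print(f"4 OCUs: ${ocu_floor_monthly(4):,.0f}/month")  # 4 OCUs: $701/month
```

The floor function ignores data volume and traffic entirely. In the usage-based function, both terms fall toward zero with a small POC. That asymmetry is the whole floor-versus-usage argument.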
Why this ambushes teams: a developer wires up Bedrock Knowledge Bases, accepts the default vector store (OpenSearch Serverless), and ships a small RAG proof-of-concept with a few thousand chunks. They are then surprised by a bill of several hundred dollars a month (more with redundancy) for a workload that has barely any data or traffic. The store is excellent at the scale it was designed for. For a POC or a small production corpus, that floor is pure waste.
The honest guidance: do not default to OpenSearch Serverless for small or early workloads. Use Aurora pgvector (scales down with Serverless v2) or S3 Vectors (storage + query pricing, no compute floor) while small, and move to OpenSearch Serverless when your corpus, your QPS, and your need for hybrid search at scale actually justify the floor — at which point the OCU model is reasonable and autoscales cleanly.
Aurora PostgreSQL pgvector: you pay for Aurora — compute (provisioned instances, or Serverless v2 in ACUs that scale with load and down toward a small floor when idle), storage, and I/O. For a small RAG corpus this can be a fraction of the OpenSearch floor, especially on Serverless v2; for a large one, a sized Aurora cluster is a predictable, well-understood cost. You are paying for a database you may already run.
S3 Vectors: storage at S3-class prices plus per-query/request charges, with no continuously running compute floor. This is the cheapest at-rest option for large or cold corpora by a wide margin — its whole reason for existing is to drop the cost of storing vectors at scale when you can tolerate sub-second rather than in-memory latency.
Pinecone: a SaaS subscription with usage-based (serverless) pricing that scales with stored vectors and operations, billed by the vendor (payable via AWS Marketplace). You are buying away operational work; price it against the engineering time you would otherwise spend operating a store, not just against AWS line items.
Redis / MemoryDB: you pay for in-memory nodes sized to hold the whole index in RAM, so cost scales with corpus size and is the highest per stored vector. Justified only when its latency is the requirement, or when you already run it.
Neptune Analytics: graph-engine pricing (memory-based capacity for the analytics engine plus storage). It carries the cost of a specialized engine; justified by GraphRAG value, not by plain similarity search.
Small corpus or early-stage (under ~1M vectors, modest QPS): Aurora pgvector or S3 Vectors — avoid the OpenSearch floor. Large corpus needing hybrid search at scale: OpenSearch Serverless earns its OCUs. Huge, cold, or cost-sensitive: S3 Vectors. Willing to pay to skip ops: Pinecone. The floor-vs-usage distinction explains most "why is this so expensive?" surprises.
Cost narrows the field; latency, scale, and filtering finish the decision. These are workload requirements, not vendor virtues — the question is never "which is fastest?" but "which clears the bar my workload actually sets?" Over-buying performance you do not need is just a more expensive way to be wrong.
Set your real targets before you shop. Most RAG question-answering is comfortable with retrieval in the tens-to-low-hundreds of milliseconds, because the language-model generation step that follows takes far longer anyway — shaving 10ms off retrieval is invisible next to a multi-second completion. Real-time recommendation, semantic caching, and personalization in a hot path are different: there, single-digit-millisecond retrieval can be the requirement. Know which world you are in, because it is the difference between "any of these work" and "you need an in-memory store."
Single-digit milliseconds: Redis / MemoryDB (in-memory) is the clear leader and usually the only option here that reliably delivers this latency at scale.
Tens of milliseconds: OpenSearch Serverless, Aurora pgvector (well-indexed and sized), and Pinecone all land here comfortably — more than fast enough for RAG question-answering, where generation dominates total latency.
Sub-second: S3 Vectors targets this band by design — excellent for batch, archival, and cost-sensitive interactive use where sub-second is acceptable, not for hot real-time paths.
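Measuring against those bands is straightforward. Time the retrieval call on its own over a representative set of queries, then compare a tail percentile (not the average) to your budget. In this sketch, `retrieve` is any callable you supply and serves as a hypothetical stand-in for your actual store client. The `fake_retrieve` usage simply sleeps.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def retrieval_latency_ms(retrieve: Callable[[str], object], queries: Sequence[str]) -> Dict[str, float]:
    """Time each retrieval call on its own and report p50/p95/p99 in milliseconds.

    Needs at least two queries (statistics.quantiles requires two samples).
    """
    samples = []
    for query in queries:
        start = time.perf_counter()
        retrieve(query)  # retrieval only; generation time is measured separately
        samples.append((time.perf_counter() - start) * 1000.0)
    # 99 cut points; the "inclusive" method keeps estimates within the observed range.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def fits_budget(latencies: Dict[str, float], p95_budget_ms: float) -> bool:
    """True if the tail (p95) stays within the budget your workload sets."""
    return latencies["p95"] <= p95_budget_ms


def fake_retrieve(query: str) -> None:
    time.sleep(0.005)  # pretend the store takes ~5 ms


results = retrieval_latency_ms(fake_retrieve, [f"q{i}" for i in range(50)])
print({name: round(ms, 1) for name, ms in results.items()})
print("within a 200 ms RAG budget:", fits_budget(results, p95_budget_ms=200.0))
```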
To the low millions of vectors: Aurora pgvector is comfortable and often the simplest, cheapest fit.
Millions to hundreds of millions: OpenSearch Serverless (autoscaling OCUs), Pinecone (serverless), and S3 Vectors (object-storage economics) are built for this band.
Billions / cost-dominated: S3 Vectors changes the economics by storing vectors at S3 prices; a tiered design — hot vectors in OpenSearch, cold in S3 Vectors — is an increasingly common large-scale pattern.
Richest filtering + native hybrid search: OpenSearch Serverless — it is a search engine first, so BM25 + vector fusion and complex metadata filtering/aggregations are first-class. This is its strongest differentiator and the reason to accept its cost at scale.
Filtering as plain SQL: Aurora pgvector — any WHERE clause filters your vector search, and you can join to relational tables, which is powerful for multi-tenant isolation and permissioned retrieval. Hybrid search is achievable (Postgres full-text + pgvector) but is more do-it-yourself than OpenSearch.
Metadata filtering, vendor-managed: Pinecone supports metadata filtering well within its managed model. S3 Vectors and Redis support filtering; validate the exact filtering semantics (pre- vs post-filter behavior) against your workload, since this is where stores differ most in practice and where multi-tenant correctness is won or lost.
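Hybrid search needs some way to merge a keyword ranking with a vector ranking. One widely used method is reciprocal rank fusion (RRF), sketched below with the conventional constant k = 60. This sketch illustrates the idea only. OpenSearch configures hybrid scoring through its own search-pipeline processors, and in Postgres you would write the fusion yourself over a full-text query and a pgvector query. The document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge ranked ID lists: each list adds 1 / (k + rank) for every document it contains."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


keyword_hits = ["sku-4411", "faq-returns", "faq-shipping"]   # BM25: exact product code ranks first
vector_hits = ["faq-returns", "policy-refunds", "sku-4411"]  # dense: semantic match ranks first
for doc_id, score in reciprocal_rank_fusion([keyword_hits, vector_hits]):
    print(f"{doc_id}: {score:.4f}")
```

Documents that both retrievers rank highly (`faq-returns`, `sku-4411`) rise to the top. This is why hybrid search rescues exact-term queries, such as product codes, that pure semantic search fumbles. RRF uses only ranks, so it needs no calibration between BM25 scores and vector distances.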
The store you choose is something your team operates for years, and on AWS most teams are not building a custom retrieval engine — they are pointing Amazon Bedrock Knowledge Bases at a backend and letting it manage chunking, embedding, and retrieval. So two practical questions decide a lot: how much will this cost me in operational time, and does it drop cleanly into Knowledge Bases?
On operational burden, the spectrum runs from "fully managed, near-zero ops" to "you own it." Pinecone (SaaS) and S3 Vectors (it is just S3) sit at the low-ops end. OpenSearch Serverless removes cluster management and autoscales, so it is low-ops despite being powerful — the cost is financial (the OCU floor), not operational. Aurora pgvector is moderate: it is a managed database, but you still own schema, indexing strategy, sizing, and tuning — though if you already run Postgres, that burden is one you have already accepted. Redis/MemoryDB and Neptune Analytics are specialized engines that carry more operational and design weight, justified by their specific strengths.
The Bedrock Knowledge Bases angle is decisive for RAG teams. Knowledge Bases is AWS's managed RAG service: point it at an S3 data source and a supported vector store, and it handles ingestion (chunking + embedding) and retrieval for you. Its supported vector-store backends include Amazon OpenSearch Serverless (the default it will create for you), Amazon Aurora PostgreSQL (pgvector), Pinecone, Amazon Neptune Analytics (for GraphRAG), and increasingly Amazon S3 Vectors. If you want Knowledge Bases to manage your pipeline, choosing from that supported set keeps the integration first-class.
A crucial, money-saving subtlety: when you let Bedrock Knowledge Bases set everything up with defaults, it provisions OpenSearch Serverless — and you inherit the OCU floor. That is the right call for a large production knowledge base and the wrong call for a small POC. For small or early Knowledge Bases workloads, explicitly select Aurora pgvector or S3 Vectors as the backend instead of accepting the default. This one configuration choice is the difference between a small RAG POC costing a few dollars a month and costing several hundred.
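Selecting Aurora explicitly comes down to the storage configuration you pass when creating the knowledge base. This sketch builds the request for the `bedrock-agent` client's `create_knowledge_base` call, with `storageConfiguration.type` set to `RDS`. The field names reflect our reading of that API and should be checked against the current boto3 reference. Every ARN, database, table, and column name is a placeholder, and the Aurora table (with its vector, text, and metadata columns) must already exist.

```python
def aurora_kb_request(name: str, role_arn: str, embedding_model_arn: str,
                      cluster_arn: str, secret_arn: str,
                      database: str, table: str) -> dict:
    """Keyword arguments for bedrock-agent create_knowledge_base with an Aurora (RDS) store."""
    return {
        "name": name,
        "roleArn": role_arn,
        "knowledgeBaseConfiguration": {
            "type": "VECTOR",
            "vectorKnowledgeBaseConfiguration": {"embeddingModelArn": embedding_model_arn},
        },
        "storageConfiguration": {
            "type": "RDS",  # chosen explicitly instead of the OpenSearch Serverless default
            "rdsConfiguration": {
                "resourceArn": cluster_arn,
                "credentialsSecretArn": secret_arn,
                "databaseName": database,
                "tableName": table,
                # Column names are assumptions; they must match your existing table.
                "fieldMapping": {
                    "primaryKeyField": "id",
                    "vectorField": "embedding",
                    "textField": "chunks",
                    "metadataField": "metadata",
                },
            },
        },
    }


def create_aurora_kb(request: dict) -> str:
    """Send the request; needs boto3 and AWS credentials with Bedrock permissions."""
    import boto3  # imported here so that building the request needs no AWS SDK

    client = boto3.client("bedrock-agent")
    response = client.create_knowledge_base(**request)
    return response["knowledgeBase"]["knowledgeBaseId"]


# Placeholder values only; substitute your own resources before calling create_aurora_kb.
request = aurora_kb_request(
    name="docs-kb-poc",
    role_arn="arn:aws:iam::123456789012:role/PLACEHOLDER-kb-role",
    embedding_model_arn="arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0",
    cluster_arn="arn:aws:rds:us-east-1:123456789012:cluster:PLACEHOLDER",
    secret_arn="arn:aws:secretsmanager:us-east-1:123456789012:secret:PLACEHOLDER",
    database="kb",
    table="bedrock_integration.bedrock_kb",
)
print(request["storageConfiguration"]["type"])
```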
Redis/MemoryDB is the deliberate exception to the Knowledge Bases path. It is not a first-class Knowledge Bases backend, so teams reach for it when they are building a custom retrieval layer (often for latency-critical or semantic-cache use cases) rather than using managed Knowledge Bases. If your plan is "Knowledge Bases manages my RAG," Redis is usually not the answer; if your plan is "I am hand-building a low-latency retrieval path," it may be exactly the answer.
Bedrock Knowledge Bases defaults to creating OpenSearch Serverless. As a result, the OCU floor (roughly $350/mo at the 2-OCU minimum, more with redundancy) lands on your bill even for a tiny index. For small or POC knowledge bases, explicitly choose Aurora pgvector or S3 Vectors as the backend. Graduate to the OpenSearch default when scale and hybrid-search needs justify it.
Most regret in vector-database selection traces to a handful of repeatable errors. Naming them is the cheapest insurance available — each one is easy to avoid once you know it exists, and each one routinely costs teams either real money or weeks of rework.
The vector database does not live alone. Seeing where it sits in a complete RAG retrieval flow clarifies which decisions are the store's job and which belong to the layers around it — and prevents the common error of blaming the store for problems that are really upstream.
A production RAG retrieval pipeline has the same shape regardless of which store you pick. Ingestion (offline): source documents in S3 are chunked, each chunk is embedded by an embeddings model (such as Amazon Titan Text Embeddings or Cohere Embed on Bedrock), and the vectors plus their metadata are written to the vector store. Query (online): the user question is embedded with the same model, the vector store returns the top-k most similar chunks (optionally filtered by metadata and fused with keyword results in hybrid search), an optional re-ranking step reorders the candidates for relevance, and the surviving chunks are passed to the language model as grounding context.
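The shape of that pipeline fits in a few dozen lines. This is a toy, not a Bedrock integration. `embed` is a hypothetical hashed bag-of-words stand-in for a real embeddings model such as Titan or Cohere, the store is an in-memory list, and re-ranking is omitted. What it does show is the division of labor. Chunking and embedding happen outside the store, the store only keeps vectors and returns neighbors, and the prompt is assembled afterward.

```python
import hashlib
import math
import re

DIMS = 256  # toy size; real embedding models produce hundreds to thousands of dimensions


def chunk(text: str, max_words: int = 40) -> list[str]:
    """Ingestion step 1: split a document into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> list[float]:
    """HYPOTHETICAL embeddings model: hashed bag-of-words, L2-normalised.

    Documents and queries must go through the same function (the same model).
    """
    vec = [0.0] * DIMS
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % DIMS
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class InMemoryStore:
    """The store's two jobs: keep vectors with metadata, return nearest neighbours."""

    def __init__(self) -> None:
        self.rows: list[tuple[list[float], str, dict]] = []

    def upsert(self, vector: list[float], text: str, metadata: dict) -> None:
        self.rows.append((vector, text, metadata))

    def query(self, vector: list[float], k: int) -> list[tuple[str, dict]]:
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = sorted(self.rows, key=lambda row: sum(a * b for a, b in zip(vector, row[0])), reverse=True)
        return [(text, meta) for _, text, meta in scored[:k]]


def ingest(store: InMemoryStore, doc_id: str, text: str) -> None:
    """Offline: chunk, embed, and write each chunk with its metadata."""
    for i, piece in enumerate(chunk(text)):
        store.upsert(embed(piece), piece, {"doc_id": doc_id, "chunk": i})


def build_prompt(question: str, store: InMemoryStore, k: int = 2) -> str:
    """Online: embed the question, retrieve top-k chunks, assemble grounding context."""
    hits = store.query(embed(question), k)
    context = "\n".join(f"[{m['doc_id']}#{m['chunk']}] {t}" for t, m in hits)
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {question}"


store = InMemoryStore()
ingest(store, "refunds", "Refunds are issued within 14 days.")
ingest(store, "shipping", "Standard shipping takes 3 to 5 business days.")
print(build_prompt("When are refunds issued?", store, k=1))
```

Swapping `InMemoryStore` for a real backend changes only the two store methods. If retrieval quality is poor, `chunk` and `embed` are the first places to look.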
The vector store owns exactly two steps in that flow: storing the vectors and returning the nearest neighbors (with filtering/hybrid logic). Everything else — chunking strategy, choice of embedding model, re-ranking, prompt assembly — sits in the layers around it, and most RAG quality problems live in those layers, not in the store. A useful discipline: when retrieval quality is poor, check chunking and embeddings and re-ranking before concluding you chose the wrong database. The store is responsible for being fast, correctly filtered, and affordable; it is not responsible for whether you chunked your documents sensibly.
This is also why store choice is lower-risk than it feels. Because Bedrock Knowledge Bases (or a thin retrieval interface in custom code) sits between your application and the store, swapping backends is a re-indexing job, not a rewrite — the same property called out in the mistakes section. Pick the store that fits your corpus size, QPS, latency, filtering, and budget today; keep the ingestion pipeline reproducible; and treat the store as a component you can upgrade, not a one-way door.
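A thin retrieval interface is what turns a backend swap into a re-indexing job. This sketch uses a `typing.Protocol` with two methods. `DictStore` is a hypothetical in-memory backend, and a pgvector- or OpenSearch-backed class would implement the same two methods. `reindex` replays the reproducible pipeline's output into whichever store you point it at.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Protocol


@dataclass
class Record:
    id: str
    vector: List[float]
    text: str
    metadata: dict = field(default_factory=dict)


class VectorStore(Protocol):
    """The only surface the application depends on."""

    def upsert(self, records: List[Record]) -> None: ...

    def query(self, vector: List[float], k: int, filters: Optional[dict] = None) -> List[Record]: ...


class DictStore:
    """HYPOTHETICAL in-memory backend that satisfies VectorStore."""

    def __init__(self) -> None:
        self._rows: Dict[str, Record] = {}

    def upsert(self, records: List[Record]) -> None:
        for record in records:
            self._rows[record.id] = record  # keyed by id, so re-running a re-index is safe

    def query(self, vector: List[float], k: int, filters: Optional[dict] = None) -> List[Record]:
        filters = filters or {}
        # Filter before ranking: exact match on every metadata key given.
        matches = [r for r in self._rows.values()
                   if all(r.metadata.get(key) == value for key, value in filters.items())]
        matches.sort(key=lambda r: sum(a * b for a, b in zip(vector, r.vector)), reverse=True)
        return matches[:k]


def reindex(pipeline_output: Iterable[Record], target: VectorStore, batch_size: int = 500) -> int:
    """Replay the ingestion pipeline's output into a (new) backend, in batches."""
    batch: List[Record] = []
    total = 0
    for record in pipeline_output:
        batch.append(record)
        if len(batch) == batch_size:
            target.upsert(batch)
            total += len(batch)
            batch = []
    if batch:
        target.upsert(batch)
        total += len(batch)
    return total


records = [
    Record("a", [1.0, 0.0], "alpha", {"tenant": "t1"}),
    Record("b", [0.0, 1.0], "beta", {"tenant": "t2"}),
]
new_backend = DictStore()
print(reindex(records, new_backend, batch_size=1))  # 2
```

The application code calls only `upsert` and `query`, so moving from one backend to another means writing one new class and running `reindex`.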
The whole guide collapses into this: find your row by corpus size and column by query volume, and read the recommended store. These are starting points for the common cases, not absolute laws — adjust for a hard latency requirement (push toward Redis/MemoryDB), heavy hybrid-search needs (push toward OpenSearch), or a graph-shaped corpus (Neptune Analytics).
| Corpus size | Low QPS (POC / internal) | Medium QPS (production app) | High QPS / real-time |
|---|---|---|---|
| Small (< ~1M vectors) | Aurora pgvector (cheapest); avoid OpenSearch floor | Aurora pgvector, or OpenSearch if hybrid search needed | Redis / MemoryDB if single-digit-ms required |
| Mid (~1M–10M) | Aurora pgvector or S3 Vectors | OpenSearch Serverless (hybrid + scale) or Aurora pgvector | OpenSearch Serverless; Redis for hot subset |
| Large (10M–100M+) | S3 Vectors (cost) or OpenSearch | OpenSearch Serverless, or Pinecone to skip ops | OpenSearch + Redis hot tier |
| Huge / cost-dominated | S3 Vectors | S3 Vectors hot/cold tier with OpenSearch | OpenSearch hot tier + S3 Vectors cold tier |
| Graph-shaped (any size) | Neptune Analytics (GraphRAG) | Neptune Analytics (GraphRAG) | Neptune Analytics + caching |
The full field on one screen. Read it against your two numbers (corpus size, QPS) plus your latency, filtering, and ops constraints. No row is a verdict on its own — the right pick is the option whose profile matches your workload at the lowest justified cost.
| Dimension | OpenSearch Serverless | Aurora pgvector | Pinecone | Redis / MemoryDB | Neptune Analytics | S3 Vectors |
|---|---|---|---|---|---|---|
| Cost shape | Floor-based (2-OCU min, ~$350/mo; ~2x with redundancy) | Aurora compute+storage (scales down) | SaaS usage-based | In-memory nodes (highest/vector) | Graph-engine capacity | Storage + query (lowest at rest) |
| Best corpus size | Millions → 100M+ | Up to low millions | Millions → very large | Smaller / hot subset | Graph-shaped, any size | Large → huge |
| Latency band | Tens of ms | Tens of ms | Tens of ms | Single-digit ms | Tens of ms (graph) | Sub-second |
| Hybrid (BM25+vector) | Native, first-class | DIY (PG full-text) | Managed | Limited | Graph + vector | Limited (validate) |
| Metadata filtering | Rich, first-class | Plain SQL WHERE | Good, managed | Supported | Graph-aware | Supported (validate) |
| Ops burden | Low (autoscaling) | Moderate (you own schema) | Near-zero (SaaS) | Moderate–high | Higher (specialized) | Near-zero (it is S3) |
| Bedrock KB backend | Yes (the default) | Yes | Yes | Not first-class | Yes (GraphRAG) | Yes |
| Reach for it when | Scale + hybrid search | Postgres shop / small-mid | Buy away ops | Single-digit-ms latency | Relationships matter | Cheap at huge scale |
Situation: The team had shipped a RAG assistant on Bedrock Knowledge Bases by accepting the defaults, which provisioned OpenSearch Serverless. With only ~200K chunks and modest query volume, they were nonetheless paying around $4,000/month — the OCU floor plus a redundant production configuration — for a workload that was tiny relative to that capacity. They assumed the cost was inherent to RAG and were about to cut features to afford it. They wanted someone who knew the AWS vector-store landscape to tell them whether the bill was avoidable before they made product sacrifices.
What CloudRoute did: Routed within a day to a vetted AWS partner with production RAG experience. The partner confirmed the corpus and QPS did not justify OpenSearch Serverless, re-pointed Bedrock Knowledge Bases at Aurora PostgreSQL (pgvector) on Serverless v2 — which scaled down with their low traffic — and re-indexed the corpus through the same ingestion pipeline. Metadata filtering moved to plain SQL WHERE clauses, which also tightened per-tenant isolation. Retrieval latency stayed in the tens-of-milliseconds range, invisible against generation time. The work was scoped and filed as an AWS-funded GenAI POC, so the migration effort was credit-covered.
Outcome: The vector-store line item dropped from ~$4,000/month to under $300/month with no change in answer quality and no features cut. Per-tenant filtering correctness improved as a side effect of moving to SQL. The team kept the embedding pipeline reproducible so they can re-index onto OpenSearch Serverless later if they ever reach the scale that justifies it. CloudRoute's commission was paid by the partner from AWS engagement funding — the customer paid $0.
monthly store cost: ~$4,000 → <$300 · quality change: none · re-index time: days · cost to customer: $0
CloudRoute routes you to a vetted AWS partner who picks the vector database for your corpus and traffic, builds the retrieval pipeline on Bedrock, and avoids the cost traps — often as an AWS-funded GenAI POC, so you pay $0. No procurement. No open-ended consulting bill.