
vector databases on AWS · the 2026 decision guide

Choosing a vector database on AWS — the RAG-infrastructure decision, on real cost, latency, and ops burden.

Q: What is the best vector database on AWS for RAG?

There is no single best one — it depends on your corpus size and query volume, filtered by how much operational work you want to own. As a practical default: use Aurora PostgreSQL with pgvector for small-to-mid corpora or if your data already lives in Postgres; move to Amazon OpenSearch Serverless when you need scale plus hybrid (keyword + vector) search and the cost is justified; use Amazon S3 Vectors for huge, cost-sensitive, latency-tolerant corpora; choose Pinecone to buy away all database operations; and reach for Redis/MemoryDB only when single-digit-millisecond latency is a hard requirement. All except Redis are first-class Bedrock Knowledge Bases backends.

Q: Why is Amazon OpenSearch Serverless so expensive for a small RAG project?

Because it bills in OpenSearch Compute Units (OCUs) and enforces a minimum of 2 OCUs for a vector collection (one for indexing, one for search), billed continuously regardless of how little data or traffic you have. At roughly $0.24 per OCU-hour, that floor is about $0.48/hour, or roughly $350/month over a 730-hour month, before storage and before any redundancy. Production setups provision more OCUs: a redundant configuration that doubles the minimum to 4 OCUs comes to roughly $700/month. This is the most common cost surprise in AWS RAG. It is an excellent store at scale and a poor choice for a small proof-of-concept. For small workloads, use Aurora pgvector or S3 Vectors instead, and switch to OpenSearch once scale and hybrid-search needs justify the floor.

Q: Should I use OpenSearch or pgvector for vector search on AWS?

Use Aurora PostgreSQL with pgvector when your corpus is small-to-mid (up to the low millions of vectors), your data already lives in Postgres, or you want vectors and relational data in one system with SQL-based metadata filtering — it scales down cheaply on Aurora Serverless v2 and avoids the OpenSearch cost floor. Use OpenSearch Serverless when you need large scale (millions to hundreds of millions of vectors), native hybrid search (BM25 keyword fused with vector similarity), rich filtering and aggregations, and the spend is justified. A common path is to start on pgvector and migrate to OpenSearch by re-indexing once you outgrow it.

Q: What is Amazon S3 Vectors and when should I use it?

Amazon S3 Vectors brings native vector storage and similarity search to Amazon S3 via vector buckets, offering sub-second query latency at object-storage prices. Use it for very large, cost-sensitive, or infrequently queried corpora where you can tolerate sub-second (rather than single-digit-millisecond) latency — it has the lowest at-rest cost of any option because vectors live in S3 rather than in RAM or a running cluster, and there is no continuous compute floor. A popular pattern is tiering: keep hot, frequently queried vectors in OpenSearch and cold vectors in S3 Vectors. It integrates with Bedrock Knowledge Bases and OpenSearch. Validate its filtering semantics and latency against your specific workload first.

Q: Is Pinecone worth it over AWS-native vector stores?

Pinecone is worth it when buying away all database operations is more valuable to you than AWS-native consolidation. It is a fully managed, purpose-built vector database (available via AWS Marketplace and supported by Bedrock Knowledge Bases) with strong developer experience and serverless usage-based pricing that avoids paying for idle capacity. The tradeoff: it is a third-party vendor sitting outside your AWS bill, your AWS credits and commitments, your IAM model, and potentially your VPC boundary, which can complicate procurement, security reviews, and data residency. Price it against the engineering time you would otherwise spend operating a store — for some teams that is a clear win; for others, AWS-native (Aurora pgvector or OpenSearch) is the better fit.

Q: When should I use Redis or MemoryDB for vector search?

Use Redis or Amazon MemoryDB for vector search only when single-digit-millisecond retrieval latency is a hard requirement — real-time recommendations, semantic caching, or low-latency personalization in a hot request path — or when you already run Redis/MemoryDB and want to reuse it. Because everything lives in RAM, it is the lowest-latency option but also the most expensive per stored vector, so it is a poor home for a large general RAG corpus. It is also not a first-class Bedrock Knowledge Bases backend, so teams typically use it for custom retrieval paths rather than managed RAG. For most RAG question-answering, where the language-model generation step dominates latency, an in-memory store is over-buying speed you will not notice.

Q: What is Neptune Analytics and when does GraphRAG make sense?

Amazon Neptune Analytics is the analytics engine of Amazon Neptune (AWS's graph database) with built-in vector search, letting you combine graph traversal with vector similarity in one query — the foundation of GraphRAG. It makes sense when relationships matter as much as similarity: retrieving not just semantically similar chunks but the entities and relationships connected to them, which improves answers on highly interconnected knowledge such as org structures, regulations, supply chains, or knowledge graphs. It is supported by Bedrock Knowledge Bases. For plain "find similar chunks" RAG with no relationship structure, it is overkill — you would be paying for a graph engine you are not using, so reserve it for genuinely graph-shaped data.

Q: Does Amazon Bedrock Knowledge Bases lock me into one vector database?

No. Bedrock Knowledge Bases supports multiple vector-store backends (Amazon OpenSearch Serverless, which is the default it provisions; Amazon Aurora PostgreSQL with pgvector; Pinecone; Amazon Neptune Analytics for GraphRAG; and Amazon S3 Vectors), and it abstracts ingestion and retrieval so the backend is swappable. The key caution: if you accept the defaults, Knowledge Bases creates OpenSearch Serverless and you inherit its OCU floor (roughly $350/month at the 2-OCU minimum, about $700/month with redundancy), which is wrong for a small POC. Explicitly choose Aurora pgvector or S3 Vectors for small workloads. Because embeddings are portable and the pipeline is reproducible, migrating between backends later is a re-indexing job, not a rewrite.

Q: How do I estimate the cost of a vector database on AWS?

Translate every option into "cost for my corpus and my traffic," because the pricing models are not directly comparable. OpenSearch Serverless is floor-based: roughly $350/month minimum at 2 OCUs (about $700/month with redundancy) regardless of size, plus storage, so the floor dominates small workloads and only makes sense at scale. Aurora pgvector is Aurora compute + storage, which scales down on Serverless v2 for small workloads and is predictable when sized for large ones. S3 Vectors is storage at S3 prices plus per-query charges with no compute floor, which makes it the cheapest at rest. Pinecone is SaaS usage-based pricing billed by the vendor. Redis/MemoryDB scales with in-memory node size and has the highest cost per vector. Estimate by your vector count, dimensions, and QPS, and remember to price the floor-vs-usage distinction, because it explains most surprises.

A neutral, numbers-first walkthrough of the six real options — OpenSearch Serverless, Aurora PostgreSQL with pgvector, Pinecone, Redis / MemoryDB, Neptune Analytics, and S3 Vectors — scored on cost (including the OpenSearch Serverless OCU-minimum trap), latency, scale, metadata filtering, operational burden, and how each plugs into Bedrock Knowledge Bases. Ends in a decision table by corpus size and query volume.


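At a glance: six options compared; OpenSearch Serverless floor of roughly $350/mo at 2 OCUs (about $700/mo with redundancy); cheapest at rest: S3 Vectors; two decision inputs (corpus size and query volume).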

TL;DR

There is no single best vector database on AWS — the right choice is a function of two numbers: how many vectors you store (corpus size) and how many queries per second you serve (query volume), filtered by how much operational work you are willing to own. Pick on those axes, not on benchmarks.
The most important cost fact in 2026: Amazon OpenSearch Serverless has a continuous cost floor because it bills in OpenSearch Compute Units (OCUs) and floors at 2 OCUs (indexing + search) even for a tiny index. At roughly $0.24 per OCU-hour that is about $350/month, and about $700/month once a redundant configuration doubles the OCU count. It is a trap that ambushes teams who picked it for a small RAG proof-of-concept. For small corpora, Aurora pgvector or S3 Vectors are dramatically cheaper.
For most teams building RAG, the honest default is the cheapest store that meets your latency and filtering needs and integrates with Bedrock Knowledge Bases: Aurora PostgreSQL pgvector if your data already lives in Postgres or your corpus is small-to-mid; OpenSearch Serverless once you need scale and hybrid (keyword + vector) search and the spend is justified; S3 Vectors for huge, cost-sensitive, latency-tolerant corpora; Pinecone when you want zero database ops and will pay for it; Redis/MemoryDB only when single-digit-millisecond latency is the hard requirement.

first principles

I. What a vector database actually does, and why the choice is a cost/latency tradeoff, not a feature contest

Before comparing six products, fix the mental model. A vector database does one core job, and once you understand that job, the differences between the options stop looking like a feature checklist and start looking like what they really are: different points on a cost, latency, and operational-burden curve.

A vector database stores embeddings — high-dimensional numeric arrays (commonly 256 to 1,536 dimensions, depending on the embedding model) that represent the meaning of a chunk of text, an image, or other content. Its core job is approximate nearest-neighbor (ANN) search: given a query vector, return the k most similar stored vectors, fast, without comparing against every vector in the corpus by brute force. Almost every implementation does this with a graph index called HNSW (Hierarchical Navigable Small World) or an IVF (inverted-file) variant. The algorithm is largely commoditized; what differs is how each product packages, prices, and operates it.
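To make the "brute force" baseline concrete, here is a minimal sketch of exact nearest-neighbor search in plain Python. It scores the query against every stored vector, which is the cost that HNSW and IVF indexes exist to avoid. The three-dimensional vectors and document ids are made up for illustration; real embeddings have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def exact_top_k(query, corpus, k):
    """Compare the query with every stored vector and keep the k best ids.

    corpus maps a document id to its vector. The work grows linearly with
    corpus size, which is why production stores use an approximate index.
    """
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical toy corpus.
corpus = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "returns-howto": [0.8, 0.2, 0.1],
}
top = exact_top_k([1.0, 0.0, 0.0], corpus, k=2)
print(top)
```

An ANN index returns (approximately) the same ids while visiting only a small fraction of the corpus; the products below differ mainly in how they package and price that index.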

In a retrieval-augmented generation (RAG) system, the vector database is the retrieval half. You embed your documents once (offline), store the vectors, and at query time you embed the user question, ask the vector store for the most similar chunks, and pass those chunks to the model as grounding context. The vector store never talks to the language model directly — it is plumbing that decides which of your documents the model gets to read. That makes it both load-bearing (bad retrieval is the number-one cause of bad RAG answers) and, in cost terms, a line item that runs 24/7 whether or not anyone is querying.

Two properties beyond raw similarity search separate a toy from a production store. First, metadata filtering: the ability to constrain a search to vectors matching structured attributes — tenant ID, document type, date range, access-control labels — ideally as a pre-filter (narrow the candidate set before the ANN search) rather than a post-filter (search first, then throw away results), because post-filtering can return too few results or none. Multi-tenant SaaS, permissioned document search, and any "search only my org's data" requirement live or die on filtering quality. Second, hybrid search: combining dense vector similarity with sparse keyword (BM25) matching, which materially improves recall on queries with exact terms, product codes, names, or acronyms that pure semantic search fumbles.
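The pre-filter versus post-filter difference is easy to see in a toy example. The sketch below uses exact dot-product ranking over five made-up records from two hypothetical tenants; real stores apply the same logic around an ANN index, and their exact behavior varies by product.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def post_filter_search(query, records, tenant, k):
    """Rank everything, keep the top k, then drop rows from other tenants."""
    ranked = sorted(records, key=lambda r: dot(query, r["vec"]), reverse=True)
    return [r["id"] for r in ranked[:k] if r["tenant"] == tenant]

def pre_filter_search(query, records, tenant, k):
    """Restrict candidates to the tenant first, then rank and keep the top k."""
    allowed = [r for r in records if r["tenant"] == tenant]
    ranked = sorted(allowed, key=lambda r: dot(query, r["vec"]), reverse=True)
    return [r["id"] for r in ranked[:k]]

# Hypothetical data: globex's vectors happen to sit closest to the query.
records = [
    {"id": "a1", "tenant": "acme", "vec": [0.2, 0.8]},
    {"id": "a2", "tenant": "acme", "vec": [0.3, 0.7]},
    {"id": "b1", "tenant": "globex", "vec": [0.9, 0.1]},
    {"id": "b2", "tenant": "globex", "vec": [0.8, 0.2]},
    {"id": "b3", "tenant": "globex", "vec": [0.7, 0.3]},
]
query = [1.0, 0.0]
post = post_filter_search(query, records, "acme", k=2)
pre = pre_filter_search(query, records, "acme", k=2)
print(post, pre)
```

Here the post-filter run comes back empty because both top-2 hits belong to the other tenant, while the pre-filter run returns acme's two best matches. That is the "too few results or none" failure in miniature.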

Hold those two properties — filtering and hybrid search — alongside three cost-and-ops realities, and you have the entire decision. The realities: a vector index is memory-hungry (HNSW graphs largely live in RAM, so cost scales with corpus size whether you query or not); it usually runs continuously (idle cost is real); and someone has to operate it (provisioning, scaling, patching, backups) unless you pay a managed service to do that for you. Every option below is a different bargain across those three. The rest of this guide prices each bargain.
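A rough way to see why an in-memory index scales in cost with corpus size is to estimate its footprint. The sketch below assumes 4-byte float32 values and a hypothetical `overhead_factor` of 1.5 for graph links and metadata; the real overhead depends on the index parameters and the product, so treat the multiplier as a placeholder to replace with measured numbers.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Bytes needed for the vectors alone (float32 by default)."""
    return num_vectors * dimensions * bytes_per_value

def estimated_index_gib(num_vectors, dimensions, overhead_factor=1.5):
    """Approximate in-memory index size in GiB.

    overhead_factor is an assumed multiplier, not a measured value.
    """
    return raw_vector_bytes(num_vectors, dimensions) * overhead_factor / (1024 ** 3)

small = estimated_index_gib(100_000, 1024)      # a POC-sized corpus
large = estimated_index_gib(50_000_000, 1024)   # a large production corpus
print(f"small: {small:.2f} GiB, large: {large:.1f} GiB")
```

Under these assumptions the small corpus needs well under 1 GiB, while the large one needs hundreds of GiB of memory whether or not anyone is querying it.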

the landscape

II. The six real options on AWS in 2026

These are the vector stores a team building on AWS actually chooses between in 2026. Each gets an honest profile: what it is, where it shines, where it hurts, and the cost mechanic that matters. Read this section for the shape of each option; the next sections price and rank them.

A note on scope: this guide covers the stores that integrate cleanly into an AWS-native RAG stack and that Amazon Bedrock Knowledge Bases can use as a backend (with one deliberate exception, Redis, noted below). Self-hosting a dedicated vector engine like Milvus, Weaviate, Qdrant, or Chroma on EC2/EKS is a viable seventh path, but it trades managed simplicity for full operational ownership and is justified by the same narrow conditions as any self-host decision — extreme scale economics or a specific feature need — so it is mentioned but not the focus.

Amazon OpenSearch Serverless (vector engine)

What it is: the serverless tier of Amazon OpenSearch Service, with a vector engine that does ANN (HNSW/FAISS) plus full-text (BM25) search in one system — so it natively supports hybrid search and rich metadata filtering. It is the default vector store Bedrock Knowledge Bases provisions if you let it pick.

Where it shines: scale and hybrid search. When your corpus is large (millions of vectors), you need keyword + vector retrieval together, you want sophisticated filtering and aggregations, and you would rather not run a cluster, OpenSearch Serverless is the strong choice. It autoscales capacity for you.

Where it hurts (the OCU trap): it bills in OpenSearch Compute Units (OCUs) and enforces a floor of 2 OCUs for a vector/search collection, one for indexing and one for search, even if your index holds a thousand vectors. At roughly $0.24 per OCU-hour, that floor is about $0.48/hour, which is roughly $175/month per OCU and therefore on the order of $350/month minimum before you store a single byte of meaningful data, plus storage. A redundant/production configuration pushes the OCU count higher: at 4 OCUs the same arithmetic gives roughly $700/month. This is the single most common cost surprise in AWS RAG. It is a great large-scale option and a terrible small-POC option.

Aurora PostgreSQL with pgvector

What it is: Amazon Aurora PostgreSQL (or Amazon RDS for PostgreSQL) with the open-source pgvector extension, which adds a vector column type and HNSW/IVFFlat indexes to ordinary Postgres. Your embeddings live in a table next to your relational data, and metadata filtering is just a SQL WHERE clause.
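As an illustration of "filtering is just a SQL WHERE clause", the sketch below builds the schema and a filtered similarity query as strings, using pgvector's `vector` column type, an HNSW index with `vector_cosine_ops`, and the `<=>` cosine-distance operator. The table name `chunks`, its columns, the 1024-dimension size, and the helper functions are all hypothetical; the placeholders use psycopg-style named parameters, and no database connection is made here.

```python
# Hypothetical schema: table and column names are illustrative only.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id bigserial PRIMARY KEY,
    tenant_id text NOT NULL,
    doc_type text NOT NULL,
    body text NOT NULL,
    embedding vector(1024)
);
CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw
    ON chunks USING hnsw (embedding vector_cosine_ops);
"""

ALLOWED_FILTERS = {"tenant_id", "doc_type"}

def build_filtered_query(filters):
    """Build a parameterized top-k query with optional equality filters.

    Column names come from an allow-list; values are left as named
    placeholders so the driver can bind them safely.
    """
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unsupported filter columns: {sorted(unknown)}")
    sql = "SELECT id, body FROM chunks"
    if filters:
        where = " AND ".join(f"{col} = %({col})s" for col in sorted(filters))
        sql += f" WHERE {where}"
    # <=> is pgvector's cosine-distance operator; smaller means more similar.
    sql += " ORDER BY embedding <=> %(query_vec)s::vector LIMIT %(k)s"
    return sql

def to_vector_literal(values):
    """Format a Python list as a pgvector text literal such as [0.1,0.2]."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"

sql = build_filtered_query({"tenant_id": "acme"})
params = {"tenant_id": "acme", "query_vec": to_vector_literal([1, 0.5]), "k": 5}
print(sql)
```

How the planner combines the filter with the HNSW index can vary with the pgvector version and settings, so it is worth checking recall on filtered queries against your own data.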

Where it shines: teams that already run Postgres, small-to-mid corpora, and anyone who wants vectors and transactional/relational data in one system with one set of operational tools. Filtering is first-class SQL, joins to your existing tables are trivial, and there is no new datastore to learn. Aurora Serverless v2 scales capacity (in ACUs) with load and can scale down when idle, which keeps small-workload cost low. Bedrock Knowledge Bases supports Aurora as a backend.

Where it hurts: at very large scale (tens of millions of vectors and high QPS) a general-purpose relational engine is not as specialized as OpenSearch or a dedicated vector DB — index build times grow, memory pressure rises, and you may tune and shard harder than you would like. pgvector is excellent up to the low millions; beyond that, evaluate whether you have outgrown it.

Pinecone (AWS Marketplace / managed SaaS)

What it is: a fully managed, purpose-built vector database delivered as SaaS, available through AWS Marketplace and deployable in AWS regions. It is the "I never want to think about vector-database operations" option, and Bedrock Knowledge Bases supports it as a backend.

Where it shines: developer experience and zero ops. Its serverless tier separates storage from compute and can scale to very large corpora with usage-based pricing, so you are not paying for idle capacity the way a fixed OCU floor charges you. For teams that value speed-to-ship and have no appetite for database administration, it is compelling.

Where it hurts: it is a third-party vendor, not an AWS-native service, so it sits outside your AWS bill, your consolidated AWS commitments and credits, your IAM model, and (depending on configuration) your VPC boundary — which can matter for procurement, data-residency, and security reviews. You are also subject to its pricing model rather than AWS's. The convenience is real; so is the dependency.

Redis / Amazon MemoryDB (vector search)

What it is: vector similarity search inside an in-memory data store. Amazon MemoryDB is a Redis-compatible, durable, in-memory database with native vector search; Amazon ElastiCache (Redis OSS) also supports vector search. Everything lives in RAM.

Where it shines: latency. When you need single-digit-millisecond retrieval — real-time recommendations, semantic caching, low-latency personalization in a hot request path — an in-memory store is hard to beat. If you already run Redis/MemoryDB for caching or session state, adding vector search reuses infrastructure you operate.

Where it hurts: cost per vector. RAM is the most expensive place to store data, so holding a large corpus entirely in memory is expensive, and the cost scales with corpus size regardless of query rate. It is a precision instrument for latency-critical, often smaller or hot-subset workloads, not the economical home for a large general RAG corpus. Note: it is not a first-class Bedrock Knowledge Bases backend the way OpenSearch, Aurora, Pinecone, Neptune Analytics, and S3 Vectors are, so RAG-on-Knowledge-Bases teams typically choose it only for custom retrieval paths.

Amazon Neptune Analytics

What it is: the analytics engine of Amazon Neptune (AWS's graph database) with built-in vector search, so you can combine graph traversal with vector similarity in one query. Bedrock Knowledge Bases supports it, and it is the backbone of GraphRAG patterns.

Where it shines: when relationships matter as much as similarity. GraphRAG — retrieving not just semantically similar chunks but the entities and relationships connected to them — improves answer quality on highly interconnected knowledge (org structures, regulations, supply chains, knowledge graphs). If your data is genuinely a graph and you want retrieval that exploits that, Neptune Analytics is the specialized fit.

Where it hurts: it is a specialized, heavier-weight engine, not a general-purpose vector store, and it carries graph-database cost and complexity. For plain "find similar chunks" RAG with no relationship structure, it is overkill — you are paying for a graph engine you are not using.

Amazon S3 Vectors

What it is: the newest entrant — native vector storage and similarity search built directly into Amazon S3 via "vector buckets," with sub-second query latency at object-storage prices. It brings ANN search to the cheapest durable storage AWS offers, and integrates with Bedrock Knowledge Bases and OpenSearch.

Where it shines: cost at scale for latency-tolerant workloads. Because vectors sit in S3 rather than in RAM or a running cluster, the at-rest cost is dramatically lower and there is effectively no idle-compute floor — you pay for storage and queries. For very large corpora, archival or infrequently queried embeddings, or cost-sensitive RAG where sub-second (not single-digit-millisecond) latency is acceptable, it changes the economics. A common pattern: keep hot vectors in OpenSearch and tier cold vectors to S3 Vectors.

Where it hurts: it trades latency and feature richness for price. It targets sub-second, not in-memory-grade, latency, and it is younger than the alternatives, so the most demanding real-time and advanced-hybrid-search workloads still point to OpenSearch or Redis. Validate its filtering and latency against your specific workload before standardizing on it.

the one-line version

OpenSearch Serverless for scale + hybrid search (mind the OCU floor: roughly $350/mo at 2 OCUs, about $700/mo with redundancy). Aurora pgvector for Postgres shops and small-to-mid corpora. S3 Vectors for huge, cost-sensitive, latency-tolerant corpora. Pinecone to buy your way out of database ops. Redis/MemoryDB when single-digit-ms latency is non-negotiable. Neptune Analytics when relationships matter as much as similarity (GraphRAG).

the money

III. Cost: the dimension that decides most choices (and the OCU trap in detail)

Cost is where vector-database choices go wrong most often, because the pricing models are not comparable apples-to-apples: one bills fixed compute units, one bills database capacity that scales with load, one bills storage plus queries, one bills as a SaaS subscription. Translate them all into "what does this cost for my corpus and my traffic" and the decision usually makes itself.

The structural divide is between floor-based and usage-based pricing. A floor-based store (OpenSearch Serverless, with its 2-OCU minimum) costs roughly the same whether you store a thousand vectors or a million low-volume ones — the floor dominates small workloads. A usage-based store (S3 Vectors; Aurora Serverless v2 scaling down; Pinecone serverless) lets cost track actual storage and queries, so small workloads stay cheap and you pay more only as you grow. The single biggest cost mistake in AWS RAG is putting a small or early-stage workload on a floor-based store.

The OpenSearch Serverless OCU minimum, spelled out

OpenSearch Serverless bills in OpenSearch Compute Units. A vector search collection requires a minimum of 2 OCUs (one for indexing and one for search), and you are billed for that minimum continuously, 24/7, regardless of how little data you hold or how few queries you run. At an on-demand rate on the order of $0.24 per OCU-hour, two OCUs run roughly $0.48/hour, which over a 730-hour month comes to approximately $175 per OCU and therefore on the order of $350/month as a practical floor, before storage and before any redundancy. A standard production deployment with redundancy provisions additional OCUs, raising the floor further: 4 OCUs at the same rate is roughly $700/month.
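The floor is simple arithmetic, and it is worth working through once with the stated inputs. The sketch below assumes the roughly $0.24 per OCU-hour rate quoted above and a 730-hour average month; check the current rate for your region before relying on the result.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)

def monthly_ocu_cost(ocus, rate_per_ocu_hour=0.24, hours=HOURS_PER_MONTH):
    """Monthly compute cost for a continuously billed OCU count, excluding storage."""
    return ocus * rate_per_ocu_hour * hours

one_ocu = monthly_ocu_cost(1)
two_ocus = monthly_ocu_cost(2)    # the 2-OCU minimum described above
four_ocus = monthly_ocu_cost(4)   # a configuration that doubles that minimum

for label, cost in [("1 OCU", one_ocu), ("2 OCUs", two_ocus), ("4 OCUs", four_ocus)]:
    print(f"{label}: ${cost:,.2f}/month")
```

With these inputs, one OCU is about $175/month, the 2-OCU minimum is about $350/month, and 4 OCUs is about $700/month, all before storage.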

Why this ambushes teams: a developer wires up Bedrock Knowledge Bases, accepts the default vector store (OpenSearch Serverless), ships a small RAG proof-of-concept with a few thousand chunks, and is then surprised by a bill of several hundred dollars a month for a workload that has barely any data or traffic. The store is excellent at the scale it was designed for. For a POC or a small production corpus, that floor is pure waste.

The honest guidance: do not default to OpenSearch Serverless for small or early workloads. Use Aurora pgvector (scales down with Serverless v2) or S3 Vectors (storage + query pricing, no compute floor) while small, and move to OpenSearch Serverless when your corpus, your QPS, and your need for hybrid search at scale actually justify the floor — at which point the OCU model is reasonable and autoscales cleanly.

How the others price

Aurora PostgreSQL pgvector: you pay for Aurora — compute (provisioned instances, or Serverless v2 in ACUs that scale with load and down toward a small floor when idle), storage, and I/O. For a small RAG corpus this can be a fraction of the OpenSearch floor, especially on Serverless v2; for a large one, a sized Aurora cluster is a predictable, well-understood cost. You are paying for a database you may already run.

S3 Vectors: storage at S3-class prices plus per-query/request charges, with no continuously running compute floor. This is the cheapest at-rest option for large or cold corpora by a wide margin — its whole reason for existing is to drop the cost of storing vectors at scale when you can tolerate sub-second rather than in-memory latency.

Pinecone: a SaaS subscription with usage-based (serverless) pricing that scales with stored vectors and operations, billed by the vendor (payable via AWS Marketplace). You are buying away operational work; price it against the engineering time you would otherwise spend operating a store, not just against AWS line items.

Redis / MemoryDB: you pay for in-memory nodes sized to hold the whole index in RAM, so cost scales with corpus size and is the highest per stored vector. Justified only when its latency is the requirement, or when you already run it.

Neptune Analytics: graph-engine pricing (memory-based capacity for the analytics engine plus storage). It carries the cost of a specialized engine; justified by GraphRAG value, not by plain similarity search.

the cost heuristic

Small corpus or early-stage (under ~1M vectors, modest QPS): Aurora pgvector or S3 Vectors — avoid the OpenSearch floor. Large corpus needing hybrid search at scale: OpenSearch Serverless earns its OCUs. Huge, cold, or cost-sensitive: S3 Vectors. Willing to pay to skip ops: Pinecone. The floor-vs-usage distinction explains most "why is this so expensive?" surprises.
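The heuristic above can be written down as a small function, which makes the precedence between rules explicit. This is only a sketch: the function name, flag names, and the order in which the rules are checked are assumptions added for illustration, and the 1M-vector threshold is the approximate figure from the heuristic, not a hard limit.

```python
SMALL_CORPUS_LIMIT = 1_000_000  # "under ~1M vectors" in the heuristic

def suggest_store(vectors, *, needs_hybrid_search=False, needs_single_digit_ms=False,
                  graph_shaped=False, wants_zero_ops=False, cold_or_cost_sensitive=False):
    """Return a starting-point suggestion; always price it for your own workload."""
    # Hard requirements are checked first (assumed ordering).
    if needs_single_digit_ms:
        return "Redis / MemoryDB"
    if graph_shaped:
        return "Neptune Analytics"
    if wants_zero_ops:
        return "Pinecone"
    # Small or early-stage: stay off the OpenSearch floor.
    if vectors < SMALL_CORPUS_LIMIT:
        return "Aurora pgvector or S3 Vectors"
    if needs_hybrid_search:
        return "OpenSearch Serverless"
    if cold_or_cost_sensitive:
        return "S3 Vectors"
    return "OpenSearch Serverless, Pinecone, or S3 Vectors (price each)"

print(suggest_store(50_000))
print(suggest_store(20_000_000, needs_hybrid_search=True))
print(suggest_store(500_000_000, cold_or_cost_sensitive=True))
```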

performance & capability

IV. Latency, scale, and filtering: matching the store to the workload

Cost narrows the field; latency, scale, and filtering finish the decision. These are workload requirements, not vendor virtues — the question is never "which is fastest?" but "which clears the bar my workload actually sets?" Over-buying performance you do not need is just a more expensive way to be wrong.

Set your real targets before you shop. Most RAG question-answering is comfortable with retrieval in the tens-to-low-hundreds of milliseconds, because the language-model generation step that follows takes far longer anyway — shaving 10ms off retrieval is invisible next to a multi-second completion. Real-time recommendation, semantic caching, and personalization in a hot path are different: there, single-digit-millisecond retrieval can be the requirement. Know which world you are in, because it is the difference between "any of these work" and "you need an in-memory store."
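A quick latency budget shows why shaving retrieval time rarely matters for question-answering. The numbers below (50 ms retrieval, 3,000 ms generation, a 10 ms saving) are hypothetical; substitute your own measurements.

```python
def retrieval_share(retrieval_ms, generation_ms):
    """Fraction of end-to-end latency spent in retrieval."""
    return retrieval_ms / (retrieval_ms + generation_ms)

def end_to_end_saving(retrieval_ms, generation_ms, saved_ms):
    """Relative reduction in total latency from shaving saved_ms off retrieval."""
    return saved_ms / (retrieval_ms + generation_ms)

share = retrieval_share(50, 3000)
saving = end_to_end_saving(50, 3000, 10)
print(f"retrieval share: {share:.1%}, gain from saving 10 ms: {saving:.2%}")
```

Under these assumptions retrieval is under 2% of the total, and a 10 ms improvement changes end-to-end latency by well under 1%. In a hot path with no generation step, the same 10 ms can be most of the budget.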

Latency

Single-digit milliseconds: Redis / MemoryDB (in-memory) is the clear leader and usually the only option that guarantees it at scale.

Tens of milliseconds: OpenSearch Serverless, Aurora pgvector (well-indexed and sized), and Pinecone all land here comfortably — more than fast enough for RAG question-answering, where generation dominates total latency.

Sub-second: S3 Vectors targets this band by design — excellent for batch, archival, and cost-sensitive interactive use where sub-second is acceptable, not for hot real-time paths.

Scale

To the low millions of vectors: Aurora pgvector is comfortable and often the simplest, cheapest fit.

Millions to hundreds of millions: OpenSearch Serverless (autoscaling OCUs), Pinecone (serverless), and S3 Vectors (object-storage economics) are built for this band.

Billions / cost-dominated: S3 Vectors changes the economics by storing vectors at S3 prices; a tiered design — hot vectors in OpenSearch, cold in S3 Vectors — is an increasingly common large-scale pattern.

Filtering and hybrid search

Richest filtering + native hybrid search: OpenSearch Serverless — it is a search engine first, so BM25 + vector fusion and complex metadata filtering/aggregations are first-class. This is its strongest differentiator and the reason to accept its cost at scale.

Filtering as plain SQL: Aurora pgvector — any WHERE clause filters your vector search, and you can join to relational tables, which is powerful for multi-tenant isolation and permissioned retrieval. Hybrid search is achievable (Postgres full-text + pgvector) but is more do-it-yourself than OpenSearch.
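For the do-it-yourself case, one common way to fuse a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). This is a technique chosen here for illustration, not something the stores above are stated to use internally. The sketch takes two already-ranked lists of ids (for example, one from Postgres full-text search and one from pgvector) and merges them; the document ids are made up, and `k=60` is a conventional smoothing constant.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked id lists into one.

    Each appearance contributes 1 / (k + rank), with rank starting at 1,
    so items ranked well in more than one list rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical results for a query that contains an exact product code.
keyword_hits = ["sku-4471-manual", "returns-howto", "warranty-terms"]
vector_hits = ["returns-howto", "refund-policy", "sku-4471-manual"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents found by both retrievers end up ahead of those found by only one, which is the recall benefit hybrid search is after.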

Metadata filtering, vendor-managed: Pinecone supports metadata filtering well within its managed model. S3 Vectors and Redis support filtering; validate the exact filtering semantics (pre- vs post-filter behavior) against your workload, since this is where stores differ most in practice and where multi-tenant correctness is won or lost.

operations & integration

V. Operational burden and Bedrock Knowledge Bases integration

The store you choose is something your team operates for years, and on AWS most teams are not building a custom retrieval engine — they are pointing Amazon Bedrock Knowledge Bases at a backend and letting it manage chunking, embedding, and retrieval. So two practical questions decide a lot: how much will this cost me in operational time, and does it drop cleanly into Knowledge Bases?

On operational burden, the spectrum runs from "fully managed, near-zero ops" to "you own it." Pinecone (SaaS) and S3 Vectors (it is just S3) sit at the low-ops end. OpenSearch Serverless removes cluster management and autoscales, so it is low-ops despite being powerful — the cost is financial (the OCU floor), not operational. Aurora pgvector is moderate: it is a managed database, but you still own schema, indexing strategy, sizing, and tuning — though if you already run Postgres, that burden is one you have already accepted. Redis/MemoryDB and Neptune Analytics are specialized engines that carry more operational and design weight, justified by their specific strengths.

The Bedrock Knowledge Bases angle is decisive for RAG teams. Knowledge Bases is AWS's managed RAG service: point it at an S3 data source and a supported vector store, and it handles ingestion (chunking + embedding) and retrieval for you. Its supported vector-store backends include Amazon OpenSearch Serverless (the default it will create for you), Amazon Aurora PostgreSQL (pgvector), Pinecone, Amazon Neptune Analytics (for GraphRAG), and increasingly Amazon S3 Vectors. If you want Knowledge Bases to manage your pipeline, choosing from that supported set keeps the integration first-class.

A crucial, money-saving subtlety: when you let Bedrock Knowledge Bases set everything up with defaults, it provisions OpenSearch Serverless — and you inherit the OCU floor. That is the right call for a large production knowledge base and the wrong call for a small POC. For small or early Knowledge Bases workloads, explicitly select Aurora pgvector or S3 Vectors as the backend instead of accepting the default. This one configuration choice is the difference between a small RAG POC costing a few dollars a month and costing several hundred.

Redis/MemoryDB is the deliberate exception to the Knowledge Bases path. It is not a first-class Knowledge Bases backend, so teams reach for it when they are building a custom retrieval layer (often for latency-critical or semantic-cache use cases) rather than using managed Knowledge Bases. If your plan is "Knowledge Bases manages my RAG," Redis is usually not the answer; if your plan is "I am hand-building a low-latency retrieval path," it may be exactly the answer.

the integration default to override

Bedrock Knowledge Bases defaults to creating OpenSearch Serverless, which means the OCU floor (roughly $350/mo at 2 OCUs, about $700/mo with redundancy) lands on your bill even for a tiny index. For small or POC knowledge bases, explicitly choose Aurora pgvector or S3 Vectors as the backend. Graduate to the OpenSearch default when scale and hybrid-search needs justify it.

failure modes

VI. The five most common (and expensive) mistakes

Most regret in vector-database selection traces to a handful of repeatable errors. Naming them is the cheapest insurance available — each one is easy to avoid once you know it exists, and each one routinely costs teams either real money or weeks of rework.

Defaulting to OpenSearch Serverless for a small workload — By far the most common and most expensive error. Accepting the Bedrock Knowledge Bases default lands the ~$700/month OCU floor on a POC that has almost no data or traffic. Fix: use Aurora pgvector or S3 Vectors while small; move to OpenSearch when scale and hybrid search justify the floor.
Choosing on benchmarks instead of your two numbers — Public ANN benchmarks measure raw query speed on synthetic data and rarely reflect your corpus size, query volume, filtering needs, or cost. The decision is driven by your corpus size and QPS (and your ops appetite), not by who wins a leaderboard at a workload that is not yours.
Ignoring metadata filtering until multi-tenancy breaks it — A store that nails similarity search can still mishandle filtered search — especially post-filtering, which can silently return too few results or leak across tenants. Test filtered, multi-tenant queries (pre-filter behavior, correctness, recall) before committing, not after a customer sees another tenant's data.
Over-buying latency you do not need — Paying for in-memory single-digit-millisecond retrieval when your RAG generation step takes multiple seconds is spending money to optimize an invisible part of the pipeline. Match latency to the actual requirement; for most RAG, tens of milliseconds is plenty.
Treating the store as permanent and un-portable — Embeddings are portable — they are just arrays — and Bedrock Knowledge Bases abstracts the backend. You can start on the cheap, correct store for today and migrate when you outgrow it by re-indexing. Do not let fear of lock-in push you into an oversized, overpriced store on day one; pick for now, and keep the embedding pipeline reproducible so re-indexing later is routine.
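The filtering mistake is easy to reproduce on a toy corpus. The sketch below uses a hypothetical five-row corpus, invented tenant names, and brute-force cosine similarity rather than any real store's API. It shows how filtering after the top-k cut can return nothing for a tenant whose chunks rank below another tenant's, while filtering before ranking returns the expected rows.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical toy corpus: (chunk_id, tenant, embedding).
CORPUS = [
    ("a1", "acme", [1.0, 0.0]),
    ("a2", "acme", [0.9, 0.1]),
    ("a3", "acme", [0.8, 0.2]),
    ("b1", "globex", [0.2, 0.8]),
    ("b2", "globex", [0.1, 0.9]),
]

def post_filter_search(query, tenant, k):
    """Take the global top-k first, then drop rows from other tenants."""
    ranked = sorted(CORPUS, key=lambda r: cosine(query, r[2]), reverse=True)
    return [r[0] for r in ranked[:k] if r[1] == tenant]

def pre_filter_search(query, tenant, k):
    """Restrict to the tenant first, then rank only that tenant's rows."""
    candidates = [r for r in CORPUS if r[1] == tenant]
    ranked = sorted(candidates, key=lambda r: cosine(query, r[2]), reverse=True)
    return [r[0] for r in ranked[:k]]

query = [1.0, 0.0]
post = post_filter_search(query, "globex", k=2)  # acme fills the top-2, so nothing survives
pre = pre_filter_search(query, "globex", k=2)    # globex rows are ranked on their own
print("post-filter:", post)
print("pre-filter: ", pre)
```

Running the same pair of queries against a candidate store, with your real tenants and a realistic k, is a cheap way to see which behavior it gives you before you commit.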

putting it together

VII. A reference RAG retrieval stack, and how the store fits

The vector database does not live alone. Seeing where it sits in a complete RAG retrieval flow clarifies which decisions are the store's job and which belong to the layers around it — and prevents the common error of blaming the store for problems that are really upstream.

A production RAG retrieval pipeline has the same shape regardless of which store you pick. Ingestion (offline): source documents in S3 are chunked, each chunk is embedded by an embeddings model (such as Amazon Titan Text Embeddings or Cohere Embed on Bedrock), and the vectors plus their metadata are written to the vector store. Query (online): the user question is embedded with the same model, the vector store returns the top-k most similar chunks (optionally filtered by metadata and fused with keyword results in hybrid search), an optional re-ranking step reorders the candidates for relevance, and the surviving chunks are passed to the language model as grounding context.
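The shape of that flow can be sketched in a few lines. Everything here is a stand-in: `chunk` is a naive fixed-window splitter, `embed` is a toy letter-count function in place of a real embeddings model, and `InMemoryStore` is a hypothetical brute-force store, not any AWS service. The point is only to show which steps happen at ingestion, which at query time, and that the same `embed` is used on both sides.

```python
import math

def chunk(text, max_words=5):
    """Split text into fixed-size word windows (a deliberately naive strategy)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text):
    """Toy stand-in for an embeddings model: L2-normalised letter counts."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

class InMemoryStore:
    """Hypothetical store: keeps rows in a list and ranks by dot product."""
    def __init__(self):
        self.rows = []  # each row is (vector, text, metadata)

    def upsert(self, vector, text, metadata):
        self.rows.append((vector, text, metadata))

    def query(self, vector, k, where=None):
        where = where or {}
        rows = [r for r in self.rows
                if all(r[2].get(key) == val for key, val in where.items())]
        rows.sort(key=lambda r: sum(a * b for a, b in zip(vector, r[0])), reverse=True)
        return [(text, meta) for _, text, meta in rows[:k]]

def ingest(store, doc_id, text):
    """Offline path: chunk, embed, write vectors plus metadata."""
    for piece in chunk(text):
        store.upsert(embed(piece), piece, {"doc_id": doc_id})

def retrieve(store, question, k=2, where=None):
    """Online path: embed the question with the same model, return top-k chunks."""
    return store.query(embed(question), k, where)

store = InMemoryStore()
ingest(store, "billing", "Refunds are issued within five days of a cancelled order")
ingest(store, "security", "Rotate access keys every ninety days and enable MFA")
hits = retrieve(store, "Rotate access keys every ninety", k=1)
billing_hits = retrieve(store, "refund", k=5, where={"doc_id": "billing"})
print(hits)
```

Re-ranking and prompt assembly are left out; they would consume the list that `retrieve` returns.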

The vector store owns exactly two steps in that flow: storing the vectors and returning the nearest neighbors (with filtering/hybrid logic). Everything else — chunking strategy, choice of embedding model, re-ranking, prompt assembly — sits in the layers around it, and most RAG quality problems live in those layers, not in the store. A useful discipline: when retrieval quality is poor, check chunking and embeddings and re-ranking before concluding you chose the wrong database. The store is responsible for being fast, correctly filtered, and affordable; it is not responsible for whether you chunked your documents sensibly.

This is also why store choice is lower-risk than it feels. Because Bedrock Knowledge Bases (or a thin retrieval interface in custom code) sits between your application and the store, swapping backends is a re-indexing job, not a rewrite — the same property called out in the mistakes section. Pick the store that fits your corpus size, QPS, latency, filtering, and budget today; keep the ingestion pipeline reproducible; and treat the store as a component you can upgrade, not a one-way door.
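For custom code, one way to keep the store swappable is a small interface that the application depends on, with one adapter per backend. The sketch below is an assumption about how such a layer might look: `VectorStore`, `DictStore`, `reindex`, and `toy_embed` are hypothetical names invented for illustration, and a real adapter would wrap the chosen store's client instead of a dictionary.

```python
from typing import Callable, Iterable, Protocol

class VectorStore(Protocol):
    """The only surface the application is allowed to depend on."""
    def upsert(self, chunk_id: str, vector: list[float], metadata: dict) -> None: ...
    def query(self, vector: list[float], k: int) -> list[str]: ...

class DictStore:
    """Hypothetical backend adapter; stands in for any concrete store."""
    def __init__(self) -> None:
        self.rows: dict[str, tuple[list[float], dict]] = {}

    def upsert(self, chunk_id: str, vector: list[float], metadata: dict) -> None:
        self.rows[chunk_id] = (vector, metadata)

    def query(self, vector: list[float], k: int) -> list[str]:
        def score(chunk_id: str) -> float:
            return sum(a * b for a, b in zip(vector, self.rows[chunk_id][0]))
        return sorted(self.rows, key=score, reverse=True)[:k]

def reindex(chunks: Iterable[tuple[str, str, dict]],
            embed: Callable[[str], list[float]],
            target: VectorStore) -> int:
    """Replay the ingestion pipeline against a different backend."""
    count = 0
    for chunk_id, text, metadata in chunks:
        target.upsert(chunk_id, embed(text), metadata)
        count += 1
    return count

def toy_embed(text: str) -> list[float]:
    """Toy deterministic embedding: text length and count of the letter 'a'."""
    return [float(len(text)), float(text.count("a"))]

CHUNKS = [
    ("c1", "alpha banana", {"tenant": "acme"}),
    ("c2", "kiwi", {"tenant": "acme"}),
    ("c3", "papaya", {"tenant": "acme"}),
]

old_store, new_store = DictStore(), DictStore()
reindex(CHUNKS, toy_embed, old_store)
moved = reindex(CHUNKS, toy_embed, new_store)  # the "migration"
question = toy_embed("banana")
print(old_store.query(question, 2), new_store.query(question, 2))
```

Because the source chunks and the embedding function are kept, the migration is a replay of `reindex` against the new target, and the application code that calls `query` does not change.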

the decision, by the numbers

VIII. Decision table: pick by corpus size and query volume

The whole guide collapses into this: find your row by corpus size and column by query volume, and read the recommended store. These are starting points for the common cases, not absolute laws — adjust for a hard latency requirement (push toward Redis/MemoryDB), heavy hybrid-search needs (push toward OpenSearch), or a graph-shaped corpus (Neptune Analytics).

vector store on AWS by corpus size × query volume · 2026 starting points

Corpus size	Low QPS (POC / internal)	Medium QPS (production app)	High QPS / real-time
Small (< ~1M vectors)	Aurora pgvector (cheapest); avoid OpenSearch floor	Aurora pgvector, or OpenSearch if hybrid search needed	Redis / MemoryDB if single-digit-ms required
Mid (~1M–10M)	Aurora pgvector or S3 Vectors	OpenSearch Serverless (hybrid + scale) or Aurora pgvector	OpenSearch Serverless; Redis for hot subset
Large (10M–100M+)	S3 Vectors (cost) or OpenSearch	OpenSearch Serverless, or Pinecone to skip ops	OpenSearch + Redis hot tier
Huge / cost-dominated	S3 Vectors	S3 Vectors hot/cold tier with OpenSearch	OpenSearch hot tier + S3 Vectors cold tier
Graph-shaped (any size)	Neptune Analytics (GraphRAG)	Neptune Analytics (GraphRAG)	Neptune Analytics + caching

Cross-cutting overrides: choose Pinecone in any cell where buying away all database ops is worth more than AWS-native consolidation; choose Redis/MemoryDB in any cell where single-digit-millisecond latency is a hard requirement; for managed RAG, prefer a Bedrock Knowledge Bases-supported backend (OpenSearch, Aurora, Pinecone, Neptune Analytics, S3 Vectors) and explicitly override the OpenSearch default when small.
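The table can also be read as a lookup. The sketch below encodes the starting points as data; the function and constant names are hypothetical, the exact boundaries (1M and 10M vectors) are a literal reading of the approximate row labels, and the "huge" and "graph-shaped" rows are selected by explicit flags because the table does not give them a vector count. The cross-cutting overrides are deliberately left to the caller.

```python
SMALL_MAX = 1_000_000    # "Small (< ~1M vectors)", treated as a hard cutoff here
MID_MAX = 10_000_000     # "Mid (~1M-10M)", treated as a hard cutoff here

STARTING_POINTS = {
    "small": {
        "low": "Aurora pgvector",
        "medium": "Aurora pgvector, or OpenSearch if hybrid search needed",
        "high": "Redis / MemoryDB if single-digit-ms required",
    },
    "mid": {
        "low": "Aurora pgvector or S3 Vectors",
        "medium": "OpenSearch Serverless or Aurora pgvector",
        "high": "OpenSearch Serverless; Redis for hot subset",
    },
    "large": {
        "low": "S3 Vectors or OpenSearch",
        "medium": "OpenSearch Serverless, or Pinecone to skip ops",
        "high": "OpenSearch + Redis hot tier",
    },
    "huge": {
        "low": "S3 Vectors",
        "medium": "S3 Vectors hot/cold tier with OpenSearch",
        "high": "OpenSearch hot tier + S3 Vectors cold tier",
    },
    "graph": {
        "low": "Neptune Analytics (GraphRAG)",
        "medium": "Neptune Analytics (GraphRAG)",
        "high": "Neptune Analytics + caching",
    },
}

def corpus_band(vectors, cost_dominated=False, graph_shaped=False):
    """Map a vector count (plus two flags) to a row of the table."""
    if graph_shaped:
        return "graph"
    if cost_dominated:
        return "huge"
    if vectors < SMALL_MAX:
        return "small"
    if vectors <= MID_MAX:
        return "mid"
    return "large"

def recommend_store(vectors, qps_band, cost_dominated=False, graph_shaped=False):
    """Return the table's starting point; qps_band is 'low', 'medium', or 'high'."""
    if qps_band not in ("low", "medium", "high"):
        raise ValueError(f"unknown qps_band: {qps_band!r}")
    return STARTING_POINTS[corpus_band(vectors, cost_dominated, graph_shaped)][qps_band]

print(recommend_store(200_000, "low"))
print(recommend_store(50_000_000, "medium"))
```

Treat the output as the first candidate to price and test, not as a verdict.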

the six options, side by side

OpenSearch vs Aurora pgvector vs Pinecone vs Redis vs Neptune Analytics vs S3 Vectors

The full field on one screen. Read it against your two numbers (corpus size, QPS) plus your latency, filtering, and ops constraints. No row is a verdict on its own — the right pick is the option whose profile matches your workload at the lowest justified cost.

Dimension	OpenSearch Serverless	Aurora pgvector	Pinecone	Redis / MemoryDB	Neptune Analytics	S3 Vectors
Cost shape	Floor-based (2-OCU min, ~$700/mo)	Aurora compute+storage (scales down)	SaaS usage-based	In-memory nodes (highest/vector)	Graph-engine capacity	Storage + query (lowest at rest)
Best corpus size	Millions → 100M+	Up to low millions	Millions → very large	Smaller / hot subset	Graph-shaped, any size	Large → huge
Latency band	Tens of ms	Tens of ms	Tens of ms	Single-digit ms	Tens of ms (graph)	Sub-second
Hybrid (BM25+vector)	Native, first-class	DIY (PG full-text)	Managed	Limited	Graph + vector	Limited (validate)
Metadata filtering	Rich, first-class	Plain SQL WHERE	Good, managed	Supported	Graph-aware	Supported (validate)
Ops burden	Low (autoscaling)	Moderate (you own schema)	Near-zero (SaaS)	Moderate–high	Higher (specialized)	Near-zero (it is S3)
Bedrock KB backend	Yes (the default)	Yes	Yes	Not first-class	Yes (GraphRAG)	Yes
Reach for it when	Scale + hybrid search	Postgres shop / small-mid	Buy away ops	Single-digit-ms latency	Relationships matter	Cheap at huge scale

The most consequential single fact in this table is the OpenSearch Serverless cost shape: the ~$700/month OCU floor makes it the wrong default for small workloads and a strong choice once scale and hybrid search justify it. For self-hosting Milvus/Weaviate/Qdrant/Chroma on EC2/EKS, expect full operational ownership in exchange for control — justified by the same narrow conditions as any self-host decision.

want the store chosen and the RAG stack built for you?

Get matched with a vetted AWS partner who designs RAG retrieval on AWS — often AWS-funded

Start in 3 minutes →

a recent match

A $4,000/month vector-store bill, cut to under $300 — anonymized

inquiry · seed-stage b2b saas, knowledge assistant, US

Seed-stage B2B SaaS, ~12 engineers, on AWS, building a customer-facing knowledge assistant over ~200K document chunks

Situation: The team had shipped a RAG assistant on Bedrock Knowledge Bases by accepting the defaults, which provisioned OpenSearch Serverless. With only ~200K chunks and modest query volume, they were nonetheless paying around $4,000/month — the OCU floor plus a redundant production configuration — for a workload that was tiny relative to that capacity. They assumed the cost was inherent to RAG and were about to cut features to afford it. They wanted someone who knew the AWS vector-store landscape to tell them whether the bill was avoidable before they made product sacrifices.

What CloudRoute did: Routed within a day to a vetted AWS partner with production RAG experience. The partner confirmed the corpus and QPS did not justify OpenSearch Serverless, re-pointed Bedrock Knowledge Bases at Aurora PostgreSQL (pgvector) on Serverless v2 — which scaled down with their low traffic — and re-indexed the corpus through the same ingestion pipeline. Metadata filtering moved to plain SQL WHERE clauses, which also tightened per-tenant isolation. Retrieval latency stayed in the tens-of-milliseconds range, invisible against generation time. The work was scoped and filed as an AWS-funded GenAI POC, so the migration effort was credit-covered.

Outcome: The vector-store line item dropped from ~$4,000/month to under $300/month with no change in answer quality and no features cut. Per-tenant filtering correctness improved as a side effect of moving to SQL. The team kept the embedding pipeline reproducible so they can re-index onto OpenSearch Serverless later if they ever reach the scale that justifies it. CloudRoute's commission was paid by the partner from AWS engagement funding — the customer paid $0.

monthly store cost: ~$4,000 → <$300 · quality change: none · re-index time: days · cost to customer: $0

faq

Common questions

What is the best vector database on AWS for RAG?

There is no single best one — it depends on your corpus size and query volume, filtered by how much operational work you want to own. As a practical default: use Aurora PostgreSQL with pgvector for small-to-mid corpora or if your data already lives in Postgres; move to Amazon OpenSearch Serverless when you need scale plus hybrid (keyword + vector) search and the cost is justified; use Amazon S3 Vectors for huge, cost-sensitive, latency-tolerant corpora; choose Pinecone to buy away all database operations; and reach for Redis/MemoryDB only when single-digit-millisecond latency is a hard requirement. All except Redis are first-class Bedrock Knowledge Bases backends.

Why is Amazon OpenSearch Serverless so expensive for a small RAG project?

Because it bills in OpenSearch Compute Units (OCUs) and enforces a minimum of 2 OCUs for a vector collection (one for indexing, one for search), billed continuously regardless of how little data or traffic you have. At roughly $0.24 per OCU-hour, 2 OCUs cost about $0.48/hour, or roughly $350 over a 730-hour month, before storage. The ~$700/month floor quoted throughout this guide corresponds to 4 OCUs at that rate, for example when indexing and search capacity are each doubled for redundancy, as production setups typically are. This is the most common cost surprise in AWS RAG. It is an excellent store at scale and a poor choice for a small proof-of-concept. For small workloads, use Aurora pgvector or S3 Vectors instead, and switch to OpenSearch once scale and hybrid-search needs justify the floor.
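The floor arithmetic is worth running with your own inputs. This sketch takes the roughly $0.24 per OCU-hour rate quoted above and assumes a 730-hour month; the function name is hypothetical, and the rate and minimum OCU count should be checked against current OpenSearch Serverless pricing for your Region. At these inputs, 2 OCUs come to about $350/month and 4 OCUs to about $700/month, so the ~$700 figure lines up with a 4-OCU configuration.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a month

def ocu_floor_per_month(ocus, price_per_ocu_hour=0.24, hours=HOURS_PER_MONTH):
    """Continuous compute floor in dollars per month, before storage."""
    return round(ocus * price_per_ocu_hour * hours, 2)

for ocus in (1, 2, 4):
    print(f"{ocus} OCU(s): ~${ocu_floor_per_month(ocus):,.0f}/month")
```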

Should I use OpenSearch or pgvector for vector search on AWS?

Use Aurora PostgreSQL with pgvector when your corpus is small-to-mid (up to the low millions of vectors), your data already lives in Postgres, or you want vectors and relational data in one system with SQL-based metadata filtering — it scales down cheaply on Aurora Serverless v2 and avoids the OpenSearch cost floor. Use OpenSearch Serverless when you need large scale (millions to hundreds of millions of vectors), native hybrid search (BM25 keyword fused with vector similarity), rich filtering and aggregations, and the spend is justified. A common path is to start on pgvector and migrate to OpenSearch by re-indexing once you outgrow it.
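"SQL-based metadata filtering" on pgvector means the tenant or document filter is an ordinary `WHERE` clause in the same statement that orders by vector distance. The sketch below only builds the statement and its parameters and does not connect to a database. The table and column names (`chunks`, `tenant_id`, `content`, `embedding`) are hypothetical, `<=>` is pgvector's cosine-distance operator, and the `%(name)s` placeholders assume a psycopg-style driver.

```python
# Hypothetical schema: chunks(id, tenant_id, content, embedding vector(N)).
FILTERED_QUERY = (
    "SELECT id, content, embedding <=> %(q)s::vector AS distance "
    "FROM chunks "
    "WHERE tenant_id = %(tenant)s "          # metadata filter is plain SQL
    "ORDER BY embedding <=> %(q)s::vector "  # nearest first by cosine distance
    "LIMIT %(k)s"
)

def to_pgvector_literal(vec):
    """Render a Python list in pgvector's text form, e.g. '[1.0,0.5]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"

def build_params(query_vec, tenant, k):
    """Bind values separately so the driver handles quoting."""
    return {"q": to_pgvector_literal(query_vec), "tenant": tenant, "k": int(k)}

params = build_params([1, 0.5], "acme", 5)
print(FILTERED_QUERY)
print(params)
```

With a driver such as psycopg, the pair would be passed as `cursor.execute(FILTERED_QUERY, params)`. Whether the planner applies the filter before or after the approximate index scan depends on the index type and settings, so filtered recall is still worth testing on real data.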

What is Amazon S3 Vectors and when should I use it?

Amazon S3 Vectors brings native vector storage and similarity search to Amazon S3 via vector buckets, offering sub-second query latency at object-storage prices. Use it for very large, cost-sensitive, or infrequently queried corpora where you can tolerate sub-second (rather than single-digit-millisecond) latency — it has the lowest at-rest cost of any option because vectors live in S3 rather than in RAM or a running cluster, and there is no continuous compute floor. A popular pattern is tiering: keep hot, frequently queried vectors in OpenSearch and cold vectors in S3 Vectors. It integrates with Bedrock Knowledge Bases and OpenSearch. Validate its filtering semantics and latency against your specific workload first.

Is Pinecone worth it over AWS-native vector stores?

Pinecone is worth it when buying away all database operations is more valuable to you than AWS-native consolidation. It is a fully managed, purpose-built vector database (available via AWS Marketplace and supported by Bedrock Knowledge Bases) with strong developer experience and serverless usage-based pricing that avoids paying for idle capacity. The tradeoff: it is a third-party vendor sitting outside your AWS bill, your AWS credits and commitments, your IAM model, and potentially your VPC boundary, which can complicate procurement, security reviews, and data residency. Price it against the engineering time you would otherwise spend operating a store — for some teams that is a clear win; for others, AWS-native (Aurora pgvector or OpenSearch) is the better fit.

When should I use Redis or MemoryDB for vector search?

Use Redis or Amazon MemoryDB for vector search only when single-digit-millisecond retrieval latency is a hard requirement — real-time recommendations, semantic caching, or low-latency personalization in a hot request path — or when you already run Redis/MemoryDB and want to reuse it. Because everything lives in RAM, it is the lowest-latency option but also the most expensive per stored vector, so it is a poor home for a large general RAG corpus. It is also not a first-class Bedrock Knowledge Bases backend, so teams typically use it for custom retrieval paths rather than managed RAG. For most RAG question-answering, where the language-model generation step dominates latency, an in-memory store is over-buying speed you will not notice.
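A quick latency budget makes the "over-buying speed" point concrete. The numbers below are illustrative assumptions, not measurements: 5 ms for an in-memory lookup, 40 ms for a disk-backed store in the tens-of-milliseconds band, and 3 seconds for generation.

```python
def retrieval_share(retrieval_ms, generation_ms):
    """Fraction of end-to-end latency spent in retrieval."""
    return retrieval_ms / (retrieval_ms + generation_ms)

GENERATION_MS = 3000   # assumed generation time
IN_MEMORY_MS = 5       # assumed single-digit-ms retrieval
DISK_BACKED_MS = 40    # assumed tens-of-ms retrieval

share_fast = retrieval_share(IN_MEMORY_MS, GENERATION_MS)
share_slow = retrieval_share(DISK_BACKED_MS, GENERATION_MS)

# End-to-end improvement from switching to the in-memory store.
total_slow = DISK_BACKED_MS + GENERATION_MS
total_fast = IN_MEMORY_MS + GENERATION_MS
improvement = (total_slow - total_fast) / total_slow

print(f"retrieval share: {share_fast:.2%} vs {share_slow:.2%}")
print(f"end-to-end improvement: {improvement:.2%}")
```

Under these assumptions the faster store changes total response time by about one percent. The calculation flips for request paths with no generation step, which is where the in-memory options earn their cost.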

What is Neptune Analytics and when does GraphRAG make sense?

Amazon Neptune Analytics is the analytics engine of Amazon Neptune (AWS's graph database) with built-in vector search, letting you combine graph traversal with vector similarity in one query — the foundation of GraphRAG. It makes sense when relationships matter as much as similarity: retrieving not just semantically similar chunks but the entities and relationships connected to them, which improves answers on highly interconnected knowledge such as org structures, regulations, supply chains, or knowledge graphs. It is supported by Bedrock Knowledge Bases. For plain "find similar chunks" RAG with no relationship structure, it is overkill — you would be paying for a graph engine you are not using, so reserve it for genuinely graph-shaped data.

Does Amazon Bedrock Knowledge Bases lock me into one vector database?

No. Bedrock Knowledge Bases supports multiple vector-store backends — Amazon OpenSearch Serverless (the default it provisions), Amazon Aurora PostgreSQL with pgvector, Pinecone, Amazon Neptune Analytics for GraphRAG, and Amazon S3 Vectors — and it abstracts ingestion and retrieval so the backend is swappable. The key caution: if you accept the defaults, Knowledge Bases creates OpenSearch Serverless and you inherit its ~$700/month OCU floor, which is wrong for a small POC. Explicitly choose Aurora pgvector or S3 Vectors for small workloads. Because embeddings are portable and the pipeline is reproducible, migrating between backends later is a re-indexing job, not a rewrite.

How do I estimate the cost of a vector database on AWS?

Translate every option into "cost for my corpus and my traffic," because the pricing models are not directly comparable. OpenSearch Serverless is floor-based: a continuous OCU minimum (the ~$700/month floor discussed in this guide) applies regardless of size, plus storage, so the floor dominates the bill for small workloads and only makes sense at scale. Aurora pgvector is Aurora compute + storage, which scales down on Serverless v2 for small workloads and is predictable when sized for large ones. S3 Vectors is storage at S3 prices plus per-query charges with no compute floor, so it is the cheapest at rest. Pinecone is SaaS usage-based pricing billed by the vendor. Redis/MemoryDB scales with in-memory node size, the highest cost per vector. Estimate from your vector count, dimensions, and QPS, and remember to price the floor-vs-usage distinction: it explains most surprises.
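Two inputs feed every one of those pricing models: how many bytes the vectors occupy and how many queries arrive per month. The sketch below computes both. It assumes float32 vectors (4 bytes per dimension), a 1,024-dimension embedding as an example value, and a 30-day month; it covers raw vector data only, since index structures and metadata add overhead that varies by store. The function names are hypothetical.

```python
BYTES_PER_FLOAT32 = 4

def raw_vector_bytes(num_vectors, dimensions):
    """Raw float32 vector payload, excluding index and metadata overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32

def queries_per_month(avg_qps, days=30):
    """Monthly query count at a sustained average QPS."""
    return int(avg_qps * 86_400 * days)

corpus_bytes = raw_vector_bytes(200_000, 1024)   # e.g. ~200K chunks
monthly_queries = queries_per_month(5)

print(f"raw vectors: {corpus_bytes / 1e9:.2f} GB")
print(f"queries/month at 5 QPS: {monthly_queries:,}")
```

Multiply the byte figure by each store's storage or memory price, and the query figure by any per-query charge, then compare the totals against the fixed floors.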

Want the right vector store chosen — and the RAG stack built — without the trial and error?

CloudRoute routes you to a vetted AWS partner who picks the vector database for your corpus and traffic, builds the retrieval pipeline on Bedrock, and avoids the cost traps — often as an AWS-funded GenAI POC, so you pay $0. No procurement. No open-ended consulting bill.

Get matched in 24h →

→ see the data & AI persona detail

matched within: < 24h

common store savings: 5–10×

cost to you: $0