
amazon bedrock knowledge bases · managed RAG · 2026

Amazon Bedrock Knowledge Bases — managed RAG, end to end.

Q: What are Amazon Bedrock Knowledge Bases?

Amazon Bedrock Knowledge Bases is a fully-managed retrieval-augmented-generation (RAG) pipeline. You point it at your data (S3, a web crawler, Confluence, SharePoint, or Salesforce), choose an embeddings model and a vector store, and Bedrock handles parsing, chunking, embedding, storing the vectors, and keeping them in sync as the source changes. You then query it through the Retrieve API (get relevant chunks) or RetrieveAndGenerate API (get a grounded, cited answer in one call) — without building or operating the retrieval infrastructure yourself.

Q: What data sources can a Bedrock Knowledge Base connect to?

As of 2026, supported first-party connectors include Amazon S3 (documents in a bucket — the default), a Web Crawler (public web pages within a scope you set), Atlassian Confluence (wiki spaces/pages), Microsoft SharePoint (sites and document libraries), and Salesforce (objects like Knowledge articles and cases). You can attach multiple data sources to one Knowledge Base, and each connector has its own sync model. The connector list expands over time — confirm current support in the AWS Bedrock documentation.

Q: What chunking strategies does Bedrock support, and which should I use?

Bedrock supports fixed-size chunking (set token length with overlap — a solid default), semantic chunking (splits at meaning boundaries for coherent chunks — good for long prose), hierarchical chunking (small child chunks for precise retrieval, larger parent chunks returned for context — good for technical docs and long manuals), no chunking (one chunk per file — for short, self-contained docs), and custom chunking via a Lambda. Chunking is the highest-leverage retrieval-quality decision: start with fixed-size, then move long or layout-heavy corpora to semantic or hierarchical based on what the model actually retrieves.

Q: What is FM parsing in Bedrock Knowledge Bases?

FM (foundation-model) parsing uses a multimodal foundation model to read complex documents — PDFs with tables, multi-column layouts, scanned pages, charts, or meaningful images — and produce a faithful structured representation, rather than naive text extraction. It costs more per document because you are paying a model to read each page, but it is often the difference between a layout-heavy corpus (financial reports, engineering specs) being usable or useless. Use standard parsing by default and enable FM parsing for sources where tables and layout carry the meaning.

Q: Which vector store should I use with a Bedrock Knowledge Base?

Bedrock supports Amazon OpenSearch Serverless (the default — Bedrock can auto-create it; best for fast start and most production), Amazon Aurora PostgreSQL with pgvector (best if you already run Postgres and want low cost at low volume), Pinecone (a purpose-built managed vector DB — reuse if you already have it), Redis Enterprise Cloud (very low latency / existing Redis), and Amazon Neptune Analytics (vector + graph, for GraphRAG over relationship-rich data). Default to OpenSearch Serverless for simplicity or Aurora pgvector to minimize new infrastructure; reuse Pinecone/Redis if standardized on them.

Q: What is the difference between the Retrieve and RetrieveAndGenerate APIs?

Retrieve embeds your query, searches the vector store, and returns the top-matching chunks (text, relevance score, and metadata) without calling a generation model — you keep full control of the prompt, model, and any re-ranking. RetrieveAndGenerate does the whole RAG loop in one call: it retrieves, builds the prompt, calls a foundation model you specify, and returns a natural-language answer with citations back to the source chunks, and it supports multi-turn sessions. Use RetrieveAndGenerate to ship a cited "chat with your docs" experience fast; use Retrieve when you need control over generation or are feeding an agent or Flow.

Q: When should I use managed Knowledge Bases instead of building RAG myself?

Use managed Knowledge Bases when your needs are standard — index documents, retrieve relevant chunks, generate cited answers — because it is far faster to launch (hours vs weeks), cheaper to operate (AWS runs the pipeline and sync), and good enough for most products. Build DIY RAG when you need exotic retrieval (custom re-rankers, a particular hybrid keyword+vector setup, an embeddings model not on Bedrock) or you are building a RAG platform rather than a feature. A common middle path is using the Retrieve API for managed ingestion/retrieval while owning generation and orchestration yourself.

Q: How much do Bedrock Knowledge Bases cost, and can AWS credits cover it?

There is no single price — the cost is a stack: embeddings (per input token to embed your corpus and each query — very cheap per token), the vector store (usually the largest standing cost; OpenSearch Serverless has a baseline minimum, while Aurora Serverless v2 + pgvector is often cheapest at low volume), inference (normal Bedrock token cost when you use RetrieveAndGenerate, including retrieved context as input), plus FM parsing per page if enabled and S3 storage. At prototype scale it is typically single-digit to low-tens of dollars a month, growing with corpus size and query volume. Every layer is AWS-credit-eligible and draws down your AWS credits automatically — the relevant pools are AWS Activate (up to $100K), a Bedrock/generative-AI POC pool ($10K–$50K) aimed at exactly this kind of use case, and the GenAI Accelerator (up to $1M). These are largely partner-filed via the AWS Partner Network, which is why teams route through a partner; CloudRoute matches you to the right pool and a vetted AWS partner who files the application and builds the Knowledge Base, so the customer pays $0. Confirm current rates on the AWS pricing page.

A complete, neutral reference for Amazon Bedrock Knowledge Bases in 2026: what they are (a fully-managed retrieval-augmented-generation pipeline), the data sources they connect (S3, web crawler, Confluence, SharePoint, Salesforce), how ingestion works (chunking strategies, parsing including FM parsing for complex docs), which embeddings model and vector store to pick, the Retrieve and RetrieveAndGenerate APIs, metadata filtering, when managed beats DIY — and how AWS credits make the whole build $0.


what it is: managed RAG
default vector store: OpenSearch Serverless
core APIs: Retrieve / RetrieveAndGenerate

cost with credits

TL;DR

Knowledge Bases is Bedrock's fully-managed RAG pipeline: you point it at your data (S3, a website, Confluence, SharePoint, Salesforce), and it handles ingestion — parsing, chunking, embedding, and writing vectors to a vector store — then answers questions over that data through two APIs. You do not build or operate the retrieval plumbing yourself.
The two things you tune are the ingestion pipeline (chunking strategy — fixed, semantic, or hierarchical; and parsing — standard or foundation-model parsing for complex PDFs) and the storage stack (which embeddings model — Titan or Cohere — and which vector store — OpenSearch Serverless by default, or Aurora pgvector, Pinecone, Redis, or Neptune Analytics). Those two choices drive both answer quality and cost.
You consume it through Retrieve (get the relevant chunks back for your own prompt) or RetrieveAndGenerate (Bedrock retrieves and a model writes the grounded answer with citations in one call). The cost is a stack — embeddings + the vector store + inference — and at prototype scale it is small; AWS credits (Activate up to $100K, a Bedrock/GenAI POC pool $10K–$50K, the GenAI Accelerator up to $1M) cover it. CloudRoute routes you to the credit pool and a vetted partner to build it, so you pay $0.

the concept

I. What Amazon Bedrock Knowledge Bases actually are

Knowledge Bases is the part of Amazon Bedrock that turns "ask questions over my own documents" from a multi-service engineering project into a managed feature. The clearest one-line definition: it is a fully-managed retrieval-augmented-generation (RAG) pipeline.

To see why that matters, it helps to know what RAG is and why teams build it. A foundation model only knows what was in its training data; it has never seen your internal wiki, your product manuals, your support tickets, or last quarter's contracts. Retrieval-augmented generation fixes that by retrieving the relevant snippets of your data at question time and putting them into the model's context, so the answer is grounded in your facts rather than the model's memory. RAG is how most enterprise "chat with your documents" and "answer from our knowledge base" products work, because it is cheaper, faster to update, and more auditable than fine-tuning a model on the same data.

Building RAG by hand means stitching together at least six moving parts: a way to load documents from wherever they live; a parser to extract clean text (and tables and images) from messy formats like PDF; a chunker to split that text into retrieval-sized pieces; an embeddings model to turn each chunk into a vector; a vector store to hold those vectors and search them by similarity; and then the query-time logic to embed the question, retrieve the closest chunks, assemble a prompt, call a model, and return a cited answer. Each piece is a service to deploy, secure, scale, and keep in sync as the source data changes.

Knowledge Bases collapses all of that into one managed service. You declare a data source and a vector store, choose an embeddings model and a chunking strategy, and Bedrock runs the ingestion pipeline for you — parsing, chunking, embedding, and writing the vectors — and keeps it in sync when the underlying data changes. At query time you call one of two APIs and get back either the relevant chunks or a fully-grounded, cited answer. You never write the retrieval loop, and you never operate the embedding or sync infrastructure.

It is worth being precise about what is and is not managed. Bedrock manages the pipeline — the orchestration of parse → chunk → embed → store → retrieve. It does not hide the vector store: you bring (or let it create) a real vector database that you can see and pay for. And it does not remove your decisions — chunking strategy, embeddings model, and vector store are all yours to choose, and they materially change quality and cost. Knowledge Bases removes the undifferentiated heavy lifting, not the architecture decisions.

the one-sentence definition

Amazon Bedrock Knowledge Bases is a fully-managed RAG pipeline: point it at your data, pick an embeddings model and a vector store, and it handles parsing, chunking, embedding, syncing, and retrieval — exposed through the Retrieve and RetrieveAndGenerate APIs. You get grounded, cited answers over your own data without building the plumbing.

where your data lives

II. Data sources: what you can point a Knowledge Base at

A Knowledge Base is only as useful as the data it can reach. Bedrock supports a growing set of first-party data-source connectors so you can index data where it already lives rather than copying it into a bucket first.

The foundational source is Amazon S3 — you put documents (PDF, plain text, HTML, Markdown, Word, CSV, and more) in a bucket, point the Knowledge Base at the prefix, and it ingests them. S3 is the path most teams start with because almost any pipeline can drop files into a bucket. Beyond S3, Bedrock offers connectors that crawl or sync from systems of record so the source content stays where its owners maintain it:

Amazon S3 — The default. Index documents from a bucket/prefix — PDF, TXT, HTML, Markdown, DOCX, CSV, and more. Best when you control the files or can export to S3 on a schedule. Supports attaching a sidecar metadata file per document for filtering (see §VI).
Web Crawler — Point it at one or more seed URLs and it crawls public web pages within the scope and rate limits you set, ingesting the page content. Good for public docs sites, marketing knowledge, or any browsable corpus you do not have as files.
Confluence — Connects to Atlassian Confluence and syncs spaces/pages so an internal wiki becomes queryable. Respects the connector's configured scope; pairs with metadata filtering to keep space- or label-level boundaries.
Microsoft SharePoint — Syncs SharePoint sites and document libraries — the most common home for enterprise documents — so files stay governed in SharePoint while the Knowledge Base indexes them.
Salesforce — Connects to Salesforce objects (e.g. Knowledge articles, cases) so support and CRM content can ground answers. Useful for customer-facing assistants that need product and account context.
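
The sidecar metadata file mentioned for the S3 source can be sketched as follows. This assumes the convention, as described in the AWS documentation at the time of writing, that the sidecar sits next to the document, is named `<document>.metadata.json`, and wraps the fields in a top-level `metadataAttributes` object. The document key and the attribute names (`department`, `year`, `tenant`) are illustrative, not prescribed by Bedrock.

```python
import json


def sidecar_for(document_key: str, attributes: dict) -> tuple[str, str]:
    """Return (sidecar_key, sidecar_json) for one S3 document.

    The ".metadata.json" suffix and the "metadataAttributes" wrapper are
    assumptions taken from the AWS docs; verify them before relying on this.
    """
    sidecar_key = f"{document_key}.metadata.json"
    body = json.dumps({"metadataAttributes": attributes}, indent=2)
    return sidecar_key, body


# Hypothetical document and attributes, for illustration only.
key, body = sidecar_for(
    "manuals/pump-x200.pdf",
    {"department": "finance", "year": 2024, "tenant": "acme"},
)
print(key)
print(body)
```

Uploading both objects to the same prefix and re-running the sync is what would make these fields available for filtering at query time.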

A few practical notes. First, you can attach multiple data sources to one Knowledge Base, so a single assistant can answer across S3 documents, a Confluence wiki, and a website at once. Second, each connector has its own sync model — you trigger or schedule an ingestion job, and the Knowledge Base reflects additions, changes, and deletions from the source on the next sync. Third, connectors honor the scope you configure (which spaces, sites, URL patterns, or prefixes), which is the first line of access control — though sensitive deployments should also lean on metadata filtering and Bedrock Guardrails. The exact connector list and capabilities expand over time, so confirm current support in the AWS Bedrock documentation when you scope a build.

the ingestion pipeline

III. How ingestion works: parsing, chunking, and embedding

Ingestion is where a Knowledge Base earns its keep, and it is also where the two highest-leverage quality decisions live: how documents are parsed, and how they are chunked. Get these right and retrieval is sharp; get them wrong and the model retrieves noise no matter how good it is.

When you run a sync, the pipeline executes four steps for every document: parse (extract clean text and structure from the source format), chunk (split that text into retrieval-sized pieces), embed (turn each chunk into a vector with the embeddings model you chose), and store (write the vectors plus their source text and metadata into the vector store). The two steps you actively configure are parsing and chunking.

Parsing — standard text extraction vs FM parsing

Standard parsing extracts the text from a document and works well for clean, text-first files. It struggles with complex documents — PDFs full of tables, multi-column layouts, scanned pages, charts, or images that carry meaning. For those, Bedrock offers foundation-model (FM) parsing: instead of naive text extraction, a multimodal foundation model reads the page and produces a faithful structured representation — preserving tables as tables, capturing the content of figures, and respecting layout. FM parsing costs more per document (you are paying a model to read each page) but is often the difference between a financial-report or engineering-spec corpus being usable or useless. The honest guidance: use standard parsing by default, and turn on FM parsing for sources where layout and tables carry the meaning.

Chunking strategies — fixed, semantic, and hierarchical

Chunking decides how text is cut into the pieces that get embedded and retrieved, and it is the single biggest lever on retrieval quality. Bedrock supports several strategies. Fixed-size chunking splits text into chunks of a set token length with a configurable overlap between neighbours — simple, predictable, and a fine default. Semantic chunking uses embeddings to find natural topic boundaries and splits there, so each chunk is a coherent idea rather than an arbitrary span — better for prose where a fixed cut might slice a thought in half. Hierarchical chunking builds parent/child chunks: small child chunks are embedded and searched for precision, but the larger parent chunk is what gets returned to the model for context — combining sharp retrieval with enough surrounding text to answer well. You can also supply no chunking (treat each file as one chunk) when documents are already short and self-contained, or use a custom transformation (e.g. via a Lambda) for bespoke logic.
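
To make fixed-size chunking with overlap concrete, here is a toy sketch. It is not Bedrock's implementation: it treats whitespace-separated words as "tokens", whereas the managed pipeline counts model tokens. The function name and the sample text are made up for illustration.

```python
def fixed_size_chunks(text: str, max_tokens: int, overlap_tokens: int) -> list[str]:
    """Split text into windows of max_tokens words, each sharing
    overlap_tokens words with the previous window."""
    if max_tokens <= 0 or not 0 <= overlap_tokens < max_tokens:
        raise ValueError("need max_tokens > 0 and 0 <= overlap_tokens < max_tokens")
    tokens = text.split()  # toy tokenizer: whitespace-separated words
    step = max_tokens - overlap_tokens
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


words = " ".join(f"w{i}" for i in range(10))
chunks = fixed_size_chunks(words, max_tokens=4, overlap_tokens=1)
for chunk in chunks:
    print(chunk)
# w0 w1 w2 w3
# w3 w4 w5 w6
# w6 w7 w8 w9
```

The overlap is why `w3` and `w6` appear twice: a sentence that straddles a cut still shows up whole in at least one chunk, at the price of some duplicated storage.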

Embedding and storing

Each chunk is then passed to the embeddings model, which returns a vector — a list of numbers that captures the chunk's meaning so that semantically similar text lands near it in vector space. Those vectors, along with the original chunk text and any metadata, are written to the vector store. From that point the corpus is queryable: a question is embedded the same way, and the store returns the chunks whose vectors are closest. Re-running a sync after the source changes updates only what changed, keeping the index current.
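
The "closest vectors" idea can be illustrated with a few lines of plain Python. This is a sketch of similarity search in general, not of how any particular vector store implements it; the three-dimensional vectors are hand-made stand-ins for real embeddings, which have hundreds or thousands of dimensions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int):
    """Score every stored chunk against the query and keep the k best."""
    scored = [(cosine(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy index of (chunk text, vector) pairs; the vectors are invented.
index = [
    ("refund policy", [0.9, 0.1, 0.0]),
    ("pump maintenance", [0.0, 0.2, 0.9]),
    ("billing contacts", [0.7, 0.6, 0.1]),
]
results = top_k([1.0, 0.2, 0.0], index, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

A production store replaces the linear scan with an approximate nearest-neighbour index, but the ranking principle is the same.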

bedrock knowledge bases chunking strategies compared · 2026

Strategy	How it splits	Strength	Watch out for	Good for
Fixed-size	Set token length + overlap	Simple, predictable, cheap	Can cut mid-idea	General default, uniform docs
Semantic	At meaning boundaries (via embeddings)	Coherent, self-contained chunks	Extra embedding cost to find boundaries	Long prose, mixed-topic docs
Hierarchical	Small child chunks + larger parents	Precise retrieval, rich context returned	More config + storage	Technical docs, long manuals
None (per-file)	One chunk per document	Keeps short docs whole	Poor for long files	FAQs, short articles
Custom (Lambda)	Your own transformation	Full control	You own the logic	Bespoke formats / rules

Chunking is the highest-leverage retrieval-quality decision. Many teams start with fixed-size, then move long or layout-heavy corpora to semantic or hierarchical once they see what the model retrieves. Pair hierarchical chunking with FM parsing for complex technical PDFs.
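
As a hedged sketch, the chunking and parsing choices above map onto an ingestion configuration roughly like the one built below. The field names follow the `vectorIngestionConfiguration` structure of the `bedrock-agent` `create_data_source` call as I understand it; treat them as assumptions and check the current API reference. The token sizes, thresholds, and the ARN string are illustrative placeholders, not recommendations.

```python
def ingestion_config(strategy: str, use_fm_parsing: bool = False,
                     parser_model_arn: str = "") -> dict:
    """Build a dict intended for the vectorIngestionConfiguration parameter.

    All numeric values are illustrative defaults chosen for this sketch.
    """
    chunking = {"chunkingStrategy": strategy}
    if strategy == "FIXED_SIZE":
        chunking["fixedSizeChunkingConfiguration"] = {
            "maxTokens": 300, "overlapPercentage": 20}
    elif strategy == "SEMANTIC":
        chunking["semanticChunkingConfiguration"] = {
            "maxTokens": 300, "bufferSize": 1, "breakpointPercentileThreshold": 95}
    elif strategy == "HIERARCHICAL":
        # Parent level first, then the smaller child level that gets searched.
        chunking["hierarchicalChunkingConfiguration"] = {
            "levelConfigurations": [{"maxTokens": 1500}, {"maxTokens": 300}],
            "overlapTokens": 60}
    elif strategy != "NONE":
        raise ValueError(f"unknown chunking strategy: {strategy}")

    config = {"chunkingConfiguration": chunking}
    if use_fm_parsing:
        config["parsingConfiguration"] = {
            "parsingStrategy": "BEDROCK_FOUNDATION_MODEL",
            "bedrockFoundationModelConfiguration": {"modelArn": parser_model_arn},
        }
    return config


# Placeholder ARN: substitute a real parsing-capable model in your region.
cfg = ingestion_config(
    "HIERARCHICAL",
    use_fm_parsing=True,
    parser_model_arn="arn:aws:bedrock:REGION::foundation-model/MODEL_ID",
)
print(cfg)
```

Keeping the configuration in a small builder like this makes it easy to re-create a data source with a different strategy when you compare retrieval quality.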

choosing the model

IV. Embeddings model choice: Titan and Cohere

The embeddings model turns text into the vectors that power retrieval. Bedrock lets you choose which one a Knowledge Base uses, and the choice is a quiet but real lever on both quality and cost — and it is effectively permanent for a given index.

The two main families on Bedrock are Amazon Titan Text Embeddings and Cohere Embed. Titan Text Embeddings is Amazon's own embeddings model, available in versions that trade off vector dimensionality and cost; it is the common default and is well-integrated and inexpensive. Cohere Embed is a strong alternative, offered in English and multilingual variants; the multilingual model is the usual pick when your corpus or your users span many languages. Both are billed per input token (the output vector is not charged), at the very low embeddings rates covered in §VIII.

Two technical points matter when you choose. First, dimensionality: embeddings models output vectors of a fixed size (a few hundred to a couple of thousand numbers). Larger vectors can capture more nuance but cost more to store and search; some models let you pick a smaller dimension to save on storage and latency. Second — and this is the one teams forget — the embeddings model and the index are bound together. Vectors from one model are not comparable to vectors from another, so you cannot swap embeddings models without re-embedding the entire corpus into a fresh index. Choose deliberately up front, because changing later means a full re-ingestion.
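
The storage side of the dimensionality trade-off is simple arithmetic. The sketch below assumes each vector component is stored as a 4-byte float and ignores index overhead, stored chunk text, and metadata, so real footprints will be larger. The chunk count and the two dimension sizes are hypothetical inputs, not properties of a specific model.

```python
def raw_vector_bytes(num_chunks: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw size of the vectors alone: one value per dimension per chunk."""
    return num_chunks * dimensions * bytes_per_value


# Hypothetical corpus of one million chunks at two vector sizes.
for dims in (256, 1024):
    mib = raw_vector_bytes(1_000_000, dims) / 2**20
    print(f"{dims} dims: {mib:.0f} MiB of raw vectors")
```

Quadrupling the dimension quadruples the raw vector storage, which is why a smaller dimension is worth testing if the model offers one and retrieval quality holds up.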

For most English-language corpora, Titan Text Embeddings is a sensible, low-cost default. Reach for Cohere's multilingual model when language coverage is a first-class requirement. Either way, the embeddings model is a smaller quality lever than chunking and parsing — pick a reasonable one and spend your tuning effort on the ingestion pipeline first.

where the vectors live

V. Vector store options: OpenSearch Serverless, Aurora, Pinecone, Redis, Neptune Analytics

A Knowledge Base needs a vector store to hold and search the embeddings. This is the one piece Bedrock does not abstract away — you choose the store, you can see it, and you pay for it directly. The choice affects cost, latency, operational model, and whether you reuse infrastructure you already run.

Bedrock can create and manage a vector store for you (the quickstart path) or connect to one you already operate. The default and fastest way to get started is Amazon OpenSearch Serverless; the alternatives matter when you have an existing database investment, specific cost targets, or a vendor preference. Here is the practical rundown:

Amazon OpenSearch Serverless (default) — The zero-setup option — Bedrock can provision a vector collection for you in a click. Fully managed, scales automatically, no servers. The trade-off is that serverless capacity has a baseline cost even at low volume, so it can feel expensive for a tiny prototype. Best for: getting started fast and most production deployments that want managed search.
Amazon Aurora PostgreSQL (pgvector) — Uses the pgvector extension in Aurora PostgreSQL. Attractive when you already run Postgres/Aurora — vectors live alongside your relational data, you reuse existing operational know-how, and Aurora Serverless v2 can scale to low cost. Best for: teams standardized on Postgres who want one database for everything.
Pinecone — A purpose-built managed vector database (third-party) with a strong reputation for vector search at scale and low query latency. Connect an existing Pinecone index to the Knowledge Base. Best for: teams already using Pinecone or wanting a dedicated, vendor-managed vector layer.
Redis Enterprise Cloud — Redis with vector-search capability — very low latency and a fit if you already run Redis for caching/state. Best for: latency-sensitive retrieval and teams with existing Redis infrastructure.
Amazon Neptune Analytics — Combines vector search with graph capabilities, enabling GraphRAG-style patterns where relationships between entities augment pure semantic similarity. Best for: highly connected, relationship-rich data where graph context improves answers.

The default decision tree is simple. If you just want it to work, let Bedrock provision OpenSearch Serverless. If you already run Aurora/Postgres and want to minimize new infrastructure (and cost at low volume), use pgvector. If your organization already standardizes on Pinecone or Redis, reuse it. If your data is graph-shaped and relationships matter, evaluate Neptune Analytics. The supported-store list grows over time, so confirm current options in the AWS docs — but for the large majority of builds, OpenSearch Serverless or Aurora pgvector is the right answer.
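
The decision tree above can be written down as a small function. This is only an encoding of the guidance in this section; the parameter names and the order in which the conditions are checked (graph-shaped data first, then an existing standard, then Postgres) are my assumptions about how to break ties.

```python
from typing import Optional


def pick_vector_store(*, runs_postgres: bool = False,
                      standardized_on: Optional[str] = None,
                      graph_shaped: bool = False) -> str:
    """Return a suggested vector store following the section's decision tree."""
    if graph_shaped:
        return "Amazon Neptune Analytics"
    if standardized_on in ("Pinecone", "Redis Enterprise Cloud"):
        return standardized_on  # reuse what the organization already runs
    if runs_postgres:
        return "Amazon Aurora PostgreSQL (pgvector)"
    return "Amazon OpenSearch Serverless"  # the "just make it work" default


print(pick_vector_store())
print(pick_vector_store(runs_postgres=True))
print(pick_vector_store(standardized_on="Pinecone", runs_postgres=True))
```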

how you query it

VI. The Retrieve and RetrieveAndGenerate APIs (and metadata filtering)

Once a Knowledge Base is built and synced, you query it through two APIs. Which one you call decides how much of the RAG loop Bedrock runs versus how much you keep control of — and metadata filtering is the feature that makes retrieval precise and access-aware.

Retrieve — get the relevant chunks back

The Retrieve API takes a query, embeds it, searches the vector store, and returns the top-matching chunks — the source text, a relevance score, and the location/metadata of each. Crucially, it does not call a generation model. You get the raw retrieved context and do whatever you want with it: assemble your own prompt, mix in other context, route to a specific model, run your own re-ranking, or feed it into an agent or a Bedrock Flow. Retrieve is the right call when you want the quality of managed retrieval but full control over the generation step.
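
A minimal sketch of a Retrieve call with boto3 follows. It assumes the `bedrock-agent-runtime` client and the `retrieve` request and response shapes as I understand them (`retrievalQuery`, `vectorSearchConfiguration.numberOfResults`, `retrievalResults`); confirm against the current SDK reference. The helper names and the Knowledge Base ID are hypothetical. The request builder is pure Python, so it can be inspected without AWS credentials.

```python
def build_retrieve_request(kb_id: str, question: str, top_k: int = 5) -> dict:
    """Assemble keyword arguments for the Retrieve API."""
    return {
        "knowledgeBaseId": kb_id,
        "retrievalQuery": {"text": question},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    }


def retrieve_chunks(kb_id: str, question: str, top_k: int = 5) -> list[tuple]:
    """Call Retrieve and return (text, score, location) per chunk.

    Needs boto3 and AWS credentials, so the import is deferred to call time.
    """
    import boto3

    client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve(**build_retrieve_request(kb_id, question, top_k))
    return [
        (r["content"]["text"], r.get("score"), r.get("location"))
        for r in response["retrievalResults"]
    ]


# "KB_ID_PLACEHOLDER" is not a real Knowledge Base ID.
request = build_retrieve_request("KB_ID_PLACEHOLDER", "How do I reset the pump?", top_k=3)
print(request)
```

No generation model is involved here: what you do with the returned chunks is entirely up to your own code.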

RetrieveAndGenerate — grounded, cited answers in one call

The RetrieveAndGenerate API does the whole RAG loop in a single request: it retrieves the relevant chunks, constructs the prompt, calls a foundation model you specify, and returns a natural-language answer with citations back to the source chunks. It also supports multi-turn conversations, carrying session context so follow-up questions work. This is the fastest path to a working "chat with your docs" experience — one API call, grounded answer, citations included — and it is what most teams use until they need the finer control of Retrieve. The citations are not a nice-to-have: they are what make the answer auditable and let your UI link users back to the source.
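
The one-call flow can be sketched like this. The request shape (`type: KNOWLEDGE_BASE`, `knowledgeBaseConfiguration`, optional `sessionId`) and the `citations` / `retrievedReferences` response structure reflect the boto3 `retrieve_and_generate` call as I understand it; verify the details in the SDK reference. The helper names, IDs, ARN, and the mock response are invented for illustration.

```python
from typing import Optional


def build_rag_request(kb_id: str, model_arn: str, question: str,
                      session_id: Optional[str] = None) -> dict:
    """Assemble keyword arguments for RetrieveAndGenerate."""
    request = {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }
    if session_id:  # pass the sessionId from a prior response for follow-ups
        request["sessionId"] = session_id
    return request


def cited_sources(response: dict) -> list[str]:
    """Collect the distinct S3 URIs cited in a response, in order."""
    uris = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in uris:
                uris.append(uri)
    return uris


# Hand-written mock of a response, not real service output.
mock_response = {
    "sessionId": "session-123",
    "output": {"text": "Hold the reset button for five seconds."},
    "citations": [{"retrievedReferences": [
        {"location": {"type": "S3", "s3Location": {"uri": "s3://docs/pump.pdf"}}},
        {"location": {"type": "S3", "s3Location": {"uri": "s3://docs/pump.pdf"}}},
    ]}],
}
follow_up = build_rag_request(
    "KB_ID_PLACEHOLDER", "arn:aws:bedrock:REGION::foundation-model/MODEL_ID",
    "And if that fails?", session_id=mock_response["sessionId"])
print(cited_sources(mock_response))
```

With credentials in place, the request would be sent as `boto3.client("bedrock-agent-runtime").retrieve_and_generate(**request)`.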

Metadata filtering — precision and access control

Every chunk can carry metadata — fields like document type, author, date, department, product line, or tenant ID — supplied via a sidecar metadata file in S3 or pulled from the source connector. At query time you can apply metadata filters so retrieval only considers chunks matching a condition (e.g. department = "finance", year >= 2024, or tenant = "acme"). This does two things: it sharpens relevance by excluding irrelevant chunks before similarity search, and it is a key building block for access control and multi-tenancy — you scope each user's queries to the data they are allowed to see. Combined with Bedrock Guardrails for content safety, metadata filtering is how a single Knowledge Base safely serves many users or tenants.
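
A filter such as tenant = "acme" combined with year >= 2024 could be expressed as below. The operator names (`andAll`, `equals`, `greaterThanOrEquals`) and the placement of `filter` inside `vectorSearchConfiguration` follow the retrieval filter structure as I understand it from the AWS API reference; treat them as assumptions to verify. The metadata keys and function names are illustrative.

```python
def tenant_filter(tenant: str, min_year: int) -> dict:
    """Only chunks for this tenant, dated min_year or later."""
    return {
        "andAll": [
            {"equals": {"key": "tenant", "value": tenant}},
            {"greaterThanOrEquals": {"key": "year", "value": min_year}},
        ]
    }


def filtered_retrieval_config(tenant: str, min_year: int, top_k: int = 5) -> dict:
    """Retrieval configuration usable with Retrieve or RetrieveAndGenerate."""
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": top_k,
            "filter": tenant_filter(tenant, min_year),
        }
    }


retrieval_config = filtered_retrieval_config("acme", 2024)
print(retrieval_config)
```

For access control, the tenant value should come from your authenticated session on the server side, never from user-supplied query text.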

which API to use

Use RetrieveAndGenerate to ship a cited "chat with your data" experience fast — one call does retrieval + generation + citations. Use Retrieve when you need control of the generation step — your own prompt, model routing, re-ranking, or feeding an agent/Flow. Apply metadata filters on either for precision and per-user/tenant access scoping.

the build decision

VII. Managed Knowledge Bases vs DIY RAG: when each wins

Knowledge Bases is not the only way to do RAG on AWS. You can build it yourself — with your own loaders, a framework like LangChain or LlamaIndex, an embeddings call, and a vector store — and for some teams that is the right call. Here is the honest trade-off.

Managed Knowledge Bases wins on time-to-value and operational burden: you get a synced, parsed, chunked, embedded, queryable corpus with citations in hours, not weeks, and AWS operates the pipeline. DIY RAG wins on control and flexibility: you can use any embeddings model (including ones not on Bedrock), implement custom retrieval logic like hybrid keyword+vector search or sophisticated re-ranking, do unusual chunking, or integrate retrieval steps that the managed pipeline does not expose. The middle path is common too — use Retrieve for managed ingestion and retrieval, but own the generation and orchestration yourself.

A useful way to decide: if your RAG needs are standard (index documents, retrieve relevant chunks, generate cited answers), managed Knowledge Bases will be faster, cheaper to operate, and good enough, and you should reach for DIY only when you hit a specific wall. If you already know you need exotic retrieval (custom re-rankers, hybrid search tuned a particular way, an embeddings model only available elsewhere) or you are building a RAG platform rather than a RAG feature, DIY gives you the control. Most product teams should start managed and graduate specific pieces to custom as concrete requirements emerge; see the companion RAG on AWS guide for the full architectural picture.

managed bedrock knowledge bases vs DIY RAG on AWS · 2026

Dimension	Managed Knowledge Bases	DIY RAG (your own stack)
Time to first answer	Hours — declare source + store, sync	Days–weeks — build loaders, pipeline, query loop
Who operates it	AWS manages the pipeline + sync	You operate every component
Parsing + chunking	Built-in (incl. FM parsing, semantic/hierarchical)	You implement or wire a framework
Embeddings model	Titan or Cohere on Bedrock	Any model, anywhere
Vector store	OpenSearch Serverless / Aurora / Pinecone / Redis / Neptune	Any store you choose and run
Retrieval control	Retrieve / RetrieveAndGenerate + metadata filters	Fully custom (hybrid search, re-rankers, etc.)
Citations + sync	Built-in	You build them
Best for	RAG as a feature; standard needs; fast launch	RAG as a platform; exotic retrieval; max control

The hybrid path — managed ingestion/retrieval via Retrieve plus your own generation/orchestration — gives much of DIY's control with little of its operational cost. Start managed; graduate specific pieces only when a concrete requirement forces it.
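
On the hybrid path, the piece you own is the step between retrieval and generation. A minimal sketch of that step is below: it takes results shaped like the Retrieve response (each with `content.text` and a `score`, which is my assumption about that shape), keeps the best few, and assembles a prompt with numbered sources. The instruction wording and the function name are illustrative, and the sample results are invented.

```python
def build_grounded_prompt(question: str, results: list[dict], max_chunks: int = 3) -> str:
    """Keep the highest-scoring chunks and number them so the model can cite."""
    best = sorted(results, key=lambda r: r.get("score", 0.0), reverse=True)[:max_chunks]
    sources = "\n".join(
        f"[{i}] {r['content']['text']}" for i, r in enumerate(best, start=1)
    )
    return (
        "Answer using only the numbered sources and cite them like [1].\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )


# Invented results standing in for what Retrieve would return.
results = [
    {"content": {"text": "alpha chunk"}, "score": 0.2},
    {"content": {"text": "beta chunk"}, "score": 0.9},
]
prompt = build_grounded_prompt("What does beta say?", results, max_chunks=1)
print(prompt)
```

The resulting string can be sent to any model you choose, which is the control the managed one-call API does not give you.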

the cost stack

VIII. What a Knowledge Base costs, and how AWS credits make it $0

A Knowledge Base does not have a single price; it has a cost stack of three layers plus parsing. Understanding the layers tells you where the money goes, how to keep it small, and why AWS credits cover all of it during the build.

The three recurring cost layers are: (1) embeddings — you pay the embeddings model per input token to embed your corpus at ingest and to embed every query (very cheap per token, but it scales with corpus size and re-ingestion); (2) the vector store — this is usually the largest standing cost, because OpenSearch Serverless and the managed alternatives carry an ongoing capacity charge whether or not you are querying (Aurora Serverless v2 with pgvector is often the cheapest at low volume); and (3) inference — when you use RetrieveAndGenerate, you pay the normal Bedrock token cost for the model that writes the answer, including the retrieved context as input tokens. On top of those, FM parsing adds a per-page model charge at ingestion time for complex documents, and you pay for the underlying S3 storage and any data-source connector costs.

Two cost patterns are worth internalizing. First, the retrieved context dominates inference input cost — every answer ships several chunks of your documents into the model as input tokens, so retrieval tuning (returning fewer, better chunks) is a cost lever, not just a quality lever; prompt caching on stable instructions helps too. Second, the vector store is the cost you pay even when idle — for a small or bursty workload, a serverless Postgres/pgvector store is frequently cheaper than always-on managed search. At prototype scale the whole stack is typically single-digit to low-tens of dollars a month; it grows with corpus size and query volume.
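
The cost stack lends itself to a back-of-envelope calculator. The sketch below only encodes the structure described above (embeddings + vector store + inference + optional FM parsing). Every rate in the example is a deliberately round placeholder, not an AWS price; plug in current numbers from the pricing pages. The class and field names are my own.

```python
from dataclasses import dataclass


@dataclass
class MonthlyUsage:
    corpus_tokens_embedded: int  # tokens (re-)embedded at ingest this month
    queries: int
    query_tokens: int            # average tokens per question
    context_tokens: int          # retrieved chunk tokens sent per answer
    output_tokens: int           # generated tokens per answer
    fm_parsed_pages: int = 0


@dataclass
class Rates:  # USD; placeholders to be replaced with real prices
    embed_per_1k: float
    input_per_1k: float
    output_per_1k: float
    vector_store_monthly: float
    fm_parse_per_page: float = 0.0


def monthly_cost(u: MonthlyUsage, r: Rates) -> dict:
    """Break the monthly bill into the layers of the cost stack."""
    embeddings = (u.corpus_tokens_embedded + u.queries * u.query_tokens) / 1000 * r.embed_per_1k
    inference = u.queries * (
        (u.query_tokens + u.context_tokens) / 1000 * r.input_per_1k
        + u.output_tokens / 1000 * r.output_per_1k
    )
    parsing = u.fm_parsed_pages * r.fm_parse_per_page
    total = embeddings + r.vector_store_monthly + inference + parsing
    return {"embeddings": embeddings, "vector_store": r.vector_store_monthly,
            "inference": inference, "fm_parsing": parsing, "total": total}


# Round placeholder numbers chosen so the arithmetic is easy to follow.
usage = MonthlyUsage(corpus_tokens_embedded=2_000, queries=10,
                     query_tokens=100, context_tokens=900, output_tokens=200)
rates = Rates(embed_per_1k=1.0, input_per_1k=2.0, output_per_1k=4.0,
              vector_store_monthly=50.0)
cost = monthly_cost(usage, rates)
print(cost)
```

Note how `context_tokens` multiplies into every query: halving the retrieved context roughly halves the input side of inference, while the vector-store line stays fixed regardless of traffic.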

Which is exactly why so many teams build this on AWS credits and pay nothing out of pocket. Every layer here — embeddings, the vector store, FM parsing, and the generation inference — is credit-eligible and draws down your AWS credits automatically. The relevant pools are AWS Activate (commonly up to $100K for institutionally-funded startups), a dedicated Bedrock / generative-AI POC pool ($10K–$50K) aimed squarely at proving out a use case exactly like a RAG assistant, and the competitive Generative AI Accelerator (up to $1M). Most of these pools are partner-filed through the AWS Partner Network rather than a public form — which is the gap CloudRoute fills: we match you to the right pool for your stage and to a vetted AWS DevOps/ML partner who files the credit application and builds the Knowledge Base (data-source wiring, chunking and parsing tuning, vector-store selection, the Retrieve/RetrieveAndGenerate integration). The customer pays $0 — AWS funds the credits, AWS pays the partner through engagement-funding programs, and the partner pays CloudRoute a routing commission. See AWS credits for generative-AI startups and Bedrock POC funding for the full mechanics.

the cost stack in one line

embeddings (per token) + vector store (standing capacity — usually the biggest line) + inference (per token on RetrieveAndGenerate) + FM parsing (per page, if used) + S3. All of it is AWS-credit-eligible — which is why the build can be $0 while you prove the workload out.

vector store comparison

The five vector-store options side by side

The vector store is the one component you pick and pay for directly, and it is the most consequential infrastructure decision in a Knowledge Base. Here is how the five options covered in this guide compare on the dimensions that actually drive the choice. Cost notes are representative as of 2026; confirm current pricing on the relevant AWS or vendor pricing page.

Vector store	Managed by	Setup effort	Cost shape	Standout strength	Pick it when
OpenSearch Serverless	AWS (Bedrock can auto-create)	Lowest — one click	Serverless capacity, baseline minimum	Zero-setup, auto-scaling	You want it to just work / most production
Aurora PostgreSQL (pgvector)	AWS (you run Aurora)	Low–medium	Aurora Serverless v2 scales low	Reuse your Postgres; cheap at low volume	You already run Postgres/Aurora
Pinecone	Pinecone (third-party)	Medium — external account	Pinecone pricing (pods/serverless)	Purpose-built vector DB at scale	You already use / prefer Pinecone
Redis Enterprise Cloud	Redis (third-party)	Medium — external account	Redis Enterprise pricing	Very low query latency	Latency-critical / existing Redis
Neptune Analytics	AWS	Medium	Neptune Analytics capacity	Vector + graph (GraphRAG)	Relationship-rich, connected data

Default to OpenSearch Serverless for speed, or Aurora pgvector to minimize new infrastructure and cost at low volume. Reuse Pinecone/Redis if you already run them. Reach for Neptune Analytics when graph relationships materially improve answers. The supported list grows — check the AWS Bedrock docs.
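
Those rules of thumb can be written down as a small helper, which is handy if you want the choice recorded in code review rather than in someone's head. This is only a sketch: the function name, the input labels, and the order in which the rules are checked are assumptions made here, not anything AWS prescribes.

```python
def pick_vector_store(already_running=(), needs_graph=False):
    """Suggest a vector store using the rules of thumb from the comparison.

    already_running: names of systems the team operates today,
                     for example {"postgres"} or {"redis"}.
    needs_graph:     True when graph relationships materially improve answers.

    The precedence below is an assumption of this sketch.
    """
    running = {name.lower() for name in already_running}
    if needs_graph:
        return "Neptune Analytics"
    if "pinecone" in running:
        return "Pinecone"
    if "redis" in running:
        return "Redis Enterprise Cloud"
    if running & {"postgres", "aurora"}:
        return "Aurora PostgreSQL (pgvector)"
    # Nothing to reuse: take the zero-setup default.
    return "OpenSearch Serverless"


print(pick_vector_store({"Postgres"}))
print(pick_vector_store())
```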

before you wire up a single connector

Get AWS credits that cover the whole RAG stack — and a partner to build it (you pay $0)

Get matched in 24h →

a recent match

A support assistant grounded in 40k docs, built for $0 (anonymized)

inquiry · Series-A B2B SaaS, support automation, Berlin

Series-A B2B SaaS, 24 people, ~40,000 support articles + product PDFs across Confluence and S3

Situation: The team wanted a customer-facing support assistant that answered from their own documentation with citations — not a generic chatbot. Their content was split between a Confluence wiki and a pile of layout-heavy product PDFs in S3 (tables, diagrams, multi-column specs). An earlier DIY attempt with a hand-built pipeline had stalled: parsing the PDFs was producing garbage, retrieval was returning irrelevant chunks, and nobody owned operating the embedding/sync infrastructure. They also did not want to spend runway on inference and a vector database while still proving the feature out.

What CloudRoute did: CloudRoute matched them in under 24 hours to an EU AWS partner with RAG experience. The partner built it on managed Knowledge Bases: connected both the Confluence and S3 data sources to a single Knowledge Base; turned on FM parsing for the complex PDFs and hierarchical chunking so retrieval stayed precise while returning enough context; used Titan Text Embeddings into an Aurora pgvector store (the team already ran Postgres, keeping the standing cost low); shipped the assistant on RetrieveAndGenerate for one-call cited answers, with metadata filtering to scope answers by product line. In parallel, the partner filed a Bedrock POC credit application plus an Activate Portfolio application to fund the build.

Outcome: A cited, grounded support assistant was live in under three weeks, answering from the real corpus with source links — and the entire cost stack (embeddings, the Aurora vector store, RetrieveAndGenerate inference, FM parsing) was covered by the approved credits, so the team paid $0 during the build and early rollout. CloudRoute's commission was paid by the partner from AWS engagement funding, not by the customer.

corpus: ~40k docs across Confluence + S3 · time to live: < 3 weeks · credits secured: POC + Activate · out-of-pocket during build: $0

faq

Common questions

What are Amazon Bedrock Knowledge Bases?

Amazon Bedrock Knowledge Bases is a fully-managed retrieval-augmented-generation (RAG) pipeline. You point it at your data (S3, a web crawler, Confluence, SharePoint, or Salesforce), choose an embeddings model and a vector store, and Bedrock handles parsing, chunking, embedding, storing the vectors, and keeping them in sync as the source changes. You then query it through the Retrieve API (get relevant chunks) or RetrieveAndGenerate API (get a grounded, cited answer in one call) — without building or operating the retrieval infrastructure yourself.

What data sources can a Bedrock Knowledge Base connect to?

As of 2026, supported first-party connectors include Amazon S3 (documents in a bucket — the default), a Web Crawler (public web pages within a scope you set), Atlassian Confluence (wiki spaces/pages), Microsoft SharePoint (sites and document libraries), and Salesforce (objects like Knowledge articles and cases). You can attach multiple data sources to one Knowledge Base, and each connector has its own sync model. The connector list expands over time — confirm current support in the AWS Bedrock documentation.
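
As a minimal sketch of what attaching the default connector looks like, the helper below builds the arguments for an S3 data source and then starts one sync run. It assumes the boto3 `bedrock-agent` client and its `create_data_source` and `start_ingestion_job` operations; the helper names, the Knowledge Base ID, and the bucket ARN are hypothetical, and the other connectors take different configuration blocks that are not shown here.

```python
def s3_data_source_request(kb_id, name, bucket_arn, prefixes=None):
    """Build keyword arguments for create_data_source with an S3 source."""
    s3_config = {"bucketArn": bucket_arn}
    if prefixes:
        # Optional: only ingest objects under these key prefixes.
        s3_config["inclusionPrefixes"] = list(prefixes)
    return {
        "knowledgeBaseId": kb_id,
        "name": name,
        "dataSourceConfiguration": {"type": "S3", "s3Configuration": s3_config},
    }


def create_and_sync(client, request):
    """Create the data source, then start an ingestion job (one sync run).

    client is expected to behave like boto3.client("bedrock-agent").
    """
    created = client.create_data_source(**request)
    data_source_id = created["dataSource"]["dataSourceId"]
    job = client.start_ingestion_job(
        knowledgeBaseId=request["knowledgeBaseId"],
        dataSourceId=data_source_id,
    )
    return data_source_id, job["ingestionJob"]["ingestionJobId"]


# Hypothetical identifiers, for illustration only.
request = s3_data_source_request(
    "KB_ID_PLACEHOLDER", "product-pdfs",
    "arn:aws:s3:::example-docs-bucket", prefixes=["manuals/"],
)
```

Passing the client in as an argument keeps the request-building code testable without AWS credentials. To attach several sources to one Knowledge Base, repeat the call with the same `knowledgeBaseId`.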

What chunking strategies does Bedrock support, and which should I use?

Bedrock supports fixed-size chunking (set token length with overlap — a solid default), semantic chunking (splits at meaning boundaries for coherent chunks — good for long prose), hierarchical chunking (small child chunks for precise retrieval, larger parent chunks returned for context — good for technical docs and long manuals), no chunking (one chunk per file — for short, self-contained docs), and custom chunking via a Lambda. Chunking is the highest-leverage retrieval-quality decision: start with fixed-size, then move long or layout-heavy corpora to semantic or hierarchical based on what the model actually retrieves.
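
To show how the built-in strategies differ in practice, here is a sketch that builds the `chunkingConfiguration` block a data source takes under `vectorIngestionConfiguration`. The field names follow the `bedrock-agent` API as we understand it, so check them against the current API reference. The helper name and every default number are illustrative assumptions, not recommendations, and custom chunking via Lambda is configured separately and left out.

```python
def chunking_config(strategy, *, max_tokens=300, overlap_pct=20,
                    parent_tokens=1500, child_tokens=300, overlap_tokens=60):
    """Return a chunkingConfiguration dict for one of the built-in strategies."""
    strategy = strategy.upper()
    if strategy == "FIXED_SIZE":
        return {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {
                "maxTokens": max_tokens,
                "overlapPercentage": overlap_pct,
            },
        }
    if strategy == "SEMANTIC":
        return {
            "chunkingStrategy": "SEMANTIC",
            "semanticChunkingConfiguration": {
                "maxTokens": max_tokens,
                "bufferSize": 0,                      # neighbouring sentences to include
                "breakpointPercentileThreshold": 95,  # how readily to split
            },
        }
    if strategy == "HIERARCHICAL":
        return {
            "chunkingStrategy": "HIERARCHICAL",
            "hierarchicalChunkingConfiguration": {
                # Parent level first, then the smaller child level.
                "levelConfigurations": [
                    {"maxTokens": parent_tokens},
                    {"maxTokens": child_tokens},
                ],
                "overlapTokens": overlap_tokens,
            },
        }
    if strategy == "NONE":
        return {"chunkingStrategy": "NONE"}  # one chunk per file
    raise ValueError(f"unsupported chunking strategy: {strategy}")


ingestion = {"chunkingConfiguration": chunking_config("hierarchical")}
```

Changing strategy later generally means re-ingesting the data source, so it is worth testing each candidate on a small sample of real documents first.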

What is FM parsing in Bedrock Knowledge Bases?

FM (foundation-model) parsing uses a multimodal foundation model to read complex documents — PDFs with tables, multi-column layouts, scanned pages, charts, or meaningful images — and produce a faithful structured representation, rather than naive text extraction. It costs more per document because you are paying a model to read each page, but it is often the difference between a layout-heavy corpus (financial reports, engineering specs) being usable or useless. Use standard parsing by default and enable FM parsing for sources where tables and layout carry the meaning.
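
One way to apply that rule is to decide per data source and only attach a `parsingConfiguration` where layout carries the meaning. The sketch below assumes the `BEDROCK_FOUNDATION_MODEL` parsing strategy from the `bedrock-agent` API. The helper names and the model ARN are placeholders, and which models are accepted as parsers depends on region and current support.

```python
def parsing_config(layout_heavy, parser_model_arn=None):
    """Return a parsingConfiguration dict, or None to keep standard parsing."""
    if not layout_heavy:
        return None
    if not parser_model_arn:
        raise ValueError("FM parsing needs the ARN of a multimodal model")
    return {
        "parsingStrategy": "BEDROCK_FOUNDATION_MODEL",
        "bedrockFoundationModelConfiguration": {"modelArn": parser_model_arn},
    }


def vector_ingestion_config(layout_heavy, parser_model_arn=None):
    """Build the vectorIngestionConfiguration for one data source."""
    config = {}
    parsing = parsing_config(layout_heavy, parser_model_arn)
    if parsing is not None:
        config["parsingConfiguration"] = parsing
    return config


# Placeholder ARN: fill in a real region and model ID.
PARSER_ARN = "arn:aws:bedrock:REGION::foundation-model/MODEL_ID"

wiki_source = vector_ingestion_config(layout_heavy=False)
pdf_source = vector_ingestion_config(layout_heavy=True, parser_model_arn=PARSER_ARN)
```

Splitting layout-heavy files into their own data source keeps the per-page parsing charge confined to the documents that need it.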

Which vector store should I use with a Bedrock Knowledge Base?

Bedrock supports Amazon OpenSearch Serverless (the default — Bedrock can auto-create it; best for fast start and most production), Amazon Aurora PostgreSQL with pgvector (best if you already run Postgres and want low cost at low volume), Pinecone (a purpose-built managed vector DB — reuse if you already have it), Redis Enterprise Cloud (very low latency / existing Redis), and Amazon Neptune Analytics (vector + graph, for GraphRAG over relationship-rich data). Default to OpenSearch Serverless for simplicity or Aurora pgvector to minimize new infrastructure; reuse Pinecone/Redis if standardized on them.

What is the difference between the Retrieve and RetrieveAndGenerate APIs?

Retrieve embeds your query, searches the vector store, and returns the top-matching chunks (text, relevance score, and metadata) without calling a generation model — you keep full control of the prompt, model, and any re-ranking. RetrieveAndGenerate does the whole RAG loop in one call: it retrieves, builds the prompt, calls a foundation model you specify, and returns a natural-language answer with citations back to the source chunks, and it supports multi-turn sessions. Use RetrieveAndGenerate to ship a cited "chat with your docs" experience fast; use Retrieve when you need control over generation or are feeding an agent or Flow.
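
The difference is easiest to see in the request shapes. The sketch below assumes the boto3 `bedrock-agent-runtime` client and its `retrieve` and `retrieve_and_generate` operations. The helper names, the Knowledge Base ID, the model ARN, and the `product_line` metadata key are all hypothetical.

```python
def retrieve_request(kb_id, query, top_k=5, metadata_filter=None):
    """Arguments for Retrieve: chunks only, no generation model involved."""
    search = {"numberOfResults": top_k}
    if metadata_filter:
        search["filter"] = metadata_filter
    return {
        "knowledgeBaseId": kb_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {"vectorSearchConfiguration": search},
    }


def retrieve_and_generate_request(kb_id, query, model_arn, session_id=None):
    """Arguments for RetrieveAndGenerate: retrieval plus a cited answer."""
    request = {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }
    if session_id:
        request["sessionId"] = session_id  # continue a multi-turn session
    return request


def ask(client, kb_id, query, model_arn, session_id=None):
    """One-call cited answer. Returns (answer text, session id, citations)."""
    response = client.retrieve_and_generate(
        **retrieve_and_generate_request(kb_id, query, model_arn, session_id)
    )
    return response["output"]["text"], response["sessionId"], response.get("citations", [])


# Scope retrieval to one product line via a metadata filter (hypothetical key).
chunks_only = retrieve_request(
    "KB_ID_PLACEHOLDER", "How do I rotate an API key?",
    metadata_filter={"equals": {"key": "product_line", "value": "billing"}},
)
```

In real use, `client` would be `boto3.client("bedrock-agent-runtime")`. Feeding the returned session ID into the next `ask` call is what carries the conversation forward.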

When should I use managed Knowledge Bases instead of building RAG myself?

Use managed Knowledge Bases when your needs are standard — index documents, retrieve relevant chunks, generate cited answers — because it is far faster to launch (hours vs weeks), cheaper to operate (AWS runs the pipeline and sync), and good enough for most products. Build DIY RAG when you need exotic retrieval (custom re-rankers, a particular hybrid keyword+vector setup, an embeddings model not on Bedrock) or you are building a RAG platform rather than a feature. A common middle path is using the Retrieve API for managed ingestion/retrieval while owning generation and orchestration yourself.
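
For that middle path, the piece you own is turning retrieved chunks into a prompt for your own model call. Here is a minimal sketch, assuming a response shaped like the Retrieve API's output (`retrievalResults`, each with `content.text` and a `score`). The function name and the prompt wording are illustrative choices, not a recommended template.

```python
def build_grounded_prompt(question, retrieve_response, max_chunks=4):
    """Turn Retrieve results into a prompt with numbered, citable sources."""
    results = retrieve_response.get("retrievalResults", [])
    # Re-sort defensively so the highest-scoring chunks come first.
    ranked = sorted(results, key=lambda r: r.get("score", 0.0), reverse=True)

    sources = []
    for number, result in enumerate(ranked[:max_chunks], start=1):
        sources.append(f"[{number}] {result['content']['text'].strip()}")
    context = "\n\n".join(sources) if sources else "(no sources retrieved)"

    return (
        "Answer using only the numbered sources and cite them like [1]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hand-written stand-in for a Retrieve response, for illustration only.
sample_response = {
    "retrievalResults": [
        {"content": {"text": "Keys can be rotated from the Settings page."}, "score": 0.42},
        {"content": {"text": "API keys expire after 90 days by default."}, "score": 0.87},
    ]
}
print(build_grounded_prompt("How long do API keys last?", sample_response))
```

From here the prompt goes to whichever model and orchestration you run yourself, which is also the natural place to add a custom re-ranker ahead of the `max_chunks` cut.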

How much do Bedrock Knowledge Bases cost, and can AWS credits cover it?

There is no single price; the cost is a stack: embeddings (per input token to embed your corpus and each query, very cheap per token), the vector store (usually the largest standing cost; OpenSearch Serverless has a baseline minimum, while Aurora Serverless v2 + pgvector is often cheapest at low volume), inference (normal Bedrock token cost when you use RetrieveAndGenerate, including retrieved context as input), plus FM parsing per page if enabled and S3 storage. At prototype scale the usage-based lines typically come to single-digit to low-tens of dollars a month, with the vector store's standing capacity on top, and the total grows with corpus size and query volume. Every layer is AWS-credit-eligible and draws down your AWS credits automatically. The relevant pools are AWS Activate (up to $100K), a Bedrock/generative-AI POC pool ($10K–$50K) aimed at exactly this kind of use case, and the GenAI Accelerator (up to $1M). These are largely partner-filed via the AWS Partner Network, which is why teams route through a partner; CloudRoute matches you to the right pool and a vetted AWS partner who files the application and builds the Knowledge Base, so the customer pays $0. Confirm current rates on the AWS pricing page.

Build managed RAG on AWS — funded

Whatever a Knowledge Base would cost — embeddings, the vector store, FM parsing, inference — AWS credits can cover it. CloudRoute routes you to the right credit pool (Activate up to $100K, Bedrock POC $10K–$50K, GenAI Accelerator up to $1M) and a vetted AWS partner to wire the data sources, tune chunking and parsing, pick the vector store, and ship the Retrieve/RetrieveAndGenerate integration. Customer pays $0.

Get matched in 24h →

→ see the AI-team persona detail

matched within: < 24h

GenAI credit ceiling: up to $1M

cost to you: $0