ai search on aws · the 2026 build guide

How to add AI search to your app on AWS (2026).

AI search — semantic, vector, and hybrid search — makes your product's search box understand meaning, not just match keywords, so a query for "can't log in" finds the "authentication troubleshooting" article. This is the full build guide: keyword vs semantic vs hybrid explained, the reference architecture (embed → index → query → re-rank → optionally generate an answer), every vector-store option on AWS, choosing embeddings (Titan vs Cohere), re-ranking for precision, generative answers via Amazon Bedrock, how to tune relevance, what it costs — and when to reach for the fully-managed Amazon Kendra instead of building it yourself.

TL;DR

AI search replaces (or augments) keyword matching with semantic search: you turn each document into a vector with an embedding model, store the vectors in a vector index, and at query time find the items whose meaning is closest to the query — so it matches intent, synonyms, and phrasing the user never typed. On AWS the canonical pipeline is embed → index → query → re-rank → (optional) generate an answer.
Pure semantic search is not the goal — hybrid search is. Keyword/BM25 search nails exact terms, product names, SKUs, and acronyms; vector search nails meaning and synonyms. Fusing both (e.g. Reciprocal Rank Fusion) beats either alone on almost every real corpus. On AWS you can run both in one engine with Amazon OpenSearch Serverless, in Postgres with Aurora pgvector, or skip the build entirely with Amazon Kendra (fully-managed intelligent search with connectors and built-in ranking).
The central decision is build vs buy: assemble it on Bedrock embeddings + a vector store for maximum control and lowest token cost, or use Amazon Kendra for a managed search service with 40+ data-source connectors, ML ranking, and access-control filtering out of the box. Either way, search infrastructure and GenAI inference bills add up; CloudRoute routes you to AWS credits (Activate Portfolio up to $100K, Bedrock/GenAI POC $10K–$50K, GenAI Accelerator up to $1M) and vetted ML partners who build it, so you pay $0.

the core idea

I. What "AI search" means, and the problem it actually solves

AI search is search that understands meaning. Instead of matching the literal words in a query against the literal words in your documents, it compares the semantics of the query to the semantics of each item, so "laptop won't charge" surfaces the "battery and power adapter troubleshooting" page even though they share almost no words.

Traditional search is lexical: it scores documents by how often the query's terms appear (TF-IDF / BM25), optionally with stemming and synonym lists you maintain by hand. It is fast, exact, and explainable, but it fails the moment a user phrases something differently from your content. Search for "remote work policy" when the document says "telecommuting guidelines" and lexical search finds no match on the key terms, so the right document ranks poorly or not at all. Semantic search fixes this by comparing meaning instead of words.

The mechanics: offline, you run each document (or product, ticket, FAQ, listing) through an embedding model that converts text into a vector, a list of numbers that encodes meaning, where semantically similar texts land near each other in vector space. You store those vectors in a vector index. At query time, you embed the user's query with the same model and ask the index for the nearest vectors (approximate nearest-neighbour search). The results are ranked by semantic closeness, so synonyms, paraphrases, and intent all match without anyone maintaining a synonym list.
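To make the mechanics concrete, here is a minimal sketch of "return the nearest vectors" using brute-force cosine similarity. The three-number vectors and the document ids are invented for illustration; real embeddings have hundreds of dimensions, and a production index would use approximate nearest-neighbour search rather than scanning every item.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec, index, k=2):
    """Brute-force search: score every item, return the k most similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy "embeddings" (made up, not produced by any model).
index = {
    "battery-troubleshooting": [0.9, 0.1, 0.0],
    "returns-policy": [0.0, 0.2, 0.9],
    "power-adapter-faq": [0.8, 0.3, 0.1],
}
query_vec = [0.85, 0.2, 0.05]  # stands in for the embedding of "laptop won't charge"

top = nearest(query_vec, index, k=2)
for doc_id, score in top:
    print(doc_id, round(score, 3))
```

The two power-related pages come back and the returns page does not, even though the query shares no words with either title.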

Where this shows up in a product: the search box in a SaaS app, help centre, or docs site; product / catalogue search in e-commerce (find by description, not just title); internal knowledge search across wikis and tickets; recommendations and "more like this"; and as the retrieval layer underneath a chat assistant. The last one is RAG — retrieval-augmented generation — and AI search is the retrieval half of it. This page is about making search itself better; if your goal is specifically a grounded chatbot over your docs, see the dedicated RAG-on-AWS guide.

On AWS, every stage of this maps to a managed service, and there is also a fully-managed end-to-end option (Amazon Kendra) if you would rather buy than build. The next section sets up the decision that determines everything else: keyword vs semantic vs hybrid.

the one-sentence definition

AI search = embed your content and the query into vectors with the same model, then return the items whose meaning is nearest the query — so search matches intent and synonyms, not just the exact words the user typed.

the central distinction

II. Keyword vs semantic vs hybrid search, and why hybrid usually wins

The single most important thing to understand before building is that semantic search is not strictly better than keyword search — it is better at different things. The production answer on almost every real corpus is to run both and fuse the results.

Each mode has a failure shape. Keyword (lexical / BM25) search is exact and unbeatable for precise terms — a part number, a SKU, an error code, a person's name, an acronym — but it returns nothing when the user's words differ from your content's words. Semantic (vector) search is the opposite: it shines on intent, synonyms, and natural-language questions, but it can miss or blur an exact token (it might rank a conceptually-similar product above the exact SKU someone typed) and it can return plausible-but-wrong neighbours. Hybrid search runs both and combines the scores, so exact matches and semantic matches both surface.

The standard way to combine them is Reciprocal Rank Fusion (RRF): each result's score is the sum of 1 / (k + rank) over the lists it appears in, where rank is its position in that list and k is a smoothing constant (60 is a common default). Because it uses only rank positions, RRF needs no score normalisation and is robust across very different scoring scales. The typical pattern: run a BM25 query and a vector query in parallel, fuse with RRF, then (often) re-rank the fused top-N with a cross-encoder for final precision. Amazon OpenSearch supports this hybrid flow natively because it is both a keyword engine and a vector engine in one system.
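Reciprocal Rank Fusion is small enough to sketch in a few lines. This version fuses any number of ranked id lists client-side. The constant k=60 is the conventional default rather than something this guide prescribes, and the hit lists are invented.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical results from the two retrievers for the same query.
bm25_hits = ["sku-123", "doc-a", "doc-b"]
vector_hits = ["doc-a", "doc-c", "sku-123"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # ['doc-a', 'sku-123', 'doc-c', 'doc-b']
```

Items that both retrievers agree on rise to the top, and an exact match found only by BM25 still survives into the fused list.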

The practical guidance: start hybrid, do not start pure-vector. Teams that ship pure semantic search are frequently surprised when it cannot find an exact product code that lexical search would have nailed instantly. Hybrid is a small amount of extra wiring for a large, reliable quality gain — and it is the configuration that most often clears a relevance bar that pure-vector alone misses.

keyword vs semantic vs hybrid search · what each is good and bad at

Dimension	Keyword / BM25	Semantic / vector	Hybrid (fused)
Matches on	Exact terms, stems	Meaning, synonyms, intent	Both — fused
Wins at	SKUs, codes, names, acronyms	Natural-language questions, paraphrase	Real mixed queries
Fails at	Different wording than content	Exact tokens; plausible-wrong neighbours	Few blind spots
Needs embeddings?	No	Yes	Yes
Synonym list to maintain?	Yes (by hand)	No	No
Explainability	High (you see the matched terms)	Lower (distance in vector space)	Medium
AWS implementation	OpenSearch / Aurora full-text	OpenSearch k-NN / pgvector / Kendra	OpenSearch hybrid + RRF

Hybrid is the production default. Reciprocal Rank Fusion is the usual combiner because it needs no score normalisation across the very different scales of BM25 and cosine similarity. Re-rank the fused top-N for final precision.

end to end

III. The AI-search reference architecture on AWS, stage by stage

A semantic / hybrid search system runs five logical stages. Two are offline (indexing), three are at query time. Almost every relevance problem traces back to a specific stage, so it pays to know each one before you build.

It helps to split the stages into an indexing phase (embed → index) that runs whenever your content changes, and a query phase (query → re-rank → optionally generate) that runs on every search. The table below maps each stage to the AWS service that typically implements it. Note that if you choose Amazon Kendra, it collapses all five stages behind a single managed API — that build-vs-buy choice is section VII.

Indexing phase — embed and index

1. Embed. Run each searchable item — a doc, product, FAQ, ticket, listing — through an embedding model (Amazon Titan Text Embeddings v2 or Cohere Embed, both on Amazon Bedrock) to produce a vector. For long documents you first split them into passages (chunks) so each vector represents one coherent unit; for short items (a product title + description) one vector per item is fine. The model and its output dimensions are fixed for the life of the index — change the model and you must re-embed everything.

2. Index. Write each vector plus the original text and structured metadata (id, title, category, price, tags, ACL/tenant, timestamp) into a vector index with an approximate-nearest-neighbour (ANN) algorithm such as HNSW. On AWS this is Amazon OpenSearch Serverless, Aurora PostgreSQL with pgvector, or — if you went managed — Kendra's own index. Metadata is not optional: it is what lets you filter (in-stock only, this tenant only, this category) and what powers hybrid search's keyword side.
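The two indexing stages above can be sketched as follows. The embedding call uses the Bedrock runtime `invoke_model` API with the Titan Text Embeddings v2 request shape. The index body is one plausible OpenSearch k-NN mapping. The field names (`record_id`, `tenant_id`, and so on), the 512-dimension choice, and the `faiss` / `l2` settings are assumptions for illustration, so check the current OpenSearch Serverless documentation before relying on them.

```python
import json

TITAN_V2_MODEL_ID = "amazon.titan-embed-text-v2:0"
EMBED_DIM = 512  # assumption: Titan v2 also supports 256 and 1024

def embed_text(bedrock_runtime, text, dimensions=EMBED_DIM):
    """Embed one string. bedrock_runtime is boto3.client("bedrock-runtime")."""
    response = bedrock_runtime.invoke_model(
        modelId=TITAN_V2_MODEL_ID,
        contentType="application/json",
        accept="application/json",
        body=json.dumps(
            {"inputText": text, "dimensions": dimensions, "normalize": True}
        ),
    )
    return json.loads(response["body"].read())["embedding"]

# Hypothetical index definition: one vector field plus filterable metadata.
# With normalised vectors, L2 distance ranks results the same as cosine.
INDEX_BODY = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": EMBED_DIM,
                "method": {"name": "hnsw", "engine": "faiss", "space_type": "l2"},
            },
            "record_id": {"type": "keyword"},
            "title": {"type": "text"},
            "text": {"type": "text"},
            "category": {"type": "keyword"},
            "tenant_id": {"type": "keyword"},
            "updated_at": {"type": "date"},
        }
    },
}

def to_index_doc(record, vector):
    """Combine a source record and its vector into one indexable document."""
    return {
        "embedding": vector,
        "record_id": record["id"],
        "title": record["title"],
        "text": record["text"],
        "category": record["category"],
        "tenant_id": record["tenant_id"],
        "updated_at": record["updated_at"],
    }
```

Keeping the model id and dimension as named constants next to the mapping is one way to record what produced the vectors, since a mismatch between the two breaks indexing.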

Query phase — query, re-rank, (optionally) generate

3. Query. Embed the incoming query with the same model and run a nearest-neighbour search for the top-K candidates (K is commonly 20–100 at this stage). In a hybrid setup you also run a BM25 keyword query in parallel and fuse the two result lists (RRF). Apply metadata filters here — never after generation — so a user only ever sees items they are entitled to and that match their facets.

4. Re-rank. Pass the fused top-K through a cross-encoder re-ranker (Amazon Rerank or Cohere Rerank on Bedrock) that scores each candidate against the query directly and keeps the best handful. This is the highest-leverage precision step — re-ranking routinely turns a mediocre result list into a sharp one, especially for the top 3–5 positions users actually look at.

5. Generate (optional). If you want a direct answer rather than a list of links — an "AI answer" or "answer box" above the results — pass the re-ranked top results to a generation model on Bedrock (Claude, Amazon Nova, Llama, Mistral) with an instruction to answer only from those results and cite them. This is exactly the RAG pattern; for pure search (return ranked items), you stop at stage 4.
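For stage 5, most of the work is the instruction you wrap around the re-ranked results. Below is a minimal sketch of a grounded, citation-forcing prompt builder. The wording and the `id` / `text` keys are assumptions, and the call that sends the prompt to a Bedrock model is deliberately left out.

```python
def build_grounded_prompt(question, results):
    """Build an answer-only-from-sources prompt.

    results: re-ranked top results, each a dict with 'id' and 'text'.
    """
    sources = "\n\n".join(
        f"[{n}] (id={r['id']}) {r['text']}"
        for n, r in enumerate(results, start=1)
    )
    return (
        "Answer the question using only the numbered sources below. "
        "Cite sources as [n] after each claim. "
        "If the sources do not contain the answer, say you could not find it.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical re-ranked results.
prompt = build_grounded_prompt(
    "How do I reset my password?",
    [
        {"id": "kb-12", "text": "Use the 'Forgot password' link on the sign-in page."},
        {"id": "kb-40", "text": "Password resets expire after 24 hours."},
    ],
)
print(prompt)
```

Numbering the sources in the prompt lets the UI map each [n] in the answer back to a result the user can click.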

the five AI-search stages mapped to AWS services · representative as of 2026

Stage	Phase	What it does	Typical AWS service
1. Embed	Indexing (offline)	Text → vectors	Titan Text Embeddings v2 / Cohere Embed (Bedrock)
2. Index	Indexing (offline)	Store vectors + metadata for ANN search	OpenSearch Serverless / Aurora pgvector / Kendra
3. Query	Query (real time)	Nearest-neighbour (+ BM25 hybrid + filters)	OpenSearch k-NN + hybrid / pgvector / Kendra
4. Re-rank	Query (real time)	Score + keep the best few results	Amazon Rerank / Cohere Rerank (Bedrock)
5. Generate (optional)	Query (real time)	Direct cited answer above results	Claude / Nova / Llama / Mistral (Bedrock)

Stages 1–4 are "AI search." Add stage 5 only if you want a generated answer box on top of ranked results (that is RAG). Amazon Kendra collapses all five behind one managed Retrieve / Query API — see section VII.

the build, in order

IV. Step-by-step: adding semantic + hybrid search to your app

Here is the fastest credible path from a keyword-only search box to AI-powered hybrid search on AWS, using Bedrock embeddings and OpenSearch Serverless. Each step maps to a stage above; the order matters because every step depends on the one before it being clean.

Step 1. Get clean, searchable records. Pull the items you want searchable (rows from a database, articles from a CMS, products from a catalogue) into a normalised form: an id, the text to embed (title + body, or title + description), and the metadata to filter/return (category, price, tags, tenant, ACL, updated-at). For long documents, chunk them (300–800 tokens, ~10–20% overlap) so each vector is one coherent passage.
Step 2. Enable Bedrock model access. In the Bedrock console, request access in your Region to an embedding model (Titan Text Embeddings v2 or Cohere Embed) and, only if you plan an answer box, a generation model (Claude, Nova, Llama, or Mistral). Pick your Region for data-residency and latency reasons up front.
Step 3. Embed and index. Call the Bedrock embeddings API for each record and write the vector + text + metadata into an OpenSearch Serverless collection configured for vector (k-NN) search with an HNSW index. Backfill the whole corpus once, then index new/changed records going forward. Store the embedding model + dimension in your index config so future-you knows what produced these vectors.
Step 4. Wire the query path (start hybrid). On each search: embed the query, run a k-NN vector query and a BM25 keyword query against OpenSearch, and fuse them with Reciprocal Rank Fusion. Apply metadata filters in the query (in-stock, tenant, category) so filtering happens before ranking, not after. Return a candidate top-K of 20–100.
Step 5. Add re-ranking for precision. Pass the fused top-K to Amazon Rerank or Cohere Rerank on Bedrock, keep the best 5–10, and return those to the UI. This is the single biggest quality jump for the top results users actually click. Skip re-ranking on trivial exact-match queries to save cost and latency.
Step 6. (Optional) Add a generated answer box. If you want an answer above the list, send the re-ranked results to a Bedrock generation model with a strict "answer only from these results and cite them" prompt, attach a Bedrock Guardrail, and render the answer with citations. This turns search into search-plus-RAG.
Step 7. Measure relevance and tune. Build a labelled query set (real queries + which results should rank highly) and track relevance metrics (nDCG, recall@K, MRR) on every change so you can tell whether a new chunk size, embedding model, or fusion weight actually helped. Tuning blind is the most common reason AI-search projects stall. Section VI covers the metrics and the knobs.

where the vectors live

V. Vector index options on AWS: OpenSearch, Aurora pgvector, Kendra

The vector index holds your embeddings and answers nearest-neighbour queries. For an in-app search feature the three relevant AWS options are OpenSearch Serverless, Aurora PostgreSQL with pgvector, and — at the fully-managed end — Amazon Kendra, which is an index and a search engine in one.

Amazon OpenSearch Serverless is the default for building search on AWS, and the natural choice for AI search specifically because it is both a vector engine (k-NN) and a mature keyword engine (BM25) in one system — so native hybrid search and Reciprocal Rank Fusion work out of the box, plus faceting, filtering, and aggregations you already expect from a search backend. It is fully managed and auto-scales; the trade is that it bills by OpenSearch Compute Units (OCUs) with a baseline minimum, so it can feel expensive for a very small index even when idle.

Aurora PostgreSQL with the pgvector extension is the pragmatic choice when your app already runs on Postgres. Your vectors live next to your relational data, so you can combine a vector search with SQL WHERE filters and joins to business tables in a single query — ideal for product search where you must filter by price, stock, and category. With HNSW indexing it scales comfortably into the millions of vectors. Native keyword search is weaker than OpenSearch's (Postgres full-text plus pgvector is workable hybrid, but not as turnkey), and at very high query concurrency a purpose-built engine pulls ahead.
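As a sketch of what "vector search plus SQL filters in a single query" looks like with pgvector, the snippet below builds a parameterised statement in the style a Postgres driver such as psycopg accepts. The `products` table, its columns, and the index name are hypothetical. `<=>` is pgvector's cosine-distance operator.

```python
def to_pgvector_literal(vec):
    """Format a list of floats as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(f"{x:.6f}" for x in vec) + "]"

# Hypothetical schema: products(id, title, price, in_stock, category, embedding vector).
CREATE_INDEX_SQL = (
    "CREATE INDEX IF NOT EXISTS products_embedding_hnsw "
    "ON products USING hnsw (embedding vector_cosine_ops);"
)

SEARCH_SQL = """
SELECT id, title, price, embedding <=> %(qvec)s::vector AS distance
FROM products
WHERE in_stock
  AND category = %(category)s
  AND price <= %(max_price)s
ORDER BY embedding <=> %(qvec)s::vector
LIMIT %(k)s;
"""

def build_search(query_vec, category, max_price, k=50):
    """Return (sql, params) ready for cursor.execute(sql, params)."""
    params = {
        "qvec": to_pgvector_literal(query_vec),
        "category": category,
        "max_price": max_price,
        "k": k,
    }
    return SEARCH_SQL, params

sql, params = build_search([0.1, 0.2], "jackets", 100.0, k=5)
```

One caveat worth testing on your own data: with an approximate index, a selective WHERE clause can leave fewer than k rows, so check recall on filtered queries.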

Amazon Kendra is a different animal: a fully-managed intelligent-search service, not just an index. It ingests from 40+ connectors (S3, SharePoint, Confluence, Salesforce, databases, web crawl), builds and tunes its own semantic ranking, supports natural-language queries and FAQ matching, and enforces document-level access control by reading source-system ACLs — all without you running an embedding pipeline or a vector store. You trade per-engine control and granular cost tuning for speed-to-launch and far less to operate. The full build-vs-buy comparison is in section VII.

Two more AWS-native vector options exist and are worth knowing for adjacent needs: Amazon MemoryDB (Redis-compatible, in-memory, single-digit-millisecond vector queries) when latency is the hard constraint, and the newer S3 Vectors capability for very large, cost-optimised vector sets with infrequent queries. For an in-product search feature, though, the decision is almost always among the three in the table below.

aws vector / search index options for ai search · representative as of 2026 — check the AWS pricing page for current rates

Option	What it is	Hybrid search	Best fit	Cost shape	Watch-out
OpenSearch Serverless	Managed vector + keyword engine	Native (vector + BM25 + RRF)	In-app search wanting one engine for everything	Per OCU + storage; baseline minimum	Baseline cost stings tiny indexes
Aurora PostgreSQL (pgvector)	Managed Postgres + vector extension	Vector + SQL/full-text filters	Apps already on Postgres; faceted product search	Per Aurora instance / ACU + storage	Keyword side weaker than OpenSearch
Amazon Kendra	Fully-managed intelligent search	Built-in semantic + keyword ranking	Buy-not-build; 40+ connectors; ACL-aware search	Per index edition (Developer / Enterprise) + queries	Less control; index pricing is a fixed baseline

OpenSearch Serverless is the path of least resistance for building. pgvector is the path of least new infrastructure if you already run Postgres. Kendra is the path of least engineering if you want managed search with connectors and access control out of the box.

making it actually relevant

VI. Embeddings, re-ranking, and relevance tuning: where search quality is won

Whether AI search feels magic or mediocre comes down to retrieval quality, not the fanciness of any single component. Four knobs dominate: the embedding model, chunking, hybrid fusion, and re-ranking — and you tune all four against a labelled query set, not by feel.

Embedding model — Titan v2 vs Cohere

On Bedrock the two mainstream embedding families are Amazon Titan Text Embeddings v2 and Cohere Embed. Titan v2 is AWS-native, inexpensive, and supports configurable output dimensions (e.g. 256 / 512 / 1024) — smaller dimensions cut index size and speed up search at a modest recall cost, which is a real lever for a large catalogue. Cohere Embed (English and multilingual variants) scores strongly on retrieval benchmarks and is a frequent pick for multilingual search or search-heavy products.
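The size effect of the dimension setting is simple arithmetic. The sketch below estimates raw float32 vector storage only, for a hypothetical corpus of one million vectors. It ignores HNSW graph overhead, metadata, and replicas, so treat the figures as a lower bound.

```python
BYTES_PER_FLOAT32 = 4

def raw_vector_bytes(num_vectors, dimensions):
    """Raw storage for float32 vectors, excluding any index overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32

NUM_VECTORS = 1_000_000  # hypothetical corpus size

for dims in (256, 512, 1024):
    gib = raw_vector_bytes(NUM_VECTORS, dims) / 2**30
    print(f"{dims:>4} dims: {gib:.2f} GiB")
# prints:
#  256 dims: 0.95 GiB
#  512 dims: 1.91 GiB
# 1024 dims: 3.81 GiB
```

Halving the dimension halves the raw vector footprint, which is why it is worth benchmarking recall at 512 or 256 before defaulting to 1024.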

Two rules outweigh the choice itself. First, embed queries and documents with the same model and version — mixing them produces incomparable vectors and silently wrecks relevance. Second, changing the embedding model means re-embedding the entire corpus, so treat it as semi-permanent and benchmark candidates on a sample of your own queries before committing — leaderboards rarely predict your domain.

Chunking and what you embed

For document or article search, how you split text into chunks is the highest-variance decision: chunks too large dilute the vector (one embedding trying to represent many topics) and hurt precision; chunks too small lose the context that makes a passage answerable. A sensible default is 300–800 tokens with 10–20% overlap, then tune. For short structured items (products, listings), the lever is instead what you embed — title alone vs title + description vs title + key attributes — and concatenating the most search-relevant fields usually beats embedding the whole record verbatim.
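A fixed-size chunker with overlap is a reasonable starting point before trying anything smarter. This sketch splits on whitespace as a rough stand-in for model tokens, which is an assumption: real token counts depend on the embedding model's tokenizer. The defaults of 500 and 75 give 15% overlap.

```python
def chunk_tokens(tokens, chunk_size=500, overlap=75):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks

def chunk_text(text, chunk_size=500, overlap=75):
    """Chunk raw text, using whitespace-separated words as approximate tokens."""
    words = text.split()
    return [" ".join(chunk) for chunk in chunk_tokens(words, chunk_size, overlap)]

# Tiny demonstration with integers standing in for tokens.
demo = chunk_tokens(list(range(10)), chunk_size=4, overlap=1)
print(demo)  # [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

Each chunk repeats the tail of the previous one, so a sentence that straddles a boundary still appears whole in at least one vector's text.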

Hybrid fusion and re-ranking

Hybrid fusion (vector + BM25 via RRF) is the first big relevance gain; re-ranking is the second and larger one. A re-ranker is a cross-encoder that reads the query and each candidate together and scores true relevance — far more accurate than ANN distance. The standard pattern: cast a wide net cheaply (fused top-50 to top-100), then re-rank and keep only the best 5–10 to show. On AWS, Amazon Rerank and Cohere Rerank run on Bedrock. If you do one thing to fix a struggling search, add re-ranking before you touch the embedding model.
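The wide-net-then-re-rank pattern can be sketched independently of any particular re-ranker. Here `score_fn` is a placeholder for a call to a cross-encoder such as a Bedrock re-rank model. The word-overlap scorer below is only a toy so the example runs offline, and it is not how a cross-encoder scores relevance.

```python
def rerank(query, candidates, score_fn, keep=5):
    """Score each candidate against the query and keep the best `keep`.

    candidates: dicts with at least 'id' and 'text'.
    score_fn(query, text) -> float, higher means more relevant.
    """
    scored = [(score_fn(query, c["text"]), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in scored[:keep]]

def toy_overlap_score(query, text):
    """Toy stand-in scorer: fraction of query words present in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0

# Hypothetical fused candidates.
candidates = [
    {"id": "b", "text": "billing and invoices"},
    {"id": "c", "text": "password rules"},
    {"id": "a", "text": "how to reset your password"},
]
best = rerank("reset password", candidates, toy_overlap_score, keep=2)
print([c["id"] for c in best])  # ['a', 'c']
```

Because the scorer is injected, you can switch re-rankers, or bypass the step for trivial exact-match queries, without touching the retrieval code.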

Measuring relevance

Tune against numbers, not vibes. Build a labelled set of real queries with judged-relevant results, then track nDCG (rank-aware quality), recall@K (did the right items make the candidate set), and MRR (how high the first good result lands). Run it on every change — new chunk size, new fusion weights, re-ranking on/off — so you can prove an improvement instead of guessing. Pair offline metrics with online signals (click-through rate, zero-result rate, search-to-conversion) once it is live, because real user behaviour catches what an offline set misses.
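The three offline metrics are short enough to implement directly, which helps when you want to see exactly what each one rewards. This is a minimal sketch using linear gains for nDCG (some tools use an exponential variant). The ids and judgements are invented.

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k):
    """Share of the relevant items that appear in the top k."""
    if not relevant_ids:
        return 0.0
    return len(set(ranked_ids[:k]) & set(relevant_ids)) / len(relevant_ids)

def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0 if none is returned."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs) if runs else 0.0

def ndcg_at_k(ranked_ids, gains, k):
    """gains: dict of doc_id -> graded relevance (missing ids count as 0)."""
    dcg = sum(gains.get(doc_id, 0) / math.log2(pos + 1)
              for pos, doc_id in enumerate(ranked_ids[:k], start=1))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(pos + 1) for pos, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical judgements for one query.
gains = {"a": 2, "b": 1}
print(recall_at_k(["x", "a", "b"], {"a", "b"}, k=2))   # 0.5
print(reciprocal_rank(["x", "a", "b"], {"a", "b"}))    # 0.5
print(round(ndcg_at_k(["x", "a", "b"], gains, k=3), 3))
```

Averaging each metric over the whole labelled set before and after a change shows whether the change helped or only moved a few queries around.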

the debugging order

When results are bad, debug retrieval in order: (1) is the right item even in the index (ingestion / what you embedded)? (2) does it come back in the candidate set (embedding model / chunking / recall@K)? (3) does hybrid fusion surface it above noise (add BM25 + RRF)? (4) does re-ranking push it into the top-5? Tune the embedding model or generation last — most AI-search quality problems are in retrieval and ranking, not the model.
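The same order can be turned into a small check you run for any query whose expected result is missing. This is a sketch under the assumption that you can capture the intermediate lists from your own pipeline. The function name, argument names, and messages are all hypothetical.

```python
def diagnose(expected_id, indexed_ids, vector_candidates, fused, reranked, top_n=5):
    """Report the first pipeline stage at which the expected item goes missing."""
    if expected_id not in indexed_ids:
        return "stage 1: not in the index (check ingestion and what you embedded)"
    if expected_id not in fused:
        if expected_id not in vector_candidates:
            return "stage 2: missing from the candidate set (check embedding model, chunking, recall@K)"
        return "stage 3: lost during fusion (check BM25 + RRF and the fused top-K cut-off)"
    if expected_id not in reranked[:top_n]:
        return "stage 4: retrieved but outside the re-ranked top results"
    return "ok: expected item is in the final top results"

# Hypothetical captured lists for one failing query.
print(diagnose(
    expected_id="sku-123",
    indexed_ids={"sku-123", "doc-a", "doc-b"},
    vector_candidates=["doc-a", "doc-b"],
    fused=["doc-a", "sku-123", "doc-b"],
    reranked=["doc-a", "doc-b", "sku-123"],
    top_n=2,
))
```

In this example the exact SKU is rescued by the keyword side of the fusion but then falls outside the re-ranked top 2, so the re-ranking stage is where to look.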

build vs buy

VII. Build it yourself vs Amazon Kendra: the central decision

Before choosing an embedding model or a vector store, decide whether to build the pipeline at all. Amazon Kendra is AWS's fully-managed intelligent-search service; building on Bedrock + OpenSearch is the DIY path. This one choice sets how much you build, how much you control, and how fast you ship.

The honest framing: Kendra if you want managed search with connectors and access control fast; build if you need control, custom ranking, or the lowest per-query cost at scale. Kendra is a search service — it crawls 40+ data sources, builds and tunes its own semantic ranking, answers natural-language queries, matches FAQs, and (critically for enterprise) reads source-system ACLs so each user only sees what they are permitted to, with no embedding pipeline or vector store to run. The trade is less granular control over ranking and embeddings, and a pricing model that is a fixed index baseline plus queries rather than something you tune stage by stage.

Building on Bedrock + OpenSearch Serverless gives you the opposite: full control of the embedding model, chunking, hybrid fusion weights, re-ranking, and exactly what you index — and typically a lower marginal cost per query at high volume because you are paying for tokens and OCUs rather than a managed-search premium. You own the connectors (you write the ingestion), the access-control logic (metadata filters you design), and the maintenance. For a custom in-product search box with bespoke relevance rules, or a very high-QPS workload where per-query cost dominates, building usually wins. For "make our internal wiki and Confluence and SharePoint searchable, with permissions, by next month," Kendra usually wins.

A common pattern is to prototype on Kendra to prove the use case and get a strong baseline in days, then migrate to a built pipeline only if a concrete requirement — custom ranking, a vector store you already run, or cost at scale — forces it. Many teams never need to: Kendra is enough. Others start built because search relevance is their product and they want every knob. The comparison table makes the trade explicit.

the pragmatic rule

Choose Amazon Kendra when speed, connectors, and out-of-the-box access control matter more than control — especially for enterprise knowledge search across many sources with permissions. Build on Bedrock + OpenSearch when you need custom ranking, custom chunking, an existing vector store, or the lowest per-query cost at high volume. Prototype on Kendra; graduate to built only when a hard requirement forces it.

shipping it for real

VIII. Production concerns and the AI-search cost stack

A search demo and a production search feature differ on freshness, access control, latency, and a bill that scales with usage. Each has a concrete AWS answer, and the cost stack has predictable line items that surprise teams who budgeted only for embeddings.

On freshness, your index is only as current as your last sync: trigger re-embedding on a data-change event (an S3 event or DB CDC stream via Lambda, or a scheduled job) and store an updated-at in each record so stale items can be filtered or down-weighted. On access control, enforce it in the query — tag each record with ACL / tenant metadata at index time and apply a filter on every search so a user only ever retrieves what they are entitled to; for multi-tenant SaaS, filter by tenant at minimum, or use separate indexes for hard isolation. (Kendra does this for you by reading source ACLs.) On latency, end-to-end time is query + re-rank + optional generation; cache query embeddings for repeated searches, keep re-ranking to a sensible top-N, and stream any generated answer so perceived latency stays low.
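Enforcing access control "in the query" means the tenant filter is part of both retrieval requests, not a post-processing step. The sketch below builds OpenSearch-style request bodies for the vector and keyword sides. The field names (`embedding`, `tenant_id`, `title`, `text`) are assumptions, and filtering inside the `knn` clause depends on the engine your index uses, so verify it against the OpenSearch documentation for your setup.

```python
def tenant_filter(tenant_id):
    """Filter clause shared by both queries so isolation cannot be skipped."""
    return [{"term": {"tenant_id": tenant_id}}]

def build_knn_query(query_vec, tenant_id, k=50, vector_field="embedding"):
    """Vector query with the tenant filter applied during the k-NN search."""
    return {
        "size": k,
        "query": {
            "knn": {
                vector_field: {
                    "vector": query_vec,
                    "k": k,
                    "filter": {"bool": {"filter": tenant_filter(tenant_id)}},
                }
            }
        },
    }

def build_bm25_query(query_text, tenant_id, k=50):
    """Keyword query over title and text, restricted to the same tenant."""
    return {
        "size": k,
        "query": {
            "bool": {
                "must": [
                    {"multi_match": {"query": query_text, "fields": ["title^2", "text"]}}
                ],
                "filter": tenant_filter(tenant_id),
            }
        },
    }

# Hypothetical tenant and query.
knn_body = build_knn_query([0.1, 0.2, 0.3], tenant_id="merchant-42", k=20)
bm25_body = build_bm25_query("rain jacket", tenant_id="merchant-42", k=20)
```

Building the filter in one helper that both queries call makes it harder for a later change to add a retrieval path that forgets the tenant restriction.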

The cost figures below are representative as of 2026 to show the shape of the bill — always check the AWS pricing page (and the third-party vendor for any non-AWS component) for current rates. For pure semantic/hybrid search the dominant cost is the always-on index baseline (OpenSearch OCUs or the Kendra index edition); embeddings are a one-time-per-corpus cost plus updates; per-query embedding and re-ranking are small; and generation only appears if you added an answer box — at which point generation tokens usually become the largest line.

ai-search cost stack on aws · representative shape as of 2026 — check the AWS pricing page for current rates

Cost line	When you pay	Driver	Main lever to control it
Embeddings (indexing)	One-time per corpus + on updates	Total tokens embedded	Chunk size; smaller embedding dimensions; only re-embed changed records
Search index	Continuous (baseline)	OpenSearch OCUs / Kendra edition / Aurora ACUs	Right-size the engine; pgvector if Postgres already runs; tune dimensions
Query embeddings	Per query	Search volume	Negligible per call; cache embeddings for repeated queries
Re-ranking	Per query	Candidates re-ranked × queries	Re-rank top-50/100, not top-1000; skip on trivial exact-match queries
Generation (only with an answer box)	Per query	Input + output tokens × model price	Cheaper model for easy queries; fewer chunks; prompt caching; tight max-tokens

For pure search (return ranked items), the always-on index baseline dominates — right-size it to corpus size, not peak imagination. Add an answer box and generation tokens usually become the biggest line; Bedrock prompt caching and re-ranking to a few tight results cut it the most.

build vs buy, side by side

Amazon Kendra vs build-your-own (Bedrock + OpenSearch) AI search

This is the comparison that decides your architecture. Read it as "Kendra if speed, connectors, and access control matter most; build if you need control, custom ranking, or lowest per-query cost at scale."

Dimension	Amazon Kendra (managed)	Build (Bedrock + OpenSearch Serverless)
Time to first search	Days — connect a source, it indexes + ranks	Days to weeks — build embed→index→query→re-rank
Pipeline you maintain	Almost none — AWS runs ingestion + ranking	All of it — embeddings, index, hybrid, re-ranking
Data ingestion	40+ built-in connectors (S3, SharePoint, Confluence, Salesforce…)	You write ingestion for each source
Ranking control	Managed semantic ranking + relevance tuning knobs	Full — your embeddings, fusion weights, re-ranker
Hybrid search	Built in	Native in OpenSearch (vector + BM25 + RRF)
Access control	Reads source-system ACLs automatically	You design metadata filters / per-tenant isolation
Embedding model choice	Managed (not yours to pick)	Titan v2 / Cohere — your choice and dimensions
Cost shape	Fixed index edition + per-query	Per OCU + tokens — lower marginal cost at high QPS
Best for	Enterprise knowledge search across many sources, fast, with permissions	Custom in-product search, bespoke ranking, high volume

A common path: prototype on Kendra to prove the use case and get a strong baseline, then migrate to a built pipeline only when a concrete requirement — custom ranking, an existing vector store, or cost at scale — forces it. Many teams never need to migrate.

a recent match

Semantic + hybrid product search — anonymized

inquiry · seed-to-Series-A e-commerce SaaS, US

Series-A e-commerce enablement SaaS, ~20 people, ~400k product SKUs across merchant catalogues, keyword-only search box with a high zero-result rate

Situation: Their in-app product search was lexical only: shoppers who typed descriptions or synonyms ("rain jacket" when the listing said "waterproof shell") got zero results, and the zero-result rate was visibly hurting conversion. They wanted semantic search that still matched exact SKUs and brand names, with per-merchant isolation so one merchant's catalogue never leaked into another's results. The two engineers who could build it were committed to the core roadmap, and the projected Bedrock + OpenSearch bill made the founder hesitate to start.

What CloudRoute did: Routed within 24 hours to a US-region AWS partner with a search / GenAI track record. The partner built it on AWS: Titan v2 embeddings over title + key attributes, OpenSearch Serverless as the vector + keyword engine with native hybrid search and Reciprocal Rank Fusion, Cohere Rerank for top-result precision, per-merchant metadata filtering for isolation, and a 300-query labelled set scored on nDCG and recall@K to tune chunking and fusion weights. The whole engagement was funded by AWS credits the partner filed for — Activate Portfolio plus a Bedrock POC allocation.

Outcome: Hybrid semantic search in production in about 5 weeks. Zero-result rate fell sharply while exact SKU and brand lookups still resolved instantly; per-merchant isolation enforced at query time. The build and the first months of search + inference ran on AWS credits — the customer paid $0. CloudRoute's commission was paid by the partner from AWS engagement funding.

engagement window: ~5 weeks · founder time: ~7 hours · stack: Titan v2 + OpenSearch Serverless (hybrid + RRF) + Cohere Rerank · cost to customer: $0

faq

Common questions

What is the difference between keyword search, semantic search, and hybrid search?

Keyword (lexical / BM25) search matches the literal terms in the query against the terms in your content — fast and exact, great for SKUs, codes, and names, but it returns nothing when the user's wording differs from your content. Semantic (vector) search compares meaning using embeddings, so it matches synonyms, paraphrases, and intent — but it can miss an exact token and sometimes returns plausible-but-wrong neighbours. Hybrid search runs both and fuses the results (usually with Reciprocal Rank Fusion), so exact and semantic matches both surface. On real corpora hybrid almost always beats either alone, which is why it is the production default.

How do I add semantic search to my app on AWS?

The build path: (1) get clean records with an id, the text to embed, and filter/return metadata; (2) enable a Bedrock embedding model (Titan Text Embeddings v2 or Cohere Embed); (3) embed each record and index the vectors + metadata in Amazon OpenSearch Serverless (k-NN); (4) at query time embed the query, run a vector query and a BM25 keyword query, and fuse them with RRF — this is hybrid search; (5) re-rank the fused top-K with Amazon Rerank or Cohere Rerank and return the best few; (6) optionally add a generated answer box with a Bedrock model. Then tune relevance against a labelled query set using nDCG and recall@K. If you would rather not build the pipeline, Amazon Kendra does all of this as a managed service.

Should I build AI search myself or use Amazon Kendra?

Use Amazon Kendra when you want managed intelligent search fast: it has 40+ data-source connectors (S3, SharePoint, Confluence, Salesforce, databases, web crawl), builds and tunes its own semantic ranking, answers natural-language queries, and reads source-system ACLs so each user only sees permitted documents — no embedding pipeline or vector store to run. Build on Bedrock + OpenSearch Serverless when you need control over the embedding model, chunking, hybrid fusion weights, and re-ranking, want to reuse an existing vector store, or are optimising per-query cost at high volume. A common approach is to prototype on Kendra for a fast baseline and migrate to a built pipeline only if a hard requirement forces it.

Which vector store should I use for AI search on AWS?

For an in-app search feature, OpenSearch Serverless is usually best because it is both a vector engine (k-NN) and a keyword engine (BM25) in one system, so native hybrid search and RRF work out of the box. The trade-off is that its always-on OCU baseline can be expensive relative to a tiny index. Aurora PostgreSQL with pgvector is the path of least new infrastructure if your app already runs on Postgres, and it lets you combine vector search with SQL filters and joins (ideal for faceted product search), though its keyword side is weaker. Amazon Kendra is the buy-not-build option: a managed index and search engine with connectors and access control. Amazon MemoryDB (Redis-compatible, in-memory) is for ultra-low-latency cases.

Which embedding model is better for search — Amazon Titan or Cohere?

Both run on Bedrock and both are strong — benchmark on your own queries rather than a leaderboard. Amazon Titan Text Embeddings v2 is AWS-native, inexpensive, and supports configurable output dimensions (smaller dimensions shrink the index and speed up search at a small recall cost — a real lever for large catalogues). Cohere Embed (English + multilingual) scores highly on retrieval benchmarks and is a common pick for multilingual search. Two rules regardless of choice: embed queries and documents with the same model and version, and remember that changing the embedding model later means re-embedding the entire corpus, so treat it as semi-permanent.
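The dimension lever is easy to quantify with back-of-the-envelope arithmetic. The sketch below assumes vectors are stored as 4-byte float32 values and counts raw vector bytes only; index structures such as HNSW graphs, replicas, and metadata add overhead on top, and the one-million-vector catalogue is a made-up example.

```python
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_vectors, dimensions):
    """Raw storage for the vectors alone, excluding any index overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


def to_gib(num_bytes):
    """Convert bytes to gibibytes (2**30 bytes)."""
    return num_bytes / 2**30


# Compare the output sizes Titan Text Embeddings v2 supports for a
# hypothetical catalogue of one million embedded records.
sizes = {
    dims: round(to_gib(raw_vector_bytes(1_000_000, dims)), 2)
    for dims in (256, 512, 1024)
}
```

Halving the dimensions halves the raw vector footprint, which is why a smaller output size is worth benchmarking before you commit a large catalogue to the full 1024.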

Do I need to add generated AI answers, or is ranked search enough?

They are different features. AI search returns a ranked list of relevant items (stages 1–4: embed, index, query, re-rank) — that alone fixes the "I can't find it" problem and is enough for most product search, docs search, and catalogue search. Adding a generated "answer box" (stage 5) sends the top results to a Bedrock model that writes a direct, cited answer above the list — that is retrieval-augmented generation (RAG), and it is the right move for help centres and assistants where users want an answer, not links. Start with ranked search; add generation when a direct answer clearly beats a list. The generation step also adds the largest per-query cost.
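If you do add the answer box, the glue between ranked search and generation is a prompt that carries the top passages with citation markers. This is one possible shape, not a prescribed template: the "id" and "text" keys, the instruction wording, and the cap of four passages are all assumptions for illustration.

```python
def build_answer_prompt(question, passages, max_passages=4):
    """Assemble a grounded prompt from the top re-ranked passages.

    `passages` is a list of dicts with hypothetical keys "id" and "text",
    already ordered best-first by the re-ranker.
    """
    top = passages[:max_passages]
    sources = "\n".join(
        f"[{i}] ({p['id']}) {p['text']}" for i, p in enumerate(top, start=1)
    )
    return (
        "Answer the question using only the numbered sources below. "
        "Cite sources like [1]. If the sources do not contain the answer, "
        "say so.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Capping the number of passages is also the simplest cost control here, since every passage you include is billed as input tokens on each query.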

How do I make AI search results more relevant?

Tune four knobs against a labelled query set, in order of leverage: (1) re-ranking — add Amazon Rerank or Cohere Rerank over the top-50/100 candidates; it is the biggest single jump for the top results users click; (2) hybrid search — combine vector + BM25 with RRF so exact terms and meaning both surface; (3) chunking / what you embed — right-size passages (300–800 tokens, 10–20% overlap) or concatenate the most search-relevant fields for short records; (4) embedding model and dimensions. Measure with nDCG, recall@K, and MRR offline, then watch click-through and zero-result rate online. Debug retrieval before you blame the model — most quality problems are in retrieval and ranking.
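The offline metrics named above are small enough to implement directly. The sketch below assumes your labelled query set gives, per query, either a set of relevant ids or a mapping from id to graded relevance; the function names are illustrative, and the nDCG here uses linear gain (some implementations use 2^gain - 1 instead).

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids, gains, k):
    """nDCG@k with linear gain; `gains` maps doc id to graded relevance."""
    dcg = sum(
        gains.get(doc_id, 0) / math.log2(i + 1)
        for i, doc_id in enumerate(ranked_ids[:k], start=1)
    )
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


def mean_reciprocal_rank(results):
    """MRR over (ranked_ids, relevant_ids) pairs, one pair per query."""
    if not results:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in results) / len(results)
```

Run these over the same labelled queries before and after each change (adding a re-ranker, switching to hybrid, changing chunk size) so that every knob is judged on the same yardstick.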

What does AI search on AWS cost?

For pure semantic/hybrid search the line items are: a one-time embedding of the corpus (re-embed only what changes), a continuous search-index baseline (OpenSearch OCUs, an Aurora instance, or a Kendra index edition), a per-query embedding of the query text (negligible), and per-query re-ranking. The always-on index baseline usually dominates, so right-size it to corpus size rather than peak imagination. If you add a generated answer box, generation tokens typically become the largest cost. Control them with a cheaper model for easy queries, fewer re-ranked chunks passed to the model, and Bedrock prompt caching. Any figures quoted in this guide are representative as of 2026; check the AWS pricing page for current rates.

Add AI search to your app — funded by AWS credits

CloudRoute routes you to a vetted AWS search / GenAI partner who designs and ships it — semantic + hybrid search on Bedrock embeddings and OpenSearch (or managed Amazon Kendra), the right vector store, re-ranking, access control, and relevance tuning. AWS credits fund the build and the inference. You pay $0.

Get matched with an AI-search build partner →

→ see the AI-team persona detail

matched within: < 24h

credits to fund it: up to $100K

cost to you: $0