recommendation engine on aws · the 2026 build guide

How to build a recommendation engine on AWS (2026).

A recommendation engine shows each user the items they are most likely to want next — products, content, jobs, songs, courses. On AWS there are three ways to build one, and the modern answer combines them: Amazon Personalize for managed collaborative filtering, Bedrock embeddings plus vector search for semantic "more like this" and cold-start, and an LLM to re-rank the shortlist and write a human-readable reason for each pick. This is the full build guide — the three approaches and when each fits, a hybrid architecture (retrieve → LLM re-rank → generate explanations), the cold-start problem, real-time vs batch, what it costs, and a step-by-step.

approaches
3
managed service
Amazon Personalize
pipeline stages
4
credits to fund it
up to $100K
TL;DR
  • There are three building blocks for recommendations on AWS, and they answer different questions. Amazon Personalize is managed collaborative filtering — "users like this user also engaged with these items" — and it is the fastest path to a behavioural recommender. Embeddings + vector search (Bedrock embeddings stored in OpenSearch Serverless or Aurora pgvector) give content-based "more like this" from item attributes, which is what solves cold-start for brand-new items and users. An LLM (Claude, Amazon Nova on Bedrock) is not the recommender — it is the re-ranker and the explainer: it reorders a shortlist against rich context and writes a one-line reason for each pick.
  • The modern production pattern is hybrid, not pick-one: a fast candidate-generation layer (Personalize and/or vector search) proposes a few hundred items, then an optional LLM re-ranks the top tens against business rules and live context, then optionally generates a short natural-language explanation ("because you watched X and rate documentaries highly"). Candidate generation must be cheap and fast; LLM re-ranking is applied only to the shortlist so the token bill stays bounded.
  • The hard parts are cold-start (no history for a new user or item), the real-time-vs-batch split (precompute nightly vs respond to a live click), and a cost stack that grows with traffic. Recommendation infrastructure and GenAI inference bills add up; CloudRoute routes you to AWS credits (Activate Portfolio up to $100K, Bedrock/GenAI POC $10K–$50K, GenAI Accelerator up to $1M) and vetted ML partners who build it — you pay $0.
the core idea

IWhat a recommendation engine does — and the three questions it has to answer

A recommendation engine predicts, for a given user and context, which items from your catalogue they are most likely to want — then ranks those items so the best handful surface first. The whole problem is ranking a catalogue per user; everything else is how you compute that ranking.

Under the hood, every recommender is solving one of three sub-problems, and the right AWS building block depends on which one dominates your product. The first is behavioural similarity: "people who interacted with what this user interacted with also liked these items." That is collaborative filtering, and it is what Amazon Personalize is built for — it learns from your interaction logs (views, clicks, purchases, watches, ratings) without you ever describing the items. The second is content similarity: "this item is about the same things as items the user already likes." That is content-based filtering, and on AWS it is implemented with embeddings — you turn each item's text/attributes into a vector and find nearest neighbours. The third is contextual ranking with reasons: given a shortlist, reorder it against the live situation (time of day, what is in the cart, a stated intent, business rules) and explain why — which is where a large language model earns its place.

These are not competing religions; they are complementary layers. Collaborative filtering is powerful but blind to brand-new items and brand-new users (the cold-start problem, section V). Content-based filtering handles cold-start gracefully but can be myopic — it recommends more of the same and misses the surprising-but-loved item that only behaviour reveals. An LLM re-ranker adds judgement and explanation but is too slow and expensive to score a whole catalogue. The strongest systems use each where it is strong: behaviour for the bulk signal, content for cold-start and diversity, an LLM for the final polish on a short list.

Where this shows up in a product: "recommended for you" rows on a homepage; "more like this" / "similar items" on a detail page; "frequently bought together" at checkout; personalized search ranking and re-ranking of a feed; next-best-content in media and learning; and cold-start onboarding ("you're new — here's a good starting set"). Each of these is the same ranking machine pointed at a different surface with different context.

On AWS, every one of these layers maps to a managed service, which is why AWS is a common place to build recommendations end to end. The next three sections take each building block in turn, then section IV assembles them into the hybrid architecture most production systems converge on.

the one-sentence definition

A recommendation engine ranks your catalogue per user and context. On AWS you build it from three layers — Amazon Personalize for behavioural collaborative filtering, Bedrock embeddings + vector search for content-based similarity and cold-start, and an LLM to re-rank the shortlist and explain each pick.

approach 1 — managed behaviour

IIAmazon Personalize — managed collaborative filtering

Amazon Personalize is AWS's fully-managed recommendation service. You give it your interaction history; it trains and hosts the same class of deep-learning recommenders Amazon.com uses, and exposes them behind a real-time API — no model code, no serving infrastructure to run.

The mental model: you load three kinds of data and Personalize does the rest. Interactions are the spine — a stream of user-item events (view, click, add-to-cart, purchase, watch, rating) with a timestamp, and ideally an event type and value. Items and Users are optional metadata datasets (item category, price, genre, brand; user segment, age band) that improve quality and, importantly, help with cold-start. You pick a recipe (Personalize's name for a pre-built algorithm) for the job you want, train a solution, deploy a campaign (a real-time endpoint) or run a batch inference job, and call GetRecommendations.

The recipes map to the common product surfaces. User-Personalization is the default "recommended for you" recipe and the one to reach for first — it blends collaborative filtering with item metadata and automatically explores new items. Similar-Items powers "more like this" using co-occurrence plus item attributes. Personalized-Ranking takes a list you supply (search results, a curated row, a feed) and reorders it for the specific user — this is the recipe you use to personalize an existing ranking rather than generate one. Trending-Now and Popularity handle "what is hot" and anonymous fallbacks. A newer capability lets Personalize incorporate richer item content, which narrows the gap with pure embeddings approaches.

Personalize handles real-time signals through an event tracker: as a user clicks during a session, you stream those events in and recommendations adapt within the session — a genuinely live recommender, not a nightly batch. It also exposes recommendation filters (a small expression language) so you can exclude already-purchased items, restrict to in-stock or region-eligible items, or enforce business rules at serving time without rebuilding the model.

The honest limits. Personalize is collaborative-first, so it is strongest once you have meaningful interaction volume — AWS guidance is on the order of 1,000+ interactions from users with a handful of events each before quality is reliable; below that, behaviour alone is too sparse and you lean on metadata recipes and the embeddings layer instead. It is a managed black box: you tune inputs, recipes, and filters, not the model architecture, and you cannot drop in your own embeddings or scoring function. And it has its own cost shape (training hours + real-time TPS capacity + per-recommendation), covered in section VII. For most teams those trade-offs are well worth it — Personalize is the fastest route to a real behavioural recommender on AWS.

when Personalize is the right first move

Reach for Amazon Personalize when you already have a stream of user-item interactions and want a behavioural "recommended for you" / "similar items" / "personalized ranking" recommender without training or hosting models yourself. Start with the User-Personalization recipe, add an event tracker for in-session real-time, and use recommendation filters for business rules. If you have little or no interaction history yet, start with the embeddings layer (section III) and add Personalize as data accrues.

approach 2 — content & cold-start

IIIEmbeddings + vector search — content-based recommendations

The second building block is content-based: represent each item as an embedding vector and recommend by nearest-neighbour similarity. This is the same machinery as semantic search, pointed at items instead of queries — and it is the layer that gracefully handles brand-new items and users that collaborative filtering cannot see.

The mechanics: offline, you run each item through an embedding model on Amazon Bedrock — Amazon Titan Text Embeddings v2 or Cohere Embed for text (title + description + key attributes), and Titan Multimodal Embeddings when images matter, as they do for fashion, home, and other visual catalogues. You store those vectors plus item metadata in a vector index: Amazon OpenSearch Serverless (vector + keyword in one engine) or Amazon Aurora PostgreSQL with pgvector (vectors next to your relational data). At recommendation time you find the items whose vectors are nearest to a reference — and there are three useful references, which is what makes this layer so flexible.

You can recommend relative to an item ("more like this product" — nearest neighbours of the item the user is viewing), relative to a user profile vector (average or recency-weighted blend of the embeddings of items the user has engaged with — a cheap, effective personalization that needs no model training), or relative to a text intent ("cosy winter outerwear under $150" embedded and matched against the catalogue). The same vector index serves all three. Because the signal is the item's content, not other users' behaviour, a freshly-added item is recommendable the moment it is embedded — there is no waiting for it to accumulate interactions.

That is the cold-start superpower (section V), but content-based recs have a known weakness: they are myopic. They recommend more of what the user already likes and rarely surface the unexpected item that only collective behaviour reveals — the documentary that people who like your favourite thriller also love, despite sharing no attributes. They also depend heavily on the quality of your item text: thin, templated, or missing descriptions produce thin embeddings and bland recommendations. The fix is not to abandon either approach but to combine embeddings with collaborative filtering and let each cover the other's blind spot — which is exactly the hybrid architecture in the next section.

One practical note that matters for cost and latency: a user-profile vector is just arithmetic over item vectors you already store, so per-user personalization here is nearly free at query time — you embed items once, and recommending for a user is a single nearest-neighbour search. That makes the embeddings layer a strong default for early-stage products that do not yet have the interaction volume Personalize needs.

three references for embeddings-based recommendations · all served from one vector index
Recommend relative toWhat you embed / queryPowers the surfaceCold-start behaviour
An itemNearest neighbours of the viewed item's vector"More like this" / "similar items"Works for any embedded item, day one
A user profileRecency-weighted blend of the user's item vectors"Recommended for you" (content-based)Works after the user's first interaction
A text intentAn embedded natural-language queryIntent-driven / onboarding picksWorks with zero history at all
All three are nearest-neighbour searches over the same item embeddings in OpenSearch Serverless or Aurora pgvector. Build a user-profile vector by averaging (recency-weighted) the embeddings of items the user engaged with — no model training required.
the production pattern

IVThe hybrid architecture — retrieve, re-rank with an LLM, generate explanations

Almost every strong production recommender on AWS is a pipeline, not a single model: a cheap, fast layer generates candidates, then progressively more expensive layers refine a shrinking shortlist. This "candidate generation → ranking → (optional) explanation" funnel is how you get both quality and a bounded cost.

The funnel has a strict economic logic: each stage is more expensive per item than the last, so each stage operates on fewer items than the last. Candidate generation runs over the whole catalogue and must be milliseconds-cheap; the LLM stage runs over tens of items and can afford to be smart and slow. Get this ordering wrong — e.g. asking an LLM to score the whole catalogue — and the system is both too slow and far too expensive. Get it right and you spend frontier-model intelligence only where it changes the answer.

Stage 1 — Candidate generation (hundreds of items, milliseconds)

Produce a few hundred plausible items fast, recall over precision. In practice you union two sources: Amazon Personalize (behavioural candidates — what this user is likely to engage with) and vector search (content candidates — items similar to what they like, plus fresh/cold-start items Personalize cannot see yet). Merging the two lists is what gives the funnel both behavioural depth and cold-start coverage. Apply hard business filters here — in-stock, region-eligible, not-already-purchased, policy-allowed — using Personalize filters and vector-store metadata filters, so disallowed items never reach the expensive stages.

Stage 2 — Ranking / re-ranking (tens of items)

Now order the shortlist precisely. The lightweight option is Amazon Personalize Personalized-Ranking, which reorders a supplied list for the user cheaply and at scale — the right default for high-traffic surfaces. The heavyweight option is an LLM re-ranker (Claude or Amazon Nova on Bedrock): you pass the candidate items plus rich context (the user's recent activity, stated preferences, what is in the cart, the time and surface, and your merchandising rules in plain language) and ask the model to return the best-ordered subset with reasons. The LLM is unmatched at fusing messy, qualitative signals and honouring nuanced rules ("favour higher-margin items, but never show two from the same brand back-to-back, and respect the user's stated dislike of horror") that are painful to encode as features. The cost discipline is simple: only ever re-rank the top tens, and reserve the LLM stage for high-value surfaces where ranking quality moves a real metric.

A cross-encoder re-ranker (Amazon Rerank or Cohere Rerank on Bedrock) is a middle option — cheaper and lower-latency than a generative LLM, strong when the ranking signal is text-relevance of the item to a query or intent. Many systems use the cross-encoder for relevance and an LLM only when they also want generated explanations or complex rule-following.

Stage 3 — Generated explanations (optional, the top few)

For the handful of items you will actually display, an LLM on Bedrock can write a short, grounded reason for each — "because you watched Planet Earth and rate nature documentaries highly" or "pairs with the jacket in your cart; same waterproof rating." Explanations measurably lift trust and click-through, and they turn an opaque ranking into something a user (and your support team) can reason about. Keep them grounded in real signals you pass in, attach a Bedrock Guardrail, cache aggressively (most explanations recur), and treat them as a presentation-layer nicety that must never block the core ranking if the model is slow or unavailable.

the recommendation funnel on aws · each stage smarter, slower, and over fewer items
StageItems in → outGoalTypical AWS serviceLatency budget
1. Candidate generationCatalogue → ~hundredsHigh recall, cheapPersonalize + OpenSearch/pgvector (unioned)Single-digit ms
2. RankingHundreds → ~tensPrecise order + rulesPersonalize Personalized-Ranking (light) / LLM rerank (heavy)Tens of ms (light) / ~1s (LLM)
3. Explanation (optional)Tens → the displayed fewHuman-readable reasonClaude / Nova on Bedrock~1s, cache-backed, non-blocking
Cost and latency rise sharply left-to-right, so item counts fall sharply left-to-right. Run candidate generation over the whole catalogue; let the LLM touch only the shortlist. Stage 3 is presentation polish — design it to degrade gracefully (skip the explanation, keep the ranking) if the model is slow.
the hardest part

VCold-start — new users, new items, and a brand-new system

The defining hard problem in recommendations is cold-start: you cannot recommend from behaviour that does not exist yet. It shows up in three distinct forms, and each has a specific AWS-native answer — which is, in large part, why the hybrid architecture exists.

Naming the three forms keeps the fixes straight. New-user cold-start: a visitor with no history — you do not yet know what they like. New-item cold-start: an item just added to the catalogue with no interactions — collaborative filtering literally cannot rank it because no one has touched it. New-system cold-start: you are launching and have little interaction data for anyone, so behaviour-only models are starved across the board. The mistake is treating cold-start as one problem; the fixes differ.

  • New user — fall back to content, popularity, and a fast onboarding signal — With no history, lean on the content layer and non-personalized signals: popularity / trending (Personalize Trending-Now or a simple top-sellers list), any context you do have (geo, referrer, device, the category they landed on), and an optional 2–3 tap onboarding ("pick a few you like") that immediately yields a user-profile vector for embeddings-based recs. Personalize's User-Personalization recipe also explores new users automatically. Each subsequent click sharpens the picture within the session via the event tracker.
  • New item — content-based recs make it recommendable on day one — This is exactly what embeddings solve. The moment a new item is embedded from its title/description/attributes (and image, via Titan Multimodal), it is a candidate in vector search — no interactions needed. Feeding item metadata into Personalize also lets its metadata-aware recipes place new items sensibly. Practical rule: every new item gets embedded at ingestion time so it is never invisible to the recommender.
  • New system — start content-first and let behaviour accrue — Below the interaction volume Personalize needs for reliable collaborative filtering (roughly 1,000+ interactions across enough users), do not force it. Launch on the embeddings layer (item-to-item and profile-vector recs) plus popularity, instrument every interaction from day one, and introduce Personalize once the logs are rich enough. The hybrid design means this is a smooth upgrade, not a re-platform — you are adding a candidate source, not replacing the system.
  • LLM-assisted cold-start — reason about items with no behavioural data — When there is genuinely no behavioural signal, an LLM can reason from attributes and sparse context to produce a sensible starter ranking and explain it — "you said you like strategy games and have 20 minutes, so start here." It is a strong stopgap precisely where collaborative filtering is blind, and it doubles as the explanation layer once real data arrives.
why the hybrid design is cold-start insurance

A pure-collaborative recommender (Personalize alone) is blind to new items and weak for new users and new systems. Adding the embeddings layer makes every item recommendable from the instant it is embedded and gives new users a content-and-onboarding path, while Personalize contributes the behavioural depth once data exists. The two layers are each other's cold-start insurance — which is the core reason production systems run both.

when you compute it

VIReal-time vs batch — when to precompute and when to respond live

A recommender can compute results ahead of time (batch) or on demand (real-time), and most production systems do both: precompute the expensive parts nightly, respond to live signals at request time. Choosing the split per surface is a major lever on both cost and freshness.

The two modes trade freshness against cost. Batch precomputes recommendations for known users on a schedule (Personalize batch inference jobs writing to S3, or a nightly job that materializes per-user lists into DynamoDB) — cheap per user and trivially fast to serve (a key lookup), but stale between runs and unable to react to what the user just did. Real-time computes at request time (a Personalize campaign endpoint, or a live vector search and optional LLM re-rank) — current to the latest click and able to use live context, but it pays the inference cost on every request and must hit a latency budget. The art is matching each surface to the mode it actually needs.

A robust default is a hybrid of the two: precompute the heavy candidate set and a base ranking in batch, store it keyed by user, then at request time apply a thin real-time layer — fold in this session's clicks (Personalize event tracker), re-filter for current stock/eligibility, and optionally LLM-re-rank only if it is a high-value surface. The user sees fresh, contextual results without you paying full real-time inference on every impression. Where staleness is harmless — a weekly "picked for you" email, an onboarding set, a low-traffic page — pure batch is the cheaper, simpler correct choice.

real-time vs batch recommendations on aws · pick per surface
ModeHow it works on AWSFreshnessCost shapeBest-fit surfaces
Batch (precomputed)Personalize batch job / nightly job → S3 / DynamoDB; serve by key lookupStale between runsCheap per user; pay compute on a scheduleEmail recs, onboarding sets, low-traffic pages
Real-timePersonalize campaign endpoint / live vector search (+ optional LLM rerank)Current to the latest eventPay inference per request; latency budgetHomepage rows, in-session, post-click "more like this"
Hybrid (precompute + live layer)Batch candidates + base ranking, then real-time session/context layerFresh where it mattersBounded — heavy work batched, thin work liveHigh-traffic personalized surfaces (the common default)
Most production systems are hybrid: precompute the expensive candidate generation and base ranking, then apply a cheap real-time layer (session events, live filters, optional LLM re-rank on high-value surfaces). Reserve pure real-time for surfaces where the latest click must change the result.
the build, in order

VIIStep-by-step: building a hybrid recommender on AWS

Here is the fastest credible path from zero to a hybrid recommendation engine on AWS. Each step builds on the last; the order matters because candidate generation must be solid before re-ranking and explanations are worth adding.

  • Step 1 — Instrument and collect interactions — Capture user-item events (view, click, add-to-cart, purchase, watch, rating) with user id, item id, timestamp, and event type — stream them via the Personalize event tracker and also land them in S3 for training and analysis. This data is the spine of the whole system; start collecting before you build anything else, because models are only as good as the interaction history behind them.
  • Step 2 — Embed your catalogue — Run every item through a Bedrock embedding model (Titan Text Embeddings v2 or Cohere Embed for text; Titan Multimodal if images matter) over title + description + key attributes, and write the vectors + metadata (category, price, brand, stock, tenant, region) into OpenSearch Serverless or Aurora pgvector. Embed new items at ingestion time so nothing is ever cold to the content layer.
  • Step 3 — Stand up content-based recs first — With the vector index live, ship item-to-item ("more like this") via nearest-neighbour search, and per-user recs via a recency-weighted user-profile vector. This gives you a working recommender on day one — crucially, one that handles cold-start — before you have the interaction volume Personalize needs.
  • Step 4 — Add Amazon Personalize for behaviour — Once interaction logs are rich enough (~1,000+ interactions across enough users), import Interactions (+ Items/Users metadata) into a Personalize dataset group, train a User-Personalization solution, and deploy a campaign or batch job. Add recommendation filters for business rules (in-stock, not-already-purchased, region). You now have two candidate sources.
  • Step 5 — Union the candidates and apply filters — Build a candidate-generation layer that calls both Personalize and vector search, merges and de-dupes the lists into a few hundred items, and applies hard filters (stock, eligibility, policy) so disallowed items never advance. This unioned shortlist is the input to ranking.
  • Step 6 — Rank the shortlist — For high-traffic surfaces, reorder with Personalize Personalized-Ranking (cheap, scalable). For high-value surfaces where quality and rules matter, add an LLM re-ranker (Claude or Nova on Bedrock) that takes the top tens plus rich context and merchandising rules in plain language and returns the best-ordered subset. Only ever re-rank the top tens to keep the token bill bounded.
  • Step 7 — (Optional) generate explanations — For the few items you display, have a Bedrock model write a short grounded reason per item ("because you watched X"). Attach a Guardrail, cache hard, and make it non-blocking so a slow model never holds up the ranking. This is the trust-and-CTR upgrade once the ranking itself is good.
  • Step 8 — Measure and tune — Track offline ranking metrics (precision@K, recall@K, nDCG, MAP) against held-out interactions, then run online A/B tests on the metrics that matter — click-through, add-to-cart, watch-time, conversion, and coverage/diversity so you do not collapse into a popularity echo chamber. Tune candidate mix, top-K, and when to invoke the LLM against the numbers, not by feel.
shipping it for real

VIIIProduction concerns and the recommendation cost stack

A recommendation demo and a production recommender differ on feedback loops, freshness, latency, and a bill that scales with traffic. Each has a concrete AWS answer, and the cost stack has predictable line items teams miss when they budget only for the model.

On the feedback loop, recommendations change behaviour, which changes the next training data — so guard against feedback loops that narrow the catalogue (the model only ever shows popular items, so only popular items get interactions, so they look even more popular). Track coverage and diversity alongside accuracy, and keep an exploration component (Personalize's User-Personalization explores by design; content-based candidates inject variety) so good-but-obscure items still surface. On freshness, retrain or incrementally update on a cadence that matches your catalogue and behaviour drift, embed new items at ingestion, and keep an item's stock/price/eligibility in metadata so serving-time filters reflect reality. On latency, the budget is candidate-gen + ranking + optional explanation; keep candidate generation in single-digit milliseconds, cap how many items reach the LLM, cache explanations, and stream or pre-generate anything user-visible so perceived latency stays low. On graceful degradation, always have a fallback ranking (popularity / last-good batch list) so a slow or failed model never yields an empty row.

The cost figures below are representative as of 2026 to show the shape of the bill — always check the AWS pricing page (and any third-party vendor) for current rates. For a behaviour-led system the steady costs are Personalize training plus real-time capacity; for a content-led system it is the always-on vector-index baseline plus one-time embedding; and the moment you add LLM re-ranking or explanations, generation tokens can become the largest line — which is exactly why the funnel restricts the LLM to a short shortlist.

recommendation cost stack on aws · representative shape as of 2026 — check the AWS pricing page for current rates
Cost lineWhen you payDriverMain lever to control it
Amazon Personalize — trainingPer training / solution-versionTraining hours × data sizeRetrain on a sensible cadence, not constantly; right-size datasets
Amazon Personalize — real-timeContinuous (per provisioned TPS) + per recommendationCampaign min-TPS + request volumeProvision realistic TPS; batch where staleness is fine
Embeddings (catalogue)One-time per catalogue + on new itemsItems × tokens (or images) embeddedEmbed only new/changed items; smaller dimensions; concatenate key fields
Vector indexContinuous (baseline)OpenSearch OCUs / Aurora ACUs + storageRight-size the engine; pgvector if Postgres already runs; tune dimensions
LLM re-rank + explanationsPer request (when used)Input + output tokens × model priceOnly top-N to the LLM; cheaper model for easy cases; cache; prompt caching; tight max-tokens
A content-only recommender can run on just the embedding + vector-index lines. Adding Personalize adds training + real-time capacity; adding an LLM stage adds per-request tokens that can dominate — which is why the funnel restricts the LLM to the shortlist. Prompt caching and re-ranking only the top tens cut the LLM line the most.
the three approaches, side by side

Amazon Personalize vs embeddings vs LLM — what each is for

This is the comparison that frames the whole build. Read it as "Personalize for behaviour, embeddings for content + cold-start, an LLM for re-ranking and explanations" — and remember the strongest systems combine all three rather than choosing one.

DimensionAmazon PersonalizeEmbeddings + vector searchLLM (Claude / Nova on Bedrock)
Core methodCollaborative filtering (behaviour)Content-based similarity (vectors)Contextual re-ranking + explanation
Primary signalUser-item interactionsItem attributes / text / imagesRich context + the shortlist
Role in the funnelCandidate gen + light rankingCandidate gen (esp. cold-start)Re-rank top-N + generate reasons
Cold-start (new item)Weak without metadataStrong — recommendable once embeddedStrong — reasons from attributes
Cold-start (new user)Explores; needs some signalProfile vector after 1st interactionStrong — reasons from sparse context
Data needed to start~1,000+ interactions for reliabilityJust item contentNone (uses whatever you pass)
PersonalizationPer-user, learnedPer-user via profile vectorPer-request via context
Cost shapeTraining + real-time TPS + per-recOne-time embed + index baselinePer-request tokens (largest if unbounded)
LatencyLow (managed endpoint)Low (ANN search)Higher (~1s) — shortlist only
Best atBehavioural "recommended for you""More like this", cold-start, diversityFinal ranking, rule-following, explanations
These are layers, not alternatives. The production default is hybrid: Personalize + vector search generate candidates, Personalized-Ranking or an LLM re-ranks the shortlist, and an LLM optionally explains the few items shown. Start with the one or two layers your data supports today and add the rest as you grow.
building this for real?
Have a vetted AWS partner build your recommendation engine — and let AWS credits pay for it
Start in 3 minutes →
a recent match

A hybrid product recommender — anonymized

inquiry · Series-A commerce media platform, US
Series-A commerce + content platform, ~25 people, ~200k items (products + articles), fast catalogue turnover, mostly popularity-based "recommendations" and a high new-item dead-zone

Situation: Their "recommended for you" rows were really just best-sellers, so newly-added items got almost no exposure (a textbook new-item cold-start dead-zone) and engagement on the rows was flat. They wanted genuine personalization that also surfaced fresh items immediately, honoured merchandising rules (margin, brand spacing, regional eligibility), and showed a short reason per pick to build trust — without their two ML-capable engineers leaving the core roadmap. The projected Personalize + Bedrock + OpenSearch bill made the founder hesitate to start.

What CloudRoute did: Routed within 24 hours to a US-region AWS partner with a personalization / GenAI track record. The partner built it on AWS as a hybrid funnel: Titan v2 embeddings over each item (title + attributes) stored in OpenSearch Serverless for item-to-item and user-profile-vector candidates, Amazon Personalize (User-Personalization + Personalized-Ranking) for behavioural candidates and base ranking once the interaction logs were rich enough, a unioned candidate layer with stock/eligibility filters, a Claude re-ranker on the top ~30 items for merchandising rules, and Claude-generated one-line explanations on the displayed few (Guardrail-protected and cached). New items were embedded at ingestion so they entered candidates on day one. The whole engagement was funded by AWS credits the partner filed for — Activate Portfolio plus a Bedrock POC allocation.

Outcome: A hybrid recommender in production in about 6 weeks. The new-item dead-zone closed (fresh items entered candidates immediately via the embeddings layer), click-through on the recommendation rows rose against the popularity baseline in an A/B test, and merchandising rules plus per-item explanations shipped on the high-value surfaces. The build and the first months of training + inference ran on AWS credits — the customer paid $0. CloudRoute's commission was paid by the partner from AWS engagement funding.

engagement window: ~6 weeks · founder time: ~8 hours · stack: Amazon Personalize + Titan v2 + OpenSearch Serverless + Claude (re-rank + explanations) · cost to customer: $0

faq

Common questions

How do I build a recommendation engine on AWS?
Combine three building blocks in a funnel. (1) Candidate generation: union Amazon Personalize (behavioural collaborative filtering from your interaction logs) with vector search (Bedrock embeddings of your items in OpenSearch Serverless or Aurora pgvector) to get a few hundred items, applying business filters. (2) Ranking: reorder the shortlist with Personalize Personalized-Ranking for high-traffic surfaces, or an LLM (Claude / Nova on Bedrock) for high-value surfaces that need rich context and rule-following. (3) Optional explanations: have a Bedrock model write a one-line reason per displayed item. Start with the one or two layers your data supports today — content-based embeddings need only item text; Personalize needs ~1,000+ interactions — and add the rest as you grow.
What is Amazon Personalize and when should I use it?
Amazon Personalize is AWS's fully-managed recommendation service: you give it user-item interaction history (plus optional item/user metadata) and it trains and hosts deep-learning recommenders behind a real-time API, with no model code or serving infrastructure to run. Use it when you already have a stream of interactions and want behavioural "recommended for you" (User-Personalization recipe), "similar items" (Similar-Items), or to personalize an existing list (Personalized-Ranking). It supports in-session real-time via an event tracker and business rules via recommendation filters. It is strongest with meaningful interaction volume (roughly 1,000+ interactions across enough users); below that, lean on embeddings-based content recommendations first.
Amazon Personalize vs embeddings vs an LLM — which should I use?
They answer different questions and are best combined. Amazon Personalize does collaborative filtering — it learns from behaviour ("users like you also liked…") and is the fastest path to a behavioural recommender, but needs interaction volume and is weak on brand-new items. Embeddings + vector search do content-based similarity from item attributes — they power "more like this", solve cold-start (a new item is recommendable the moment it is embedded), and need only item content, but can be myopic. An LLM is not the recommender; it re-ranks a shortlist against rich context and business rules and writes human-readable explanations, but it is too slow/expensive to score a whole catalogue. The production default is a hybrid funnel using all three where each is strong.
How do I handle the cold-start problem on AWS?
Treat its three forms separately. New user (no history): fall back to popularity/trending and content, use any context (geo, referrer, landing category), and optionally a 2–3 tap onboarding that yields a user-profile vector for embeddings-based recs. New item (no interactions): embed it at ingestion so it is immediately a candidate in vector search — this is exactly what content-based recommendations solve — and feed item metadata into Personalize's metadata-aware recipes. New system (little data overall): launch content-first on the embeddings layer plus popularity, instrument every interaction, and add Amazon Personalize once logs are rich enough (~1,000+ interactions). An LLM can also reason from attributes to produce a sensible starter ranking where there is no behavioural signal at all.
Should recommendations be computed in real time or batch?
Both, per surface. Batch precomputes recommendations on a schedule (Personalize batch inference to S3, or a nightly job materializing per-user lists into DynamoDB) — cheap and fast to serve but stale between runs; ideal for email recs, onboarding sets, and low-traffic pages. Real-time computes at request time (a Personalize campaign endpoint, or live vector search plus optional LLM re-rank) — current to the latest click and able to use live context, but you pay inference per request under a latency budget; ideal for homepage rows, in-session updates, and post-click "more like this". Most production systems are hybrid: precompute the heavy candidate set and base ranking, then apply a thin real-time layer for session events, live filters, and optional LLM re-ranking on high-value surfaces.
How does an LLM help with recommendations if it is not the recommender?
Two jobs, both on a short list rather than the whole catalogue. First, re-ranking: pass the top tens of candidates plus rich context (recent activity, stated preferences, cart contents, time, surface) and your merchandising rules in plain language, and the LLM (Claude or Nova on Bedrock) returns the best-ordered subset — it excels at fusing messy qualitative signals and honouring nuanced rules that are painful to encode as features. Second, explanations: for the few items you display, it writes a short grounded reason ("because you watched X and rate documentaries highly"), which lifts trust and click-through. Keep the LLM to the shortlist so cost stays bounded, ground explanations in real signals, attach a Guardrail, cache, and make the explanation layer non-blocking so a slow model never holds up the ranking.
How do I measure whether my recommendations are any good?
Use offline metrics to iterate and online tests to decide. Offline, against held-out interactions, track ranking metrics — precision@K, recall@K, nDCG, and MAP — plus coverage and diversity so you can see if the system is collapsing into a popularity echo chamber. Online, run A/B tests on the business metric the surface exists to move: click-through, add-to-cart, watch-time, conversion, or retention. Watch the feedback loop — recommending only popular items starves everything else of the interactions it needs to ever be recommended — and keep an exploration component so good-but-obscure items still surface. Tune candidate mix, top-K, and when to invoke the LLM against these numbers rather than by feel.
What does a recommendation engine on AWS cost?
It depends on which layers you run. A content-only recommender needs just a one-time catalogue embedding (re-embed only new items) plus an always-on vector-index baseline (OpenSearch OCUs or Aurora ACUs). Adding Amazon Personalize adds training cost (per solution version) plus real-time capacity (provisioned TPS on a campaign) plus per-recommendation charges. Adding an LLM re-ranking/explanation stage adds per-request generation tokens, which can become the largest line — which is exactly why the funnel sends only the top tens to the LLM and caches explanations. The biggest levers: batch where staleness is fine, provision realistic Personalize TPS, embed only changed items, and use prompt caching and tight max-tokens on the LLM. Figures are representative as of 2026 — check the AWS pricing page for current rates.

Build your recommendation engine on AWS — funded by AWS credits

CloudRoute routes you to a vetted AWS personalization / GenAI partner who designs and ships it — Amazon Personalize for behaviour, Bedrock embeddings + a vector store for content and cold-start, and an LLM to re-rank and explain — as a hybrid funnel, real-time or batch, with the rules and evaluation that make it production-grade. AWS credits fund the build and the inference. You pay $0.

matched within< 24h
credits to fund itup to $100K
cost to you$0
Build a recommendation engine on AWS (2026) — the build guide · CloudRoute