Adding GenAI to a single-user app is easy. Adding it to a multi-tenant SaaS product is a different problem: every customer's data has to stay isolated, every customer's usage has to be attributed and billed, every customer needs their own rate limits and safety policy, and none of it can leak across tenants. This is the reference architecture for shipping GenAI features in a SaaS product on Amazon Bedrock in 2026 — per-tenant isolation, cost attribution with application inference profiles, per-customer rate-limiting and Guardrails, usage-based cost passthrough, and customer-data security. The headline: AWS credits — Activate Portfolio up to $100K, Bedrock/GenAI POC $10K–$50K, the GenAI Accelerator up to $1M — can fund the whole build, which is why this is effectively $0 via CloudRoute.
Most "add AI to your product" guides assume one user and one data set. A SaaS product is the opposite: many customers (tenants) sharing one application, each with their own data, their own contract, and their own expectation that nobody else can see or influence their information. The hard part of GenAI in SaaS is not choosing a model — it is making sure the model behaves correctly per tenant.
The center of gravity for SaaS GenAI on AWS is Amazon Bedrock: a fully-managed service that lets you call foundation models from Anthropic (Claude), Meta (Llama), Mistral, Amazon (Nova and Titan), Cohere, Stability AI, AI21, and DeepSeek through a single API, with no servers to manage. Crucially for SaaS, your prompts and outputs are not used to train the base models and stay in your AWS account and Region — so when you tell a customer "your data is not used to train anyone's model," that is a property of the platform, not a promise you have to engineer. The complete platform reference lives at Amazon Bedrock.
But Bedrock by itself does not make your feature multi-tenant. That is your job, and it comes down to four properties that a single-tenant prototype never has to think about. Isolation: tenant A's prompt must never retrieve tenant B's documents, and tenant B's data must never appear in tenant A's answer. Attribution: you need to know exactly how much inference each tenant consumed, because GenAI is now a real variable cost of serving them. Governance per tenant: different customers need different rate limits (an enterprise plan gets more than a free trial) and sometimes different safety policies (a healthcare tenant needs stricter PII handling than a marketing tenant). Security: the customer's data flowing through the model is often the most sensitive data they have given you, and your contracts and compliance posture now extend to it.
Get those four wrong and the failure modes are severe in a way a single-user app never faces: a cross-tenant data leak is a breach, an un-attributed bill means GenAI silently erodes your margin, missing rate limits let one customer's runaway usage degrade everyone, and a permissive Guardrail on a regulated tenant is a compliance finding. Get them right and GenAI becomes just another well-behaved feature of your platform. The rest of this page is how Bedrock's primitives map onto each of the four, the common features built on top, how to attribute and pass through the cost, and the credits that pay for the build.
Single-user GenAI is a model problem. SaaS GenAI is a tenancy problem: isolation + attribution + per-tenant governance + customer-data security. Bedrock gives you a primitive for each — metadata-filtered Knowledge Bases, application inference profiles, per-request Guardrails, and in-account/in-Region data handling. Compose them per tenant and the model choice becomes the easy part.
Isolation is the property you cannot get wrong, because the failure is a cross-tenant data leak. In a GenAI SaaS feature, isolation has to hold at three layers: the documents a tenant's queries can retrieve, the conversation/state that belongs to a tenant, and the IAM boundary around the whole call. Bedrock supports all three, but you have to choose a model deliberately.
The most common isolation question is about retrieval — RAG over the tenant's own documents. There are two viable patterns, and the right one depends on how strong your isolation guarantee needs to be. The first is a shared Knowledge Base with per-tenant metadata filtering: every chunk is tagged with a tenantId, and every query is forced (server-side, never from the client) to filter on the calling tenant's ID, so retrieval can only ever return that tenant's passages. This is operationally simple and cost-efficient — one index for everyone — and is the right default for most SaaS. The second is a Knowledge Base (or vector index) per tenant: stronger physical isolation, easier to reason about for a strict compliance story, but more moving parts and cost as tenant count grows. Detail on building the retrieval layer lives at Bedrock Knowledge Bases and RAG on AWS.
The decisive rule for either pattern is that the tenant scope is applied on the server, derived from the authenticated session — never passed from or trusted from the client. The client says "I am a user in tenant Acme" by presenting a token; your backend resolves that token to a tenant ID and injects the metadata filter (or selects the per-tenant index) itself. A GenAI feature that lets the browser specify which tenant's documents to search is the same vulnerability class as an IDOR in a REST API, just harder to spot because it hides inside a prompt.
Isolation also has to hold at the IAM and data-storage layers, not only at retrieval. Tenant documents in Amazon S3 should be separated by prefix or bucket with policies that prevent any cross-tenant read; conversation history and state keyed by tenant; and the inference call made under a role scoped to exactly the models and resources that tenant's feature needs. The Bedrock call carries no implicit knowledge of your tenants — isolation is a property of how you wrap it. Done well it is invisible; the only way to notice it is when it fails, which is exactly why it has to be designed and tested up front rather than added after a customer asks the security question. Many products run both patterns at once: the shared metadata-filtered index for the long tail of tenants, a dedicated index for the few enterprise accounts that require demonstrable physical separation — the same application code, just a different retrieval target resolved from the tenant record.
In a SaaS product, GenAI is a variable cost of goods sold: every tenant who uses the AI feature burns tokens you pay AWS for. If you cannot attribute that cost per tenant, you cannot price the feature, meter it, or pass it through — and you risk a high-usage customer quietly destroying the margin on their plan. This section covers both halves: attributing the cost (the single most SaaS-specific Bedrock capability, the application inference profile) and then recovering it through a usage-based pricing model.
A Bedrock application inference profile is a wrapper you create around a model (or a cross-Region set of model copies) and call instead of the raw model ID. Its defining feature for SaaS is that it is a taggable, trackable resource: attach cost-allocation tags, route a slice of traffic through it, and every invocation's usage and cost is attributed to that profile in AWS Cost Explorer and the Cost and Usage Report (CUR). It turns Bedrock spend from one opaque line into spend broken out by whatever dimension you tagged — and for SaaS, that dimension is the tenant. The full primitive is at Bedrock application inference profiles.
The practical pattern is one profile per tenant (for a manageable tenant count or high-value accounts) or one per tenant tier / cohort (when you have thousands of small tenants and per-tenant granularity is overkill), tagged with the tenant or tier identifier, with that tenant's calls routed through it. Now "what did tenant Acme cost us in inference last month?" is a Cost Explorer filter, not a forensic exercise — and that one fact is what usage-based pricing, per-seat AI add-ons, margin analysis, and customer-facing meters all depend on, without your building a token-counting pipeline. The profile is also where you enable cross-Region inference, so the same construct that gives attribution also gives a sturdier inference path — see cross-Region inference.
Complement profile-level attribution with application-level token logging: Bedrock model-invocation logging records input/output token counts per request, which you stamp with the tenant ID and aggregate yourself for real-time meters that do not wait for the daily CUR. The two are layers — logs give near-real-time, per-request granularity for live usage displays and rate-limit accounting; profiles give the authoritative, dollar-denominated cost in the billing system of record. A serious SaaS metering setup uses both.
With per-tenant cost in hand, how you recover it becomes a pricing decision made with real numbers. Four common models sit on a spectrum from "absorb it" to "pass it through." Bundled / absorbed: AI is part of the plan and you eat the cost — simplest for the customer, only safe when per-tenant cost is low and bounded by rate limits. Credit / quota: each plan includes an allowance of AI credits or messages, with overage billed or an upgrade prompted — the most common SaaS pattern because it caps exposure and creates a natural upsell. Metered passthrough: bill actual usage at a marked-up rate, for when AI cost is large and variable. Paid add-on: a separate SKU or per-seat upcharge that decouples AI cost from the base plan.
Whichever you pick, the metering pipeline is the same one you built for attribution: real-time, read per-request token counts from model-invocation logging, stamp them with the tenant ID, and aggregate into a live counter that drives meters, quota enforcement, and overage triggers; authoritatively, the inference profile gives the dollar cost per tenant in the CUR to reconcile against billing. The gap between "tokens used" (real-time, approximate) and "dollars billed by AWS" (daily, exact) is normal — display and enforce on the former, true-up on the latter.
What makes any of these models profitable is keeping the underlying cost low, since your margin is plan-price minus AWS cost. The cost levers that keep startup GenAI cheap apply here, multiplied across tenants: default most calls to a small model, turn on prompt caching for the system prompt and shared context (a large win when the same instructions ride every tenant's calls), run offline work like corpus embedding as batch inference at roughly half price, retrieve instead of stuffing documents, and reach for Provisioned Throughput only once aggregate volume is high and steady. Per-tenant rate limits cap the tail so no single customer breaks the unit economics. Full cost detail at Bedrock pricing.
| Model | How the customer pays | Your cost exposure | Best when | Metering needed |
|---|---|---|---|---|
| Bundled / absorbed | Nothing extra — AI is part of the plan | You absorb it | Per-tenant AI cost is small & bounded by rate limits | Internal only (margin watch) |
| Credit / quota | Plan includes an allowance; overage or upgrade beyond | Capped by quota | You want predictability + a natural upsell | In-product meter + quota enforcement |
| Metered passthrough | Pays for actual usage at a marked-up rate | Passed through | AI cost is large & variable; usage varies widely | Full per-tenant token/cost metering |
| Paid add-on / per-seat | Separate AI SKU or per-seat upcharge | Recovered via the add-on price | AI is a distinct premium capability | Per-tenant attribution for margin |
| Mechanism | What it gives you | Granularity | Latency | Best for |
|---|---|---|---|---|
| Application inference profile (per tenant) | Authoritative tagged cost per tenant in Cost Explorer / CUR | Per tenant | Daily (CUR) | High-value accounts; exact billing-of-record cost |
| Application inference profile (per tier/cohort) | Tagged cost per plan/cohort without thousands of profiles | Per tier | Daily (CUR) | Thousands of small tenants; margin analysis by plan |
| Model-invocation logging + tenant stamp | Per-request input/output token counts you aggregate yourself | Per request / per tenant | Near real-time | In-product usage meters; rate-limit accounting |
| Both, layered | Real-time meter (logs) + authoritative cost (profiles) | Per request → per tenant | Real-time + daily | Usage-based pricing done properly |
Multi-tenancy means one tenant's behaviour cannot be allowed to harm the others, and different tenants legitimately need different rules. Two controls carry this: rate-limiting per customer (so usage is fair and bounded by plan) and Guardrails per tenant (so safety and compliance policy can differ by customer). Both are standard Bedrock-era SaaS engineering; neither is exotic.
Rate-limiting per customer protects three things at once: your shared throughput (one tenant cannot exhaust your account-level Bedrock capacity and starve everyone else), your margin (a tenant on a $49 plan cannot quietly run $4,000 of inference), and your abuse surface (a compromised or malicious tenant is contained). You implement it in your application layer — a token-bucket or quota per tenant, enforced before the call reaches Bedrock — typically tiered by plan: a free trial gets a small allowance, a growth plan more, an enterprise plan a negotiated ceiling. Because you are already attributing usage per tenant for billing (Section III), you have the counters to enforce limits from the same data. When a tenant hits their ceiling you degrade gracefully — queue, throttle, or prompt an upgrade — rather than failing the whole product.
Guardrails per tenant is the governance counterpart. A Bedrock Guardrail is a configurable safety layer — denied topics, content filters, PII detection and redaction, word filters, and contextual-grounding checks — that you apply to a model call. The SaaS-relevant fact is that you select which Guardrail applies on a per-request basis, so different tenants can run under different policies: a healthcare customer under a Guardrail that aggressively redacts PII and blocks clinical-advice patterns, a general business customer under a lighter one, an internal/admin context under a permissive one. You maintain a small set of Guardrail configurations (often one per compliance posture or plan tier rather than one literally per tenant) and resolve the right one from the tenant record at call time. The full configuration reference is at Bedrock Guardrails.
The architectural point that ties Sections II–IV together: tenant context is resolved once, server-side, at the start of every request, and then drives all three controls. From the authenticated session you derive the tenant ID, and that single value selects the retrieval scope (which documents), the inference profile (whose cost), the rate-limit bucket (whose quota), and the Guardrail (whose policy). Build that resolution step well — a small, well-tested piece of middleware — and multi-tenancy stops being scattered through the codebase and becomes one clean seam that every GenAI call passes through.
Resolve one tenant ID from the authenticated session per request, then let it fan out to all four controls: retrieval scope (metadata filter / index), application inference profile (cost attribution), rate-limit bucket (fair usage), and Guardrail (safety policy). One seam, tested once, and every GenAI call inherits correct multi-tenant behaviour.
Almost every GenAI feature a SaaS product ships is one of four shapes: an in-app assistant, document Q&A, content generation, or semantic search. The useful insight is that all four are built from the same small set of Bedrock primitives — they differ in how those primitives are composed, not in what they are. Build the tenant-context seam once and any of the four becomes mostly product work.
Underneath, every one of these features runs through the Converse API (one request schema across all models, so you can swap or route models without re-integrating), optionally a Knowledge Base for grounding in the tenant's data, and a Guardrail for safety — all scoped by the tenant ID resolved at the edge of the request and metered through that tenant's application inference profile. The differences below are about which of those pieces you wire together and how. For a step-by-step on the assistant shape specifically, see build a chatbot on AWS.
The economic upshot is that these four features share almost all of their infrastructure: the same Knowledge Base and embeddings serve both document Q&A and semantic search; the same Converse + Guardrail + tenant-context seam serve the assistant and content generation; and one set of application inference profiles meters all four per tenant. You are not building four systems — you are building one multi-tenant GenAI substrate and exposing four product surfaces on top of it. That shared substrate is also why the default model choice matters so much for cost: most calls across all four features can go to a small model, with frontier escalation reserved for the hard minority. See Amazon Nova and Claude on Bedrock for the model tiers, and Bedrock Agents for the multi-step assistant.
When you add GenAI to a SaaS product, your customers' data now flows through a model, and your security and compliance posture has to extend to that path. The good news is that Bedrock's defaults are strong; the work is wiring them up correctly per tenant and being able to explain the data flow to a customer's security team.
Start from what Bedrock gives you by default, because it answers the questions customers ask first. Your data is not used to train the foundation models. Prompts and outputs stay within your AWS account and the Region you call, so data residency is a Region choice, not a custom build — important when a tenant requires EU or other in-Region processing. Calls are authenticated and authorized through IAM, encrypted in transit and at rest, and the whole path sits inside your VPC/account boundary; you can keep traffic off the public internet with PrivateLink if a tenant requires it. These are platform properties you can put in a security questionnaire, not promises you have to engineer from scratch — which is precisely why Bedrock, rather than a third-party model API, is the defensible choice when you are handling other companies' data.
On top of the defaults, the SaaS-specific security work is mostly about isolation and minimization. Enforce tenant scoping server-side on every retrieval and every action (Section II) so the model can structurally never reach across tenants. Use Guardrails to detect and redact PII before it reaches the model where a tenant's policy requires it (Section IV). Scope IAM roles tightly so the inference path can touch only the models and resources it needs. Log model invocations for audit, and treat those logs as sensitive (they contain prompt and completion content) — store them per your retention and access rules. Minimize what you send: retrieve the few relevant chunks rather than a tenant's whole corpus, both for cost and to shrink the blast radius of anything that goes wrong.
Finally, GenAI security has to live inside your broader compliance story. If you carry SOC 2, ISO 27001, HIPAA, or similar, the GenAI feature is now in scope: the data flow through Bedrock, the per-tenant isolation, the Guardrail policies, the logging and retention, and the access controls all become things an auditor will examine and a prospect's security review will probe. Because Bedrock runs inside your AWS account under your existing controls, the GenAI path generally inherits the account-level posture you already have rather than creating a separate compliance island — which is a large part of why building on Bedrock is easier to get through enterprise procurement than bolting on an external model vendor. This is also exactly the kind of build where a partner who has shipped multi-tenant GenAI under SOC 2 before saves you from learning the isolation and audit requirements the hard way.
Here is the whole thing assembled: a concrete reference architecture for a GenAI feature in a multi-tenant SaaS product on Bedrock, with every one of the four properties — isolation, attribution, governance, security — wired in. It is deliberately built from managed pieces so a small team can run it without an ML platform group.
Trace a single request through it. A user in tenant Acme triggers an AI action. Your application authenticates them and, in one server-side step, resolves the tenant ID from the session — the seam everything hangs off. That tenant ID immediately selects four things: the retrieval scope (a metadata filter of tenantId = Acme against the shared Knowledge Base, or Acme's dedicated index), the application inference profile for Acme (so the cost lands attributed to them), the rate-limit bucket for Acme's plan (checked before any spend), and the Guardrail for Acme's compliance posture. The request then runs through the Converse API — retrieve Acme's relevant chunks, call a small default model (escalating to frontier only if the step needs it) under Acme's Guardrail and inference profile — and returns a grounded, governed answer. Model-invocation logging records the token counts stamped with Acme for the live meter; the inference profile feeds the authoritative cost into the CUR for billing.
What makes this architecture operable for a small team is that every box is a managed AWS service, mapped layer by layer in the table below. The only thing you actually build is the thin, well-tested application layer that resolves the tenant and fans it out to those services, plus the product surface on top. There is no GPU fleet, no vector database to operate, and no model serving to keep warm.
For the non-foundation-model needs some SaaS products have — a custom recommendation, forecasting, or vision model trained on your own data — Amazon SageMaker slots in alongside Bedrock in the same account, funded by the same credits, under the same tenant-isolation and IAM discipline; reach for it only when you genuinely need to own training or run classical ML (the head-to-head is at Bedrock vs SageMaker). The architecture above is the same one a vetted AWS partner would stand up — there is nothing proprietary in it, which is the point: it is reference-grade precisely so you can build it or have it built and know it is right.
| Layer | AWS service | Role | How it enforces tenancy |
|---|---|---|---|
| Tenant context | Your app + IAM | Resolve tenant ID from the authenticated session | The single server-side seam; never trusts the client |
| Document storage | Amazon S3 | Hold each tenant's source documents | Prefix/bucket separation + isolation policies |
| Retrieval (RAG) | Bedrock Knowledge Base | Chunk, embed, store vectors, retrieve grounded context | Per-tenant metadata filter or index-per-tenant |
| Inference | Bedrock + Converse API | Generate answers across one schema; route models | Call under the tenant's application inference profile |
| Cost attribution | Application inference profiles | Tag + track spend per tenant in Cost Explorer / CUR | One profile per tenant or per tier, tagged |
| Safety / compliance | Bedrock Guardrails | Filter content, redact PII, block denied topics | Per-request Guardrail selected from tenant policy |
| Fair usage | Your app (rate limiter) | Bound usage per plan; protect shared capacity | Token-bucket / quota keyed by tenant + plan |
| Metering / audit | Model-invocation logging | Per-request token counts for meters + audit | Stamp each record with the tenant ID |
A capable SaaS engineering team can build the multi-tenant architecture above — none of the primitives is secret. But there are two recurring reasons to route to a vetted AWS partner, and one of them is the reason the whole build can cost you nothing.
The first reason is getting multi-tenancy right the first time. The four properties — isolation, attribution, governance, security — are exactly the places where a subtle mistake is expensive: a retrieval filter that trusts the client, a Guardrail that is too permissive for a regulated tenant, attribution that is approximate enough to erode margin, an isolation gap that surfaces in a customer's security review. A partner who has shipped multi-tenant GenAI on Bedrock before — often under SOC 2 or HIPAA — sets these seams correctly and can stand in front of a prospect's security team. For a team adding its first AI feature to a product that already carries customer-data obligations, that experience is worth more than the time it saves.
The second reason is the credits, and this is the headline. AWS funds generative-AI builds through credit programs that are largely partner-filed and invisible on the public Activate page: Activate Portfolio (up to $100K) for institutionally-funded companies, a dedicated Bedrock/GenAI proof-of-concept track ($10K–$50K) for a defined GenAI build, and the competitive Generative AI Accelerator (up to $1M) for AI-first companies. You generally cannot self-serve the large tiers; they are submitted by an AWS partner through the ACE program or by a VC with Portfolio access. This is precisely what CloudRoute does — we route you to a vetted partner who files the credit application and, if you want hands, builds the multi-tenant workload with you. Because AWS funds both the credits and the partner engagement, you pay $0.
Put the two together and the math for a SaaS company is compelling: the GenAI feature you were going to build to grow the product gets built by a team that has done multi-tenant Bedrock before, the inference bill for the first many months is covered by AWS credits, and the partner engagement is funded by AWS too. The credits are also the right size for a SaaS rollout — a Bedrock/GenAI POC ($10K–$50K) comfortably covers piloting the feature across early tenants, and Activate Portfolio ($100K) covers the broader build and run while you prove the unit economics. See AWS credits for generative-AI startups, $100K AWS credits, and AWS / Bedrock POC funding explained.
Design the four tenancy properties in from day one (isolation + attribution + governance + security) so the feature is correct and profitable — then let AWS credits cover the build and the early bill. CloudRoute routes you to a vetted partner who files the credit application and can build the multi-tenant Bedrock workload. AWS funds the credits and the engagement. You pay $0.
The gap between a working GenAI demo and a GenAI feature in a real SaaS product is entirely about tenancy. This is the side-by-side of what you can ignore in a single-user prototype and what becomes load-bearing the moment many customers share the system — and the Bedrock primitive that handles each.
| Concern | Single-user prototype | Multi-tenant SaaS | Bedrock primitive that handles it |
|---|---|---|---|
| Data isolation | One data set — no isolation needed | No tenant can ever see another's data or retrieval | Knowledge Base metadata filtering / index-per-tenant |
| Cost attribution | One bill, one user — irrelevant | Must know exact cost per tenant to price & bill | Application inference profiles (tagged, per tenant) |
| Rate limiting | You are the only user | Per-customer quotas; one tenant can't starve others | App-layer token bucket keyed by tenant (uses profile data) |
| Safety / compliance policy | One policy for you | Different Guardrail per tenant / compliance posture | Per-request Guardrail selection |
| Data security | Your own data | Customers' sensitive data; SOC 2 / HIPAA in scope | IAM + in-account/in-Region + PII-redacting Guardrails |
| Model integration | Hardcode one model | Route/escalate models per call cheaply | Converse API (one schema across all models) |
| Metering | None | In-product usage meters + overage / quota | Model-invocation logging stamped with tenant ID |
Situation: The team had a working single-tenant prototype of an in-product assistant, but could not ship it. The prototype had no tenant isolation on retrieval (a security blocker, and a hard no for their SOC 2 posture), no way to attribute inference cost per tenant (so they could not decide whether to bundle the feature, meter it, or sell it as an add-on), and no per-tenant rate limiting (so one heavy customer could blow the margin on a mid-tier plan). EU tenants needed in-Region processing the prototype did not enforce. Their two infra engineers were fully allocated to the core product, and the founder was wary of GenAI both as a compliance surface and as an unbounded line on the AWS bill.
What CloudRoute did: Routed within 20 hours to a US AWS partner with multi-tenant Bedrock and SOC 2 experience. The partner rebuilt the feature on the reference architecture: a shared Bedrock Knowledge Base with enforced server-side tenantId metadata filtering (dedicated indexes for two enterprise tenants that required physical separation), one application inference profile per tenant tier tagged for cost attribution, per-plan rate limiting in the app layer, and per-tenant Guardrails (a stricter PII-redacting policy for the regulated cohort). Region was pinned per tenant for the EU accounts. A small default model (Nova Lite) handled most calls with Claude Sonnet on the hard path; prompt caching covered the shared system prompt; corpus embedding ran as batch. In parallel the partner filed a Bedrock/GenAI POC application and an Activate Portfolio application via ACE.
Outcome: Per-tenant cost became a Cost Explorer filter, which let the team launch the AI assistant as a metered add-on with confidence in the margin. Tenant isolation passed their security review and went into the SOC 2 scope cleanly. GenAI POC credits ($40K) were approved in under two weeks and Portfolio ($100K) shortly after, so the build and the first many months of inference ran on AWS credits. Multi-tenant assistant + document Q&A in production for all tenants in 6 weeks. CloudRoute's commission was paid by the partner from AWS engagement funding; the customer paid $0.
time-to-match: < 24h · per-tenant cost attribution: exact · credits secured: $140K · cost to customer: $0
CloudRoute routes you to a vetted AWS partner who files your GenAI credit application (Activate Portfolio up to $100K, Bedrock/GenAI POC $10K–$50K, GenAI Accelerator up to $1M) and, if you need hands, builds the multi-tenant Bedrock workload with you — per-tenant isolation, cost attribution, rate-limiting, and Guardrails. AWS funds the credits and the engagement. You pay $0.