
enterprise AI assistant · 2026 buyer's guide

The enterprise AI assistant buyer's guide — build vs buy, decided properly (2026).

Every enterprise wants an internal AI assistant that can answer "what's our refund policy," "summarize last quarter's board deck," or "which customers are at renewal risk" — grounded in the company's own data, respecting who's allowed to see what. The market splits into two camps: buy an off-the-shelf assistant (Amazon Q Business, Microsoft 365 Copilot, Glean) or build a custom one on Amazon Bedrock. This guide walks the six evaluation criteria that actually matter, hands you a scorecard, and tells you — neutrally — when each path wins.


At a glance: 6 core evaluation criteria · 4 paths compared · decision output: scorecard · vendor stance: neutral

TL;DR

There is no universally correct answer — there is a correct answer for your data topology, your security posture, and your extensibility needs. Off-the-shelf assistants (Amazon Q Business, Microsoft 365 Copilot, Glean) win on time-to-value and connector breadth; a custom assistant on Bedrock wins on control, cost shape, and how far you can extend it. The wrong choice is usually the one made on a demo rather than on the six criteria below.
The six criteria that decide it: (1) data connectors — does it reach your actual systems; (2) permission/ACL-aware retrieval — does it respect who can see what; (3) accuracy + citations — can a user trust and verify the answer; (4) security, residency, and data-handling; (5) cost shape — per-seat licensing vs per-token usage; (6) extensibility — can it take actions, not just answer. Score each option 1–5 on all six, weight to your context, and the decision usually makes itself.
Off-the-shelf is right when your data already lives in one vendor's ecosystem (heavily Microsoft 365 → Copilot; broad SaaS sprawl → Glean or Q Business), seat count is predictable, and you need value in weeks. Custom-on-Bedrock is right when you have proprietary data stores no connector reaches, strict residency or model-choice requirements, an unusual permission model, spiky usage or a seat count that makes per-seat pricing punishing, or a roadmap where the assistant must take actions inside your own systems.

definition

I. What an "enterprise AI assistant" actually is — and what makes it hard

An enterprise AI assistant is an LLM-powered interface that answers questions and performs tasks grounded in your organization's own knowledge — documents, wikis, tickets, code, CRM records, data warehouses — while respecting each user's access permissions. The novelty isn't the chat box. It's the grounding and the governance.

Strip away the marketing and an enterprise assistant is four moving parts working together. First, connectors that pull content out of the systems where your knowledge already lives — SharePoint, Confluence, Google Drive, Slack, Jira, Salesforce, ServiceNow, S3, a data warehouse. Second, an index plus retrieval layer (almost always retrieval-augmented generation, RAG) that finds the handful of passages relevant to a question. Third, a generation model that reads those passages and writes a grounded answer with citations. Fourth — and this is the part demos skip — a permission model that ensures the retrieval step only ever surfaces content the asking user is allowed to see.

The consumer version of this problem is easy: point a model at the public web, return an answer. The enterprise version is hard for reasons that have nothing to do with the model's intelligence. Your knowledge is scattered across a dozen systems with a dozen different auth models. The same document might be readable by finance but not by the contractor in the next chair. Answers that are 90% right are dangerous, not delightful, when the question is "are we contractually allowed to do X." And regulated industries need to know where the data went and prove it later.

That is why "build vs buy" for an enterprise assistant is not the same conversation as "which chatbot do we like." The chat experience converges fast across all the options. The hard, differentiating work is connectors, permission-aware retrieval, grounding quality, and governance. The rest of this guide is organized around exactly those axes.

One framing that helps throughout: treat the assistant as a retrieval and governance product that happens to have an LLM at the end, not an LLM product that happens to read your files. Vendors that understand that distinction tend to be the ones worth shortlisting, and teams that internalize it tend to build the ones that actually get adopted.

the four paths

II. The four paths — off-the-shelf, platform, hybrid, and fully custom

People say "build vs buy" as if there are two options. There are really four points on a spectrum, and most enterprises end up on one of the middle two rather than the pure ends.

It is worth naming all four precisely, because shortlists go wrong when a team compares a turnkey product against a from-scratch build and concludes "buy is obviously faster" — without noticing the two middle options that capture most of the speed and most of the control.

Path 1 — Turnkey off-the-shelf assistant

What it is: A managed product where you connect your sources, configure permissions, and get a working assistant. Examples: Amazon Q Business, Microsoft 365 Copilot, Glean.

You own: Connector configuration, access policies, rollout, and change management.

The vendor owns: The model, the index, the retrieval logic, the citation engine, and the underlying infrastructure.

Time to first value: Days to a few weeks. This is the fast lane — and for a large share of enterprises it is the correct lane.

Path 2 — Custom on a managed AI platform (Bedrock)

What it is: You assemble the assistant on Amazon Bedrock using managed building blocks — Bedrock Knowledge Bases for ingestion, chunking, embeddings, vector storage, and retrieval; your choice of foundation model (Claude, Nova, Llama, Mistral, and others) for generation; and Bedrock Agents or your own orchestration for actions. You write the glue and the permission logic; AWS runs the heavy machinery.

You own: Data pipeline decisions, the retrieval and permission design, model selection, prompts, evaluation, and the UI.

The platform owns: Model hosting, the vector store, embeddings, scaling, and the security substrate.

Time to first value: Weeks to a couple of months for a focused first use case. This is the "build" that most enterprises actually mean — not a from-scratch model, but a custom assembly on managed parts.

Path 3 — Hybrid (buy the broad, build the deep)

What it is: An off-the-shelf assistant for the long tail of general knowledge (HR, IT, policies, the company wiki) plus a custom Bedrock assistant for the one or two high-value, proprietary workflows no connector reaches — pricing logic, a clinical knowledge base, a trading-desk research corpus.

Why it is common: The 80/20 split rarely lines up with where the value sits. Off-the-shelf nails 80% of questions cheaply; the remaining 20% is exactly where the business value and the data sensitivity concentrate. Many mature programs run both deliberately rather than forcing one tool to do everything.

Path 4 — Fully bespoke (rare, and usually a mistake to start here)

What it is: Self-hosted models, a self-managed vector database, hand-rolled connectors, custom embedding pipelines — owning the entire stack down to the GPUs.

When it is justified: Air-gapped environments, exotic compliance regimes, or a scale where platform economics genuinely invert. For the overwhelming majority of enterprises in 2026, this is more cost and operational burden than the control is worth. Path 2 captures the control without the toil.

the practical takeaway

Most real decisions land on Path 1 (off-the-shelf), Path 2 (custom on Bedrock), or Path 3 (both). When this guide says "build," it means Path 2 — a custom assembly on managed building blocks — not Path 4. Comparing Path 1 against Path 4 is what makes "buy" look deceptively obvious.

criterion 1 of 6

III. Criterion 1 — Data connectors: can it actually reach your knowledge?

An assistant is only as good as the data it can see. The first and most decisive question is mechanical: does the option connect to the systems where your knowledge actually lives — all of them, including the weird internal ones?

Off-the-shelf assistants compete hard on connector breadth, and the libraries are genuinely large. Amazon Q Business ships dozens of native connectors (S3, SharePoint, Confluence, Salesforce, ServiceNow, Slack, Jira, Google Drive, RDS/Aurora, and many more) plus a generic web-crawler and custom-connector path. Glean built its reputation on breadth and quality of connectors across the modern SaaS stack, with mature incremental sync and identity mapping. Microsoft 365 Copilot is deepest inside the Microsoft graph — SharePoint, OneDrive, Teams, Outlook, Loop — and reaches outward through Microsoft Graph connectors, strongest when your center of gravity is already Microsoft 365.

The trap is the long tail. A connector library can cover 90% of the market's systems and still miss your homegrown pricing service, the legacy claims database, the internal API that fronts your data lake, or the document store a 2014 acquisition left behind. For those, every off-the-shelf product offers a custom-connector or API-ingestion path — but now you are building integration code anyway, which narrows the gap with a custom approach.

A custom assistant on Bedrock Knowledge Bases starts from the opposite end. It has first-class ingestion for S3 and a growing set of managed connectors, and for anything else you ingest through your own pipeline — which means any source you can reach with code becomes a source for the assistant. The cost is that you write and maintain that ingestion rather than toggling it on.

Score this criterion by listing your top ten knowledge sources by value, then checking each option honestly: native connector, custom-connector effort, or not feasible. The option that natively covers your high-value sources — not the one with the longest total connector list — wins this axis. A 300-connector product that misses your two most important systems scores lower here than a product that natively covers exactly those two.

Map sources before vendors — Inventory your top 10 knowledge systems by business value and rank them. Decisions made against this list age far better than decisions made against a demo dataset.
Weight by value, not by count — Native coverage of your two most-queried systems beats a longer connector catalog that misses them. Connector count is a vanity metric.
Probe incremental sync + identity mapping — Full re-crawls are slow and expensive; how a connector handles deltas, deletions, and identity mapping matters more than that it exists. This is where mature products separate from new ones.
Price the long tail honestly — If your highest-value data needs a custom connector either way, the off-the-shelf "no-code" advantage shrinks — factor that into the build-vs-buy math rather than assuming buy is integration-free.
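
The value-weighted scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not a vendor assessment: the source names, value ranks, and the half credit given to a custom connector are all hypothetical assumptions to replace with your own inventory.

```python
from dataclasses import dataclass

# Credit per coverage level. These discounts are illustrative assumptions;
# tune them to how costly custom connectors are for your team.
COVERAGE_CREDIT = {"native": 1.0, "custom": 0.5, "none": 0.0}


@dataclass(frozen=True)
class Source:
    name: str
    value: int  # business value, e.g. 10 = most important system


def coverage_score(sources: list[Source], coverage: dict[str, str]) -> float:
    """Return the value-weighted share of sources an option can reach (0..1)."""
    total = sum(s.value for s in sources)
    reached = sum(
        s.value * COVERAGE_CREDIT[coverage.get(s.name, "none")] for s in sources
    )
    return reached / total


# Hypothetical inventory: two high-value internal systems, two mainstream ones.
sources = [
    Source("pricing-service", 10),
    Source("claims-db", 8),
    Source("sharepoint", 3),
    Source("slack", 1),
]

# Option A covers the mainstream tools but misses the two most valuable systems.
option_a = {"sharepoint": "native", "slack": "native"}
# Option B natively covers only the two most valuable systems.
option_b = {"pricing-service": "native", "claims-db": "native"}

score_a = coverage_score(sources, option_a)
score_b = coverage_score(sources, option_b)
```

With these placeholder numbers, option B scores higher than option A even though it covers the same number of systems, because the weighting follows business value rather than connector count.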

criterion 2 of 6

IV. Criterion 2 — Permission/ACL-aware retrieval: the make-or-break axis

This is the criterion that quietly kills enterprise assistant projects. If retrieval is not permission-aware, the assistant can surface content a user was never allowed to see — turning your helpful tool into a data-leak engine. Get this wrong and nothing else matters.

The failure mode is specific and severe. A junior employee asks "what's our plan for the reorg" and the assistant — having indexed an executive-only document — happily summarizes it with a citation. The information was access-controlled at the source; the assistant flattened that control by indexing everything into one searchable space and answering from it. Permission-aware (ACL-aware) retrieval prevents this by carrying each user's identity and entitlements into the retrieval step, so the candidate set is filtered to what that user can see before the model ever reads a word.

Off-the-shelf products treat this as a headline feature, with real differences underneath. Microsoft 365 Copilot inherits the Microsoft 365 permission model directly — it respects existing SharePoint/OneDrive/Teams permissions because it operates inside the same graph, which is its single biggest structural advantage for Microsoft-centric shops. Glean built document-level, identity-aware permissions across its connectors as a core design principle, mirroring source-system ACLs and updating as they change. Amazon Q Business supports identity-aware access control and integrates with IAM Identity Center so retrieval and responses honor each user's document-level permissions across connected sources.

On a custom Bedrock build you own this — which is both the risk and the point. Bedrock Knowledge Bases support metadata filtering, so you attach access metadata (groups, roles, classification, region) to each chunk at ingestion and pass the user's entitlements as filters at query time. You can express permission models an off-the-shelf product cannot: attribute-based access, row-level data-warehouse rules, dynamic entitlements from your own IdP, time-boxed access. The cost is that the correctness of that model is your responsibility, and it must be tested as rigorously as any authorization system — because that is exactly what it is.
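
As a rough sketch of the idea, the snippet below filters candidates by entitlement before ranking, so content the user cannot see never reaches the model. It is an in-memory stand-in, not the Bedrock API: the `Chunk` type, the group names, and the `relevance` field (a placeholder for a vector-similarity score) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    allowed_groups: frozenset[str]  # access metadata attached at ingestion
    relevance: float                # stand-in for a vector-similarity score


def retrieve(chunks: list[Chunk], user_groups: set[str], top_k: int = 3) -> list[Chunk]:
    """Filter by entitlement first, then rank; the model only sees the result."""
    visible = [c for c in chunks if c.allowed_groups & user_groups]
    return sorted(visible, key=lambda c: c.relevance, reverse=True)[:top_k]


# Hypothetical index: one executive-only chunk, one visible to all staff.
index = [
    Chunk("Reorg plan: merge the two platform teams in Q3.", frozenset({"exec"}), 0.95),
    Chunk("Refund policy: 30 days with receipt.", frozenset({"all-staff"}), 0.40),
]

junior = retrieve(index, {"all-staff"})
executive = retrieve(index, {"all-staff", "exec"})
```

The ordering is the design point: the executive-only chunk has the highest relevance, yet it is removed from the junior user's candidate set before ranking rather than redacted from the answer afterwards.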

Whichever path you choose, validate this with adversarial red-teaming, not a happy-path demo. Create test users at different privilege levels and ask questions whose answers live in documents each user should not see. A correct system returns "I don't have information on that" to the unprivileged user and the real answer to the privileged one. Run those probes during the trial, not after rollout.
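
One way to make those probes repeatable is a small harness like the sketch below. It assumes you can call the assistant as a function of (user, question); `stub_assistant` is a toy stand-in for that call, and the users, questions, and forbidden phrases are hypothetical fixtures.

```python
from typing import Callable

REFUSAL = "I don't have information on that"

# Each probe: (user, question, phrase that must NOT appear in that user's answer).
Probe = tuple[str, str, str]


def run_probes(ask: Callable[[str, str], str], probes: list[Probe]) -> list[Probe]:
    """Return the probes that leaked, i.e. the forbidden phrase was in the answer."""
    return [p for p in probes if p[2].lower() in ask(p[0], p[1]).lower()]


def stub_assistant(user: str, question: str) -> str:
    """Toy stand-in for the system under test; replace with a real client call."""
    if "reorg" in question.lower() and user == "exec-test-user":
        return "The reorg merges the two platform teams in Q3."
    return REFUSAL


probes = [("junior-test-user", "What's our plan for the reorg?", "platform teams")]
leaks = run_probes(stub_assistant, probes)
```

An empty `leaks` list is the passing result. Substring matching is a crude check, so a real probe set would also need a human to read answers for paraphrased leaks.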

the question to ask every vendor

"When a user asks something whose answer lives only in a document they are not permitted to read, what exactly happens?" The right answer is that retrieval never surfaces the document, so the model cannot cite or paraphrase it. Anything vaguer than that — "we post-filter the response," "we redact citations" — is a yellow flag worth probing hard, because filtering after retrieval leaves the privileged content inside the model's context.

criterion 3 of 6

V. Criterion 3 — Accuracy and citations: can users trust and verify the answer?

An answer a user cannot verify is an answer they should not act on. Grounding quality and citation fidelity determine whether the assistant becomes a trusted daily tool or an unreliable novelty quietly abandoned after launch.

Accuracy in an enterprise assistant is mostly a retrieval problem, not a model-IQ problem. If the system retrieves the right passages, even a mid-tier model usually produces a correct grounded answer. If retrieval surfaces the wrong or stale passages, the smartest model in the world confidently summarizes the wrong thing. So evaluate the retrieval pipeline — chunking strategy, embedding quality, ranking, recency handling — at least as hard as you evaluate the generation model.

Citations are the trust mechanism. Every claim the assistant makes should link to the specific source passage it came from, so a user can click through and confirm. Strong implementations cite at the sentence or claim level and link to the exact location; weak ones cite a whole document, which is barely better than no citation when the document is forty pages. During evaluation, click the citations and check that they actually support the sentence they are attached to — citation theater (plausible-looking links that do not contain the claim) is a real and common failure, and it is invisible unless you check.

Off-the-shelf products provide grounded answers with citations out of the box; the differences are in chunking quality, freshness, and how granular and faithful the citations are — all observable in a trial on your own corpus. On a custom Bedrock build you control every accuracy lever: chunk size and overlap, embedding model, hybrid (semantic + keyword) search, re-ranking, recency boosts, and the grounding prompt that instructs the model to answer only from retrieved context and to say "I don't know" when the context is insufficient. That control is the main reason teams with unusual or high-stakes corpora choose to build — they can tune retrieval to their data instead of accepting a general-purpose default.

Measure accuracy, do not vibe-check it. Build a gold set of 50–200 real questions with known correct answers and known source documents, then score every option on the same set: answer correctness, citation correctness, and refusal behavior on questions whose answers are genuinely not in the corpus. A good assistant is not the one that always answers — it is the one that answers correctly when it can and declines when it cannot. An assistant that confidently fabricates 5% of the time is worse than useless in a regulated workflow, and you only catch that with a scored set.

Build a gold question set — 50–200 real questions with known answers and known source docs. The same set, run against every option, turns a subjective bake-off into a comparable score.
Click the citations — Verify that cited passages actually contain the claim. Citation theater is common and only surfaces when you check the links by hand.
Test refusal behavior — Ask things genuinely not in the corpus. A trustworthy assistant declines instead of inventing; reward that, do not penalize it.
Separate retrieval from generation — When an answer is wrong, diagnose whether retrieval missed the passage or the model misread it — the fixes are completely different, and conflating them wastes weeks.
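
The gold-set scoring above could be sketched roughly as follows. The questions, file names, and the exact-match comparison are illustrative assumptions; in practice answer correctness usually needs a rubric or a human reviewer rather than string equality.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GoldItem:
    question: str
    expected_answer: Optional[str]  # None means "not in the corpus": should decline
    expected_source: Optional[str]


@dataclass(frozen=True)
class Response:
    answer: Optional[str]  # None means the assistant declined
    cited_source: Optional[str]


def rate(hits: int, total: int) -> float:
    return hits / total if total else 0.0


def score(gold: list[GoldItem], responses: list[Response]) -> dict[str, float]:
    """Score answer correctness, citation correctness, and refusal behavior."""
    pairs = list(zip(gold, responses))
    answerable = [(g, r) for g, r in pairs if g.expected_answer is not None]
    unanswerable = [(g, r) for g, r in pairs if g.expected_answer is None]
    return {
        "answer_correct": rate(
            sum(r.answer == g.expected_answer for g, r in answerable), len(answerable)),
        "citation_correct": rate(
            sum(r.cited_source == g.expected_source for g, r in answerable), len(answerable)),
        "correct_refusal": rate(
            sum(r.answer is None for _, r in unanswerable), len(unanswerable)),
    }


# Hypothetical three-question gold set and one option's responses to it.
gold = [
    GoldItem("What is the refund window?", "30 days", "refund-policy.pdf"),
    GoldItem("Who approves travel?", "Line manager", "travel-policy.pdf"),
    GoldItem("What is the Mars office address?", None, None),
]
responses = [
    Response("30 days", "refund-policy.pdf"),
    Response("Line manager", "hr-handbook.pdf"),       # right answer, wrong citation
    Response("42 Olympus Mons Road", "wiki/offices"),  # fabricated instead of declining
]
results = score(gold, responses)
```

Keeping the three rates separate matters: in this toy run the option answers everything correctly yet fails the citation and refusal checks, which a single blended accuracy number would hide.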

criterion 4 of 6

VI. Criterion 4 — Security, residency, and data handling

Your assistant touches your most sensitive content by design. Where the data lives, where inference happens, whether prompts train anyone's model, and what you can prove to an auditor are gating questions in any regulated environment — and increasingly in unregulated ones too.

Start with the non-negotiables, because they eliminate options fast. Data residency: can processing and storage be pinned to specific regions (EU, UK, a GCC country, an in-country region) to satisfy GDPR, UK data rules, or local data-sovereignty law? Training-data isolation: is your content excluded from any model-training loop? For enterprise tiers, reputable vendors contractually guarantee that prompts and documents are not used to train foundation models — get it in writing, not in a slide. Tenancy and encryption: how is your data isolated from other customers, encrypted at rest and in transit, and can you bring your own KMS keys? Auditability: can you log who asked what, what was retrieved, and what was answered — and retain those logs for compliance?

Off-the-shelf products carry enterprise security postures and the usual attestations (SOC 2, ISO 27001, and so on), and the major suites offer residency options in many regions — but you operate inside the vendor's boundary and its region map. If you need a specific in-country region the vendor does not offer, or a data-flow guarantee their architecture does not make, that is a hard limit you discover during security review, not a setting you toggle.

A custom assistant on Bedrock inherits the AWS security and compliance substrate and gives you direct control over the boundary. Inference is invoked from your AWS account and processed in the region you choose, unless you opt in to cross-Region inference, which can route requests to other regions; with Bedrock, your prompts and data are not used to train the underlying foundation models and are not shared with model providers. You control VPC placement, KMS encryption, IAM down to the action, CloudTrail logging, and the exact regions every component runs in. For organizations with strict residency or sovereignty requirements, this control is frequently the deciding factor — they build not because off-the-shelf is insecure, but because they need the boundary drawn on their own terms and provable on demand.

Translate your compliance obligations into concrete must-haves before you shortlist — "EU-only processing," "no third-party model providers see our data," "all queries logged and retained for N years" — and treat each as a pass/fail gate. It is far cheaper to eliminate an option in week one of security review than in month three of a stalled procurement.
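
A pass/fail gate of this kind is simple enough to write down explicitly. The sketch below is only an illustration: the gate names and each option's capabilities are hypothetical, and the real inputs would come from your security review, not from a hard-coded dictionary.

```python
# Hypothetical must-have gates derived from compliance obligations.
gates = ["eu_only_processing", "no_third_party_model_access", "query_logging"]

# Hypothetical findings per option; a missing key means "not yet confirmed".
options = {
    "option-a": {"eu_only_processing": True, "no_third_party_model_access": True,
                 "query_logging": True},
    "option-b": {"eu_only_processing": False, "no_third_party_model_access": True,
                 "query_logging": True},
    "option-c": {"eu_only_processing": True, "query_logging": True},
}


def failed_gates(capabilities: dict[str, bool], required: list[str]) -> list[str]:
    """Return required gates the option fails; an unconfirmed capability counts as a fail."""
    return [g for g in required if not capabilities.get(g, False)]


shortlist = [name for name, caps in options.items() if not failed_gates(caps, gates)]
```

Treating an unconfirmed capability as a failure is a deliberate choice here: an option stays off the shortlist until the guarantee is in writing.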

residency is often the fork in the road

For many regulated and public-sector buyers, the build-vs-buy decision collapses to a single question: can processing and storage be pinned to the exact region and boundary we are legally required to use? If an off-the-shelf option offers it, buy is back on the table. If it does not, a custom Bedrock build in the required region is frequently the only path that clears legal — which is why residency is worth resolving before anything else on this list.

criterion 5 of 6

VII. Criterion 5 — Cost shape: per-seat licensing vs per-token usage

The build-vs-buy cost question is not "which is cheaper" — it is "which cost shape fits how we'll actually use it." Per-seat and per-usage pricing cross over at a break-even point that depends entirely on your seat count and how heavily those seats engage.

Off-the-shelf assistants are predominantly per-seat, per-month. You pay a fixed amount for every licensed user whether they ask a hundred questions a day or none all month. The virtues are budget predictability and simplicity — finance loves a number that multiplies cleanly by headcount. The risk is paying for idle seats: license 5,000 users, see 1,200 actually engage, and you are funding 3,800 dormant licenses every month. Per-seat pricing rewards high engagement and punishes broad-but-shallow rollouts.

A custom assistant on Bedrock is predominantly per-usage — you pay per token (input and output) for inference, plus embeddings, vector storage, and the infrastructure around it. There is no charge for a user who does not ask anything. Costs scale with actual consumption, which is efficient for large user bases with light or uneven usage, and controllable through model selection (route simple queries to cheaper models, hard ones to premium), prompt-caching for repeated context, and batch processing where latency is not critical. The flip side is variability — heavy usage means a heavier bill, and you need monitoring and guardrails so an enthusiastic power user or a runaway integration does not surprise you.

The break-even is concrete enough to model on a napkin. If you have a modest number of heavy daily users, per-seat is often cheaper and far simpler. If you have a very large population where most people ask a few questions a week, per-usage is usually dramatically cheaper because you are not buying thousands of idle seats. Estimate seats × seat-price against expected-queries × average-tokens × token-price, run it across a low/expected/high usage range, and the curves tell you which shape fits. Do not forget the build path's engineering and operating cost — it is real, and for small deployments it can swamp any per-token savings. The honest comparison is total cost of ownership, not list price.
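
The napkin model above can be written out as a short Python sketch. Every number in it is a made-up placeholder, not a vendor price, and `FIXED_MONTHLY` is a hypothetical flat allowance standing in for the build path's engineering and operating cost.

```python
def per_seat_monthly(seats: int, seat_price: float) -> float:
    """Fixed monthly licence cost: every seat is billed, active or not."""
    return seats * seat_price


def per_usage_monthly(queries: int, tokens_per_query: int,
                      price_per_1k_tokens: float, fixed_monthly: float = 0.0) -> float:
    """Monthly token cost plus a flat allowance for build and operating effort."""
    return queries * tokens_per_query / 1000 * price_per_1k_tokens + fixed_monthly


def break_even_queries(seats: int, seat_price: float, tokens_per_query: int,
                       price_per_1k_tokens: float, fixed_monthly: float = 0.0) -> float:
    """Monthly query volume at which the two cost shapes are equal."""
    per_query = tokens_per_query / 1000 * price_per_1k_tokens
    return (seats * seat_price - fixed_monthly) / per_query


# Placeholder assumptions only; substitute your own quotes and usage estimates.
SEAT_PRICE = 20.0
TOKENS_PER_QUERY = 4_000
PRICE_PER_1K = 0.01
FIXED_MONTHLY = 30_000.0

seat_cost = per_seat_monthly(5_000, SEAT_PRICE)

# Low / expected / high usage, expressed as queries per seat per month.
scenarios = {
    label: per_usage_monthly(5_000 * per_seat, TOKENS_PER_QUERY, PRICE_PER_1K, FIXED_MONTHLY)
    for label, per_seat in {"low": 5, "expected": 20, "high": 80}.items()
}

large_rollout = break_even_queries(5_000, SEAT_PRICE, TOKENS_PER_QUERY,
                                   PRICE_PER_1K, FIXED_MONTHLY)
small_rollout = break_even_queries(200, SEAT_PRICE, TOKENS_PER_QUERY,
                                   PRICE_PER_1K, FIXED_MONTHLY)
```

A negative break-even, as in the 200-seat case with these placeholders, means the fixed engineering allowance alone exceeds the seat bill, so per-seat is cheaper at any usage level. That is the small-deployment effect the paragraph warns about.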

A pattern worth noting: per-usage economics improve as models get cheaper and as you tune (caching, right-sizing, batching), whereas per-seat costs are fixed by contract regardless of efficiency gains. Organizations that expect very large, uneven adoption — or that want their unit costs to fall as they optimize — often favor the usage model for that trajectory alone.

criterion 6 of 6

VIII. Criterion 6 — Extensibility: answering vs acting

The first generation of enterprise assistants answered questions. The valuable generation takes actions — files the ticket, updates the record, kicks off the workflow. How far an option lets you extend from answering to acting often separates a useful tool from a transformative one.

There is a meaningful gap between "tell me our PTO policy" and "submit my PTO request for next week." The first is retrieval. The second is an action — the assistant must call a system, pass parameters, handle the response, and confirm. Extensibility is how much of that action-taking, and how much custom domain logic, an option supports — and it is where the build path tends to pull ahead.

Off-the-shelf products are steadily adding action capabilities — plugins, custom skills, and workflow integrations that let the assistant do more than answer. Amazon Q Business supports plugins to take actions in third-party systems and custom plugins for your own APIs. Microsoft 365 Copilot extends through Copilot agents and connectors across the Microsoft ecosystem and beyond. Glean offers actions and an agent/app-building layer on top of its search. These are real and improving, but you operate within each platform's extensibility model and its boundaries — you build what the platform lets you build, the way the platform lets you build it.

A custom assistant on Bedrock is extensibility-first by construction. With Bedrock Agents (and your own orchestration where you want more control) the assistant can call any API you expose, run multi-step workflows, chain tools, query a database and act on the result, and enforce arbitrary business logic and approval gates. You are not constrained by a vendor's plugin framework because you own the orchestration layer. For organizations whose roadmap is "the assistant should run real processes inside our systems," this ceiling-less extensibility is often the single strongest argument to build.
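
To make "owning the orchestration layer" concrete, here is a minimal, framework-free sketch of a tool registry with an approval gate. It is not the Bedrock Agents API; the tool names, handlers, and the `NEEDS_APPROVAL` set are hypothetical, and a real build would add authentication, input validation, and audit logging.

```python
from typing import Callable

# Registry of actions the assistant may call. Names and handlers are hypothetical.
TOOLS: dict[str, Callable[..., str]] = {}
# Write actions that must be confirmed by a human before they run.
NEEDS_APPROVAL = {"submit_pto"}


def tool(name: str):
    """Register a function as a callable action under the given name."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register


@tool("lookup_policy")
def lookup_policy(topic: str) -> str:
    return f"Policy text for {topic}"


@tool("submit_pto")
def submit_pto(employee: str, days: int) -> str:
    return f"PTO request filed for {employee}: {days} day(s)"


def dispatch(name: str, args: dict, approved: bool = False) -> str:
    """Run a tool call, refusing unknown tools and unapproved write actions."""
    if name not in TOOLS:
        return f"unknown tool: {name}"
    if name in NEEDS_APPROVAL and not approved:
        return f"approval required before running {name}"
    return TOOLS[name](**args)
```

For example, `dispatch("submit_pto", {"employee": "sam", "days": 2})` is held until it is called again with `approved=True`, while `dispatch("lookup_policy", {"topic": "PTO"})` runs immediately because it only reads.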

Be honest about your trajectory here, because it is easy to over- or under-buy. If you genuinely only need question-answering over documents, do not over-invest in extensibility you will not use — buy the turnkey product and move on. If your eighteen-month vision is an assistant that orchestrates work across your stack, weight this criterion heavily, because retrofitting deep action-taking onto a platform that was not built for it is exactly the kind of dead end that forces an expensive re-platform later.

Separate answering from acting — Decide whether you need retrieval only or retrieval plus actions. The honest answer reshapes the whole decision and the budget.
Map your target workflows — List the specific actions you want — file a ticket, update a record, trigger a pipeline — and check each option against that concrete list, not against "supports actions" in the abstract.
Respect the platform ceiling — Off-the-shelf actions live inside a vendor framework; a custom build has no ceiling but more to own. Choose against your real ambition, not the demo's.
Don't over-buy extensibility — If you only need Q&A, paying for action capability you will never wire up is wasted spend and wasted complexity.

decision framework

IX. The decision framework — a weighted scorecard you can actually run

Pull the six criteria together into one repeatable instrument. Score each option 1–5 per criterion, weight the criteria to your context, sum the weighted scores, and let the numbers turn a roomful of opinions into a defensible decision.

The mechanics are deliberately simple. Score every shortlisted option 1–5 on each of the six criteria (5 = excellent for your situation). Assign each criterion a weight from 1–5 reflecting how much it matters to you — a regulated bank weights security and permissions at 5; a fast-moving startup weights time-to-value and cost-shape higher. Multiply score × weight per criterion, sum across all six, and compare totals. The arithmetic is trivial; the value is that it forces an explicit, written conversation about what your organization actually prioritizes.

Two rules keep the scorecard honest. First, score against your data and requirements, not the vendor's demo environment — run a trial on your real corpus and your real permission model before assigning numbers. Second, treat any 1 on a criterion you weighted 5 as a likely disqualifier regardless of the total, because a high overall score cannot rescue a fatal flaw on something that matters existentially to you. A product that aces five criteria and scores 1 on residency is not a 4.2-average winner; it is disqualified if residency is a 5-weight gate.
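
The arithmetic and the disqualifier rule can be sketched as follows. The weights and scores are hypothetical examples for a regulated buyer, not an assessment of any product, and the criterion keys are shorthand labels for the six criteria.

```python
CRITERIA = ["connectors", "permissions", "accuracy", "security", "cost_shape", "extensibility"]


def evaluate(scores: dict[str, int], weights: dict[str, int]) -> tuple[int, list[str]]:
    """Return (weighted total, criteria that disqualify the option)."""
    total = sum(scores[c] * weights[c] for c in CRITERIA)
    # Rule from the text: a 1 on a criterion weighted 5 is a likely disqualifier.
    fatal = [c for c in CRITERIA if weights[c] == 5 and scores[c] == 1]
    return total, fatal


# Hypothetical weights (1-5) and trial scores (1-5).
weights = {"connectors": 3, "permissions": 5, "accuracy": 4,
           "security": 5, "cost_shape": 2, "extensibility": 3}
off_the_shelf = {"connectors": 5, "permissions": 4, "accuracy": 4,
                 "security": 1, "cost_shape": 4, "extensibility": 3}
custom_build = {"connectors": 3, "permissions": 4, "accuracy": 4,
                "security": 5, "cost_shape": 3, "extensibility": 5}

ots_total, ots_fatal = evaluate(off_the_shelf, weights)
custom_total, custom_fatal = evaluate(custom_build, weights)
```

Returning the disqualifying criteria separately from the total keeps the two rules distinct: the total ranks the options that survive, while a non-empty `fatal` list removes an option regardless of its total.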

Run the scorecard once with off-the-shelf assumptions and once assuming a custom build, then read the gap. If off-the-shelf wins comfortably, buy and stop deliberating. If custom wins on the criteria you weighted highest — typically permissions, security/residency, or extensibility — that is your signal to build on Bedrock. If they are close, the hybrid path (Path 3) usually captures the best of both, and "run both" is a legitimate, common outcome rather than a failure to decide.

how to read your result

A clear off-the-shelf win → buy and move fast. A clear custom win on your highest-weight criteria → build on Bedrock. A near-tie → hybrid: off-the-shelf for broad general knowledge, custom for the deep proprietary workflows. And a 1 on any criterion you weighted 5 → that option is out, whatever its total says.

the honest tells

X. When custom-on-Bedrock wins (and when it doesn't)

Neutrality means saying plainly where each path is the right call. Custom-on-Bedrock is not the premium answer for everyone — it is the right answer in identifiable situations, and the wrong one in others. Here are the tells.

Lead with the situations where buying off-the-shelf is the better decision, because it is the right call more often than build-minded teams assume. Buy when your knowledge already lives largely in one vendor's ecosystem and a native assistant inherits its permissions cleanly — a heavily Microsoft 365 shop is the canonical case for Copilot. Buy when your sources are mainstream SaaS that off-the-shelf connectors cover well, when seat count is predictable enough that per-seat pricing is fine, when you need value in weeks rather than months, and when question-answering — not action-taking — is the goal. In all of these, an off-the-shelf assistant delivers most of the value for a fraction of the effort, and building would be over-engineering.

Custom-on-Bedrock wins when one or more of these is true: you have proprietary or legacy data stores that no off-the-shelf connector reaches and that hold your highest-value knowledge; you have strict residency or sovereignty requirements an off-the-shelf option cannot satisfy; you need specific model choice or control — a particular model, your own fine-tunes, or the freedom to swap models as the frontier moves; your permission model is unusual (attribute-based, row-level, dynamically computed) in ways a packaged product cannot express; your usage or seat economics make per-seat pricing punishing (huge population, light usage) and per-token efficiency compelling; or your roadmap requires deep extensibility and action-taking across your own systems beyond what a vendor framework allows. The more of these that apply, the stronger the case to build.

The decisive question underneath all of them is control versus convenience. Off-the-shelf trades control for convenience — a fair trade when the defaults fit. Custom trades convenience for control — worth it precisely when the defaults don't fit and the gap sits on a criterion you weighted high. If you cannot articulate a specific control you need that off-the-shelf won't give you, that is itself a strong signal to buy and get on with it.

And the honest synthesis, which is why mature programs so often run both: most enterprises have a broad base of general knowledge that off-the-shelf serves beautifully and one or two crown-jewel workflows where the data is proprietary, the stakes are high, and only a custom build clears the bar. Choosing "both" deliberately — buy the broad, build the deep — is frequently the most rational outcome of this entire exercise, not a hedge.

side by side

Off-the-shelf vs custom-on-Bedrock — the six criteria, compared

A neutral, generalized comparison across the six criteria. Your weights determine which column wins — this table just shows the characteristic tradeoffs each path carries before you apply them.

Criterion	Off-the-shelf (Q Business / Copilot / Glean)	Custom on Bedrock (Knowledge Bases + your data)
Data connectors	Large native libraries; fast for mainstream SaaS; long tail needs custom connectors	S3 + managed connectors first-class; any source reachable via your own ingestion
Permission / ACL-aware retrieval	Built-in, identity-aware; deepest where it owns the source graph (e.g. Copilot in M365)	You own it via metadata filtering; can model unusual ACLs off-the-shelf can't express
Accuracy + citations	Grounded answers with citations out of the box; limited tuning of the pipeline	Full control of chunking, embeddings, hybrid search, re-ranking, grounding prompts
Security + residency	Strong posture + attestations; bounded by vendor's region map and boundary	AWS substrate; you pin region, VPC, KMS, IAM; data not used to train Bedrock models
Cost shape	Predominantly per-seat / month — predictable; pays for idle seats	Predominantly per-usage / token — efficient at scale; variable, needs guardrails
Extensibility (actions)	Plugins, agents, workflows within the vendor's framework and ceiling	Bedrock Agents + your orchestration — call any API, no framework ceiling
Time to first value	Days to weeks — the fast lane	Weeks to a couple of months for a focused first use case

No column is universally "better." Weight each criterion to your context (a bank weights permissions + residency at 5; a startup weights time-to-value + cost-shape higher), score each option, and the winner falls out. A near-tie usually points to a hybrid: buy the broad, build the deep.
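The weighting step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the criterion keys, the example weights and scores, and the `tie_margin` threshold for calling a near-tie are all hypothetical placeholders you would replace with your own.

```python
from typing import Dict

# The six criteria from this guide; the short key names are illustrative.
CRITERIA = [
    "connectors", "permissions", "accuracy",
    "security", "cost_shape", "extensibility",
]

def weighted_score(scores: Dict[str, int], weights: Dict[str, int]) -> float:
    """Return the weight-normalized score (1.0 to 5.0) for one option."""
    for name in CRITERIA:
        if not 1 <= scores[name] <= 5:
            raise ValueError(f"{name} score must be 1-5, got {scores[name]}")
    total_weight = sum(weights[name] for name in CRITERIA)
    return sum(scores[name] * weights[name] for name in CRITERIA) / total_weight

def recommend(buy: float, build: float, tie_margin: float = 0.25) -> str:
    """A near-tie points to hybrid; otherwise the higher score wins.

    tie_margin is an assumed threshold, not a figure from this guide.
    """
    if abs(buy - build) <= tie_margin:
        return "hybrid"
    return "buy" if buy > build else "build"

# Hypothetical weights for a regulated buyer (permissions and security at 5).
weights = {"connectors": 3, "permissions": 5, "accuracy": 4,
           "security": 5, "cost_shape": 2, "extensibility": 3}
# Hypothetical 1-5 scores for each path.
buy_scores = {"connectors": 5, "permissions": 3, "accuracy": 3,
              "security": 3, "cost_shape": 3, "extensibility": 2}
build_scores = {"connectors": 3, "permissions": 5, "accuracy": 4,
                "security": 5, "cost_shape": 4, "extensibility": 5}

buy = weighted_score(buy_scores, weights)
build = weighted_score(build_scores, weights)
print(f"buy={buy:.2f} build={build:.2f} -> {recommend(buy, build)}")
```

Normalizing by total weight keeps the result on the same 1 to 5 scale as the inputs, which makes scores comparable even if two teams choose different weights.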

not sure which column is yours?

Pressure-test your build-vs-buy against the six criteria with an AWS partner

Talk it through in 24h →

a recent match

A build-vs-buy that landed on "both" — anonymized

inquiry · mid-market healthcare SaaS, ~900 employees

Healthcare SaaS company, ~900 employees, mostly on Microsoft 365, with a proprietary clinical-knowledge database and HIPAA obligations

Situation: Wanted one internal assistant for everyone. Two hard constraints collided: most general knowledge (HR, IT, policies, the company wiki) already lived in Microsoft 365, but the highest-value questions hit a proprietary clinical-knowledge store that no off-the-shelf connector reached — and PHI-adjacent content meant strict access controls and data-handling were non-negotiable. A single off-the-shelf tool couldn't reach the clinical store; a single custom build for the general 80% would have been over-engineering.

What CloudRoute did: Routed within 24 hours to a US-East AWS partner with healthcare and Bedrock experience. The partner scored both paths (buy and build) on the six-criterion scorecard. Result: off-the-shelf scored well on the Microsoft 365 general knowledge (native connectors, inherited permissions, fast); a custom Bedrock assistant scored far higher on the clinical store (custom ingestion, metadata-filtered ACL-aware retrieval, region-pinned data handling, sentence-level citations clinicians could verify). The recommendation was the hybrid path: buy for the broad base, build the clinical assistant on Bedrock Knowledge Bases. The Bedrock proof-of-concept was scoped against AWS POC/GenAI funding so the build phase was credit-offset.

Outcome: Hybrid rollout: off-the-shelf assistant live for general knowledge in ~3 weeks; the custom Bedrock clinical assistant reached a validated, citation-checked, permission-red-teamed POC in ~7 weeks, funded by AWS GenAI/POC credits. Clinicians got verifiable, access-controlled answers from the proprietary corpus; everyone else got fast general Q&A. CloudRoute's commission was paid by the partner from AWS engagement funding — the customer paid $0.

decision: hybrid (buy + build) · POC time: ~7 weeks · clinical answers: citation-checked + ACL-red-teamed · cost to customer: $0 (credit-funded)

faq

Common questions

Is it cheaper to buy an off-the-shelf AI assistant or build one on Bedrock?

It depends entirely on your seat count and usage shape, not on any list price. Off-the-shelf is typically per-seat per-month — predictable, but you pay for idle seats. A custom Bedrock assistant is typically per-usage (per token) — you pay nothing for users who don't ask anything, which is far cheaper for very large populations with light or uneven usage. For a modest number of heavy daily users, per-seat is often cheaper and simpler. Model seats × seat-price against expected-queries × average-tokens × token-price across low/expected/high usage, and remember to include the build path's engineering and operating cost. The honest comparison is total cost of ownership, not headline price.
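The arithmetic above can be sketched as a small Python model. Every number here is a hypothetical placeholder (seat price, token price, tokens per query, and the fixed monthly engineering and operating cost), and the single blended token price is a simplifying assumption; real pricing usually distinguishes input from output tokens.

```python
def per_seat_monthly(seats: int, seat_price: float) -> float:
    """Licence cost: every provisioned seat is billed, used or idle."""
    return seats * seat_price

def per_token_monthly(users: int, queries_per_user: float,
                      avg_tokens_per_query: float, price_per_1k_tokens: float,
                      fixed_monthly_cost: float) -> float:
    """Usage cost plus the build path's engineering and operating cost."""
    tokens = users * queries_per_user * avg_tokens_per_query
    return tokens / 1000 * price_per_1k_tokens + fixed_monthly_cost

# Hypothetical inputs; replace every one with your own figures.
USERS = 10_000
SEAT_PRICE = 20.0          # per seat per month
AVG_TOKENS = 4_000         # prompt + retrieved context + answer, per query
PRICE_PER_1K = 0.01        # blended price per 1,000 tokens
FIXED_MONTHLY = 30_000.0   # engineering + operations for the custom build
SCENARIOS = {"low": 2, "expected": 10, "high": 40}  # queries per user per month

buy_cost = per_seat_monthly(USERS, SEAT_PRICE)
for label, queries in SCENARIOS.items():
    build_cost = per_token_monthly(USERS, queries, AVG_TOKENS,
                                   PRICE_PER_1K, FIXED_MONTHLY)
    cheaper = "build" if build_cost < buy_cost else "buy"
    print(f"{label:>8}: buy ${buy_cost:,.0f} vs build ${build_cost:,.0f} -> {cheaper}")
```

Rerun it with a small population of heavy users and the result can flip toward per-seat, which is why the comparison needs your own seat count and usage shape rather than list prices.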

What is permission-aware (ACL-aware) retrieval and why does it matter so much?

It means the retrieval step only ever surfaces content the asking user is allowed to see, by carrying their identity and entitlements into the search before the model reads anything. It matters because without it, an assistant that has indexed everything can answer a junior employee's question using an executive-only document — flattening the access controls that existed at the source and turning the assistant into a data-leak engine. It is the single most important criterion in the evaluation. Test it adversarially: create users at different privilege levels and ask questions whose answers live in documents they should not see; a correct system declines for the unprivileged user and answers for the privileged one.
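A minimal in-memory sketch of the idea, together with the adversarial test described above. It is an illustration under simple assumptions, not how any named product implements it: the `Document`, `User`, `retrieve`, and `answer` names are hypothetical, group membership stands in for a real identity provider, and keyword matching stands in for vector or hybrid search.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: FrozenSet[str]  # ACL carried over from the source system

@dataclass(frozen=True)
class User:
    name: str
    groups: FrozenSet[str]

def retrieve(query: str, user: User, index: List[Document]) -> List[Document]:
    """Filter by entitlement first, then match. Filtered docs never reach the model."""
    visible = [d for d in index if d.allowed_groups & user.groups]
    terms = query.lower().split()
    return [d for d in visible if any(t in d.text.lower() for t in terms)]

def answer(query: str, user: User, index: List[Document]) -> str:
    hits = retrieve(query, user, index)
    if not hits:
        # Decline rather than answer from content the user cannot see.
        return "I can't find anything you have access to that answers this."
    return f"Based on {hits[0].doc_id}: {hits[0].text}"

INDEX = [
    Document("board-deck-q3", "Acquisition target shortlist for next year",
             frozenset({"executives"})),
    Document("hr-handbook", "Refund policy and leave policy for all staff",
             frozenset({"all-staff"})),
]
junior = User("junior", frozenset({"all-staff"}))
executive = User("executive", frozenset({"all-staff", "executives"}))

# Adversarial check: same question, different privilege levels.
question = "acquisition shortlist"
print(answer(question, junior, INDEX))     # should decline
print(answer(question, executive, INDEX))  # should answer from board-deck-q3
```

The ordering is the point: the entitlement filter runs before matching, so a restricted document cannot leak through ranking, snippets, or the model's context window.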

When should we build a custom assistant on Bedrock instead of buying?

Build when one or more of these is true: you have proprietary or legacy data stores no off-the-shelf connector reaches that hold your highest-value knowledge; you have strict data-residency or sovereignty requirements an off-the-shelf option can't satisfy; you need specific model choice or control (a particular model, your own fine-tunes, freedom to swap); your permission model is unusual (attribute-based, row-level, dynamically computed); your usage or seat economics make per-seat pricing punishing; or your roadmap requires deep action-taking across your own systems beyond a vendor framework. The more that apply, the stronger the case. If you can't name a specific control off-the-shelf won't give you, buying is usually the right call.

Amazon Q Business vs Microsoft 365 Copilot vs Glean — how do they differ?

All three are off-the-shelf enterprise assistants with connectors, identity-aware permissions, grounded answers, and growing action capabilities — the chat experience converges. The differences are structural. Microsoft 365 Copilot is deepest inside the Microsoft graph and inherits M365 permissions directly, making it the strongest fit when your center of gravity is Microsoft 365. Glean built its reputation on connector breadth and quality plus document-level permission-aware search across the modern SaaS stack. Amazon Q Business is identity-aware, integrates with IAM Identity Center, connects to dozens of sources, and supports plugins for actions in your own and third-party systems. Choose against where your data lives and whose permission model you want to inherit.

Do off-the-shelf assistants respect our existing document permissions?

The major ones are designed to. Microsoft 365 Copilot inherits SharePoint/OneDrive/Teams permissions directly because it operates in the same graph. Glean mirrors source-system ACLs at the document level across its connectors and updates as they change. Amazon Q Business supports identity-aware access control via IAM Identity Center so retrieval honors document-level permissions across connected sources. The critical caveat: permission enforcement is only as good as connector configuration and identity mapping, so validate it with adversarial testing on your real data during the trial rather than trusting it by default. Misconfigured connectors are a far more common failure than the products lacking the capability.

How do we make sure the assistant's answers are accurate and trustworthy?

Treat accuracy primarily as a retrieval problem, not a model-IQ problem — if the right passages are retrieved, even a mid-tier model usually answers correctly. Build a gold set of 50–200 real questions with known correct answers and source documents, run every option against the same set, and score answer correctness, citation correctness, and refusal behavior on questions genuinely not in the corpus. Insist on granular citations (sentence- or claim-level, linking to the exact source) and click them to confirm they actually support the claim — citation theater is common. A trustworthy assistant declines when it can't ground an answer instead of fabricating; reward that behavior rather than penalizing it.
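A gold-set harness along these lines can be sketched as follows. The types and the `evaluate` function are hypothetical, the stub assistant exists only to make the example runnable, and substring matching is a crude stand-in for the human or rubric-based grading a real evaluation would need.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set

@dataclass
class GoldItem:
    question: str
    expected_answer: Optional[str]  # None means "not in the corpus; should decline"
    expected_sources: Set[str]

@dataclass
class Response:
    answer: Optional[str]           # None means the assistant declined
    cited_sources: Set[str]

def evaluate(assistant: Callable[[str], Response],
             gold: List[GoldItem]) -> Dict[str, float]:
    """Score answer correctness, citation correctness, and refusal behavior."""
    answerable = [g for g in gold if g.expected_answer is not None]
    unanswerable = [g for g in gold if g.expected_answer is None]
    correct = cited = refused = 0
    for g in answerable:
        r = assistant(g.question)
        if r.answer is not None and g.expected_answer.lower() in r.answer.lower():
            correct += 1
        # Citations count only if present and all point at known-good sources.
        if r.cited_sources and r.cited_sources <= g.expected_sources:
            cited += 1
    for g in unanswerable:
        if assistant(g.question).answer is None:
            refused += 1
    return {
        "answer_correctness": correct / len(answerable) if answerable else 0.0,
        "citation_correctness": cited / len(answerable) if answerable else 0.0,
        "correct_refusals": refused / len(unanswerable) if unanswerable else 0.0,
    }

def stub_assistant(question: str) -> Response:
    """Stand-in for a real assistant: knows one answer, declines everything else."""
    canned = {
        "What is the refund window?":
            Response("Refunds are accepted within 30 days.", {"policy.md"}),
    }
    return canned.get(question, Response(None, set()))

GOLD = [
    GoldItem("What is the refund window?", "30 days", {"policy.md"}),
    GoldItem("What is the travel per-diem?", "75", {"travel.md"}),
    GoldItem("Who won the 1998 World Cup?", None, set()),
]
results = evaluate(stub_assistant, GOLD)
print(results)
```

Running every candidate through the same `evaluate` call with the same gold set is what makes the options comparable; a declined answer on an out-of-corpus question counts in the assistant's favor.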

Can we get an enterprise assistant that takes actions, not just answers questions?

Yes, and it is increasingly the point. Off-the-shelf products add actions through plugins, agents, and workflow integrations within their frameworks — Amazon Q Business plugins, Microsoft 365 Copilot agents, Glean actions. A custom build on Bedrock is extensibility-first: with Bedrock Agents and your own orchestration the assistant can call any API you expose, run multi-step workflows, query a database and act on the result, and enforce arbitrary business logic and approval gates — no vendor framework ceiling. If your roadmap is "the assistant should run real processes inside our systems," weight extensibility heavily and lean toward building. If you only need document Q&A, don't over-buy action capability you won't use.
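To make "approval gates" concrete, here is a small framework-agnostic sketch of an action dispatcher. It deliberately does not use the Bedrock Agents API; the `Tool` type, the `dispatch` function, and both example tools are hypothetical, and a real build would wire the same check into whatever orchestration layer it uses.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

@dataclass
class Tool:
    handler: Callable[..., str]
    requires_approval: bool  # business rule: does a human have to sign off?

def dispatch(tools: Dict[str, Tool], name: str, args: Dict[str, Any],
             approved_by: Optional[str] = None) -> str:
    """Run a tool the model asked for, enforcing an allow-list and approval gate."""
    tool = tools.get(name)
    if tool is None:
        return f"refused: unknown tool {name!r}"
    if tool.requires_approval and approved_by is None:
        return f"pending: {name} needs human approval"
    return tool.handler(**args)

# Hypothetical tools standing in for APIs you expose.
def lookup_order(order_id: str) -> str:
    return f"order {order_id}: shipped"

def issue_refund(order_id: str, amount: float) -> str:
    return f"refunded {amount:.2f} on order {order_id}"

TOOLS = {
    "lookup_order": Tool(lookup_order, requires_approval=False),  # read-only
    "issue_refund": Tool(issue_refund, requires_approval=True),   # moves money
}

print(dispatch(TOOLS, "lookup_order", {"order_id": "A1"}))
print(dispatch(TOOLS, "issue_refund", {"order_id": "A1", "amount": 25.0}))
print(dispatch(TOOLS, "issue_refund", {"order_id": "A1", "amount": 25.0},
               approved_by="manager"))
```

Keeping the gate in your own dispatch code, rather than in the prompt, means the model can request an action but cannot talk its way past the approval rule.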

Is a hybrid approach — buy and build — a real option or a cop-out?

It is a real and common outcome for mature programs, not a failure to decide. The broad 80% and the crown-jewel 20% rarely suit the same tool: off-the-shelf serves a broad base of general knowledge cheaply and fast, while one or two crown-jewel workflows hold proprietary, high-stakes data that only a custom Bedrock build can reach and govern properly. Running an off-the-shelf assistant for the broad base and a custom assistant for the deep proprietary workflows captures the strengths of both. If your weighted scorecard produces a near-tie between buy and build, hybrid is usually the most rational result.

Decide build vs buy with people who've shipped both

CloudRoute routes you to a vetted AWS partner who runs the six-criterion scorecard on your real data — and, if you build on Bedrock, often scopes the work against AWS GenAI/POC funding. Customer pays $0. No procurement theater.

Get matched in 24h →

→ see the AI-team detail

matched within: < 24h

scorecard on: your data

cost to you: $0