HIPAA-compliant generative AI on AWS · the 2026 guide

HIPAA-compliant generative AI on AWS — the mechanics, the reference architecture, and what NOT to do.

A neutral, build-grade walkthrough of how protected health information (PHI) is actually kept compliant when you put a large language model in front of it: Bedrock HIPAA-eligibility and the AWS Business Associate Addendum, the no-training guarantee, Guardrails PII redaction, encryption with KMS, PrivateLink and in-region inference, de-identification, audit logging, human-in-the-loop, the compliant reference architecture for clinical and administrative use cases, and the specific mistakes that turn a promising pilot into a breach.

BAA covers: Bedrock · training on PHI: never · PHI on public net: none · POC → production: 8–14 wk
TL;DR
  • HIPAA compliance for generative AI on AWS rests on three load-bearing facts. First, Amazon Bedrock is a HIPAA-eligible service, and AWS will sign a Business Associate Addendum (BAA) — so you can lawfully process protected health information (PHI) through it. Second, on Bedrock your prompts and completions are not used to train the foundation models and are not shared with the model providers, which is what makes putting PHI in a prompt defensible at all. Third, compliance is a shared responsibility: AWS secures the platform, but you own the configuration — encryption, access, network isolation, logging, and what you actually send.
  • A compliant system is an architecture, not a checkbox. The reference pattern keeps PHI inside your AWS account boundary end to end: KMS-encrypted storage and logs, VPC endpoints (AWS PrivateLink) so inference traffic never touches the public internet, an in-region HIPAA-eligible model, Bedrock Guardrails enforcing PII/PHI detection and redaction independent of the model, least-privilege IAM, and CloudTrail plus model-invocation logging for an auditable record. De-identify or minimize PHI wherever the use case allows, and keep a human in the loop for any clinical or high-stakes output.
  • The fastest way to fail is predictable: sending PHI to a consumer LLM or any endpoint not under your BAA, skipping the BAA entirely, treating a model output as a clinical decision, logging raw PHI into an unencrypted or over-shared store, or assuming "HIPAA-eligible service" means "automatically compliant." The building blocks support HIPAA; the obligations are yours. Most healthcare teams reach production fastest by pairing a narrow, low-risk first use case (administrative, not diagnostic) with a partner who has shipped PHI workloads before.
the ground truth

I. What HIPAA actually requires of a generative-AI system

Before any architecture, get the mental model right. HIPAA does not ban large language models, does not mention AI, and does not certify products. It imposes obligations on how protected health information is handled — and a generative-AI system is just a new kind of software that touches that information. Almost every design decision in this guide is downstream of a single question: where does PHI go, and who is accountable for it there?

HIPAA (the Health Insurance Portability and Accountability Act) governs protected health information (PHI): individually identifiable health information held or transmitted by a covered entity (a provider, health plan, or clearinghouse) or a business associate acting on its behalf. The relevant machinery for software is the Security Rule (administrative, physical, and technical safeguards for electronic PHI), the Privacy Rule (limits on use and disclosure, including the minimum-necessary principle), and the Breach Notification Rule (what you must do when PHI is exposed).

Two legal concepts shape every AWS decision that follows. The first is the business associate agreement (BAA; AWS calls its version the Business Associate Addendum): when a vendor processes PHI on your behalf, you need a signed BAA with them, and that vendor must in turn have BAAs with any of its subcontractors who touch PHI. The second is the principle that compliance is a property of a system and its operation, not of any single component. There is no such thing as a "HIPAA-certified" model or a "HIPAA-compliant" API call in isolation. A service can be HIPAA-eligible, meaning it is covered under the AWS BAA and can be used in a compliant workload, but whether your overall system is compliant depends on how you configure and run it.

For generative AI specifically, three risks dominate, and the rest of this guide is a systematic answer to them.
  • Unauthorized disclosure: PHI leaking to a party without a BAA, most often a consumer LLM or a third-party API outside your account boundary.
  • Improper secondary use: PHI being used to train or improve a model, a use the patient never authorized.
  • Inaccurate output causing harm: a hallucination treated as fact in a clinical or coverage decision.
Keep PHI under your BAA and inside your boundary, never train on it, and never let an unverified output drive a high-stakes decision.

One framing to carry throughout: this is an engineering and governance problem with well-trodden answers on AWS, not an unsolved research problem. Healthcare organizations run PHI workloads on AWS at scale every day. Generative AI adds one new component — the model call — but the safeguards around it are the same ones AWS has documented for years. Teams struggle when they treat the model as special and forget the boundary; they succeed when they treat it as one more service inside an architecture they already know how to secure.

this guide is not legal advice

This is a technical and architectural reference, written to help engineering and compliance teams reason about the building blocks. It is not legal advice and not a substitute for your own HIPAA risk analysis. Your obligations depend on your role (covered entity vs business associate), your data, and your jurisdiction. Confirm specifics with your compliance, privacy, and legal counsel, and validate the current scope of any AWS service against the official AWS HIPAA-eligible services list before you design around it.

why it is possible at all

II. The three facts that make PHI-in-a-prompt defensible

Putting protected health information into a large language model sounds reckless until you understand the specific properties that make it lawful on AWS. There are three, and a privacy or security reviewer will ask about each one. They are the foundation the entire reference architecture rests on: if any one is missing — no BAA, a service that trains on your inputs, or PHI traversing the open internet — the rest of the controls cannot save you. Get crisp on all three and most of a HIPAA review for a Bedrock workload collapses into a short, documentable conversation.

Fact 1 — Amazon Bedrock is HIPAA-eligible and covered by the AWS BAA

What it means: Amazon Bedrock is on the list of AWS HIPAA-eligible services, and AWS offers a Business Associate Addendum that, once accepted, contractually covers your use of in-scope services for processing PHI. That single fact is what lets you lawfully send PHI through the Bedrock Runtime at all. Without a BAA in place, no amount of encryption makes the workload compliant — the legal relationship has to exist first.

How you get it: the AWS BAA is accepted through AWS Artifact and applies across your in-scope accounts (commonly organized so PHI-bearing accounts are clearly designated). Crucially, it covers HIPAA-eligible services used in the appropriate configuration — not every AWS service, and not every model-adjacent tool you might bolt on. Treat the eligible-services list as a hard boundary: if a component is not on it, PHI does not flow through it. And note that "Bedrock is eligible" does not extend to a third-party API, logging SaaS, analytics pixel, or model endpoint outside AWS that you call from your app — each such hop is outside your AWS BAA.

Fact 2 — your prompts and completions are not used to train the models

What it means: on Amazon Bedrock, the content you send to a foundation model (your prompts) and the content it returns (completions) are not used to train or improve the underlying base models, and are not shared with the third-party model providers. Inference happens within the AWS environment under your account. This addresses the "improper secondary use" risk head-on: the PHI in a prompt is processed to produce your answer and is not absorbed into a model someone else will later query.

Why it is the crux of the privacy review: the objection that most often stalls healthcare GenAI is "if we put patient data in, does it become training data?" On Bedrock, the documented answer is no. That is materially different from pasting PHI into a consumer chatbot, whose terms may permit the provider to use inputs to improve their service. The no-training guarantee plus account-boundary processing is the difference between a defensible architecture and a reportable breach. Respect its limit, though: it is a property of Bedrock, not of "LLMs" generically, and the moment your data leaves Bedrock for an endpoint with different terms it no longer applies — keep the model layer on a platform whose terms you have read and your BAA covers.

Fact 3 — compliance is shared: AWS secures the platform, you own the configuration

What it means: under the AWS Shared Responsibility Model, AWS is responsible for the security of the cloud (the physical infrastructure, the managed-service substrate), and you are responsible for security in the cloud (how you configure encryption, identity, network isolation, logging, and what data you choose to send). HIPAA eligibility gives you compliant building blocks; assembling them into a compliant system is your job.

Why it matters practically: almost every real-world HIPAA failure on a cloud platform is a configuration failure, not a platform failure — an over-permissive role, an unencrypted bucket, a log of raw PHI in a place it should not be, a public endpoint left open. The platform did its part; the workload was misconfigured. Naming this up front sets the right expectation with leadership: AWS hands you a compliant-capable kit, and the work is in using it correctly and proving you did.

What it asks of you: a documented risk analysis, technical safeguards (encryption, access control, audit controls, transmission security), and operational discipline (least privilege, logging, incident response). The next sections translate each of those into concrete AWS services and patterns.

the technical safeguards

III. Handling PHI: encryption, isolation, redaction, and in-region inference

With the legal base in place, compliance becomes a set of concrete technical controls applied to where PHI lives and how it moves. These map directly onto the HIPAA Security Rule's technical safeguards — access control, encryption, audit controls, and transmission security — and onto specific AWS services. None of them is exotic; the discipline is in applying all of them, consistently, to every path PHI can take.

Think of PHI as having three states, and secure each one. At rest — in your source documents, your vector store, your databases, and your logs. In transit — moving between your application, the model, and your data stores. In use — sitting inside a prompt at inference time. The controls below cover all three.

  • Encryption with KMS (at rest) + TLS (in transit) — Encrypt every store that can hold PHI — S3 source documents, the vector store, DynamoDB conversation state, and especially logs — using AWS KMS, and run all traffic over TLS. Customer-managed keys (CMKs) give you control over key policy, rotation, and an auditable trail of key usage via CloudTrail. Encryption is table stakes for the Security Rule and the first thing a reviewer checks; treat your KMS key policy as a security control in its own right, scoped to the roles that genuinely need decrypt.
  • PrivateLink / VPC endpoints (keep PHI off the public internet) — Use VPC interface endpoints (AWS PrivateLink) for the Bedrock Runtime so inference calls travel over the AWS private network, not the public internet — PHI in a prompt never leaves AWS's network fabric on its way to the model. This is one of the most reassuring controls for a security team. Pair endpoint policies with your IAM controls to constrain exactly which principals can invoke models through the endpoint.
  • In-region / data-residency control — Pin inference and storage to the AWS Region(s) your obligations require, and confirm that the specific HIPAA-eligible model you intend to use is available there before you design around it — model and feature availability varies by Region. Keeping PHI in-geography matters for both HIPAA risk posture and any overlapping residency requirements (state law, organizational policy, or international frameworks if you operate outside the US).
  • Guardrails: PHI/PII detection and redaction (in use) — Amazon Bedrock Guardrails apply, independent of the model, sensitive-information filters that detect and redact or block PII — including health-relevant identifiers — in both the user input and the model output. Use Guardrails to strip identifiers the model does not need to see, to block disallowed content and denied topics, and to add contextual-grounding checks that flag answers unsupported by retrieved sources. Because Guardrails are decoupled from the model, one policy applies across every model you call and survives a model swap.
  • Least-privilege IAM and tenant isolation — Scope IAM so only the specific application roles that need to invoke a model or read a PHI store can do so, and nothing more. In multi-tenant or multi-facility systems, isolate each tenant's data at the storage and prompt-assembly layers so one patient population's context can never bleed into another's answer. Minimum-necessary is both a Privacy Rule principle and an access-control discipline.
  • Audit controls (CloudTrail + model-invocation logging) — Enable CloudTrail for API and configuration activity and Bedrock model-invocation logging for a record of inference calls, routed to an encrypted, access-controlled destination. This is your evidence trail for the Security Rule's audit-controls requirement and for breach investigation. Decide deliberately what these logs contain — capture enough to audit, but avoid persisting raw PHI in logs unless you have a specific, controlled reason and protect it accordingly.
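To make the least-privilege and PrivateLink bullets concrete, here is a minimal sketch of an IAM identity policy built as a plain Python dictionary. It allows `bedrock:InvokeModel` on a single foundation model and only when the request arrives through one named VPC endpoint (via the `aws:SourceVpce` condition key). The Region, model ID, and endpoint ID are placeholders, and the function name is hypothetical; treat this as a starting shape to review against current AWS documentation, not a complete policy.

```python
import json


def build_invoke_policy(region: str, model_id: str, vpc_endpoint_id: str) -> dict:
    """Identity policy: invoke one model, and only via one VPC endpoint."""
    model_arn = f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeOneModelViaPrivateLinkOnly",
                "Effect": "Allow",
                "Action": ["bedrock:InvokeModel"],
                "Resource": [model_arn],
                # Requests that do not come through this endpoint are not allowed.
                "Condition": {"StringEquals": {"aws:SourceVpce": vpc_endpoint_id}},
            }
        ],
    }


# Placeholder values for illustration only.
policy = build_invoke_policy("us-east-1", "example-model-id", "vpce-0123456789abcdef0")
print(json.dumps(policy, indent=2))
```

Scoping `Resource` to one model ARN and adding the endpoint condition means a leaked credential used from outside the VPC cannot invoke the model under this policy. A matching endpoint policy on the VPC endpoint itself would constrain the same path from the network side.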
the compliant blueprint

IV. The compliant reference architecture, end to end

Here is the canonical shape of a HIPAA-compliant generative-AI system on AWS, assembled from the controls above. The unifying principle is simple to state and demanding to honor: PHI stays inside your AWS account boundary, under your BAA, encrypted, logged, and off the public internet, from the moment it enters to the moment a result is returned to an authorized user. Most clinical and administrative use cases are a variation on this single pattern.

Trace one request through the system. An authenticated, authorized user (clinician, coder, care-team member) makes a request from your application. It hits your API tier inside a VPC — API Gateway plus Lambda, or a container on ECS Fargate — where IAM and your application authorization confirm the user may access this patient's data under minimum-necessary. If the use case is retrieval-grounded, you retrieve only the necessary records from a KMS-encrypted store (or a Bedrock Knowledge Base over an encrypted S3 corpus and vector store), applying tenant and record-level access controls so nothing out of scope is pulled in.

Before the model call, the request passes an input Guardrail that redacts or blocks identifiers the model does not need and screens for disallowed content. The inference call to Bedrock travels over a VPC endpoint (PrivateLink) to an in-region, HIPAA-eligible model — encrypted in transit, never on the public internet, never used to train it. The completion passes an output Guardrail for redaction, denied-topic enforcement, and a contextual-grounding check against the retrieved sources. The result returns to the authorized user, and for any clinical or high-stakes use, it is presented for human review rather than acted on automatically (Section VII). Throughout, CloudTrail and model-invocation logging write an encrypted, access-controlled audit trail.
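The input and output Guardrail steps in this trace can be sketched as a configuration object. The dictionary below follows the request shape of the Bedrock `CreateGuardrail` API as exposed by boto3's `create_guardrail`, as far as this sketch understands it; check every field name and entity type against the current API reference before relying on it. The guardrail name, the MRN pattern, the messages, and the 0.75 thresholds are all illustrative assumptions. Nothing here calls AWS: the code only builds the request and sanity-checks it locally.

```python
import re

ALLOWED_ACTIONS = {"ANONYMIZE", "BLOCK"}

# Assumed request body for boto3's bedrock client create_guardrail(**guardrail_request).
guardrail_request = {
    "name": "phi-minimization-guardrail",  # hypothetical name
    "description": "Redact identifiers the model does not need; check grounding.",
    "sensitiveInformationPolicyConfig": {
        "piiEntitiesConfig": [
            {"type": "NAME", "action": "ANONYMIZE"},
            {"type": "ADDRESS", "action": "ANONYMIZE"},
            {"type": "PHONE", "action": "ANONYMIZE"},
            {"type": "EMAIL", "action": "ANONYMIZE"},
            {"type": "US_SOCIAL_SECURITY_NUMBER", "action": "BLOCK"},
        ],
        "regexesConfig": [
            {
                "name": "mrn",
                "description": "Hypothetical medical record number format",
                "pattern": r"\bMRN-\d{6,10}\b",
                "action": "ANONYMIZE",
            }
        ],
    },
    "contextualGroundingPolicyConfig": {
        "filtersConfig": [
            {"type": "GROUNDING", "threshold": 0.75},  # illustrative threshold
            {"type": "RELEVANCE", "threshold": 0.75},
        ]
    },
    "blockedInputMessaging": "This request contains content that cannot be processed.",
    "blockedOutputsMessaging": "The response was withheld by policy.",
}


def invalid_entries(request: dict) -> list:
    """Return sensitive-information entries whose action is not a known value."""
    policy = request["sensitiveInformationPolicyConfig"]
    entries = policy.get("piiEntitiesConfig", []) + policy.get("regexesConfig", [])
    return [entry for entry in entries if entry.get("action") not in ALLOWED_ACTIONS]


# Local checks before anything is sent to AWS.
mrn_pattern = guardrail_request["sensitiveInformationPolicyConfig"]["regexesConfig"][0]["pattern"]
print(bool(re.search(mrn_pattern, "Patient MRN-00123456 seen today")))
print(invalid_entries(guardrail_request))
```

Because the guardrail is defined separately from any model, the same policy can be referenced at invocation time for whichever model the workload calls.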

Two architectural habits make this pattern robust rather than fragile. First, minimize and de-identify at the edges: pull the fewest records that answer the question, and strip identifiers the model does not need before they reach the prompt (Section V). Second, treat the boundary as sacred: every integration — logging, analytics, a third-party API, an email notification — is a potential PHI exit, so each is either inside your BAA and boundary or never sees PHI. Most breaches in cloud GenAI are not the model leaking; they are PHI escaping through an unconsidered side channel.

Clinical use cases (higher stakes, human-in-the-loop mandatory)

Examples: clinical documentation assistance (drafting notes from an encounter), summarizing a patient's longitudinal record for a clinician, surfacing relevant guidelines, or drafting patient-facing explanations for clinician review. These touch care decisions, so they carry the strictest posture: full PHI safeguards, tight retrieval scoping, contextual-grounding checks to suppress hallucination, and — non-negotiable — a qualified human reviews and owns the output before it informs care. The model drafts and assists; the clinician decides.

Administrative use cases (lower stakes, faster to ship)

Examples: prior-authorization and claims drafting, medical-coding assistance, call-center and inbox triage, eligibility Q&A, and back-office document processing. PHI may still be involved, so the same boundary, encryption, BAA, no-training, and logging controls all apply — but the consequence of an error is operational rather than clinical, and much of the work can run on de-identified or minimized data. This is why administrative use cases are the recommended first production deployment: they exercise the full compliance architecture against a lower blast radius, so you build the muscle before you reach for clinical workloads.

the one-sentence test

Before shipping any path, ask: "Can I name every place PHI travels in this request, and confirm each one is inside my AWS account, under my BAA, encrypted, and logged?" If you can answer that cleanly for every code path — including the error paths and the logging paths — you have the spine of a compliant system. If there is a single hop you cannot account for, that hop is your risk.
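The one-sentence test lends itself to a small, mechanical check over a data-flow inventory. The sketch below is a hypothetical helper, not an AWS feature: each hop in a request path is described by hand, and the function reports any PHI-bearing hop that fails one of the four conditions. The hop names and attribute values are invented for illustration.

```python
from dataclasses import dataclass

CHECKS = ("in_account", "under_baa", "encrypted", "logged")


@dataclass(frozen=True)
class Hop:
    name: str
    sees_phi: bool
    in_account: bool
    under_baa: bool
    encrypted: bool
    logged: bool


def unaccounted_hops(hops):
    """Return (hop name, failed checks) for each PHI-bearing hop that fails a check."""
    findings = []
    for hop in hops:
        if not hop.sees_phi:
            continue  # hops that never see PHI are out of scope for this test
        failed = [check for check in CHECKS if not getattr(hop, check)]
        if failed:
            findings.append((hop.name, failed))
    return findings


# Illustrative inventory, including an error path that is easy to forget.
request_path = [
    Hop("api-tier", True, True, True, True, True),
    Hop("bedrock-via-vpc-endpoint", True, True, True, True, True),
    Hop("error-reporting-saas", True, False, False, True, False),
    Hop("status-page", False, False, False, True, False),
]

for name, failed in unaccounted_hops(request_path):
    print(f"{name}: fails {', '.join(failed)}")
```

The value of the exercise is in filling out the inventory honestly, including retries, error handlers, and logging sinks; the code only makes gaps hard to overlook.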

minimize what you expose

V. De-identification and data minimization: the cheapest risk reduction there is

The most reliable way to reduce PHI risk is to expose less PHI. HIPAA's minimum-necessary principle and its de-identification provisions are not just compliance language — they are a practical design tool. Every identifier you can remove before the model call is one less identifier that can leak, be logged, or be misused. De-identification done right can even move parts of a workload outside HIPAA's scope entirely.

HIPAA recognizes two formal routes to de-identified data. Safe Harbor removes eighteen specific categories of identifiers (names, geographic subdivisions smaller than a state, dates more specific than year tied to an individual, contact details, record numbers, biometric identifiers, full-face images, and so on) and requires that you have no actual knowledge that the remaining information could identify an individual; data meeting both conditions is no longer PHI. Expert Determination relies on a qualified expert, typically a statistician, who applies accepted statistical and scientific methods and documents that the re-identification risk is very small. Properly de-identified data falls outside HIPAA, but the bar is exacting, re-identification risk is real (especially with rich free-text notes), and getting it wrong is worse than not trying. Treat formal de-identification as a deliberate, validated process, not a regex pass.

Short of that, minimization is the everyday lever and applies to essentially every workload. Send the model the fewest records and fields that answer the question, and mask or tokenize identifiers it does not need to reason about — a summarization or coding task rarely needs the patient's name, address, or full MRN; it needs the clinical substance. Bedrock Guardrails' sensitive-information redaction can perform this stripping inline, and you can tokenize identifiers in your own pre-processing so even the prompt the model sees is reduced. Less PHI in the prompt means less in any log, less in any output, and a smaller surface for everything downstream.
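A minimal sketch of the pre-processing tokenization described above, assuming the identifiers are already known from the structured record (name, MRN, and so on). It swaps each known value for an opaque placeholder before the prompt is assembled and restores the values in the reviewed output. The function names are hypothetical, and this is minimization only: it does not find identifiers it was not told about, and it is not Safe Harbor or Expert Determination de-identification.

```python
import re


def tokenize_identifiers(text: str, identifiers: dict):
    """Replace known identifier values with placeholders like [NAME].

    identifiers maps a label to the value to hide, e.g. {"NAME": "Jane Doe"}.
    Returns the reduced text and a token map for later restoration.
    """
    token_map = {}
    reduced = text
    # Replace longer values first so a short value cannot split a longer one.
    for label, value in sorted(identifiers.items(), key=lambda kv: -len(kv[1])):
        token = f"[{label}]"
        reduced = re.sub(re.escape(value), lambda _m, t=token: t, reduced, flags=re.IGNORECASE)
        token_map[token] = value
    return reduced, token_map


def restore_identifiers(text: str, token_map: dict) -> str:
    """Put the original values back, using the canonical casing from the record."""
    for token, value in token_map.items():
        text = text.replace(token, value)
    return text


note = "Jane Doe (MRN 00123456) presents with chest pain. Jane Doe denies fever."
reduced, token_map = tokenize_identifiers(note, {"NAME": "Jane Doe", "MRN": "00123456"})
print(reduced)
print(restore_identifiers(reduced, token_map) == note)
```

The token map is itself PHI and should stay inside the same encrypted, access-controlled boundary as the source record; only the reduced text goes into the prompt.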

The honest tradeoff: aggressive redaction can strip context the model needs, degrading answers, and over-tokenization can confuse a model left reasoning about opaque placeholders. The right calibration is use-case specific and is exactly what you validate against an evaluation set (Section VII) — confirm the de-identified or minimized inputs still produce acceptable outputs before you lock the policy in. Done well, minimization is a rare win-win: lower risk and a smaller, cheaper prompt.

the failure modes

VI. What NOT to do: the mistakes that cause breaches

The failure modes for HIPAA generative AI are well known and almost entirely avoidable. Nearly every one is a variation on a single theme: PHI ending up somewhere it should not, or an unverified output being trusted as fact. Knowing the list turns most of them into a pre-flight checklist.

These are the patterns that turn a promising pilot into an incident report. Read them as hard "do not," not as "be careful."

  • Do NOT send PHI to a consumer or uncontracted LLM — Pasting patient data into a consumer chatbot, a personal API key, or any model endpoint not covered by a signed BAA is the single most common and most serious mistake. Those terms may permit the provider to use your inputs to improve their models, and there is no BAA — so it is both an improper secondary use and an unauthorized disclosure. PHI goes only to HIPAA-eligible services under your BAA, full stop.
  • Do NOT skip the BAA — No BAA means no lawful basis to process PHI on the service, regardless of how well you encrypt it. Accept the AWS BAA before any PHI touches Bedrock, and ensure any other vendor in the PHI path (and their subcontractors) is under a BAA too. "We'll sign it later" is not a configuration you can ship.
  • Do NOT treat model output as a clinical decision — A generative model produces plausible text, not verified medical truth, and it can hallucinate confidently. Never let an unreviewed output drive diagnosis, treatment, dosing, triage priority, or coverage determination. Position the model as drafting/assisting and keep a qualified human accountable for any clinical or high-stakes decision (Section VII expands on human-in-the-loop).
  • Do NOT log raw PHI carelessly — Verbose request/response logging that dumps full prompts and completions into an unencrypted, broadly-readable, or third-party log store is a classic silent breach. Decide deliberately what logs contain, encrypt and access-control them, redact PHI from logs you do not strictly need it in, and never pipe PHI-bearing prompts to an external observability SaaS that is not under a BAA.
  • Do NOT assume "HIPAA-eligible" means "automatically compliant" — Eligibility means a service can be used compliantly under the BAA — not that any use of it is compliant. You still own encryption, access, network isolation, logging, minimization, and your risk analysis. The platform is compliant-capable; the system is your responsibility.
  • Do NOT let PHI leak through side channels — The model is rarely the leak. PHI escapes through analytics pixels, error-reporting tools, email/SMS notifications, caches, third-party plugins, and "temporary" debug endpoints. Inventory every integration and confirm each is either inside your BAA and boundary or never sees PHI — including error paths and retries.
  • Do NOT fine-tune or build datasets on PHI without controls — Using PHI to fine-tune or to assemble training/eval datasets is a use that demands explicit governance: a lawful basis, de-identification where possible, strict access control, encryption, and assurance that the resulting artifacts (and the service performing the tuning) keep that data within your BAA and never expose it. Do not casually export PHI into notebooks, buckets, or pipelines to "experiment."
audit, evals, and humans

VII. Audit logging, evaluation, and human-in-the-loop

Encryption and a BAA make a system lawful to operate; governance makes it trustworthy and defensible over time. Three practices separate a healthcare GenAI system you can stand behind from one that merely demos well: an auditable record of what happened, an evaluation harness that proves quality and safety did not regress, and a human accountable for high-stakes output. None is optional in a regulated setting.

These are the operational disciplines a regulator, an auditor, or your own risk committee will ask about — and the ones that let you change the system confidently without wondering whether you just introduced a safety problem.

Audit logging — evidence, not an afterthought

The HIPAA Security Rule requires audit controls: the ability to record and examine activity in systems that contain PHI. On AWS that means CloudTrail for API and configuration changes and Bedrock model-invocation logging for inference activity, written to an encrypted, access-controlled destination with retention that matches your policy. Wire this from day one — reconstructing who accessed what after an incident is impossible if the logs were never captured. Be deliberate about log contents: enough to audit and investigate, without persisting raw PHI where it does not belong.
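One way to be deliberate about log contents is to have the application write its own audit entry that records who did what, against which patient, and with which model, while storing keyed digests and sizes instead of the raw prompt and completion. The sketch below is an application-level illustration, separate from CloudTrail and Bedrock model-invocation logging; the field names and the function are hypothetical, and the HMAC key is assumed to come from a secrets store.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone
from typing import Optional


def audit_record(user_id: str, patient_id: str, model_id: str,
                 prompt: str, completion: str, key: bytes,
                 now: Optional[datetime] = None) -> dict:
    """Build an audit entry that contains no raw prompt or completion text."""

    def keyed_digest(value: str) -> str:
        # HMAC-SHA256 so digests cannot be recomputed without the key.
        return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return {
        "timestamp": timestamp,
        "user_id": user_id,
        "patient_ref": keyed_digest(patient_id),  # stable pseudonymous reference
        "model_id": model_id,
        "prompt_digest": keyed_digest(prompt),
        "prompt_chars": len(prompt),
        "completion_digest": keyed_digest(completion),
        "completion_chars": len(completion),
    }


# Placeholder key and values for illustration only.
demo_key = b"replace-with-a-key-from-your-secrets-store"
record = audit_record("clinician-17", "MRN-00123456", "example-model-id",
                      "Summarize the encounter for Jane Doe.", "Draft summary text.",
                      demo_key)
print(json.dumps(record, indent=2))
```

A keyed reference to a patient is still linkable to that patient by anyone holding the key, so the record should be treated as sensitive and stored in the same encrypted, access-controlled destination as the other logs.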

Evaluation — prove quality and safety, repeatedly

Build a representative evaluation set of real (appropriately de-identified) inputs with known-good outputs or clear acceptance criteria, and run it automatically. Bedrock Model Evaluation and RAG evaluation let you score accuracy, faithfulness/grounding, and safety so that when you change a prompt, a model, a redaction policy, or a chunking strategy, you can prove you did not regress — including that you did not start leaking identifiers or producing unsupported clinical claims. In a regulated setting the eval harness is also evidence: it demonstrates that you validate the system rather than trusting it. This is the highest-ROI investment in the whole build.
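The regression idea can be illustrated with a very small local harness. This is a hypothetical stand-in to show the shape of the check, not Bedrock Model Evaluation: each case lists phrases the output must contain and identifiers it must never contain, and `generate` is whatever function wraps your model call (a stub here). Real acceptance criteria for faithfulness and grounding need more than substring matching.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    case_id: str
    prompt: str
    must_include: List[str]
    forbidden: List[str]  # identifiers that must never appear in the output


def run_eval(cases: List[EvalCase], generate: Callable[[str], str]) -> dict:
    """Run every case and report missing content and leaked identifiers."""
    failures = []
    for case in cases:
        output = generate(case.prompt).lower()
        missing = [s for s in case.must_include if s.lower() not in output]
        leaked = [s for s in case.forbidden if s.lower() in output]
        if missing or leaked:
            failures.append({"case_id": case.case_id, "missing": missing, "leaked": leaked})
    return {"total": len(cases), "passed": len(cases) - len(failures), "failures": failures}


def stub_generate(prompt: str) -> str:
    """Stand-in for a model call; the second branch simulates an identifier leak."""
    if "lumbar" in prompt:
        return "Prior authorization requested for MRI lumbar spine."
    return "Prior authorization requested for Jane Doe, knee arthroscopy."


cases = [
    EvalCase("pa-001", "Draft a prior-auth for [NAME], MRI lumbar spine.",
             ["MRI lumbar spine"], ["Jane Doe"]),
    EvalCase("pa-002", "Draft a prior-auth for [NAME], knee arthroscopy.",
             ["knee arthroscopy"], ["Jane Doe"]),
]

report = run_eval(cases, stub_generate)
print(report)
```

Running the same set after every prompt, model, or redaction-policy change, and failing the change when `failures` is non-empty, is what turns the eval set into a regression gate.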

Human-in-the-loop — accountability for high-stakes output

Match oversight to risk. For clinical and other high-stakes outputs, a qualified human must review and own the result before it informs a decision — the model drafts, the professional decides and is accountable. For lower-stakes administrative tasks, oversight can be lighter (sampling, exception review, confidence thresholds that route uncertain cases to a person). Design the human checkpoint into the workflow rather than bolting it on; "a clinician could review it if they wanted" is not a control, whereas "the draft cannot be finalized without clinician sign-off" is. Contextual-grounding checks and citations from Guardrails and RAG make that human review faster and more reliable by showing the sources behind each claim.
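The difference between "could review" and "cannot finalize without sign-off" is easiest to see in code. The sketch below is a hypothetical in-memory model of a draft with a hard gate: `finalize` raises unless a named reviewer has signed off. Class and method names are illustrative; a real workflow would persist state and verify the reviewer's identity and role.

```python
from dataclasses import dataclass
from typing import Optional


class SignOffRequired(Exception):
    """Raised when a draft is finalized without a recorded human review."""


@dataclass
class Draft:
    draft_id: str
    text: str
    reviewer_id: Optional[str] = None
    finalized: bool = False

    def sign_off(self, reviewer_id: str, reviewed_text: str) -> None:
        """Record the reviewer and the text they approved (possibly edited)."""
        if not reviewer_id:
            raise ValueError("reviewer_id is required")
        self.text = reviewed_text
        self.reviewer_id = reviewer_id

    def finalize(self) -> str:
        """Release the draft only if a human has signed off."""
        if self.reviewer_id is None:
            raise SignOffRequired(f"draft {self.draft_id} has no reviewer sign-off")
        self.finalized = True
        return self.text


draft = Draft("pa-001", "Model-generated prior-auth draft.")
try:
    draft.finalize()
except SignOffRequired as exc:
    print(f"blocked: {exc}")

draft.sign_off("nurse-reviewer-42", "Reviewed and corrected prior-auth letter.")
print(draft.finalize())
```

Because the gate lives in the only code path that can release a draft, skipping review is not something a busy user can do by accident.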

getting it built

VIII. How a partner builds it, and why it is often credit- or POC-funded

Most healthcare organizations have the clinical and domain expertise but not a team that has shipped a PHI-bearing generative-AI system on AWS before. That gap — not the technology — is the usual reason these projects stall. The reliable path is the same crawl-walk-run staging used for any production GenAI, with HIPAA controls baked into every stage, executed by people who have done it under a BAA before.

A capable AWS partner with healthcare experience does a few things that are hard to get right the first time. They scope the first use case for low blast radius — typically administrative, not diagnostic — so the organization validates the full compliance architecture against manageable risk. They stand up the boundary correctly from the start: BAA confirmed, KMS encryption, PrivateLink, in-region eligible model, Guardrails for PHI redaction, least-privilege IAM, and CloudTrail plus model-invocation logging. They build the eval harness and the human-in-the-loop workflow as first-class deliverables. And they produce the documentation — architecture, data-flow diagrams, control mappings — that your compliance team needs to sign off and your auditors will later ask for.

The funding angle matters in healthcare specifically, where budgets are tight and procurement is slow. Generative-AI build work on AWS is frequently delivered as an AWS-funded proof of concept, and surrounding migration or modernization work can draw on AWS funding programs too. In those structures AWS underwrites the engagement, so the build can be substantially or fully credit-covered — a production-grade, compliance-reviewed system without an open-ended consulting bill. It is the same mechanic that funds GenAI POCs across industries, applied to a regulated workload where the documentation and control rigor are higher.

The crawl-walk-run staging looks like this in a HIPAA context.
  • Crawl (prove value, ~2 weeks): one narrow administrative use case, the full boundary in place even for the pilot, a small de-identified eval set, and an "is this useful and safe?" gate with real users.
  • Walk (harden, ~4–8 weeks): the complete control set, the automated eval harness, the human-review workflow, audit logging verified, and a compliance review against your risk analysis.
  • Run (scale, ongoing): expand to more administrative use cases and, only with controls and oversight proven, carefully toward clinical assistance, re-evaluating models and policies on a regular cadence.
The partner accelerates each stage; the staging itself is what keeps a regulated project from collapsing under its own risk.

compliant vs non-compliant

Compliant Bedrock pattern vs the shortcuts that cause breaches

The difference between a defensible HIPAA generative-AI system and a breach waiting to happen is rarely subtle. This table puts the compliant pattern beside the two shortcuts teams reach for under deadline pressure — a consumer LLM, or Bedrock used without the safeguards. Read it as a gut check before you ship.

| Control | Compliant Bedrock pattern | Consumer / uncontracted LLM | Bedrock without safeguards |
| BAA in place | Yes: AWS BAA accepted, covers Bedrock | No: no lawful basis for PHI | Maybe, but undermined by the gaps below |
| Training on your data | No: prompts/completions not used to train FMs | Often yes: inputs may train the provider's models | No (Bedrock guarantee), but other gaps remain |
| Network path for PHI | Private: VPC endpoint (PrivateLink), off public internet | Public internet to a third-party API | Often public Bedrock endpoint, not PrivateLink |
| Encryption (rest + transit) | KMS at rest + TLS in transit, everywhere | Outside your control | Partial: stores or logs left unencrypted |
| PHI/PII redaction | Guardrails redact identifiers in + out | None enforced | None: raw PHI in every prompt and log |
| Audit trail | CloudTrail + model-invocation logging, encrypted | None you control | Missing or logging raw PHI insecurely |
| Human-in-the-loop | Required for clinical / high-stakes output | Undefined | Often output trusted directly |
| Net posture | Defensible, documented, auditable | Reportable breach | Eligible service, non-compliant system |
HIPAA eligibility is necessary but not sufficient: the right-hand column shows that even Bedrock becomes non-compliant if you skip the configuration. The left column is the only one of the three you can stand behind in a risk analysis or an audit.
a recent match

A HIPAA-compliant prior-authorization assistant — anonymized

inquiry · digital-health company, prior-auth automation, US
Venture-backed US digital-health company, ~30 engineers, on AWS, wanted to draft prior-authorization requests from patient records

Situation: A weekend prototype that drafted prior-auth letters from clinical notes impressed leadership, but it was built against a consumer LLM API with PHI flowing straight to a third party — no BAA, no redaction, raw prompts logged to an external tool. Compliance halted any rollout. The team had strong product and clinical knowledge but had never shipped a PHI-bearing system on AWS, and they could not afford an open-ended consulting engagement to figure it out.

What CloudRoute did: Routed within a day to a vetted AWS partner with healthcare and HIPAA delivery experience. The partner confirmed the AWS BAA, rebuilt the workflow on Amazon Bedrock with a HIPAA-eligible model in-region, moved all inference behind a VPC endpoint (PrivateLink), added KMS encryption across the document store, vector store, and logs, and wired Bedrock Guardrails to redact identifiers the model did not need on both input and output. They scoped retrieval to the minimum-necessary records, stood up CloudTrail and model-invocation logging into an encrypted store, built a 75-example de-identified eval set in Bedrock Model Evaluation, and designed a human-in-the-loop step so no prior-auth draft could be finalized without staff sign-off. The work was filed as an AWS-funded GenAI POC, so the build was credit-covered.

Outcome: Compliance signed off after one architecture review on the BAA + no-training + PrivateLink + KMS + Guardrails + audit-logging posture, backed by the data-flow documentation the partner produced. The assistant reached production in 11 weeks as an administrative (non-clinical) drafting tool with mandatory human review. Redaction plus minimum-necessary retrieval kept PHI exposure tight, and the eval harness gave the team confidence to iterate. CloudRoute's commission was paid by the partner from AWS engagement funding — the customer paid $0.

POC → production: 11 weeks · compliance review: 1 meeting · PHI on public internet: none · cost to customer: $0

faq

Common questions

Is Amazon Bedrock HIPAA-compliant?
Amazon Bedrock is a HIPAA-eligible service, and AWS will sign a Business Associate Addendum (BAA) that covers using it to process protected health information (PHI). But "HIPAA-eligible" means the service can be used in a compliant workload — not that any use of it is automatically compliant. Compliance is a shared responsibility: AWS secures the platform, and you own the configuration (encryption with KMS, private networking via PrivateLink, least-privilege IAM, Guardrails for PHI redaction, audit logging with CloudTrail, data minimization) and your own HIPAA risk analysis. Bedrock gives you compliant building blocks; your overall system is compliant only if you assemble and operate them correctly.
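As one example of configuration you own, the sketch below builds the keyword arguments for a Bedrock guardrail that masks identifiers. It assumes the request shape of boto3's `create_guardrail` on the `bedrock` client; the entity list, the `MRN-` pattern, and the message strings are illustrative assumptions, so check the current boto3 reference before relying on them.

```python
def guardrail_request(name: str, kms_key_arn: str) -> dict:
    """Build keyword arguments for a guardrail that masks identifiers.

    The PII types and the medical-record-number regex are assumptions
    chosen for illustration; adjust them to your own data.
    """
    pii_types = ["NAME", "EMAIL", "PHONE", "ADDRESS", "US_SOCIAL_SECURITY_NUMBER"]
    return {
        "name": name,
        "description": "Mask identifiers the model does not need",
        "sensitiveInformationPolicyConfig": {
            # ANONYMIZE replaces the match with a placeholder instead of blocking.
            "piiEntitiesConfig": [
                {"type": t, "action": "ANONYMIZE"} for t in pii_types
            ],
            "regexesConfig": [
                {
                    "name": "mrn",
                    "description": "Hypothetical medical record number format",
                    "pattern": "MRN-[0-9]{6,10}",
                    "action": "ANONYMIZE",
                }
            ],
        },
        "blockedInputMessaging": "This request was blocked by policy.",
        "blockedOutputsMessaging": "This response was blocked by policy.",
        "kmsKeyId": kms_key_arn,  # encrypt the guardrail with your own key
    }
```

With credentials and a region configured, the dictionary would be passed as `boto3.client("bedrock").create_guardrail(**guardrail_request(...))`. Building it separately keeps the policy reviewable and testable without calling AWS.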
Do I need a BAA with AWS to use generative AI with PHI?
Yes, if you are a HIPAA covered entity or business associate. Before a cloud vendor creates, receives, maintains, or transmits PHI on your behalf, you need a signed Business Associate Addendum (BAA) with that vendor. The AWS BAA is accepted through AWS Artifact and covers HIPAA-eligible services such as Amazon Bedrock used in the appropriate configuration. Accept the BAA before any PHI touches the service. Also ensure that every other vendor in the PHI path, and each of their subcontractors, is under a BAA; if a third-party API, logging tool, or model endpoint outside AWS sees PHI without a BAA, that is an unauthorized disclosure.
Is my PHI used to train the foundation models on Bedrock?
No. On Amazon Bedrock, your prompts and completions are not used to train or improve the underlying foundation models and are not shared with the third-party model providers; inference runs within the AWS environment under your account. This is the property that makes putting PHI in a prompt defensible and addresses HIPAA's improper-secondary-use risk directly. It is materially different from consumer chatbots, whose terms may permit the provider to use your inputs to improve their service. Note the guarantee is a property of Bedrock — it does not follow your data to an endpoint outside Bedrock with different terms.
How do I keep PHI from leaking when using an LLM?
Keep PHI inside your AWS account boundary, under your BAA, encrypted, logged, and off the public internet end to end. Concretely: encrypt every store that can hold PHI (including logs) with KMS; route inference over a VPC endpoint (PrivateLink) so it never traverses the public internet; use Bedrock Guardrails to redact identifiers the model does not need on input and output; apply least-privilege IAM and tenant isolation; minimize and de-identify so you send the fewest records and fields necessary; and inventory every integration (analytics, error reporting, notifications, third-party APIs) to confirm each is either under your BAA and inside your boundary or never sees PHI. Most leaks are side channels, not the model itself.
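The "off the public internet" rule can be enforced rather than assumed. The sketch below generates an IAM policy that denies Bedrock inference unless the request arrives through a specific VPC endpoint, using the `aws:SourceVpce` global condition key. The endpoint ID is a placeholder, and the policy is a starting point under those assumptions, not a complete access model.

```python
import json


def deny_outside_vpce(vpc_endpoint_id: str) -> dict:
    """IAM policy document that denies Bedrock inference calls unless
    they arrive through the given VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyBedrockInvokeOutsideVpce",
                "Effect": "Deny",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                "Resource": "*",
                # Requests that do not pass through this endpoint do not
                # match the value, so the Deny applies to them.
                "Condition": {
                    "StringNotEquals": {"aws:SourceVpce": vpc_endpoint_id}
                },
            }
        ],
    }


# Placeholder endpoint ID for illustration only.
policy_json = json.dumps(deny_outside_vpce("vpce-0123456789abcdef0"), indent=2)
```

Because the statement is an explicit Deny, it holds even if another policy grants broad Bedrock access, which makes it a useful backstop against a developer pointing an SDK at the public endpoint.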
Can I use a generative-AI output to make a clinical decision?
No — not without a qualified human reviewing and owning it. A generative model produces plausible text, not verified medical truth, and can hallucinate confidently, so an unreviewed output must never drive diagnosis, treatment, dosing, triage, or coverage decisions. Position the model as drafting and assisting, and keep a human accountable for any clinical or high-stakes result. Design the checkpoint into the workflow so a draft cannot be finalized without sign-off; contextual-grounding checks and source citations from Guardrails and RAG make that review faster and more reliable.
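Designing the checkpoint into the workflow can be as simple as making the finalize step refuse to run without a recorded reviewer. This is a minimal sketch with hypothetical names (`Draft`, `NotReviewedError`); a real system would persist the sign-off and authenticate the reviewer.

```python
from dataclasses import dataclass
from typing import Optional


class NotReviewedError(Exception):
    """Raised when a draft is finalized without human sign-off."""


@dataclass
class Draft:
    text: str
    reviewer: Optional[str] = None  # set only by approve()
    finalized: bool = False

    def approve(self, reviewer: str, edited_text: Optional[str] = None) -> None:
        """Record who reviewed the draft; the reviewer may correct it."""
        if not reviewer.strip():
            raise ValueError("reviewer identity is required")
        if edited_text is not None:
            self.text = edited_text
        self.reviewer = reviewer

    def finalize(self) -> str:
        """Release the text only after a named human has approved it."""
        if self.reviewer is None:
            raise NotReviewedError("draft has no human sign-off")
        self.finalized = True
        return self.text
```

The point of the design is that there is no code path from model output to a finalized document that skips `approve`, so the review cannot be bypassed by accident.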
What is the difference between HIPAA-eligible and HIPAA-compliant?
HIPAA-eligible describes a service: it is covered under the AWS BAA and can be used as part of a compliant workload. HIPAA-compliant describes a system and how it is operated. A HIPAA-eligible service used without encryption, without a BAA in place, with PHI on the public internet, or with raw PHI dumped into open logs is not a compliant system. Eligibility is necessary but not sufficient — you still own encryption, access control, network isolation, audit logging, data minimization, human oversight, and your risk analysis. Always validate the current scope of HIPAA-eligible services against the official AWS list before designing around one.
Should I de-identify PHI before sending it to a model?
Wherever the use case allows, yes — exposing less PHI is the cheapest risk reduction available. Properly de-identified data (via HIPAA Safe Harbor's removal of 18 identifier categories, or Expert Determination) falls outside HIPAA, though the bar is exacting and re-identification risk in rich free text is real, so treat it as a validated process, not a regex pass. Short of that, minimize: send the fewest records and fields that answer the question and mask or tokenize identifiers the model does not need (Bedrock Guardrails can redact inline). Validate that minimized inputs still produce acceptable outputs against an evaluation set, since over-redaction can strip context.
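Minimization and tokenization can be sketched in a few lines. The example assumes a flat record with hypothetical field names and a demo key; it performs data minimization only and is not Safe Harbor de-identification, because free-text fields such as a clinical note can still contain identifiers.

```python
import hashlib
import hmac


def minimize(record: dict, needed: set, tokenize: set, key: bytes) -> dict:
    """Keep only `needed` fields; replace fields in `tokenize` with a keyed token.

    This is data minimization, not Safe Harbor de-identification.
    """
    out = {}
    for field, value in record.items():
        if field not in needed:
            continue  # drop anything the use case does not require
        if field in tokenize:
            # Keyed hash: stable per value, not reversible without the key.
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            out[field] = "tok_" + digest.hexdigest()[:16]
        else:
            out[field] = value
    return out


# Hypothetical record and demo key, for illustration only.
record = {
    "patient_id": "P-1001",
    "name": "Jane Doe",
    "dob": "1980-02-03",
    "diagnosis_code": "M54.5",
    "note": "Low back pain, 6 weeks.",
}
slim = minimize(
    record,
    needed={"patient_id", "diagnosis_code", "note"},
    tokenize={"patient_id"},
    key=b"demo-key-only",
)
```

The same token is produced for the same patient on every call, so downstream joins still work while the raw identifier stays out of the prompt. In practice the key would come from a secrets store rather than source code.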
What does a realistic path to a HIPAA-compliant GenAI system look like?
Crawl-walk-run, with HIPAA controls in place from the first stage. Crawl (~2 weeks): one narrow, low-stakes administrative use case (not clinical), the full boundary in place even for the pilot, a small de-identified eval set, and a usefulness/safety gate. Walk (~4–8 weeks): the complete control set (BAA, KMS, PrivateLink, in-region eligible model, Guardrails, least-privilege IAM, audit logging), an automated eval harness, the human-in-the-loop workflow, and a compliance review against your risk analysis. Run (ongoing): expand to more administrative use cases and, only once controls and oversight are proven, carefully toward clinical assistance. Many teams have a partner build this, often as an AWS-funded POC.
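The usefulness/safety gate between stages can be expressed as a small function over eval results. This is a sketch under stated assumptions: the `EvalResult` fields and the 0.9 usefulness threshold are hypothetical, and your own gate should reflect your risk analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    case_id: str
    useful: bool       # a reviewer judged the draft usable
    leaked_phi: bool   # an identifier appeared that should have been redacted


def passes_gate(results: list, min_useful_rate: float = 0.9) -> bool:
    """Promote only if no case leaked PHI and enough cases were useful."""
    if not results:
        return False  # no evidence, no promotion
    if any(r.leaked_phi for r in results):
        return False  # safety is a hard stop, not an average
    useful_rate = sum(r.useful for r in results) / len(results)
    return useful_rate >= min_useful_rate
```

Treating a single leak as a failure, instead of averaging it into a score, keeps a high usefulness rate from masking a safety regression.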

Want HIPAA-compliant generative AI on AWS — built right and signed off?

CloudRoute routes you to a vetted AWS partner with healthcare experience who stands up the compliant architecture (BAA, KMS, PrivateLink, Guardrails, audit logging, human-in-the-loop), produces the documentation your compliance team needs, and ships it — often as an AWS-funded GenAI POC, so you pay $0. No procurement. No open-ended consulting bill.

matched within: < 24h · POC → production: 8–14 wk · cost to you: $0