
A practical guide to HIPAA-compliant AI gateways.

What HIPAA actually requires of AI inference, the five questions to ask any gateway, and where the per-request PHI pattern fits. Written for engineers and compliance leads who keep handing the same questions back and forth to each other.

June 9, 2026 · 8 min read

01 — Framing

The actual problem

HIPAA doesn't say 'no AI'. It says 'covered handling, with a BAA, all the way down the chain.' Most teams stumble on the chain.

HIPAA is a US law about how protected health information (PHI) is handled by covered entities (hospitals, insurers, certain providers) and their business associates (the vendors those entities pay to touch PHI). For AI inference, that translates into three concrete obligations:

One — PHI in both directions. The prompt you send and the completion you get back can both contain PHI. A note in, a summary out — both are covered. The data class is a property of the payload, not the endpoint.

Two — a Business Associate Agreement (BAA) with whoever touches the payload. If you pipe PHI through a gateway, the gateway needs a BAA. If the gateway forwards to a model provider, the model provider needs a BAA. If that provider runs on a cloud with managed inference, that cloud needs a BAA. The chain is only as covered as its least-covered link.
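The least-covered-link rule is easy to make mechanical. Here is a minimal sketch, assuming you keep a hand-maintained inventory of every hop that touches the payload. The `Hop` type and the vendor names are hypothetical placeholders, not any real sub-processor list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One party that touches the payload (hypothetical inventory entry)."""
    name: str
    role: str       # e.g. "gateway", "model provider", "cloud"
    has_baa: bool   # a signed BAA is on file for this hop


def uncovered_hops(chain: list[Hop]) -> list[str]:
    """Return the names of hops without a BAA; an empty list means the chain closes."""
    return [hop.name for hop in chain if not hop.has_baa]


def chain_is_covered(chain: list[Hop]) -> bool:
    # The chain is only as covered as its least-covered link:
    # one missing BAA anywhere breaks coverage for the whole path.
    # An empty inventory is treated as "not covered" (nothing has been vouched for).
    return bool(chain) and not uncovered_hops(chain)


# Hypothetical path: gateway -> model provider -> managed-inference cloud.
chain = [
    Hop("example-gateway", "gateway", has_baa=True),
    Hop("example-model-provider", "model provider", has_baa=True),
    Hop("example-cloud", "cloud", has_baa=False),
]

print(chain_is_covered(chain))  # False
print(uncovered_hops(chain))    # ['example-cloud']
```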

Three — auditability. You need to be able to answer, after the fact, who processed which request, where, and under whose agreement. "We send to OpenAI" is not an answer; "request 9f1c… at 14:02 was served by gpt-4o-mini in US-East under BAA dated 2024-03-11" is.
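To make that answer producible on demand, write it down as a structured record when the request is served instead of reconstructing it later. Here is a minimal sketch, assuming a JSON-lines audit log. The field names are illustrative, not a standard schema, and the request id and timestamp are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditRecord:
    """Who processed which request, where, and under whose agreement."""
    request_id: str
    timestamp: str   # ISO 8601, UTC
    model: str
    region: str
    agreement: str   # the agreement covering the serving party

    def to_json_line(self) -> str:
        # One self-describing line per request; sort_keys keeps lines diffable.
        return json.dumps(asdict(self), sort_keys=True)

    def answer(self) -> str:
        # The sentence an auditor actually asks for.
        return (f"request {self.request_id} at {self.timestamp} was served by "
                f"{self.model} in {self.region} under {self.agreement}")


record = AuditRecord(
    request_id="req-0001",              # hypothetical id
    timestamp="2026-06-09T14:02:00Z",   # hypothetical timestamp
    model="gpt-4o-mini",
    region="US-East",
    agreement="BAA dated 2024-03-11",
)
print(record.answer())
```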

Most teams get the first one. Many get the second. The third is where audits go badly — because the gateway either didn't record the routing decision, or recorded it in a log it cannot actually produce three months later.

HIPAA, GDPR, nFADP are not interchangeable

A vendor that handles EU health data under GDPR or Switzerland's nFADP is not, by that fact, HIPAA-covered. The legal instrument is different (DPA vs BAA), the regulator is different, and the breach notification clocks differ. A gateway that calls itself "health-data ready" without naming the regime is doing you a disservice.

02 — Diligence

The five things to ask of any AI gateway

Use this as your shortlist before you sign. Each question maps to a place real deployments break.

1. Is a BAA available — and with whom? A gateway may offer its own BAA (it acts as your business associate) or pass through to the upstream provider's BAA. Both can be fine; what matters is that the chain closes. Ask explicitly: does the gateway carry liability for PHI it transmits, or is the gateway transparent and the liability sits with the upstream provider only? Get it in writing.

2. Where exactly does the data sit during inference? Not "in the cloud." Which region. Which sub-processor. Which country's legal jurisdiction. A gateway whose answer is "wherever the upstream picks" is not a residency story; it's an outsourced gamble.

3. Is enforcement per request, or account-wide? Account-level config ("turn on HIPAA mode in settings") is brittle: every team member, every code path, every retry inherits it whether they meant to or not. Per-request enforcement — a header, a flag, an explicit data class on the call — lets PHI and non-PHI traffic share one credential without contamination.

4. Are responses auditable after the fact? A 200 OK is not enough. You want the response to tell you which upstream model served the request, which region it ran in, and how many attempts (including any fail-overs) it took before one succeeded. That trail is what a Security Officer signs against six months later, when the upstream lineup has changed twice.

5. What happens to logs? Three patterns: prompts and completions logged for N days (default at most major LLM providers), metadata-only logged (request id, model, region, token counts), or zero-retention (the gateway holds nothing beyond the life of the request). For PHI, you want option two or three, in writing.
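The five questions are most useful as a pass/fail record you fill in for each vendor, so gaps surface before signing and not during an audit. Here is a minimal sketch. The `GatewayAnswers` fields are a hypothetical encoding of the questions above, and the pass criteria are the ones this section states (for example, logging policy two or three for PHI).

```python
from dataclasses import dataclass
from enum import Enum


class LogPolicy(Enum):
    FULL_PAYLOAD = "prompts and completions retained for N days"
    METADATA_ONLY = "request id, model, region, token counts"
    ZERO_RETENTION = "nothing kept beyond the life of the request"


@dataclass
class GatewayAnswers:
    baa_chain_closes: bool               # 1. BAA available and the chain closes, in writing
    named_region_and_subprocessor: bool  # 2. specific region, sub-processor, jurisdiction
    per_request_enforcement: bool        # 3. data class declared per call, not account-wide
    auditable_responses: bool            # 4. model, region, attempts on every response
    log_policy: LogPolicy                # 5. what happens to logs


def open_issues(a: GatewayAnswers) -> list[str]:
    """Return the diligence questions this vendor does not yet satisfy for PHI."""
    issues = []
    if not a.baa_chain_closes:
        issues.append("1: BAA chain does not close")
    if not a.named_region_and_subprocessor:
        issues.append("2: residency is 'wherever the upstream picks'")
    if not a.per_request_enforcement:
        issues.append("3: enforcement is account-wide only")
    if not a.auditable_responses:
        issues.append("4: responses do not identify the serving route")
    if a.log_policy is LogPolicy.FULL_PAYLOAD:
        issues.append("5: PHI payloads are retained in logs")
    return issues


vendor = GatewayAnswers(True, True, False, True, LogPolicy.FULL_PAYLOAD)
print(open_issues(vendor))
# ['3: enforcement is account-wide only', '5: PHI payloads are retained in logs']
```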

03 — Pattern

PHI vs general traffic, on one credential

Most regulated teams have both data classes. Splitting them upstream — declared per request — is cleaner than running two stacks.

The instinct is to run two parallel deployments: a HIPAA-covered stack for PHI and a cheaper non-PHI stack for everything else (marketing copy, internal tools, ops automation). It works, but it doubles your integration surface and tempts engineers to pick the cheap stack for "just a quick test" with data that turns out to be PHI.

The alternative is a single OpenAI-compatible endpoint where the caller declares the data class on each request. The gateway routes PHI to PHI-eligible providers and general traffic to whichever provider is cheapest. One credential, one SDK, two routing policies.
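The gateway side of that split can be sketched in a few dozen lines. This is an illustration under stated assumptions, not phi-cloud's implementation. The provider table, eligibility flags, and prices are made up, and `call` stands in for whatever actually forwards the request. The property worth copying is that fail-over only walks the candidate set chosen for the request's data class.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Provider:
    name: str
    region: str
    phi_eligible: bool     # hypothetical per-region eligibility flag
    cost_per_mtok: float   # illustrative price, used only for ordering


def candidates(providers: list[Provider], phi: bool, region: Optional[str]) -> list[Provider]:
    """Return the ordered candidate set for one request, cheapest first."""
    if phi:
        # PHI: eligible providers only, and only in the pinned region if one was given.
        pool = [p for p in providers
                if p.phi_eligible and (region is None or p.region == region)]
    else:
        # General traffic: any provider, any region.
        pool = list(providers)
    return sorted(pool, key=lambda p: p.cost_per_mtok)


def route(providers: list[Provider], phi: bool, region: Optional[str],
          call: Callable[[Provider], str]) -> tuple[str, Provider, int]:
    """Try candidates in order; fail-over never leaves the set chosen above."""
    for attempt, provider in enumerate(candidates(providers, phi, region), start=1):
        try:
            return call(provider), provider, attempt
        except ConnectionError:
            continue  # next provider from the same candidate set
    raise RuntimeError("no eligible provider succeeded; not widening the set")


providers = [
    Provider("cheap-anywhere", "US", phi_eligible=False, cost_per_mtok=0.1),
    Provider("covered-ch-a", "CH", phi_eligible=True, cost_per_mtok=0.5),
    Provider("covered-ch-b", "CH", phi_eligible=True, cost_per_mtok=0.8),
]


def flaky_upstream(p: Provider) -> str:
    # Hypothetical upstream call: the cheapest covered CH provider is down.
    if p.name == "covered-ch-a":
        raise ConnectionError("upstream unavailable")
    return f"completion from {p.name}"


_, served_by, attempts = route(providers, phi=True, region="CH", call=flaky_upstream)
print(served_by.name, attempts)  # covered-ch-b 2
```

If every covered provider in the region is down, the request fails rather than falling back to the cheap non-PHI provider. That refusal is the boundary the audit headers later let you verify.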

# Same OpenAI client, two data classes — split per request.
curl https://phi-cloud.com/api/v1/chat/completions \
  -H "Authorization: Bearer phi_live_…" \
  -H "X-PHI: true" \
  -H "X-Region: CH" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "Summarise this discharge note: …"}
    ]
  }'

# The next request, from the same client, with no PHI:
curl https://phi-cloud.com/api/v1/chat/completions \
  -H "Authorization: Bearer phi_live_…" \
  -H "X-PHI: false" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "Draft a marketing tagline for a clinic."}
    ]
  }'

The two requests above use the same client and the same API key. The first declares PHI and is constrained to a covered provider in the pinned region (here, CH); the second is general traffic and routes to the cheapest available model anywhere. The discipline is at the call site — exactly where the engineer knows whether the payload is PHI.
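In application code, you can make that call-site discipline hard to skip by giving the data class no default. Here is a minimal sketch. `phi_headers` is a hypothetical helper, and the header names and values are the ones from the curl example. With the official OpenAI Python SDK (v1), you would create a client with `base_url="https://phi-cloud.com/api/v1"` and pass the result per call through `extra_headers`, e.g. `client.chat.completions.create(model="auto", messages=..., extra_headers=phi_headers(phi=True, region="CH"))`. Putting `X-PHI` in the client's `default_headers` would quietly bring back the account-wide pattern.

```python
from typing import Optional


def phi_headers(*, phi: bool, region: Optional[str] = None) -> dict[str, str]:
    """Build per-request routing headers; the data class is a required keyword.

    There is deliberately no default for `phi`: a call site that forgets to
    declare the data class fails with a TypeError instead of silently
    inheriting a setting from somewhere else.
    """
    if not isinstance(phi, bool):
        raise TypeError("phi must be True or False, declared at the call site")
    headers = {"X-PHI": "true" if phi else "false"}
    if region is not None:
        headers["X-Region"] = region  # pin the region, e.g. "CH"
    return headers


print(phi_headers(phi=True, region="CH"))  # {'X-PHI': 'true', 'X-Region': 'CH'}
print(phi_headers(phi=False))              # {'X-PHI': 'false'}
```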

04 — Audit

Where the regional honesty lives

An AI gateway worth deploying tells you, on every response, which provider and region answered the request. No exceptions.

The mechanism is unglamorous: HTTP response headers. They're cheap to emit, cheap to log, and they survive an SDK upgrade. The pattern you want from any gateway looks roughly like this:

< HTTP/2 200
< x-phi-routed: gemma-3-4b/CH/phi
< x-phi-tier: phi
< x-phi-attempts: 1
< x-request-id: 9f1c…

Three things to look for. First, the resolved route should be specific — model, region, data class — not just "OK". Second, a tier indicator so you can grep your access logs for any PHI request that landed on the wrong tier (there should be none). Third, an attempt count: PHI fail-over should stay inside the PHI-eligible set, never spill to a non-PHI provider. If a gateway can't show you that boundary held, it didn't.
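That grep can be a short script over your own access logs. Here is a minimal sketch with several assumptions. Each log entry stores the headers you sent alongside the response headers, with response header names lowercased. The `x-phi-routed` value always has the `model/region/class` shape shown above. The tier name "general" is a hypothetical stand-in for a non-PHI tier. Confirm the real formats against the gateway's API reference before relying on this.

```python
def parse_route(value: str) -> tuple[str, str, str]:
    """Split an x-phi-routed value such as 'gemma-3-4b/CH/phi' into (model, region, class)."""
    # rsplit keeps model names that themselves contain '/' intact.
    model, region, data_class = value.rsplit("/", 2)
    return model, region, data_class


def boundary_violations(entries: list[dict]) -> list[str]:
    """One message per PHI request whose response shows it left the PHI boundary."""
    found = []
    for entry in entries:
        sent = entry["request_headers"]   # what the caller declared
        got = entry["response_headers"]   # assumed stored with lowercase names
        if sent.get("X-PHI") != "true":
            continue  # general traffic may route anywhere
        rid = got.get("x-request-id", "<no request id>")
        routed = got.get("x-phi-routed")
        if routed is None:
            found.append(f"{rid}: no route header, boundary cannot be shown")
            continue
        _model, region, data_class = parse_route(routed)
        tier = got.get("x-phi-tier")
        if tier != "phi" or data_class != "phi":
            attempts = got.get("x-phi-attempts", "?")
            found.append(f"{rid}: PHI request served on tier {tier!r} after {attempts} attempt(s)")
        pinned = sent.get("X-Region")
        if pinned is not None and region != pinned:
            found.append(f"{rid}: pinned to {pinned}, served in {region}")
    return found


# Hypothetical log entries.
log = [
    {"request_headers": {"X-PHI": "true", "X-Region": "CH"},
     "response_headers": {"x-phi-routed": "gemma-3-4b/CH/phi", "x-phi-tier": "phi",
                          "x-phi-attempts": "1", "x-request-id": "req-a"}},
    {"request_headers": {"X-PHI": "true", "X-Region": "CH"},
     "response_headers": {"x-phi-routed": "other-model/US/general", "x-phi-tier": "general",
                          "x-phi-attempts": "3", "x-request-id": "req-b"}},
    {"request_headers": {"X-PHI": "false"},
     "response_headers": {"x-phi-routed": "cheap-model/US/general", "x-phi-tier": "general",
                          "x-phi-attempts": "1", "x-request-id": "req-c"}},
]

for problem in boundary_violations(log):
    print(problem)
# req-b: PHI request served on tier 'general' after 3 attempt(s)
# req-b: pinned to CH, served in US
```

An empty result is the expected steady state. Any line it prints is a request the gateway should never have served that way.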

What good looks like, in one sentence

Every PHI request returns a header that a regulator could read and verify against a sub-processor list — without anyone querying a log database.

05 — Jurisdiction

Match the jurisdiction to your obligation

There is no single 'best' region for PHI. The right jurisdiction is the one whose legal regime matches where your data is generated and which regulator you answer to.

The mistake is treating residency as a marketing preference rather than a legal mapping. PHI generated under US, EU, or Swiss rules falls under a different governing instrument in each case, and the cleanest deployment keeps the data in the jurisdiction whose law already applies to it, rather than routing it somewhere that merely sounds safer.

United States — HIPAA, via a BAA. If your obligation is HIPAA, residency alone is not enough: you need a sub-processor that carries an actual Business Associate Agreement. US-region inference on Azure (or AWS Bedrock) under a signed BAA is the direct fit, with the broadest model lineup.

European Union — GDPR, via a DPA. EU-generated health data stays cleanest in an EU region under a Data Processing Agreement with an EU-resident sub-processor (for example Scaleway), keeping the legal layer and the physical layer in the same jurisdiction.

Switzerland — nFADP, via a DPA. Switzerland's revised Federal Act on Data Protection (nFADP) is closely aligned with GDPR, and the European Commission's adequacy decision for Switzerland means personal data can flow from the EU to Switzerland without standard contractual clauses. Swiss providers such as Infomaniak publish a standard DPA covering health-data handling. As with the EU, this is an nFADP/GDPR residency story under a DPA, not by itself a HIPAA BAA, and these providers make no HIPAA representation.

The common thread across all three: residency and HIPAA coverage are different questions. A region keeps data under a given legal regime; a BAA assigns liability for PHI. Pick the region for the regime your data falls under, and add a BAA-carrying sub-processor wherever your obligation is specifically HIPAA.
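The rule can be written as a check over a planned route. Here is a minimal sketch. The regime table restates this section's guidance: HIPAA via a BAA, with US-region inference as the direct fit; GDPR via a DPA in the EU; nFADP via a DPA in Switzerland. It is not a statement of legal requirements. The `Route` type is hypothetical and says nothing about any specific provider's paperwork.

```python
from dataclasses import dataclass

# Restates this section: regime -> (jurisdiction it recommends, instrument required).
REGIMES = {
    "HIPAA": ("US", "BAA"),
    "GDPR": ("EU", "DPA"),
    "nFADP": ("CH", "DPA"),
}


@dataclass(frozen=True)
class Route:
    jurisdiction: str            # where inference physically runs
    instruments: frozenset[str]  # agreements on file with the sub-processor


def gaps(regime: str, route: Route) -> list[str]:
    """List what keeps `route` from being a clean fit for `regime`; empty means it fits."""
    preferred, instrument = REGIMES[regime]
    problems = []
    if route.jurisdiction != preferred:
        problems.append(f"runs in {route.jurisdiction}, not {preferred}")
    if instrument not in route.instruments:
        # Residency never supplies the instrument: a DPA is not a BAA.
        problems.append(f"no {instrument} with the sub-processor")
    return problems


swiss_dpa = Route("CH", frozenset({"DPA"}))  # hypothetical route
print(gaps("nFADP", swiss_dpa))  # []
print(gaps("HIPAA", swiss_dpa))  # ['runs in CH, not US', 'no BAA with the sub-processor']
```

The same Swiss route is a clean fit for one obligation and fails twice for another, which is the point: pick the route per regime, not per reputation.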

06 — Tradeoff

The honest tradeoff

Per-request PHI gateways aren't the right answer for every team. Three honest verdicts:

If your PHI must stay in the US under a BAA today, AWS Bedrock and Azure OpenAI both have real, signed BAAs with US-region inference and the largest model lineups. phi-cloud also offers US PHI residency on Azure (under a HIPAA BAA) for the specific modalities it runs there — but if you need the broadest US model lineup, the hyperscalers are the more complete answer.

If your team is comfortable running a region-pinned proxy yourselves, build it. The routing logic is a few hundred lines, the auth is a signed token, and you keep full control of the audit trail. A gateway adds value when the sub-processor list, the per-region eligibility flags, and the failover logic become a moving target you don't want to own.
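For the build-it route, "a signed token" can be as small as an HMAC over a few claims. Here is a minimal sketch using only the Python standard library. The claim names (`tenant`, `exp`) and the `payload.signature` layout are illustrative assumptions, not phi-cloud's key format, and a production proxy would also need key rotation and storage.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue(secret: bytes, tenant: str, ttl_s: int = 3600, now: Optional[float] = None) -> str:
    """Create 'payload.signature', where payload is base64url-encoded JSON claims."""
    now = time.time() if now is None else now
    claims = {"tenant": tenant, "exp": int(now) + ttl_s}
    payload = _b64(json.dumps(claims, sort_keys=True).encode())
    sig = hmac.new(secret, payload.encode(), hashlib.sha256).digest()
    return f"{payload}.{_b64(sig)}"


def verify(secret: bytes, token: str, now: Optional[float] = None) -> dict:
    """Return the claims if the signature is valid and unexpired; raise ValueError otherwise."""
    now = time.time() if now is None else now
    try:
        payload, sig = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many signature bytes matched.
    if not hmac.compare_digest(expected, _unb64(sig)):
        raise ValueError("bad signature")
    claims = json.loads(_unb64(payload))
    if claims["exp"] < now:
        raise ValueError("token expired")
    return claims


key = b"demo-secret-do-not-use"  # hypothetical key
tok = issue(key, "clinic-a", ttl_s=60, now=1000.0)
print(verify(key, tok, now=1030.0))  # {'exp': 1060, 'tenant': 'clinic-a'}
```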

If you want OpenAI-API ergonomics with PHI residency in your own region, enforced per request, the per-request gateway pattern fits. That includes phi-cloud, which routes PHI to region-resident, PHI-eligible providers (the US on Azure; the EU on Scaleway or Azure; Switzerland on Infomaniak or Azure Switzerland North), runs general traffic on a stateless OpenAI-compatible endpoint, and emits the audit headers above on every response. Be precise about the legal regime: the EU/Swiss PHI posture is GDPR + nFADP under filed DPAs, while HIPAA BAA coverage is available specifically via the Azure-hosted routes. So if your obligation is HIPAA-first, either pin to phi-cloud's Azure routes or use AWS/Azure directly for the widest model lineup; if it is EU/Swiss data protection, the Infomaniak and Scaleway routes are the cleaner fit.

07 — Try it

Try the pattern

If the per-request gateway pattern fits, the integration is small.

Point your existing OpenAI client at https://phi-cloud.com/api/v1, add X-PHI and X-Region on each call, and inspect the response headers to verify the routing landed where you expected. That's the whole integration. Read the API reference for the headers and error codes, or grab a key if you want to try it against your own payloads.
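That last step is worth turning into code so it runs on every call rather than only during testing. Here is a minimal sketch, assuming the header names and the `model/region/class` layout from the example response in the Audit section. `check_routing` and `RoutingMismatch` are hypothetical names. With the OpenAI Python SDK (v1), `client.chat.completions.with_raw_response.create(...)` returns an object whose `.headers` can be passed in, and whose `.parse()` gives the usual completion.

```python
from typing import Mapping, Optional


class RoutingMismatch(Exception):
    """The response's routing headers contradict what the call declared."""


def check_routing(headers: Mapping[str, str], *, phi: bool,
                  region: Optional[str] = None) -> str:
    """Return the resolved route, or raise RoutingMismatch."""
    # HTTP header names are case-insensitive; normalise before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    routed = h.get("x-phi-routed")
    if routed is None:
        raise RoutingMismatch("no x-phi-routed header; routing cannot be verified")
    _model, served_region, data_class = routed.rsplit("/", 2)
    if phi and (data_class != "phi" or h.get("x-phi-tier") != "phi"):
        raise RoutingMismatch(f"PHI request served outside the PHI tier: {routed}")
    if region is not None and served_region != region:
        raise RoutingMismatch(f"expected region {region}, served in {served_region}")
    return routed


# Values from the example response headers in the Audit section.
example = {"x-phi-routed": "gemma-3-4b/CH/phi", "x-phi-tier": "phi", "x-phi-attempts": "1"}
print(check_routing(example, phi=True, region="CH"))  # gemma-3-4b/CH/phi
```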
