A practical guide to HIPAA-compliant AI gateways.

What HIPAA actually requires of AI inference, the five questions to ask any gateway, and where the per-request PHI pattern fits. Written for engineers and compliance leads who keep getting referred back to each other.

01 — Framing

The actual problem

HIPAA doesn't say "no AI". It says "covered handling, with a BAA, all the way down the chain." Most teams stumble on the chain.

HIPAA is a US law about how protected health information (PHI) is handled by covered entities (hospitals, insurers, certain providers) and their business associates (the vendors those entities pay to touch PHI). For AI inference, that translates into three concrete obligations:

One — PHI in both directions. The prompt you send and the completion you get back can both contain PHI. A note in, a summary out — both are covered. The data class is a property of the payload, not the endpoint.

Two — a Business Associate Agreement (BAA) with whoever touches the payload. If you pipe PHI through a gateway, the gateway needs a BAA. If the gateway forwards to a model provider, the model provider needs a BAA. If that provider runs on a cloud with managed inference, that cloud needs a BAA. The chain is only as covered as its least-covered link.

Three — auditability. You need to be able to answer, after the fact, who processed which request, where, and under whose agreement. "We send to OpenAI" is not an answer; "request 9f1c… at 14:02 was served by gpt-4o-mini in US-East under BAA dated 2024-03-11" is.

Most teams get the first one. Many get the second. The third is where audits go badly — because the gateway either didn't record the routing decision, or recorded it in a log it cannot actually produce three months later.
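To make the second and third obligations concrete, here is a minimal Python sketch of what a producible audit record might hold. Treat it as an illustration under assumptions, not any gateway's schema: `Hop`, `AuditRecord`, `uncovered_hops`, and `to_log_line` are hypothetical names, and the hop names, request id, and serving date are made up. The model, region, and BAA date reuse the example sentence above.

```python
import json
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Hop:
    """One party that touches the payload (gateway, model provider, cloud)."""
    name: str
    baa_signed: Optional[date]  # None means no BAA on file for this hop


@dataclass(frozen=True)
class AuditRecord:
    """Who processed which request, where, and under whose agreement."""
    request_id: str
    served_at: datetime
    model: str
    region: str
    chain: Tuple[Hop, ...]  # ordered from gateway down to infrastructure


def uncovered_hops(chain: Tuple[Hop, ...]) -> List[str]:
    """Names of hops without a BAA. An empty list means the chain closes."""
    return [hop.name for hop in chain if hop.baa_signed is None]


def to_log_line(record: AuditRecord) -> str:
    """Serialise the record as one JSON line for append-only storage."""
    body = {
        "request_id": record.request_id,
        "served_at": record.served_at.isoformat(),
        "model": record.model,
        "region": record.region,
        "chain": [
            {
                "name": hop.name,
                "baa_signed": hop.baa_signed.isoformat() if hop.baa_signed else None,
            }
            for hop in record.chain
        ],
    }
    return json.dumps(body, sort_keys=True)


# Hypothetical example: the cloud hop has no BAA, so the chain does not close.
example_chain = (
    Hop("gateway", date(2024, 3, 11)),
    Hop("model-provider", date(2024, 3, 11)),
    Hop("cloud", None),
)
example_record = AuditRecord(
    request_id="req-0001",  # made-up id
    served_at=datetime(2024, 6, 1, 14, 2, tzinfo=timezone.utc),  # made-up date
    model="gpt-4o-mini",
    region="US-East",
    chain=example_chain,
)
```

Calling `uncovered_hops(example_chain)` names the weakest link before any PHI is sent, and `to_log_line(example_record)` produces the line you would want to be able to retrieve three months later.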

HIPAA, GDPR, and nFADP are not interchangeable

A vendor that handles EU health data under GDPR or Switzerland's nFADP is not, by that fact, HIPAA-covered. The legal instrument is different (DPA vs BAA), the regulator is different, and the breach notification clocks differ. A gateway that calls itself "health-data ready" without naming the regime is doing you a disservice.

02 — Diligence

The five things to ask of any AI gateway

Use this as your shortlist before you sign. Each question maps to a place real deployments break.

1. Is a BAA available — and with whom? A gateway may offer its own BAA (it acts as your business associate) or pass through to the upstream provider's BAA. Both can be fine; what matters is that the chain closes. Ask explicitly: does the gateway carry liability for PHI it transmits, or is the gateway transparent and the liability sits with the upstream provider only? Get it in writing.

2. Where exactly does the data sit during inference? Not "in the cloud." Which region. Which sub-processor. Which country's legal jurisdiction. A gateway whose answer is "wherever the upstream picks" is not a residency story; it's an outsourced gamble.

3. Is enforcement per request, or account-wide? Account-level config ("turn on HIPAA mode in settings") is brittle: every team member, every code path, every retry inherits it whether they meant to or not. Per-request enforcement — a header, a flag, an explicit data class on the call — lets PHI and non-PHI traffic share one credential without contamination.

4. Are responses auditable after the fact? A 200 OK is not enough. You want the response to tell you which upstream model served the request, which region it ran in, and how many fail-overs were attempted before one succeeded. That trail is what a Security Officer signs against six months later, when the upstream lineup has changed twice.

5. What happens to logs? Three patterns: prompts and completions logged for N days (default at most major LLM providers), metadata-only logged (request id, model, region, token counts), or zero-retention (the gateway holds nothing beyond the life of the request). For PHI, you want option two or three, in writing.
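For question 5, the metadata-only pattern is easiest to reason about as an allow-list: anything not explicitly named is dropped before it reaches a log. A small Python sketch, assuming hypothetical field names (`request_id`, `model`, `region`, and the two token counts); a real gateway's event shape will differ.

```python
from typing import Any, Dict

# Fields assumed safe to retain under a metadata-only policy (hypothetical names).
METADATA_FIELDS = (
    "request_id",
    "model",
    "region",
    "prompt_tokens",
    "completion_tokens",
)


def metadata_only(event: Dict[str, Any]) -> Dict[str, Any]:
    """Keep allow-listed metadata; drop prompts, completions, and unknown keys."""
    return {key: event[key] for key in METADATA_FIELDS if key in event}
```

The design choice is the allow-list itself. A deny-list that strips `messages` and `completion` will quietly leak the first new field someone adds to the event.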

03 — Pattern

PHI vs general traffic, on one credential

Most regulated teams have both data classes. Splitting them upstream — declared per request — is cleaner than running two stacks.

The instinct is to run two parallel deployments: a HIPAA-covered stack for PHI and a cheaper non-PHI stack for everything else (marketing copy, internal tools, ops automation). It works, but it doubles your integration surface and tempts engineers to pick the cheap stack for "just a quick test" with data that turns out to be PHI.

The alternative is a single OpenAI-compatible endpoint where the caller declares the data class on each request. The gateway routes PHI to PHI-eligible providers and general traffic to whichever provider is cheapest. One credential, one SDK, two routing policies.

# Same OpenAI client, two data classes: split per request.
curl https://api.phi-cloud.com/v1/chat/completions \
  -H "Authorization: Bearer phi_live_…" \
  -H "X-PHI: true" \
  -H "X-Region: CH" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "Summarise this discharge note: …"}
    ]
  }'

# The next request, from the same client, with no PHI:
curl https://api.phi-cloud.com/v1/chat/completions \
  -H "Authorization: Bearer phi_live_…" \
  -H "X-PHI: false" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "Draft a marketing tagline for a clinic."}
    ]
  }'

The two requests above use the same client and the same API key. The first declares PHI and is constrained to a covered provider in Switzerland; the second is general traffic and routes to the cheapest available model anywhere. The discipline is at the call site — exactly where the engineer knows whether the payload is PHI.
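On the gateway side, "two routing policies" can be sketched in a few lines of Python. This is an illustration under assumptions, not phi-cloud's implementation: the `Provider` fields, the provider names, and the prices are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Provider:
    name: str
    region: str
    phi_eligible: bool
    price_per_mtok: float  # hypothetical price per million tokens


def candidates(
    providers: List[Provider], phi: bool, region: Optional[str] = None
) -> List[Provider]:
    """Providers in the order the gateway would try them, cheapest first.

    PHI requests only ever see PHI-eligible providers, so fail-over that walks
    this list cannot spill to a non-PHI provider. An empty result means the
    request should be rejected rather than routed elsewhere.
    """
    pool = providers
    if phi:
        pool = [p for p in pool if p.phi_eligible]
    if region is not None:
        pool = [p for p in pool if p.region == region]
    return sorted(pool, key=lambda p: p.price_per_mtok)


# Hypothetical provider table.
PROVIDERS = [
    Provider("swiss-a", "CH", True, 0.90),
    Provider("swiss-b", "CH", True, 0.60),
    Provider("us-cheap", "US", False, 0.20),
]
```

With this table, `candidates(PROVIDERS, phi=True, region="CH")` yields only the two Swiss providers, while `candidates(PROVIDERS, phi=False)` starts with the cheapest provider anywhere.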

04 — Audit

Where the regional honesty lives

An AI gateway worth deploying tells you, on every response, which provider and region answered the request. No exceptions.

The mechanism is unglamorous: HTTP response headers. They're cheap to emit, cheap to log, and they survive an SDK upgrade. The pattern you want from any gateway looks roughly like this:

< HTTP/2 200
< x-phi-routed: gemma-3-4b/CH/phi
< x-phi-tier: phi
< x-phi-attempts: 1
< x-request-id: 9f1c…

Three things to look for. First, the resolved route should be specific — model, region, data class — not just "OK". Second, a tier indicator so you can grep your access logs for any PHI request that landed on the wrong tier (there should be none). Third, an attempt count: PHI fail-over should stay inside the PHI-eligible set, never spill to a non-PHI provider. If a gateway can't show you that boundary held, it didn't.
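The check described above can run in a client or in a log pipeline. A hedged Python sketch follows, assuming the `x-phi-routed` value is always `model/region/data-class` as in the single example shown; the exact format is not specified beyond that example. `Route`, `parse_route`, and `phi_boundary_held` are illustrative names.

```python
from typing import Mapping, NamedTuple


class Route(NamedTuple):
    model: str
    region: str
    data_class: str
    tier: str
    attempts: int


def parse_route(headers: Mapping[str, str]) -> Route:
    """Parse the audit headers.

    Raises KeyError if a header is missing and ValueError if one is malformed,
    so a response without a readable trail fails loudly.
    """
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    # rsplit keeps any "/" inside the model name intact (assumed format).
    model, region, data_class = lowered["x-phi-routed"].rsplit("/", 2)
    return Route(
        model=model,
        region=region,
        data_class=data_class,
        tier=lowered["x-phi-tier"],
        attempts=int(lowered["x-phi-attempts"]),
    )


def phi_boundary_held(route: Route, expected_region: str) -> bool:
    """True if a PHI request landed on the PHI tier in the region it asked for."""
    return (
        route.tier == "phi"
        and route.data_class == "phi"
        and route.region == expected_region
    )
```

Note that the attempt count alone only says how many tries were made. Whether every attempt stayed inside the PHI-eligible set is something the gateway has to demonstrate separately.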

What good looks like, in one sentence

Every PHI request returns a header that a regulator could read and verify against a sub-processor list — without anyone querying a log database.

05 — Jurisdiction

The Swiss-anchor pattern, for EU and Swiss teams

If your PHI is generated under EU or Swiss residency rules, Switzerland is the cleanest jurisdiction available today.

Three reasons. First, Switzerland's revised Federal Act on Data Protection (nFADP) is closely aligned with GDPR, and the European Commission's adequacy decision means data can flow EU → Switzerland without standard contractual clauses. Second, major Swiss cloud providers, notably Infomaniak, publish a standard Data Processing Agreement covering health-data handling, with no US-side parent and no transfer abroad in their default terms. Third, on jurisdiction: data resident on Swiss soil at a Swiss provider sits outside the reach of both the US CLOUD Act and EU subpoenas.

For US PHI, this is irrelevant — HIPAA doesn't care about Swiss law, and a Swiss provider without a US BAA is not a HIPAA answer. For EU and Swiss health data, the Swiss anchor is a cleaner residency story than "EU region of a US cloud" because the legal layer is also local.

06 — Tradeoff

The honest tradeoff

Per-request PHI gateways aren't the right answer for every team. Three honest verdicts:

If your PHI must stay in the US under a BAA today, you want AWS Bedrock or Azure OpenAI. Both have real, signed BAAs with US-region inference and the largest model lineups. A Swiss-anchored gateway is the wrong shape for US-resident PHI; the data should never cross the Atlantic in the first place.

If your team is comfortable running a region-pinned proxy yourselves, build it. The routing logic is a few hundred lines, the auth is a signed token, and you keep full control of the audit trail. A gateway adds value when the sub-processor list, the per-region eligibility flags, and the failover logic become a moving target you don't want to own.

If you want OpenAI-API ergonomics with Swiss-anchored PHI residency and per-request enforcement, the per-request gateway pattern fits. That includes phi-cloud — which today routes PHI to Infomaniak in Switzerland under nFADP and GDPR, runs general traffic on a stateless OpenAI-compatible endpoint, and emits the audit headers above on every response. phi-cloud's HIPAA BAA work is in progress, not earned — so if your obligation is HIPAA-first, talk to AWS or Azure today and check back when our BAAs with Anthropic and OpenAI land.

07 — Try it

Try the pattern

If the per-request gateway pattern fits, the integration is small.

Point your existing OpenAI client at https://api.phi-cloud.com/v1, add X-PHI and X-Region on each call, and inspect the response headers to verify the routing landed where you expected. That's the whole integration. Read the API reference for the headers and error codes, or grab a key if you want to try it against your own payloads.
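At the call site, a small helper keeps the declaration explicit. A minimal Python sketch, with two assumptions that do not come from the API reference: the helper name `gateway_headers`, and the rule that a PHI request must name a region (the examples in this guide always do, but the requirement itself is a guess).

```python
from typing import Dict, Optional

BASE_URL = "https://api.phi-cloud.com/v1"  # endpoint named in this guide


def gateway_headers(phi: bool, region: Optional[str] = None) -> Dict[str, str]:
    """Build the per-request headers.

    `phi` deliberately has no default: every call site has to decide.
    """
    if phi and region is None:
        # Assumption: fail closed instead of letting the gateway pick a region.
        raise ValueError("PHI requests must name a region")
    headers = {"X-PHI": "true" if phi else "false"}
    if region is not None:
        headers["X-Region"] = region
    return headers
```

With the official `openai` Python package (v1 or later), a dict like this can be passed as `extra_headers=` on `chat.completions.create`, and the `with_raw_response` variant of that call exposes the response headers for inspection. Check this against the SDK version you actually run.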