Capability · Chat completions

Chat and reasoning, region-pinned.

One OpenAI-compatible chat endpoint over every model family — open-weight and frontier — with the data-residency and PHI gate enforced on every call.


POST /v1/chat/completions

What you get

Built for regulated workloads

Point any OpenAI SDK at the gateway and call chat the way you already do. The model can be a concrete model ID (claude-sonnet-4-6, gemini-2-5-flash) or a provider-agnostic canonical alias (gemma-4, qwen3.5, glm-5.2) that resolves to whichever provider hosts it in the region you pinned. Set the X-Region and X-PHI headers per request; everything else is standard.

Tools

Function calling & JSON mode

tools, tool_choice and response_format pass straight through to OpenAI-compatible upstreams and are translated for Anthropic. Tool capability is qualified per model: each response carries an x-phi-tools header set to supported or unsupported.
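
As a rough sketch of the client side of this, the snippet below adds a tool definition to a chat payload and reads the x-phi-tools header from a response. The helper names and the lookup_icd10 tool are hypothetical, and the tool schema is assumed to be the standard OpenAI function-calling shape.

```python
import json


def with_tools(payload, tools, tool_choice="auto"):
    """Return a copy of a chat payload with OpenAI-style tool fields added."""
    out = dict(payload)
    out["tools"] = tools
    out["tool_choice"] = tool_choice
    return out


def tools_supported(response_headers):
    """Interpret the x-phi-tools response header; None if it is absent."""
    lowered = {k.lower(): v for k, v in response_headers.items()}
    value = lowered.get("x-phi-tools")
    if value is None:
        return None
    return value.strip().lower() == "supported"


# Hypothetical tool, written in the standard OpenAI function-calling schema.
lookup_tool = {
    "type": "function",
    "function": {
        "name": "lookup_icd10",
        "description": "Look up an ICD-10 code by description.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

payload = with_tools(
    {"model": "qwen3.5", "messages": [{"role": "user", "content": "Code for asthma?"}]},
    tools=[lookup_tool],
)
print(json.dumps(payload, indent=2))
```

Checking the header after each call is a cheap way to notice when an alias resolved to a model that ignored the tools you sent.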

Streaming

SSE streaming, billed correctly

stream:true returns a token-by-token SSE stream; usage is captured from the terminal chunk so billing stays exact. OpenAI-compatible upstreams are passed through as-is; Anthropic responses are buffered and re-emitted as SSE.
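
To make the streaming shape concrete, here is a minimal sketch of a client-side SSE consumer. It assumes OpenAI-style chunks (data: lines carrying JSON, a usage object on a late chunk, and a final data: [DONE] marker); the function name and the sample lines are illustrative, not captured gateway output.

```python
import json


def consume_sse(lines):
    """Accumulate streamed text and keep the last usage object seen.

    Assumes OpenAI-style chunks: `data: {json}` lines ending with `data: [DONE]`.
    """
    text_parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                text_parts.append(delta["content"])
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text_parts), usage


# Invented sample stream for illustration only.
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "",
    'data: {"choices": [], "usage": {"prompt_tokens": 5, "completion_tokens": 2}}',
    "",
    "data: [DONE]",
]
text, usage = consume_sse(sample)
print(text, usage)
```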

Failover

In-region failover

If the resident provider errors, the gateway fails over down an ordered candidate list that never leaves your jurisdiction. The x-phi-attempts header records how many routes were tried.
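
The behaviour can be pictured with a small sketch. This is not the gateway's implementation, only a model of ordered failover with an attempt counter; the route names and the RouteError type are made up for illustration.

```python
class RouteError(Exception):
    """Raised by a route callable when the upstream provider fails."""


def call_with_failover(candidates, send):
    """Try in-region candidates in order; return (result, attempts).

    `candidates` is an ordered list of route names that all sit in the pinned
    region; `send` is any callable that raises RouteError on failure.
    """
    last_error = None
    for attempts, route in enumerate(candidates, start=1):
        try:
            return send(route), attempts
        except RouteError as exc:
            last_error = exc
    raise RouteError(f"all {len(candidates)} in-region routes failed") from last_error


def flaky_send(route):
    # Simulated upstream: the first route is down, the second answers.
    if route == "route-a/CH":
        raise RouteError("upstream 503")
    return f"answered by {route}"


result, attempts = call_with_failover(["route-a/CH", "route-b/CH"], flaky_send)
print(result, attempts)
```

Because the candidate list only ever contains in-region routes, exhausting it ends in an error rather than a hop to another jurisdiction.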

Aliases

Canonical model aliases

gemma-4, qwen3.5 and glm-5.2 map to the right provider per region, so the same request string works in CH, EU and the US without you tracking which cloud hosts what.

PHI

Hard PHI gate

X-PHI:true restricts routing to verified PHI-eligible providers. Where no eligible provider exists for the region, the call is refused (phi_blocked) rather than failing over to an unverified route.
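
A minimal sketch of the gate's logic, under the assumption that each route carries a region and a PHI-eligibility flag. The Route type, the PhiBlocked exception and the three sample rows (taken from the availability table on this page) are illustrative only.

```python
from dataclasses import dataclass


class PhiBlocked(Exception):
    """Stands in for the phi_blocked refusal."""


@dataclass(frozen=True)
class Route:
    provider: str
    region: str
    phi_eligible: bool


# Illustrative rows based on the availability table on this page.
ROUTES = [
    Route("Infomaniak", "CH", True),
    Route("Scaleway", "EU", True),
    Route("Together", "US", False),
]


def eligible_routes(region, phi):
    """Return the routes a call may use; refuse instead of widening the set."""
    in_region = [r for r in ROUTES if r.region == region]
    if not phi:
        return in_region
    verified = [r for r in in_region if r.phi_eligible]
    if not verified:
        raise PhiBlocked(f"no PHI-eligible provider in {region}")
    return verified


print(eligible_routes("CH", phi=True))
```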

Math agent

Sandboxed code execution

/v1/math/chat/completions runs Python or R in a jurisdiction-pinned, network-isolated sandbox to compute and verify instead of guessing — same request shape, buffered response. Charts come back as inspectable SVG (XML, never raster) and diagrams as Mermaid; x-phi-sandbox-figures counts what was captured.

Availability & pricing

Where it runs, what it costs

Every route is region-resident and the PHI gate is enforced per call. Prices include the flat +10% gateway margin and mirror the live /v1/pricing rate card.

Region	Provider	Model	Tier	Price (input / output, per 1M tokens)
CH	Infomaniak	gemma-4 (31B IT)	PHI	$0.24 / $0.48
CH	Infomaniak	qwen3.5 (397B)	PHI	$0.97 / $4.36
EU	Scaleway	gemma-4 (26B IT)	PHI	$0.30 / $0.59
EU	Scaleway	qwen3.6 (35B)	PHI	$0.47 / $1.78
US	Together	qwen3.5 (397B)	General	$0.77 / $3.85
US	Together	glm-5.2	General	$0.66 / $2.41
US / EU	Anthropic	claude-sonnet-4-6	General	$3.30 / $16.50
WORLD	Google	gemini-2-5-flash	General	$0.33 / $2.75

Token prices are per 1M tokens (input / output), gateway margin included. US chat on protected health information returns phi_blocked today — no US chat provider is PHI-verified yet (Azure OpenAI is staged). Full rate card at /pricing.
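
As a worked example of the per-1M-token pricing, the sketch below estimates the cost of a single call from its token counts. The prices are copied from the table above; the function name and the token counts are made up for illustration.

```python
from decimal import Decimal

# (input, output) USD per 1M tokens, copied from the table above.
PRICES = {
    ("CH", "gemma-4"): (Decimal("0.24"), Decimal("0.48")),
    ("EU", "gemma-4"): (Decimal("0.30"), Decimal("0.59")),
    ("US", "glm-5.2"): (Decimal("0.66"), Decimal("2.41")),
}

MILLION = Decimal(1_000_000)


def estimate_cost_usd(region, model, input_tokens, output_tokens):
    """Estimate one call's cost in USD from its token counts."""
    price_in, price_out = PRICES[(region, model)]
    return (input_tokens * price_in + output_tokens * price_out) / MILLION


# 1,000 input tokens and 500 output tokens on gemma-4 in CH.
cost = estimate_cost_usd("CH", "gemma-4", input_tokens=1000, output_tokens=500)
print(cost)
```

Decimal is used rather than float so that small per-call amounts add up without rounding drift.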

Try it

A real call, end to end

A PHI chat call in Switzerland is two headers on a standard request. The response tells you exactly where it ran.

curl

curl https://phi-cloud.com/api/v1/chat/completions \
  -H "Authorization: Bearer $PHI_API_KEY" \
  -H "X-Region: CH" \
  -H "X-PHI: true" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4",
    "messages": [
      {"role": "user", "content": "Summarise this discharge note..."}
    ]
  }'

response

x-phi-routed: infomaniak-gemma-4-31b/CH/phi
x-phi-tier: phi
x-phi-attempts: 1
x-phi-usage: verified
x-phi-cost-micro: 412
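
A small sketch of how a client might read these headers. It assumes x-phi-routed always has the shape route/region/tier, as in the example above, and that the numeric headers are plain integers; the function name is hypothetical.

```python
def parse_phi_headers(headers):
    """Pull the gateway's x-phi-* headers into a small dict.

    Assumes x-phi-routed looks like "<route>/<region>/<tier>".
    """
    lowered = {k.lower(): v.strip() for k, v in headers.items()}
    route, region, tier = lowered["x-phi-routed"].rsplit("/", 2)
    attempts = lowered.get("x-phi-attempts")
    cost_micro = lowered.get("x-phi-cost-micro")
    return {
        "route": route,
        "region": region,
        "tier": tier,
        "attempts": int(attempts) if attempts is not None else None,
        "cost_micro": int(cost_micro) if cost_micro is not None else None,
    }


info = parse_phi_headers({
    "x-phi-routed": "infomaniak-gemma-4-31b/CH/phi",
    "x-phi-tier": "phi",
    "x-phi-attempts": "1",
    "x-phi-usage": "verified",
    "x-phi-cost-micro": "412",
})
print(info)
```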

Residency & gating

CH PHI chat resolves to Infomaniak (Swiss-resident, nFADP/GDPR DPA verified).
EU PHI chat resolves to Scaleway (EU-resident, GDPR Art. 28 DPA, zero-retention, no-train).
US, UK and every other region carry general traffic on the resident cloud; US PHI chat is phi_blocked until a US chat provider is verified.
No prompt or completion is ever logged or persisted by the proxy.

Good to know

· The n parameter is dropped (multi-choice would break single-response billing).
· Streaming is disabled on the /v1/math/chat/completions sandbox route, because it must inspect tool calls.
· Vision rides this same endpoint as image_url content parts (see the Vision capability).
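
For the vision note above, a minimal sketch of a message with an image_url content part. The part layout is the standard OpenAI one; passing the image inline as a base64 data URL is an assumption here, so check the Vision capability for what the gateway accepts. The helper name is hypothetical.

```python
import base64


def image_message(prompt, image_bytes, mime="image/png"):
    """Build one user message with a text part and an inline image part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                # Assumption: the image travels inline as a data URL.
                "image_url": {"url": f"data:{mime};base64,{encoded}"},
            },
        ],
    }


# Placeholder bytes; a real call would read an actual image file.
msg = image_message("What does this chart show?", b"\x89PNG placeholder")
print(msg["content"][1]["image_url"]["url"][:30])
```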

FAQ

Common questions

The OpenAI SDK works as-is: set base_url to https://phi-cloud.com/api/v1 and keep your existing chat.completions code. Region and PHI are extra request headers (X-Region, X-PHI); nothing else changes.
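
A minimal sketch of that setup for the official openai Python package. The helper below only assembles the client arguments so it runs without the SDK installed; the helper name and the PHI_API_KEY variable are taken as assumptions from the curl example, and the commented lines show how the arguments would be used.

```python
import os


def gateway_client_kwargs(region, phi=False):
    """Keyword arguments for an OpenAI SDK client pointed at the gateway."""
    headers = {"X-Region": region}
    if phi:
        headers["X-PHI"] = "true"
    return {
        "base_url": "https://phi-cloud.com/api/v1",
        "api_key": os.environ.get("PHI_API_KEY", "missing-key"),
        "default_headers": headers,  # sent with every request from this client
    }


kwargs = gateway_client_kwargs("CH", phi=True)

# With the `openai` package installed, usage would look like:
#   from openai import OpenAI
#   client = OpenAI(**kwargs)
#   reply = client.chat.completions.create(
#       model="gemma-4",
#       messages=[{"role": "user", "content": "Hello"}],
#       extra_headers={"X-Region": "EU"},  # optional per-request override
#   )
```

Setting the headers once on the client suits single-region apps; the per-request extra_headers argument is the better fit when one process serves several regions.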

A pinned model wins only if it serves the region. Otherwise the gateway falls back to the region-resident default for the task, or returns no_route if nothing serves it. Use canonical aliases to stay portable across regions.
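
The resolution order can be sketched as follows. The catalogue is abridged from the table on this page, while the per-region defaults are invented for illustration; the real defaults are not stated here.

```python
# Which models serve which region (abridged from the table on this page).
SERVES = {
    "CH": {"gemma-4", "qwen3.5"},
    "EU": {"gemma-4", "claude-sonnet-4-6"},
    "US": {"qwen3.5", "glm-5.2", "claude-sonnet-4-6"},
}

# Assumed defaults, for illustration only.
REGION_DEFAULT = {"CH": "gemma-4", "EU": "gemma-4", "US": "qwen3.5"}


def resolve_model(pinned, region):
    """Pinned model if it serves the region, else the region default, else no_route."""
    served = SERVES.get(region, set())
    if pinned in served:
        return pinned
    default = REGION_DEFAULT.get(region)
    if default in served:
        return default
    raise LookupError("no_route")


print(resolve_model("glm-5.2", "US"))
print(resolve_model("glm-5.2", "CH"))
```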

Most chat families support tool calling. Gemma is the exception (no standard tool template). /v1/models exposes supports_tools per model, and a chat call that sends tools gets an x-phi-tools header for the model that answered.

Prompts and completions are never stored or used for training: not by phi-cloud, and not by the PHI-eligible providers, who are contracted to zero-retention / no-training terms. The proxy holds nothing.

Call /v1/math/chat/completions and the model can run Python (numpy/pandas/matplotlib/sympy/scipy) or R (ggplot2) in a network-isolated sandbox in your pinned region. Plots are captured as SVG and stitched into the answer as fenced ```svg blocks (XML you can inspect and re-style, not a base64 image), while flowcharts and sequence/ER diagrams come back as ```mermaid blocks. Each request gets one ephemeral sandbox, torn down afterwards. Code execution is general-traffic only: X-PHI is refused until the sandbox BAA lands.
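
Since figures arrive as fenced blocks inside the answer text, a client can pull them out with a short parser. This is a sketch that assumes each block opens with a fence line naming svg or mermaid and closes with a bare fence; the sample answer is invented.

```python
import re
import xml.etree.ElementTree as ET

FENCE = "`" * 3  # three backticks, built this way so the example nests cleanly
BLOCK_RE = re.compile(FENCE + r"(svg|mermaid)\n(.*?)" + FENCE, re.DOTALL)


def extract_figures(answer):
    """Return (kind, source) pairs for each svg or mermaid fenced block."""
    return [(kind, body.strip()) for kind, body in BLOCK_RE.findall(answer)]


# Invented sample answer for illustration only.
sample = (
    "Here is the plot:\n"
    + FENCE + "svg\n<svg xmlns='http://www.w3.org/2000/svg'></svg>\n" + FENCE + "\n"
    + "And the flow:\n"
    + FENCE + "mermaid\ngraph TD; A-->B;\n" + FENCE + "\n"
)
figures = extract_figures(sample)

# SVG is plain XML, so it can be parsed and inspected directly.
svg_root = ET.fromstring(figures[0][1])
print(svg_root.tag, len(figures))
```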


Ready when you are

Put chat & reasoning in production — without giving up your data.

Spin up a key in minutes. The residency and PHI posture described above applies unchanged in production.


Free to test · Prepaid credits, no subscription · No data retained