Capability · Chat completions

Chat and reasoning, region-pinned.

One OpenAI-compatible chat endpoint over every model family — open-weight and frontier — with the data-residency and PHI gate enforced on every call.

POST /v1/chat/completions

What you get

Built for regulated workloads

Point any OpenAI SDK at the gateway and call chat the way you already do. The model can be a concrete row (claude-sonnet-4-6, gemini-2-5-flash) or a provider-agnostic canonical alias (gemma-4, qwen3.5, glm-5.2) that resolves to whichever provider hosts it in the region you pinned. Set X-Region and X-PHI per request; everything else is standard.
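As a minimal sketch of what "two extra headers on a standard request" might look like, the following builds a chat request with the Python standard library without sending it. The base URL and header names come from this page; the helper names, the placeholder key and the "false" spelling for a non-PHI call are assumptions made for illustration.

```python
import json
import urllib.request

BASE_URL = "https://phi-cloud.com/api/v1"  # base URL as given on this page


def phi_headers(api_key, region, phi):
    """Standard auth/content headers plus the two gateway-specific ones."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "X-Region": region,                    # e.g. "CH", "EU", "US"
        "X-PHI": "true" if phi else "false",   # "false" spelling is an assumption
    }


def build_chat_request(api_key, model, messages, region, phi=False):
    """Build (but do not send) a POST to the chat completions endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers=phi_headers(api_key, region, phi),
        method="POST",
    )


req = build_chat_request(
    api_key="sk-placeholder",  # placeholder, not a real key
    model="gemma-4",
    messages=[{"role": "user", "content": "Summarise this discharge note..."}],
    region="CH",
    phi=True,
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it. With the OpenAI Python SDK, the same two headers could likely be supplied through the client's `default_headers` argument or per call through `extra_headers`.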

Tools

Function calling & JSON mode

tools, tool_choice and response_format pass straight through to OpenAI-compatible upstreams and are translated for Anthropic. Tool support is qualified per model: each response carries an x-phi-tools header set to supported or unsupported.
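A short sketch of how a client might send a tool definition and then check whether the answering model actually supports tools. The request fields follow the standard OpenAI chat shape; the tool name `lookup_icd10` and both helper functions are hypothetical.

```python
# Hypothetical tool definition in the standard OpenAI "tools" shape.
LOOKUP_TOOL = {
    "type": "function",
    "function": {
        "name": "lookup_icd10",  # hypothetical tool name
        "description": "Look up an ICD-10 code by description.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}


def chat_body_with_tools(model, messages, tools, json_mode=False):
    """Assemble a request body that carries tools and, optionally, JSON mode."""
    body = {
        "model": model,
        "messages": messages,
        "tools": tools,
        "tool_choice": "auto",
    }
    if json_mode:
        body["response_format"] = {"type": "json_object"}
    return body


def tools_supported(response_headers):
    """Map x-phi-tools to True/False; None if the header is absent or unrecognised."""
    lowered = {k.lower(): v for k, v in response_headers.items()}
    value = lowered.get("x-phi-tools")
    if value is None:
        return None
    value = value.strip().lower()
    if value == "supported":
        return True
    if value == "unsupported":
        return False
    return None


body = chat_body_with_tools(
    "qwen3.5",
    [{"role": "user", "content": "Code this diagnosis."}],
    [LOOKUP_TOOL],
)
```

Checking the header before relying on `tool_calls` in the response lets a client fall back to plain-text handling when the model reports unsupported.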

Streaming

SSE streaming, billed correctly

Setting "stream": true returns a token-by-token SSE stream; usage is captured from the terminal chunk so billing stays exact. OpenAI-compatible upstreams are passed through as-is; Anthropic responses are buffered and re-emitted as SSE.
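The sketch below shows one way a client could consume such a stream and pick up the usage object from the final data chunk. It assumes the stream follows the usual OpenAI-compatible layout (`data: {json}` lines ending with `data: [DONE]`); the sample lines are made up for illustration and are not captured gateway output.

```python
import json


def consume_sse(lines):
    """Accumulate streamed text and keep the last usage object seen."""
    text_parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                text_parts.append(delta["content"])
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text_parts), usage


# Illustrative sample, not real gateway output.
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    'data: {"choices": [], "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7}}',
    "data: [DONE]",
]
text, usage = consume_sse(sample)
print(text, usage)
```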

Failover

In-region failover

If the resident provider errors, the gateway fails over down an ordered candidate list that never leaves your jurisdiction. The x-phi-attempts header records how many routes were tried.
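To make the failover behaviour concrete, here is a conceptual sketch of trying an ordered candidate list that is filtered to one region first. It is not the gateway's implementation; the candidate records, `UpstreamError` and the `flaky` stand-in are all hypothetical.

```python
class UpstreamError(Exception):
    """Raised by a provider call that failed (hypothetical)."""


def call_with_failover(candidates, region, call):
    """Try in-region candidates in order; return (result, attempts)."""
    in_region = [c for c in candidates if c["region"] == region]
    attempts = 0
    last_error = None
    for candidate in in_region:
        attempts += 1
        try:
            return call(candidate), attempts
        except UpstreamError as exc:
            last_error = exc  # remember the failure and move to the next route
    raise UpstreamError(f"all {attempts} in-region routes failed") from last_error


# Hypothetical candidates; provider-c is never tried for an EU request.
candidates = [
    {"name": "provider-a", "region": "EU"},
    {"name": "provider-b", "region": "EU"},
    {"name": "provider-c", "region": "US"},
]


def flaky(candidate):
    """Stand-in upstream call: the first EU provider is down."""
    if candidate["name"] == "provider-a":
        raise UpstreamError("503")
    return f"ok from {candidate['name']}"


result, attempts = call_with_failover(candidates, "EU", flaky)
print(result, attempts)
```

Here the attempt counter plays the role that x-phi-attempts plays in a real response: a value above 1 means at least one route failed before the call succeeded.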

Aliases

Canonical model aliases

gemma-4, qwen3.5 and glm-5.2 map to the right provider per region, so the same request string works in CH, EU and the US without you tracking which cloud hosts what.
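Conceptually, an alias is a lookup keyed by region. The sketch below fills that lookup with the alias rows from the pricing table on this page only; the live mapping may contain more entries, and `resolve_alias` is a hypothetical helper rather than a gateway API.

```python
# Alias -> region -> provider, taken from the pricing table on this page.
# The live gateway mapping may include routes that are not listed here.
ALIAS_ROUTES = {
    "gemma-4": {"CH": "Infomaniak", "EU": "Scaleway"},
    "qwen3.5": {"CH": "Infomaniak", "US": "Together"},
    "glm-5.2": {"US": "Together"},
}


def resolve_alias(alias, region):
    """Return the provider hosting the alias in that region, or None if not listed."""
    return ALIAS_ROUTES.get(alias, {}).get(region)


for region in ("CH", "EU", "US"):
    print(region, resolve_alias("gemma-4", region))
```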

PHI

Hard PHI gate

X-PHI:true restricts routing to verified PHI-eligible providers. Where no eligible provider exists for the region, the call is refused (phi_blocked) rather than failing over to an unverified route.
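The gate can be pictured as a filter that runs before any routing or failover. This is a conceptual sketch under stated assumptions, not the gateway's code: the route records and the `PhiBlocked` exception are hypothetical, while the provider, region and PHI-tier values mirror the pricing table on this page.

```python
class PhiBlocked(Exception):
    """Hypothetical exception standing in for the phi_blocked refusal."""
    code = "phi_blocked"


# Provider/region/tier values as listed in the pricing table on this page.
ROUTES = [
    {"provider": "Infomaniak", "region": "CH", "phi_verified": True},
    {"provider": "Scaleway", "region": "EU", "phi_verified": True},
    {"provider": "Together", "region": "US", "phi_verified": False},
]


def eligible_routes(routes, region, phi):
    """Keep in-region routes; for PHI calls keep only verified ones or refuse."""
    in_region = [r for r in routes if r["region"] == region]
    if not phi:
        return in_region
    verified = [r for r in in_region if r["phi_verified"]]
    if not verified:
        # Refuse outright instead of falling back to an unverified route.
        raise PhiBlocked(f"no PHI-verified provider in {region}")
    return verified


print([r["provider"] for r in eligible_routes(ROUTES, "CH", phi=True)])
```

The key design point is that the refusal happens before the candidate list is built, so failover can never reach an unverified provider.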

Math agent

Sandboxed code execution

/v1/math/chat/completions runs Python or R in a jurisdiction-pinned, network-isolated sandbox to compute and verify instead of guessing — same request shape, buffered response. Charts come back as inspectable SVG (XML, never raster) and diagrams as Mermaid; x-phi-sandbox-figures counts what was captured.

Availability & pricing

Where it runs, what it costs

Every route is region-resident and the PHI gate is enforced per call. Prices include the flat +10% gateway margin and mirror the live /v1/pricing rate card.

Region  | Provider   | Model             | Tier    | Price (input / output)
CH      | Infomaniak | gemma-4 (31B IT)  | PHI     | $0.24 / $0.48
CH      | Infomaniak | qwen3.5 (397B)    | PHI     | $0.97 / $4.36
EU      | Scaleway   | gemma-4 (26B IT)  | PHI     | $0.30 / $0.59
EU      | Scaleway   | qwen3.6 (35B)     | PHI     | $0.47 / $1.78
US      | Together   | qwen3.5 (397B)    | General | $0.77 / $3.85
US      | Together   | glm-5.2           | General | $0.66 / $2.41
US / EU | Anthropic  | claude-sonnet-4-6 | General | $3.30 / $16.50
WORLD   | Google     | gemini-2-5-flash  | General | $0.33 / $2.75

Token prices are per 1M tokens (input / output), gateway margin included. US chat on protected health information returns phi_blocked today — no US chat provider is PHI-verified yet (Azure OpenAI is staged). Full rate card at /pricing.
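As a worked example of the per-1M-token pricing, the sketch below estimates the cost of a single call from its token counts. The prices are the CH gemma-4 row of the table; the token counts are invented for illustration, and `estimate_cost_usd` is a hypothetical helper.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_price, output_price, input_tokens, output_tokens):
    """Cost in USD given per-1M-token prices (gateway margin already included)."""
    total = Decimal(input_tokens) * input_price + Decimal(output_tokens) * output_price
    return total / PER_MILLION


# CH gemma-4: $0.24 input / $0.48 output per 1M tokens.
# 1,200 input tokens and 300 output tokens are made-up counts.
cost = estimate_cost_usd(Decimal("0.24"), Decimal("0.48"), 1200, 300)
print(cost)
```

That is 1,200 × 0.24 / 1,000,000 + 300 × 0.48 / 1,000,000 = 0.000288 + 0.000144 = 0.000432 USD. `Decimal` is used instead of floats so that small per-call amounts do not pick up rounding noise.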

Try it

A real call, end to end

A PHI chat call in Switzerland is two headers on a standard request. The response tells you exactly where it ran.

curl
curl https://phi-cloud.com/api/v1/chat/completions \
  -H "Authorization: Bearer $PHI_API_KEY" \
  -H "X-Region: CH" \
  -H "X-PHI: true" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4",
    "messages": [
      {"role": "user", "content": "Summarise this discharge note..."}
    ]
  }'
response
x-phi-routed: infomaniak-gemma-4-31b/CH/phi
x-phi-tier: phi
x-phi-attempts: 1
x-phi-usage: verified
x-phi-cost-micro: 412
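A client can use these headers to confirm residency after the fact. The sketch below assumes x-phi-routed always has the three-part `route/region/tier` shape seen in the example above, which is inferred from that single sample; the unit of x-phi-cost-micro is not stated on this page, so it is kept as a plain integer. Both helper names are hypothetical.

```python
def parse_routed(value):
    """Split an x-phi-routed value, assuming a route/region/tier layout."""
    route, region, tier = value.split("/")
    return {"route": route, "region": region, "tier": tier}


def summarize(headers):
    """Collect the routing headers into one dict (header names are case-insensitive)."""
    h = {k.lower(): v for k, v in headers.items()}
    info = parse_routed(h["x-phi-routed"])
    info["attempts"] = int(h["x-phi-attempts"])
    info["cost_micro"] = int(h["x-phi-cost-micro"])  # unit not stated on this page
    return info


# Header values copied from the example response above.
sample_headers = {
    "x-phi-routed": "infomaniak-gemma-4-31b/CH/phi",
    "x-phi-tier": "phi",
    "x-phi-attempts": "1",
    "x-phi-usage": "verified",
    "x-phi-cost-micro": "412",
}
info = summarize(sample_headers)
assert info["region"] == "CH", "call did not run in the pinned region"
print(info)
```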

Residency & gating

  • CH PHI chat resolves to Infomaniak (Swiss-resident, nFADP/GDPR DPA verified).
  • EU PHI chat resolves to Scaleway (EU-resident, GDPR Art. 28 DPA, zero-retention, no-train).
  • US, UK and every other region carry general traffic on the resident cloud; US PHI chat is phi_blocked until a US chat provider is verified.
  • No prompt or completion is ever logged or persisted by the proxy.

Good to know
  • n is dropped (multi-choice would break single-response billing).
  • Streaming is disabled on the /math sandbox route because it must inspect tool calls.
  • Vision rides this same endpoint as image_url content parts (see the Vision capability).
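For the vision case, a message with an image_url content part might be built as follows. The content-part shape is the standard OpenAI chat format; whether the gateway accepts base64 data URLs (as opposed to hosted image URLs) is an assumption, and `image_message` is a hypothetical helper.

```python
import base64


def image_message(prompt, image_bytes, mime="image/png"):
    """Build a user message with a text part and an inline image part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                # Data-URL support on this gateway is an assumption.
                "image_url": {"url": f"data:{mime};base64,{encoded}"},
            },
        ],
    }


# Placeholder bytes standing in for a real image file.
msg = image_message("Describe this scan.", b"\x89PNG placeholder")
print(msg["content"][1]["image_url"]["url"][:22])
```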

FAQ

Common questions

Existing OpenAI SDK code works unchanged: set base_url to https://phi-cloud.com/api/v1 and keep your chat.completions calls. Region and PHI are extra request headers (X-Region, X-PHI); nothing else changes.
A pinned model wins only if it serves the region. Otherwise the gateway falls back to the region-resident default for the task, or returns no_route if nothing serves it. Use canonical aliases to stay portable across regions.
Most chat families support tool calling. Gemma is the exception (no standard tool template). /v1/models exposes supports_tools per model, and a chat call that sends tools gets an x-phi-tools header for the model that answered.
Prompts and completions are never retained or used for training: not by phi-cloud, and not by the PHI-eligible providers, who are contracted to zero-retention / no-training terms. The proxy holds nothing.
Call /v1/math/chat/completions and the model can run Python (numpy/pandas/matplotlib/sympy/scipy) or R (ggplot2) in a network-isolated sandbox in your pinned region. Plots are captured as SVG and stitched into the answer as fenced ```svg blocks — XML you can inspect and re-style, not a base64 image — while flowcharts and sequence/ER diagrams come back as ```mermaid. One ephemeral sandbox per request, torn down after; code execution is general-traffic only (X-PHI is refused until the sandbox BAA lands).
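Because figures arrive as fenced svg and mermaid blocks inside the answer text, a client can pull them out with a small parser. The sketch below is one possible approach, assuming each fence opens with the language tag on its own line; the sample answer is invented, and `extract_figures` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

FENCE = "`" * 3  # built at runtime so this example's own fence stays intact

FIGURE_RE = re.compile(
    FENCE + r"(svg|mermaid)[ \t]*\n(.*?)\n" + FENCE,
    re.DOTALL,
)


def extract_figures(answer):
    """Return (kind, source) pairs for every fenced svg or mermaid block."""
    return [(m.group(1), m.group(2)) for m in FIGURE_RE.finditer(answer)]


# Invented sample answer, not real sandbox output.
answer = (
    "Here is the plot:\n"
    + FENCE + "svg\n"
    + '<svg xmlns="http://www.w3.org/2000/svg"><rect/></svg>\n'
    + FENCE + "\n"
    + "And the flow:\n"
    + FENCE + "mermaid\n"
    + "graph TD; A-->B;\n"
    + FENCE + "\n"
)

figures = extract_figures(answer)
for kind, source in figures:
    if kind == "svg":
        root = ET.fromstring(source)  # SVG is XML, so it can be parsed and inspected
        print(kind, root.tag)
    else:
        print(kind, source)
```

Parsing the SVG as XML is what makes it inspectable and re-stylable; treat sandbox output like any other untrusted XML before rendering it in a browser.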

Ready when you are

Put chat & reasoning in production — without giving up your data.

Spin up a key in minutes. The residency and PHI posture above arrives unchanged.

Free to test · Prepaid credits, no subscription · No data retained