Capability · Speech

Transcription and speech, residency-separate.

Speech-to-text and text-to-speech on the OpenAI-compatible audio API shape. Available in Switzerland, the EU and the US only, with each request pinned to a same-jurisdiction provider and no cross-border failover.

POST /v1/audio/transcriptions · /v1/audio/speech

What you get

Built for regulated workloads

Transcription uses the standard /v1/audio/transcriptions multipart upload and returns json, text, srt, vtt or verbose_json; speech uses /v1/audio/speech. Both are deliberately limited to Switzerland, the EU and the US, and are residency-separate: a CH request resolves to a Swiss-jurisdiction host and never fails over across a border. A request for any other region returns no_route by design.

STT

Whisper-class transcription

The canonical model name whisper maps to Voxtral (EU default), Whisper large-v3 (Scaleway, EU) and Azure Speech (CH / EU / US). Returns json, text, srt, vtt or verbose_json.

TTS

Neural text-to-speech

Voxtral (EU) and Azure neural voices (EU / US) synthesise MP3, WAV, OPUS, AAC, FLAC or PCM from the OpenAI speech shape.

Residency

No cross-border failover

STT/TTS are pinned within a jurisdiction. A Swiss call will not silently complete on an EU host — if the resident provider is unavailable, the call fails in-region rather than crossing a border.
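
One way to check this from the client side is to compare the region you asked for with the region the gateway reports. The sketch below assumes the x-phi-routed response header always has the provider/REGION/tier form seen in the sample response further down (scaleway-whisper-v3/EU/phi). The function names routed_region and assert_region are illustrative and not part of the API.

```shell
# routed_region HEADER_VALUE -> prints the region segment of an x-phi-routed value.
# Assumes the value looks like "provider/REGION/tier", as in the sample response.
routed_region() {
  printf '%s\n' "$1" | awk -F/ '{ print $2 }'
}

# assert_region WANTED HEADER_VALUE -> exit status 0 only if the call stayed in WANTED.
assert_region() {
  [ "$(routed_region "$2")" = "$1" ]
}

# Example with the header value from the sample response:
if assert_region EU "scaleway-whisper-v3/EU/phi"; then
  echo "stayed in EU"
fi
```

With curl, -D headers.txt writes the response headers to a file, so the x-phi-routed value can be read back after the call and passed to assert_region.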

PHI

PHI on verified hosts

Speech carrying protected health information (PHI) runs on Azure (CH / EU / US for STT, EU / US for TTS) and Scaleway (EU, STT), all DPA/BAA-verified. CH text-to-speech has no Swiss PHI host today and resolves to no_route by design.

Availability & pricing

Where it runs, what it costs

Every route is region-resident and the PHI gate is enforced per call. Prices include the flat +10% gateway margin and mirror the live /v1/pricing rate card.

Region	Provider	Task · model	Tier	Price
EU	Mistral (Voxtral)	STT · whisper	General	$0.0033 / min
EU	Scaleway	STT · whisper large-v3	PHI	$0.0036 / min
CH	Azure Speech	STT	PHI	$0.0183 / min
US	Azure Speech	STT	PHI	$0.0183 / min
EU	Mistral (Voxtral)	TTS	General	$17.60 / 1M chars
EU / US	Azure Speech	TTS · neural	PHI	$16.50 / 1M chars
CH	—	TTS	No route	No Swiss host

STT is billed per minute of audio and TTS per character, quoted per 1M characters. Gateway margin is included. CH STT runs on Azure today; Infomaniak Whisper is staged (async-batch).
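
To make the rates concrete, here is a rough cost estimate for two calls. It is a sketch under one assumption the rate card does not state: that usage is billed pro rata (by the second for audio, by the character for text) with no rounding up or minimum charge. The rates are copied from the table above; the helper names stt_cost and tts_cost and the 2,500-character message are illustrative.

```shell
# Rates copied from the pricing table (USD, gateway margin included).
STT_SCALEWAY_PER_MIN=0.0036    # EU, Scaleway whisper large-v3, PHI tier
TTS_AZURE_PER_MILLION=16.50    # EU / US, Azure neural, PHI tier

# stt_cost SECONDS RATE_PER_MIN -> cost in USD, six decimal places.
# Assumes pro-rata billing; the real rounding rule may differ.
stt_cost() {
  awk -v s="$1" -v r="$2" 'BEGIN { printf "%.6f\n", s / 60 * r }'
}

# tts_cost CHARS RATE_PER_MILLION -> cost in USD, six decimal places.
tts_cost() {
  awk -v c="$1" -v r="$2" 'BEGIN { printf "%.6f\n", c / 1000000 * r }'
}

stt_cost 184.2 "$STT_SCALEWAY_PER_MIN"   # 184.2 s of audio   -> prints 0.011052
tts_cost 2500 "$TTS_AZURE_PER_MILLION"   # 2,500 characters   -> prints 0.041250
```

The 184.2-second figure matches the duration in the sample response further down, so that transcription would cost about one cent on the Scaleway PHI route if billing is pro rata.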

Try it

A real call, end to end

Transcribe a consult recording in the EU on a residency-pinned Whisper route.

curl

curl https://phi-cloud.com/api/v1/audio/transcriptions \
  -H "Authorization: Bearer $PHI_API_KEY" \
  -H "X-Region: EU" \
  -H "X-PHI: true" \
  -F file=@consult.mp3 \
  -F model=whisper \
  -F response_format=verbose_json

response

x-phi-routed: scaleway-whisper-v3/EU/phi
x-phi-tier: phi
x-phi-usage: verified
{ "text": "Patient reports...", "duration": 184.2 }
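
The same pattern should carry over to text-to-speech. The sketch below posts a JSON body in the OpenAI /v1/audio/speech shape (model, voice, input, response_format) and saves the audio to a file. The model name "tts" and the voice "alloy" are placeholders, not confirmed gateway values, and the X-Region and X-PHI headers are simply copied from the transcription call above. Check the API reference for the names the gateway actually accepts.

```shell
# JSON body in the OpenAI /v1/audio/speech shape.
# "tts" and "alloy" are placeholder model and voice names (assumptions).
TTS_BODY='{
  "model": "tts",
  "voice": "alloy",
  "input": "Your appointment is confirmed for Monday at 10:00.",
  "response_format": "mp3"
}'

# synthesize OUTFILE -> POST the body to the speech endpoint and save the audio.
# --fail makes curl exit non-zero on an HTTP error instead of saving the error body.
synthesize() {
  curl --fail https://phi-cloud.com/api/v1/audio/speech \
    -H "Authorization: Bearer $PHI_API_KEY" \
    -H "X-Region: EU" \
    -H "X-PHI: true" \
    -H "Content-Type: application/json" \
    -d "$TTS_BODY" \
    --output "$1"
}

# Usage (needs a valid key and network access):
#   synthesize reminder.mp3
```

Wrapping the call in a function keeps the request body and the output path separate, so the same body can be reused with a different response_format or region.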

Residency & coverage

CH speech-to-text → Azure Speech (Switzerland). CH text-to-speech → no_route (no Swiss PHI host yet).
EU → Voxtral / Whisper / Azure for STT; Voxtral / Azure for TTS.
US → Azure for both STT and TTS.
Every other region → no_route. Speech is intentionally CH/EU/US only.
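
The coverage list above can be mirrored in a small client-side lookup, which is handy as a preflight check before sending audio. This is an illustrative sketch, not the gateway's routing code: the function name speech_route and the lowercase provider labels are made up for the example, and the live /v1/pricing rate card remains the source of truth.

```shell
# speech_route REGION TASK -> prints the candidate providers for that route,
# or prints "no_route" and returns exit status 1 when the route is not offered.
# REGION is CH, EU or US; TASK is stt or tts.
speech_route() {
  case "$1/$2" in
    CH/stt)        echo "azure" ;;
    EU/stt)        echo "voxtral whisper azure" ;;
    EU/tts)        echo "voxtral azure" ;;
    US/stt|US/tts) echo "azure" ;;
    *)             echo "no_route"; return 1 ;;   # CH/tts and every other region
  esac
}

speech_route EU stt                                # prints: voxtral whisper azure
speech_route CH tts || echo "refused before upload"
```

The second call prints no_route followed by "refused before upload", matching the gateway's behaviour of failing in-region instead of crossing a border.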

FAQ

Common questions

Why does Swiss text-to-speech return no_route?

There is no Swiss-resident, PHI-eligible TTS host today. Rather than route Swiss PHI audio across a border, the gateway refuses the request. Swiss speech-to-text is available on Azure; TTS will light up when a compliant Swiss host is verified.

Can a failed call fail over to another region?

No. Speech is residency-separate with no cross-border failover, and that is the point. A failed CH call stays a CH error; it never completes in another jurisdiction.

Which formats are supported?

STT accepts standard audio uploads and returns json, text, srt, vtt or verbose_json. TTS emits MP3, WAV, OPUS, AAC, FLAC or PCM via the OpenAI /audio/speech shape.

Ready when you are

Put speech (STT / TTS) in production without giving up your data.

Spin up a key in minutes. The residency and PHI posture described above applies unchanged.

Free to test · Prepaid credits, no subscription · No data retained