Capability · Speech

Transcription and speech, residency-separate.

Speech-to-text and text-to-speech on the OpenAI audio shape — Swiss, EU and US only, each pinned to a same-jurisdiction provider with no cross-border failover.

POST /v1/audio/transcriptions · /v1/audio/speech

What you get

Built for regulated workloads

Transcription uses the standard /v1/audio/transcriptions multipart shape, with response_format set to json, text, srt, vtt or verbose_json; speech uses /v1/audio/speech. Both are deliberately limited to Swiss, EU and US hosts and are residency-separate: a CH request resolves to a Swiss-jurisdiction host and never fails over across a border. Any request outside those three regions returns no_route by design.

STT

Whisper-class transcription

The canonical model name whisper maps to Voxtral (EU default), Whisper large-v3 (Scaleway, EU) and Azure Speech (CH / EU / US). Returns json, text, srt, vtt or verbose_json.

TTS

Neural text-to-speech

Voxtral (EU) and Azure neural voices (EU / US) synthesise MP3, WAV, OPUS, AAC, FLAC or PCM from the OpenAI speech shape.

Residency

No cross-border failover

STT/TTS are pinned within a jurisdiction. A Swiss call will not silently complete on an EU host: if the resident provider is unavailable, the call fails in-region rather than crossing a border.

PHI

PHI on verified hosts

PHI speech runs on Azure (CH / EU / US) and Scaleway (EU), all DPA/BAA-verified. CH text-to-speech has no Swiss PHI host today and resolves to no_route by design.
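
As a rough illustration of the text-to-speech side, the sketch below assembles a request for /v1/audio/speech in Python. The input, voice and response_format fields follow the OpenAI speech shape; the X-Region and X-PHI headers are borrowed from the transcription example on this page, and the model name "tts" and the voice value are placeholders rather than confirmed gateway identifiers. Treat it as a minimal sketch under those assumptions.

```python
import json

# Output formats listed on this page for text-to-speech.
SPEECH_FORMATS = {"mp3", "wav", "opus", "aac", "flac", "pcm"}
# Regions with a TTS route today; CH text-to-speech resolves to no_route.
TTS_REGIONS = {"EU", "US"}


def build_speech_request(text, region, voice, model="tts",
                         response_format="mp3", phi=False):
    """Return (headers, json_body) for POST /v1/audio/speech.

    `model` and `voice` are placeholder values (assumptions), and the
    header names are assumed to match the transcription example.
    """
    region = region.upper()
    if region not in TTS_REGIONS:
        raise ValueError(f"no TTS route for region {region!r}")
    if response_format not in SPEECH_FORMATS:
        raise ValueError(f"unsupported format {response_format!r}")

    headers = {"Content-Type": "application/json", "X-Region": region}
    if phi:
        headers["X-PHI"] = "true"
    body = json.dumps({
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    })
    return headers, body
```

Rejecting CH or an unknown format on the client only saves a round trip; the gateway remains the authority on routing.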

Availability & pricing

Where it runs, what it costs

Every route is region-resident and the PHI gate is enforced per call. Prices include the flat +10% gateway margin and mirror the live /v1/pricing rate card.

Region  | Provider          | Model                  | Tier     | Price
EU      | Mistral (Voxtral) | STT · whisper          | General  | $0.0033 / min
EU      | Scaleway          | STT · whisper large-v3 | PHI      | $0.0036 / min
CH      | Azure Speech      | STT                    | PHI      | $0.0183 / min
US      | Azure Speech      | STT                    | PHI      | $0.0183 / min
EU      | Mistral (Voxtral) | TTS                    | General  | $17.60 / 1M chars
EU / US | Azure Speech      | TTS · neural           | PHI      | $16.50 / 1M chars
CH      | -                 | TTS                    | No route | No Swiss host

STT is billed per minute of audio and TTS per character, with the gateway margin included. CH STT runs on Azure today (Infomaniak Whisper is staged, async-batch).
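
To make the billing units concrete, here is a small Python sketch that estimates cost from the rates listed above. It assumes pro-rata billing by the second with no rounding up to whole minutes, which this page does not confirm; the dictionary keys and function names are illustrative only.

```python
from decimal import Decimal

# Rates copied from the table above (USD, gateway margin already included).
STT_PER_MINUTE = {
    ("EU", "Mistral (Voxtral)"): Decimal("0.0033"),
    ("EU", "Scaleway"): Decimal("0.0036"),
    ("CH", "Azure Speech"): Decimal("0.0183"),
    ("US", "Azure Speech"): Decimal("0.0183"),
}
TTS_PER_MILLION_CHARS = {
    ("EU", "Mistral (Voxtral)"): Decimal("17.60"),
    ("EU", "Azure Speech"): Decimal("16.50"),
    ("US", "Azure Speech"): Decimal("16.50"),
}


def stt_cost(region, provider, duration_seconds):
    """Estimated STT cost in USD; assumes per-second pro-rata billing."""
    rate = STT_PER_MINUTE[(region, provider)]
    return rate * Decimal(str(duration_seconds)) / Decimal(60)


def tts_cost(region, provider, text):
    """Estimated TTS cost in USD, counting one unit per character."""
    rate = TTS_PER_MILLION_CHARS[(region, provider)]
    return rate * Decimal(len(text)) / Decimal(1_000_000)
```

Under those assumptions, a 184.2-second recording on the Scaleway route comes to about $0.011, and 1,000 characters of Azure TTS to $0.0165.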

Try it

A real call, end to end

Transcribe a consult recording in the EU with a residency-pinned Whisper.

curl
curl https://phi-cloud.com/api/v1/audio/transcriptions \
  -H "Authorization: Bearer $PHI_API_KEY" \
  -H "X-Region: EU" \
  -H "X-PHI: true" \
  -F file=@consult.mp3 \
  -F model=whisper \
  -F response_format=verbose_json
response
x-phi-routed: scaleway-whisper-v3/EU/phi
x-phi-tier: phi
x-phi-usage: verified
{ "text": "Patient reports...", "duration": 184.2 }
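
One client-side use of the response headers above is a residency check. The Python sketch below assumes x-phi-routed always has the provider/region/tier layout seen in the sample (scaleway-whisper-v3/EU/phi); that layout is inferred from this one example rather than a documented contract, and the function names are illustrative.

```python
from typing import NamedTuple


class Route(NamedTuple):
    provider: str
    region: str
    tier: str


def parse_routed(header_value):
    """Split an x-phi-routed value, assumed to be provider/region/tier."""
    parts = header_value.strip().split("/")
    if len(parts) != 3:
        raise ValueError(f"unexpected x-phi-routed value: {header_value!r}")
    return Route(*parts)


def check_residency(headers, requested_region):
    """Return the parsed route, or raise if it left the requested region."""
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    route = parse_routed(lowered["x-phi-routed"])
    if route.region.upper() != requested_region.upper():
        raise RuntimeError(
            f"requested {requested_region}, routed to {route.region}"
        )
    return route
```

A check like this is belt-and-braces: the gateway already enforces residency, but logging the parsed route gives an audit trail on the caller's side.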

Residency & coverage

  • CH speech-to-text → Azure Speech (Switzerland). CH text-to-speech → no_route (no Swiss PHI host yet).
  • EU → Voxtral / Whisper / Azure for STT; Voxtral / Azure for TTS.
  • US → Azure for both STT and TTS.
  • Every other region → no_route. Speech is intentionally CH/EU/US only.
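
The coverage rules above can be pictured as a lookup table. This Python sketch is transcribed from the list and the PHI notes on this page; the provider labels are descriptive rather than gateway identifiers, and the real resolver may order or filter candidates differently.

```python
# (region, capability) -> list of (provider label, PHI-eligible).
ROUTES = {
    ("CH", "stt"): [("Azure Speech", True)],
    ("CH", "tts"): [],  # no Swiss host today
    ("EU", "stt"): [
        ("Voxtral", False),
        ("Whisper large-v3 (Scaleway)", True),
        ("Azure Speech", True),
    ],
    ("EU", "tts"): [("Voxtral", False), ("Azure Speech", True)],
    ("US", "stt"): [("Azure Speech", True)],
    ("US", "tts"): [("Azure Speech", True)],
}


def resolve(region, capability, phi=False):
    """Return candidate providers; an empty list stands for no_route."""
    candidates = ROUTES.get((region.upper(), capability), [])
    if phi:
        # The PHI gate keeps only hosts marked as PHI-eligible.
        candidates = [c for c in candidates if c[1]]
    return [name for name, _ in candidates]
```

For example, resolve("CH", "tts") and resolve("JP", "stt") both return an empty list, while resolve("EU", "stt", phi=True) drops Voxtral and keeps the Scaleway and Azure routes.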

FAQ

Common questions

Why does CH text-to-speech return no_route?

There is no Swiss-resident, PHI-eligible TTS host today. Rather than route Swiss PHI data across a border, the gateway refuses the call. Swiss speech-to-text is available on Azure; TTS will light up when a compliant Swiss host is verified.

Will a failed call fail over to another region?

No. Speech is residency-separate with no cross-border failover; that is the point. A failed CH call stays a CH error and never completes in another jurisdiction.

Which formats are supported?

STT accepts standard audio uploads and returns json, text, srt, vtt or verbose_json. TTS emits MP3, WAV, OPUS, AAC, FLAC or PCM via the OpenAI /audio/speech shape.
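
The no-cross-border rule described above can be sketched in a few lines of Python. This is an illustration of the behaviour, not the gateway's implementation: the exception names and the send callback are hypothetical, and retrying a second provider inside the same region is an assumption (the page only states that a call never leaves its jurisdiction).

```python
class NoRoute(Exception):
    """The region has no provider for this capability."""


class RegionUnavailable(Exception):
    """Every provider in the requested region failed."""


def call_in_region(region, providers_by_region, send):
    """Try only the providers registered for `region`, in order.

    `send(provider)` is a caller-supplied function that performs the call
    and raises ConnectionError when that provider is unreachable.
    """
    providers = providers_by_region.get(region, [])
    if not providers:
        raise NoRoute(f"no_route: {region}")

    failures = []
    for provider in providers:
        try:
            return send(provider)
        except ConnectionError as exc:
            failures.append(f"{provider}: {exc}")
    # Other regions are never consulted; the error stays in-region.
    raise RegionUnavailable(f"{region} unavailable ({'; '.join(failures)})")
```

The design choice to notice is that the candidate list is built from one region only, so there is no code path by which a CH failure could reach an EU or US host.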

Ready when you are

Put speech-to-text and text-to-speech in production without giving up your data.

Spin up a key in minutes. The residency and PHI posture above arrives unchanged.

Free to test · Prepaid credits, no subscription · No data retained