Reseller & white-label
Run phi-cloud under your own product.
phi-cloud is your wholesale supplier. One key, every model, per-call cost on every response, and a public rate card — so you can meter your own users, add your margin, and keep phi-cloud invisible.
https://api.phi-cloud.com/v1
The model
One phi-cloud tenant, many of your users.
Your users never see phi-cloud. Your site holds a single key, routes all traffic through it, reads the cost of each call, and bills your users in your own units with your own margin. phi-cloud stays stateless and never sees your end-users.
your users → your site: You own auth, billing, packages, and margin.
your site → phi-cloud: One phi_live_ key. Read x-phi-cost-micro on each call for your COGS.
phi-cloud → models: Region-resident routing, PHI gating, fail-over, no payload logging.
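The flow above implies one piece of bookkeeping on your side: attributing each call's x-phi-cost-micro value to the user who triggered it. Here is a minimal sketch, assuming an in-memory store. The `CostLedger` class and the user ids are hypothetical, and a real site would persist these totals in its own database.

```python
class CostLedger:
    """Per-user running totals of phi-cloud cost, in micro-USD.

    Hypothetical helper: phi-cloud only returns the per-call cost header;
    how you store it is up to your site.
    """

    def __init__(self) -> None:
        self._totals: dict[str, int] = {}
        self._calls: dict[str, int] = {}

    def record(self, user_id: str, cost_header: str) -> int:
        """Add one call's x-phi-cost-micro value; return the user's new total."""
        cost = int(cost_header)
        if cost < 0:
            raise ValueError("cost must be non-negative")
        self._totals[user_id] = self._totals.get(user_id, 0) + cost
        self._calls[user_id] = self._calls.get(user_id, 0) + 1
        return self._totals[user_id]

    def total_micro(self, user_id: str) -> int:
        """Total cost attributed to this user so far (0 if unseen)."""
        return self._totals.get(user_id, 0)

    def calls(self, user_id: str) -> int:
        """Number of calls attributed to this user so far (0 if unseen)."""
        return self._calls.get(user_id, 0)


ledger = CostLedger()
ledger.record("alice", "2759")  # header value from one response
ledger.record("alice", "1241")
ledger.record("bob", "500")
```

Keeping the totals as integers in micro-USD avoids float drift. Margin and credit conversion can then be applied once, when you bill.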
Auth & billing posture
Two ways to settle with phi-cloud.
Authenticate with Authorization: Bearer phi_live_… and point your OpenAI SDK at the base URL. Pick the posture that fits your relationship — both still return per-call cost.
Prepaid credits (third-party reseller, optional): Your phi-cloud account is funded with prepaid credits (auto-recharge optional); usage draws them down. You keep your margin over cost.
Unlimited / first-party (same-org connector, optional): Your Stripe customer id is added to PHI_UNLIMITED_CUS, which gives full production access including PHI, never metered. You keep 100% of what you charge users. For first-party products you own.
Cost is reported either way
The unlimited posture only skips phi-cloud's own metering. The x-phi-cost-micro header is still computed and returned, so you can bill your users exactly even on a free wholesale account.
Per-request controls
Two headers decide residency and routing.
Set X-Region on every call and add X-PHI when the traffic needs PHI-cleared hosts. Together they control where data is processed and what can serve it.
X-Region: CH | EU | US | … (required). Data residency and the routing jurisdiction. Embeddings, STT, TTS and OCR are residency-pinned: a missing region resolves to WORLD, and the call can fail with no_route.
X-PHI: true | false (optional). true routes only to PHI-cleared hosts for that jurisdiction; otherwise the call fails with phi_blocked. Leave it off for general traffic.
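Because a missing region silently resolves to WORLD, it can help to build the two headers in one place and refuse to send a call without a region. This is a small sketch; the `phi_headers` helper is hypothetical and not part of any SDK.

```python
def phi_headers(region: str, phi: bool = False) -> dict[str, str]:
    """Build the per-request routing headers described above.

    region: jurisdiction code such as "CH", "EU" or "US" (sent as X-Region).
    phi:    when True, adds X-PHI: true so only PHI-cleared hosts may serve.
    """
    region = region.strip()
    if not region:
        # Fail locally instead of letting the call resolve to WORLD.
        raise ValueError("X-Region is required")
    headers = {"X-Region": region}
    if phi:
        headers["X-PHI"] = "true"
    return headers


general = phi_headers("EU")
clinical = phi_headers("CH", phi=True)
```

With the OpenAI SDK, the resulting dict can be passed as `extra_headers=` on each call.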
Cost & margin
Every response tells you what the call cost.
Two numbers drive resale: the per-call cost (reactive, on every response) and the rate card (proactive, for building packages).
    # Every response carries the cost of the call: your COGS.
    curl -i https://api.phi-cloud.com/v1/chat/completions \
      -H "Authorization: Bearer phi_live_…" \
      -H "X-Region: EU" \
      -H "Content-Type: application/json" \
      -d '{"model":"claude-sonnet-4-6","messages":[{"role":"user","content":"hi"}]}'

    # ── response headers ──
    # x-phi-routed: claude-sonnet-4-6/EU/general
    # x-phi-cost-micro: 2759   # ← your cost (micro-USD, incl. +10% phi-cloud markup)
    # x-phi-usage: verified    # ← trust the count?
    # … and "usage": { "prompt_tokens", "completion_tokens" } in the body

Margin stacking
Raw provider price → ×1.10 (phi-cloud, baked into x-phi-cost-micro) → × your markup → end-user price. You only set the last step; the header is your cost basis.
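To make the chain concrete, here is a worked sketch using the 2759 micro-USD header value from the example response. The 1.4x resale markup is purely illustrative, `margin_chain` is a hypothetical helper, and the provider figure is only implied by dividing out the 1.10 step, so treat it as approximate.

```python
from fractions import Fraction

PHI_MARKUP = Fraction(11, 10)  # the x1.10 step baked into x-phi-cost-micro


def margin_chain(cost_micro: int, your_markup: Fraction) -> dict[str, int]:
    """Split one call into implied provider price, your cost, and end-user price."""
    # Approximate: the header is an integer, so dividing it back is not exact.
    provider_micro = round(cost_micro / PHI_MARKUP)
    end_user_micro = round(cost_micro * your_markup)
    return {
        "provider_micro": provider_micro,
        "cost_micro": cost_micro,
        "end_user_micro": end_user_micro,
        "your_margin_micro": end_user_micro - cost_micro,
    }


# 1.4x (7/5) is an illustrative resale markup, not a phi-cloud value.
chain = margin_chain(2759, Fraction(7, 5))
```

Using `Fraction` keeps the multiplication exact until the final rounding, which matters when many small calls are summed.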
    from openai import OpenAI

    phi = OpenAI(base_url="https://api.phi-cloud.com/v1", api_key="phi_live_…")

    # Illustrative resale multiplier (not a phi-cloud value); set your own.
    YOUR_MARGIN = 1.5

    # Use .with_raw_response to read the cost header alongside the body.
    raw = phi.chat.completions.with_raw_response.create(
        model="auto",
        messages=[{"role": "user", "content": "Summarize this note…"}],
        extra_headers={"X-Region": "EU"},
    )
    resp = raw.parse()
    cost_micro = int(raw.headers["x-phi-cost-micro"])  # your cost basis
    usage = resp.usage  # prompt/completion tokens

    # Bill YOUR user: cost_micro is already +10% over provider list.
    # Add your resale margin, convert to your own credits, and meter per user.
    your_price_micro = round(cost_micro * YOUR_MARGIN)

Rate card endpoint
Pre-compute packages with /v1/pricing.
A public, region-filtered rate card of the marked-up rates — the same ones x-phi-cost-micro bills on. Pull it to price packages and stay in sync when prices change.
GET /v1/pricing: marked-up per-model rates (chat/embed per-token + STT/TTS/OCR/image/video)
    # Public rate card: pre-compute your packages off this.
    curl -H "X-Region: EU" https://api.phi-cloud.com/v1/pricing
    # {
    #   "object": "pricing", "region": "EU",
    #   "markup": 1.1, "credits_per_usd": 300, "unit": "micro_usd",
    #   "data": [
    #     { "id": "claude-sonnet-4-6", "kind": "chat",
    #       "prompt_micro_per_token": 3.3, "completion_micro_per_token": 16.5 },
    #     …
    #   ]
    # }
    # micro-USD-per-token == USD per MILLION tokens → $3.30/M in, $16.50/M out

Handy unit
Rates are micro-USD per unit. For tokens, micro-per-token equals USD per million tokens — drop it straight into “$20 = N tokens of model X” math. Use credits_per_usd to convert to your own credit unit.
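The package math can be sketched as follows, using the example rates and credits_per_usd from the rate card above. The helper names are hypothetical, and the sketch assumes credits_per_usd is a plain multiplier from USD to credits.

```python
from decimal import Decimal

MICRO_PER_USD = 1_000_000


def tokens_for_budget(budget_usd, micro_per_token) -> int:
    """Whole tokens a USD budget buys at a rate given in micro-USD per token."""
    budget_micro = Decimal(str(budget_usd)) * MICRO_PER_USD
    return int(budget_micro // Decimal(str(micro_per_token)))


def micro_to_credits(cost_micro: int, credits_per_usd: int) -> Decimal:
    """Convert a micro-USD amount into credits (assumes credits = USD * credits_per_usd)."""
    return Decimal(cost_micro) * Decimal(credits_per_usd) / MICRO_PER_USD


# Example rates from the rate card: 3.3 in / 16.5 out, 300 credits per USD.
prompt_tokens_for_20 = tokens_for_budget(20, 3.3)
completion_tokens_for_20 = tokens_for_budget(20, 16.5)
credits_for_call = micro_to_credits(2759, 300)
```

`Decimal` is used so that rates such as 3.3 do not pick up binary floating-point error before the division.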
Capability discovery
Build your model picker off the catalog.
Render only what works for a user's region and use-case — don't hard-code a model list.
GET /v1/models (catalog): Region/PHI-filtered. Each entry carries kind, phi_eligible, and supports_tools.
GET /v1/pricing (rates): The marked-up rate card (above).
x-phi-tools (supported | unsupported): On a call that sends tools, tells you whether the model that answered honored function calling.
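A model picker built on those three fields might look like the sketch below. The `pick_models` helper is hypothetical, the sample entries are invented for illustration (including the `gemma-example` id, the `embed` kind string, and every flag value), and the real catalog response may be shaped differently.

```python
def pick_models(catalog, kind, need_phi=False, need_tools=False):
    """Return ids of catalog entries usable for one use-case.

    catalog:    list of dicts carrying kind, phi_eligible, supports_tools.
    kind:       the capability to render, e.g. "chat".
    need_phi:   keep only entries flagged phi_eligible.
    need_tools: keep only entries flagged supports_tools.
    """
    picked = []
    for entry in catalog:
        if entry.get("kind") != kind:
            continue
        if need_phi and not entry.get("phi_eligible", False):
            continue
        if need_tools and not entry.get("supports_tools", False):
            continue
        picked.append(entry["id"])
    return picked


# Illustrative entries only; flag values are made up for the example.
sample_catalog = [
    {"id": "claude-sonnet-4-6", "kind": "chat", "phi_eligible": True, "supports_tools": True},
    {"id": "gemma-example", "kind": "chat", "phi_eligible": False, "supports_tools": False},
    {"id": "qwen3-embedding-8b", "kind": "embed", "phi_eligible": True, "supports_tools": False},
]

chat_models = pick_models(sample_catalog, "chat")
tool_models = pick_models(sample_catalog, "chat", need_tools=True)
phi_embedders = pick_models(sample_catalog, "embed", need_phi=True)
```

Treating a missing flag as false keeps the picker conservative: a model is only offered for tools or PHI when the catalog says so explicitly.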
Integration
Drop-in OpenAI, plus the bits that matter for resale.
Swap your base URL and keep your code. Three specifics carry weight downstream.
Tools (function calling): Forward tools / tool_choice / response_format unchanged. Works across Claude / GPT / Mistral / Gemini / Qwen (Gemma has no tool template).
Streaming (stream: true): SSE passthrough. Cost is settled at stream end, so streamed responses omit x-phi-cost-micro. Read tokens from the final SSE usage chunk, or use non-streamed requests for billing-critical calls.
Embeddings (dimensions): Send X-Region: EU (or CH) plus dimensions: N to match your vector schema. The same canonical model serves every region, so vectors stay comparable across regions.
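For the streaming case, where the cost header is absent, one option is to take the token counts from the final usage chunk and price them against the rate card yourself. This sketch assumes chunks are already parsed into dicts with an OpenAI-style `usage` object. Both helper names are hypothetical, and the result is an estimate, not the settled cost.

```python
from decimal import Decimal, ROUND_CEILING


def usage_from_chunks(chunks):
    """Return the last non-empty usage dict seen in a stream of parsed chunks."""
    usage = None
    for chunk in chunks:
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage


def estimate_cost_micro(usage, prompt_rate, completion_rate) -> int:
    """Estimate cost in micro-USD from token counts and per-token rates.

    Rates are micro-USD per token, as listed by /v1/pricing. The result is
    rounded up so an estimate never undercuts your cost basis.
    """
    cost = (
        Decimal(usage["prompt_tokens"]) * Decimal(str(prompt_rate))
        + Decimal(usage["completion_tokens"]) * Decimal(str(completion_rate))
    )
    return int(cost.to_integral_value(rounding=ROUND_CEILING))


# Fake stream: content chunks carry no usage; the final chunk does.
fake_stream = [
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo"}}], "usage": None},
    {"choices": [], "usage": {"prompt_tokens": 120, "completion_tokens": 40}},
]

final_usage = usage_from_chunks(fake_stream)
estimated_micro = estimate_cost_micro(final_usage, 3.3, 16.5)
```

If the stream ends without a usage chunk, `usage_from_chunks` returns `None`, which is a reasonable signal to fall back to a non-streamed call for that request.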
    # Embeddings: pin a region (CH or EU) + match your vector schema.
    curl https://api.phi-cloud.com/v1/embeddings \
      -H "Authorization: Bearer phi_live_…" \
      -H "X-Region: EU" \
      -H "Content-Type: application/json" \
      -d '{"model":"qwen3-embedding-8b","input":"…","dimensions":1536}'
    # → 1536-d, L2-normalized. Without X-Region → region=WORLD → no_route.

Operational rules
The rules that keep resale clean.
A short list of foot-guns worth wiring around once.
Vector spaces don't mix
If you migrate embeddings to phi-cloud, re-embed your whole corpus once. Qwen vectors aren't comparable to a prior provider's (e.g. Gemini) — don't mix them in one column.
Keys are server-side credentials
Never ship a key to a browser. Legacy keys can't be individually revoked (only a PHI_KEY_SECRET rotation kills them), so treat the unlimited key like a root secret.
No payload logging (privacy)
phi-cloud persists no request/response content, which is good for your own users' privacy story. You are the controller, so set X-PHI / X-Region correctly.
/v1/ping (free diagnostic)
Unmetered key + routing check (reports billing_active). Use it for setup without spending.
Residency-pinned modalities (no cross-border)
STT / TTS / OCR are CH/EU-only and don't fail over across borders. Surface that to users who pick regions.