Documentation
Brain Orchestra is a governed LLM gateway. One OpenAI-compatible endpoint that routes requests across every major provider — Anthropic, OpenAI, Mistral, Google, Amazon, Cohere, Moonshot, and xAI (the last two unrestricted tier only, opt-in by name) — with per-request audit logging, territorial data controls, and PII protection.
Quick start
Brain Orchestra implements the OpenAI /v1/chat/completions format. If you already use the OpenAI SDK or any OpenAI-compatible client, point it at https://api.brainorchestra.ai and you're done.
curl https://api.brainorchestra.ai/v1/chat/completions \
  -H "Authorization: Bearer bo_live_YOUR_API_KEY" \
  -H "X-User-Id: alice@acme.com" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      { "role": "user", "content": "Hello from Brain Orchestra" }
    ],
    "stream": true
  }'

Two headers are required on every request: Authorization: Bearer <api_key> and X-User-Id: <email_or_id>. The user ID is how BO attributes requests to end-users in your audit trail; pass the email of whoever is making the request on your side.
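As a minimal sketch of the header contract above, the helper below builds the required headers and refuses to proceed if either value is missing. The name bo_headers is hypothetical and not part of any SDK; the client-side check is only a convenience and does not replace the gateway's own validation.

```python
def bo_headers(api_key: str, user_id: str) -> dict:
    """Build the headers every request needs (hypothetical helper)."""
    if not api_key or not user_id:
        # Both headers are required on every request.
        raise ValueError("both api_key and user_id are required")
    return {
        "Authorization": f"Bearer {api_key}",
        "X-User-Id": user_id,
        "Content-Type": "application/json",
    }


headers = bo_headers("bo_live_YOUR_API_KEY", "alice@acme.com")
```

The resulting dictionary can be handed to whichever HTTP client you already use.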
SDK compatibility
Brain Orchestra is OpenAI API-compatible, so the official OpenAI Python and Node.js SDKs work as true drop-in replacements. Point base_url at BO, pass the BO API key + X-User-Id (or X-Actor-Token in strict mode), and existing OpenAI-SDK code runs unchanged.
Both stream modes are supported. Streaming is the right choice for interactive UIs; non-streaming is simpler for batch and one-shot processing. The standard client.chat.completions.create() call works for both:
from openai import OpenAI

client = OpenAI(
    api_key="bo_live_YOUR_API_KEY",
    base_url="https://api.brainorchestra.ai/v1",
    default_headers={"X-User-Id": "alice@acme.com"},
)

# Streaming (interactive UIs)
stream = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Summarize this email: ..."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)

# Non-streaming (batch / one-shot)
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Summarize this email: ..."}],
    stream=False,  # default; works as a drop-in
)
print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")

BO-specific fields via extra_body
The OpenAI SDKs only know about the OpenAI field set. Pass BO-specific fields like routing_preferences and pii_policy through extra_body — the wire request is byte-identical to a curl call with the same body.
# BO-specific fields (routing_preferences, pii_policy) via extra_body
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "..."}],
    stream=False,
    extra_body={
        "routing_preferences": {
            "compliance": "eu_cloud",
        },
        "pii_policy": {
            "action": "pseudonymize",
            "sensitivity": "medium",
            "rehydrate_response": True,
        },
    },
)

Error handling
Provider errors return real HTTP 4xx/5xx with a structured JSON body, so the SDK's built-in exceptions work as written. Pre-stream errors (auth, access denied, validation) come back as proper HTTP status codes, so your withFallback, retry decorators, and circuit-breakers behave correctly. Mid-stream errors (failures after the first chunk lands) still appear as SSE error frames inside a 200 response; this is a protocol-level constraint, because the status code cannot change once the response is committed. Your SSE parser should check for an error key on every received chunk.
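For clients that parse the stream themselves, the mid-stream check might look roughly like the sketch below. It assumes the OpenAI-style SSE framing (data: lines carrying JSON, terminated by data: [DONE]); the StreamError class, the iter_deltas helper, and the upstream_failed code in the sample are hypothetical names used only for illustration.

```python
import json
from typing import Iterable, Iterator


class StreamError(RuntimeError):
    """Raised when an SSE frame carries an `error` key (hypothetical class)."""


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from SSE lines, raising on an error frame."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        frame = json.loads(payload)
        if "error" in frame:
            # The HTTP status is already 200, so the frame is the only signal.
            raise StreamError(json.dumps(frame["error"]))
        for choice in frame.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Sample frames; the error code is made up for this example.
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    'data: {"error": {"code": "upstream_failed"}}',
]
received = []
failure = None
try:
    for piece in iter_deltas(sample):
        received.append(piece)
except StreamError as exc:
    failure = str(exc)
```

Text received before the error frame is kept, which lets the caller decide whether to show the partial output or discard it and retry.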
from openai import APIError, RateLimitError, APIStatusError
import time

try:
    response = client.chat.completions.create(...)
except RateLimitError as e:
    # 429: body carries retry_after_seconds + retry_after_source
    body = e.response.json()
    wait = body["error"].get("retry_after_seconds", 5)
    time.sleep(wait)
    # ...retry
except APIStatusError as e:
    # Any other non-2xx status. The SDK raises APIStatusError for 5xx too,
    # so re-raise those and let your retry/backoff layer handle them.
    if e.status_code >= 500:
        raise
    # 4xx: caller-fix needed. Hint is in the body.
    body = e.response.json()
    code = body["error"]["code"]
    if code == "bedrock_access_denied":
        print(body["error"]["hint"])
        # "The requested model is not enabled in your AWS Bedrock account..."
except APIError as e:
    # Connection failures and timeouts: transient. Retry with backoff.
    raise

Node.js
Same contract — configure with baseURL + defaultHeaders:
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "bo_live_YOUR_API_KEY",
  baseURL: "https://api.brainorchestra.ai/v1",
  defaultHeaders: { "X-User-Id": "alice@acme.com" },
});

const response = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Hello" }],
  stream: false,
});
console.log(response.choices[0].message.content);

What's not supported
- client.completions.create(): legacy completions API. Use chat.completions.create() instead.
- client.images.*, client.audio.*: multimodal endpoints (image generation, TTS, STT) are catalog-modeled but not yet routable through BO.
- client.files.*, client.batches.*, client.fine_tuning.*, client.moderations.create(): provider-specific features outside BO's scope. Use inline file content blocks instead of files; pii_policy covers most moderation use cases.
- client.embeddings.create(): fully supported via /v1/embeddings.
- seed, response_format, logprobs: passed through to the upstream provider; behavior is provider-dependent.
Anthropic SDK
The Anthropic SDK targets api.anthropic.com directly and uses the native messages shape. BO does not implement that surface — only OpenAI-compatible. Use the OpenAI SDK pointed at BO for Claude models too (model: "claude-sonnet", model: "claude-opus-4-7", etc.); model selection routes through BO's catalog.
Capturing the request ID
Every response includes an X-Request-Id header (format: bo_req_<hex>). Capture it client-side and paste into the audit dashboard's "Look up by Request ID" search to cross-reference any individual request — including failure cases.
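A minimal sketch of the client-side capture, assuming only the header name and the bo_req_<hex> format stated above. The extract_request_id helper is hypothetical, and the length of the hex part is not specified here, so the pattern accepts any non-empty hex run.

```python
import re
from typing import Mapping, Optional

REQUEST_ID_RE = re.compile(r"bo_req_[0-9a-fA-F]+")


def extract_request_id(headers: Mapping[str, str]) -> Optional[str]:
    """Return the X-Request-Id value if it looks like bo_req_<hex>."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-request-id":
            candidate = value.strip()
            if REQUEST_ID_RE.fullmatch(candidate):
                return candidate
    return None


request_id = extract_request_id({"x-request-id": "bo_req_1a2b3c"})
```

With the OpenAI Python SDK, response headers should be reachable through client.chat.completions.with_raw_response.create(...), and on failures through the exception's response.headers, so the same helper can cover both success and error paths.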
Model selection
The model field accepts three kinds of values:
- A specific BO model name: claude-sonnet, gpt-5, gpt-5-mini, gpt-4.1, mistral-large, magistral-medium (Mistral reasoning, EU-owned, qualifies for every tier), gemini-flash, kimi-k2-6 (Moonshot, unrestricted tier only), grok-4-3, grok-3-mini (xAI, unrestricted tier only), and so on. Routes directly to that model.
- auto: BO picks the highest-scoring available model for your project's territorial tier, based on real-time health data and cost optimization.
- A compliance shorthand: eu-strict, eu-cloud, eu-sweden. BO picks the best model qualifying for that tier. Optional size suffix: eu-strict:small.
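The three kinds of value can be told apart on the client with a few lines of Python. This is only an illustrative sketch: classify_model is a hypothetical helper, the gateway performs the real resolution, and size suffixes other than small are not listed here, so the suffix is passed through unvalidated.

```python
COMPLIANCE_SHORTHANDS = {"eu-strict", "eu-cloud", "eu-sweden"}


def classify_model(value: str) -> dict:
    """Classify a model field value as auto, compliance shorthand, or model name."""
    if value == "auto":
        return {"kind": "auto"}
    # A shorthand may carry an optional size suffix, e.g. "eu-strict:small".
    base, _, size = value.partition(":")
    if base in COMPLIANCE_SHORTHANDS:
        return {"kind": "compliance", "tier": base, "size": size or None}
    # Anything else is treated as a specific BO model name.
    return {"kind": "model", "name": value}


parsed = classify_model("eu-strict:small")
```

A check like this can be useful for logging which routing mode a caller asked for before the request leaves your service.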
Territorial tiers
Every project has a territorial tier that controls which models (and which physical regions) are allowed to serve your requests. The tier is enforced on every request: you assert the compliance level you need, and BO applies it and records the result in a full audit trail.
| Tier | What it means | Example models |
|---|---|---|
| eu_sweden | Strictest. Data physically stays in Stockholm (eu-north-1). No cross-region inference. | mistral-devstral-sweden, nova-lite-sweden, titan-embed-v2-sweden |
| eu_strict | EU-owned vendors only. Mistral family, no US providers at all. | mistral-small, mistral-medium, mistral-large, mistral-codestral, magistral-small, magistral-medium |
| eu_cloud | EU data residency via cross-region inference. Data stays in the EU but may route between Frankfurt / Ireland / Paris / Stockholm. | Claude 4.x via Bedrock Frankfurt, Mistral, Cohere Embed |
| unrestricted | All providers globally. Default for new projects. | Every model in the catalog |
You can tighten the tier per request (never loosen it) via routing_preferences.compliance:
curl https://api.brainorchestra.ai/v1/chat/completions \
  -H "Authorization: Bearer bo_live_YOUR_API_KEY" \
  -H "X-User-Id: alice@acme.com" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [...],
    "routing_preferences": {
      "compliance": "eu_strict"
    }
  }'

If the explicit model doesn't qualify for the requested tier, BO rejects the request with 400 model_not_compliant and returns the list of compliant alternatives so you can retry. This is a hard constraint: we will never silently route a request to a provider that violates the tier you asked for.
Document input (PDF)
Send PDFs directly through /v1/chat/completions on any compliance tier. BO accepts three content-block shapes and normalizes them at entry: Anthropic's type: "document", OpenAI's type: "input_file", and the generic type: "file" block. Use whichever your SDK emits.
Per-tier routing: on unrestricted, eu_cloud, and eu_sweden, the PDF is forwarded to the model natively (Anthropic document blocks for unrestricted; Bedrock Converse document blocks for the Bedrock-backed tiers) so the model sees the PDF with layout intact. On eu_strict, BO transparently runs Mistral's OCR endpoint first, splices the extracted markdown into your user message, then continues with chat completion. OCR cost (€0.00171 per page on Mistral OCR 3) is folded into the request's total cost. Your API contract is unchanged — one request in, one response out (streaming or non-streaming, your choice).
Caveat for eu_strict: OCR produces markdown text, not vision. Layout-sensitive content — scanned images, handwriting, tables embedded as images — may degrade versus the other tiers which see the PDF directly. If maximum PDF fidelity matters and EU-Cloud residency is sufficient, choose eu_cloud.
Sending a document to a model that doesn't accept documents (GPT, Gemini, Moonshot, embeddings) now returns 400 model_does_not_support_documents with a list of doc-capable alternatives for your tier. The previous behavior — silently accepting the document and dropping it in the adapter — is fixed.
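As a rough sketch, a PDF can be attached with the generic type: "file" block as shown below. The inner field names (file.filename, and file.file_data as a base64 data URL) follow the OpenAI Chat Completions convention and are assumed, not confirmed, to be what BO normalizes. The helper names are hypothetical; the per-page rate is the Mistral OCR 3 figure quoted above for eu_strict.

```python
import base64
from decimal import Decimal

OCR_EUR_PER_PAGE = Decimal("0.00171")  # eu_strict OCR rate quoted above


def pdf_user_message(pdf_bytes: bytes, filename: str, prompt: str) -> dict:
    """Build a user message with a text part and a file part (assumed shape)."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "file",
                "file": {
                    "filename": filename,
                    "file_data": f"data:application/pdf;base64,{encoded}",
                },
            },
        ],
    }


def ocr_surcharge_eur(pages: int) -> Decimal:
    """Estimate the OCR cost folded into an eu_strict request."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return OCR_EUR_PER_PAGE * pages


# Placeholder bytes stand in for a real PDF read from disk.
message = pdf_user_message(b"%PDF-1.7 placeholder", "report.pdf", "Summarize this document.")
surcharge = ocr_surcharge_eur(40)
```

The message dictionary goes into the messages list of an ordinary chat completion call; the surcharge estimate applies only when the request is served on eu_strict.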
PII protection
When enabled, Brain Orchestra routes prompts through a Presidio-based PII gateway before any provider call. Detected entities are pseudonymized (or redacted) server-side; the provider sees only the scrubbed version. Token mappings live in a session-scoped vault and are purged after the request completes.
Six languages supported: English, French, German, Italian, Swedish, Spanish. Custom recognizers for Swedish personnummer and IBAN.
curl https://api.brainorchestra.ai/v1/chat/completions \
  -H "Authorization: Bearer bo_live_YOUR_API_KEY" \
  -H "X-User-Id: alice@acme.com" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet",
    "messages": [
      {
        "role": "user",
        "content": "My name is Karl Karlsson. What is 2+2?"
      }
    ],
    "pii_policy": {
      "enabled": true,
      "sensitivity": "medium",
      "action": "pseudonymize"
    }
  }'

The audit log records pii_policy_received, pii_processed, pii_entities_detected, and pii_redactions_applied on every request that carries a pii_policy field, so you can prove the feature ran by querying /v1/data/audit.
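To make the pseudonymize-then-rehydrate flow concrete, here is a toy illustration of the idea. It is not BO's implementation: the real gateway uses Presidio recognizers across six languages, whereas this sketch detects only email addresses with a regular expression, and the <EMAIL_n> token format is made up.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def pseudonymize(text: str):
    """Replace each email with a stable token; return scrubbed text and vault."""
    vault = {}   # token -> original value, scoped to this one request
    seen = {}    # original value -> token, so repeats reuse the same token

    def swap(match):
        original = match.group(0)
        if original not in seen:
            token = f"<EMAIL_{len(seen) + 1}>"
            seen[original] = token
            vault[token] = original
        return seen[original]

    return EMAIL_RE.sub(swap, text), vault


def rehydrate(text: str, vault: dict) -> str:
    """Put the original values back into a model response."""
    for token, original in vault.items():
        text = text.replace(token, original)
    return text


scrubbed, vault = pseudonymize("Reply to alice@acme.com and cc bob@acme.com.")
# The provider would only ever see `scrubbed`; its answer refers to tokens.
restored = rehydrate("Drafted a reply for <EMAIL_1>.", vault)
vault.clear()  # purge the mapping once the request completes
```

The point of the token map is that the provider never receives the original values, while the caller can still get a readable answer back when rehydration is requested.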
Audit trail
Every request produces a durable audit record — success, failure, rate-limit rejection, PII fail-closed, compliance mismatch, and everything in between. Audit records are written via a transactional outbox pattern and advance through a documented state machine, so even mid-stream crashes can't leave a request stuck in an invisible state.
Query the audit log programmatically via GET /v1/data/audit, filter by actor_id, model, or status, and get back the full metadata: provider, resolved model, effective tier, cost, tokens, data region, and pricing provenance (what price was in effect at request time).
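A small sketch of building such a query follows. It assumes the three filters are passed as query-string parameters with the names given above; the audit_url helper and the status value used in the example are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.brainorchestra.ai"


def audit_url(actor_id: Optional[str] = None,
              model: Optional[str] = None,
              status: Optional[str] = None) -> str:
    """Build a GET /v1/data/audit URL, including only the filters that are set."""
    filters = {"actor_id": actor_id, "model": model, "status": status}
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    return f"{BASE_URL}/v1/data/audit" + (f"?{query}" if query else "")


# "error" is a placeholder status value, not a documented one.
url = audit_url(actor_id="alice@acme.com", status="error")
```

The request itself still needs the Authorization header with your API key.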
For AI agents (Claude Desktop, Cursor, Cline) and backend code that need to query logs, traces, billing, and the model catalog, see the agent integration guide — covers the MCP server setup, every available tool with example agent prompts, and the equivalent REST endpoints.
Advanced features
- Embeddings: POST /v1/embeddings. Same tier-aware routing as chat completions, with EU-owned alternatives (mistral-embed, mistral-codestral-embed) for strict sovereignty.
- MCP server: Model Context Protocol JSON-RPC 2.0 at POST /v1/mcp. Compatible with Claude Desktop, Cursor, and any MCP client. Tools: search_audit_logs, list_traces, get_model_health, and more.
- Traces: governed operations for agentic AI. Create a trace with a budget, depth limit, and allowed model list; every LLM call within the trace is gated by the operation contract and logged as a span. See POST /v1/traces.
- Data API: seven read-only endpoints at /v1/data/* for SIEM integration, custom dashboards, and automated reporting. API-key auth only (no actor token required).
- Billing API: GET /v1/billing/balance, /usage, and /reconciliations for programmatic access to spend data, scoped to the calling project.
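For the MCP endpoint, a request body is a JSON-RPC 2.0 envelope. The sketch below builds one for the standard MCP tools/list method, which asks a server to enumerate its tools. The jsonrpc_request helper is hypothetical, and whether BO expects an initialize handshake before other calls is not stated here, so treat this as the message shape only.

```python
import itertools
import json
from typing import Optional

_ids = itertools.count(1)  # JSON-RPC ids must be unique per request


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request suitable for POST /v1/mcp."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


body = jsonrpc_request("tools/list")
```

Most users will not build these by hand, since an MCP client such as Claude Desktop or Cursor produces them; the envelope is mainly useful when debugging with a plain HTTP client.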
Pricing
Brain Orchestra uses a prepaid balance model with a 5% platform fee on token usage through managed keys. Top up your balance via the dashboard (EUR / USD / SEK); each request atomically deducts the provider cost plus 5% from your balance, with full per-request transparency in the audit log. Monthly reconciliation against the actual provider invoice.
Bring your own provider API keys (BYOK) and the 5% fee is waived entirely — BO acts purely as the governance + audit layer. You can mix BYOK and BO-managed keys per project, and every request is tagged with its key source so billing is unambiguous.
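The fee arithmetic can be sketched as follows, using Decimal to avoid floating-point drift in money amounts. The function names are hypothetical, and rounding rules are not stated here, so the sketch leaves values unrounded.

```python
from decimal import Decimal

PLATFORM_FEE_RATE = Decimal("0.05")  # 5% on managed-key token usage


def platform_fee(provider_cost: Decimal, byok: bool = False) -> Decimal:
    """Return the BO fee for one request; waived when using your own keys."""
    if byok:
        return Decimal("0")
    return provider_cost * PLATFORM_FEE_RATE


def managed_key_deduction(provider_cost: Decimal) -> Decimal:
    """Amount taken from the prepaid balance for a managed-key request."""
    return provider_cost + platform_fee(provider_cost)


deducted = managed_key_deduction(Decimal("0.0200"))
byok_fee = platform_fee(Decimal("0.0200"), byok=True)
```

For a request whose provider cost is 0.02, the managed-key deduction is 0.021, while the same request under BYOK carries no BO fee.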
Plans:
- Free — 1,000 requests/month, unrestricted tier only
- Trial — 30 days of Pro access (100,000 requests/month, all tiers). Card required at signup so the trial flips to Pro automatically when it ends.
- Pro — €79 / $79 / 899 kr per month, 100,000 requests/month, all tiers including EU-sovereign (eu_strict, eu_cloud, eu_sweden)
- Enterprise — custom pricing, unlimited requests, negotiable convenience fee, SLA, dedicated support
Subscription tiers exist alongside the per-token fee, not instead of it — Pro unlocks higher request quotas and the EU-sovereign tiers, but the 5% on managed-key usage applies to every plan. Enterprise contracts can negotiate the convenience fee.
Ready to try it?
Sign up for a free Brain Orchestra account and get an API key in under two minutes. Start with the unrestricted tier and upgrade to EU-strict or EU-sovereign when you need it.