Documentation

Brain Orchestra is a governed LLM gateway: a single OpenAI-compatible endpoint that routes requests across every major provider (Anthropic, OpenAI, Mistral, Google, Amazon, Cohere, Moonshot, and xAI), with per-request audit logging, territorial data controls, and PII protection. Moonshot and xAI are available on the unrestricted tier only and are opt-in: they are used only when you request them by name.

Quick start

Brain Orchestra implements the OpenAI /v1/chat/completions format. If you already use the OpenAI SDK or any OpenAI-compatible client, point its base URL at https://api.brainorchestra.ai/v1 and you're done.

curl
curl https://api.brainorchestra.ai/v1/chat/completions \
  -H "Authorization: Bearer bo_live_YOUR_API_KEY" \
  -H "X-User-Id: alice@acme.com" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      { "role": "user", "content": "Hello from Brain Orchestra" }
    ],
    "stream": true
  }'

Two headers are required on every request: Authorization: Bearer <api_key> and X-User-Id: <email_or_id>. The user ID is how BO attributes requests to end-users in your audit trail — pass the email of whoever is making the request on your side.
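If you are not using an SDK, the two required headers can be attached with nothing but the Python standard library. The sketch below only builds the request (build_chat_request is a hypothetical helper, not part of any BO library); sending it is a single urllib.request.urlopen(req) call, shown here for a non-streaming body.

```python
import json
import urllib.request

BO_URL = "https://api.brainorchestra.ai/v1/chat/completions"


def build_chat_request(api_key: str, user_id: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a chat completion request carrying both required headers."""
    return urllib.request.Request(
        BO_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-User-Id": user_id,  # attributes the request to an end-user in the audit trail
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "bo_live_YOUR_API_KEY",
    "alice@acme.com",
    {"model": "auto", "messages": [{"role": "user", "content": "Hello"}]},
)
# urllib stores header names via str.capitalize(), so look them up as "X-user-id".
print(req.get_method(), req.full_url, req.get_header("X-user-id"))
```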

SDK compatibility

Brain Orchestra is OpenAI API-compatible, so the official OpenAI Python and Node.js SDKs work as drop-in clients. Point base_url at BO, pass the BO API key plus X-User-Id (or X-Actor-Token in strict mode), and existing OpenAI SDK code runs unchanged.

Both stream modes are supported. Streaming is the right choice for interactive UIs; non-streaming is simpler for batch and one-shot processing. The standard client.chat.completions.create() call works for both:

python
from openai import OpenAI

client = OpenAI(
    api_key="bo_live_YOUR_API_KEY",
    base_url="https://api.brainorchestra.ai/v1",
    default_headers={"X-User-Id": "alice@acme.com"},
)

# Streaming (interactive UIs)
stream = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Summarize this email: ..."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
python
# Non-streaming (batch / one-shot)
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Summarize this email: ..."}],
    stream=False,  # default; works as a drop-in
)
print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")

BO-specific fields via extra_body

The OpenAI SDKs only know the OpenAI field set. Pass BO-specific fields such as routing_preferences and pii_policy through extra_body; the SDK merges them into the top level of the JSON body, so the request BO receives is equivalent to a curl call with the same body.

python
# BO-specific fields (routing_preferences, pii_policy) via extra_body
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "..."}],
    stream=False,
    extra_body={
        "routing_preferences": {
            "compliance": "eu_cloud",
        },
        "pii_policy": {
            "action": "pseudonymize",
            "sensitivity": "medium",
            "rehydrate_response": True,
        },
    },
)
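To see what goes over the wire: current versions of the OpenAI Python SDK add extra_body keys to the request JSON with a shallow, top-level merge. The sketch below reproduces that merge by hand (merge_extra_body is a hypothetical helper written for illustration, not an SDK function); the printed JSON is the body you would send with curl -d.

```python
import json


def merge_extra_body(standard: dict, extra: dict) -> dict:
    """Shallow top-level merge, mirroring how extra_body keys join the request JSON."""
    return {**standard, **extra}


standard = {
    "model": "auto",
    "messages": [{"role": "user", "content": "..."}],
    "stream": False,
}
extra = {
    "routing_preferences": {"compliance": "eu_cloud"},
    "pii_policy": {
        "action": "pseudonymize",
        "sensitivity": "medium",
        "rehydrate_response": True,
    },
}

wire_body = merge_extra_body(standard, extra)
print(json.dumps(wire_body, indent=2))  # paste into curl -d '...'
```

Because the merge is shallow, a key in extra_body that also exists in the standard fields replaces it wholesale rather than being deep-merged.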

Error handling

Errors that occur before streaming starts (authentication failures, access denied, validation errors, provider errors) return real HTTP 4xx/5xx status codes with a structured JSON body, so the SDK's built-in exceptions work as written, and so do fallback wrappers (withFallback and similar), retry decorators, and circuit breakers. Errors that occur mid-stream, after the first chunk has been sent, still arrive as SSE error frames inside a 200 response: once the response status is committed, HTTP offers no way to change it. Your SSE parser should therefore check every received chunk for an error key.

python
import time

from openai import APIConnectionError, APIStatusError, InternalServerError, RateLimitError

# Order matters: RateLimitError and InternalServerError are subclasses of APIStatusError.
try:
    response = client.chat.completions.create(
        model="auto",
        messages=[{"role": "user", "content": "..."}],
    )
except RateLimitError as e:
    # 429: body carries retry_after_seconds + retry_after_source
    body = e.response.json()
    wait = body["error"].get("retry_after_seconds", 5)
    time.sleep(wait)
    # ...retry
except InternalServerError:
    # 5xx: transient. Retry with backoff.
    raise
except APIStatusError as e:
    # Remaining 4xx: caller fix needed. The hint is in the body.
    error = e.response.json().get("error", {})
    if error.get("code") == "bedrock_access_denied":
        print(error.get("hint"))
        # "The requested model is not enabled in your AWS Bedrock account..."
except APIConnectionError:
    # No HTTP status at all (network failure or timeout). Also transient.
    raise
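For clients that read the SSE stream themselves, the mid-stream check described above can be sketched as follows. It assumes each frame is a data: line carrying either an OpenAI-style chunk or an object with a top-level error key, and that the error object has code and message fields like the pre-stream error body; MidStreamError and iter_content are hypothetical names.

```python
import json
from typing import Iterable, Iterator


class MidStreamError(RuntimeError):
    """Raised when an SSE frame carries an error after the 200 status was sent."""

    def __init__(self, error: dict):
        super().__init__(error.get("message", "mid-stream error"))
        self.error = error


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from raw SSE lines; raise MidStreamError on an error frame."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, ": comment" keep-alives, event: lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        if "error" in chunk:  # check every chunk, not just the first
            err = chunk["error"]
            raise MidStreamError(err if isinstance(err, dict) else {"message": str(err)})
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

Content already yielded before the error frame is still delivered, so a UI can show the partial answer and then surface the failure.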

Node.js

Same contract — configure with baseURL + defaultHeaders:

javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "bo_live_YOUR_API_KEY",
  baseURL: "https://api.brainorchestra.ai/v1",
  defaultHeaders: { "X-User-Id": "alice@acme.com" },
});

const response = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Hello" }],
  stream: false,
});
console.log(response.choices[0].message.content);

What's not supported

Anthropic SDK

The Anthropic SDK targets api.anthropic.com directly and uses Anthropic's native Messages API shape. BO does not implement that surface; it exposes only the OpenAI-compatible API. For Claude models, use the OpenAI SDK pointed at BO (model: "claude-sonnet", model: "claude-opus-4-7", etc.); model selection routes through BO's catalog.

Capturing the request ID

Every response includes an X-Request-Id header (format: bo_req_<hex>). Capture it client-side and paste it into the audit dashboard's "Look up by Request ID" search to cross-reference any individual request, including failed ones.
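With the OpenAI Python SDK, response headers are reachable through the raw-response wrapper: client.chat.completions.with_raw_response.create(...) returns an object with .headers, and its .parse() method yields the usual completion. The helper below is a hypothetical sketch that pulls the ID out of any header mapping, matching the name case-insensitively (HTTP header names are case-insensitive) and checking the value against the documented bo_req_<hex> format.

```python
import re
from typing import Mapping, Optional

# Documented format: bo_req_<hex>
REQUEST_ID = re.compile(r"bo_req_[0-9a-fA-F]+")


def extract_request_id(headers: Mapping[str, str]) -> Optional[str]:
    """Return the X-Request-Id value if present and well-formed, else None."""
    for name, value in headers.items():
        if name.lower() == "x-request-id":
            value = value.strip()
            return value if REQUEST_ID.fullmatch(value) else None
    return None
```

Usage with the SDK would be extract_request_id(raw.headers), where raw is the object returned by with_raw_response.create(...); log the result next to your own request context so failures can be looked up later.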

Model selection

The model field accepts three kinds of values:

Territorial tiers

Every project has a territorial tier that controls which models (and which physical regions) are allowed to serve your requests. BO enforces the tier on every request; compliance itself is customer-asserted (you declare which tier your data requires) and is backed by a full audit trail.

| Tier | What it means | Example models |
|---|---|---|
| eu_sweden | Strictest. Data physically stays in Stockholm (eu-north-1). No cross-region inference. | mistral-devstral-sweden, nova-lite-sweden, titan-embed-v2-sweden |
| eu_strict | EU-owned vendors only: the Mistral family, no US providers at all. | mistral-small, mistral-medium, mistral-large, mistral-codestral, magistral-small, magistral-medium |
| eu_cloud | EU data residency via cross-region inference. Data stays in the EU but may route between Frankfurt, Ireland, Paris, and Stockholm. | Claude 4.x via Bedrock Frankfurt, Mistral, Cohere Embed |
| unrestricted | All providers globally. Default for new projects. | Every model in the catalog |

You can tighten the tier per request (never loosen it) via routing_preferences.compliance:

curl
curl https://api.brainorchestra.ai/v1/chat/completions \
  -H "Authorization: Bearer bo_live_YOUR_API_KEY" \
  -H "X-User-Id: alice@acme.com" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [...],
    "routing_preferences": {
      "compliance": "eu_strict"
    }
  }'

If an explicitly requested model doesn't qualify for the requested tier, BO rejects the request with 400 model_not_compliant and returns the list of compliant alternatives so you can retry. This is a hard constraint: BO never silently routes a request to a provider that violates the tier you asked for.
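A client can turn the model_not_compliant rejection into an automatic retry. The sketch assumes the alternatives arrive as a list of model IDs inside the parsed error object under a key named alternatives; that key name is a guess, so check a real response body before relying on it. pick_compliant_alternative is a hypothetical helper.

```python
from typing import Optional, Sequence


def pick_compliant_alternative(error: dict, preferred: Sequence[str] = ()) -> Optional[str]:
    """Choose a model to retry with after a 400 model_not_compliant error.

    `error` is the parsed "error" object from the response body.
    ASSUMPTION: the compliant models are listed under the hypothetical key "alternatives".
    """
    if error.get("code") != "model_not_compliant":
        return None
    alternatives = error.get("alternatives") or []
    for model in preferred:  # honor the caller's own ranking first
        if model in alternatives:
            return model
    return alternatives[0] if alternatives else None
```

Returning None when no alternative exists lets the caller fail loudly instead of retrying with a model it did not vet.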

Document input (PDF)

Send PDFs directly through /v1/chat/completions on any compliance tier. BO accepts three content-block shapes and normalizes them on entry: Anthropic's type: "document", OpenAI's type: "input_file", and the generic type: "file" block. Use whichever one your SDK emits.
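For the generic type: "file" shape, here is a sketch of a user message that carries a PDF. The inner fields (filename, and file_data as a base64 data: URL) follow OpenAI's Chat Completions file content part; that BO accepts exactly this inner shape is an assumption, and pdf_file_block / message_with_pdf are hypothetical helpers.

```python
import base64


def pdf_file_block(filename: str, pdf_bytes: bytes) -> dict:
    """Generic type: "file" content block carrying a base64-encoded PDF.

    ASSUMPTION: inner shape mirrors OpenAI's Chat Completions file content part.
    """
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return {
        "type": "file",
        "file": {
            "filename": filename,
            "file_data": f"data:application/pdf;base64,{encoded}",
        },
    }


def message_with_pdf(question: str, filename: str, pdf_bytes: bytes) -> dict:
    """A user message: the question as text, followed by the PDF block."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            pdf_file_block(filename, pdf_bytes),
        ],
    }
```

In practice you would read the bytes with open("report.pdf", "rb").read() and pass the resulting message in messages=[...] as usual.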

Per-tier routing: on unrestricted, eu_cloud, and eu_sweden, the PDF is forwarded to the model natively (Anthropic document blocks for unrestricted; Bedrock Converse document blocks for the Bedrock-backed tiers) so the model sees the PDF with layout intact. On eu_strict, BO transparently runs Mistral's OCR endpoint first, splices the extracted markdown into your user message, then continues with chat completion. OCR cost (€0.00171 per page on Mistral OCR 3) is folded into the request's total cost. Your API contract is unchanged — one request in, one response out (streaming or non-streaming, your choice).
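As a worked example, a 20-page PDF sent on eu_strict adds 20 × €0.00171 = €0.0342 of OCR cost to that request. The sketch below uses Decimal to avoid float rounding in money arithmetic; ocr_cost_eur is a hypothetical helper, and it covers only the OCR line, not the chat tokens billed on top.

```python
from decimal import Decimal

# Mistral OCR 3 page price quoted above, in EUR per page.
OCR_EUR_PER_PAGE = Decimal("0.00171")


def ocr_cost_eur(pages: int) -> Decimal:
    """OCR component of an eu_strict PDF request (chat token cost not included)."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return OCR_EUR_PER_PAGE * pages


print(ocr_cost_eur(20))
```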

Caveat for eu_strict: OCR produces markdown text, and the model never sees the page images. Layout-sensitive content (scanned images, handwriting, tables embedded as images) may come through less faithfully than on the other tiers, where the model sees the PDF directly. If maximum PDF fidelity matters and EU cloud residency is sufficient, choose eu_cloud.

Sending a document to a model that doesn't accept documents (GPT, Gemini, Moonshot, embedding models) returns 400 model_does_not_support_documents with a list of document-capable alternatives for your tier. Earlier versions silently accepted the document and dropped it in the adapter; that behavior has been fixed.

PII protection

When enabled, Brain Orchestra routes prompts through a Presidio-based PII gateway before any provider call. Detected entities are pseudonymized (or redacted) server-side; the provider sees only the scrubbed version. Token mappings live in a session-scoped vault and are purged after the request completes.

Six languages are supported: English, French, German, Italian, Swedish, and Spanish, with custom recognizers for Swedish personnummer and IBAN.
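To illustrate what pseudonymize-then-rehydrate means, here is a deliberately tiny stand-in: one regex for email addresses and a plain dict as the vault. It is not Presidio and not BO's implementation, and the <EMAIL_n> token format is invented for the example. The same original always maps to the same token, so the provider still sees consistent references.

```python
import re

# Toy detector: email addresses only. A real gateway runs many recognizers.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def pseudonymize(text: str, vault: dict) -> str:
    """Replace each email with a stable token, recording token -> original in vault."""
    reverse = {original: token for token, original in vault.items()}

    def substitute(match: re.Match) -> str:
        original = match.group(0)
        if original not in reverse:
            token = f"<EMAIL_{len(vault) + 1}>"
            vault[token] = original
            reverse[original] = token
        return reverse[original]

    return EMAIL.sub(substitute, text)


def rehydrate(text: str, vault: dict) -> str:
    """Swap tokens in the model's answer back to the original values."""
    for token, original in vault.items():
        text = text.replace(token, original)
    return text


# BO's vault is session-scoped and purged after the request; here it is just a dict.
vault = {}
masked = pseudonymize("Mail karl@example.se or anna@example.com; karl@example.se again.", vault)
print(masked)
```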

curl
curl https://api.brainorchestra.ai/v1/chat/completions \
  -H "Authorization: Bearer bo_live_YOUR_API_KEY" \
  -H "X-User-Id: alice@acme.com" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet",
    "messages": [
      {
        "role": "user",
        "content": "My name is Karl Karlsson. What's 2+2?"
      }
    ],
    "pii_policy": {
      "enabled": true,
      "sensitivity": "medium",
      "action": "pseudonymize"
    }
  }'

The audit log records pii_policy_received, pii_processed, pii_entities_detected, and pii_redactions_applied on every request that carries a pii_policy field — so you can prove the feature ran by querying /v1/data/audit.

Audit trail

Every request produces a durable audit record — success, failure, rate-limit rejection, PII fail-closed, compliance mismatch, and everything in between. Audit records are written via a transactional outbox pattern and advance through a documented state machine, so even mid-stream crashes can't leave a request stuck in an invisible state.

Query the audit log programmatically via GET /v1/data/audit, filter by actor_id, model, or status, and get back the full metadata: provider, resolved model, effective tier, cost, tokens, data region, and pricing provenance (what price was in effect at request time).
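A sketch of building the audit query with the standard library. It assumes the filters are passed as query-string parameters named exactly actor_id, model, and status (the names come from the text above; passing them as query parameters, and the example status value "failed", are assumptions). Send the resulting URL as a GET, presumably with the same Authorization header as other requests.

```python
from typing import Optional
from urllib.parse import urlencode

AUDIT_URL = "https://api.brainorchestra.ai/v1/data/audit"


def audit_query_url(
    actor_id: Optional[str] = None,
    model: Optional[str] = None,
    status: Optional[str] = None,
) -> str:
    """Build a GET /v1/data/audit URL; filters left as None are omitted."""
    params = {"actor_id": actor_id, "model": model, "status": status}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{AUDIT_URL}?{query}" if query else AUDIT_URL


print(audit_query_url(actor_id="alice@acme.com", status="failed"))
```

urlencode percent-encodes the @ in an email-style actor_id, so the value survives the query string intact.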

For AI agents (Claude Desktop, Cursor, Cline) and backend code that need to query logs, traces, billing, and the model catalog, see the agent integration guide. It covers the MCP server setup, every available tool with example agent prompts, and the equivalent REST endpoints.

Advanced features

Pricing

Brain Orchestra uses a prepaid balance model with a 5% platform fee on token usage through managed keys. Top up your balance via the dashboard (EUR, USD, or SEK); each request atomically deducts the provider cost plus 5% from your balance, with full per-request transparency in the audit log. Balances are reconciled monthly against the actual provider invoices.

Bring your own provider API keys (BYOK) and the 5% fee is waived entirely — BO acts purely as the governance + audit layer. You can mix BYOK and BO-managed keys per project, and every request is tagged with its key source so billing is unambiguous.
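The fee arithmetic above, sketched with Decimal: a managed-key request whose provider cost is €0.0200 deducts €0.0200 + 5% = €0.0210 from the balance, while with BYOK the platform fee is zero. The helper names and the "byok"/"managed" labels are hypothetical; BO's actual key-source tags in the audit log may differ.

```python
from decimal import Decimal

PLATFORM_FEE_RATE = Decimal("0.05")  # 5% on managed-key token usage


def platform_fee(provider_cost: Decimal, key_source: str) -> Decimal:
    """5% of the provider cost for BO-managed keys; waived entirely for BYOK."""
    if key_source == "byok":
        return Decimal("0")
    return provider_cost * PLATFORM_FEE_RATE


def managed_key_deduction(provider_cost: Decimal) -> Decimal:
    """Balance deduction for one managed-key request: provider cost plus the 5% fee."""
    return provider_cost + platform_fee(provider_cost, "managed")


print(managed_key_deduction(Decimal("0.0200")))
```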

Plans:

Subscription tiers exist alongside the per-token fee, not instead of it: Pro unlocks higher request quotas and the EU-sovereign tiers, but the 5% fee on managed-key usage applies to every plan. Enterprise contracts can negotiate the platform fee.

Ready to try it?

Sign up for a free Brain Orchestra account and get an API key in under two minutes. Start with the unrestricted tier and upgrade to EU-strict or EU-sovereign when you need it.

Brain Orchestra is operated by Xalerate AB, Stockholm. Infrastructure in the EU (Railway europe-west4).