Agent integration

A reference for AI agents (Claude Desktop, Cursor, Cline, custom MCP clients) and backend code that need to query Brain Orchestra for audit logs, traces, billing, and the model catalog.

Two equivalent surfaces

Same data, two transports. Pick one based on integration shape.

MCP (POST /v1/mcp) — for AI agents in MCP-aware hosts. Natural-language requests get translated to tool calls automatically. Auth: API key (Bearer).
REST (GET /v1/data/*) — for backend services, dashboards, analytics scripts, anything imperative. Direct HTTP. Auth: API key (Bearer).

Both are scoped to the project the API key belongs to — every query auto-filters to that project's rows. No cross-tenant leakage by construction.

Connecting an MCP client

Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json:

json

{
  "mcpServers": {
    "brain-orchestra": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "https://api.brainorchestra.ai/v1/mcp"],
      "env": {
        "MCP_REMOTE_HEADERS": "{\"Authorization\":\"Bearer $BO_API_KEY\"}"
      }
    }
  }
}

Restart Claude Desktop. The MCP icon should show "brain-orchestra" with the tool list available.

Cursor / Cline / generic MCP client

Point at https://api.brainorchestra.ai/v1/mcp with Authorization: Bearer bo_live_... in headers. Both clients support the same mcp-remote proxy if needed.

Self-test

bash

curl -X POST https://api.brainorchestra.ai/v1/mcp \
  -H "Authorization: Bearer $BO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"tools/list","id":1}'

You should get back the tool list. 401 means either the key is wrong or the project hasn't accepted Terms (sign in to dashboard once).

Tools available to agents

search_audit_logs — query the audit trail

Every gateway request through BO produces a durable audit row. This tool searches them. REST equivalent: GET /v1/data/audit.

Parameters: actor_id, model, status (completed | failed | cancelled | timed_out | streaming), date_from / date_to (ISO 8601), limit (default 50, max 200), offset.

Returns: array of audit records with request_id, actor_id, model, provider, territorial_tier, status, latency_ms, tokens_in, tokens_out, cost_eur, created_at, completed_at. PII fields when applicable.

Agent prompts that translate to this tool:

"Show me last week's failed requests."
"Which model did alice@acme.com use most yesterday?"
"Find all requests over 10 seconds latency from the past 24 hours."
"How much did we spend on Claude Opus this month?"

list_traces, get_trace_detail — multi-step agentic execution

BO supports trace-based observability for agents that make N gateway calls under one logical operation. REST equivalents: GET /v1/data/traces and GET /v1/data/traces/:traceId/spans.

list_traces returns the trace summary (cost, duration, status, span count). get_trace_detail returns the full span tree showing the order, depth, model, cost, and latency of every constituent request.

Agent prompts:

"What did the customer-onboarding-agent do at 3pm yesterday?"
"Show me the most expensive trace from this week."
"Which traces failed and at what step?"

get_billing_summary — aggregate cost

REST equivalent: GET /v1/data/billing/summary.

Parameters: month (format YYYY-MM, defaults to current month).

Returns: total_cost_eur, by_model, by_actor, by_provider arrays.

Catalog — list_models vs get_catalog

Two tools, distinct use cases.

list_models (REST: GET /v1/data/models) returns models eligible for your project's tier. Lightweight, no pricing. Use for routing / capability discovery.

get_catalog (REST: GET /v1/data/catalog) returns the entire catalog regardless of project tier, with eligibleForProject: bool per row plus per-customer effective pricing in USD per 1M tokens. Use for model pickers and tier-upgrade UIs.

Per-entry shape from get_catalog:

json

{
  "name": "gpt-5",
  "providerModelId": "gpt-5",
  "provider": "openai",
  "size": "large",
  "qualityRank": 0.97,
  "modelType": "chat",
  "outputDimensions": null,
  "capabilities": ["text", "vision", "document"],
  "tiers": {
    "unrestricted": true,
    "eu_cloud": false,
    "eu_strict": false,
    "eu_sweden": false
  },
  "adapterByTier": { "unrestricted": "direct" },
  "pricing": {
    "inputPer1m": 1.25,
    "outputPer1m": 10.0,
    "currency": "usd",
    "source": "catalog_default_pending_api",
    "verifiedAt": null,
    "effectiveFrom": null
  },
  "eligibleForProject": true
}

Agent prompts:

"Which document-capable models are cheapest in eu_strict?"
"Show me everything I'd unlock if I upgraded to unrestricted."
"What are the new GPT-5 models priced at?"

get_model_health — real-time availability

REST equivalent: GET /v1/data/model-health. Returns per-model avg_ttft_ms, p95_ttft_ms, success_rate, composite score, is_availablefrom BO's synthetic-ping worker.

Trace correlation — grouping multi-call operations

When an agent turn or a job runs N gateway calls under one logical operation (a Noot conversation, a multi-step briefing, a tool-use loop), you can group them under a single trace so the dashboard can show them as one unit and so the gateway can enforce a shared budget + guardrails.

A trace carries an operation contract— territorial tier, budget cap, max depth, max requests, allowed models — set at creation and immutable for the trace's lifetime. That's by design: when an auditor asks "what was the budget on this run before it started?", the trace row answers it.

Two ways to create a trace

Option A — Explicit pre-create (one extra round trip, cleanest separation). Make a one-time call to POST /v1/traces at the start of your logical operation:

bash

curl -X POST https://api.brainorchestra.ai/v1/traces \
  -H "Authorization: Bearer bo_live_..." \
  -H "X-User-Id: alice@acme.com" \
  -H "Content-Type: application/json" \
  -d '{
    "operation_type": "noot_chat",
    "max_requests": 5,
    "max_depth": 5,
    "budget_eur": 0.50
  }'

# 201 Created
# { "trace_id": "bo_trace_0a1b2c...", "status": "active", ... }

Then on every downstream chat or embeddings call, attach the trace via the X-Trace-Id header:

bash

curl https://api.brainorchestra.ai/v1/chat/completions \
  -H "Authorization: Bearer bo_live_..." \
  -H "X-User-Id: alice@acme.com" \
  -H "X-Trace-Id: bo_trace_0a1b2c..." \
  -H "Content-Type: application/json" \
  -d '{ "model": "claude-haiku-4-5", "messages": [...], "stream": true }'

Option B — Inline create on first call (no extra round trip, fits agent loops better). On the first request of the operation, send X-Create-Trace: true (and no X-Trace-Id), plus an optional trace_contract in the request body:

bash

curl https://api.brainorchestra.ai/v1/chat/completions \
  -H "Authorization: Bearer bo_live_..." \
  -H "X-User-Id: alice@acme.com" \
  -H "X-Create-Trace: true" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-haiku-4-5",
    "messages": [...],
    "stream": true,
    "trace_contract": {
      "operation_type": "noot_chat",
      "max_requests": 5,
      "max_depth": 5
    }
  }'

# Response includes header: X-Trace-Id: bo_trace_...
# Read that header, then thread it on calls 2..N as X-Trace-Id

BO mints the trace as part of the request, runs the call as its first span, and returns the new trace id in the X-Trace-Id response header. Read that header from the response and thread it as X-Trace-Id: <that-id> on every subsequent call in the operation.

Common pitfalls

⚠️ Don't generate trace IDs client-side.

Sending X-Trace-Id: <your-fresh-uuid>against a trace that BO doesn't know about returns 404 trace_not_found. The trace must exist server-side first — either via POST /v1/traces or via X-Create-Trace: trueon the first call. There is no "first request with a new ID auto-creates" semantic.

No streaming asymmetry. /v1/chat/completions validates X-Trace-Id identically whether stream: trueor not. If a non-streaming call succeeded and a streaming one didn't, the difference was likely whether the header was sent, not the endpoint.
Trace tier is immutable. Once a trace is created against an eu_strict project, every span runs under eu_strict. A request inside the trace cannot loosen via routing_preferences.compliance.
Pick a max_requests that fits your loop. The default is 100. Tool-use loops with retry should size up; one-shot summaries can stay small. Exceeding the cap rejects the offending request with trace_max_requests_exceeded.
Combine with X-User-Id for the per-feature axis. Trace gives you the per-conversation/per-job grouping; the actor identity gives you the per-feature breakdown. Both slice the dashboard independently.

Authentication and scoping

Pass Authorization: Bearer bo_live_... on every MCP and REST call
No X-Employee-Id actor token required for these endpoints (project-scoped, not actor-scoped)
All data returned is filtered to the project the API key belongs to — no other project's rows can be reached
Customers must accept Terms / DPA before any read endpoint returns data — first call after signup may return 403 terms_required until you accept on the dashboard
/v1/data/auditcontent fields respect the project's retention policy; older requests return structural fields with content nullified

Worked examples

Backend script (Python, REST)

python

import os, requests

bo = "https://api.brainorchestra.ai"
headers = {"Authorization": f"Bearer {os.environ['BO_API_KEY']}"}

# Last 7 days of failed requests
r = requests.get(
    f"{bo}/v1/data/audit",
    headers=headers,
    params={
        "status": "failed",
        "date_from": "2026-04-19T00:00:00Z",
        "date_to": "2026-04-26T00:00:00Z",
        "limit": 200,
    },
)
failures = r.json()["audit_logs"]

# This month's billing
r = requests.get(f"{bo}/v1/data/billing/summary", headers=headers)
print(f"Spend: ${r.json()['total_cost_eur']}")

# Discover document-capable models with pricing
r = requests.get(f"{bo}/v1/data/catalog", headers=headers)
docs = [m for m in r.json()["models"]
        if m["eligibleForProject"] and "document" in m["capabilities"]]
for m in sorted(docs, key=lambda m: m["pricing"]["inputPer1m"]):
    print(f"{m['name']}: ${m['pricing']['inputPer1m']}/1M in")

Building a model picker (Node, REST)

javascript

const res = await fetch("https://api.brainorchestra.ai/v1/data/catalog", {
  headers: { Authorization: `Bearer ${process.env.BO_API_KEY}` },
});
const { models, project_tier } = await res.json();

const eligible = models.filter(m => m.eligibleForProject);
const upsell = models.filter(m => !m.eligibleForProject);

console.log(`${eligible.length} models available; ${upsell.length} more if you upgrade tier`);
upsell.forEach(m => {
  const tiers = Object.entries(m.tiers).filter(([, v]) => v).map(([k]) => k);
  console.log(`  ${m.name} — needs tier: ${tiers.join(' or ')}`);
});

MCP vs REST — when to choose which

Use MCP when your agent runs inside an MCP-aware host (Claude Desktop, Cursor, Cline) — the host handles auth, retries, schema validation, and translates natural-language to tool calls. The user (operator) gets to grant per-tool permissions and see what the agent is doing.

Use REST whenyour code is imperative — backend script, dashboard, CI job, scheduled report — or when you're not running inside an MCP host.

Both surfaces stay in lockstep — the catalog endpoint shape, for example, is built from a single src/api/catalog-shape.ts module so MCP and REST can never disagree.

Rate limits

Read endpoints share the project's per-minute rate limit (default 60 RPM, configurable per project in dashboard settings). They don't consume your prepaid balance — read-only data returns are free.