Rate limits

Rate limits are enforced per API key over a sliding window. Limits scale with your plan; the response headers are the same on every plan.

Response headers

Every response - successful or rejected - carries:

Header	Meaning
`X-RateLimit-Limit`	Allowed requests in the current window.
`X-RateLimit-Remaining`	Requests left before the window resets.
`X-RateLimit-Reset`	Unix timestamp at which the window resets.
`Retry-After`	Seconds until the next allowed call. Only set on `429`.

Reading those is enough to back off without polling.
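
As a minimal sketch of reading these headers, the helper below collects them into one object. The header names come from the table above; `HeaderReader`, `RateLimitInfo`, `numericHeader`, and `readRateLimit` are illustrative names, not part of the API.

```typescript
// Hypothetical helper: all type and function names here are illustrative.
// HeaderReader matches anything with a get() lookup, such as the fetch Headers class.
interface HeaderReader {
  get(name: string): string | null
}

interface RateLimitInfo {
  limit: number | null // X-RateLimit-Limit
  remaining: number | null // X-RateLimit-Remaining
  resetAt: Date | null // X-RateLimit-Reset, converted from Unix seconds
  retryAfterSeconds: number | null // Retry-After, only present on 429
}

// Parse a header as a finite number, or return null when it is missing or malformed.
function numericHeader(headers: HeaderReader, name: string): number | null {
  const raw = headers.get(name)
  if (raw === null || raw.trim() === "") return null
  const value = Number(raw)
  return Number.isFinite(value) ? value : null
}

function readRateLimit(headers: HeaderReader): RateLimitInfo {
  const reset = numericHeader(headers, "X-RateLimit-Reset")
  return {
    limit: numericHeader(headers, "X-RateLimit-Limit"),
    remaining: numericHeader(headers, "X-RateLimit-Remaining"),
    resetAt: reset === null ? null : new Date(reset * 1000),
    retryAfterSeconds: numericHeader(headers, "Retry-After"),
  }
}
```

With `fetch`, this would typically be called as `readRateLimit(res.headers)`.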

What rejection looks like

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1747396800
Retry-After: 12
Content-Type: application/json

{"detail": {"code": "rate_limited", "message": "Per-key rate limit exceeded."}}
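
To branch on this body in client code, one option is a small type guard. This is a sketch that assumes the sample above is representative of every rate-limit rejection; `RateLimitedBody` and `isRateLimitedBody` are hypothetical names.

```typescript
// Shape copied from the sample 429 body above; the type name is illustrative.
interface RateLimitedBody {
  detail: { code: "rate_limited"; message: string }
}

// Narrow an unknown JSON value to the rate-limit error shape.
function isRateLimitedBody(value: unknown): value is RateLimitedBody {
  if (typeof value !== "object" || value === null) return false
  const detail = (value as { detail?: unknown }).detail
  if (typeof detail !== "object" || detail === null) return false
  const { code, message } = detail as { code?: unknown; message?: unknown }
  return code === "rate_limited" && typeof message === "string"
}
```

A caller might run `isRateLimitedBody(await res.json())` after seeing a `429` and log `detail.message` when it returns true.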

Plan budgets

Plan	`/agent/ask` per minute
Starter	60
Pro	60
Enterprise	300 (negotiable per contract)

Per-plan rate limits and monthly quotas are kept in sync with the Pricing page - that's the canonical reference.

Surge handling

Hitting the limit is a hint, not a failure. The recommended pattern is to wait for the interval given in `Retry-After`, then retry.

Always cap retries - during a `429` storm, unbounded retries would keep your client looping indefinitely. The example below gives up after three retries.

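// ASK_URL, headers, and AgentAskBody are placeholders defined elsewhere in your client.
async function ask(body: AgentAskBody, attempt = 0): Promise<Response> {
  const res = await fetch(ASK_URL, {
    method: "POST",
    headers,
    body: JSON.stringify(body),
  })
  // Retry on 429 only, and at most three times.
  if (res.status === 429 && attempt < 3) {
    // Retry-After is in seconds; fall back to 1 second if it is missing or not a positive number.
    const retryAfter = Number(res.headers.get("retry-after"))
    const wait = Number.isFinite(retryAfter) && retryAfter > 0 ? retryAfter : 1
    await new Promise((r) => setTimeout(r, wait * 1000))
    return ask(body, attempt + 1)
  }
  return res
}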

For long-running batch jobs, prefer pacing yourself with `X-RateLimit-Remaining` so you never see a `429` at all.
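
One way to pace a batch job is to spread the remaining requests evenly over the time left in the window. The sketch below is an assumption about how a client might do that, not a documented algorithm; `pacingDelayMs` is a hypothetical name, and with a sliding window the header values are only a snapshot from the latest response.

```typescript
// Hypothetical pacing helper; the name and signature are assumptions.
//   remaining:        value of X-RateLimit-Remaining
//   resetUnixSeconds: value of X-RateLimit-Reset
//   nowMs:            current time in milliseconds, e.g. Date.now()
// Returns how long to wait before the next request, in milliseconds.
function pacingDelayMs(remaining: number, resetUnixSeconds: number, nowMs: number): number {
  const msUntilReset = Math.max(0, resetUnixSeconds * 1000 - nowMs)
  // Nothing left in this window: wait for the reset.
  if (remaining <= 0) return msUntilReset
  // Otherwise spread the remaining requests evenly over the time left.
  return Math.floor(msUntilReset / remaining)
}
```

For example, with 10 requests remaining and 10 seconds until the reset, the helper suggests waiting 1000 ms between calls.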

Capacity errors are different

`503 Service Unavailable` from `/agent/ask` means the agent is saturated right now, not that you exceeded your quota. Retry shortly with exponential backoff; no quota was charged.
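
A minimal sketch of an exponential backoff schedule for `503` responses follows. The base delay, the cap, and the use of full jitter are assumptions chosen for illustration, not documented values, and `backoffDelayMs` is a hypothetical name.

```typescript
// Hypothetical helper; base, cap, and jitter strategy are assumptions.
//   attempt: 0 for the first retry, 1 for the second, and so on.
//   random:  injectable source of randomness in [0, 1); defaults to Math.random.
function backoffDelayMs(
  attempt: number,
  baseMs = 500,
  capMs = 8000,
  random: () => number = Math.random,
): number {
  // Exponential growth: base, 2x base, 4x base, ... capped at capMs.
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt))
  // Full jitter: pick a delay in [0, ceiling) so many clients do not retry in lockstep.
  return Math.floor(random() * ceiling)
}
```

As with `429` handling, the number of attempts should be capped; the jitter only spreads retries out, it does not bound them.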