Rate limits
Per-plan request budgets, with X-RateLimit-* headers on every response.
Rate limits are enforced per API key over a sliding window. Limits scale with your plan; the response headers are the same on every plan.
Response headers
Every response - successful or rejected - carries:
| Header | Meaning |
|---|---|
| X-RateLimit-Limit | Allowed requests in the current window. |
| X-RateLimit-Remaining | Requests left before the window resets. |
| X-RateLimit-Reset | Unix timestamp at which the window resets. |
| Retry-After | Seconds until the next allowed call. Only set on 429. |
Reading these headers is enough to back off without polling.
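As a minimal sketch of reading these headers, the helper below turns them into a typed object. The names `HeaderSource`, `RateLimitInfo`, `toNumber`, and `readRateLimit` are illustrative assumptions and not part of the API; only the header names come from the table above. It accepts anything with a `get(name)` method, which the `headers` property of a standard `fetch` response satisfies.

```typescript
// Hypothetical types and helpers for illustration; not part of the API.
type HeaderSource = { get(name: string): string | null }

interface RateLimitInfo {
  limit: number | null
  remaining: number | null
  resetAt: Date | null
  retryAfterSeconds: number | null
}

// Parse a header value as a non-negative number; null when absent or malformed.
function toNumber(value: string | null): number | null {
  if (value === null || value.trim() === "") return null
  const n = Number(value)
  return Number.isFinite(n) && n >= 0 ? n : null
}

function readRateLimit(headers: HeaderSource): RateLimitInfo {
  const reset = toNumber(headers.get("x-ratelimit-reset"))
  return {
    limit: toNumber(headers.get("x-ratelimit-limit")),
    remaining: toNumber(headers.get("x-ratelimit-remaining")),
    // X-RateLimit-Reset is a Unix timestamp in seconds; Date expects milliseconds.
    resetAt: reset === null ? null : new Date(reset * 1000),
    // Retry-After is only set on 429, so this is null on other responses.
    retryAfterSeconds: toNumber(headers.get("retry-after")),
  }
}
```

With a `fetch` response `res`, `readRateLimit(res.headers)` would give you the remaining budget and the reset time in one call.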
What rejection looks like
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1747396800
Retry-After: 12
Content-Type: application/json

{"detail": {"code": "rate_limited", "message": "Per-key rate limit exceeded."}}

Plan budgets
| Plan | /agent/ask per minute |
|---|---|
| Starter | 60 |
| Pro | 60 |
| Enterprise | 300 (negotiable per contract) |
Per-plan rate limits and monthly quotas are kept in sync with the Pricing page - that's the canonical reference.
Surge handling
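Hitting the limit is a hint, not a failure. The recommended pattern is to wait for the number of seconds given in Retry-After, then retry.
Always cap retries - a 429 storm with unbounded retries never terminates and keeps a growing chain of pending calls alive. The example below retries at most three times.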
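// URL, headers, and AgentAskBody are assumed to be defined elsewhere in your client.
async function ask(body: AgentAskBody, attempt = 0): Promise<Response> {
  const res = await fetch(URL, {
    method: "POST",
    headers,
    body: JSON.stringify(body),
  })
  // Retry only on 429, and at most three times.
  if (res.status === 429 && attempt < 3) {
    // Retry-After is in seconds; fall back to 1 second if it is missing or malformed.
    const parsed = Number(res.headers.get("retry-after") ?? "1")
    const wait = Number.isFinite(parsed) && parsed >= 0 ? parsed : 1
    await new Promise((r) => setTimeout(r, wait * 1000))
    return ask(body, attempt + 1)
  }
  return res
}

For long-running batch jobs, prefer pacing yourself with X-RateLimit-Remaining so you never see 429 at all.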
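One way to pace a batch job is to spread the remaining budget evenly over the time left until X-RateLimit-Reset. The sketch below assumes that policy; `pacingDelayMs`, `sleep`, and `runPaced` are hypothetical names, and the exact behavior of the sliding window may call for a more conservative delay.

```typescript
// Hypothetical pacing helper: spreads the remaining requests evenly
// over the time left before the window resets.
function pacingDelayMs(
  remaining: number,
  resetUnixSeconds: number,
  nowMs: number = Date.now(),
): number {
  const msUntilReset = Math.max(0, resetUnixSeconds * 1000 - nowMs)
  // Budget exhausted: wait for the reset instead of sending into a 429.
  if (remaining <= 0) return msUntilReset
  return Math.floor(msUntilReset / remaining)
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms))

type PacedResponse = { headers: { get(name: string): string | null } }

// Sends items one at a time, pausing between calls based on the headers
// of the previous response. `send` is supplied by the caller.
async function runPaced<T>(
  items: T[],
  send: (item: T) => Promise<PacedResponse>,
): Promise<void> {
  for (const item of items) {
    const res = await send(item)
    const remaining = res.headers.get("x-ratelimit-remaining")
    const reset = res.headers.get("x-ratelimit-reset")
    // Skip pacing if either header is missing or not a number.
    if (remaining === null || reset === null) continue
    const r = Number(remaining)
    const t = Number(reset)
    if (Number.isFinite(r) && Number.isFinite(t)) {
      await sleep(pacingDelayMs(r, t))
    }
  }
}
```

For example, with 30 requests remaining and 30 seconds until the reset, `pacingDelayMs` returns 1000, so the job sends roughly one request per second.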
Capacity errors are different
503 Service Unavailable from /agent/ask means the agent is saturated
right now, not that you exceeded your quota. Retry shortly with
exponential backoff; no quota was charged.
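A minimal sketch of exponential backoff for 503 responses follows. The base delay, cap, retry count, and the use of random jitter are assumptions chosen for illustration, not documented values; `backoffDelayMs` and `retryOn503` are hypothetical names, and `send` stands in for whatever function issues your /agent/ask request.

```typescript
// Hypothetical helper: the ceiling doubles with each attempt up to capMs,
// and the actual delay is a random value below that ceiling (jitter),
// so that many clients do not retry in lockstep.
function backoffDelayMs(
  attempt: number,
  baseMs = 500,
  capMs = 8000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt)
  return Math.floor(random() * ceiling)
}

// Calls `send` until it returns something other than 503 or the retry
// budget is spent, then returns the last response either way.
async function retryOn503<R extends { status: number }>(
  send: () => Promise<R>,
  maxRetries = 4,
): Promise<R> {
  for (let attempt = 0; ; attempt++) {
    const res = await send()
    if (res.status !== 503 || attempt >= maxRetries) return res
    const delay = backoffDelayMs(attempt)
    await new Promise<void>((resolve) => setTimeout(resolve, delay))
  }
}
```

Keeping this separate from the 429 handling makes the distinction explicit: a 429 tells you how long to wait through Retry-After, while a 503 gives no such hint, so the client picks its own growing delay.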