Non-compaction auto-retry policy
This document describes the standard API-error retry path in AgentSession.
It explicitly excludes context-overflow recovery via auto-compaction. Overflow is handled by compaction logic and is documented separately in compaction.md.
Implementation files
- ../src/session/agent-session.ts
- ../src/config/settings-schema.ts
- ../src/modes/controllers/event-controller.ts
- ../src/modes/rpc/rpc-mode.ts
- ../src/modes/rpc/rpc-client.ts
- ../src/modes/rpc/rpc-types.ts
Scope boundary vs compaction
Retry and compaction are checked from the same agent_end path, but they are intentionally separated:
- agent_end inspects the last assistant message.
- #isRetryableError(...) runs first.
- If retry is initiated, compaction checks are skipped for that turn.
- Context-overflow errors are hard-excluded from retry classification (isContextOverflow(...) short-circuits retry).
- Overflow therefore falls through to #checkCompaction(...) instead of standard retry.
So: overload/rate/server/network-style failures use this retry policy; context-window overflow uses compaction recovery.
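The ordering above can be sketched as a small decision function. This is a simplified illustration, not the actual AgentSession code: the AssistantMessage shape, the Classifiers interface, and the onAgentEnd name are hypothetical, and the two predicates are injected because their real logic is not shown in this document.

```typescript
// Hypothetical, reduced shape of the last assistant message.
interface AssistantMessage {
  stopReason: "stop" | "error" | "aborted";
  errorMessage?: string;
}

// Stand-ins for isContextOverflow(...) and #isRetryableErrorMessage(...).
interface Classifiers {
  isContextOverflow(msg: AssistantMessage): boolean;
  isRetryableErrorMessage(text: string): boolean;
}

// Mirrors the documented conditions: error stop reason, a message present,
// not a context overflow, and a matching retryable pattern.
function isRetryableError(msg: AssistantMessage, c: Classifiers): boolean {
  if (msg.stopReason !== "error" || !msg.errorMessage) return false;
  if (c.isContextOverflow(msg)) return false; // overflow short-circuits retry
  return c.isRetryableErrorMessage(msg.errorMessage);
}

// Retry is evaluated first; compaction is only checked when no retry starts.
function onAgentEnd(
  msg: AssistantMessage,
  c: Classifiers,
  retryEnabled: boolean,
): "retry" | "check-compaction" {
  if (retryEnabled && isRetryableError(msg, c)) return "retry";
  return "check-compaction";
}
```

Under this sketch, an error that matches both a retryable pattern and the overflow check still goes to the compaction path, because the overflow test runs before pattern matching.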
Retry classification
#isRetryableError(...) requires all of the following:
- assistant stopReason === "error"
- errorMessage exists
- message is not context overflow
- errorMessage matches #isRetryableErrorMessage(...)
Current retryable pattern set (regex-based):
- overloaded
- rate limit / usage limit / too many requests
- HTTP-like status codes: 429 (rate limiting), 500, 502, 503, 504 (server errors)
- service unavailable / server error / internal error
- connection error / fetch failed
- retry delay wording
This is string-pattern classification, not typed provider error codes.
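A minimal sketch of what such a text classifier could look like follows. The exact regex used by #isRetryableErrorMessage(...) is not reproduced in this document, so the pattern below is an assumption assembled from the list above; the word boundaries around the status codes are also an assumption, added so that numbers embedded in larger tokens do not match.

```typescript
// Assumed pattern, built from the documented categories; the real regex may differ.
const RETRYABLE_PATTERN =
  /overloaded|rate.?limit|usage.?limit|too many requests|\b(?:429|500|502|503|504)\b|service.?unavailable|server.?error|internal.?error|connection.?error|fetch failed|retry delay/i;

// No "g" flag is used, so test() keeps no state between calls.
function isRetryableErrorMessage(errorMessage: string): boolean {
  return RETRYABLE_PATTERN.test(errorMessage);
}

console.log(isRetryableErrorMessage("Error: 503 Service Unavailable")); // true
console.log(isRetryableErrorMessage("Invalid API key")); // false
```

Because matching is purely textual, any error whose message happens to contain one of these phrases is treated as retryable, regardless of which provider produced it.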
Retry lifecycle and state transitions
Session state used by retry:
- #retryAttempt: number (0 means idle)
- #retryPromise: Promise<void> | undefined (tracks in-progress retry lifecycle)
- #retryResolve: (() => void) | undefined (resolves #retryPromise)
- #retryAbortController: AbortController | undefined (cancels backoff sleep)
Flow (#handleRetryableError):
- Read retry settings group.
- If retry.enabled === false, stop immediately (false, no retry started).
- Increment #retryAttempt.
- Create #retryPromise once (first attempt in a chain).
- If attempt exceeded retry.maxRetries, emit final failure event and stop.
- Compute delay: retry.baseDelayMs * 2^(attempt-1).
- For usage-limit errors, parse retry hints and call auth storage (markUsageLimitReached(...)); if provider/model switch succeeds, force delay to 0.
- Emit auto_retry_start.
- Remove the trailing assistant error message from agent runtime state (kept in persisted session history).
- Sleep with abort support.
- On wake, schedule agent.continue() via setTimeout(..., 0).
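The decision part of this flow (the enabled check, increment, max check, and delay computation) can be sketched as a pure function. The names RetrySettings, RetryState, RetryPlan, and planRetry are hypothetical and exist only for illustration; the switchedProvider flag stands in for a successful provider/model switch in the usage-limit path. Event emission, message removal, and the sleep itself are left out.

```typescript
interface RetrySettings {
  enabled: boolean;
  maxRetries: number;
  baseDelayMs: number;
}

// 0 means idle, matching the documented meaning of #retryAttempt.
interface RetryState {
  attempt: number;
}

type RetryPlan =
  | { kind: "disabled" }
  | { kind: "exhausted"; reportedAttempt: number }
  | { kind: "retry"; attempt: number; delayMs: number };

function planRetry(
  state: RetryState,
  settings: RetrySettings,
  switchedProvider = false,
): RetryPlan {
  if (!settings.enabled) return { kind: "disabled" };

  state.attempt += 1; // incremented before the max check
  if (state.attempt > settings.maxRetries) {
    const reportedAttempt = state.attempt - 1; // last retry actually attempted
    state.attempt = 0; // the max-exceeded path resets the counter
    return { kind: "exhausted", reportedAttempt };
  }

  // Exponential backoff, forced to 0 when a switch succeeded.
  const delayMs = switchedProvider
    ? 0
    : settings.baseDelayMs * 2 ** (state.attempt - 1);
  return { kind: "retry", attempt: state.attempt, delayMs };
}

const defaults: RetrySettings = { enabled: true, maxRetries: 3, baseDelayMs: 2000 };
const chain: RetryState = { attempt: 0 };
for (let i = 0; i < 4; i++) {
  console.log(planRetry(chain, defaults));
}
// Three "retry" plans with delays 2000, 4000, 8000, then "exhausted" with reportedAttempt 3.
```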
What resets retry counters
#retryAttempt resets to 0 in these cases:
- first successful non-error, non-aborted assistant message after retries started (emits auto_retry_end { success: true })
- retry cancellation during backoff sleep
- max retries exceeded path
#retryPromise resolves/clears when retry chain ends (success, cancellation, or max-exceeded), via #resolveRetry().
Backoff and max-attempt semantics
Settings:
- retry.enabled (default true)
- retry.maxRetries (default 3)
- retry.baseDelayMs (default 2000)
Attempt numbering:
- attempt counter is incremented before max-check
- start events use current attempt (1-based)
- max-exceeded end event reports attempt: this.#retryAttempt - 1 (last attempted retry count)
Backoff sequence with default settings:
- attempt 1: 2000 ms
- attempt 2: 4000 ms
- attempt 3: 8000 ms
Delay override inputs (the retry hints parsed from usage-limit errors) are used only in the usage-limit handling path, and only to inform the auth-storage model/account switching decision. In the main non-compaction retry path, the backoff stays the local exponential delay unless switching succeeds, in which case delayMs is forced to 0.
Abort mechanics
Explicit retry abort
abortRetry():
- aborts #retryAbortController (if present)
- resolves retry promise (#resolveRetry()) so awaiters are unblocked
If the abort lands during the backoff sleep, the catch path:
- emits auto_retry_end { success: false, finalError: "Retry cancelled" }
- resets attempt/controller
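One way to build a sleep that an AbortController can cancel is sketched below. The helper names abortableSleep and backoff are hypothetical; the document does not show how the session implements its sleep, so this only illustrates the general technique using the standard AbortSignal API.

```typescript
// Sleep for `ms`, rejecting early if `signal` aborts first.
function abortableSleep(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (signal.aborted) {
      reject(new Error("Retry cancelled"));
      return;
    }
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    function onAbort(): void {
      clearTimeout(timer); // do not keep the timer alive after cancellation
      reject(new Error("Retry cancelled"));
    }
    signal.addEventListener("abort", onAbort, { once: true });
  });
}

// Hypothetical caller: one backoff wait that reports how it ended.
async function backoff(
  ms: number,
  controller: AbortController,
): Promise<"woke" | "cancelled"> {
  try {
    await abortableSleep(ms, controller.signal);
    return "woke";
  } catch {
    return "cancelled"; // the real catch path would emit auto_retry_end here
  }
}

const backoffController = new AbortController();
const pendingBackoff = backoff(60000, backoffController);
backoffController.abort(); // what abortRetry() does to #retryAbortController
pendingBackoff.then((outcome) => console.log(outcome)); // logs "cancelled"
```

Clearing the timer on abort matters in practice: without it, a cancelled 60-second backoff would keep the process alive until the timer fired.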
Global operation abort interaction
abort() calls abortRetry() before aborting the active agent stream. This guarantees that the retry backoff is cancelled whenever the user issues a general abort.
TUI interaction
On auto_retry_start, EventController:
- swaps Esc handler to session.abortRetry()
- renders loader text: Retrying (attempt/maxAttempts) in Ns… (esc to cancel)
On auto_retry_end, it restores prior Esc handler and clears loader state.
Streaming and prompt completion behavior
prompt() ultimately waits on #waitForRetry() after agent.prompt(...) returns.
Effect:
- a prompt call does not fully resolve until any started retry chain finishes (success/failure/cancel)
- retry lifecycle is part of one logical prompt execution boundary
This prevents callers from treating a retrying turn as complete too early.
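The promise/resolver pair behind this behavior can be sketched as a small gate. RetryGate, createRetryGate, and promptWithRetryGate are hypothetical names; they loosely correspond to #retryPromise, #retryResolve, #resolveRetry(), and #waitForRetry(), whose real implementations are not shown in this document.

```typescript
interface RetryGate {
  begin(): void; // start of a retry chain
  end(): void; // success, cancellation, or max-exceeded
  wait(): Promise<void>; // resolves when no chain is active
  isRetrying(): boolean;
}

function createRetryGate(): RetryGate {
  let retryPromise: Promise<void> | undefined;
  let retryResolve: (() => void) | undefined;
  return {
    begin() {
      if (retryPromise) return; // created once per chain
      retryPromise = new Promise<void>((resolve) => {
        retryResolve = resolve;
      });
    },
    end() {
      retryResolve?.(); // unblock anyone awaiting the chain
      retryPromise = undefined;
      retryResolve = undefined;
    },
    wait() {
      return retryPromise ?? Promise.resolve();
    },
    isRetrying() {
      return retryPromise !== undefined;
    },
  };
}

// Hypothetical prompt wrapper: the call settles only after the chain ends.
async function promptWithRetryGate(
  gate: RetryGate,
  runTurn: () => Promise<void>,
): Promise<void> {
  await runTurn();
  await gate.wait();
}
```

With this shape, a caller that awaits the prompt wrapper cannot observe completion while a retry is still sleeping or re-running, which is the boundary the section describes.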
Controls: settings and RPC
Configuration knobs
Defined in settings schema under retry group:
- retry.enabled
- retry.maxRetries
- retry.baseDelayMs
Programmatic toggles in session:
- setAutoRetryEnabled(enabled) writes retry.enabled
- autoRetryEnabled reads retry.enabled
- isRetrying reports whether retry lifecycle promise is active
RPC controls
RPC command surface:
- set_auto_retry → session.setAutoRetryEnabled(command.enabled)
- abort_retry → session.abortRetry()
Client helpers:
- RpcClient.setAutoRetry(enabled)
- RpcClient.abortRetry()
Both commands return success responses; retry progress/failure details come from streamed session events, not command response payloads.
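A rough sketch of the command-to-session mapping follows. The command names and the enabled field come from the mapping above; the type discriminator field, the RetryControls interface, and the RpcAck response shape are assumptions made for illustration and may not match rpc-types.ts.

```typescript
// Assumed wire shape; only the command names and `enabled` come from the document.
type RetryRpcCommand =
  | { type: "set_auto_retry"; enabled: boolean }
  | { type: "abort_retry" };

// The slice of the session that these two commands touch.
interface RetryControls {
  setAutoRetryEnabled(enabled: boolean): void;
  abortRetry(): void;
}

// Hypothetical acknowledgement: success only, no retry details.
interface RpcAck {
  command: RetryRpcCommand["type"];
  success: true;
}

function handleRetryCommand(
  session: RetryControls,
  command: RetryRpcCommand,
): RpcAck {
  switch (command.type) {
    case "set_auto_retry":
      session.setAutoRetryEnabled(command.enabled);
      break;
    case "abort_retry":
      session.abortRetry();
      break;
  }
  // Progress and failure details arrive separately as streamed session events.
  return { command: command.type, success: true };
}
```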
Event emission and failure surfacing
Session-level retry events:
- auto_retry_start { attempt, maxAttempts, delayMs, errorMessage }
- auto_retry_end { success, attempt, finalError? }
Propagation:
- emitted through AgentSession.subscribe(...)
- forwarded to extension runner as extension events
- in RPC mode, forwarded directly as JSON event objects (session.subscribe(event => output(event)))
- in TUI, consumed by EventController for loader/error UI
Final failure surfacing:
- On max-exceeded or cancellation, auto_retry_end.success === false
- TUI shows: Retry failed after N attempts: <finalError>
- Extensions/hooks receive auto_retry_end with same fields
- RPC consumers receive same event object on stdout stream
Permanent stop conditions
Retry stops and will not auto-continue when any of these occur:
- retry.enabled is false
- error is not retry-classified
- error is context overflow (delegated to compaction path)
- max retries exceeded
- user cancels retry (abort_retry or Esc during retry loader)
- global abort (abort) cancels retry first
A new retry chain can still start later on a future retryable error after counters reset.
Operational caveats
- Classification is regex text matching; provider-specific structured errors are not used here.
- Retry strips the failing assistant error from runtime context before re-continue, but session history still keeps that error entry.
- RpcSessionState currently exposes autoCompactionEnabled but not an autoRetryEnabled field; RPC callers must track their own toggle state or query settings through other APIs.