Model and Provider Configuration (models.yml)

This document describes how the coding-agent currently loads models, applies overrides, resolves credentials, and chooses models at runtime.

What controls model behavior

Primary implementation files:

  • src/config/model-registry.ts — loads built-in + custom models, provider overrides, runtime discovery, auth integration
  • src/config/model-resolver.ts — parses model patterns and selects initial/smol/slow models
  • src/config/settings-schema.ts — model-related settings (modelRoles, provider transport preferences)
  • src/session/auth-storage.ts — API key + OAuth resolution order
  • packages/ai/src/models.ts and packages/ai/src/types.ts — built-in providers/models and Model/compat types

Config file location and legacy behavior

Default config path:

  • ~/.pisces/agent/models.yml

Legacy behavior still present:

  • If models.yml is missing and models.json exists at the same location, it is migrated to models.yml.
  • Explicit .json / .jsonc config paths are still supported when passed programmatically to ModelRegistry.

models.yml shape

yaml
providers:
  <provider-id>:
    # provider-level config

The <provider-id> key is the canonical provider identifier used throughout model selection and auth lookup.

Provider-level fields

yaml
providers:
  my-provider:
    baseUrl: https://api.example.com/v1
    apiKey: MY_PROVIDER_API_KEY
    api: openai-completions
    headers:
      X-Team: platform
    authHeader: true
    auth: apiKey
    discovery:
      type: ollama
    modelOverrides:
      some-model-id:
        name: Renamed model
    models:
      - id: some-model-id
        name: Some Model
        api: openai-completions
        reasoning: false
        input: [text]
        cost:
          input: 0
          output: 0
          cacheRead: 0
          cacheWrite: 0
        contextWindow: 128000
        maxTokens: 16384
        headers:
          X-Model: value
        compat:
          supportsStore: true
          supportsDeveloperRole: true
          supportsReasoningEffort: true
          maxTokensField: max_completion_tokens
          openRouterRouting:
            only: [anthropic]
          vercelGatewayRouting:
            order: [anthropic, openai]
          extraBody:
            gateway: m1-01
            controller: mlx

Allowed provider/model api values

  • openai-completions
  • openai-responses
  • openai-codex-responses
  • azure-openai-responses
  • anthropic-messages
  • google-generative-ai
  • google-vertex

Allowed auth/discovery values

  • auth: apiKey (default) or none
  • discovery.type: ollama or llama.cpp

Validation rules (current)

Full custom provider (models is non-empty)

Required:

  • baseUrl
  • apiKey unless auth: none
  • api at the provider level or on each model

Override-only provider (models missing or empty)

Must define at least one of:

  • baseUrl
  • modelOverrides
  • discovery
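
For example, a minimal override-only entry that just reroutes a built-in provider passes validation because it defines baseUrl (the provider id and URL here are illustrative):

yaml
providers:
  openrouter:
    baseUrl: https://openrouter.corp.example.com/v1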

Discovery

  • discovery requires provider-level api.

Model value checks

  • id required
  • contextWindow and maxTokens must be positive if provided

Merge and override order

ModelRegistry pipeline (on refresh):

  1. Load built-in providers/models from @oh-my-pi/pi-ai.
  2. Load models.yml custom config.
  3. Apply provider overrides (baseUrl, headers) to built-in models.
  4. Apply modelOverrides (per provider + model id).
  5. Merge custom models:
    • same provider + id replaces existing
    • otherwise append
  6. Apply runtime-discovered models (currently Ollama, llama.cpp, and LM Studio), then re-apply model overrides.

Provider defaults vs per-model overrides:

  • Provider headers are baseline.
  • Model headers override provider header keys.
  • modelOverrides can override model metadata (name, reasoning, input, cost, contextWindow, maxTokens, headers, compat, contextPromotionTarget).
  • compat is deep-merged for nested routing blocks (openRouterRouting, vercelGatewayRouting, extraBody).
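
As a sketch of the deep-merge behavior: an override that only sets a nested routing block leaves the model's other compat fields intact, because nested blocks are merged key-by-key rather than replaced wholesale (model id and values below are illustrative):

yaml
providers:
  openrouter:
    modelOverrides:
      anthropic/claude-sonnet-4:
        compat:
          # merged into the model's existing compat block; unrelated keys
          # (supportsStore, maxTokensField, ...) are preserved
          openRouterRouting:
            only: [anthropic]
          extraBody:
            gateway: m1-01   # merged key-by-key into any existing extraBody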

Runtime discovery integration

Implicit Ollama discovery

If ollama is not explicitly configured, the registry adds an implicit discoverable provider:

  • provider: ollama
  • api: openai-completions
  • base URL: OLLAMA_BASE_URL or http://127.0.0.1:11434
  • auth mode: keyless (auth: none behavior)

Runtime discovery calls GET /api/tags on Ollama and synthesizes model entries with local defaults.

Implicit llama.cpp discovery

If llama.cpp is not explicitly configured, the registry adds an implicit discoverable provider. Note that it uses the newer openai-responses API instead of openai-completions:

  • provider: llama.cpp
  • api: openai-responses
  • base URL: LLAMA_CPP_BASE_URL or http://127.0.0.1:8080
  • auth mode: keyless (auth: none behavior)

Runtime discovery calls GET /models on llama.cpp and synthesizes model entries with local defaults.

Implicit LM Studio discovery

If lm-studio is not explicitly configured, the registry adds an implicit discoverable provider:

  • provider: lm-studio
  • api: openai-completions
  • base URL: LM_STUDIO_BASE_URL or http://127.0.0.1:1234/v1
  • auth mode: keyless (auth: none behavior)

Runtime discovery fetches models (GET /models) and synthesizes model entries with local defaults.

Explicit provider discovery

You can configure discovery yourself:

yaml
providers:
  ollama:
    baseUrl: http://127.0.0.1:11434
    api: openai-completions
    auth: none
    discovery:
      type: ollama
      
  llama.cpp:
    baseUrl: http://127.0.0.1:8080
    api: openai-responses
    auth: none
    discovery:
      type: llama.cpp

Extension provider registration

Extensions can register providers at runtime (pi.registerProvider(...)), including:

  • model replacement/append for a provider
  • custom stream handler registration for new API IDs
  • custom OAuth provider registration

Auth and API key resolution order

When requesting a key for a provider, effective order is:

  1. Runtime override (CLI --api-key)
  2. Stored API key credential in agent.db
  3. Stored OAuth credential in agent.db (with refresh)
  4. Environment variable mapping (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.)
  5. ModelRegistry fallback resolver (provider apiKey from models.yml, env-name-or-literal semantics)

models.yml apiKey behavior:

  • Value is first treated as an environment variable name.
  • If no env var exists, the literal string is used as the token.
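
Both behaviors can be sketched as follows (provider ids, URLs, and key values are illustrative):

yaml
providers:
  proxy-env:
    baseUrl: https://proxy.example.com/v1
    api: openai-completions
    apiKey: MY_PROXY_API_KEY    # resolved from the environment if that variable is set
  proxy-literal:
    baseUrl: https://other.example.com/v1
    api: openai-completions
    apiKey: sk-literal-token    # no env var of this name, so used as the token itself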

If authHeader: true and a provider apiKey is set, models get:

  • Authorization: Bearer <resolved-key> header injected.

Keyless providers:

  • Providers marked auth: none are treated as available without credentials.
  • getApiKey* returns kNoAuth for them.

Model availability vs all models

  • getAll() returns the loaded model registry (built-in + merged custom + discovered).
  • getAvailable() filters to models that are keyless or have resolvable auth.

So a model can exist in the registry but not be selectable until auth is available.

Runtime model resolution

CLI and pattern parsing

model-resolver.ts supports:

  • exact provider/modelId
  • exact model id (provider inferred)
  • fuzzy/substring matching
  • glob scope patterns in --models (e.g. openai/*, *sonnet*)
  • optional :thinkingLevel suffix (off|minimal|low|medium|high|xhigh)

--provider is legacy; --model is preferred.

Initial model selection priority

findInitialModel(...) uses this order:

  1. explicit CLI provider+model
  2. first scoped model (if not resuming)
  3. saved default provider/model
  4. known provider defaults (e.g. OpenAI/Anthropic/etc.) among available models
  5. first available model

Role aliases and settings

Supported model roles:

  • default, smol, slow, plan, commit

Role aliases like pi/smol expand through settings.modelRoles. Each role value can also append a thinking selector such as :minimal, :low, :medium, or :high.

If a role points at another role, the alias chain is followed to resolve the target model, and an explicit thinking suffix on the referring role wins for that role-specific use (see the sketch after the settings list below).

Related settings:

  • modelRoles (record)
  • enabledModels (scoped pattern list)
  • providers.kimiApiFormat (openai or anthropic request format)
  • providers.openaiWebsockets (auto|off|on websocket preference for OpenAI Codex transport)
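
As an illustrative sketch (shown in YAML; the openai/gpt-4o-mini id is hypothetical), a modelRoles mapping with thinking suffixes and a scoped enabledModels list could look like:

yaml
modelRoles:
  default: openai-codex/gpt-5.3-codex
  smol: openai/gpt-4o-mini:low   # role with an explicit thinking suffix
  plan: smol:high                # points at another role; :high wins for plan
enabledModels:
  - "openai/*"
  - "*sonnet*"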

Context promotion (model-level fallback chains)

Context promotion is an overflow recovery mechanism for small-context variants (for example *-spark) that automatically promotes to a larger-context sibling when the API rejects a request with a context length error.

Trigger and order

When a turn fails with a context overflow error (e.g. context_length_exceeded), AgentSession attempts promotion before falling back to compaction:

  1. If contextPromotion.enabled is true, resolve a promotion target (see below).
  2. If a target is found, switch to it and retry the request — no compaction needed.
  3. If no target is available, fall through to auto-compaction on the current model.
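
A minimal sketch of the toggle mentioned in step 1, assuming a YAML-style settings representation:

yaml
contextPromotion:
  enabled: true   # attempt promotion before auto-compaction on context overflow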

Target selection

Selection is model-driven, not role-driven:

  1. currentModel.contextPromotionTarget (if configured)
  2. smallest larger-context model on the same provider + API

Candidates are ignored unless credentials resolve (ModelRegistry.getApiKey(...)).

OpenAI Codex websocket handoff

If the switch involves openai-codex-responses in either direction, the session provider state keyed by openai-codex-responses is closed before the model switch. This drops websocket transport state so the next turn starts clean on the promoted model.

Persistence behavior

Promotion uses temporary switching (setModelTemporary):

  • recorded as a temporary model_change in session history
  • does not rewrite saved role mapping

Configuring explicit fallback chains

Configure fallback directly in model metadata via contextPromotionTarget.

contextPromotionTarget accepts either:

  • provider/model-id (explicit)
  • model-id (resolved within current provider)

Example (models.yml) for Spark -> non-Spark on the same provider:

yaml
providers:
  openai-codex:
    modelOverrides:
      gpt-5.3-codex-spark:
        contextPromotionTarget: openai-codex/gpt-5.3-codex

The built-in model generator also assigns this automatically for *-spark models when a same-provider base model exists.

Compatibility and routing fields

models.yml supports this compat subset:

  • supportsStore
  • supportsDeveloperRole
  • supportsReasoningEffort
  • maxTokensField (max_completion_tokens or max_tokens)
  • openRouterRouting.only / openRouterRouting.order
  • vercelGatewayRouting.only / vercelGatewayRouting.order

These are consumed by the OpenAI-completions transport logic and combined with URL-based auto-detection.
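
For instance, a model whose endpoint expects max_tokens and should be pinned to specific upstreams through the Vercel gateway might be overridden like this (values illustrative):

yaml
providers:
  my-provider:
    modelOverrides:
      some-model-id:
        compat:
          supportsReasoningEffort: false
          maxTokensField: max_tokens
          vercelGatewayRouting:
            order: [anthropic, openai]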

Practical examples

Local OpenAI-compatible endpoint (no auth)

yaml
providers:
  local-openai:
    baseUrl: http://127.0.0.1:8000/v1
    auth: none
    api: openai-completions
    models:
      - id: Qwen/Qwen2.5-Coder-32B-Instruct
        name: Qwen 2.5 Coder 32B (local)

Hosted proxy with env-based key

yaml
providers:
  anthropic-proxy:
    baseUrl: https://proxy.example.com/anthropic
    apiKey: ANTHROPIC_PROXY_API_KEY
    api: anthropic-messages
    authHeader: true
    models:
      - id: claude-sonnet-4-20250514
        name: Claude Sonnet 4 (Proxy)
        reasoning: true
        input: [text, image]

Override built-in provider route + model metadata

yaml
providers:
  openrouter:
    baseUrl: https://my-proxy.example.com/v1
    headers:
      X-Team: platform
    modelOverrides:
      anthropic/claude-sonnet-4:
        name: Sonnet 4 (Corp)
        compat:
          openRouterRouting:
            only: [anthropic]

Legacy consumer caveat

Most model configuration now flows through models.yml via ModelRegistry.

One notable legacy path remains: web-search Anthropic auth resolution still reads ~/.pisces/agent/models.json directly in src/web/search/auth.ts.

If you rely on that specific path, keep JSON compatibility in mind until that module is migrated.

Failure mode

If models.yml fails schema or validation checks:

  • registry keeps operating with built-in models
  • error is exposed via ModelRegistry.getError() and surfaced in UI/notifications