# LLM Providers

> Supported LLM APIs and CLIs, environment variables, and how to switch between them.

OpenSRE is provider-agnostic: bring your own model. Selection is controlled by the `LLM_PROVIDER` environment variable, with per-provider API key and model overrides. Defaults are tracked in [`config/config.py`](https://github.com/Tracer-Cloud/opensre/blob/main/config/config.py) and routing lives in [`core/runtime/llm/llm_client.py`](https://github.com/Tracer-Cloud/opensre/blob/main/core/runtime/llm/llm_client.py).

## Quick reference

| Provider               | `LLM_PROVIDER`    | Auth                                     | Reasoning model default                                                     | Toolcall model default                        |
| ---------------------- | ----------------- | ---------------------------------------- | --------------------------------------------------------------------------- | --------------------------------------------- |
| Anthropic              | `anthropic`       | `ANTHROPIC_API_KEY`                      | `claude-sonnet-4-6`                                                         | `claude-haiku-4-5-20251001`                   |
| OpenAI                 | `openai`          | `OPENAI_API_KEY`                         | `gpt-5.4-mini`                                                              | `gpt-5.4-mini`                                |
| OpenRouter             | `openrouter`      | `OPENROUTER_API_KEY`                     | `openrouter/auto`                                                           | `openrouter/auto`                             |
| DeepSeek               | `deepseek`        | `DEEPSEEK_API_KEY`                       | `deepseek-v4-pro`                                                           | `deepseek-v4-flash`                           |
| Google Gemini          | `gemini`          | `GEMINI_API_KEY`                         | `gemini-3.1-pro-preview`                                                    | `gemini-3.1-flash-lite-preview`               |
| NVIDIA NIM             | `nvidia`          | `NVIDIA_API_KEY`                         | `meta/llama-3.1-405b-instruct`                                              | `meta/llama-3.1-8b-instruct`                  |
| MiniMax                | `minimax`         | `MINIMAX_API_KEY`                        | `MiniMax-M3`                                                                | `MiniMax-M2.7-highspeed`                      |
| Amazon Bedrock         | `bedrock`         | AWS IAM (`AWS_REGION`)                   | `us.anthropic.claude-sonnet-4-6`                                            | `us.anthropic.claude-haiku-4-5-20251001-v1:0` |
| Ollama (local)         | `ollama`          | None (local daemon)                      | `llama3.2`                                                                  | `llama3.2`                                    |
| OpenAI Codex CLI       | `codex`           | `codex login` (CLI)                      | Codex CLI default                                                           | Codex CLI default                             |
| Claude Code CLI        | `claude-code`     | `claude login` (CLI)                     | Claude Code CLI default                                                     | Claude Code CLI default                       |
| GitHub Copilot CLI     | `copilot`         | `copilot login` or `gh auth login` (CLI) | Copilot CLI default                                                         | Copilot CLI default                           |
| Google Antigravity CLI | `antigravity-cli` | `agy` (browser OAuth, OS keyring)        | Whatever the local `agy` config is set to (switch via `/models` inside agy) | same as reasoning model                       |
| xAI Grok Build CLI     | `grok-cli`        | `grok login` or `XAI_API_KEY`            | Grok CLI default                                                            | Grok CLI default                              |
| Pi CLI (BYOK)          | `pi`              | provider API key env or `pi` → `/login`  | Pi configured model (`PI_MODEL` to override)                                | same as reasoning model                       |

OpenSRE distinguishes two model slots per provider:

* **Reasoning model** — full-capability model used for diagnosis, claim validation, and multi-step analysis.
* **Toolcall model** — lightweight, lower-cost model used for tool selection and routing.

## Selecting a provider

Set `LLM_PROVIDER` (default: `anthropic`) in your environment or `.env` file:

```bash theme={null}
export LLM_PROVIDER=openai
export OPENAI_API_KEY=sk-...
```

Or run the onboarding wizard, which writes the same values to `.env`:

```bash theme={null}
opensre onboard
```

In the interactive shell, `/model` shows curated quick-pick choices for common models. Providers
with fast-changing or account-gated catalogs (OpenAI, OpenRouter, Gemini, NVIDIA, Bedrock, local
CLIs, Ollama, and DeepSeek) also accept custom model IDs:

```bash theme={null}
/model set openai gpt-5.5
/model set openai gpt-5.5 --toolcall-model gpt-5.4-mini
```

Override the default model for a slot via env vars:

```bash theme={null}
export OPENAI_REASONING_MODEL=gpt-5.4-mini
export OPENAI_TOOLCALL_MODEL=gpt-5.4-mini
```

A shared `LLM_MAX_TOKENS` (default `4096`) controls the response token budget for every provider.

## API providers

### Anthropic

```bash theme={null}
export LLM_PROVIDER=anthropic
export ANTHROPIC_API_KEY=sk-ant-...
# Optional overrides:
export ANTHROPIC_REASONING_MODEL=claude-sonnet-4-6
export ANTHROPIC_TOOLCALL_MODEL=claude-haiku-4-5-20251001
```

The default. Uses the Anthropic Python SDK directly. Get an API key at [console.anthropic.com](https://console.anthropic.com/).

### OpenAI

```bash theme={null}
export LLM_PROVIDER=openai
export OPENAI_API_KEY=sk-...
# Optional overrides:
export OPENAI_REASONING_MODEL=gpt-5.4-mini
export OPENAI_TOOLCALL_MODEL=gpt-5.4-mini
```

Uses the OpenAI SDK. Reasoning models (`o1`, `o3`, `o4`, `gpt-5*`) automatically use `max_completion_tokens` instead of `max_tokens`.
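
As a rough illustration of the parameter switch described above, the sketch below picks the token-limit keyword from the model name. The helper name `token_limit_kwargs` and the prefix-matching rule are assumptions made for this example; the actual check in `llm_client.py` may differ.

```python
# Hypothetical helper for illustration; not part of OpenSRE's API.
# Prefixes are taken from the list of reasoning models named above.
REASONING_PREFIXES = ("o1", "o3", "o4", "gpt-5")


def token_limit_kwargs(model: str, max_tokens: int = 4096) -> dict:
    """Return the token-limit keyword argument appropriate for `model`."""
    is_reasoning = model.lower().startswith(REASONING_PREFIXES)
    key = "max_completion_tokens" if is_reasoning else "max_tokens"
    return {key: max_tokens}


print(token_limit_kwargs("gpt-5.4-mini"))  # {'max_completion_tokens': 4096}
print(token_limit_kwargs("gpt-4o-mini"))   # {'max_tokens': 4096}
```

The default of `4096` mirrors the documented `LLM_MAX_TOKENS` default.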

### OpenRouter

```bash theme={null}
export LLM_PROVIDER=openrouter
export OPENROUTER_API_KEY=sk-or-...
# Optional override (single value applies to both slots if set):
export OPENROUTER_MODEL=openrouter/auto
# Or per-slot:
export OPENROUTER_REASONING_MODEL=anthropic/claude-sonnet-4-6
export OPENROUTER_TOOLCALL_MODEL=openai/gpt-4o-mini
```

OpenAI-compatible proxy — pick any model on [openrouter.ai/models](https://openrouter.ai/models). Base URL: `https://openrouter.ai/api/v1`.

### DeepSeek

```bash theme={null}
export LLM_PROVIDER=deepseek
export DEEPSEEK_API_KEY=sk-...
# Optional override (single value applies to both slots if set):
export DEEPSEEK_MODEL=deepseek-v4-pro
# Or per-slot:
export DEEPSEEK_REASONING_MODEL=deepseek-v4-pro
export DEEPSEEK_TOOLCALL_MODEL=deepseek-v4-flash
```

Uses DeepSeek's official OpenAI-compatible API endpoint at `https://api.deepseek.com`.

### Google Gemini

```bash theme={null}
export LLM_PROVIDER=gemini
export GEMINI_API_KEY=...
# Optional override:
export GEMINI_MODEL=gemini-3.1-pro-preview
# Or per-slot:
export GEMINI_REASONING_MODEL=gemini-3.1-pro-preview
export GEMINI_TOOLCALL_MODEL=gemini-3.1-flash-lite-preview
```

Uses Google's OpenAI-compatible endpoint at `https://generativelanguage.googleapis.com/v1beta/openai/`. Get an API key at [aistudio.google.com](https://aistudio.google.com/app/apikey).

### NVIDIA NIM

```bash theme={null}
export LLM_PROVIDER=nvidia
export NVIDIA_API_KEY=nvapi-...
# Optional override:
export NVIDIA_MODEL=meta/llama-3.1-405b-instruct
# Or per-slot:
export NVIDIA_REASONING_MODEL=meta/llama-3.1-405b-instruct
export NVIDIA_TOOLCALL_MODEL=meta/llama-3.1-8b-instruct
```

Uses NVIDIA's OpenAI-compatible API at `https://integrate.api.nvidia.com/v1`. Browse available models on [build.nvidia.com](https://build.nvidia.com/).

### MiniMax

```bash theme={null}
export LLM_PROVIDER=minimax
export MINIMAX_API_KEY=...
# Optional override (single value applies to both slots if set):
export MINIMAX_MODEL=MiniMax-M3
# Or per-slot:
export MINIMAX_REASONING_MODEL=MiniMax-M3
export MINIMAX_TOOLCALL_MODEL=MiniMax-M2.7-highspeed
```

OpenAI-compatible endpoint at `https://api.minimax.io/v1`. Temperature is fixed to `1.0` to match MiniMax recommendations.

### Amazon Bedrock

```bash theme={null}
export LLM_PROVIDER=bedrock
export AWS_REGION=us-east-1
# Optional overrides:
export BEDROCK_REASONING_MODEL=us.anthropic.claude-sonnet-4-6
export BEDROCK_TOOLCALL_MODEL=us.anthropic.claude-haiku-4-5-20251001-v1:0
```

No API key — auth uses the AWS credential chain (environment variables, shared credentials file, or IAM role). Your principal needs permission to invoke the model IDs you configure (for example Bedrock `InvokeModel` / Converse access scoped to those resources in IAM).

**Model routing:**

* **Anthropic Claude** models on Bedrock (`anthropic.claude-*`, `us.anthropic.claude-*`, and foundation-model ARNs that contain `anthropic.claude`) use the **AnthropicBedrock** SDK path.
* **Other Bedrock foundation models** (for example Mistral, Meta Llama, Amazon Titan IDs you enable in your account) use the **Bedrock Converse** API via `boto3`, so you can set `BEDROCK_REASONING_MODEL` to a non-Claude model ID when your use case requires it.
* **Application inference profile** ARNs (`…:application-inference-profile/…`) do not encode the vendor in the ID; those are always sent through **Converse**, which works for any backing model in the profile.

Defaults in `config/config.py` are US cross-region inference profile IDs for Anthropic Claude; override with IDs or ARNs that are **inference-access enabled** in your account and region.
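
The three routing rules above can be sketched as a single function. This is a simplified illustration, not the code in `llm_client.py`: the function name `bedrock_route` and the returned labels are hypothetical, and the substring test is a deliberately loose stand-in for the documented patterns.

```python
# Hypothetical sketch of the Bedrock routing rules; names are illustrative.
def bedrock_route(model_id: str) -> str:
    """Return which client path a Bedrock model ID or ARN would take."""
    # Application inference profiles do not encode the vendor: always Converse.
    if ":application-inference-profile/" in model_id:
        return "converse"
    # Claude model IDs, cross-region profile IDs, and foundation-model ARNs.
    if "anthropic.claude" in model_id:
        return "anthropic_bedrock"
    # Every other foundation model goes through the Converse API.
    return "converse"


for example in (
    "us.anthropic.claude-sonnet-4-6",
    "mistral.mistral-large-2402-v1:0",
    "arn:aws:bedrock:us-east-1:123456789012:application-inference-profile/abc123",
):
    print(example, "->", bedrock_route(example))
```

Checking the inference-profile case first matters: a profile ARN carries no vendor name, so it has to be routed before any vendor match is attempted.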

### Ollama (local)

```bash theme={null}
export LLM_PROVIDER=ollama
# Optional overrides:
export OLLAMA_HOST=http://localhost:11434
export OLLAMA_MODEL=llama3.2
```

Run any local model exposed by an [Ollama](https://ollama.com/) daemon. No API key required — OpenSRE talks to Ollama's OpenAI-compatible endpoint at `${OLLAMA_HOST}/v1`.

## CLI providers (subprocess)

CLI-backed providers shell out to a vendor CLI instead of an HTTP API. They authenticate via the vendor's own login command; OpenSRE detects the binary on `PATH` (or via an explicit env var) and reuses the existing session.

**Investigation timeouts:** Each ReAct turn runs one full CLI subprocess with the system prompt, tool schemas, and conversation history. The shared default subprocess budget is **300 seconds** (Python adds a small buffer). Override per provider when needed, for example `GEMINI_CLI_TIMEOUT_SECONDS`, `CLAUDE_CODE_TIMEOUT_SECONDS`, or `ANTIGRAVITY_CLI_TIMEOUT_SECONDS` (clamped 30–600 where the adapter supports it).
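
A minimal sketch of the clamping behaviour, assuming an adapter reads its timeout from an environment variable. The helper name `cli_timeout_seconds` is hypothetical, and falling back to the default on an unparsable value is an assumption, not documented behaviour.

```python
import os
from typing import Optional


def cli_timeout_seconds(raw: Optional[str], default: int = 300,
                        low: int = 30, high: int = 600) -> int:
    """Parse a timeout override and clamp it to the [low, high] range."""
    try:
        value = int(raw) if raw and raw.strip() else default
    except ValueError:
        # Assumption: an invalid value falls back to the default.
        value = default
    return max(low, min(high, value))


# Example: read the Claude Code override from the environment.
print(cli_timeout_seconds(os.environ.get("CLAUDE_CODE_TIMEOUT_SECONDS")))
```

With this logic, a value of `10` becomes `30` and a value of `9999` becomes `600`.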

### OpenAI Codex

```bash theme={null}
export LLM_PROVIDER=codex
# Authenticate the Codex CLI separately:
codex login
# Optional overrides (all blank-by-default):
export CODEX_MODEL=
export CODEX_BIN=
```

Requires the [OpenAI Codex CLI](https://github.com/openai/codex). If `CODEX_MODEL` is unset, OpenSRE omits `-m` so `codex exec` uses the CLI's currently configured model. If `CODEX_BIN` is unset, the binary is resolved via `PATH` and known install locations.

### Claude Code

```bash theme={null}
export LLM_PROVIDER=claude-code
# Authenticate the Claude Code CLI separately:
claude login
# Optional overrides (all blank-by-default):
export CLAUDE_CODE_MODEL=
export CLAUDE_CODE_BIN=
```

Requires the [Claude Code CLI](https://github.com/anthropics/claude-code) (`npm i -g @anthropic-ai/claude-code`). If `CLAUDE_CODE_MODEL` is unset, OpenSRE omits the `--model` flag and the CLI uses its configured default. If `CLAUDE_CODE_BIN` is unset, the binary is resolved via `PATH` and known install locations.

### GitHub Copilot

```bash theme={null}
export LLM_PROVIDER=copilot
# Authenticate the Copilot CLI separately. Either flow works — the adapter
# detects both. The interactive `/login` slash command inside `copilot` writes
# to the platform credential store; `gh auth login` is an equivalent path that
# Copilot CLI delegates to automatically.
copilot login          # OAuth device flow; preferred CLI-first onboarding
# or:
gh auth login          # logs you into the gh CLI; Copilot will use that token
# Optional overrides (all blank-by-default):
export COPILOT_MODEL=
export COPILOT_BIN=
# Optional auth bypass for automation (only used when no CLI login is detected):
# export COPILOT_GITHUB_TOKEN=
# export GH_TOKEN=
# export GITHUB_TOKEN=
```

Requires the [GitHub Copilot CLI](https://docs.github.com/copilot/how-tos/use-copilot-agents/use-copilot-cli) (`npm i -g @github/copilot`). Login uses the interactive `/login` slash command or `copilot login`. If `COPILOT_MODEL` is unset, OpenSRE omits `--model`.

Invocations run as `copilot -p PROMPT --no-color --no-ask-user --silent` so they never block on user input.

**Auth detection:** OpenSRE detects auth in this order: (1) `COPILOT_GITHUB_TOKEN` / `GH_TOKEN` / `GITHUB_TOKEN` in the environment; (2) [`gh auth status`](https://docs.github.com/en/copilot/how-tos/copilot-cli/set-up-copilot-cli/authenticate-copilot-cli#authenticating-with-github-cli) when `gh` is on `PATH`. The `gh` probe accepts `✓ Logged in to github.com account …`, `- Active account: true`, or a supported `- Token:` prefix (`gho_`, `github_pat_`, or `ghu_` per Copilot docs; `ghp_` is not accepted). It runs `gh auth status --hostname …` when `COPILOT_GH_HOST` or `GH_HOST` targets a non-`github.com` host. The probe does **not** read plaintext `$COPILOT_HOME/config.json` (keychain-backed installs may omit it; mis-parsing arbitrary JSON risks false positives). If nothing matches, detection reports `logged_in=None` and the runner verifies at invoke time.

**BYOK / `COPILOT_OFFLINE`:** GitHub auth may be unnecessary; a `None` probe can still be fine if Copilot is configured for offline or external providers only.
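
The token-prefix rule in the auth detection described above can be illustrated with a small check. The function name `is_supported_copilot_token` is hypothetical; only the prefix list comes from this page.

```python
# Prefixes listed above as supported for Copilot CLI; `ghp_` is excluded.
SUPPORTED_TOKEN_PREFIXES = ("gho_", "github_pat_", "ghu_")


def is_supported_copilot_token(token: str) -> bool:
    """Return True when the token starts with a Copilot-supported prefix."""
    return token.strip().startswith(SUPPORTED_TOKEN_PREFIXES)


print(is_supported_copilot_token("gho_example"))  # True
print(is_supported_copilot_token("ghp_example"))  # False
```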

### Google Antigravity CLI

```bash theme={null}
export LLM_PROVIDER=antigravity-cli
# Authenticate the Antigravity CLI separately (browser OAuth on first run):
agy                       # interactive launch triggers Google Sign-In; token cached by OS keyring
# Stay current — 1.0.0 had OAuth hangs (fixed in 1.0.1):
agy update
# Optional overrides (all blank-by-default):
export ANTIGRAVITY_CLI_BIN=
export ANTIGRAVITY_CLI_TIMEOUT_SECONDS=300   # default 300; clamped 30–600; maps to `--print-timeout {N}s`
# Note: ANTIGRAVITY_CLI_MODEL is registered for forward-compat but currently no-op
# (agy v1.0.2 does not expose --model in headless `-p` mode). Each invocation uses
# whatever model is persisted in agy's local config; switch it interactively with
# `/models` inside the `agy` REPL. The wizard's model picker is a forward-compat
# catalog: once Google ships `--model` in headless, picking a value here will start
# being forwarded to agy via a one-line change in the adapter.
```

Antigravity CLI (`agy`) is Google's successor to Gemini CLI. Install via `curl -fsSL https://antigravity.google/cli/install.sh | bash`, then run `agy install` to configure your shell `PATH`. The minimum tested version is **1.0.1** — older builds log a warning via the probe and direct you to `agy update`.

**Why two Google providers?** Google's [transition announcement](https://developers.googleblog.com/an-important-update-transitioning-gemini-cli-to-antigravity-cli/) states that **on 2026-06-18** Gemini CLI stops serving Pro/Ultra and free users. Paid Gemini Code Assist licences keep Gemini CLI indefinitely. OpenSRE keeps both `gemini-cli` (deprecated alias with a probe-time notice) and `antigravity-cli` so either group can run without surprises.

As a best-effort fallback, the probe treats explicit `GEMINI_API_KEY` / `GOOGLE_API_KEY` / `GOOGLE_APPLICATION_CREDENTIALS` env credentials as authenticated (mirroring the Gemini CLI adapter), so users migrating across the two CLIs can keep their existing env-var-based auth without re-running the browser flow.

Invocations run as `agy -p PROMPT --print-timeout {N}s`. The adapter never passes `--continue` / `--conversation` / `--sandbox` / `--dangerously-skip-permissions`, keeping every OpenSRE call ephemeral.

### xAI Grok Build CLI

```bash theme={null}
export LLM_PROVIDER=grok-cli
# Authenticate the Grok Build CLI separately. Either path works:
grok login                 # OAuth sign-in with a SuperGrok / X Premium+ account
# ...or, for headless / CI runs, use an API key instead of a browser login:
export XAI_API_KEY=xai-...  # get one from the xAI console
# Optional overrides (all blank-by-default):
export GROK_CLI_MODEL=          # e.g. grok-build; unset → CLI configured default
export GROK_CLI_BIN=            # explicit path to the `grok` binary
export GROK_CLI_TIMEOUT_SECONDS=300   # default 300; clamped 30-600
```

Requires the [xAI Grok Build CLI](https://x.ai/cli) (binary: `grok`). Install with
`curl -fsSL https://x.ai/cli/install.sh | bash` (macOS/Linux) or
`irm https://x.ai/cli/install.ps1 | iex` (Windows). If `GROK_CLI_MODEL` is unset, OpenSRE
omits `-m` and the CLI uses its configured default. The wizard populates the model list live
from `grok models` at onboarding time so newly released models appear without an OpenSRE update.

Invocations run as `grok -p PROMPT --output-format plain`, so each OpenSRE call is a single
non-interactive turn. The adapter deliberately omits `--always-approve`: OpenSRE drives its own
tools, so Grok is used purely as a text responder and never auto-executes shell commands or file edits.

**Auth detection:** auth is probed via `grok models` (\~0.5 s, no LLM call), which prints
"You are logged in" on success. `XAI_API_KEY` is treated as an authenticated fallback for
headless / CI runs even when the probe result is unclear. `XAI_API_KEY` is forwarded **only**
to the Grok subprocess (never via the shared CLI env allowlist), so it cannot leak into other
CLI adapters.

> **Not to be confused with `groq`.** The `grok-cli` provider is xAI's Grok Build CLI. The
> separate `groq` provider is the Groq HTTP API (a different company); the two are unrelated.

### Pi CLI

```bash theme={null}
export LLM_PROVIDER=pi
# Authenticate Pi separately. Either path works — the adapter detects both:
pi                       # then run /login for an OAuth subscription or to store a key
# …or export a provider API key Pi understands (BYOK), e.g. for Gemini:
export GEMINI_API_KEY=...

export PI_MODEL=google/gemini-2.5-flash-lite  # provider/model; unset → Pi configured default
export PI_BIN=                                # explicit path to the `pi` binary (optional)
```

Requires the [Pi CLI](https://pi.dev) (`npm i -g @earendil-works/pi-coding-agent`). Pi is
bring-your-own-key across \~30 providers, so `PI_MODEL` uses the `provider/model` form
(for example `google/gemini-2.5-flash-lite`, `anthropic/claude-haiku-4-5`, `openai/gpt-4o-mini`);
run `pi --list-models` for the full catalog. If `PI_MODEL` is unset, OpenSRE omits `--model`
and Pi uses its configured default. If `PI_BIN` is unset, the binary is resolved via `PATH`
and known install locations.

Invocations run as `pi -p PROMPT` (non-interactive print mode), so each OpenSRE call is a
single headless turn with no TTY.

**Auth detection:** Pi has no non-interactive auth-status command, so OpenSRE detects auth
from state: (1) a supported provider API key in the environment (`GEMINI_API_KEY`,
`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, …) → authenticated; (2) otherwise, credentials stored
in `~/.pi/agent/auth.json` (written by `pi`'s `/login`, covering OAuth subscriptions and
stored keys) → authenticated; (3) neither → not authenticated. Provider API keys are forwarded
**only** to the Pi subprocess, never via the shared CLI env allowlist, so they cannot leak into
other CLI adapters.
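
The state-based detection above can be sketched as follows. This is an assumption-laden illustration: `pi_authenticated` is a hypothetical name, the key list contains only the three variables named above (the rest are elided in this page), and the sketch treats the mere presence of `auth.json` as "credentials stored" without inspecting its contents.

```python
from pathlib import Path
from typing import Mapping

# Only the keys named on this page; the full list is not reproduced here.
PI_KEY_ENV = ("GEMINI_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_API_KEY")
DEFAULT_AUTH_FILE = Path.home() / ".pi" / "agent" / "auth.json"


def pi_authenticated(env: Mapping[str, str],
                     auth_file: Path = DEFAULT_AUTH_FILE) -> bool:
    """Return True when Pi appears to have usable credentials."""
    # (1) A provider API key in the environment counts as authenticated.
    if any(env.get(name) for name in PI_KEY_ENV):
        return True
    # (2) Otherwise fall back to credentials written by Pi's /login.
    # Simplification: only checks that the file exists.
    return auth_file.is_file()
    # (3) Neither present -> the expression above is False.


print(pi_authenticated({"GEMINI_API_KEY": "example"}))
```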

See [`integrations/llm_cli/AGENTS.md`](https://github.com/Tracer-Cloud/opensre/blob/main/integrations/llm_cli/AGENTS.md) for the adapter pattern used to add new CLI providers.

## Reasoning effort (interactive shell)

In the TTY REPL (`opensre` with no subcommand), `/effort` stores a **session** preference for how much effort reasoning models should spend before answering. It applies only when `LLM_PROVIDER` is **`openai`** (HTTP API) or **`codex`** (Codex CLI); other providers ignore the setting, and the shell prints a note saying so.

| Input                            | Sent to the model |
| -------------------------------- | ----------------- |
| `low`, `medium`, `high`, `xhigh` | same string       |
| `max`                            | `xhigh`           |

Run `/effort` alone to show the current choice (or `(default)` when unset) and the usage line. `/new` starts a fresh session but **keeps** `/effort` (and trust mode), consistent with other session prefs.

Outside the REPL, optional defaults use the environment variable:

```bash theme={null}
export OPENSRE_REASONING_EFFORT=high   # low | medium | high | xhigh
```

Session `/effort` overrides this for interactive runs. Implementation: [`config/llm_reasoning_effort.py`](https://github.com/Tracer-Cloud/opensre/blob/main/config/llm_reasoning_effort.py).
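
The input mapping in the table above amounts to a small normalization step. The sketch below is illustrative only: `normalize_effort` is a hypothetical name, and the case-insensitive matching and the `ValueError` on unknown input are assumptions rather than behaviour confirmed by `config/llm_reasoning_effort.py`.

```python
ALLOWED_EFFORTS = ("low", "medium", "high", "xhigh")


def normalize_effort(value: str) -> str:
    """Map user input to the effort string sent to the model."""
    effort = value.strip().lower()  # assumption: input is case-insensitive
    if effort == "max":
        return "xhigh"  # `max` is an alias for the highest level
    if effort in ALLOWED_EFFORTS:
        return effort
    raise ValueError(f"unsupported reasoning effort: {value!r}")


print(normalize_effort("max"))   # xhigh
print(normalize_effort("high"))  # high
```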

## Provider fallback and diagnostics

If the provider you set in `LLM_PROVIDER` is missing its API key, OpenSRE does **not** fail outright — it falls back to the next configured provider (by default it tries `openai`, then `anthropic`) so a partially configured machine still works. The trade-off is that calls can quietly go to a different provider than you intended, and you would otherwise only find out via a confusing error naming the *fallback* provider (for example "Anthropic credit balance too low" when you actually configured OpenAI).

To make this visible:

* **A warning is logged** the first time a fallback happens, naming the configured provider, the missing key, and the provider actually used:

  ```
  Configured LLM provider 'openai' is unusable (OPENAI_API_KEY is not set);
  falling back to 'anthropic'. Set OPENAI_API_KEY or change LLM_PROVIDER to use it.
  ```

* **`/status`** (in the interactive shell) shows the **resolved** provider and flags a fallback inline, instead of just echoing `LLM_PROVIDER`:

  ```
  provider   anthropic (fallback from 'openai': OPENAI_API_KEY not set)
  ```

* **Provider errors in the interactive shell** are prefixed with which provider served the request and whether it was a fallback, so the message is actionable:

  ```
  [LLM provider: anthropic — fell back from configured 'openai' (OPENAI_API_KEY not set)]
  Anthropic request rejected (HTTP 400): Your credit balance is too low ...
  ```

To remove a fallback, either set the missing key for your configured provider or change `LLM_PROVIDER` to a provider you have credentials for.
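
To make the fallback rule concrete, here is a minimal sketch that resolves a provider from an environment mapping and produces a warning in the style shown above. It models only `openai` and `anthropic`; the function name `resolve_provider` is hypothetical, and the behaviour when no provider has a key is not described on this page, so the sketch simply returns the configured provider unchanged.

```python
from typing import Mapping, Optional, Tuple

# Only the two providers named in the fallback description are modelled.
API_KEY_ENV = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}
FALLBACK_ORDER = ("openai", "anthropic")


def resolve_provider(env: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Return (resolved provider, warning or None)."""
    configured = env.get("LLM_PROVIDER", "anthropic")
    key_name = API_KEY_ENV.get(configured)
    # Providers outside this sketch, or with a key present, resolve as-is.
    if key_name is None or env.get(key_name):
        return configured, None
    for candidate in FALLBACK_ORDER:
        if candidate != configured and env.get(API_KEY_ENV[candidate]):
            warning = (
                f"Configured LLM provider '{configured}' is unusable "
                f"({key_name} is not set); falling back to '{candidate}'."
            )
            return candidate, warning
    # Assumption: with no usable fallback, leave the configured provider.
    return configured, None


provider, warning = resolve_provider(
    {"LLM_PROVIDER": "openai", "ANTHROPIC_API_KEY": "example"}
)
print(provider)  # anthropic
print(warning)
```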

## Switching providers at runtime

OpenSRE caches LLM clients on first use. To switch providers within a single process (tests, benchmarks), update the env vars and then call `reset_llm_singletons()` from `core.runtime.llm.llm_client`. A fresh process picks up the new `LLM_PROVIDER` automatically, so the reset is only needed when the provider changes mid-process.
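
The following self-contained sketch shows why a cached client needs an explicit reset. It does not import OpenSRE: `get_client` and `reset_cached_client` are hypothetical stand-ins, with `reset_cached_client` playing the role that `reset_llm_singletons()` plays in OpenSRE, and `functools.lru_cache` is used here only as a convenient way to model the cache.

```python
import os
from functools import lru_cache


@lru_cache(maxsize=None)
def get_client() -> str:
    """Stand-in for a cached LLM client; returns the provider name."""
    return os.environ.get("LLM_PROVIDER", "anthropic")


def reset_cached_client() -> None:
    """Stand-in for reset_llm_singletons(): drop the cached client."""
    get_client.cache_clear()


os.environ["LLM_PROVIDER"] = "openai"
reset_cached_client()
print(get_client())  # openai

os.environ["LLM_PROVIDER"] = "anthropic"
print(get_client())  # still openai: the cached client is reused

reset_cached_client()
print(get_client())  # anthropic
```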

## Where this lives in the code

* Provider literals and defaults: [`config/config.py`](https://github.com/Tracer-Cloud/opensre/blob/main/config/config.py) (`LLMProvider`, `LLMSettings`).
* Runtime routing: [`core/runtime/llm/llm_client.py`](https://github.com/Tracer-Cloud/opensre/blob/main/core/runtime/llm/llm_client.py) (`_create_llm_client`).
* API-backed provider guide: [`core/runtime/llm/AGENTS.md`](https://github.com/Tracer-Cloud/opensre/blob/main/core/runtime/llm/AGENTS.md).
* CLI-backed provider guide: [`integrations/llm_cli/AGENTS.md`](https://github.com/Tracer-Cloud/opensre/blob/main/integrations/llm_cli/AGENTS.md).
