Introduction
After installing Hermes Agent and verifying that the CLI responds correctly, the next step is crucial: choosing and configuring the AI model that powers the agent. The model directly determines response quality, tool usage capability, and the cost of every interaction. This article covers provider configuration, API key management, multi-provider routing, and best practices for a stable, performant setup.
The hermes model command: the central entry point
Hermes provides a single interactive command to manage all model-related configuration:
hermes model
This command opens an interactive menu that guides you through:
- Choosing a provider (Anthropic, OpenRouter, DeepSeek, Nous Portal, GitHub Copilot, etc.)
- Selecting a model from those available at the provider
- Configuring authentication (API key or OAuth)
- Setting up auxiliary models (vision, compression, web extraction)
You can switch providers at any time; there's no lock-in. Run hermes model again, select a different provider, and the new configuration is saved immediately.
Interactive menu details
The hermes model menu offers several entries:
- Choose a provider: selection from all supported providers
- Configure auxiliary models: configure secondary models (vision, web_extract, compression, etc.)
- View current configuration: summary of the active configuration
This interactive approach eliminates the need to dig through configuration files manually, though direct configuration remains possible for advanced users.
Supported providers
Hermes Agent supports a broad provider ecosystem. Here are the main ones, organized by access type.
OAuth-authenticated providers (no API key)
These providers use a browser-based OAuth flow, ideal if you don't want to manage API keys:
- Nous Portal: monthly subscription, zero configuration. Login via hermes model.
- OpenAI Codex: uses Codex models with your ChatGPT account. Device code authentication via hermes model.
- Anthropic (OAuth): requires a Claude Max plan + additional credits. Authentication routes through Claude Code.
- MiniMax (OAuth): access to MiniMax-M2.7 models without an API key. Browser login via hermes model.
- GitHub Copilot: uses your Copilot subscription to access GPT-5.x, Claude, Gemini, etc. OAuth via hermes model or COPILOT_GITHUB_TOKEN.
API key providers
These providers require an API key from their respective consoles:
- Anthropic (ANTHROPIC_API_KEY): direct access to Claude models. Get your key at console.anthropic.com.
- OpenRouter (OPENROUTER_API_KEY): the most versatile, routes to dozens of models (Claude, GPT, Gemini, DeepSeek, Qwen, etc.). Key at openrouter.ai/keys.
- DeepSeek (DEEPSEEK_API_KEY): direct access to DeepSeek models. Key at platform.deepseek.com.
- Google AI Studio (GOOGLE_API_KEY or GEMINI_API_KEY): native Gemini models. Key at aistudio.google.com.
- Hugging Face (HF_TOKEN): routes to 20+ open-source models via a unified endpoint. Token at huggingface.co/settings/tokens.
- Alibaba Cloud / DashScope (DASHSCOPE_API_KEY): Qwen models.
- Kimi / Moonshot (KIMI_API_KEY): Moonshot coding and chat models.
- NVIDIA NIM (NVIDIA_API_KEY): Nemotron models.
- xAI (XAI_API_KEY): Grok models.
Custom Endpoint (self-hosted)
For users running their own models via Ollama, vLLM, SGLang, or any OpenAI-compatible endpoint:
hermes model
# Select "Custom Endpoint"
# Enter the base URL and model name
Or via direct configuration with OPENAI_BASE_URL and OPENAI_API_KEY.
Configuration storage: .env vs config.yaml
Hermes cleanly separates secrets from regular configuration. Everything lives in the ~/.hermes/ directory:
~/.hermes/
├── config.yaml # Non-secret parameters (model, terminal, compression, etc.)
├── .env # API keys and tokens (secrets)
├── auth.json # OAuth credentials (Nous Portal, etc.)
└── ...
Fundamental rule
- Secrets (API keys, tokens, passwords) → ~/.hermes/.env
- Everything else (model, terminal backend, limits, toolsets) → ~/.hermes/config.yaml
The hermes config set command
This is the recommended method for modifying any configuration. It automatically routes the value to the correct file:
# Set the model (goes into config.yaml)
hermes config set model anthropic/claude-sonnet-4
# Set an API key (goes into .env)
hermes config set OPENROUTER_API_KEY sk-or-xxxxx
# Change the terminal backend (goes into config.yaml)
hermes config set terminal.backend docker
# View the full configuration
hermes config
# Open config.yaml in your editor
hermes config edit
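The routing rule can be pictured as a small classifier. The sketch below is a hypothetical illustration in Python, not Hermes' actual implementation: the function name route_setting and the heuristic it uses (names written like environment variables go to .env, everything else goes to config.yaml) are assumptions made for this example.

```python
import re

# Assumption: settings named like environment variables (OPENROUTER_API_KEY,
# OPENAI_BASE_URL, ...) belong in .env. Hermes' real rule may differ.
ENV_STYLE_NAME = re.compile(r"^[A-Z][A-Z0-9_]*$")


def route_setting(key: str) -> str:
    """Return the file (relative to ~/.hermes/) where a setting would be stored."""
    return ".env" if ENV_STYLE_NAME.match(key) else "config.yaml"


if __name__ == "__main__":
    for key in ("model", "OPENROUTER_API_KEY", "terminal.backend", "HF_TOKEN"):
        print(f"{key} -> ~/.hermes/{route_setting(key)}")
```

Whatever the real rule is, the practical consequence is the same: you should not need to remember which file a value belongs in when you use hermes config set.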
Resolution priority
Parameters are resolved in this order (decreasing priority):
- CLI arguments: hermes chat --model anthropic/claude-sonnet-4 (per-invocation override)
- ~/.hermes/config.yaml: main configuration file
- ~/.hermes/.env: fallback for environment variables, required for secrets
- Default values: built-in safe defaults when nothing is configured
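The precedence order above behaves like a stack of dictionaries in which the first source that defines a key wins. Here is a minimal Python sketch of that idea using the standard library's ChainMap. The function name resolve_settings and the sample values (including the "local" default backend) are illustrative assumptions, not Hermes internals.

```python
from collections import ChainMap


def resolve_settings(cli_args, config_yaml, dotenv, defaults):
    """Layer the four sources so that earlier ones shadow later ones."""
    return ChainMap(cli_args, config_yaml, dotenv, defaults)


if __name__ == "__main__":
    settings = resolve_settings(
        cli_args={"model": "anthropic/claude-sonnet-4"},  # e.g. from --model
        config_yaml={"model": "deepseek/deepseek-chat", "terminal.backend": "docker"},
        dotenv={"OPENROUTER_API_KEY": "sk-or-xxxxx"},
        defaults={"terminal.backend": "local"},  # hypothetical default value
    )
    print(settings["model"])             # the CLI value shadows config.yaml
    print(settings["terminal.backend"])  # config.yaml shadows the default
```

A lookup walks the layers from left to right, which is why a --model flag affects only the current invocation and leaves config.yaml untouched.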
Concrete configuration per provider
Anthropic (Claude)
# Via hermes model (recommended)
hermes model
# → Choose "Anthropic"
# → Enter your API key
# Or manually
hermes config set model anthropic/claude-sonnet-4
hermes config set ANTHROPIC_API_KEY sk-ant-xxxxx
OpenRouter
# Get a key at https://openrouter.ai/keys
hermes config set OPENROUTER_API_KEY sk-or-xxxxx
hermes config set model anthropic/claude-opus-4
# Popular models on OpenRouter:
# - anthropic/claude-opus-4 (advanced reasoning)
# - anthropic/claude-sonnet-4 (fast, good value)
# - openai/gpt-4o (versatile)
# - google/gemini-2.5-flash (fast, affordable)
DeepSeek
hermes config set DEEPSEEK_API_KEY sk-xxxxx
hermes config set model deepseek/deepseek-chat
GitHub Copilot
Two authentication options:
# Option 1: OAuth via hermes model (recommended)
hermes model
# → Choose "GitHub Copilot"
# → Browser authentication
# Option 2: Existing GitHub token
hermes config set COPILOT_GITHUB_TOKEN gho_xxxxx
hermes config set model github/copilot
Nous Portal
hermes model
# → Choose "Nous Portal"
# → OAuth login, no API key required
Custom Endpoint (Ollama, vLLM, etc.)
# Example with local Ollama
hermes config set OPENAI_BASE_URL http://localhost:11434/v1
hermes config set OPENAI_API_KEY ollama
hermes config set model llama3.1:70b
# Example with vLLM
hermes config set OPENAI_BASE_URL http://localhost:8000/v1
hermes config set OPENAI_API_KEY empty
hermes config set model meta-llama/Llama-3.1-70B-Instruct
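Before pointing Hermes at a self-hosted server, it can help to confirm that the endpoint answers and that the model name matches what the server advertises. The sketch below is an independent check in Python, not part of Hermes: it queries the /models route that OpenAI-compatible servers such as Ollama and vLLM expose. The helper names build_models_request and list_models are made up for this example.

```python
import json
import urllib.request


def build_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the OpenAI-compatible /models listing."""
    url = base_url.rstrip("/") + "/models"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def list_models(base_url: str, api_key: str, timeout: float = 5.0) -> list:
    """Return the model ids the server advertises (needs a running server)."""
    request = build_models_request(base_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return [entry["id"] for entry in payload.get("data", [])]


# Usage, with the same values as the Ollama example above:
#   list_models("http://localhost:11434/v1", "ollama")
```

If the id you pass to hermes config set model does not appear in the returned list, the server will most likely reject the request.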
Minimum requirement: 64,000 context tokens
Hermes Agent requires a model with at least 64,000 context tokens. Models with smaller windows are rejected at startup.
Why this requirement? Hermes' multi-step workflows (tool-calling, file reading, command execution) consume context rapidly. A model with 32K tokens doesn't have enough working memory to maintain a productive conversation with tool calls.
Most hosted models (Claude, GPT, Gemini, Qwen, DeepSeek) far exceed this threshold. For local models, make sure to configure the context size:
# llama.cpp
--ctx-size 65536
# Ollama (set before starting the server, or set num_ctx in a Modelfile)
OLLAMA_CONTEXT_LENGTH=65536 ollama serve
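The value 65536 in these examples is 64 × 1024, the first power of two at or above the 64,000-token threshold. The short Python sketch below spells out that arithmetic. The constant and function names are illustrative, and the check only mirrors the rule stated in this article, not Hermes' startup code.

```python
MIN_CONTEXT_TOKENS = 64_000  # threshold stated in this article


def meets_context_requirement(context_tokens: int) -> bool:
    """True when a context window satisfies the stated minimum."""
    return context_tokens >= MIN_CONTEXT_TOKENS


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is greater than or equal to n (n >= 1)."""
    return 1 << (n - 1).bit_length()


if __name__ == "__main__":
    print(next_power_of_two(MIN_CONTEXT_TOKENS))  # the size used in the examples
    print(meets_context_requirement(32_768))      # a 32K window is too small
```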
Switching models in one command
Switching between providers or models is instant:
# Via hermes model (interactive)
hermes model
# Via hermes config set (direct)
hermes config set model anthropic/claude-sonnet-4
# Via /model command in session
/model
# → Interactive selection without leaving the conversation
No restart required. The change takes effect at the next interaction.
Configuring auxiliary models
Hermes uses secondary models for specific tasks (vision, web extraction, context compression). By default, auxiliary tasks use your main model. You can configure them independently to optimize costs:
hermes model
# → Choose "Configure auxiliary models"
The interactive menu lets you configure:
- vision: model for image analysis
- web_extract: model for web page summarization
- session_search: model for searching past sessions
- title_generation: model for generating session titles
- compression: model for context compression
For example, to use a cheaper model for compression:
# In ~/.hermes/config.yaml
auxiliary:
  compression:
    provider: "openrouter"
    model: "google/gemini-2.5-flash"
The universal pattern for each auxiliary model is always the same: provider + model + optionally base_url.
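The default behaviour described above (an auxiliary task uses the main model unless it has its own entry) can be sketched in Python as a lookup with a fallback. The function resolve_auxiliary and the shape of its arguments are assumptions for illustration. In particular, merging a partial override with the main model's fields is a guess about how such a lookup could behave, not documented Hermes behaviour.

```python
def resolve_auxiliary(task: str, main_model: dict, auxiliary: dict) -> dict:
    """Return provider/model (and base_url, if set) for an auxiliary task.

    Fields missing from the task's override are taken from the main model.
    """
    override = auxiliary.get(task, {})
    return {**main_model, **override}


if __name__ == "__main__":
    main_model = {"provider": "anthropic", "model": "anthropic/claude-sonnet-4"}
    auxiliary = {
        "compression": {"provider": "openrouter", "model": "google/gemini-2.5-flash"},
    }
    print(resolve_auxiliary("compression", main_model, auxiliary))  # uses the override
    print(resolve_auxiliary("vision", main_model, auxiliary))       # falls back to main
```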
Routing and multi-provider fallback
Automatic fallback
Configure a backup model that automatically takes over if the primary provider encounters an error:
# In ~/.hermes/config.yaml
fallback_model:
  provider: openrouter
  model: anthropic/claude-sonnet-4
Hermes tries the primary model first, then switches to the fallback after transient errors (rate limits, timeouts, 5xx).
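As a rough mental model, the fallback works like a try/except around the primary call that catches only transient failures. The Python sketch below illustrates this. TransientError, is_transient, and call_with_fallback are hypothetical names, and treating HTTP 429 and 5xx as transient follows the list in the sentence above rather than any documented Hermes code.

```python
class TransientError(Exception):
    """Stands in for rate limits, timeouts, and 5xx responses."""


def is_transient(status: int) -> bool:
    """HTTP statuses treated as retryable: 429 and the 5xx range."""
    return status == 429 or 500 <= status <= 599


def call_with_fallback(prompt, primary, fallback):
    """Try the primary model; use the fallback only after a transient error."""
    try:
        return primary(prompt)
    except TransientError:
        return fallback(prompt)


if __name__ == "__main__":
    def rate_limited(prompt):
        raise TransientError("429 Too Many Requests")

    def backup(prompt):
        return f"fallback answered: {prompt}"

    print(call_with_fallback("hello", rate_limited, backup))
```

Errors that are not transient, such as an invalid API key, propagate unchanged in this sketch. Switching models would not fix them, and hiding them would make misconfiguration harder to spot.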
Provider routing
For advanced users who want fine-grained routing control:
# In ~/.hermes/config.yaml
provider_routing:
  sort: "price" # "price", "throughput", or "latency"
  only: ["anthropic", "openai"] # Limit to authorized providers
  ignore: ["deepseek"] # Exclude providers
  order: ["anthropic", "openrouter", "openai"] # Fallback order
  require_parameters: true # Only use providers supporting all parameters
  data_collection: "deny" # Exclude providers that store data
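To make the interaction between only, ignore, and order concrete, here is a small Python sketch of one plausible reading: filter first, then rank. The function select_providers is hypothetical, and how Hermes (or the upstream router) actually combines these fields is an assumption here.

```python
def select_providers(candidates, only=None, ignore=(), order=()):
    """Filter candidates by only/ignore, then rank them by the order list.

    Providers not named in `order` keep their relative position after the
    ranked ones, because sorted() is stable.
    """
    allowed = [
        name for name in candidates
        if (only is None or name in only) and name not in ignore
    ]
    rank = {name: position for position, name in enumerate(order)}
    return sorted(allowed, key=lambda name: rank.get(name, len(rank)))


if __name__ == "__main__":
    candidates = ["openai", "deepseek", "anthropic", "openrouter", "google"]
    print(select_providers(
        candidates,
        ignore=["deepseek"],
        order=["anthropic", "openrouter", "openai"],
    ))
```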
Credential rotation
If you have multiple API keys for the same provider, configure the rotation strategy:
credential_pool_strategies:
  openrouter: round_robin # Fair distribution
  anthropic: least_used # Always the least-used key
Available options: fill_first (default), round_robin, least_used, random.
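The four strategy names suggest the following behaviour, sketched here in Python. The CredentialPool class is hypothetical. In particular, fill_first is read as "keep using the first key", with the step where a real pool moves on from an exhausted key left out, and that reading is an assumption.

```python
import itertools
import random


class CredentialPool:
    """Illustrative key selector for the four strategy names."""

    def __init__(self, keys, strategy="fill_first", rng=None):
        self.keys = list(keys)
        self.strategy = strategy
        self.uses = {key: 0 for key in self.keys}
        self._cycle = itertools.cycle(self.keys)
        self._rng = rng or random.Random()

    def next_key(self):
        if self.strategy == "fill_first":
            key = self.keys[0]  # exhaustion handling omitted in this sketch
        elif self.strategy == "round_robin":
            key = next(self._cycle)
        elif self.strategy == "least_used":
            key = min(self.keys, key=self.uses.__getitem__)
        elif self.strategy == "random":
            key = self._rng.choice(self.keys)
        else:
            raise ValueError(f"unknown strategy: {self.strategy}")
        self.uses[key] += 1
        return key


if __name__ == "__main__":
    pool = CredentialPool(["key-a", "key-b", "key-c"], strategy="round_robin")
    print([pool.next_key() for _ in range(4)])
```

In this sketch round_robin and least_used produce the same sequence when the pool starts fresh. They diverge once usage counts differ, for example after a restart that preserves the counters.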
Testing that a provider works
Before diving into complex tasks, verify that your configuration is functional:
Step 1: Diagnostic
hermes doctor
This command detects configuration issues, missing keys, and incompatibilities.
Step 2: Test conversation
Launch Hermes and send a simple, verifiable prompt:
hermes
Then test with:
List the files in the current directory and tell me what the main language of the project is.
Signs of success:
- The startup banner displays your model and provider
- Hermes responds without errors
- Tools work (terminal, file reading, etc.)
- The conversation continues normally over several turns
Step 3: Session resumption
hermes --continue # or hermes -c
Verify that the previous session is properly resumed.
Quick troubleshooting
If something isn't working, follow this sequence:
hermes doctor
hermes model # Re-check configuration
hermes setup # Re-run full setup
hermes --continue # Test resumption
VPS deployment
For production deployment (Telegram gateway, Discord, etc.), a dedicated VPS is recommended. A VPS with 2 vCPU and 4 GB RAM is sufficient to run Hermes with the gateway.
Hostinger offers VPS plans starting from a few euros per month, which suit hosting Hermes Agent in always-on mode. Set the terminal backend to Docker for isolation:
hermes config set terminal.backend docker
Essential environment variables
Here's a summary of the most common variables, to place in ~/.hermes/.env or set via hermes config set:
- OPENROUTER_API_KEY: OpenRouter key
- ANTHROPIC_API_KEY: Anthropic key
- DEEPSEEK_API_KEY: DeepSeek key
- OPENAI_API_KEY: key for custom OpenAI-compatible endpoint
- OPENAI_BASE_URL: base URL for custom endpoint
- GOOGLE_API_KEY: Google AI Studio / Gemini key
- COPILOT_GITHUB_TOKEN: GitHub token for Copilot
- HF_TOKEN: Hugging Face token
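A quick way to see which of these variables are actually set is to parse the .env file yourself. The Python sketch below assumes the file uses plain NAME=value lines. It ignores comments and blank lines and does not handle export prefixes or inline comments. The helper names parse_env and missing_variables are made up for this example.

```python
from pathlib import Path


def parse_env(text: str) -> dict:
    """Parse simple NAME=value lines, skipping comments and blank lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        values[name.strip()] = value.strip().strip("'\"")
    return values


def missing_variables(text: str, required) -> list:
    """Return the required names that are absent or empty."""
    values = parse_env(text)
    return [name for name in required if not values.get(name)]


if __name__ == "__main__":
    env_path = Path.home() / ".hermes" / ".env"
    text = env_path.read_text() if env_path.exists() else ""
    print(missing_variables(text, ["OPENROUTER_API_KEY", "ANTHROPIC_API_KEY"]))
```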
Conclusion
Configuring models and providers in Hermes Agent is designed to be accessible via hermes model while remaining fully configurable manually for advanced users. The clear separation between .env (secrets) and config.yaml (parameters), support for 25+ providers, and fallback/routing mechanisms offer remarkable flexibility.
Key takeaways:
- Use hermes model for interactive configuration, hermes config set for quick adjustments
- Ensure a minimum of 64K context tokens
- Always test with hermes doctor and a simple conversation after each change
- Configure a fallback for service continuity
- Optimize costs by configuring cheaper auxiliary models
In the next article, we'll explore Hermes Agent's advanced CLI features and built-in tools, building on the installation and introduction basics.