Configure models and providers in Hermes Agent

Hermes Agent 🟢 Beginner ⏱️ 11 min read 📅 2026-05-05

Introduction

After installing Hermes Agent and verifying that the CLI responds correctly, the next step is crucial: choosing and configuring the AI model that powers the agent. The model directly determines the quality of the responses, the ability to use tools, and the cost of each interaction. This article covers in detail the configuration of providers, the management of API keys, multi-provider routing, and best practices for a stable and performant setup.

The hermes model command: the central entry point

Hermes provides a single interactive command to manage all model-related configuration:

hermes model

This command opens an interactive menu that guides you through:

  • Choosing the provider (Anthropic, OpenRouter, DeepSeek, Nous Portal, GitHub Copilot, etc.)
  • Selecting the model among those available from the provider
  • Configuring authentication (API key or OAuth)
  • Configuring auxiliary models (vision, compression, web extraction)

You can switch providers at any time; there is no lock-in. Relaunch hermes model, select another provider, and the configuration is updated instantly.

Interactive menu in detail

The hermes model menu offers several entries:

  • Choose a provider: selection among all supported providers
  • Configure auxiliary models: configure secondary models (vision, web_extract, compression, etc.)
  • View current configuration: summary of the active configuration

This interactive approach eliminates the need to dig through configuration files manually, although direct configuration remains possible for advanced users.

Supported providers

Hermes Agent supports a broad ecosystem of providers. Here are the main ones, categorized by access type.

Providers with OAuth authentication (no API key)

These providers use an OAuth flow via the browser, ideal if you don't want to manage API keys:

  • Nous Portal: monthly subscription, zero configuration. Log in via hermes model.
  • OpenAI Codex: uses Codex models with your ChatGPT account. Device-code authentication via hermes model.
  • Anthropic (OAuth): requires a Claude Max plan plus additional credits. Authentication routes through Claude Code.
  • MiniMax (OAuth): access to MiniMax-M2.7 models without an API key. Browser login via hermes model.
  • GitHub Copilot: uses your Copilot subscription to access GPT-5.x, Claude, Gemini, etc. OAuth via hermes model, or an existing token in COPILOT_GITHUB_TOKEN.

Providers with API key

These providers require an API key obtained from their respective consoles:

  • Anthropic (ANTHROPIC_API_KEY): direct access to Claude models. Get your key at console.anthropic.com.
  • OpenRouter (OPENROUTER_API_KEY): the most versatile option; it routes to dozens of models (Claude, GPT, Gemini, DeepSeek, Qwen, etc.). Get your key at openrouter.ai/keys.
  • DeepSeek (DEEPSEEK_API_KEY): direct access to DeepSeek models. Get your key from the DeepSeek console.
  • Google AI Studio (GOOGLE_API_KEY or GEMINI_API_KEY): native Gemini models. Get your key at aistudio.google.com.
  • Hugging Face (HF_TOKEN): routes to 20+ open-source models via a unified endpoint. Get your token from your Hugging Face account settings.
  • Alibaba Cloud / DashScope (DASHSCOPE_API_KEY): Qwen models.
  • Kimi / Moonshot (KIMI_API_KEY): Moonshot coding and chat models.
  • NVIDIA NIM (NVIDIA_API_KEY): Nemotron models.
  • xAI (XAI_API_KEY): Grok models.

Custom Endpoint (self-hosted)

For users running their own models via Ollama, vLLM, SGLang, or any OpenAI-compatible endpoint, select "Custom Endpoint" in the hermes model menu, then provide the base URL and model name. You can also configure this manually with the OPENAI_BASE_URL and OPENAI_API_KEY variables.

Storing configurations: .env vs config.yaml

Hermes keeps secrets separate from ordinary configuration. Everything lives in the ~/.hermes/ directory: config.yaml holds non-secret settings (model, terminal, compression), .env holds API keys and tokens, and auth.json holds OAuth credentials (Nous Portal, etc.).

Fundamental rule

  • Secrets (API keys, tokens, passwords) → ~/.hermes/.env
  • Everything else (model, terminal backend, limits, toolsets) → ~/.hermes/config.yaml

The hermes config set command

This is the recommended method for modifying any configuration. It automatically routes the value to the correct file:

hermes config set model anthropic/claude-sonnet-4
hermes config set OPENROUTER_API_KEY sk-or-v1-xxxxxxxx

This command automatically distinguishes secrets (which go into .env) from standard parameters (which go into config.yaml). Use hermes config to see the complete configuration, and hermes config edit to open the configuration file directly in your editor.
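As a rough picture of where the two example values end up, the sketch below recreates both files in a scratch directory rather than in ~/.hermes/. The layout of config.yaml shown here (a single top-level model key) is an assumption inferred from the hermes config set commands in this article, not a copy of a real Hermes file, and demo_dir is a name made up for the example.

```shell
# Illustrative only: builds stand-in files in a temporary directory.
demo_dir="$(mktemp -d)"   # scratch directory standing in for ~/.hermes/

# Secrets: one KEY=value pair per line.
cat > "$demo_dir/.env" <<'EOF'
OPENROUTER_API_KEY=sk-or-v1-xxxxxxxx
EOF

# Non-secret settings (assumed layout).
cat > "$demo_dir/config.yaml" <<'EOF'
model: anthropic/claude-sonnet-4
EOF

# Sanity check: no API key should ever appear in config.yaml.
if grep -q 'API_KEY' "$demo_dir/config.yaml"; then
  echo "secret found in config.yaml"
else
  echo "config.yaml contains no API keys"
fi
```

The same grep check can be pointed at a real config.yaml to confirm that no key was pasted into the wrong file.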

Resolution priority

Parameters are resolved in this order (decreasing priority):

  1. CLI arguments: hermes chat --model anthropic/claude-sonnet-4 (per-invocation override)
  2. ~/.hermes/config.yaml: main configuration file
  3. ~/.hermes/.env: fallback for environment variables, required for secrets
  4. Default values: safe built-in defaults when nothing is configured
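To make the precedence concrete, here is a minimal shell sketch of a "first non-empty value wins" lookup. It is illustrative only and not Hermes' actual implementation; the function name resolve_setting and the built-in-default placeholder are assumptions made for the example.

```shell
# Conceptual sketch only: mimics the precedence order listed above.
# resolve_setting and "built-in-default" are hypothetical names.
resolve_setting() {
  cli_value="$1"      # value passed on the command line, if any
  config_value="$2"   # value read from config.yaml, if any
  env_value="$3"      # value read from .env, if any
  # ${a:-b} expands to b when a is unset or empty, so the first
  # non-empty source in the chain wins.
  printf '%s\n' "${cli_value:-${config_value:-${env_value:-built-in-default}}}"
}

resolve_setting "anthropic/claude-opus-4" "anthropic/claude-sonnet-4" ""
resolve_setting "" "anthropic/claude-sonnet-4" ""
resolve_setting "" "" ""
```

The three calls print the CLI value, then the config.yaml value, then the placeholder default, which mirrors how a --model argument overrides the file for a single invocation without changing it.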

Concrete configuration by provider

Anthropic (Claude)

Run hermes model, choose "Anthropic", then enter your API key. Alternatively, from the command line:

hermes config set model anthropic/claude-sonnet-4
hermes config set ANTHROPIC_API_KEY sk-ant-xxxxx

OpenRouter

Get your key on openrouter.ai/keys, then configure:

hermes config set OPENROUTER_API_KEY sk-or-v1-xxxxx
hermes config set model anthropic/claude-opus-4

Popular models on OpenRouter include anthropic/claude-opus-4 (advanced reasoning), anthropic/claude-sonnet-4 (fast, good value for money), openai/gpt-4o (versatile) and google/gemini-2.5-flash (fast, cheap).

DeepSeek

hermes config set DEEPSEEK_API_KEY sk-xxxxx
hermes config set model deepseek/deepseek-chat

GitHub Copilot

Two authentication options: via hermes model by choosing "GitHub Copilot" for an OAuth flow in the browser, or by manually configuring an existing token:

hermes config set COPILOT_GITHUB_TOKEN gho_xxxxx
hermes config set model github/copilot

Nous Portal

Run hermes model, choose "Nous Portal", then follow the OAuth login. No API key is required.

Custom Endpoint (Ollama, vLLM, etc.)

For local Ollama, set OPENAI_BASE_URL to http://localhost:11434/v1 and OPENAI_API_KEY to ollama, then set the model (e.g., llama3.1:70b). For vLLM, use http://localhost:8000/v1 as the base URL with a model such as meta-llama/Llama-3.1-70B-Instruct.

Minimum requirement: 64,000 context tokens

Hermes Agent requires a model with at least 64,000 tokens of context window. Models with a smaller window are rejected at startup.

Why this requirement? Hermes' multi-step workflows (tool-calling, file reading, command execution) quickly consume context. A model with 32K tokens does not have enough working memory to maintain a productive conversation with tool calls.

Most hosted models (Claude, GPT, Gemini, Qwen, DeepSeek) far exceed this threshold. For local models via llama.cpp, pass the --ctx-size 65536 argument. With Ollama, raise the context length to 65536 through its context setting (for example the num_ctx parameter or the OLLAMA_CONTEXT_LENGTH environment variable).
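The threshold amounts to a simple comparison, and 65536 (64 × 1024) is the usual power-of-two setting that clears it. The sketch below only illustrates that comparison; context_ok and MIN_CONTEXT are hypothetical names, and how Hermes actually performs the check is not shown in this article.

```shell
# Illustrative check against the 64,000-token minimum stated above.
MIN_CONTEXT=64000

# Succeeds when the given context size meets the minimum.
context_ok() {
  [ "$1" -ge "$MIN_CONTEXT" ]
}

for size in 32768 65536; do
  if context_ok "$size"; then
    echo "$size tokens: accepted"
  else
    echo "$size tokens: rejected"
  fi
done
```

A 32K window is rejected and a 64K window is accepted, which is why the local-model settings above use 65536 rather than a rounder-looking 64000.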

Switch models in a single command

Switching providers or models takes a single command:

  • Via hermes model (interactive menu)
  • Via hermes config set model anthropic/claude-sonnet-4 (direct)
  • Via the /model command in a chat session for an interactive selection without leaving the conversation

No restart is required. The change takes effect at the next interaction.

Configuring auxiliary models

Hermes uses secondary models for specific tasks (vision, web extraction, context compression). By default, auxiliary tasks use your main model. You can configure them independently to optimize costs via the hermes model → "Configure auxiliary models" menu.

The interactive menu allows you to configure:

  • vision: model for image analysis
  • web_extract: model for web page summarization
  • session_search: model for searching in past sessions
  • title_generation: model for generating session titles
  • compression: model for context compression

For example, to use a cheaper model for compression, add an auxiliary section in ~/.hermes/config.yaml with a compression key containing the provider (e.g., "openrouter") and the model (e.g., "google/gemini-2.5-flash"). The pattern is the same for every auxiliary model: provider + model + optionally base_url.
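Written out, the compression example could look like the YAML fragment below. The sketch writes it to a scratch file with a shell heredoc so that nothing in ~/.hermes/ is touched; the nesting (auxiliary, then compression, then provider and model) is inferred from the description above and should be checked against your own config.yaml before merging.

```shell
# Write the fragment to a scratch file, not to the live config.
fragment="$(mktemp)"

cat > "$fragment" <<'EOF'
auxiliary:
  compression:
    provider: openrouter
    model: google/gemini-2.5-flash
EOF

cat "$fragment"
```

The other auxiliary tasks (vision, web_extract, and so on) would presumably sit next to compression under the same auxiliary key, each with its own provider and model.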

Multi-provider routing and fallback

Automatic fallback

Configure a fallback model that takes over automatically if the main provider encounters an error. In ~/.hermes/config.yaml, add a fallback_model section with the fallback provider and model. Hermes first tries the main model, then switches to the fallback after transient errors (rate limits, timeouts, 5xx).
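The behaviour described above can be pictured with a small shell sketch. It is a conceptual illustration, not Hermes' code: call_primary and call_fallback are stubs standing in for real provider calls, and exit status 75 is an arbitrary convention chosen here to mean "transient error".

```shell
# Conceptual sketch: try the main model, fall back on transient errors.
TRANSIENT=75   # hypothetical status meaning rate limit, timeout or 5xx

call_primary() {
  echo "429 rate limit" >&2   # simulate a rate-limited main provider
  return "$TRANSIENT"
}

call_fallback() {
  echo "reply from fallback model"
}

ask() {
  if reply="$(call_primary 2>/dev/null)"; then
    printf '%s\n' "$reply"    # main model answered normally
  else
    status=$?
    if [ "$status" -eq "$TRANSIENT" ]; then
      call_fallback           # transient failure: switch providers
    else
      return "$status"        # any other failure is surfaced unchanged
    fi
  fi
}

ask
```

Because the stubbed main provider always reports a transient error, the call prints the fallback reply. Treating only transient failures this way keeps real configuration mistakes, such as a wrong key, visible instead of hiding them behind the fallback.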

Provider routing

For advanced users who want fine-grained control over routing, the provider_routing section in config.yaml offers several options: sort ("price", "throughput" or "latency"), only to limit to allowed providers, ignore to exclude some, order to define the fallback order, require_parameters to only use providers supporting all params, and data_collection (e.g., "deny") to exclude providers that store data.
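As an illustration, a routing section using some of these options might look like the fragment below, written to a scratch file with a shell heredoc. The option names come from the paragraph above; the value shapes (lists for order and ignore, a boolean for require_parameters) and the provider names are assumptions to be checked against your Hermes version.

```shell
# Write the fragment to a scratch file, not to the live config.
routing_fragment="$(mktemp)"

cat > "$routing_fragment" <<'EOF'
provider_routing:
  sort: price               # "price", "throughput" or "latency"
  order:                    # fallback order (placeholder names)
    - anthropic
    - google
  ignore:                   # providers to exclude (placeholder name)
    - example-provider
  require_parameters: true  # only providers supporting all params
  data_collection: deny     # exclude providers that store data
EOF

cat "$routing_fragment"
```

The only option is left out here because combining an allow-list with an exclusion list in the same example would obscure what each one does.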

Credential rotation

If you have multiple API keys for the same provider, configure the rotation strategy via the credential_pool_strategies section. The available options are fill_first (default), round_robin (fair distribution), least_used (always the least used key) and random.
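To show what round_robin means in practice, here is a minimal shell sketch that cycles through three placeholder keys. It is a conceptual illustration only; the key names and the pick_round_robin function are hypothetical, and Hermes' own rotation logic is not shown in this article.

```shell
# Conceptual sketch of round_robin rotation over placeholder keys.
keys="key-a key-b key-c"
next_index=0

# Stores the chosen key in $picked instead of printing it, so the
# counter survives between calls (a $(...) subshell would lose it).
pick_round_robin() {
  set -- $keys          # intentional word splitting into $1, $2, ...
  count=$#
  shift "$next_index"   # skip the keys already used in this cycle
  picked="$1"
  next_index=$(( (next_index + 1) % count ))
}

for turn in 1 2 3 4; do
  pick_round_robin
  echo "$picked"
done
```

Four picks yield key-a, key-b, key-c, then key-a again. By contrast, fill_first would keep returning key-a until it stops being usable.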

Testing that a provider works

Before diving into complex tasks, check that your configuration is functional.

Step 1: Diagnostic

hermes doctor

This command detects configuration issues, missing keys, and incompatibilities.

Step 2: Test conversation

Launch hermes, then send a simple, verifiable prompt, such as asking it to list the files in the current directory and identify the main language of the project. Signs of success: the startup banner displays your model and provider, Hermes responds without error, tools work (terminal, file reading, etc.), and the conversation continues normally over several turns.

Step 3: Session resumption

hermes --continue   # or hermes -c

Verify that the previous session is properly resumed.

Quick troubleshooting

If something isn't working, follow this sequence: run hermes doctor to diagnose, then hermes model to double-check the configuration, hermes setup to rerun the full setup, and finally hermes --continue to test resumption.

Deploying on a VPS

For a production deployment (Telegram gateway, Discord, etc.), a dedicated VPS is recommended. A VPS with 2 vCPUs and 4 GB of RAM is enough to run Hermes with the gateway.

Hostinger offers high-performance VPSes starting from a few euros per month, ideal for hosting Hermes Agent in always-on mode. Set the terminal backend to Docker for isolation:

hermes config set terminal.backend docker

Essential environment variables

Here is a summary of the most common variables, to be placed in ~/.hermes/.env or set via hermes config set:

  • OPENROUTER_API_KEY: OpenRouter key
  • ANTHROPIC_API_KEY: Anthropic key
  • DEEPSEEK_API_KEY: DeepSeek key
  • OPENAI_API_KEY: key for custom OpenAI-compatible endpoint
  • OPENAI_BASE_URL: base URL for custom endpoint
  • GOOGLE_API_KEY: Google AI Studio / Gemini key
  • COPILOT_GITHUB_TOKEN: GitHub token for Copilot
  • HF_TOKEN: Hugging Face token

Common errors

  • "Context window too small": your model does not reach the required 64K tokens. Check the configured context size, especially for local models with Ollama or llama.cpp.
  • Unrecognized API key: make sure you placed the key in ~/.hermes/.env via hermes config set, and not in config.yaml. Secrets must always go in the .env file.
  • Timeout with a custom endpoint: check that the base URL ends with /v1 and that the model is actually accessible (test with a simple curl).
  • Fallback not triggered: check that the fallback_model section is properly formatted in config.yaml with the provider and model keys.
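For the custom-endpoint case, the /v1 check is easy to script before reaching for curl. The sketch below is a hypothetical helper, not part of Hermes; the commented curl line assumes the server exposes the model listing that OpenAI-compatible servers commonly provide under /v1/models.

```shell
# Succeeds when the base URL ends with /v1 (a trailing slash is tolerated).
has_v1_suffix() {
  case "${1%/}" in
    */v1) return 0 ;;
    *)    return 1 ;;
  esac
}

for base_url in "http://localhost:11434/v1" "http://localhost:8000"; do
  if has_v1_suffix "$base_url"; then
    echo "$base_url: looks fine"
    # Next step against a running server (not executed here):
    #   curl -s "$base_url/models"
  else
    echo "$base_url: missing /v1 suffix"
  fi
done
```

The first URL passes and the second is flagged, which is the typical mistake when copying a vLLM or Ollama address without its API path.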

FAQ

Can I use multiple providers at the same time?
Yes, via the provider routing and fallback system. You can define a primary provider and one or more fallbacks, or configure automatic routing based on price, latency, or throughput.

Do I need to restart Hermes after a model change?
No. A model change made via hermes config set, hermes model, or /model takes effect at the next interaction.

How do I know which auxiliary model is being used?
Run hermes config to see the complete configuration, including auxiliary models. If none are explicitly configured, the main model is used for all tasks.

Are API keys stored in plain text?
Yes, in ~/.hermes/.env. Protect access to this file with restrictive permissions (chmod 600 ~/.hermes/.env).
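The effect of chmod 600 can be verified on a throwaway file before applying it to the real one. The sketch below uses a temporary file as a stand-in for ~/.hermes/.env; the variable name env_file is chosen for the example.

```shell
# Demonstration on a temporary file; the real target is ~/.hermes/.env.
env_file="$(mktemp)"
chmod 644 "$env_file"            # simulate a file readable by other users
chmod 600 "$env_file"            # owner read/write only
ls -l "$env_file" | cut -c1-10   # show just the permission string
```

The permission string printed is -rw-------, meaning only the file's owner can read or modify it.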

Key takeaways

  • Use hermes model for interactive configuration, hermes config set for quick adjustments
  • Always separate secrets (.env) from parameters (config.yaml)
  • Ensure a minimum of 64K context tokens
  • Always test with hermes doctor and a simple conversation after each change
  • Configure a fallback for service continuity
  • Optimize costs by configuring cheaper auxiliary models for secondary tasks

Conclusion

Configuring models and providers in Hermes Agent is designed to be accessible via hermes model while remaining fully configurable manually for advanced users. The clear separation between .env (secrets) and config.yaml (settings), the support for over 25 providers in 2025, and the fallback and routing mechanisms offer remarkable flexibility.

To go further, discover the 68 tools available in Hermes Agent and learn how to structure the context of your sessions with CLAUDE.md and AGENTS.md files.