
Hermes Agent: All 68 Built-in Tools — Complete Guide

Hermes Agent 🟡 Intermediate ⏱️ 12 min read 📅 2026-05-05

After [installing Hermes Agent](/article/hermes-agent-presentation-installation) and [configuring your models and providers](/article/hermes-agent-configurer-modeles-providers), it is time to explore what makes this agent powerful: its tools. Hermes Agent ships with 68 built-in tools organized into logical toolsets, covering use cases that range from web research and automation to smart home control.

This guide walks through each tool category, explains how to enable or disable them, and provides concrete usage examples. Whether you are a developer, DevOps engineer, or simply curious, you will find everything you need to make the most of the Hermes tool ecosystem.

Toolset architecture: how it works

Hermes tools are not enabled individually in an ad-hoc manner. They are grouped into toolsets — named bundles that control what the agent can do. This is the primary mechanism for configuring tool availability per platform, per user profile, or per use case.

Each messaging platform has its own toolset preset. For example, hermes-telegram enables web, terminal, file, vision, todo, memory, and messaging by default, while hermes-cli enables a broader set including code_execution and delegation.

A toolset can consist of a single tool (like tts for text-to-speech) or multiple tools working together (like browser, which groups 10 automation tools). There are also composite toolsets, like debugging, that aggregate other toolsets (file + terminal + web).
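
The bundling described above can be pictured with a small Python model. This is only a sketch: the dictionaries and the `resolve_toolset` helper are hypothetical names invented for illustration, not Hermes's internal data structures. The tool names and the composition of debugging come from this guide.

```python
# Hypothetical model of toolsets; structure and helper names are illustrative only.
TOOLSETS = {
    "file": ["read_file", "write_file", "search_files", "patch"],
    "terminal": ["terminal", "process"],
    "web": ["web_search", "web_extract"],
    "tts": ["text_to_speech"],
}

# Composite toolsets reference other toolsets instead of listing tools directly.
COMPOSITES = {
    "debugging": ["file", "terminal", "web"],
}


def resolve_toolset(name: str) -> list[str]:
    """Expand a toolset name into a flat, de-duplicated list of tool names."""
    if name in COMPOSITES:
        tools: list[str] = []
        for member in COMPOSITES[name]:
            for tool in resolve_toolset(member):
                if tool not in tools:
                    tools.append(tool)
        return tools
    return list(TOOLSETS[name])


debugging_tools = resolve_toolset("debugging")
print(debugging_tools)
```

Resolving a composite to a flat list makes it easy to answer the practical question "which tools can the agent call on this platform?" regardless of how the bundles are nested.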

System tools: terminal and files

The operational core of Hermes relies on six fundamental tools: two for executing commands and managing processes, and four for interacting with the filesystem.

terminal and process

The terminal tool executes shell commands on the Linux environment. It supports seven execution backends:

  • local — direct execution on the host machine
  • docker — persistent isolated container (one shared container per session)
  • ssh — remote execution on a dedicated server
  • singularity — HPC containers for cluster computing
  • modal — serverless cloud execution
  • daytona — persistent cloud workspace
  • vercel_sandbox — cloud microVM with snapshot-based persistence

Long-running commands can be launched in the background with background=true. The process tool then manages these processes: list, poll, wait, retrieve logs, or kill them. PTY mode (pty=true) enables interactive CLI tools like Codex or Claude Code.

Example: launch a test suite in the background and get notified when it finishes

terminal(command="pytest -v tests/", background=true, notify_on_complete=true)
# Process runs in the background, Hermes notifies automatically when done
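
To make the launch, poll, wait, and log-retrieval cycle more concrete, here is a rough analogy using Python's standard subprocess module. It is not how Hermes implements the process tool; it only illustrates the same lifecycle, and the child command is a placeholder standing in for a real test suite.

```python
import subprocess
import sys

# Start a child process without blocking, as a stand-in for background=true.
proc = subprocess.Popen(
    [sys.executable, "-c", "print('tests finished')"],
    stdout=subprocess.PIPE,
    text=True,
)

# poll() returns None while the process is still running (timing-dependent here).
still_running = proc.poll() is None

# communicate() waits for the process to exit and collects its captured output.
output, _ = proc.communicate(timeout=30)
exit_code = proc.returncode

print(exit_code, output.strip())
```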

read_file, write_file, search_files, patch

These four tools replace classic shell commands with safer, smarter equivalents:

  • read_file — reads a file with line numbers and pagination. Cannot read images or binary files.
  • write_file — writes complete file content (full overwrite). Creates parent directories automatically.
  • search_files — content search by regex or file search by glob pattern, powered by ripgrep
  • patch — targeted find-and-replace with 9 fuzzy matching strategies, returns unified diff with automatic syntax checks

Example: fix a variable in a config file

patch(path="config.yaml", old_string="port: 3000", new_string="port: 8080")
# Returns a diff and automatically checks YAML syntax
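
The idea of a targeted replacement that reports a unified diff can be sketched with Python's difflib. This is a simplified illustration, assuming exact string matching only: it does not reproduce the fuzzy matching strategies or the syntax checks mentioned above, and `patch_text` is a hypothetical helper name.

```python
import difflib


def patch_text(text: str, old_string: str, new_string: str,
               path: str = "config.yaml") -> tuple[str, str]:
    """Replace one exact occurrence of old_string; return (new_text, unified_diff)."""
    count = text.count(old_string)
    if count != 1:
        # Refuse ambiguous or missing matches rather than guess.
        raise ValueError(f"expected exactly one match, found {count}")
    new_text = text.replace(old_string, new_string)
    diff = "".join(
        difflib.unified_diff(
            text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )
    return new_text, diff


original = "host: localhost\nport: 3000\n"
updated, diff = patch_text(original, "port: 3000", "port: 8080")
print(diff)
```

Requiring exactly one match is a deliberate choice in this sketch: a replacement that silently hits several places is harder to review than one that fails loudly.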

Web tools: search and extraction

The web toolset includes two complementary tools for online information access.

web_search performs web searches returning up to 100 results with titles, URLs, and descriptions. It supports advanced search operators (site:, filetype:, intitle:, exact phrases). Multiple backends are supported: Exa, Parallel, Firecrawl, and Tavily.

web_extract retrieves URL content and converts it to markdown. It also works with PDFs — just pass the PDF URL directly. Pages under 5000 characters return full markdown; longer pages are LLM-summarized.

Example: search and extract API documentation

web_search(query="FastAPI middleware authentication site:fastapi.tiangolo.com")
web_extract(url="https://fastapi.tiangolo.com/tutorial/middleware/")
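
The length rule for web_extract can be expressed in a few lines of Python. This is a sketch under assumptions: `extract_result` and `fake_summarize` are hypothetical names, the summarizer is a placeholder for an LLM call, and the behavior at exactly 5000 characters is a guess, since the text above only says "under 5000".

```python
from typing import Callable

FULL_TEXT_LIMIT = 5000  # character threshold stated in this guide


def extract_result(markdown: str, summarize: Callable[[str], str]) -> str:
    """Return the page as-is when it is short, otherwise a summary of it."""
    # Assumption: a page of exactly FULL_TEXT_LIMIT characters is summarized.
    if len(markdown) < FULL_TEXT_LIMIT:
        return markdown
    return summarize(markdown)


def fake_summarize(text: str) -> str:
    # Placeholder for an LLM call: keep the first 200 characters.
    return text[:200] + "..."


short_page = "# Title\n\nShort body."
long_page = "x" * 6000

print(len(extract_result(short_page, fake_summarize)))
print(len(extract_result(long_page, fake_summarize)))
```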

Browser tools: advanced web automation

The browser toolset is among the richest in tool count (tied with rl), with 10 dedicated tools for interactive web automation:

  • browser_navigate — opens a URL and initializes the session
  • browser_snapshot — captures the page accessibility tree with reference IDs (@e1, @e2...) for interaction
  • browser_click — clicks an element by its reference ID
  • browser_type — types text into a field
  • browser_scroll — scrolls the page
  • browser_press — simulates a keyboard key
  • browser_back — navigates to the previous page
  • browser_get_images — lists all images on the page
  • browser_console — retrieves the JavaScript console
  • browser_vision — takes a screenshot and analyzes it with a vision model

A separate browser-cdp toolset (2 additional tools) activates automatically when a Chrome DevTools Protocol endpoint is detected, allowing raw CDP commands and native JavaScript dialog responses.

Example: navigate a site, fill a form, and visually verify the result

browser_navigate(url="https://example.com/login")
browser_snapshot()
browser_type(ref="@e5", text="my_username")
browser_type(ref="@e7", text="my_password")
browser_click(ref="@e9")
browser_vision(prompt="Verify that the dashboard page loaded correctly")
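
The reference IDs used in this example come from the snapshot step. The sketch below shows one plausible way such IDs could be assigned, purely as an illustration: the `Element` class and `assign_refs` function are hypothetical, and the numbering on a real page will differ (the example above uses @e5, @e7, and @e9).

```python
from dataclasses import dataclass


@dataclass
class Element:
    role: str  # accessibility role, e.g. "textbox" or "button"
    name: str  # accessible name shown to the user


def assign_refs(elements: list[Element]) -> dict[str, Element]:
    """Give each element a sequential reference ID: @e1, @e2, ..."""
    return {f"@e{i}": el for i, el in enumerate(elements, start=1)}


page = [
    Element("textbox", "Username"),
    Element("textbox", "Password"),
    Element("button", "Sign in"),
]
refs = assign_refs(page)

for ref, el in refs.items():
    print(ref, el.role, el.name)
```

Because the IDs are tied to one snapshot, it is reasonable to take a fresh snapshot after any navigation or major page change before clicking or typing.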

AI tools: vision, image generation, and TTS

Hermes Agent is not just a text agent — it has powerful multimodal tools:

vision_analyze

Image analysis via vision models. The agent can identify elements, read text (OCR), describe interfaces, or diagnose errors from screenshots.

image_generate

Text-to-image generation via FAL.ai (with optional OpenAI and xAI support). The default model is FLUX 2 Klein 9B, capable of generating an image in under one second.

text_to_speech

Text-to-speech conversion with native delivery per platform: voice bubble on Telegram, audio attachment on Discord and WhatsApp, file in ~/voice-memos/ in CLI. Voice and provider are configurable.

Example: generate a concept image and send it with a voice description

image_generate(prompt="A robot assistant in a modern office, flat illustration style")
text_to_speech(text="Here is the visual concept I generated for your project.")

Communication tools: messaging and Discord

send_message

The send_message tool allows Hermes to send messages to any connected platform (Telegram, Discord, Slack, WhatsApp, etc.) from within a session. Before sending, the agent must list the available targets with action="list".

discord and discord_admin

Two Discord-specific toolsets, available only on the hermes-discord platform:

  • discord — member search, message sending, reactions, channel reading and participation
  • discord_admin — moderation: role management, channels, timeouts, kicks, and bans (requires appropriate Discord permissions)

Productivity tools: todo, cron, skills, and memory

todo

Session task list management. Ideal for complex multi-step workflows. The agent can create, update, merge, and check off tasks automatically.

cronjob

Scheduled task manager with actions: create, list, update, pause, resume, run, and remove. Jobs can be attached to skills for sophisticated automations. Cron executions launch in fresh sessions with no current chat context, so the job's prompt must be self-contained.

Example: automate a weekly report

cronjob(action="create", name="weekly-report", schedule="0 9 * * 1",
        prompt="Generate a summary of last week's GitHub activity")
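
The schedule string in this example uses the standard five-field cron layout (minute, hour, day of month, month, day of week), so "0 9 * * 1" means every Monday at 09:00. The following sketch checks a timestamp against such an expression. It is a deliberately minimal matcher, not Hermes's scheduler: it supports only "*" and plain integers, and it ignores cron's special rule for combining day-of-month and day-of-week restrictions.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """Match one cron field; only '*' and plain integers are supported here."""
    return field == "*" or int(field) == value


def cron_matches(expr: str, when: datetime) -> bool:
    minute, hour, day, month, weekday = expr.split()
    # Cron counts Sunday as 0; isoweekday() counts Monday=1 ... Sunday=7.
    cron_weekday = when.isoweekday() % 7
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day, when.day)
        and field_matches(month, when.month)
        and field_matches(weekday, cron_weekday)
    )


schedule = "0 9 * * 1"
monday_9am = datetime(2026, 5, 4, 9, 0)   # a Monday
tuesday_9am = datetime(2026, 5, 5, 9, 0)  # a Tuesday

print(cron_matches(schedule, monday_9am), cron_matches(schedule, tuesday_9am))
```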

skills

The skills toolset includes three tools for managing the agent's procedural capabilities:

  • skills_list — lists available skills (name + description)
  • skill_view — loads full skill content and linked files (templates, scripts)
  • skill_manage — skill creation, update, and deletion

Skills are Hermes's procedural memory — reusable approaches for recurring task types, compatible with the agentskills.io standard and shareable via the community Skills Hub.

memory

The memory tool manages persistent cross-session memory. Important information is saved and injected into the system prompt at the start of each new session. This is how the agent remembers your preferences, environment, and context between conversations.

session_search

Search across all past session history. When the user says "we did this before" or "last time", this tool helps the agent retrieve context and summarize what happened.

Development tools: code execution and delegation

execute_code

The execute_code tool runs Python scripts that can programmatically call Hermes tools. It is the right choice when a task needs three or more tool calls with processing logic between them, or when large tool outputs should be filtered or reduced before they enter the agent's context.

delegate_task

The delegate_task tool spawns isolated subagents for parallel work. Each subagent gets its own conversation, terminal session, and toolset. Only the final summary is returned, so intermediate results never pollute the parent agent's context window. Ideal for dividing large projects into independent subtasks.
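
The "only the summary comes back" pattern can be illustrated with a thread pool. This is an analogy rather than the actual delegation mechanism: `run_subagent` is a hypothetical stand-in that fakes its intermediate work, and the task names are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(task: str) -> str:
    """Stand-in for a subagent: works in its own scope, returns only a summary."""
    scratch = [f"step {i} of {task}" for i in range(1, 4)]  # intermediate work
    return f"{task}: done in {len(scratch)} steps"


tasks = ["audit dependencies", "write unit tests", "update README"]

# Run the subtasks in parallel; map() returns results in input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    summaries = list(pool.map(run_subagent, tasks))

for line in summaries:
    print(line)
```

The intermediate `scratch` list never leaves the worker function, which mirrors why delegation keeps the parent context small.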

mixture_of_agents

The MOA tool routes a difficult problem through multiple collaborative LLMs. It makes 5 API calls (4 reference models + 1 aggregator) — reserve this for genuinely complex problems in mathematics, algorithms, or advanced reasoning.
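
The call count follows directly from the structure: each reference model answers once, then one aggregator call merges the drafts. The sketch below reproduces that arithmetic with stub models. All names here (`mixture_of_agents`, `make_stub`, the prompt format) are hypothetical and chosen for illustration; the stubs stand in for real LLM calls.

```python
from typing import Callable

Model = Callable[[str], str]


def mixture_of_agents(prompt: str, references: list[Model],
                      aggregator: Model) -> tuple[str, int]:
    """Query every reference model, then ask the aggregator to merge the drafts."""
    calls = 0
    drafts = []
    for model in references:
        drafts.append(model(prompt))
        calls += 1
    merged_prompt = prompt + "\n\nCandidate answers:\n" + "\n".join(drafts)
    answer = aggregator(merged_prompt)
    calls += 1
    return answer, calls


def make_stub(name: str) -> Model:
    # Placeholder for a real LLM call.
    return lambda prompt: f"[{name}] answer"


reference_models = [make_stub(f"ref{i}") for i in range(1, 5)]
aggregator = lambda prompt: f"aggregated {prompt.count('[ref')} drafts"

answer, api_calls = mixture_of_agents("Prove the lemma.", reference_models, aggregator)
print(answer, api_calls)
```

With four reference models the function reports five calls, which is why this tool costs noticeably more than a single completion.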

Integration tools: Home Assistant, Spotify, RL, MCP

Home Assistant (4 tools)

The homeassistant toolset provides complete smart home control:

  • ha_list_entities — lists entities (lights, switches, sensors...) with domain or area filtering
  • ha_get_state — detailed entity state (brightness, temperature, etc.)
  • ha_list_services — lists available actions for each device type
  • ha_call_service — executes an action on a device

Example: turn on living room lights and set the thermostat

ha_call_service(domain="light", service="turn_on", entity_id="light.living_room")
ha_call_service(domain="climate", service="set_temperature",
               entity_id="climate.living_room", kwargs='{"temperature": 22}')

Spotify (7 tools)

Native Spotify control via the bundled plugin: playback, queue, search, playlists, albums, and library. Requires initial OAuth authorization via hermes spotify setup.

RL Training (10 tools)

Complete RL training management suite via Atropos: environment selection, configuration, training launch, WandB monitoring, stopping, and inference testing. Requires TINKER_API_KEY and WANDB_API_KEY.

MCP (Model Context Protocol)

Beyond the 68 built-in tools, Hermes can dynamically load tools from MCP servers. MCP tools appear with a server-name prefix (e.g., github_create_issue for the github server). Connecting any compatible MCP server therefore extends the agent's capabilities beyond the built-in set.
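
The server-name prefix keeps tools from different servers apart. A minimal sketch of that naming scheme might look like the following; the registry layout and the `register_mcp_tools` helper are assumptions made for illustration, and list_issues is an invented tool name.

```python
def register_mcp_tools(registry: dict[str, str], server: str,
                       tool_names: list[str]) -> None:
    """Add a server's tools to the registry under '<server>_<tool>' names."""
    for tool in tool_names:
        prefixed = f"{server}_{tool}"
        if prefixed in registry:
            raise ValueError(f"tool name collision: {prefixed}")
        registry[prefixed] = server


# Maps each exposed tool name to the source that provides it.
registry: dict[str, str] = {"web_search": "builtin"}
register_mcp_tools(registry, "github", ["create_issue", "list_issues"])

print(sorted(registry))
```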

Other platforms: Feishu and Yuanbao

Specific toolsets exist for regional platforms:

  • feishu_doc (1 tool) — Feishu/Lark document reading
  • feishu_drive (4 tools) — Feishu file comment operations
  • yuanbao (5 tools) — DMs, groups, and stickers on Tencent Yuanbao platform

Configuring tools: the hermes tools command

Tool management is primarily done via the CLI:

# Browse available tools, see their status, and configure them per platform
# (launches an interactive menu)
hermes tools

# Use specific toolsets in CLI
hermes chat --toolsets "web,terminal,file"

# Enable a toolset in config
hermes config set toolsets.enabled '["web","terminal","file","browser"]'

The hermes tools command without arguments launches an interactive menu to browse toolsets, see which tools they contain, and enable or disable them per platform.

The safe toolset: for restricted environments

The safe toolset is designed specifically for environments where security is paramount. It includes only tools that fetch, analyze, or generate content without modifying the host system:

  • web_search — web search
  • web_extract — content extraction
  • vision_analyze — image analysis
  • image_generate — image generation

No terminal access, no file write access, no code execution. Perfect for public messaging platforms or shared instances.

Security best practices

Choosing which tools to allow is not just a feature question — it is a security question. Here are the recommendations by platform:

Local CLI (hermes-cli) — Full profile. Enable all toolsets: terminal, file, code_execution, delegation, rl, browser. The environment is user-controlled.

Telegram / Discord / Slack — Balanced profile. Enable web, file (read-only if possible), vision, todo, memory, cronjob, messaging. Terminal and code_execution should only be enabled if the instance is private and secured. Use the Docker backend to isolate executions.

Public or shared instances — Use the safe toolset as a base. Add only strictly necessary tools. The SSH backend is recommended to prevent the agent from modifying its own code.

Production environments — Disable rl (training can consume significant resources), limit code_execution to the Docker backend with constrained resources (CPU, memory), and enable process quotas.

The Docker backend offers a good balance between flexibility and security: a persistent container with complete isolation, read-only root filesystem, and all Linux capabilities dropped.

Toolset summary

  • browser (10 tools) — interactive web automation
  • browser-cdp (2 tools) — Chrome DevTools Protocol commands
  • file (4 tools) — file reading, writing, searching, and patching
  • terminal (2 tools) — command execution and process management
  • web (2 tools) — web search and extraction
  • vision (1 tool) — image analysis
  • image_gen (1 tool) — image generation
  • tts (1 tool) — text-to-speech
  • todo (1 tool) — task management
  • cronjob (1 tool) — scheduled tasks
  • memory (1 tool) — persistent memory
  • session_search (1 tool) — history search
  • skills (3 tools) — skill management
  • messaging (1 tool) — cross-platform messaging
  • clarify (1 tool) — user clarification
  • code_execution (1 tool) — Python code execution
  • delegation (1 tool) — parallel subagents
  • moa (1 tool) — multi-model consensus
  • homeassistant (4 tools) — smart home control
  • discord (1 tool) — Discord actions
  • discord_admin (1 tool) — Discord moderation
  • spotify (7 tools) — Spotify control
  • rl (10 tools) — RL training
  • feishu_doc (1 tool) — Feishu documents
  • feishu_drive (4 tools) — Feishu comments
  • yuanbao (5 tools) — Yuanbao platform
  • safe (composite) — secure read-only profile
  • debugging (composite) — diagnostic bundle
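
As a quick sanity check, the per-toolset counts in this summary can be added up. The dictionary below simply transcribes the list above (composite toolsets are excluded because they add no tools of their own); the variable names are arbitrary.

```python
# Tool counts per toolset, transcribed from the summary list above.
TOOL_COUNTS = {
    "browser": 10, "browser-cdp": 2, "file": 4, "terminal": 2, "web": 2,
    "vision": 1, "image_gen": 1, "tts": 1, "todo": 1, "cronjob": 1,
    "memory": 1, "session_search": 1, "skills": 3, "messaging": 1,
    "clarify": 1, "code_execution": 1, "delegation": 1, "moa": 1,
    "homeassistant": 4, "discord": 1, "discord_admin": 1, "spotify": 7,
    "rl": 10, "feishu_doc": 1, "feishu_drive": 4, "yuanbao": 5,
}

total = sum(TOOL_COUNTS.values())
print(f"{len(TOOL_COUNTS)} toolsets, {total} tools")
```

The 26 non-composite toolsets add up to the 68 built-in tools quoted throughout this guide.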

Conclusion

With its 68 built-in tools organized into configurable toolsets, Hermes Agent covers an impressive range of capabilities. The toolset system provides fine-grained configuration: you choose exactly what the agent can do, on which platform, and with what access level.

The strength of this architecture lies in its modularity. You do not have to enable everything. A minimal setup with just the safe toolset (which already includes the web tools) is sufficient for a research assistant. An advanced setup with terminal + code_execution + delegation transforms Hermes into a true co-developer. And MCP integration extends these capabilities further by connecting external tool servers.

In the next article in this series, we will explore Hermes's memory system — how the agent learns from your interactions and remembers your context across sessions.