After [installing Hermes Agent](/article/hermes-agent-presentation-installation) and [configuring your models and providers](/article/hermes-agent-configurer-modeles-providers), it is time to explore what makes this agent powerful: its tools. Hermes Agent ships with 68 built-in tools organized into logical toolsets, covering virtually every use case — from web research to automation, including smart home control.
This guide walks through each tool category, explains how to enable or disable them, and provides concrete usage examples. Whether you are a developer, DevOps engineer, or simply curious, you will find everything you need to make the most of the Hermes tool ecosystem.
Toolset architecture: how it works
Hermes tools are not enabled individually in an ad-hoc manner. They are grouped into toolsets — named bundles that control what the agent can do. This is the primary mechanism for configuring tool availability per platform, per user profile, or per use case.
Each platform has its own toolset preset. For example, hermes-telegram enables web, terminal, file, vision, todo, memory, and messaging by default, while hermes-cli enables a broader set including code_execution and delegation.
A toolset can consist of a single tool (like tts for text-to-speech) or multiple tools working together (like browser, which groups 10 automation tools). There are also composite toolsets, like debugging, that aggregate other toolsets (file + terminal + web).
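To make the grouping concrete, here is a minimal sketch of how toolset names could be expanded into a flat list of tools. The tool and toolset names come from this article; the TOOLSETS and COMPOSITES dictionaries and the resolve function are assumptions made for illustration, not Hermes's actual internals.

```python
# Hypothetical registry: names are taken from this article, the layout is assumed.
TOOLSETS = {
    "file": ["read_file", "write_file", "search_files", "patch"],
    "terminal": ["terminal", "process"],
    "web": ["web_search", "web_extract"],
    "tts": ["text_to_speech"],
}
COMPOSITES = {"debugging": ["file", "terminal", "web"]}


def resolve(names):
    """Expand toolset names (including composites) into a sorted list of tools."""
    tools = set()
    for name in names:
        if name in COMPOSITES:
            tools.update(resolve(COMPOSITES[name]))
        elif name in TOOLSETS:
            tools.update(TOOLSETS[name])
        else:
            raise KeyError(f"unknown toolset: {name}")
    return sorted(tools)


print(resolve(["debugging"]))
```

A composite is just a name that points at other toolsets, so enabling debugging here yields the same tools as enabling file, terminal, and web one by one.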
System tools: terminal and files
The operational core of Hermes relies on six fundamental tools, split across the terminal and file toolsets, for interacting with the filesystem and executing commands.
terminal and process
The terminal tool executes shell commands in a Linux environment. It supports seven execution backends:
- local — direct execution on the host machine
- docker — persistent isolated container (one shared container per session)
- ssh — remote execution on a dedicated server
- singularity — HPC containers for cluster computing
- modal — serverless cloud execution
- daytona — persistent cloud workspace
- vercel_sandbox — cloud microVM with snapshot-based persistence
Long-running commands can be launched in the background with background=true. The process tool then manages these processes: list, poll, wait, retrieve logs, or kill them. PTY mode (pty=true) enables interactive CLI tools like Codex or Claude Code.
Example: launch a test suite in the background and get notified when it finishes
terminal(command="pytest -v tests/", background=true, notify_on_complete=true)
# Process runs in the background, Hermes notifies automatically when done
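If the background, poll, and wait vocabulary is new to you, the following sketch shows the same lifecycle with Python's standard subprocess module. It is only an analogy under stated assumptions: start_background and wait_and_collect are hypothetical helpers, and nothing here reflects how Hermes implements its process tool.

```python
import subprocess
import sys


def start_background(argv):
    """Start a command without blocking and return its process handle."""
    return subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )


def wait_and_collect(proc, timeout=30):
    """Block until the process exits, then return (exit_code, combined_output)."""
    output, _ = proc.communicate(timeout=timeout)
    return proc.returncode, output


# A trivial stand-in for a long-running test suite.
proc = start_background([sys.executable, "-c", "print('tests passed')"])
still_running = proc.poll() is None  # poll() returns None while the process runs
code, logs = wait_and_collect(proc)
print(code, logs.strip())
```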
read_file, write_file, search_files, patch
These four tools replace classic shell commands with safer, smarter equivalents:
- read_file — reads a file with line numbers and pagination. Cannot read images or binary files.
- write_file — writes complete file content (full overwrite). Creates parent directories automatically.
- search_files — content search by regex or file search by glob pattern, powered by ripgrep
- patch — targeted find-and-replace with 9 fuzzy matching strategies, returns unified diff with automatic syntax checks
Example: fix a variable in a config file
patch(path="config.yaml", old_string="port: 3000", new_string="port: 8080")
# Returns a diff and automatically checks YAML syntax
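To see what "targeted replace plus unified diff" means in practice, here is a small sketch built on Python's difflib. It handles exact matches only; the fuzzy matching strategies and syntax checks mentioned above are not reproduced, and patch_text is a hypothetical helper rather than the tool's implementation.

```python
import difflib


def patch_text(text, old_string, new_string, path="config.yaml"):
    """Replace one exact occurrence and return (new_text, unified_diff)."""
    count = text.count(old_string)
    if count != 1:
        # Refuse ambiguous edits instead of guessing which match was meant.
        raise ValueError(f"expected exactly one match, found {count}")
    new_text = text.replace(old_string, new_string)
    diff = difflib.unified_diff(
        text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return new_text, "".join(diff)


original = "host: localhost\nport: 3000\n"
updated, diff = patch_text(original, "port: 3000", "port: 8080")
print(diff)
```

Requiring exactly one match is a design choice in this sketch: it keeps a short old_string from silently rewriting several places in the file.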
Web tools: search and extraction
The web toolset includes two complementary tools for online information access.
web_search performs web searches returning up to 100 results with titles, URLs, and descriptions. It supports advanced search operators (site:, filetype:, intitle:, exact phrases). Multiple backends are supported: Exa, Parallel, Firecrawl, and Tavily.
web_extract retrieves URL content and converts it to markdown. It also works with PDFs — just pass the PDF URL directly. Pages under 5000 characters return full markdown; longer pages are LLM-summarized.
Example: search and extract API documentation
web_search(query="FastAPI middleware authentication site:fastapi.tiangolo.com")
web_extract(url="https://fastapi.tiangolo.com/tutorial/middleware/")
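The size rule described for web_extract can be pictured as a simple branch. The sketch below assumes the 5000-character threshold quoted above; extract_result and fake_summarize are hypothetical names, and the summarizer is a stand-in for whatever LLM call the real tool makes.

```python
FULL_TEXT_LIMIT = 5000  # threshold quoted in this article


def extract_result(markdown, summarize):
    """Return the page as-is when short, otherwise a summary from `summarize`."""
    if len(markdown) < FULL_TEXT_LIMIT:
        return {"mode": "full", "content": markdown}
    return {"mode": "summary", "content": summarize(markdown)}


def fake_summarize(text):
    # Stand-in for an LLM call: keep the first 200 characters.
    return text[:200]


short_page = extract_result("# Title\n\nShort page.", fake_summarize)
long_page = extract_result("x" * 6000, fake_summarize)
print(short_page["mode"], long_page["mode"], len(long_page["content"]))
```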
Browser tools: advanced web automation
The browser toolset is one of the largest (tied with rl at 10 tools), with every tool dedicated to interactive web automation:
- browser_navigate — opens a URL and initializes the session
- browser_snapshot — captures the page accessibility tree with reference IDs (@e1, @e2...) for interaction
- browser_click — clicks an element by its reference ID
- browser_type — types text into a field
- browser_scroll — scrolls the page
- browser_press — simulates a keyboard key
- browser_back — navigates to the previous page
- browser_get_images — lists all images on the page
- browser_console — retrieves the JavaScript console
- browser_vision — takes a screenshot and analyzes it with a vision model
A separate browser-cdp toolset (2 additional tools) activates automatically when a Chrome DevTools Protocol endpoint is detected, allowing raw CDP commands and native JavaScript dialog responses.
Example: navigate a site, fill a form, and visually verify the result
browser_navigate(url="https://example.com/login")
browser_snapshot()
browser_type(ref="@e5", text="my_username")
browser_type(ref="@e7", text="my_password")
browser_click(ref="@e9")
browser_vision(prompt="Verify that the dashboard page loaded correctly")
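The reference IDs used in the example come from the snapshot step. As a rough mental model, the sketch below walks a tiny accessibility tree and labels interactive nodes @e1, @e2, and so on. The tree shape, the INTERACTIVE_ROLES set, and assign_refs are assumptions for illustration; the real snapshot format may differ.

```python
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox"}  # assumed subset


def assign_refs(node, refs=None):
    """Walk the tree depth-first and label interactive nodes @e1, @e2, ..."""
    if refs is None:
        refs = {}
    if node.get("role") in INTERACTIVE_ROLES:
        refs[f"@e{len(refs) + 1}"] = node
    for child in node.get("children", []):
        assign_refs(child, refs)
    return refs


page = {"role": "document", "children": [
    {"role": "heading", "name": "Sign in"},
    {"role": "textbox", "name": "Username"},
    {"role": "textbox", "name": "Password"},
    {"role": "button", "name": "Log in"},
]}
element_refs = assign_refs(page)
for ref, node in element_refs.items():
    print(ref, node["role"], node["name"])
```

Because IDs are assigned per snapshot, it is safest to take a fresh snapshot after any navigation before reusing a reference.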
AI tools: vision, image generation, and TTS
Hermes Agent is not just a text agent — it has powerful multimodal tools:
vision_analyze
Image analysis via vision models. The agent can identify elements, read text (OCR), describe interfaces, or diagnose errors from screenshots.
image_generate
Text-to-image generation via FAL.ai (with optional OpenAI and xAI support). The default model is FLUX 2 Klein 9B, capable of generating an image in under one second.
text_to_speech
Text-to-speech conversion with native delivery per platform: voice bubble on Telegram, audio attachment on Discord and WhatsApp, file in ~/voice-memos/ in CLI. Voice and provider are configurable.
Example: generate a concept image and send it with a voice description
image_generate(prompt="A robot assistant in a modern office, flat illustration style")
text_to_speech(text="Here is the visual concept I generated for your project.")
Communication tools: messaging and Discord
send_message
The send_message tool allows Hermes to send messages to any connected platform (Telegram, Discord, Slack, WhatsApp, etc.) from within a session. Before sending, list the available targets with action="list".
discord and discord_admin
Two Discord-specific toolsets, available only on the hermes-discord platform:
- discord — member search, message sending, reactions, channel reading and participation
- discord_admin — moderation: role management, channels, timeouts, kicks, and bans (requires appropriate Discord permissions)
Productivity tools: todo, cron, skills, and memory
todo
Session task list management. Ideal for complex multi-step workflows. The agent can create, update, merge, and check off tasks automatically.
cronjob
Scheduled task manager with actions: create, list, update, pause, resume, run, and remove. Jobs can be attached to skills for sophisticated automations. Cron executions launch in fresh sessions with no current chat context.
Example: automate a weekly report
cronjob(action="create", name="weekly-report", schedule="0 9 * * 1",
prompt="Generate a summary of last week's GitHub activity")
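The schedule "0 9 * * 1" uses standard five-field cron syntax: minute 0, hour 9, any day of the month, any month, weekday 1 (Monday). The sketch below is a deliberately tiny matcher that supports only "*" and single integers, enough to check when this particular job would next fire. The function names are hypothetical and this is not the scheduler Hermes uses.

```python
from datetime import datetime, timedelta


def field_matches(field, value):
    """Support only '*' or a single integer, which is all this example needs."""
    return field == "*" or int(field) == value


def matches(expr, dt):
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (dt.weekday() + 1) % 7  # cron: 0 = Sunday, Python: 0 = Monday
    return (field_matches(minute, dt.minute) and field_matches(hour, dt.hour)
            and field_matches(dom, dt.day) and field_matches(month, dt.month)
            and field_matches(dow, cron_dow))


def next_run(expr, after, max_minutes=60 * 24 * 366):
    """Scan forward minute by minute for the first matching time."""
    candidate = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(max_minutes):
        if matches(expr, candidate):
            return candidate
        candidate += timedelta(minutes=1)
    raise ValueError("no match within the search window")


# Starting from Wednesday 3 January 2024, 10:00.
print(next_run("0 9 * * 1", datetime(2024, 1, 3, 10, 0)))
```

Ranges, lists, and step values (such as 1-5 or */15) are left out on purpose; a production scheduler needs a full cron parser.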
skills
The skills toolset includes three tools for managing the agent's procedural capabilities:
- skills_list — lists available skills (name + description)
- skill_view — loads full skill content and linked files (templates, scripts)
- skill_manage — skill creation, update, and deletion
Skills are Hermes's procedural memory — reusable approaches for recurring task types, compatible with the agentskills.io standard and shareable via the community Skills Hub.
memory
The memory tool manages persistent cross-session memory. Important information is saved and injected into the system prompt at the start of each new session. This is how the agent remembers your preferences, environment, and context between conversations.
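The save-then-inject cycle can be sketched in a few lines. Everything below is an assumption made for illustration: the JSON file, the save_memory and build_system_prompt helpers, and the prompt wording are hypothetical, and the real storage format is not described in this article.

```python
import json
import tempfile
from pathlib import Path


def save_memory(path, key, value):
    """Persist one fact to a JSON file, keeping earlier entries."""
    data = json.loads(path.read_text()) if path.exists() else {}
    data[key] = value
    path.write_text(json.dumps(data, indent=2))


def build_system_prompt(path, base="You are a helpful agent."):
    """Append stored facts to the base prompt at session start."""
    data = json.loads(path.read_text()) if path.exists() else {}
    if not data:
        return base
    facts = "\n".join(f"- {k}: {v}" for k, v in sorted(data.items()))
    return f"{base}\n\nKnown about the user:\n{facts}"


store = Path(tempfile.mkdtemp()) / "memory.json"
save_memory(store, "editor", "neovim")
save_memory(store, "os", "Debian 12")
prompt = build_system_prompt(store)
print(prompt)
```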
session_search
Search across all past session history. When the user says "we did this before" or "last time", this tool helps the agent retrieve context and summarize what happened.
Development tools: code execution and delegation
execute_code
The execute_code tool runs Python scripts that can programmatically call Hermes tools. Use it when you need 3+ tool calls with processing logic between them, or when you want to filter/reduce large tool outputs before they enter your context.
delegate_task
Spawn isolated subagents for parallel work. Each subagent gets its own conversation, terminal session, and toolset. Only the final summary is returned — intermediate results never pollute your context window. Ideal for dividing large projects into independent subtasks.
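The key property here is that only summaries come back. The following sketch imitates that pattern with a thread pool: each worker keeps its scratch work private and returns one line. run_subagent and delegate are hypothetical stand-ins, not the delegate_task API.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(task):
    """Stand-in for a subagent: do private work, return only a short summary."""
    scratch = [f"step {i} of {task}" for i in range(1, 4)]  # never leaves this function
    return f"{task}: done in {len(scratch)} steps"


def delegate(tasks, max_workers=3):
    """Run tasks in parallel and collect summaries in the original order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_subagent, tasks))


summaries = delegate(["audit dependencies", "write tests", "update docs"])
print(summaries)
```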
mixture_of_agents
The MOA tool routes a difficult problem through multiple collaborative LLMs. It makes 5 API calls (4 reference models + 1 aggregator) — reserve this for genuinely complex problems in mathematics, algorithms, or advanced reasoning.
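The call count follows directly from the structure: each reference model answers once, then the aggregator runs once. A minimal sketch, assuming plain Python callables in place of real LLM clients (mixture_of_agents, make_model, and aggregator are hypothetical names):

```python
def mixture_of_agents(question, reference_models, aggregator):
    """Ask each reference model, then let the aggregator merge the answers."""
    drafts = [model(question) for model in reference_models]
    return aggregator(question, drafts)


calls = []  # records every simulated API call


def make_model(name):
    def model(question):
        calls.append(name)
        return f"{name} answer"
    return model


def aggregator(question, drafts):
    calls.append("aggregator")
    return f"synthesis of {len(drafts)} drafts"


models = [make_model(f"ref{i}") for i in range(1, 5)]
result = mixture_of_agents("Is 2^61 - 1 prime?", models, aggregator)
print(result, len(calls))
```

With four reference models the log holds five entries, which is why this tool costs roughly five times a single call.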
Integration tools: Home Assistant, Spotify, RL, MCP
Home Assistant (4 tools)
The homeassistant toolset provides complete smart home control:
- ha_list_entities — lists entities (lights, switches, sensors...) with domain or area filtering
- ha_get_state — detailed entity state (brightness, temperature, etc.)
- ha_list_services — lists available actions for each device type
- ha_call_service — executes an action on a device
Example: turn on living room lights and set the thermostat
ha_call_service(domain="light", service="turn_on", entity_id="light.living_room")
ha_call_service(domain="climate", service="set_temperature",
entity_id="climate.living_room", kwargs='{"temperature": 22}')
Spotify (7 tools)
Native Spotify control via the bundled plugin: playback, queue, search, playlists, albums, and library. Requires initial OAuth authorization via hermes spotify setup.
RL Training (10 tools)
Complete RL training management suite via Atropos: environment selection, configuration, training launch, WandB monitoring, stopping, and inference testing. Requires TINKER_API_KEY and WANDB_API_KEY.
MCP (Model Context Protocol)
Beyond the 68 built-in tools, Hermes can dynamically load tools from MCP servers. MCP tools appear with a server-name prefix (e.g., github_create_issue for the github server). This lets you extend the agent's capabilities by connecting any compatible MCP server.
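The prefixing convention can be illustrated with a small registry. This is a sketch under assumptions: register_mcp_tools is a hypothetical helper, list_repos is an invented tool name used only as a second entry, and the real loader may resolve name collisions differently.

```python
def register_mcp_tools(registry, server, tool_names):
    """Add a server's tools to the registry under '<server>_<tool>' names."""
    for tool in tool_names:
        prefixed = f"{server}_{tool}"
        if prefixed in registry:
            raise ValueError(f"duplicate tool name: {prefixed}")
        registry[prefixed] = (server, tool)
    return registry


registry = {}
# "list_repos" is a made-up tool name for this example.
register_mcp_tools(registry, "github", ["create_issue", "list_repos"])
print(sorted(registry))
```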
Other platforms: Feishu and Yuanbao
Specific toolsets exist for regional platforms:
- feishu_doc (1 tool) — Feishu/Lark document reading
- feishu_drive (4 tools) — Feishu file comment operations
- yuanbao (5 tools) — DMs, groups, and stickers on Tencent Yuanbao platform
Configuring tools: the hermes tools command
Tool management is primarily done via the CLI:
# Open the interactive menu: browse all tools, check their status, and configure them per platform
hermes tools
# Use specific toolsets in CLI
hermes chat --toolsets "web,terminal,file"
# Enable a toolset in config
hermes config set toolsets.enabled '["web","terminal","file","browser"]'
The hermes tools command without arguments launches an interactive menu to browse toolsets, see which tools they contain, and enable or disable them per platform.
The safe toolset: for restricted environments
The safe toolset is designed specifically for environments where security is paramount. It includes only read-only tools:
- web_search — web search
- web_extract — content extraction
- vision_analyze — image analysis
- image_generate — image generation
No terminal access, no file write access, no code execution. Perfect for public messaging platforms or shared instances.
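Conceptually, a restricted profile is an allowlist checked before every tool call. The sketch below uses the four tool names listed for the safe toolset; the authorize function and the choice to raise PermissionError are assumptions, not Hermes's enforcement code.

```python
SAFE_TOOLS = frozenset(
    {"web_search", "web_extract", "vision_analyze", "image_generate"}
)


def authorize(tool_name, allowed=SAFE_TOOLS):
    """Raise PermissionError for any tool outside the allowlist."""
    if tool_name not in allowed:
        raise PermissionError(f"tool not allowed in this profile: {tool_name}")
    return tool_name


print(authorize("web_search"))
```

An allowlist fails closed: a tool added to Hermes later stays blocked in this profile until someone explicitly permits it.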
Security best practices
Choosing which tools to allow is not just a feature question — it is a security question. Here are the recommendations by platform:
Local CLI (hermes-cli) — Full profile. Enable all toolsets: terminal, file, code_execution, delegation, rl, browser. The environment is user-controlled.
Telegram / Discord / Slack — Balanced profile. Enable web, file (read-only if possible), vision, todo, memory, cronjob, messaging. Terminal and code_execution should only be enabled if the instance is private and secured. Use the Docker backend to isolate executions.
Public or shared instances — Use the safe toolset as a base. Add only strictly necessary tools. The SSH backend is recommended to prevent the agent from modifying its own code.
Production environments — Disable rl (training can consume significant resources), limit code_execution to the Docker backend with constrained resources (CPU, memory), and enable process quotas.
The Docker backend offers a good balance between flexibility and security: a persistent container with complete isolation, read-only root filesystem, and all Linux capabilities dropped.
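For readers who want to picture these guarantees, the sketch below builds a docker run command line with a read-only root filesystem, all capabilities dropped, and CPU, memory, and process limits. The flags are standard Docker CLI options; the docker_argv helper, the limit values, and the example image are assumptions, and this is not the command Hermes itself issues.

```python
def docker_argv(image, command, memory="512m", cpus="1.0", pids_limit=128):
    """Build a `docker run` command line for a locked-down, resource-capped container."""
    return [
        "docker", "run",
        "--read-only",                    # root filesystem mounted read-only
        "--cap-drop", "ALL",              # drop every Linux capability
        "--memory", memory,               # memory ceiling
        "--cpus", cpus,                   # CPU ceiling
        "--pids-limit", str(pids_limit),  # cap on the number of processes
        image, *command,
    ]


# The image name is only an example.
argv = docker_argv("python:3.12-slim", ["python", "-c", "print('hi')"])
print(" ".join(argv))
```

The function only assembles the argument list, so you can inspect or log it before handing it to a process runner.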
Toolset summary
- browser (10 tools) — interactive web automation
- browser-cdp (2 tools) — Chrome DevTools Protocol commands
- file (4 tools) — file reading, writing, searching, and patching
- terminal (2 tools) — command execution and process management
- web (2 tools) — web search and extraction
- vision (1 tool) — image analysis
- image_gen (1 tool) — image generation
- tts (1 tool) — text-to-speech
- todo (1 tool) — task management
- cronjob (1 tool) — scheduled tasks
- memory (1 tool) — persistent memory
- session_search (1 tool) — history search
- skills (3 tools) — skill management
- messaging (1 tool) — cross-platform messaging
- clarify (1 tool) — user clarification
- code_execution (1 tool) — Python code execution
- delegation (1 tool) — parallel subagents
- moa (1 tool) — multi-model consensus
- homeassistant (4 tools) — smart home control
- discord (1 tool) — Discord actions
- discord_admin (1 tool) — Discord moderation
- spotify (7 tools) — Spotify control
- rl (10 tools) — RL training
- feishu_doc (1 tool) — Feishu documents
- feishu_drive (4 tools) — Feishu comments
- yuanbao (5 tools) — Yuanbao platform
- safe (composite) — secure read-only profile
- debugging (composite) — diagnostic bundle
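As a quick sanity check, the per-toolset counts in the list above can be added up to confirm the figure of 68 built-in tools. The dictionary below simply transcribes those counts; the composite toolsets (safe and debugging) are left out because they add no tools of their own.

```python
TOOL_COUNTS = {
    "browser": 10, "browser-cdp": 2, "file": 4, "terminal": 2, "web": 2,
    "vision": 1, "image_gen": 1, "tts": 1, "todo": 1, "cronjob": 1,
    "memory": 1, "session_search": 1, "skills": 3, "messaging": 1,
    "clarify": 1, "code_execution": 1, "delegation": 1, "moa": 1,
    "homeassistant": 4, "discord": 1, "discord_admin": 1, "spotify": 7,
    "rl": 10, "feishu_doc": 1, "feishu_drive": 4, "yuanbao": 5,
}

total = sum(TOOL_COUNTS.values())
print(len(TOOL_COUNTS), "toolsets,", total, "tools")
```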
Conclusion
With its 68 built-in tools organized into configurable toolsets, Hermes Agent covers an impressive range of capabilities. The toolset system provides fine-grained configuration: you choose exactly what the agent can do, on which platform, and with what access level.
The strength of this architecture lies in its modularity. You do not have to enable everything. A minimal setup with just the safe toolset (which already includes the web tools) is sufficient for a research assistant. An advanced setup with terminal + code_execution + delegation transforms Hermes into a true co-developer. And MCP integration extends these capabilities further by connecting external tool servers.
In the next article in this series, we will explore Hermes's memory system — how the agent learns from your interactions and remembers your context across sessions.