Persistent Memory: How Hermes Remembers

Hermes Agent 🟡 Intermediate ⏱️ 11 min read 📅 2026-05-05


Most AI agents are amnesic: every conversation starts from zero, with no memory of previous sessions. Hermes Agent breaks this cycle with a persistent memory system that retains your preferences, projects, and lessons learned across sessions. This memory is automatically injected into every new conversation, giving the agent immediate context without you having to repeat anything.

This article covers the complete workings of Hermes Agent's persistent memory: the two memory stores (MEMORY.md and USER.md), best practices for writing useful entries, searching past sessions, capacity management, and common pitfalls to avoid.

The Fundamental Principle

Hermes Agent maintains bounded, curated memory — deliberately limited in size and actively managed by the agent. This is not unlimited storage: it's precious space where only the most important facts belong.

In practice, memory works as follows:
- Two Markdown files stored in ~/.hermes/: MEMORY.md and USER.md
- Automatic injection into the system prompt at the start of each session
- Managed via the memory tool (add, replace, remove)
- Frozen snapshot at session start (modifications only appear in the next session)

This design guarantees a constant, predictable token footprint while providing the agent with reliable long-term memory.

The Two Memory Targets

Hermes Agent distinguishes two memory stores, each with its purpose and limits:

`memory` — Agent's Personal Notes

The memory target stores everything the agent needs to remember about your environment, projects, and lessons learned:
- Environment facts: installed OS, available tools, project structure
- Project conventions: code style, build tools, test commands
- Technical discoveries: tool quirks, workarounds identified
- Task diary: migrations completed, bugs fixed, deployments done
- Proven techniques: approaches that worked well in the past

Default limit: 2,200 characters (approximately 8 to 15 entries).

`user` — User Profile

The user target keeps everything about your identity, preferences, and communication style:
- Identity: name, role, timezone
- Communication style: concise or detailed responses, preferred format
- Technical preferences: favorite language, editor used, typical workflow
- Things to avoid: pet peeves, phrases not to use

Default limit: 1,375 characters (approximately 5 to 10 entries).
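To make the two targets and their limits concrete, here is a minimal sketch. It assumes that the `memory` target maps to MEMORY.md and the `user` target to USER.md, which the article implies but does not state outright. The `MemoryTarget` class and both helper functions are hypothetical names used for illustration only, not part of Hermes Agent.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class MemoryTarget:
    filename: str    # file under ~/.hermes/ (target-to-file mapping is assumed)
    char_limit: int  # default limit quoted in the article


TARGETS = {
    "memory": MemoryTarget("MEMORY.md", 2200),
    "user": MemoryTarget("USER.md", 1375),
}


def target_path(target: str, home: Path) -> Path:
    """Return where a target's file would live for a given home directory."""
    return home / ".hermes" / TARGETS[target].filename


def chars_per_entry(target: str, entry_count: int) -> int:
    """Average character budget per entry if the store holds entry_count entries."""
    return TARGETS[target].char_limit // entry_count
```

At the entry counts quoted above, this works out to roughly 140 to 275 characters per entry, which is about one or two dense sentences.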

The `memory` Tool in Action

The agent uses the memory tool with three actions:

- add: add a new memory entry
- replace: replace an existing entry (substring matching via old_text)
- remove: remove an obsolete entry (substring matching via old_text)

There is no read action: memory content is automatically injected into the system prompt at the start of each session. The agent "sees" its memories as part of its conversation context.

Substring Matching

The replace and remove actions use unique substring matching. You don't need the full entry text — just a specific enough excerpt to identify a single entry:

# If memory contains "User prefers dark mode in all editors"
memory(action="replace", target="memory",
       old_text="dark mode",
       content="User prefers light mode in VS Code, dark mode in terminal")

If the substring matches multiple entries, an error is returned asking for a more specific match.
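The matching rule can be sketched in a few lines. This is an illustration of the behavior described above, not Hermes Agent's actual code: the function names are hypothetical, entries are modeled as a plain list of strings, and case-sensitive matching is an assumption.

```python
def find_unique_entry(entries: list[str], old_text: str) -> int:
    """Return the index of the single entry that contains old_text."""
    matches = [i for i, entry in enumerate(entries) if old_text in entry]
    if not matches:
        raise ValueError(f"no entry contains {old_text!r}")
    if len(matches) > 1:
        raise ValueError(
            f"{old_text!r} matches {len(matches)} entries; use a more specific excerpt"
        )
    return matches[0]


def replace_entry(entries: list[str], old_text: str, content: str) -> list[str]:
    """Return a copy of entries with the uniquely matched entry replaced."""
    index = find_unique_entry(entries, old_text)
    updated = list(entries)
    updated[index] = content
    return updated


def remove_entry(entries: list[str], old_text: str) -> list[str]:
    """Return a copy of entries without the uniquely matched entry."""
    index = find_unique_entry(entries, old_text)
    return entries[:index] + entries[index + 1:]
```

With two entries that both mention "mode", an `old_text` of "mode" would raise an error, while "dark mode" would identify a single entry.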

The Frozen Snapshot

A crucial point: the system prompt injection is captured once at session start and doesn't change mid-session. This is intentional — it preserves the LLM's prefix cache for optimal performance. When the agent adds or removes entries during a session, changes are immediately written to disk but won't appear in the system prompt until the next session starts.
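The snapshot behavior can be illustrated with a small sketch. The `Session` class, its methods, and the prompt layout are hypothetical and only model the described behavior: files are read once at construction, writes go straight to disk, and the prompt held by the running session is never refreshed.

```python
from pathlib import Path


class Session:
    """Toy model of a session that freezes memory at start (illustrative only)."""

    def __init__(self, memory_dir: Path):
        self.memory_dir = memory_dir
        # Captured once; never rebuilt during the lifetime of this session.
        self.system_prompt = self._render()

    def _render(self) -> str:
        parts = []
        for name in ("MEMORY.md", "USER.md"):
            path = self.memory_dir / name
            text = path.read_text(encoding="utf-8") if path.exists() else ""
            parts.append(f"## {name}\n{text}")
        return "\n\n".join(parts)

    def add_memory(self, entry: str) -> None:
        # The change is persisted immediately...
        with (self.memory_dir / "MEMORY.md").open("a", encoding="utf-8") as f:
            f.write(entry + "\n")
        # ...but self.system_prompt is deliberately left untouched, so the
        # prompt prefix stays identical for the rest of the session.
```

A second `Session` created from the same directory would see the new entry, which mirrors how changes surface only in the next session.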

What to Save — and What to Skip

Save These (Proactively)

The agent saves automatically — you don't need to ask:

- User preferences: "I prefer TypeScript over JavaScript" → user target
- Environment facts: "This server runs Debian 12 with PostgreSQL 16" → memory target
- Corrections: "Don't use sudo for Docker, user is in docker group" → memory target
- Conventions: "Project uses tabs, 120-char line width, Google-style docstrings" → memory target
- Completed work: "Migrated database from MySQL to PostgreSQL on 2026-01-15" → memory target
- Explicit requests: "Remember that my API key rotation happens monthly" → memory target

Skip These

- Trivial info: "User asked about Python" (too vague)
- Easily rediscovered facts: "Python 3.12 supports f-string nesting" (a web search handles this)
- Raw data dumps: large code blocks, log files, data tables (too big)
- Session-specific ephemera: temporary file paths, one-off debugging context
- Already in context files: anything in SOUL.md or AGENTS.md

Facts vs Instructions: Two Types of Entries

The best memory entries fall into two complementary categories:

Facts are neutral information the agent uses to tailor its responses:

"The staging server is at 10.0.1.50, SSH port 2222. Key: ~/.ssh/staging_ed25519."

Instructions are directives the agent should follow:

"Always check dependencies with pnpm outdated before any deployment."

The combination is powerful: facts provide context, instructions guide action. But beware — a vague or obsolete instruction is worse than no instruction at all.

Concrete Examples of Good Memory Entries

Good entries (compact and information-dense):

User runs macOS 14 Sonoma, uses Homebrew, has Docker Desktop and Podman. Shell: zsh with oh-my-zsh. Editor: VS Code with Vim keybindings.

Project ~/code/api uses Go 1.22, sqlc for DB queries, chi router. Run tests with 'make test'. CI via GitHub Actions.

The staging server (10.0.1.50) needs SSH port 2222, not 22. Key is at ~/.ssh/staging_ed25519.

Bad entries:

User has a project.

Too vague — no actionable information.

On January 5th, 2026, the user asked me to look at their project which is located at ~/code/api. I discovered it uses Go version 1.22 and...

Too verbose — wastes precious characters.

Capacity Management: Don't Fill Memory Unnecessarily

Memory has strict limits to keep the system prompt under control:

memory: 2,200 characters (~800 tokens)
user: 1,375 characters (~500 tokens)

When an add would push a store past its limit, the tool rejects it and returns an error listing the current entries and the usage percentage. The agent should then consolidate existing entries before retrying.

Best practice: when memory is above 80% capacity (visible in the injected prompt header), consolidate entries before adding new ones. For example, merge three separate "project uses X" entries into one comprehensive project description.
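The limit check and the 80% rule can be sketched as follows. This is a simplified model under stated assumptions: usage is counted as the sum of entry lengths (how Hermes counts separators is not specified here), and the function names and the shape of the returned dictionary are hypothetical.

```python
MEMORY_CHAR_LIMIT = 2200  # default for the memory target, per the article


def used_chars(entries: list[str]) -> int:
    """Assumed accounting: total characters across all entries."""
    return sum(len(entry) for entry in entries)


def try_add(entries: list[str], content: str, limit: int = MEMORY_CHAR_LIMIT) -> dict:
    """Add content unless it would exceed the limit; report usage either way."""
    used = used_chars(entries)
    if used + len(content) > limit:
        return {
            "ok": False,
            "error": "limit exceeded",
            "entries": list(entries),
            "usage_percent": round(100 * used / limit),
        }
    return {
        "ok": True,
        "entries": entries + [content],
        "usage_percent": round(100 * (used + len(content)) / limit),
    }


def should_consolidate(entries: list[str], limit: int = MEMORY_CHAR_LIMIT,
                       threshold: float = 0.8) -> bool:
    """True once usage is above the consolidation threshold (80% by default)."""
    return used_chars(entries) > threshold * limit
```

For example, with 1,900 characters already stored, a 400-character entry is rejected at 86% usage, and `should_consolidate` already returns True because 1,900 exceeds 80% of 2,200 (1,760).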

Detecting and Cleaning Obsolete Entries

Over time, some memories become obsolete. The agent should regularly:
1. Review its memory entries at the start of each session
2. Identify potentially outdated information (old dates, references to outdated software versions)
3. Use replace to update or remove to delete

The system also includes duplicate prevention: if you try to add content that already exists, the tool returns success with a "no duplicate added" message.
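Both ideas in this section can be sketched briefly. The article does not say how Hermes detects stale or duplicate entries, so everything here is an assumption: staleness is approximated by looking for ISO dates older than a cutoff, duplicates are detected by exact match after trimming whitespace, and all names are hypothetical.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def stale_entries(entries: list[str], today: date, max_age_days: int = 180) -> list[str]:
    """Return entries that mention an ISO date older than max_age_days."""
    stale = []
    for entry in entries:
        for year, month, day in ISO_DATE.findall(entry):
            try:
                mentioned = date(int(year), int(month), int(day))
            except ValueError:
                continue  # looks like a date but is not a valid one
            if (today - mentioned).days > max_age_days:
                stale.append(entry)
                break
    return stale


def add_if_new(entries: list[str], content: str) -> tuple[list[str], str]:
    """Add content unless an identical entry exists (exact match is assumed)."""
    if content.strip() in (entry.strip() for entry in entries):
        return entries, "no duplicate added"
    return entries + [content], "added"
```

A flagged entry is only a candidate for review: a dated entry such as a completed migration may still be worth keeping.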

Procedural Memory: The Skills System

Beyond declarative memory (MEMORY.md / USER.md), Hermes Agent has procedural memory through its skills system. A skill is a set of knowledge and procedures the agent has learned through experience, enabling it to accomplish complex tasks autonomously.

Skills are stored in ~/.hermes/skills/ and go beyond regular memory:
- They can include scripts, templates, and complete workflows
- They activate automatically when the context requires them
- They evolve over time through the Curator system

The distinction matters: declarative memory says "what" (facts and preferences), procedural memory says "how" (procedures and workflows). Both systems work in tandem to deliver an increasingly competent agent.

`session_search`: Finding Past Conversations

Beyond MEMORY.md and USER.md, the agent can search through all past conversations using the session_search tool:

- All CLI and messaging sessions are stored in SQLite (~/.hermes/state.db) with FTS5 full-text search
- Queries return relevant past conversations with Gemini Flash summarization
- The agent can find exchanges from weeks ago
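To show what FTS5-backed search looks like in general, here is a small sketch using Python's built-in sqlite3 module. The table name `messages_fts` and its columns are hypothetical; the real schema of state.db is not described in this article, and the summarization step is omitted. It also assumes the local SQLite build includes the FTS5 extension, which is common but not guaranteed.

```python
import sqlite3


def build_index(conn: sqlite3.Connection, messages: list[tuple[str, str]]) -> None:
    """Create a full-text index over (session_id, content) pairs."""
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts "
        "USING fts5(session_id UNINDEXED, content)"
    )
    conn.executemany(
        "INSERT INTO messages_fts (session_id, content) VALUES (?, ?)", messages
    )


def search_sessions(conn: sqlite3.Connection, query: str, limit: int = 5):
    """Return (session_id, highlighted snippet) rows, best matches first."""
    cursor = conn.execute(
        "SELECT session_id, snippet(messages_fts, 1, '[', ']', '...', 8) "
        "FROM messages_fts WHERE messages_fts MATCH ? "
        "ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return cursor.fetchall()
```

Marking `session_id` as UNINDEXED keeps identifiers out of the text index, so a query only matches message content.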

Persistent Memory vs session_search

Persistent Memory:
- Capacity: ~1,300 tokens total
- Speed: instant (in system prompt)
- Use case: critical facts always available
- Management: manually curated by agent
- Token cost: fixed per session

Session Search:
- Capacity: unlimited (all sessions)
- Speed: requires search + LLM summarization
- Use case: finding specific past conversations
- Management: automatic storage
- Token cost: on demand

Memory is for critical facts always in context. Session search is for "did we discuss X last week?" queries where the agent needs specifics from past conversations.

Memory vs Context Files vs Sessions

Three distinct mechanisms coexist in Hermes Agent, each with its role:

Context files (HERMES.md, AGENTS.md, SOUL.md): project-level instructions and personality. Automatically discovered in the project directory tree, they define expected behavior. Read our context files article for details.

Persistent memory (MEMORY.md, USER.md): facts learned across sessions. The agent manages them itself — you don't intervene directly. This is the "living" memory that evolves with your usage.

Sessions (state.db): complete conversation history. Accessible via session_search to find specific details from past exchanges. Read our sessions and context article to understand the session system.

Real-World Use Cases

Case 1: Full-stack developer with multiple projects

A developer works on a Next.js project and a Python FastAPI project. Thanks to memory, Hermes automatically knows which framework to use depending on the working directory, which commands to run for tests, and which conventions to follow in each project.

Case 2: Multi-server system administrator

The admin manages servers with different configurations (SSH ports, distributions, database versions). Memory retains each server's specifics, avoiding costly errors from connecting to the wrong port or with the wrong key.

Case 3: Team with divergent preferences

On a shared machine, memory lives under ~/.hermes/, so each user account has its own USER.md profile. The agent adapts its response style and recommendations to whoever is running it.

Common Pitfalls and How to Avoid Them

Pitfall 1: Too much memory = wasted tokens

Every character in memory consumes tokens in every session. Filling memory with trivial information leaves less room for the facts that matter. Be selective: only keep what is hard to find otherwise.

Pitfall 2: Outdated information

Unmaintained memory becomes a hindrance. If an entry says "project uses React 17" while you've moved to React 19, the agent will suggest inappropriate solutions. Clean regularly.

Pitfall 3: Duplicating context files

Don't store in memory what's already in your context files. Hermes reads HERMES.md and AGENTS.md automatically — repeating them in MEMORY.md is wasteful.

Pitfall 4: Contradictory instructions

If one memory entry says "use npm" and another says "use pnpm", the agent has no way to tell which one is current. When a preference changes, replace the old entry instead of adding a second one.

Security: Automatic Entry Scanning

Memory entries are automatically scanned for prompt injection attempts, credential exfiltration, and SSH backdoors. Content matching threat patterns is blocked before being accepted. This protection is crucial since memory content is injected into the system prompt.
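Pattern-based scanning of this kind can be sketched as below. The three regular expressions are hypothetical examples chosen for illustration; the actual threat patterns used by Hermes Agent are not listed in this article and are certainly more thorough.

```python
import re

# Hypothetical patterns, one per threat category named in the article.
THREAT_PATTERNS = {
    "prompt_injection": re.compile(
        r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE
    ),
    "credential_exfiltration": re.compile(
        r"(curl|wget)\b.*\$\{?\w*(API_KEY|TOKEN|SECRET|PASSWORD)", re.IGNORECASE
    ),
    "ssh_backdoor": re.compile(r"authorized_keys", re.IGNORECASE),
}


def scan_entry(content: str) -> list[str]:
    """Return the names of all threat categories the content matches."""
    return [name for name, pattern in THREAT_PATTERNS.items() if pattern.search(content)]


def accept_entry(content: str) -> bool:
    """An entry is accepted only if no threat pattern matches."""
    return not scan_entry(content)
```

A legitimate entry such as the staging server note with its SSH key path passes this sketch, while text that appends to authorized_keys is rejected.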

Configuration

Memory is configured in ~/.hermes/config.yaml:

memory:
  memory_enabled: true
  user_profile_enabled: true
  memory_char_limit: 2200   # ~800 tokens
  user_char_limit: 1375     # ~500 tokens

The default limits are balanced for most use cases, but you can adjust them — keep in mind that more memory = more tokens consumed per session.
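To estimate what a custom limit costs, one can reuse the ratio implied by the defaults: 2,200 characters for about 800 tokens and 1,375 characters for about 500 tokens both give 2.75 characters per token. The sketch below applies that ratio; it is a rough estimate only, since real token counts depend on the model's tokenizer, and the function name is hypothetical.

```python
# Ratio implied by the article's defaults (2200 chars ~ 800 tokens).
CHARS_PER_TOKEN = 2200 / 800  # 2.75


def estimated_tokens(char_limit: int) -> int:
    """Approximate per-session token cost of a fully used memory store."""
    return round(char_limit / CHARS_PER_TOKEN)


def total_estimated_tokens(memory_char_limit: int, user_char_limit: int) -> int:
    """Approximate combined cost of both stores when full."""
    return estimated_tokens(memory_char_limit) + estimated_tokens(user_char_limit)
```

Doubling memory_char_limit to 4400, for instance, would raise the estimated cost of that store from about 800 to about 1,600 tokens in every session.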

External Memory Providers

To go beyond native capabilities, Hermes Agent supports 8 external memory provider plugins: Honcho, OpenViking, Mem0, Hindsight, Holographic, RetainDB, ByteRover, and Supermemory. These providers work alongside built-in memory and add capabilities like knowledge graphs, semantic search, and automatic fact extraction.

Set up an external provider with:

hermes memory setup      # pick a provider and configure it
hermes memory status     # check what's active

Conclusion

Persistent memory is one of the most transformative features of Hermes Agent. It turns a one-time assistant into a true partner that learns and adapts over time. The key is discipline: keep entries concise, clean them regularly, and keep facts distinct from instructions. Combined with the skills system (procedural memory) and session search, it creates a living knowledge base that makes each session more productive than the last.

To go further in mastering Hermes Agent, explore our previous articles:
- Introduction and Installation
- Configuring Models and Providers
- Available Tools
- Mastering the CLI
- Sessions and Context
- Context Files
