Persistent Memory: How Hermes Remembers
Most AI agents are amnesic: every conversation starts from zero, with no memory of previous sessions. Hermes Agent breaks this cycle with a persistent memory system that retains your preferences, projects, and lessons learned across sessions. This memory is automatically injected into every new conversation, giving the agent immediate context without you having to repeat anything.
This article covers the complete workings of Hermes Agent's persistent memory: the two memory stores (MEMORY.md and USER.md), best practices for writing useful entries, searching past sessions, capacity management, and common pitfalls to avoid.
The Fundamental Principle
Hermes Agent maintains bounded, curated memory — deliberately limited in size and actively managed by the agent. This is not unlimited storage: it's precious space where only the most important facts belong.
In practice, memory works as follows:
- Two Markdown files stored in ~/.hermes/: MEMORY.md and USER.md
- Automatic injection into the system prompt at the start of each session
- Managed via the memory tool (add, replace, remove)
- Frozen snapshot at session start (modifications only appear in the next session)
This design guarantees a constant, predictable token footprint while providing the agent with reliable long-term memory.
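As a rough illustration of the injection step, here is a minimal sketch that reads both files and assembles one block of text to prepend to a system prompt. The function names and the "## MEMORY" / "## USER" section labels are assumptions made for this example; the article only states that the two files live in ~/.hermes/ and are injected at session start.

```python
from pathlib import Path

# Assumed location: the article states both files live in ~/.hermes/.
HERMES_DIR = Path.home() / ".hermes"


def read_store(path: Path) -> str:
    """Return the file's text, or an empty string if it does not exist yet."""
    return path.read_text(encoding="utf-8").strip() if path.exists() else ""


def build_memory_block(memory_text: str, user_text: str) -> str:
    """Join both stores into one block (section labels are hypothetical)."""
    sections = []
    if memory_text:
        sections.append("## MEMORY\n" + memory_text)
    if user_text:
        sections.append("## USER\n" + user_text)
    return "\n\n".join(sections)


def load_memory_block(base_dir: Path = HERMES_DIR) -> str:
    """Read MEMORY.md and USER.md and return the text to inject."""
    return build_memory_block(
        read_store(base_dir / "MEMORY.md"),
        read_store(base_dir / "USER.md"),
    )
```

Because both stores are size-bounded, the string returned by load_memory_block has a bounded length, which is what keeps the token footprint predictable.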
The Two Memory Targets
Hermes Agent distinguishes two memory stores, each with its purpose and limits:
memory — Agent's Personal Notes
The memory target stores everything the agent needs to remember about your environment, projects, and lessons learned:
- Environment facts: installed OS, available tools, project structure
- Project conventions: code style, build tools, test commands
- Technical discoveries: tool quirks, workarounds identified
- Task diary: migrations completed, bugs fixed, deployments done
- Proven techniques: approaches that worked well in the past
Default limit: 2,200 characters (approximately 8 to 15 entries).
user — User Profile
The user target keeps everything about your identity, preferences, and communication style:
- Identity: name, role, timezone
- Communication style: concise or detailed responses, preferred format
- Technical preferences: favorite language, editor used, typical workflow
- Things to avoid: pet peeves, phrases not to use
Default limit: 1,375 characters (approximately 5 to 10 entries).
The memory Tool in Action
The agent uses the memory tool with three actions:
- add: Add a new memory entry
- replace: Replace an existing entry (substring matching via old_text)
- remove: Remove an obsolete entry (substring matching via old_text)
There is no read action: memory content is automatically injected into the system prompt at the start of each session. The agent "sees" its memories as part of its conversation context.
Substring Matching
The replace and remove actions use unique substring matching. You don't need the full entry text — just a specific enough excerpt to identify a single entry:
# If memory contains "User prefers dark mode in all editors"
memory(action="replace", target="memory",
       old_text="dark mode",
       content="User prefers light mode in VS Code, dark mode in terminal")
If the substring matches multiple entries, an error is returned asking for a more specific match.
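The matching rule can be sketched in a few lines. This is a toy model, not Hermes Agent's implementation: entries are held in a plain list, and the names find_unique, replace_entry, remove_entry, and MatchError are hypothetical. It also treats "no match" as an error, which is an assumption; the article only describes the multiple-match case.

```python
class MatchError(ValueError):
    """Raised when old_text does not identify exactly one entry."""


def find_unique(entries: list[str], old_text: str) -> int:
    """Return the index of the single entry that contains old_text."""
    hits = [i for i, entry in enumerate(entries) if old_text in entry]
    if len(hits) != 1:
        raise MatchError(
            f"{old_text!r} matched {len(hits)} entries; use a more specific excerpt"
        )
    return hits[0]


def replace_entry(entries: list[str], old_text: str, content: str) -> list[str]:
    """Return a new list with the matched entry swapped for content."""
    i = find_unique(entries, old_text)
    return entries[:i] + [content] + entries[i + 1:]


def remove_entry(entries: list[str], old_text: str) -> list[str]:
    """Return a new list without the matched entry."""
    i = find_unique(entries, old_text)
    return entries[:i] + entries[i + 1:]
```

One practical consequence: short excerpts collide easily. For instance, "npm" is also a substring of "pnpm", so an excerpt such as "uses npm for" is a safer choice than "npm" alone.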
The Frozen Snapshot
A crucial point: the system prompt injection is captured once at session start and doesn't change mid-session. This is intentional: an unchanged system prompt lets the LLM's prefix cache be reused from turn to turn, whereas editing the prompt mid-session would invalidate that cache. When the agent adds or removes entries during a session, changes are immediately written to disk but won't appear in the system prompt until the next session starts.
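The behavior is easy to see in a toy model. In this sketch a plain list stands in for the file on disk, and the Session class is hypothetical; it only illustrates the "write now, see it next session" rule described above.

```python
class Session:
    """Toy session: the prompt is captured once, later writes only touch the store."""

    def __init__(self, store: list[str]) -> None:
        self.store = store                     # stands in for the file on disk
        self.system_prompt = "\n".join(store)  # frozen copy, never rebuilt

    def add(self, entry: str) -> None:
        """Persist the entry immediately; the frozen prompt is left untouched."""
        self.store.append(entry)


disk = ["Project uses Go 1.22"]

first = Session(disk)
first.add("Run tests with 'make test'")
# The write reached the store, but the running session still sees the old prompt.

second = Session(disk)
# A session started afterwards picks up both entries.
```

Here first.system_prompt still contains only the Go entry, while second.system_prompt contains both lines.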
What to Save — and What to Skip
Save These (Proactively)
The agent saves automatically — you don't need to ask:
- User preferences: "I prefer TypeScript over JavaScript" → user target
- Environment facts: "This server runs Debian 12 with PostgreSQL 16" → memory target
- Corrections: "Don't use sudo for Docker, user is in docker group" → memory target
- Conventions: "Project uses tabs, 120-char line width, Google-style docstrings" → memory target
- Completed work: "Migrated database from MySQL to PostgreSQL on 2026-01-15" → memory target
- Explicit requests: "Remember that my API key rotation happens monthly" → memory target
Skip These
- Trivial info: "User asked about Python" — too vague
- Easily rediscovered facts: "Python 3.12 supports f-string nesting" — web search handles this
- Raw data dumps: large code blocks, log files, data tables — too big
- Session-specific ephemera: temporary file paths, one-off debugging context
- Already in context files: anything in SOUL.md or AGENTS.md
Facts vs Instructions: Two Types of Entries
The best memory entries fall into two complementary categories:
Facts are neutral information the agent uses to tailor its responses:
"The staging server is at 10.0.1.50, SSH port 2222. Key: ~/.ssh/staging_ed25519."
Instructions are directives the agent should follow:
"Always check dependencies with pnpm outdated before any deployment."
The combination is powerful: facts provide context, instructions guide action. But beware — a vague or obsolete instruction is worse than no instruction at all.
Concrete Examples of Good Memory Entries
Good entries (compact and information-dense):
User runs macOS 14 Sonoma, uses Homebrew, has Docker Desktop and Podman. Shell: zsh with oh-my-zsh. Editor: VS Code with Vim keybindings.
Project ~/code/api uses Go 1.22, sqlc for DB queries, chi router. Run tests with 'make test'. CI via GitHub Actions.
The staging server (10.0.1.50) needs SSH port 2222, not 22. Key is at ~/.ssh/staging_ed25519.
Bad entries:
User has a project.
Too vague — no actionable information.
On January 5th, 2026, the user asked me to look at their project which is located at ~/code/api. I discovered it uses Go version 1.22 and...
Too verbose — wastes precious characters.
Capacity Management: Don't Fill Memory Unnecessarily
Memory has strict limits to keep the system prompt under control:
- memory: 2,200 characters (~800 tokens)
- user: 1,375 characters (~500 tokens)
When you try to add an entry that would exceed the limit, the tool returns an error with the current entries and usage percentage. The agent should then consolidate existing entries before adding new ones.
Best practice: when memory is above 80% capacity (visible in the injected prompt header), consolidate entries before adding new ones. For example, merge three separate "project uses X" entries into one comprehensive project description.
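The limit check and the 80% rule of thumb amount to simple arithmetic. The sketch below assumes entries are joined by a single newline when counted, which may differ from how Hermes Agent measures usage; the function names are hypothetical.

```python
MEMORY_CHAR_LIMIT = 2200  # default limits quoted in this article
USER_CHAR_LIMIT = 1375


def used_chars(entries: list[str]) -> int:
    """Count characters, assuming entries are joined by a single newline."""
    return len("\n".join(entries))


def usage_percent(entries: list[str], limit: int) -> float:
    """Return how much of the limit is in use, as a percentage."""
    return 100.0 * used_chars(entries) / limit


def try_add(entries: list[str], new_entry: str, limit: int) -> tuple[bool, list[str], str]:
    """Add new_entry only if the result still fits within limit."""
    candidate = entries + [new_entry]
    if used_chars(candidate) > limit:
        message = (
            f"limit exceeded: {usage_percent(entries, limit):.0f}% "
            f"of {limit} characters already in use"
        )
        return False, entries, message
    return True, candidate, "added"


def should_consolidate(entries: list[str], limit: int, threshold: float = 80.0) -> bool:
    """Apply the rule of thumb: consolidate once usage passes the threshold."""
    return usage_percent(entries, limit) > threshold
```

With two entries of 1,000 and 800 characters, usage is 1,801 of 2,200 characters (about 82%), so should_consolidate returns True and a further 500-character entry is rejected.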
Detecting and Cleaning Obsolete Entries
Over time, some memories become obsolete. The agent should regularly:
1. Review its memory entries at the start of each session
2. Identify potentially outdated information (old dates, references to outdated software versions)
3. Use replace to update or remove to delete
The system also includes duplicate prevention: if you try to add content that already exists, the tool returns success with a "no duplicate added" message.
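Both ideas, flagging entries with old dates and refusing duplicates, can be sketched as follows. The 180-day threshold, the ISO date pattern, and the case- and whitespace-insensitive comparison are all assumptions chosen for illustration; the article does not specify how Hermes Agent detects either condition.

```python
import re
from datetime import date, timedelta

ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def stale_entries(entries: list[str], today: date, max_age_days: int = 180) -> list[str]:
    """Return entries that mention an ISO date older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    stale = []
    for entry in entries:
        for year, month, day in ISO_DATE.findall(entry):
            try:
                found = date(int(year), int(month), int(day))
            except ValueError:
                continue  # looks like a date but is not a valid one
            if found < cutoff:
                stale.append(entry)
                break
    return stale


def normalize(text: str) -> str:
    """Collapse whitespace and case so near-identical entries compare equal."""
    return " ".join(text.split()).lower()


def add_unique(entries: list[str], content: str) -> tuple[list[str], str]:
    """Append content unless an equivalent entry is already stored."""
    if any(normalize(entry) == normalize(content) for entry in entries):
        return entries, "no duplicate added"
    return entries + [content], "added"
```

A flagged entry is only a candidate for review: a migration date from last year may still be worth keeping, so the decision to replace or remove stays with the agent.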
Procedural Memory: The Skills System
Beyond declarative memory (MEMORY.md / USER.md), Hermes Agent has procedural memory through its skills system. A skill is a set of knowledge and procedures the agent has learned through experience, enabling it to accomplish complex tasks autonomously.
Skills are stored in ~/.hermes/skills/ and go beyond regular memory:
- They can include scripts, templates, and complete workflows
- They activate automatically when the context requires them
- They evolve over time through the Curator system
The distinction matters: declarative memory says "what" (facts and preferences), procedural memory says "how" (procedures and workflows). Both systems work in tandem to deliver an increasingly competent agent.
session_search: Finding Past Conversations
Beyond MEMORY.md and USER.md, the agent can search through all past conversations using the session_search tool:
- All CLI and messaging sessions are stored in SQLite (~/.hermes/state.db) with FTS5 full-text search
- Queries return relevant past conversations with Gemini Flash summarization
- The agent can find exchanges from weeks ago
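To give a feel for what FTS5 full-text search over stored sessions looks like, here is a small self-contained sketch using Python's sqlite3 module. The table name, columns, and helper functions are hypothetical and do not reflect the actual schema of state.db; the sketch also assumes the local SQLite build includes the FTS5 extension and leaves out the summarization step.

```python
import sqlite3


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database with a hypothetical FTS5 table of session messages."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS messages "
        "USING fts5(session_id UNINDEXED, content)"
    )
    return conn


def index_message(conn: sqlite3.Connection, session_id: str, content: str) -> None:
    """Store one message so that it becomes searchable."""
    conn.execute(
        "INSERT INTO messages (session_id, content) VALUES (?, ?)",
        (session_id, content),
    )


def search(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[tuple[str, str]]:
    """Return (session_id, highlighted snippet) pairs, best match first."""
    rows = conn.execute(
        "SELECT session_id, snippet(messages, 1, '[', ']', '...', 8) "
        "FROM messages WHERE messages MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


conn = open_index()
index_message(conn, "s1", "We moved the staging server to SSH port 2222")
index_message(conn, "s2", "Discussed Go 1.22 and the chi router")
hits = search(conn, "staging")
```

Note that the MATCH argument follows FTS5 query syntax, so free-form user text containing quotes or operators would need escaping before being passed to search.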
Persistent Memory vs session_search
Persistent Memory:
- Capacity: ~1,300 tokens total
- Speed: instant (in system prompt)
- Use case: critical facts always available
- Management: manually curated by agent
- Token cost: fixed per session
Session Search:
- Capacity: unlimited (all sessions)
- Speed: requires search + LLM summarization
- Use case: finding specific past conversations
- Management: automatic storage
- Token cost: on-demand
Memory is for critical facts always in context. Session search is for "did we discuss X last week?" queries where the agent needs specifics from past conversations.
Memory vs Context Files vs Sessions
Three distinct mechanisms coexist in Hermes Agent, each with its role:
Context files (HERMES.md, AGENTS.md, SOUL.md): project-level instructions and personality. Automatically discovered in the project directory tree, they define expected behavior. Read our context files article for details.
Persistent memory (MEMORY.md, USER.md): facts learned across sessions. The agent manages them itself — you don't intervene directly. This is the "living" memory that evolves with your usage.
Sessions (state.db): complete conversation history. Accessible via session_search to find specific details from past exchanges. Read our sessions and context article to understand the session system.
Real-World Use Cases
Case 1: Full-stack developer with multiple projects
A developer works on a Next.js project and a Python FastAPI project. Thanks to memory, Hermes automatically knows which framework to use depending on the working directory, which commands to run for tests, and which conventions to follow in each project.
Case 2: Multi-server system administrator
The admin manages servers with different configurations (SSH ports, distributions, database versions). Memory retains each server's specifics, avoiding costly errors from connecting to the wrong port or with the wrong key.
Case 3: Team with divergent preferences
On a shared machine, each user can have a different profile in USER.md. The agent adapts its response style and recommendations based on who it's interacting with.
Common Pitfalls and How to Avoid Them
Pitfall 1: Too much memory = wasted tokens
Every character in memory consumes tokens in every session, whether or not it turns out to be relevant. Filling memory with trivial information reduces space for truly important facts. Be selective: only keep what's hard to find otherwise.
Pitfall 2: Outdated information
Unmaintained memory becomes a hindrance. If an entry says "project uses React 17" while you've moved to React 19, the agent will suggest inappropriate solutions. Clean regularly.
Pitfall 3: Duplicating context files
Don't store in memory what's already in your context files. Hermes reads HERMES.md and AGENTS.md automatically — repeating them in MEMORY.md is wasteful.
Pitfall 4: Contradictory instructions
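If one memory entry says "use npm" and another says "use pnpm", the agent has no way to tell which one is current and may follow either. Ensure consistency across your entries: when a preference changes, replace the old entry rather than adding a second one.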
Security: Automatic Entry Scanning
Memory entries are automatically scanned for prompt injection attempts, credential exfiltration, and SSH backdoors. Content matching threat patterns is blocked before being accepted. This protection is crucial since memory content is injected into the system prompt.
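A pattern-based scanner of this kind might look like the sketch below. The three regular expressions are hypothetical and deliberately simplistic; they are not the patterns Hermes Agent uses, and a real scanner would need a much broader rule set.

```python
import re

# Hypothetical patterns, chosen only to illustrate the three threat classes.
THREAT_PATTERNS = {
    "prompt_injection": re.compile(
        r"ignore (all )?(previous|prior) instructions", re.IGNORECASE
    ),
    "credential_exfiltration": re.compile(
        r"\b(curl|wget)\b.*(api[_-]?key|token|password)", re.IGNORECASE
    ),
    "ssh_backdoor": re.compile(r"authorized_keys", re.IGNORECASE),
}


def scan_entry(content: str) -> list[str]:
    """Return the names of all threat patterns found in content."""
    return [name for name, pattern in THREAT_PATTERNS.items() if pattern.search(content)]


def accept_entry(entries: list[str], content: str) -> list[str]:
    """Append content only if the scan finds nothing; otherwise refuse it."""
    threats = scan_entry(content)
    if threats:
        raise ValueError("entry blocked: " + ", ".join(threats))
    return entries + [content]
```

The check runs before the entry is stored, so blocked content never reaches the file and therefore never reaches a later system prompt.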
Configuration
Memory is configured in ~/.hermes/config.yaml:
memory:
  memory_enabled: true
  user_profile_enabled: true
  memory_char_limit: 2200   # ~800 tokens
  user_char_limit: 1375     # ~500 tokens
The default limits are balanced for most use cases, but you can adjust them — keep in mind that more memory = more tokens consumed per session.
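To estimate what a different limit would cost, note that both defaults imply the same ratio: 2,200 / 800 = 1,375 / 500 = 2.75 characters per token. The sketch below reuses that ratio as a rough estimator; actual token counts depend on the model's tokenizer and the text itself, and the function names are hypothetical.

```python
# Ratio implied by the default limits in this article: 2200 chars ~ 800 tokens.
CHARS_PER_TOKEN = 2200 / 800  # 2.75


def estimated_tokens(char_limit: int) -> int:
    """Rough token estimate for a character limit, using the implied ratio."""
    return round(char_limit / CHARS_PER_TOKEN)


def total_prompt_budget(memory_char_limit: int = 2200, user_char_limit: int = 1375) -> int:
    """Estimated tokens added to every session's system prompt by both stores."""
    return estimated_tokens(memory_char_limit) + estimated_tokens(user_char_limit)
```

With the defaults this gives roughly 1,300 tokens per session; doubling memory_char_limit to 4,400 would raise the estimate to about 2,100.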
External Memory Providers
To go beyond native capabilities, Hermes Agent supports 8 external memory provider plugins: Honcho, OpenViking, Mem0, Hindsight, Holographic, RetainDB, ByteRover, and Supermemory. These providers work alongside built-in memory and add capabilities like knowledge graphs, semantic search, and automatic fact extraction.
Set up an external provider with:
hermes memory setup # pick a provider and configure it
hermes memory status # check what's active
Conclusion
Persistent memory is one of the most transformative features of Hermes Agent. It turns a one-time assistant into a true partner that learns and adapts over time. The key is discipline: concise entries, regularly cleaned, well-separated between facts and instructions. Combined with the skills system (procedural memory) and session search, it creates a living knowledge base that makes each session more productive than the last.
To go further in mastering Hermes Agent, explore our previous articles:
- Introduction and Installation
- Configuring Models and Providers
- Available Tools
- Mastering the CLI
- Sessions and Context
- Context Files