🧠 The problem: why LLMs forget everything
To understand the problem, you have to understand how an LLM works. When you send a message to Claude or GPT, here's what happens:
- Your message is converted into tokens (chunks of text)
- The model processes these tokens within a context window (e.g., 200K tokens for Claude)
- It generates a response
- Everything is discarded. The next conversation starts from scratch.
There is no persistent "brain". No hard drive. No database. The model only has what is sent to it in the prompt.
💡 Analogy: Imagine an expert to whom you show a file at every meeting, then take the file back as you leave. Tomorrow, they will have no memory of your exchange.
This is why memory is not a "nice to have" — it is the feature that turns a chatbot into a true personal assistant.
🗂️ The three types of AI memory
Any AI memory architecture can be broken down into three levels:
| Type | Duration | Mechanism | Example |
|---|---|---|---|
| Short-term | A conversation | LLM context window | The last 10 messages exchanged |
| Medium-term | A session | Summary/state injected into the prompt | "The user is working on an e-commerce site" |
| Long-term | Permanent | Files, database, vector DB | Preferences, decisions, project history |
Short-term memory: the context
This is the "free" memory. When you chat with a chatbot, previous messages are sent back to the model with every exchange. But this memory has a physical limit: the context window.
- Claude 3.5 Sonnet: 200K tokens (~150,000 words)
- GPT-4o: 128K tokens (~96,000 words)
- Gemini 2.0: 1M tokens (~750,000 words)
That seems huge, but in practice, filling the context is expensive (in billed tokens) and slows down responses.
Medium-term memory: the session summary
Rather than sending the entire conversation, it can be summarized. With each exchange, a condensed system updates a state:
Session state: The user's name is Nicolas. He is building a tech blog
with Flask. He prefers concise answers. We decided to use
SQLite for the database.
This summary is injected at the beginning of each prompt. It preserves the essential context without exploding the window.
Long-term memory: persistence
This is where it gets interesting. Long-term memory survives sessions. It is stored outside the model — in files, databases, or vector systems.
It is this memory that transforms your avatar from a tool into a true digital alter ego.
📁 File-based memory: simple and efficient
The most straightforward approach to giving memory to an AI is to use text files. This is exactly what OpenClaw does with its native system.
MEMORY.md: consolidated memory
The MEMORY.md file at the root of the workspace contains curated and permanent information. It is structured into several sections: the user's identity (name, role, tech stack), their communication and coding preferences, the list of active projects with their status, and a history of important dated decisions. This file acts as an always up-to-date ID card, automatically read at the start of each session.
memory/YYYY-MM-DD.md: daily notes
For day-to-day details, OpenClaw uses dated files in the memory/ folder. Each daily note follows a structured format in three parts: the work done (concrete actions of the day), the decisions made (technical or strategic choices), and things to remember (reminders, constraints, future requests). This daily approach makes it possible to precisely track the evolution of projects without polluting the main file.
This approach has the advantage of being:
- Readable by a human (it's Markdown)
- Versionable (Git-compatible)
- Inexpensive (no external service)
- Reliable (a file does not "hallucinate")
🔍 RAG : Retrieval Augmented Generation
When memory becomes large (hundreds of pages, thousands of notes), reading all the files is no longer viable. This is where RAG comes in.
The principle
RAG works in three steps:
- Indexing: Your documents are split into chunks and converted into embeddings — numerical vectors that represent the meaning of the text
- Retrieval: When the AI needs information, it converts the question into an embedding and searches for the most similar chunks in the database
- Generation: The relevant chunks are injected into the prompt, and the LLM generates its answer with this context
In practice, when a question is asked (for example, "Which database did we choose for AI-master.dev?"), it is first transformed into a numerical vector. This vector is compared to all those stored in the vector database to find the most semantically close chunks. These relevant excerpts are then injected into the LLM's prompt, which can generate an accurate and contextualized answer.
Embeddings in 30 seconds
An embedding is a numerical representation of the meaning of a text. Two sentences that talk about the same subject will have close embeddings in the vector space, even if they use different words. For example, "Le chat dort sur le canapé" and "Le félin se repose sur le sofa" will have a cosine similarity score close to 1.0, indicating that they share the same meaning. It is this property that enables semantic search: finding information by meaning, not by exact words.
🗄️ Vector databases: the comparison
To store and query these embeddings, you need a vector database. Here are the three main ones:
| Criteria | ChromaDB | Pinecone | Weaviate |
|---|---|---|---|
| Type | Open-source, local | Managed cloud | Open-source, self-hosted or cloud |
| Installation | pip install chromadb |
None (SaaS) | Docker or cloud |
| Cost | Free | Freemium (paid at scale) | Free (self-hosted) / paid (cloud) |
| Scalability | Small-medium scale | Large scale | Large scale |
| Ease of use | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Ideal for | Personal projects, prototypes | SaaS production | Complex projects, multi-modal |
| Persistence | Local file | Cloud | Docker volume or cloud |
| Latency | < 10ms (local) | ~50ms (network) | Variable |
ChromaDB: the pragmatic choice
For a personal AI avatar, ChromaDB is often the best choice. It installs with a single Python command and runs entirely locally. Its principle is simple: you create a persistent client that stores data in a folder, then you create a collection (e.g., "avatar_memory") configured with a cosine distance. You add documents as text, each associated with a unique identifier and metadata (type, date). The search is then performed by querying the collection with natural language text, and ChromaDB returns the most semantically similar documents.
Pinecone: for production
If your avatar needs to handle millions of memories or be used by multiple users simultaneously, Pinecone is the cloud choice. Pinecone works as a managed vector database: you send vectors (generated by your embedding model) associated with textual metadata via an API call, and Pinecone takes care of indexing, storage, and large-scale search. The main advantage is that there is no infrastructure to manage — just send the data and query the index.
🏗️ Structured memory vs raw memory
Two philosophies clash when it comes to organizing long-term memory:
Raw memory (append-only)
Everything is stored as-is, in chronological order. Every interaction, every decision, every note is appended sequentially.
Advantages:
- Simple to implement
- Nothing is lost
- Complete history
Disadvantages:
- Grows indefinitely
- Redundancies
- Slow search without indexing
Structured memory (curated)
Memory is organized by categories, duplicates are merged, and obsolete data is archived. The file is organized into clear sections: user profile (name, role, preferences as tags), active projects (with stack, URL, and status), and business rules (recurring processes to follow). This tabular and hierarchical structure allows for fast access to information without having to go through a chronological history.
Advantages:
- Compact and readable
- Fast access
- Easy to audit
Disadvantages:
- Requires curation logic
- Risk of losing nuances
The right approach: hybrid
In practice, the best strategy combines both:
- MEMORY.md → structured memory, curated manually or by AI
- memory/YYYY-MM-DD.md → daily raw memory
- Vector DB → semantic index for searching the volume
⚙️ How OpenClaw manages memory
OpenClaw features a native memory system designed for AI avatars. Here is how it works in practice.
The startup protocol
Each session begins by reading key files, configured in AGENTS.md. The protocol follows a specific order: first reading SOUL.md to load the avatar's identity, then USER.md to understand the user, followed by daily notes (journal and monitoring) for recent context, and finally MEMORY.md for long-term memory. This sequential loading ensures that the AI always has the necessary context without overloading the window.
Native memory tools
OpenClaw exposes dedicated tools to interact with memory:
| Tool | Function | Usage |
|---|---|---|
memory_search |
Semantic search in memory files | Retrieve a past decision |
memory_get |
Direct reading of a memory file | Load MEMORY.md or a daily note |
read |
Reading of any file | Access configuration files |
write / edit |
File writing | Update memory |
Concrete usage example
When you ask your avatar: "What did we decide for the database?", here is what happens:
- The AI first checks
MEMORY.md(already in context) - If the info isn't there, it uses
memory_searchto look through the daily notes - It finds the December 15 entry in
memory/2024-12-15.md - It replies: "We chose SQLite on December 15, for its simplicity and because AI-master.dev doesn't need concurrent queries."
The whole process is seamless — the AI knows where to look and how to synthesize.
🛠️ Complete example: setting up your avatar's memory
Let's put it all together. Here is how to set up a complete memory system for your AI avatar with OpenClaw.
Step 1: Create the structure
Start by creating a memory folder at the root of your OpenClaw workspace, then initialize the main MEMORY.md file. This file must contain a basic structure with pre-filled sections: user identity (name, role, language), preferences (response style, code format), active projects (to be filled in by the AI as you go), important decisions (log of technical choices), and recurring rules (things to always apply). This empty structure serves as a skeleton that the avatar will enrich over the course of sessions.
Step 2: Configure the protocol in AGENTS.md
Refer to the guide on SOUL and AGENTS configuration to set up automatic memory loading. In your AGENTS.md file, add a session protocol that specifies the order in which files are read: SOUL.md first, then USER.md, followed by today's and yesterday's daily notes, and finally MEMORY.md. This order ensures a progressive loading of context, from the most specific (personality) to the most global (long-term memory).
Step 3: Instruct the AI to maintain its memory
Add clear memory management instructions to your SOUL.md (the file that defines the avatar's personality). The AI must know that after every significant work session, it must update the current day's daily note. When an important decision is made, it must add it to MEMORY.md. The same goes for expressed user preferences and project developments. The golden rule to instill in it: if it doesn't write it down, it will forget it.
Step 4: Add a vector database (optional)
For projects with a lot of history, add ChromaDB. The indexing process involves creating a persistent client linked to a local folder, then automatically scanning all Markdown files in the memory/ folder. Each file is split into paragraphs of more than 50 characters, and each paragraph is inserted into the collection with a unique identifier (file name + index), the paragraph text, and metadata (source file and date). This script can be executed periodically via an OpenClaw cron to keep the index up to date.
⚠️ The pitfalls of long-term memory
Giving memory to an AI is not without risks. Here are the most common pitfalls.
1. Memory that grows indefinitely
If you store everything without ever cleaning up, your memory will become a swamp:
- Context files become too long for the LLM window
- Token cost explodes (each session loads more memory)
- Outdated information pollutes responses
2. Hallucinations on old memories
When an LLM reads an old and ambiguous memory, it can misinterpret it. For example:
Memory: "In March, we discussed migrating to PostgreSQL"
The AI might conclude that you have migrated, whereas you only discussed doing it. This is a hallucination fed by the memory itself.
Solution: Be precise in your notes. Write "Decided to..." or "Discussed..." — never leave ambiguity.
3. Memory conflicts
When information is updated in MEMORY.md but an old daily note contains the old version, the AI can get confused. Which source is authoritative?
Solution: Establish a clear hierarchy:
1. MEMORY.md → source of truth (the most recent)
2. Recent notes → fresh context
3. Old files → history, only consulted via search
4. Sensitive data leaks
Your memory can contain sensitive information: API keys, passwords mentioned in passing, personal data. If the memory is in a public Git repo or a cloud service...
Solution:
- Never store secrets in memory
- Use .gitignore for sensitive files
- Host on a server you control (a VPS dédié for example, with 20% off)
🧹 Cleaning and archiving strategies
To prevent memory from becoming unmanageable, here are some proven strategies.
Quarterly archiving
The archiving strategy consists of creating an archive folder per quarter (for example memory/archives/2025-Q1/), then automatically moving all daily notes older than 90 days to this folder. This process can be automated with a scheduled script that identifies.md` files older than 90 days in the main folder and moves them to the corresponding archive. This keeps the working folder clean while keeping the history accessible.
Automatic curation of MEMORY.md
You can ask the AI itself to clean its memory. The curation process comes down to a monthly prompt that asks the avatar to read MEMORY.md in its entirety, delete obsolete information, merge duplicates, move completed projects to an "Archives" section, verify the accuracy of each entry, and keep the file under 200 lines. The AI cleans up its own memory.
The 3-tier rule
| Level | Content | Retention | Action |
|---|---|---|---|
| Hot | MEMORY.md + last 7 days | Permanent (curated) | Read at each session |
| Warm | Notes from the last 90 days | 90 days | Accessible via search |
| Cold | Quarterly archives | 1 year | Indexed in vector DB only |
📊 Which memory solution for your use case?
The choice of strategy depends on your usage:
| Use case | Recommended solution | Complexity | Cost |
|---|---|---|---|
| Personal assistant | MEMORY.md + daily notes | ⭐ | Free |
| Pro AI avatar | Files + ChromaDB | ⭐⭐ | Free |
| Multi-user chatbot | Pinecone + Redis | ⭐⭐⭐⭐ | ~$70/month |
| Autonomous agent | OpenClaw natif + ChromaDB | ⭐⭐ | Free |
| Enterprise (100+ users) | Weaviate + PostgreSQL | ⭐⭐⭐⭐⭐ | Variable |
| Quick prototype | Simple JSON file | ⭐ | Free |
| Mobile app | Pinecone serverless | ⭐⭐⭐ | Freemium |
For most personal AI avatars, the combination of files + OpenClaw natif is more than enough. Only add complexity when the need arises.
🎯 The key takeaways
- LLMs have no persistent memory: everything is forgotten between sessions
- Three levels of memory exist: short-term (context), medium-term (summary), long-term (files/databases)
- A well-structured
MEMORY.mdfile is enough to get started - RAG with embeddings allows searching through a large volume of memories
- ChromaDB is the most pragmatic choice for a personal avatar in 2025
- Monthly curation is essential to avoid memory pollution
❌ Common mistakes
- Storing everything without sorting : raw memory grows indefinitely and eventually costs a lot in tokens and pollutes responses
- Being vague in notes : writing "we discussed PostgreSQL" instead of "decided to stick with SQLite" creates hallucinations
- Ignoring the hierarchy of sources : without a clear rule (MEMORY.md is authoritative), the AI mixes old and new
- Storing sensitive data : API keys or passwords in memory files = risk of leakage
- Adding complexity too early : jumping straight to Pinecone or Weaviate when a Markdown file is enough
❓ FAQ
Is file memory sufficient for personal use?
Yes. For an avatar used by a single person, the combination of MEMORY.md + daily notes covers 95% of needs. Only add ChromaDB when you exceed a few hundred notes.
How much does using a cloud vector database cost?
Pinecone offers a limited free tier. In production with millions of vectors, expect to pay around $70 per month in 2025. ChromaDB and Weaviate in self-hosted are free.
How can I prevent the AI from hallucinating based on old memories?
Be precise in your notes: use "Decided to" vs "Discussed", date each entry, and curate monthly. Also establish a clear hierarchy: MEMORY.md is the source of truth.
Should I version memory with Git?
Yes, it is recommended. This allows you to track the evolution of memory, revert in case of overly aggressive curation, and naturally backup.
🛒 Recommended tools
- OpenClaw — Try OpenClaw: native memory system for AI avatars (files + integrated semantic search)
- ChromaDB — Open-source local vector database, ideal for personal projects
- Pinecone — Managed cloud vector database, suitable for production and multi-user environments
- Anthropic's Claude — Try Claude: 200K context tokens, excellent for avatars with loaded memory
- OpenRouter — Try OpenRouter: compare models and their context handling
🎯 Conclusion: memory is what makes the difference
An AI avatar without memory is a tool. An AI avatar with memory is a partner.
The good news is that you don't need a complex architecture to get started. A well-maintained MEMORY.md file, daily notes, and a clear protocol in your AGENTS.md configuration are enough to radically transform the experience.
Start simple:
1. Create your MEMORY.md
2. Configure the session protocol
3. Let your avatar take notes
4. Clean up monthly
And when your memory exceeds a few hundred pages, you'll add ChromaDB or another semantic search tool. The important thing is to start now — every day without memory is a day of lost experience.
You can check out the OpenClaw source code on GitHub to see how the memory system is implemented, or discover the best tools for creating an AI avatar in 2025 to choose the stack that suits you.