
How to Train Your AI Avatar with Your Own Data

AI Avatars 🔴 Advanced ⏱️ 17 min read 📅 2026-02-24

How do you train an AI avatar with your own data? That is the key question when transforming a generic assistant into a true digital twin that speaks, thinks, and reacts like you. An AI avatar fed with your emails, documents, notes, and conversations becomes a remarkably powerful tool, provided you know how to go about it.

In this advanced guide, we explore the three main approaches—prompting, RAG, and fine-tuning—with code, comparative tables, and a complete example of training on 500 documents.

🎯 Why a Generic Avatar Isn’t Enough

A language model like Anthropic’s Claude excels at general knowledge. But ask it about your internal billing process, industry jargon, or customer preferences, and it will invent a plausible but false answer.

The fundamental problem: LLMs don’t know YOUR data. They were trained on the internet, not your business.

A truly useful AI avatar must:

  • Know your context: history, clients, products, processes
  • Adopt your tone: formal, casual, technical—your unique style
  • Respond accurately: cite your documents, don’t hallucinate
  • Evolve: integrate new data over time

The good news? Three approaches can achieve this, depending on your budget and technical skills.

🔀 The 3 Approaches: Prompting, RAG, and Fine-Tuning

Before diving into details, here’s an overview of the three strategies to personalize an AI avatar.

Advanced Prompting (Easy Level)

You inject your data directly into the prompt (system message). The model uses this context to respond. No additional infrastructure required.

RAG — Retrieval-Augmented Generation (Intermediate Level)

Your documents are chunked, vectorized, and stored in a vector database. For each question, relevant passages are retrieved and injected into the prompt. The model responds based on these extracts.

Fine-Tuning (Advanced Level)

You (partially) retrain the model on your data. The knowledge is embedded in the network’s weights. More expensive, but the model "knows" your data natively.

📊 Comparative Table of the 3 Approaches

| Criteria | Advanced Prompting | RAG | Fine-Tuning |
|---|---|---|---|
| Difficulty | ⭐ Easy | ⭐⭐ Intermediate | ⭐⭐⭐ Advanced |
| Initial cost | ~$0 | $50–$200 | $500–$5,000 |
| Recurring cost | Tokens (long context) | Vector DB hosting | Periodic retraining |
| Data volume | <50 pages | 50 to 100,000+ docs | 1,000+ structured examples |
| Response quality | Good if context is sufficient | Very good | Excellent in domain |
| Data freshness | Immediate (copy-paste) | Near real-time | Requires retraining |
| Hallucinations | Medium risk | Low (sources cited) | Low but possible |
| Maintenance | Manual | Automatable | Heavy |
| Latency | Low | Medium (+retrieval) | Low |
| Ideal for | Prototyping, small volumes | Production, evolving docs | Specific tone/style, niche domain |

💡 Advanced Prompting: Techniques and Examples

Advanced prompting is the most accessible entry point. Three techniques stand out.

Few-Shot Prompting

Provide examples of ideal conversations in the system prompt:

You are the AI avatar of Marie Dupont, a digital transformation consultant.

Here are examples of how Marie responds:

Client: "How much does a digital audit cost?"
Marie: "Our audits start at €3,500 (excl. VAT) for SMEs with fewer than 50 employees.
The deliverable includes a 30-page report with a prioritized action plan.
We can discuss this in a free 30-minute call—should I send you my Calendly link?"

Client: "What tools do you use?"
Marie: "My main stack: Notion for project management, Miro for workshops,
and Power BI for dashboards. For AI, I recommend Claude for writing and Midjourney for visuals."
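When calling the model through a chat API, this few-shot block goes into the system message. A minimal sketch of assembling the request payload (the example content is abbreviated, and the `build_messages` helper is an assumption, not part of any SDK):

```python
# Few-shot system prompt: identity plus example exchanges (abbreviated here)
FEW_SHOT_SYSTEM = """You are the AI avatar of Marie Dupont, a digital transformation consultant.

Here are examples of how Marie responds:

Client: "How much does a digital audit cost?"
Marie: "Our audits start at €3,500 (excl. VAT) for SMEs with fewer than 50 employees."
"""

def build_messages(question: str) -> list[dict]:
    # The examples steer tone and structure; the user turn carries the new question
    return [
        {"role": "system", "content": FEW_SHOT_SYSTEM},
        {"role": "user", "content": question},
    ]

# The resulting list is passed as-is to a chat-completions endpoint
messages = build_messages("Do you work with retail companies?")
```

The more exchanges you include, the more reliably the model imitates the style, at the cost of a longer prompt on every request.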

Chain-of-Thought (CoT)

Ask the model to reason step-by-step before answering:

When a client asks a complex question, reason as follows:
1. Identify the real need behind the question
2. Search the provided context for relevant information
3. Structure the answer with concrete figures
4. Propose a next step (call, quote, resource)

Complete System Prompt Template

# IDENTITY
You are the AI avatar of [NAME], [TITLE] at [COMPANY].

# STYLE
- Tone: professional yet approachable
- Length: concise answers (3–5 sentences), elaborate if requested
- Signature: always end with a question or CTA

# KNOWLEDGE (injected)
[Paste your FAQs, pricing, processes here—up to ~30 pages]

# RULES
- Never invent numbers. If unsure, say so.
- Always cite the source when using a document.
- Redirect to a human for: legal, medical, serious complaints.

Limitations: The context window is limited (200K tokens for Claude, ~150,000 words). Beyond this, switch to RAG.
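A quick way to check whether you are approaching that limit is a rough token estimate. A sketch using the common ~4-characters-per-token heuristic (the context limit and reserve values are assumptions; adapt them to your model):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic for English text: about 4 characters per token
    return len(text) // 4

def fits_in_prompt(knowledge: str, context_limit: int = 200_000,
                   reserve: int = 20_000) -> bool:
    # Reserve room for the conversation itself and the model's reply
    return estimate_tokens(knowledge) <= context_limit - reserve
```

If `fits_in_prompt` returns False for your knowledge base, that is the signal to move to a RAG pipeline rather than pasting everything into the system prompt.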

🔍 RAG in Detail: The Complete Pipeline

RAG is the most popular approach in production. Here’s the full pipeline with functional Python code.

Pipeline Architecture

Documents → Chunking → Embeddings → Vector Store
                                         ↓
User question → Embedding → Similarity search → Top-K chunks
                                                              ↓
                                              Prompt + chunks → LLM → Response

Step 1: Document Chunking

Split your documents into 500–1,000-token chunks with overlap:

from pathlib import Path

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " "]
)

# Load your documents
documents = []
for filepath in Path("./my_docs").glob("**/*.md"):
    text = filepath.read_text()
    chunks = splitter.split_text(text)
    for i, chunk in enumerate(chunks):
        documents.append({
            "text": chunk,
            "source": str(filepath),
            "chunk_id": f"{filepath.stem}_{i}"
        })

print(f"{len(documents)} chunks created")

Step 2: Generate Embeddings

Convert each chunk into a numerical vector:

import openai

client = openai.OpenAI(
    base_url="https://openrouter.ai/api/v1",  # via OpenRouter
    api_key="sk-or-..."
)

def get_embedding(text: str) -> list[float]:
    response = client.embeddings.create(
        model="openai/text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

# Vectorize all chunks
for doc in documents:
    doc["embedding"] = get_embedding(doc["text"])

You can use OpenRouter to access different embedding models via a single API.
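Vectorizing chunks one request at a time is slow for large corpora; most embedding endpoints also accept a list of inputs per request. A sketch of a batching wrapper (here `embed_fn` stands in for a call such as `client.embeddings.create` with a list input, and the batch size of 100 is an assumption):

```python
def batched(items: list, size: int = 100):
    # Yield fixed-size slices so each request stays under payload limits
    for i in range(0, len(items), size):
        yield items[i:i + size]

def embed_all(documents: list[dict], embed_fn, batch_size: int = 100) -> None:
    # embed_fn: takes a list of strings, returns one vector per string
    for batch in batched(documents, batch_size):
        vectors = embed_fn([doc["text"] for doc in batch])
        for doc, vector in zip(batch, vectors):
            doc["embedding"] = vector
```

Batching typically cuts vectorization time by an order of magnitude compared with one request per chunk.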

Step 3: Store in a Vector Database

import chromadb

client_db = chromadb.PersistentClient(path="./avatar_vectordb")
collection = client_db.get_or_create_collection(
    name="my_documents",
    metadata={"hnsw:space": "cosine"}
)

# Insert chunks
collection.add(
    ids=[doc["chunk_id"] for doc in documents],
    embeddings=[doc["embedding"] for doc in documents],
    documents=[doc["text"] for doc in documents],
    metadatas=[{"source": doc["source"]} for doc in documents]
)

print(f"{collection.count()} vectors stored")

Step 4: Retrieval and Generation

def ask_avatar(question: str, n_results: int = 5) -> str:
    # 1. Search for relevant chunks
    results = collection.query(
        query_embeddings=[get_embedding(question)],
        n_results=n_results
    )

    # 2. Build context
    context = "\n\n---\n\n".join([
        f"[Source: {meta['source']}]\n{doc}"
        for doc, meta in zip(results["documents"][0], results["metadatas"][0])
    ])

    # 3. Generate response
    response = client.chat.completions.create(
        model="anthropic/claude-sonnet-4",
        messages=[
            {"role": "system", "content": f"""You are the AI avatar of a consultant.
Respond ONLY based on the provided context.
If the information isn’t in the context, say so clearly.

CONTEXT:
{context}"""},
            {"role": "user", "content": question}
        ]
    )

    return response.choices[0].message.content

# Test
print(ask_avatar("What are our audit pricing rates?"))

Key RAG Optimizations

| Technique | Impact | Complexity |
|---|---|---|
| Hybrid search (BM25 + vectors) | +15–20% relevance | Medium |
| Reranking (Cohere, cross-encoder) | +10–15% relevance | Low |
| Semantic chunking | Better coherence | Medium |
| Metadata filtering | Targeted responses | Low |
| Query expansion | Better recall | Low |
| Parent-child chunks | Richer context | Medium |
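To illustrate hybrid search: the rankings produced by a keyword retriever (BM25) and a vector retriever can be merged with reciprocal rank fusion. A minimal sketch (the constant `k = 60` is the value commonly used in the literature; the retriever outputs are placeholder lists of chunk IDs):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    # rankings: each inner list holds chunk IDs ordered by one retriever, best first.
    # RRF score = sum over retrievers of 1 / (k + rank); no score calibration needed.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Chunks found by both retrievers rise to the top of the fused list
fused = reciprocal_rank_fusion([["a", "b", "c"], ["b", "c", "d"]])
```

Because RRF only uses ranks, it sidesteps the problem of BM25 and cosine scores living on incompatible scales.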

🧬 Fine-Tuning: When and How

Fine-tuning modifies the model’s weights. It’s the heaviest but most powerful approach for style and tone.

When Fine-Tuning Is Justified

  • Your avatar needs a highly specific style (technical jargon, unique tone)
  • You have 1,000+ examples of ideal conversations
  • RAG isn’t enough to capture complex reasoning patterns
  • You want to reduce latency (no retrieval needed)

Prepare a JSONL Dataset

{"messages": [{"role": "system", "content": "You are the avatar of Dr. Martin, a cardiologist."}, {"role": "user", "content": "What are the symptoms of arrhythmia?"}, {"role": "assistant", "content": "Classic signs include palpitations, irregular heartbeat sensations, sometimes dizziness or shortness of breath. I systematically recommend a 24-hour ECG (Holter) to confirm the issue. Book an appointment via Doctolib to discuss."}]}
{"messages": [{"role": "system", "content": "You are the avatar of Dr. Martin, a cardiologist."}, {"role": "user", "content": "How much does a cardiac checkup cost?"}, {"role": "assistant", "content": "A full cardiac checkup in my office includes: consultation (€50, reimbursed), ECG (€35), and echocardiogram if needed (€95). Most costs are covered by national insurance + private insurance. My assistant can detail any out-of-pocket expenses."}]}

Dataset Preparation Script

import json
from pathlib import Path

def prepare_finetune_dataset(conversations_dir: str, output: str):
    """Converts conversations to JSONL format for fine-tuning."""
    dataset = []

    for file in Path(conversations_dir).glob("*.json"):
        conv = json.loads(file.read_text())

        # Validate structure
        if not all(m.get("role") in ("system", "user", "assistant")
                   for m in conv["messages"]):
            print(f"⚠️ Invalid format: {file}")
            continue

        # Check for at least 1 user/assistant exchange
        roles = [m["role"] for m in conv["messages"]]
        if "user" not in roles or "assistant" not in roles:
            continue

        dataset.append(conv)

    # Save
    with open(output, "w") as f:
        for item in dataset:
            f.write(json.dumps(item, ensure_ascii=False) + "\n")

    print(f"✅ {len(dataset)} conversations exported → {output}")

prepare_finetune_dataset("./conversations/", "avatar_finetune.jsonl")
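Before submitting the JSONL file to a fine-tuning job, it is worth holding out a validation split to detect overfitting during training. A small sketch (the 10% ratio and fixed seed are assumptions):

```python
import json
import random

def split_dataset(jsonl_path: str, val_ratio: float = 0.1, seed: int = 42):
    # Shuffle deterministically, then hold out a slice for validation
    with open(jsonl_path) as f:
        items = [json.loads(line) for line in f if line.strip()]
    random.Random(seed).shuffle(items)
    n_val = max(1, int(len(items) * val_ratio))
    return items[n_val:], items[:n_val]  # (train, validation)
```

A rising validation loss while training loss keeps falling is the classic sign that the model is memorizing your examples instead of learning your style.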

Estimated Fine-Tuning Costs

| Model | Training Cost | Inference Cost | Technique |
|---|---|---|---|
| GPT-4o mini fine-tuned | ~$3 / 1M tokens | $0.30 / 1M tokens | Full fine-tune |
| Llama 3.1 8B (LoRA) | ~$20 on RunPod | Self-hosted | LoRA / QLoRA |
| Mistral 7B (LoRA) | ~$15 on | | |