How do you train an AI avatar on your own data? That is the key question in turning a generic assistant into a true digital twin that speaks, thinks, and reacts like you. An AI avatar fed with your emails, documents, notes, and conversations becomes a remarkably powerful tool, provided you know how to go about it.
In this advanced guide, we explore the three main approaches—prompting, RAG, and fine-tuning—with code, comparative tables, and a complete example of training on 500 documents.
🎯 Why a Generic Avatar Isn’t Enough
A language model like Anthropic’s Claude excels at general knowledge. But ask it about your internal billing process, industry jargon, or customer preferences, and it will invent a plausible but false answer.
The fundamental problem: LLMs don’t know YOUR data. They were trained on the internet, not your business.
A truly useful AI avatar must:
- Know your context: history, clients, products, processes
- Adopt your tone: formal, casual, technical—your unique style
- Respond accurately: cite your documents, don’t hallucinate
- Evolve: integrate new data over time
The good news? Three approaches can achieve this, depending on your budget and technical skills.
🔀 The 3 Approaches: Prompting, RAG, and Fine-Tuning
Before diving into details, here’s an overview of the three strategies to personalize an AI avatar.
Advanced Prompting (Easy Level)
You inject your data directly into the prompt (system message). The model uses this context to respond. No additional infrastructure required.
RAG — Retrieval-Augmented Generation (Intermediate Level)
Your documents are chunked, vectorized, and stored in a vector database. For each question, relevant passages are retrieved and injected into the prompt. The model responds based on these extracts.
Fine-Tuning (Advanced Level)
You (partially) retrain the model on your data. The knowledge is embedded in the network’s weights. More expensive, but the model "knows" natively.
📊 Comparative Table of the 3 Approaches
| Criteria | Advanced Prompting | RAG | Fine-Tuning |
|---|---|---|---|
| Difficulty | ⭐ Easy | ⭐⭐ Intermediate | ⭐⭐⭐ Advanced |
| Initial Cost | ~$0 | $50–$200 | $500–$5,000 |
| Recurring Cost | Tokens (long context) | Vector DB hosting | Periodic retraining |
| Data Volume | <50 pages | 50 to 100,000+ docs | 1,000+ structured examples |
| Response Quality | Good if context is sufficient | Very good | Excellent in domain |
| Data Freshness | Immediate (copy-paste) | Near real-time | Requires retraining |
| Hallucinations | Medium risk | Low (sources cited) | Low but possible |
| Maintenance | Manual | Automatable | Heavy |
| Latency | Low | Medium (+retrieval) | Low |
| Ideal For | Prototyping, small volumes | Production, evolving docs | Specific tone/style, niche domain |
💡 Advanced Prompting: Techniques and Examples
Advanced prompting is the most accessible entry point. Three techniques stand out.
Few-Shot Prompting
Provide examples of ideal conversations in the system prompt:
```
You are the AI avatar of Marie Dupont, a digital transformation consultant.

Here are examples of how Marie responds:

Client: "How much does a digital audit cost?"
Marie: "Our audits start at €3,500 (excl. VAT) for SMEs with fewer than 50 employees.
The deliverable includes a 30-page report with a prioritized action plan.
We can discuss this in a free 30-minute call—should I send you my Calendly link?"

Client: "What tools do you use?"
Marie: "My main stack: Notion for project management, Miro for workshops,
and Power BI for dashboards. For AI, I recommend Claude for writing and Midjourney for visuals."
```
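In code, a few-shot system prompt like the one above can be assembled programmatically from a list of example exchanges. A minimal sketch, where `build_few_shot_prompt` and the sample pairs are illustrative helpers, not part of any library:

```python
def build_few_shot_prompt(identity: str, examples: list[tuple[str, str]]) -> str:
    """Assemble a system prompt from an identity line and (client, reply) pairs."""
    lines = [identity, "", "Here are examples of how to respond:"]
    for question, answer in examples:
        lines.append(f'\nClient: "{question}"')
        lines.append(f'Reply: "{answer}"')
    return "\n".join(lines)

system_prompt = build_few_shot_prompt(
    "You are the AI avatar of Marie Dupont, a digital transformation consultant.",
    [
        ("How much does a digital audit cost?",
         "Our audits start at €3,500 (excl. VAT) for SMEs with fewer than 50 employees."),
        ("What tools do you use?",
         "Notion for projects, Miro for workshops, Power BI for dashboards."),
    ],
)
print(system_prompt)
```

Keeping the examples in a list makes it easy to add or swap exchanges without rewriting the prompt by hand.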
Chain-of-Thought (CoT)
Ask the model to reason step-by-step before answering:
```
When a client asks a complex question, reason as follows:
1. Identify the real need behind the question
2. Search the provided context for relevant information
3. Structure the answer with concrete figures
4. Propose a next step (call, quote, resource)
```
Complete System Prompt Template
```
# IDENTITY
You are the AI avatar of [NAME], [TITLE] at [COMPANY].

# STYLE
- Tone: professional yet approachable
- Length: concise answers (3–5 sentences), elaborate if requested
- Signature: always end with a question or CTA

# KNOWLEDGE (injected)
[Paste your FAQs, pricing, processes here—up to ~30 pages]

# RULES
- Never invent numbers. If unsure, say so.
- Always cite the source when using a document.
- Redirect to a human for: legal, medical, serious complaints.
```
Limitations: the context window is finite (200K tokens for Claude, roughly 150,000 words). Beyond that, switch to RAG.
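A quick way to check whether you are approaching that limit is the rough rule of thumb of ~4 characters per token for English prose. A minimal sketch (the 4-characters heuristic is an approximation; use a real tokenizer such as `tiktoken` for precise counts):

```python
CONTEXT_WINDOW = 200_000  # tokens (Claude)

def estimate_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token for English prose."""
    return len(text) // 4

# Illustrative knowledge base: repeat a placeholder string to simulate volume
knowledge_base = "Your FAQs, pricing, processes..." * 1000
used = estimate_tokens(knowledge_base)
print(f"~{used} tokens ({used / CONTEXT_WINDOW:.1%} of the window)")
if used > CONTEXT_WINDOW * 0.5:
    print("Consider switching to RAG.")
```

If your injected knowledge regularly exceeds half the window, leaving no headroom for the conversation itself, that is the signal to move to the RAG pipeline below.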
🔍 RAG in Detail: The Complete Pipeline
RAG is the most popular approach in production. Here’s the full pipeline with functional Python code.
Pipeline Architecture
```
Documents → Chunking → Embeddings → Vector Store
                                         ↓
User question → Embedding → Similarity search → Top-K chunks
                                         ↓
                     Prompt + chunks → LLM → Response
```
Step 1: Document Chunking
Split your documents into 500–1,000-token chunks with overlap:
```python
from pathlib import Path

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " "]
)

# Load your documents
documents = []
for filepath in Path("./my_docs").glob("**/*.md"):
    text = filepath.read_text()
    chunks = splitter.split_text(text)
    for i, chunk in enumerate(chunks):
        documents.append({
            "text": chunk,
            "source": str(filepath),
            "chunk_id": f"{filepath.stem}_{i}"
        })

print(f"{len(documents)} chunks created")
```
Step 2: Generate Embeddings
Convert each chunk into a numerical vector:
```python
import openai

client = openai.OpenAI(
    base_url="https://openrouter.ai/api/v1",  # via OpenRouter
    api_key="sk-or-..."
)

def get_embedding(text: str) -> list[float]:
    response = client.embeddings.create(
        model="openai/text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

# Vectorize all chunks
for doc in documents:
    doc["embedding"] = get_embedding(doc["text"])
```
You can use OpenRouter to access different embedding models via a single API.
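One API call per chunk is slow for hundreds of documents; embedding endpoints generally accept a list of inputs per request, so batching cuts round-trips substantially. A minimal sketch where `embed_batch` is a stand-in for the real API call (e.g. passing a list as `input` to `client.embeddings.create`), injected as a parameter so the helper stays self-contained:

```python
from typing import Callable

def embed_in_batches(
    texts: list[str],
    embed_batch: Callable[[list[str]], list[list[float]]],
    batch_size: int = 100,
) -> list[list[float]]:
    """Embed texts in fixed-size batches; output order matches input order."""
    vectors: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        vectors.extend(embed_batch(batch))
    return vectors

# Stub embedder for demonstration (a real one would call the embeddings API)
fake_embed = lambda batch: [[float(len(t))] for t in batch]
vecs = embed_in_batches(["a", "bb", "ccc"], fake_embed, batch_size=2)
print(vecs)  # → [[1.0], [2.0], [3.0]]
```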
Step 3: Store in a Vector Database
```python
import chromadb

client_db = chromadb.PersistentClient(path="./avatar_vectordb")
collection = client_db.get_or_create_collection(
    name="my_documents",
    metadata={"hnsw:space": "cosine"}
)

# Insert chunks
collection.add(
    ids=[doc["chunk_id"] for doc in documents],
    embeddings=[doc["embedding"] for doc in documents],
    documents=[doc["text"] for doc in documents],
    metadatas=[{"source": doc["source"]} for doc in documents]
)

print(f"{collection.count()} vectors stored")
```
Step 4: Retrieval and Generation
```python
def ask_avatar(question: str, n_results: int = 5) -> str:
    # 1. Search for relevant chunks
    results = collection.query(
        query_embeddings=[get_embedding(question)],
        n_results=n_results
    )

    # 2. Build context
    context = "\n\n---\n\n".join(
        f"[Source: {meta['source']}]\n{doc}"
        for doc, meta in zip(results["documents"][0], results["metadatas"][0])
    )

    # 3. Generate response
    response = client.chat.completions.create(
        model="anthropic/claude-sonnet-4",
        messages=[
            {"role": "system", "content": f"""You are the AI avatar of a consultant.
Respond ONLY based on the provided context.
If the information isn’t in the context, say so clearly.

CONTEXT:
{context}"""},
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content

# Test
print(ask_avatar("What are our audit pricing rates?"))
```
Key RAG Optimizations
| Technique | Impact | Complexity |
|---|---|---|
| Hybrid search (BM25 + vectors) | +15–20% relevance | Medium |
| Reranking (Cohere, cross-encoder) | +10–15% relevance | Low |
| Semantic chunking | Better coherence | Medium |
| Metadata filtering | Targeted responses | Low |
| Query expansion | Better recall | Low |
| Parent-child chunks | Richer context | Medium |
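The hybrid-search row in the table above is typically implemented by running a keyword search (BM25) and a vector search separately, then merging the two rankings with reciprocal rank fusion (RRF). A minimal sketch of the fusion step, where the two input rankings are assumed to come from your BM25 and vector retrievers, and `k=60` is the constant commonly used in RRF:

```python
def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[str]:
    """Merge ranked lists of chunk ids: score(id) = sum over lists of 1/(k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical chunk ids returned by each retriever
bm25_hits = ["pricing.md_2", "faq.md_0", "process.md_1"]
vector_hits = ["faq.md_0", "about.md_3", "pricing.md_2"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # → ['faq.md_0', 'pricing.md_2', 'about.md_3', 'process.md_1']
```

RRF needs no score normalization across the two retrievers, which is why it is a popular default for hybrid search.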
🧬 Fine-Tuning: When and How
Fine-tuning modifies the model’s weights. It’s the heaviest but most powerful approach for style and tone.
When Fine-Tuning Is Justified
- Your avatar needs a highly specific style (technical jargon, unique tone)
- You have 1,000+ examples of ideal conversations
- RAG isn’t enough to capture complex reasoning patterns
- You want to reduce latency (no retrieval needed)
Prepare a JSONL Dataset
```jsonl
{"messages": [{"role": "system", "content": "You are the avatar of Dr. Martin, a cardiologist."}, {"role": "user", "content": "What are the symptoms of arrhythmia?"}, {"role": "assistant", "content": "Classic signs include palpitations, irregular heartbeat sensations, sometimes dizziness or shortness of breath. I systematically recommend a 24-hour ECG (Holter) to confirm the issue. Book an appointment via Doctolib to discuss."}]}
{"messages": [{"role": "system", "content": "You are the avatar of Dr. Martin, a cardiologist."}, {"role": "user", "content": "How much does a cardiac checkup cost?"}, {"role": "assistant", "content": "A full cardiac checkup in my office includes: consultation (€50, reimbursed), ECG (€35), and echocardiogram if needed (€95). Most costs are covered by national insurance + private insurance. My assistant can detail any out-of-pocket expenses."}]}
```
Dataset Preparation Script
```python
import json
from pathlib import Path

def prepare_finetune_dataset(conversations_dir: str, output: str):
    """Converts conversations to JSONL format for fine-tuning."""
    dataset = []
    for file in Path(conversations_dir).glob("*.json"):
        conv = json.loads(file.read_text())

        # Validate structure
        if not all(m.get("role") in ("system", "user", "assistant")
                   for m in conv["messages"]):
            print(f"⚠️ Invalid format: {file}")
            continue

        # Check for at least 1 user/assistant exchange
        roles = [m["role"] for m in conv["messages"]]
        if "user" not in roles or "assistant" not in roles:
            continue

        dataset.append(conv)

    # Save
    with open(output, "w") as f:
        for item in dataset:
            f.write(json.dumps(item, ensure_ascii=False) + "\n")

    print(f"✅ {len(dataset)} conversations exported → {output}")

prepare_finetune_dataset("./conversations/", "avatar_finetune.jsonl")
```
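Before launching a job, it is worth estimating the training cost from the dataset size. A rough sketch using the ~4-characters-per-token heuristic and an assumed per-million-token price and epoch count (swap in your provider's actual rates and a proper tokenizer such as `tiktoken` for a reliable figure):

```python
import json

def estimate_training_cost(
    jsonl_lines: list[str],
    price_per_m_tokens: float = 3.0,  # assumed: ~$3 per 1M training tokens
    epochs: int = 3,                  # assumed: typical default epoch count
) -> float:
    """Rough fine-tuning cost: total tokens x epochs x unit price."""
    total_chars = 0
    for line in jsonl_lines:
        conv = json.loads(line)
        total_chars += sum(len(m["content"]) for m in conv["messages"])
    tokens = total_chars // 4  # crude chars-to-tokens approximation
    return tokens * epochs * price_per_m_tokens / 1_000_000

# Hypothetical dataset: 1,000 identical conversations of ~4,000 characters each
sample = [json.dumps({"messages": [
    {"role": "user", "content": "x" * 2000},
    {"role": "assistant", "content": "y" * 2000},
]})]
cost = estimate_training_cost(sample * 1000)
print(f"Estimated training cost: ${cost:.2f}")
```

This kind of back-of-the-envelope check helps you decide between full fine-tuning and cheaper LoRA-style options before committing to a run.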
Estimated Fine-Tuning Costs
| Model | Training Cost | Inference Cost | Technique |
|---|---|---|---|
| GPT-4o mini fine-tuned | ~$3 / 1M tokens | $0.30 / 1M tokens | Full fine-tune |
| Llama 3.1 8B (LoRA) | ~$20 on RunPod | Self-hosted | LoRA / QLoRA |
| Mistral 7B (LoRA) | ~$15 on |