
Fine-tuning vs RAG vs Prompting: Which Approach Should You Choose?

LLM & Models 🟡 Intermediate ⏱️ 16 min read 📅 2026-02-24

When facing an AI project, the same question always arises: should you fine-tune a model, implement RAG, or simply improve your prompting? The answer isn't "it depends"—it's "here's how to decide." This guide provides a clear decision tree, including the costs, complexity, and expected results of each approach.

Whether you're a developer, product manager, or entrepreneur, you'll know exactly which method to choose for your use case.


🌳 The Decision Tree: Where to Start?

Before diving into technical details, here’s the fundamental question:

Does the base model already provide decent results with a good prompt?

  • Yes, but not precise enough → Improve your prompting (Section 1)
  • No, it lacks specific knowledge → Implement RAG (Section 2)
  • No, it doesn’t understand the style/format/domain → Consider fine-tuning (Section 3)
                    ┌─────────────────────┐
                    │   Your use case     │
                    └──────────┬──────────┘
                               │
                    ┌──────────▼──────────┐
                    │ Does the base model │
                    │ understand the task?│
                    └──────────┬──────────┘
                          ┌────┴────┐
                         YES       NO
                          │         │
                 ┌────────▼───┐ ┌──▼─────────────┐
                 │ Results    │ │ Missing        │
                 │ sufficient?│ │ knowledge?     │
                  └─────┬──────┘ └──┬─────────────┘
                  ┌────┴────┐  ┌───┴───┐
                 YES       NO  YES    NO
                  │         │   │      │
              ┌───▼──┐ ┌───▼───▼┐ ┌───▼────────┐
              │ STOP │ │  RAG   │ │ FINE-TUNING│
              │      │ │        │ │            │
              └──────┘ └────────┘ └────────────┘

The Golden Rule: ALWAYS Start with Prompting

Advanced prompting is free (in terms of development), instantaneous, and often sufficient. Don’t skip this step.

| Approach | Start here if... |
|---|---|
| Prompting | Always. This is your starting point. |
| RAG | The model needs information it doesn't have (internal docs, recent data) |
| Fine-tuning | The model must adopt a very specific style/behavior at scale |

🎯 Advanced Prompting: The Art of Asking Well

Why Prompting Is Underrated

In practice, most projects that assume they need fine-tuning actually need a better prompt. Advanced prompting is far more than "ask your question clearly": it's an engineering discipline in its own right.

System Prompt: Your Foundation

The system prompt defines the model’s baseline behavior. It’s the most powerful and underused lever.

# ❌ Weak system prompt
You are a helpful assistant.

# ✅ Structured system prompt
You are an expert in French tax law specializing in SMEs.

## Your Role
- Answer tax-related questions for French SMEs
- Cite relevant legal articles (CGI, BOFiP)
- Flag when a question is outside your scope

## Response Format
1. Brief answer (2-3 sentences)
2. Legal basis (law articles)
3. Key considerations
4. Recommendation (consult an expert if needed)

## Strict Rules
- NEVER invent a law article
- Say "I’m not sure" rather than guessing
- Always mention the last update date of your knowledge

Few-Shot Prompting: Learning by Example

Few-shot prompting involves providing examples of input/output pairs to guide the model:

# Task: Classify support tickets

## Examples

Ticket: "My payment was debited twice"
→ Category: billing
→ Priority: high
→ Sentiment: frustrated

Ticket: "How do I change my password?"
→ Category: account
→ Priority: low
→ Sentiment: neutral

Ticket: "Your app crashes every time I open it since the update"
→ Category: bug
→ Priority: critical
→ Sentiment: dissatisfied

## New ticket to classify

Ticket: "I can’t download my January invoice"
→
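In the chat API, the same few-shot examples are commonly expressed as alternating user/assistant turns rather than one big prompt string, which tends to steer the model more reliably. A minimal sketch, reusing two of the example tickets above (the label format is an illustrative choice):

```python
# Few-shot classification as chat messages:
# each labeled example becomes a user/assistant pair.
EXAMPLES = [
    ("My payment was debited twice",
     "Category: billing | Priority: high | Sentiment: frustrated"),
    ("How do I change my password?",
     "Category: account | Priority: low | Sentiment: neutral"),
]

def build_messages(ticket: str) -> list[dict]:
    messages = [{
        "role": "system",
        "content": ("Classify support tickets. Reply exactly as: "
                    "Category: ... | Priority: ... | Sentiment: ..."),
    }]
    for text, label in EXAMPLES:
        messages.append({"role": "user", "content": f"Ticket: {text}"})
        messages.append({"role": "assistant", "content": label})
    # The new ticket goes last, as a normal user turn
    messages.append({"role": "user", "content": f"Ticket: {ticket}"})
    return messages

msgs = build_messages("I can't download my January invoice")
# msgs is then passed to client.chat.completions.create(messages=msgs, ...)
```

Putting the examples in assistant turns also makes it easy to add or remove examples without rewriting the prompt text.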

Chain-of-Thought (CoT): Making the Model Reason

# Without CoT
Q: If a train leaves at 2:30 PM and takes 2 hours and 47 minutes, what time does it arrive?
A: 5:17 PM

# With CoT
Q: If a train leaves at 2:30 PM and takes 2 hours and 47 minutes, what time does it arrive?
Think step by step before answering.

A: Let’s break it down:
- Departure: 2:30 PM
- Duration: 2h47
- 2:30 PM + 2h = 4:30 PM
- 4:30 PM + 47min = 5:17 PM
The train arrives at 5:17 PM.

Structured Output: Controlling the Format

Requesting structured output (JSON, XML) improves reliability and reduces output tokens:

# Prompt for structured extraction
prompt = """Extract the information from this invoice in JSON format.

Invoice:
ABC Company - Invoice #2024-0892
Date: 01/15/2026
Client: Martin Dupont
Amount (excl. tax): €1,500.00
VAT (20%): €300.00
Total (incl. tax): €1,800.00

Respond ONLY with the JSON, no text before/after.

Expected format:
{
  "number": "string",
  "date": "YYYY-MM-DD",
  "client": "string",
  "amount_excl_tax": number,
  "vat": number,
  "total_incl_tax": number
}"""
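To make the "JSON only" instruction harder to violate, the OpenAI chat API also accepts `response_format={"type": "json_object"}`, which constrains the model to emit valid JSON (the prompt itself must still mention JSON). Even then, it is worth validating the keys before trusting the output. A sketch, where `parse_invoice` and the key set are illustrative helpers, not part of any SDK:

```python
import json

# Keys we expect back, matching the "Expected format" in the prompt above
REQUIRED_KEYS = {"number", "date", "client",
                 "amount_excl_tax", "vat", "total_incl_tax"}

def parse_invoice(raw: str) -> dict:
    """Parse the model's JSON reply and fail loudly if keys are missing."""
    data = json.loads(raw)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data

# With the OpenAI SDK, JSON mode is enabled like this:
# response = client.chat.completions.create(
#     model="gpt-4o",
#     response_format={"type": "json_object"},
#     messages=[{"role": "user", "content": prompt}],
# )
# invoice = parse_invoice(response.choices[0].message.content)
```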

When Prompting Isn’t Enough

Prompting reaches its limits when:

  • The model simply lacks the necessary knowledge (internal data, post-training info)
  • You need a highly specific style at scale (thousands of queries)
  • The latency of few-shot is too high (too many examples = too many tokens)
  • Consistency across thousands of queries is insufficient

This is where RAG and fine-tuning come into play.


📚 RAG: Giving the Model Knowledge

The Principle of RAG

RAG (Retrieval-Augmented Generation) involves searching for relevant information in a database and then injecting it into the prompt before generating the response.

User question
        │
        ▼
┌───────────────┐
│   Retrieval   │ ← Searches your documents
│  (search)     │
└───────┬───────┘
        │ Relevant documents
        ▼
┌───────────────┐
│  Augmentation │ ← Adds to the prompt
│   (context)   │
└───────┬───────┘
        │ Enriched prompt
        ▼
┌───────────────┐
│  Generation   │ ← The LLM responds
│   (answer)    │
└───────────────┘

Embeddings: Turning Text into Vectors

To search efficiently, text is converted into vectors (embeddings)—numerical representations that capture meaning.

from openai import OpenAI

client = OpenAI()

# Create an embedding
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="How to set up OpenClaw on a VPS?"
)
vector = response.data[0].embedding
# → [0.0123, -0.0456, 0.0789, ...] (1536 dimensions)

# Two semantically similar texts
# will have close vectors (high cosine similarity)
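The "close vectors" claim can be checked directly: cosine similarity is the dot product of two vectors divided by the product of their norms, giving 1.0 for vectors pointing the same way and 0.0 for orthogonal ones. A self-contained sketch with toy 3-D vectors (real embeddings have 1536 dimensions, but the formula is identical):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """dot(a, b) / (|a| * |b|) — ranges from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Same direction → 1.0; orthogonal → 0.0
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # → 1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # → 0.0
```

Vector databases compute exactly this (or an equivalent distance) over millions of stored vectors, just with heavily optimized index structures.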

Vector Databases: Storing and Searching

Vector databases allow storing millions of embeddings and finding the closest ones in milliseconds:

| Vector Database | Type | Price | Ideal For |
|---|---|---|---|
| Chroma | Open-source, local | Free | Prototyping, small projects |
| Pinecone | Managed cloud | $0.33/M vectors/month | Production, scalability |
| Weaviate | Open-source + cloud | Free → paid | Multimodal, flexible |
| pgvector | PostgreSQL extension | Free | If you already use PostgreSQL |
| Qdrant | Open-source + cloud | Free → paid | Performance, Rust-based |
| FAISS | Meta library | Free | Brute-force search, large volumes |

Complete RAG Pipeline

# Simplified RAG pipeline
import chromadb
from openai import OpenAI

client = OpenAI()
chroma = chromadb.Client()
collection = chroma.create_collection("my_docs")

# 1. INDEXING (one-time)
documents = [
    "OpenClaw is an autonomous AI agent...",
    "To install OpenClaw on a VPS...",
    "Configuration is done via config.yaml...",
]

for i, doc in enumerate(documents):
    # Create embedding
    resp = client.embeddings.create(
        model="text-embedding-3-small", input=doc
    )
    # Store in ChromaDB
    collection.add(
        ids=[f"doc_{i}"],
        embeddings=[resp.data[0].embedding],
        documents=[doc]
    )

# 2. RETRIEVAL (per query)
question = "How to install OpenClaw?"
q_embedding = client.embeddings.create(
    model="text-embedding-3-small", input=question
).data[0].embedding

results = collection.query(
    query_embeddings=[q_embedding],
    n_results=3  # Top 3 relevant documents
)

# 3. GENERATION (augmented response)
context = "\n\n".join(results["documents"][0])
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": f"""Answer based
        SOLELY on the provided context. If the info isn’t
        in the context, say so.

        Context:\n{context}"""},
        {"role": "user", "content": question}
    ]
)
print(response.choices[0].message.content)

Chunking: Smart Document Splitting

RAG quality heavily depends on chunking (how documents are split):

# Chunking strategies

# 1. Fixed size (simple but suboptimal)
def chunk_fixed(text, size=500, overlap=50):
    chunks = []
    for i in range(0, len(text), size - overlap):
        chunks.append(text[i:i + size])
    return chunks

# 2. By paragraphs/sections (smarter)
def chunk_by_sections(text):
    sections = text.split("\n## ")
    return [s.strip() for s in sections if len(s.strip()) > 50]

# 3. Semantic (best, more complex)
# Uses a model to detect topic shifts

| Method | Quality | Complexity | Use Case |
|---|---|---|---|
| Fixed size | ⭐⭐ | Very simple | Prototyping |
| By paragraph | ⭐⭐⭐ | Simple | Structured documents |
| By section/title | ⭐⭐⭐⭐ | Medium | Documentation, articles |
| Semantic | ⭐⭐⭐⭐⭐ | Complex | Production, max quality |

When to Use RAG

RAG is ideal when:
- Your data changes frequently (news, product docs)
- You need to cite sources (traceability)
- Your documents are too large for the context
- You want to keep the base model (no fine-tuning)

RAG is NOT ideal when:
- The issue is response style, not knowledge
- You need ultra-low latency
- Your data is simple and fits in the prompt


🔧 Fine-Tuning: Customizing the Model

What Is Fine-Tuning?

Fine-tuning involves re-training an existing model on your own data to adopt specific behaviors. It’s like giving the model private lessons.

Base model (GPT-4o, Claude, Llama)
        │
        ▼ + your training data
┌───────────────────┐
│   Fine-tuning     │
│  (re-training)    │
└───────┬───────────┘
        │
        ▼
Customized model
(understands your domain/style)

Required Data

Fine-tuning requires example pairs (input → expected output):

{"messages": [{"role": "system", "content": "You are XYZ Shop’s assistant."}, {"role": "user", "content": "What are your delivery times?"}, {"role": "assistant", "content": "At XYZ, we deliver within 24-48 hours in mainland France, and 3-5 days for overseas territories. Express delivery (24h) is free for orders over €50!"}]}
{"messages": [{"role": "system", "content": "You are XYZ Shop’s assistant."}, {"role": "user", "content": "I want to return an item"}, {"role": "assistant", "content": "Of course! At XYZ, you have 30 days to return an item. Go to Your Account > Orders > Return. Returns are free with the prepaid label provided."}]}
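Before uploading, it pays to validate that every line is well-formed JSON with the expected `messages` structure, since one malformed line can fail the whole training job. A minimal checker, as a sketch (the specific checks are common-sense assumptions about the chat format, not an official validator):

```python
import json

VALID_ROLES = {"system", "user", "assistant"}

def validate_jsonl(lines: list[str]) -> list[str]:
    """Return a list of error strings; an empty list means the data looks OK."""
    errors = []
    for i, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as e:
            errors.append(f"line {i}: invalid JSON ({e})")
            continue
        messages = record.get("messages")
        if not isinstance(messages, list) or not messages:
            errors.append(f"line {i}: missing or empty 'messages' list")
            continue
        for msg in messages:
            if msg.get("role") not in VALID_ROLES:
                errors.append(f"line {i}: bad role {msg.get('role')!r}")
            if not isinstance(msg.get("content"), str):
                errors.append(f"line {i}: 'content' must be a string")
        if messages[-1].get("role") != "assistant":
            errors.append(f"line {i}: last message should be the assistant reply")
    return errors
```

Running this over the file before upload catches formatting mistakes locally, for free, instead of after a failed (and possibly billed) training run.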

How Much Data Is Needed?

| Objective | Data Required | Quality |
|---|---|---|
| Light customization (tone, style) | 50–100 examples | Medium |
| Domain adaptation (e.g., legal) | 100–500 examples | High |
| Complex behavior (e.g., multi-step) | 500+ examples | Very high |

Fine-Tuning Process

  1. Prepare data: Clean, format in JSONL.
  2. Upload to platform (OpenAI, Hugging Face, etc.).
  3. Launch training (cost: ~$0.03 per 1K tokens for GPT-3.5).
  4. Evaluate: Test on unseen data.
  5. Deploy: Use your custom model.
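Steps 2 and 3 look roughly like this with the OpenAI Python SDK. The filename and base model name are placeholders; check the current documentation for which models can be fine-tuned and at what price before launching:

```python
from openai import OpenAI

def launch_fine_tune(jsonl_path: str, base_model: str) -> str:
    """Upload a JSONL training file and start a fine-tuning job.

    Returns the job id, which can be polled with
    client.fine_tuning.jobs.retrieve(job_id) until status is "succeeded".
    """
    client = OpenAI()

    # 2. Upload the training data
    with open(jsonl_path, "rb") as f:
        training_file = client.files.create(file=f, purpose="fine-tune")

    # 3. Launch training against the uploaded file
    job = client.fine_tuning.jobs.create(
        training_file=training_file.id,
        model=base_model,  # e.g. a currently fine-tunable model id
    )
    return job.id
```

Once the job succeeds, the resulting custom model id (prefixed with `ft:`) is used in `chat.completions.create` exactly like a base model name.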

When to Fine-Tune

Fine-tuning is ideal when:
- You need a consistent style/behavior at scale
- The model must specialize in a niche domain
- Prompting/RAG can’t achieve the desired quality
- You have high-quality training data

Fine-tuning is NOT ideal when:
- Your data changes frequently (retraining needed)
- You lack technical resources (data prep, evaluation)
- The use case is too broad (better to prompt)
- Cost is a blocker (~$100–$1,000 per training run)


📊 Comparison Table: Prompting vs RAG vs Fine-Tuning

| Criteria | Prompting | RAG | Fine-Tuning |
|---|---|---|---|
| Cost | Free | Low to medium | High |
| Implementation Time | Instant | Hours to days | Days to weeks |
| Data Needed | None | Documents | Labeled examples |
| Latency | Low | Medium | Low (after training) |
| Maintenance | None | Update documents | Retrain model |
| Best For | Quick tests, style | Knowledge gaps | Domain specialization |
| Scalability | High | Medium | High |
| Consistency | Medium | High | Very high |

🚀 Decision Flowchart (Summary)

  1. Start with prompting (always!).
  2. If results are almost there but inconsistent → refine prompts (CoT, few-shot, system prompt).
  3. If the model lacks knowledge → implement RAG.
  4. If you need domain-specific behavior at scale → fine-tune.
  5. If data changes frequently → prefer RAG over fine-tuning.
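The summary steps above can be encoded as a toy helper function; the question names are illustrative, not a real library:

```python
def choose_approach(prompt_results_ok: bool,
                    lacks_knowledge: bool,
                    data_changes_often: bool) -> str:
    """Encode the decision flow: always start from prompting."""
    if prompt_results_ok:
        # Almost there? Refine with CoT, few-shot, system prompt.
        return "prompting"
    if lacks_knowledge or data_changes_often:
        # Missing or fast-moving knowledge → inject it at query time.
        return "RAG"
    # Style/behavior change at scale, with good training data.
    return "fine-tuning"

print(choose_approach(prompt_results_ok=False,
                      lacks_knowledge=True,
                      data_changes_often=False))  # → RAG
```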

Pro Tips

  • Combine approaches: Use RAG + fine-tuning for complex cases.
  • Monitor costs: RAG scales with document size; fine-tuning with training runs.
  • Iterate: Start small (prompting), then scale up (RAG → fine-tuning).

Final Rule: The best approach is the simplest one that meets your needs. 90% of projects can succeed with prompting + RAG alone.