Advanced Prompting That Really Makes a Difference

LLM & Modèles 🟡 Intermediate ⏱️ 14 min read 📅 2026-02-24

Advanced prompting that really makes a difference

You've been using ChatGPT, Claude or Gemini for months, but you feel like you're not getting the most out of them? You're right. The difference between an average user and an AI expert often comes down to a single word: prompting.

Prompting is the art of communicating with a language model to get exactly what you want. And in 2026, with increasingly powerful models, mastering advanced prompting has become a true superpower.

In this article, we'll explore the techniques that really make a difference: structured system prompts, few-shot learning, chain-of-thought, structured output in JSON, and more. With concrete before/after examples for each technique.

🎯 Why prompting still matters in 2026

You might think that with smarter models, prompting becomes less important. It's the opposite. The more powerful a model is, the more capable it is of following complex instructions — and therefore the more advanced prompting makes a difference.

Here's what good prompting changes:

Aspect	Basic prompting	Advanced prompting
Response quality	Correct, generic	Precise, tailored, actionable
Consistency	Varies from request to request	Stable and predictable
Format	Free text, requires reformatting	Structured, ready to use
Hallucinations	Frequent	Drastically reduced
Cost	Wasting tokens on back-and-forth	Right answer on the first try

A well-designed prompt can turn an "average" model into an expert assistant. A bad prompt can make even the best models hallucinate. According to a 2025 Stanford University study, a structured prompt reduces the hallucination rate by 40% on average on reasoning tasks.

📝 Structured system prompts

The system prompt is the foundation of any interaction with an LLM. It's the text that defines who the model is, how it should behave, and what its constraints are.

Before: the naive system prompt

Tu es un assistant utile qui aide les utilisateurs.

This prompt is so vague that it's useless. The model will produce generic responses, without personality or constraints.

After: the structured system prompt

A structured system prompt clearly separates information into several sections: the role (senior Python development expert with 15 years of experience, tech lead in a SaaS startup), the communication style (direct responses, informal "tu", functional examples, mentions of edge cases), the constraints (Python 3.11+ only, readability prioritized, systematic type hints, mandatory error handling) and the response format (short explanation, commented code, usage example, points of attention). The difference is radical: with this structure, every response will be consistent and adapted to your context.

The SOUL.md and AGENTS.md approach

The SOUL.md file defines the agent's personality (identity, values, tone, limits), while AGENTS.md defines its working rules (response process, code standards, escalation conditions). It's exactly the same principle as the structured system prompt, but taken to the extreme with a clear separation between "personality" and "business rules".

The 5 components of an effective system prompt

Here is the structure I recommend for any system prompt:

Component	Description	Example
Role	Who is the model?	"Web security expert"
Context	In what environment?	"Fintech startup, Node.js stack"
Style	How to communicate?	"Concise, technical, informal 'tu'"
Constraints	What is forbidden?	"Never provide code without validation"
Format	How to structure the response?	"1. Explanation, 2. Code, 3. Tests"

The choice of model behind this system prompt also matters. To compare the instruction-following capabilities of Claude, GPT, Gemini and Llama, check out Claude, GPT, Gemini, Llama : quel modèle choisir en 2026 ?.

🔄 Few-Shot Learning: learning by example

Few-shot learning consists of giving the model examples of input/output pairs before asking it your question. It's the most underestimated and most effective technique.

Before: zero-shot (no examples)

Extrais les entités nommées de ce texte :
"Apple a annoncé son nouveau MacBook Pro lors de la WWDC 2026 à Cupertino."

Typical response (variable, inconsistent):

Les entités nommées sont : Apple (entreprise), MacBook Pro (produit),
WWDC 2026 (événement), Cupertino (lieu).

The format changes with every request. Sometimes it's a list, sometimes a paragraph, sometimes with parentheses, sometimes without.

After: few-shot (with examples)

By providing two input/output examples in JSON format before your request, you show the model the exact expected schema (entity type in quotes, nested structure). The model then faithfully reproduces this pattern, including for the entities in your text.

Response (consistent, formatted):

{"entities": [{"text": "Apple", "type": "ORG"}, {"text": "MacBook Pro", "type": "PRODUCT"}, {"text": "WWDC 2026", "type": "EVENT"}, {"text": "Cupertino", "type": "LOC"}]}

How many examples do you need?

Number of examples	Use case	Quality
0 (zero-shot)	Simple and obvious tasks	Variable
1-2 (few-shot)	Define the output format	Good
3-5 (few-shot)	Complex tasks, classification	Very good
5+	Ambiguous cases, fine nuances	Excellent

💡 Golden rule: always include at least one "edge case" example in your few-shots. It shows the model how to handle ambiguous situations.

Few-shot for classification

For a support ticket classification task, the logic is to build a message history containing three examples: a billing ticket (billing category, high priority, negative sentiment), a functional question (feature_question, low, neutral) and an edge case like positive feedback with a bug (bug_report, medium, mixed sentiment). Each assistant response is formatted in JSON with the same keys. When you send your real request, the model reproduces this schema exactly.

Be careful though: each example consumes input tokens. To understand the impact of few-shot on your bill, read Tokens, contexte, coûts : comprendre la facturation des LLM.

🧠 Chain-of-Thought: making the model think

Chain-of-Thought (CoT) is the technique with the most impact on the quality of reasoning. The idea is simple: ask the model to show its reasoning step by step before giving its final answer.

Before: direct response

Combien de fois la lettre "r" apparaît dans le mot "strawberry" ?

Response (often wrong):

La lettre "r" apparaît 2 fois dans "strawberry".

After: with Chain-of-Thought

Combien de fois la lettre "r" apparaît dans le mot "strawberry" ?

Réfléchis étape par étape : analyse chaque lettre du mot une par une.

Response (correct):

Analysons chaque lettre de "strawberry" :
s - non
t - non
r - OUI (1)
a - non
w - non
b - non
e - non
r - OUI (2)
r - OUI (3)
y - non

La lettre "r" apparaît 3 fois dans "strawberry".

Chain-of-Thought variants

1. Explicit CoT (the simplest)

Simply add "Think step by step" or "Show your reasoning" to the end of your prompt.

2. Structured CoT

Ask the model to follow a specific framework: OBSERVATION (what do we see in the data?), HYPOTHESIS (what are the possible explanations?), VERIFICATION (how to validate?), CONCLUSION (most likely answer).

3. CoT with self-critique

Ask for a response in three phases: initial first response, critique of its own errors, then corrected final response. This technique is particularly powerful for logic, math, and code problems.

When to use Chain-of-Thought?

Task	CoT useful?	Why
Math / Logic	✅ Very much	Reduces calculation errors
Code / Debug	✅ Very much	Forces systematic analysis
Text analysis	✅ Moderately	Improves nuance
Creative writing	❌ Rarely	Can make the text too analytical
Simple classification	❌ No	Unnecessary overhead, slows down
Translation	❌ No	The model already does this well

⚠️ Beware of the cost: CoT generates more output tokens. If you pay per token, it can double or triple the cost. Use it strategically.

📦 Structured Output: JSON responses

Structured output is the most useful technique for developers. Instead of receiving free text that needs parsing, you directly receive usable JSON.

Before: free text

Analyze this resume and give me the candidate's skills.

Response:

The candidate is proficient in Python, JavaScript, and SQL. They also have experience
with React and Node.js. In terms of soft skills, they mention teamwork
and project management.

How do you extract this data programmatically? It's a parsing nightmare.

After: structured JSON output

By specifying a precise JSON schema in the prompt (hard skills with name/level/years, soft skills with context, languages with CEFR level, certifications), the model directly returns a structured and usable object without any parsing necessary.

Forcing JSON with the API

Most modern APIs support native "JSON" mode: OpenAI and OpenRouter via the response_format={"type": "json_object"} parameter, Google Gemini via generation_config={"response_mime_type": "application/json"}, and Anthropic Claude, which has no native mode but follows formatting instructions in the system prompt very well.

JSON Schema for validation

For critical cases, you can define a complete JSON Schema with types, enumerations, min/max values, and length constraints, then inject it directly into the system prompt. This ensures the model respects a strict data contract.

🏗️ Combined advanced techniques

The most powerful techniques combine several approaches. To know when to prioritize advanced prompting over fine-tuning or RAG, check out Fine-tuning vs RAG vs prompting : quelle approche choisir ?.

The Meta-Prompt: a prompt that generates prompts

The technique consists of asking the LLM to generate the optimal prompt itself for a task you describe, including a structured system prompt, few-shot examples, chain-of-thought instructions, and a JSON output format if relevant. You use the model to optimize its own usage.

The Multi-Step prompt

Instead of doing everything in a single request, break it down into steps: extraction of key facts in JSON, then analysis of these facts (trends, contradictions, missing information), and finally a synthesis into an executive summary for a decision-maker. Each step receives a prompt optimized for its specific task and uses the context from the previous one.

The conditional prompt

This pattern analyzes the type of user request and adapts the response: technical question (code + explanations), bug report (checklist of missing info), feature request (feasibility + estimation + steps), or something else (reformulation for clarification). It's excellent for chatbots and multi-task agents.

❌ Common prompting mistakes

Mistake 1: too much politeness, not enough structure

LLMs are not offended by a direct tone. They perform better with clear and concise instructions. Replace polite formulas with a precise list of what you expect.

Mistake 2: contradictory instructions

"Be concise but detailed" is a classic example of a contradiction. Prefer a numbered structure that clearly separates levels of detail (2-sentence summary, then 5-10 lines of detail, then examples).

Mistake 3: not specifying the format

Without an explicit format, the model freely chooses how to present its response. Always ask for a precise format: markdown table with defined columns, JSON with a schema, numbered list, etc.

Mistake 4: ignoring the negative

Telling the model what it must NOT do is often just as important as what it must do: do not start with "Sure!", do not repeat the question, do not give unsolicited warnings. Negative constraints are very effective.

Mistake 5: the "one-size-fits-all" prompt

A single generic system prompt for all tasks yields mediocre results. Create specialized prompts: one for code (Python expert, no fluff), one for writing (SEO tone, catchy H2s), one for analysis (numbers, confidence intervals).

🛠️ Recommended tools

Claude (Anthropic): the model that best follows complex instructions and structured system prompts.
GPT-4.1 (OpenAI): excellent at few-shot and native structured JSON output.
DeepSeek V4: its two new Pro and Flash models change the game on the quality/price ratio for advanced prompting. To learn more, check out DeepSeek V4 : deux nouveaux modèles — Pro et Flash — changent la donne.
Qwen3.6 (Alibaba): a new family of models that performs very well on chain-of-thought and structured reasoning. Discover Qwen3.6 : Alibaba débarque avec une nouvelle famille de modèles LLM.
OpenRouter: the ideal platform for testing your prompts on different models with a single API.

📊 Measuring the impact of prompting

How do you know if your prompting is improving? Measure it! According to a Google DeepMind benchmark published in 2025, prompts optimized with CoT and few-shot achieve a first-correct-response rate of over 90% on reasoning tasks, compared to 60% with basic prompting.

Key metrics

Metric	How to measure	Goal
First correct response rate	% of usable responses without follow-up	>90%
Tokens per response	Average number of tokens generated	Stable or decreasing
Format consistency	% of responses in the correct format	>95%
Hallucination rate	% of responses with false info	<5%
User satisfaction	1-5 rating from users	>4

A/B testing prompts

The method consists of preparing two versions of the same prompt (for example, a generic version and a structured version), assigning them randomly to incoming requests, and then logging the version used, the prompt, the response, and a quality score to statistically compare performance.

🛠️ Toolkit: ready-to-use prompt templates

Template: Data extraction

The template asks to extract specific fields (name, date, amount, currency, category) in valid JSON, with the strict rule of using null if information is missing or ambiguous rather than guessing.

Template: Writing with constraints

This template specifies the type of content, topic, target length, tone, audience, keywords to include, and structure. It also lists prohibitions: no clichés, no rhetorical questions, no passive voice when active is possible.

Template: Code debugging

The process imposed on the model is in five steps: read the code, identify the bug in one or two sentences, explain why it's a bug, provide only the corrected lines, and provide a test that proves the fix works.

❓ FAQ

Is advanced prompting still useful with 2026 models?
Yes, more than ever. Recent models like DeepSeek V4 and Qwen3.6 are capable of following more complex instructions, which amplifies (and does not reduce) the gap between a good and a bad prompt.

How many tokens does a structured system prompt consume?
A 200-300 word system prompt consumes about 300-450 tokens. This is negligible compared to the quality gain and the reduction in back-and-forths. Check out Tokens, contexte, coûts : comprendre la facturation des LLM to optimize your budget.

Few-shot or chain-of-thought: which one to choose?
It depends on the task. Few-shot is ideal for locking in an output format (classification, JSON extraction). Chain-of-thought is preferable for reasoning (math, logic, debugging). You can combine both for complex tasks.

Should you always ask for JSON?
No. JSON is useful when you need to integrate the response into an automated pipeline. For writing, narrative analysis, or brainstorming, free text with markdown formatting is more appropriate.

Is the meta-prompt (a prompt that generates prompts) reliable?
It produces excellent first-draft results, but almost always requires human adjustment. Use it as a starting point, not as a final result.

📋 The essentials

Structure your system prompt: role, context, style, constraints, format
Use few-shot: 2-3 examples are worth more than 200 words of explanation
Activate Chain-of-Thought for complex reasoning
Ask for JSON when you need structured data
Say what you do NOT want — negative constraints are powerful
Test and measure — prompting is an iterative process
Specialize your prompts — one prompt per task, not one prompt for everything
Draw inspiration from SOUL.md/AGENTS.md — the personality/rules separation is an excellent model

Prompting is not magic — it's engineering. Like everything in engineering, it can be learned, practiced, and improved over time.

#AI Prompts #ia #llm #tutorial

📚 Related articles

LLM & Modèles 🟢 Débutant 11 min

Gemini 3.5 Flash : the fast model that beats Opus 4.7 and GPT-5.5 on agent benchmarks — 289 tokens/second

Discover Gemini 3.5 Flash: the ultra-fast model at 289 tokens/sec beating Claude Opus 4.7 and GPT-5.5 on agent benchmarks.

2026-05-20 14:09

LLM & Modèles 🟢 Débutant 14 min

General Preference RL: this paper unifies reinforcement learning and preference optimization for LLMs

Discover the General Preference RL paper unifying reinforcement learning and preference optimization to solve LLM post-training.

2026-05-19 18:01

LLM & Modèles 🟢 Débutant 12 min

OpenAI Parameter Golf: The challenge that proves small models are the future of AI

Discover the OpenAI Parameter Golf challenge: why compressing an LLM into 16 MB proves small models are the future of AI.

2026-05-18 17:02

📑 Table of contents