Advanced prompting that really makes a difference
You've been using ChatGPT, Claude or Gemini for months, but you feel like you're not getting the most out of them? You're right. The difference between an average user and an AI expert often comes down to a single word: prompting.
Prompting is the art of communicating with a language model to get exactly what you want. And in 2026, with increasingly powerful models, mastering advanced prompting has become a true superpower.
In this article, we'll explore the techniques that really make a difference: structured system prompts, few-shot learning, chain-of-thought, structured output in JSON, and more. With concrete before/after examples for each technique.
🎯 Why prompting still matters in 2026
You might think that with smarter models, prompting becomes less important. It's the opposite. The more powerful a model is, the more capable it is of following complex instructions — and therefore the more advanced prompting makes a difference.
Here's what good prompting changes:
| Aspect | Basic prompting | Advanced prompting |
|---|---|---|
| Response quality | Correct, generic | Precise, tailored, actionable |
| Consistency | Varies from request to request | Stable and predictable |
| Format | Free text, requires reformatting | Structured, ready to use |
| Hallucinations | Frequent | Drastically reduced |
| Cost | Wasting tokens on back-and-forth | Right answer on the first try |
A well-designed prompt can turn an "average" model into an expert assistant. A bad prompt can make even the best models hallucinate. According to a 2025 Stanford University study, a structured prompt reduces the hallucination rate by 40% on average on reasoning tasks.
📝 Structured system prompts
The system prompt is the foundation of any interaction with an LLM. It's the text that defines who the model is, how it should behave, and what its constraints are.
Before: the naive system prompt
Tu es un assistant utile qui aide les utilisateurs.
This prompt is so vague that it's useless. The model will produce generic responses, without personality or constraints.
After: the structured system prompt
A structured system prompt clearly separates information into several sections: the role (senior Python development expert with 15 years of experience, tech lead in a SaaS startup), the communication style (direct responses, informal "tu", functional examples, mentions of edge cases), the constraints (Python 3.11+ only, readability prioritized, systematic type hints, mandatory error handling) and the response format (short explanation, commented code, usage example, points of attention). The difference is radical: with this structure, every response will be consistent and adapted to your context.
The SOUL.md and AGENTS.md approach
The SOUL.md file defines the agent's personality (identity, values, tone, limits), while AGENTS.md defines its working rules (response process, code standards, escalation conditions). It's exactly the same principle as the structured system prompt, but taken to the extreme with a clear separation between "personality" and "business rules".
The 5 components of an effective system prompt
Here is the structure I recommend for any system prompt:
| Component | Description | Example |
|---|---|---|
| Role | Who is the model? | "Web security expert" |
| Context | In what environment? | "Fintech startup, Node.js stack" |
| Style | How to communicate? | "Concise, technical, informal 'tu'" |
| Constraints | What is forbidden? | "Never provide code without validation" |
| Format | How to structure the response? | "1. Explanation, 2. Code, 3. Tests" |
The choice of model behind this system prompt also matters. To compare the instruction-following capabilities of Claude, GPT, Gemini and Llama, check out Claude, GPT, Gemini, Llama : quel modèle choisir en 2026 ?.
🔄 Few-Shot Learning: learning by example
Few-shot learning consists of giving the model examples of input/output pairs before asking it your question. It's the most underestimated and most effective technique.
Before: zero-shot (no examples)
Extrais les entités nommées de ce texte :
"Apple a annoncé son nouveau MacBook Pro lors de la WWDC 2026 à Cupertino."
Typical response (variable, inconsistent):
Les entités nommées sont : Apple (entreprise), MacBook Pro (produit),
WWDC 2026 (événement), Cupertino (lieu).
The format changes with every request. Sometimes it's a list, sometimes a paragraph, sometimes with parentheses, sometimes without.
After: few-shot (with examples)
By providing two input/output examples in JSON format before your request, you show the model the exact expected schema (entity type in quotes, nested structure). The model then faithfully reproduces this pattern, including for the entities in your text.
Response (consistent, formatted):
{"entities": [{"text": "Apple", "type": "ORG"}, {"text": "MacBook Pro", "type": "PRODUCT"}, {"text": "WWDC 2026", "type": "EVENT"}, {"text": "Cupertino", "type": "LOC"}]}
How many examples do you need?
| Number of examples | Use case | Quality |
|---|---|---|
| 0 (zero-shot) | Simple and obvious tasks | Variable |
| 1-2 (few-shot) | Define the output format | Good |
| 3-5 (few-shot) | Complex tasks, classification | Very good |
| 5+ | Ambiguous cases, fine nuances | Excellent |
💡 Golden rule: always include at least one "edge case" example in your few-shots. It shows the model how to handle ambiguous situations.
Few-shot for classification
For a support ticket classification task, the logic is to build a message history containing three examples: a billing ticket (billing category, high priority, negative sentiment), a functional question (feature_question, low, neutral) and an edge case like positive feedback with a bug (bug_report, medium, mixed sentiment). Each assistant response is formatted in JSON with the same keys. When you send your real request, the model reproduces this schema exactly.
Be careful though: each example consumes input tokens. To understand the impact of few-shot on your bill, read Tokens, contexte, coûts : comprendre la facturation des LLM.
🧠 Chain-of-Thought: making the model think
Chain-of-Thought (CoT) is the technique with the most impact on the quality of reasoning. The idea is simple: ask the model to show its reasoning step by step before giving its final answer.
Before: direct response
Combien de fois la lettre "r" apparaît dans le mot "strawberry" ?
Response (often wrong):
La lettre "r" apparaît 2 fois dans "strawberry".
After: with Chain-of-Thought
Combien de fois la lettre "r" apparaît dans le mot "strawberry" ?
Réfléchis étape par étape : analyse chaque lettre du mot une par une.
Response (correct):
Analysons chaque lettre de "strawberry" :
s - non
t - non
r - OUI (1)
a - non
w - non
b - non
e - non
r - OUI (2)
r - OUI (3)
y - non
La lettre "r" apparaît 3 fois dans "strawberry".
Chain-of-Thought variants
1. Explicit CoT (the simplest)
Simply add "Think step by step" or "Show your reasoning" to the end of your prompt.
2. Structured CoT
Ask the model to follow a specific framework: OBSERVATION (what do we see in the data?), HYPOTHESIS (what are the possible explanations?), VERIFICATION (how to validate?), CONCLUSION (most likely answer).
3. CoT with self-critique
Ask for a response in three phases: initial first response, critique of its own errors, then corrected final response. This technique is particularly powerful for logic, math, and code problems.
When to use Chain-of-Thought?
| Task | CoT useful? | Why |
|---|---|---|
| Math / Logic | ✅ Very much | Reduces calculation errors |
| Code / Debug | ✅ Very much | Forces systematic analysis |
| Text analysis | ✅ Moderately | Improves nuance |
| Creative writing | ❌ Rarely | Can make the text too analytical |
| Simple classification | ❌ No | Unnecessary overhead, slows down |
| Translation | ❌ No | The model already does this well |
⚠️ Beware of the cost: CoT generates more output tokens. If you pay per token, it can double or triple the cost. Use it strategically.
📦 Structured Output: JSON responses
Structured output is the most useful technique for developers. Instead of receiving free text that needs parsing, you directly receive usable JSON.
Before: free text
Analyze this resume and give me the candidate's skills.
Response:
The candidate is proficient in Python, JavaScript, and SQL. They also have experience
with React and Node.js. In terms of soft skills, they mention teamwork
and project management.
How do you extract this data programmatically? It's a parsing nightmare.
After: structured JSON output
By specifying a precise JSON schema in the prompt (hard skills with name/level/years, soft skills with context, languages with CEFR level, certifications), the model directly returns a structured and usable object without any parsing necessary.
Forcing JSON with the API
Most modern APIs support native "JSON" mode: OpenAI and OpenRouter via the response_format={"type": "json_object"} parameter, Google Gemini via generation_config={"response_mime_type": "application/json"}, and Anthropic Claude, which has no native mode but follows formatting instructions in the system prompt very well.
JSON Schema for validation
For critical cases, you can define a complete JSON Schema with types, enumerations, min/max values, and length constraints, then inject it directly into the system prompt. This ensures the model respects a strict data contract.
🏗️ Combined advanced techniques
The most powerful techniques combine several approaches. To know when to prioritize advanced prompting over fine-tuning or RAG, check out Fine-tuning vs RAG vs prompting : quelle approche choisir ?.
The Meta-Prompt: a prompt that generates prompts
The technique consists of asking the LLM to generate the optimal prompt itself for a task you describe, including a structured system prompt, few-shot examples, chain-of-thought instructions, and a JSON output format if relevant. You use the model to optimize its own usage.
The Multi-Step prompt
Instead of doing everything in a single request, break it down into steps: extraction of key facts in JSON, then analysis of these facts (trends, contradictions, missing information), and finally a synthesis into an executive summary for a decision-maker. Each step receives a prompt optimized for its specific task and uses the context from the previous one.
The conditional prompt
This pattern analyzes the type of user request and adapts the response: technical question (code + explanations), bug report (checklist of missing info), feature request (feasibility + estimation + steps), or something else (reformulation for clarification). It's excellent for chatbots and multi-task agents.
❌ Common prompting mistakes
Mistake 1: too much politeness, not enough structure
LLMs are not offended by a direct tone. They perform better with clear and concise instructions. Replace polite formulas with a precise list of what you expect.
Mistake 2: contradictory instructions
"Be concise but detailed" is a classic example of a contradiction. Prefer a numbered structure that clearly separates levels of detail (2-sentence summary, then 5-10 lines of detail, then examples).
Mistake 3: not specifying the format
Without an explicit format, the model freely chooses how to present its response. Always ask for a precise format: markdown table with defined columns, JSON with a schema, numbered list, etc.
Mistake 4: ignoring the negative
Telling the model what it must NOT do is often just as important as what it must do: do not start with "Sure!", do not repeat the question, do not give unsolicited warnings. Negative constraints are very effective.
Mistake 5: the "one-size-fits-all" prompt
A single generic system prompt for all tasks yields mediocre results. Create specialized prompts: one for code (Python expert, no fluff), one for writing (SEO tone, catchy H2s), one for analysis (numbers, confidence intervals).
🛠️ Recommended tools
- Claude (Anthropic): the model that best follows complex instructions and structured system prompts.
- GPT-4.1 (OpenAI): excellent at few-shot and native structured JSON output.
- DeepSeek V4: its two new Pro and Flash models change the game on the quality/price ratio for advanced prompting. To learn more, check out DeepSeek V4 : deux nouveaux modèles — Pro et Flash — changent la donne.
- Qwen3.6 (Alibaba): a new family of models that performs very well on chain-of-thought and structured reasoning. Discover Qwen3.6 : Alibaba débarque avec une nouvelle famille de modèles LLM.
- OpenRouter: the ideal platform for testing your prompts on different models with a single API.
📊 Measuring the impact of prompting
How do you know if your prompting is improving? Measure it! According to a Google DeepMind benchmark published in 2025, prompts optimized with CoT and few-shot achieve a first-correct-response rate of over 90% on reasoning tasks, compared to 60% with basic prompting.
Key metrics
| Metric | How to measure | Goal |
|---|---|---|
| First correct response rate | % of usable responses without follow-up | >90% |
| Tokens per response | Average number of tokens generated | Stable or decreasing |
| Format consistency | % of responses in the correct format | >95% |
| Hallucination rate | % of responses with false info | <5% |
| User satisfaction | 1-5 rating from users | >4 |
A/B testing prompts
The method consists of preparing two versions of the same prompt (for example, a generic version and a structured version), assigning them randomly to incoming requests, and then logging the version used, the prompt, the response, and a quality score to statistically compare performance.
🛠️ Toolkit: ready-to-use prompt templates
Template: Data extraction
The template asks to extract specific fields (name, date, amount, currency, category) in valid JSON, with the strict rule of using null if information is missing or ambiguous rather than guessing.
Template: Writing with constraints
This template specifies the type of content, topic, target length, tone, audience, keywords to include, and structure. It also lists prohibitions: no clichés, no rhetorical questions, no passive voice when active is possible.
Template: Code debugging
The process imposed on the model is in five steps: read the code, identify the bug in one or two sentences, explain why it's a bug, provide only the corrected lines, and provide a test that proves the fix works.
❓ FAQ
Is advanced prompting still useful with 2026 models?
Yes, more than ever. Recent models like DeepSeek V4 and Qwen3.6 are capable of following more complex instructions, which amplifies (and does not reduce) the gap between a good and a bad prompt.
How many tokens does a structured system prompt consume?
A 200-300 word system prompt consumes about 300-450 tokens. This is negligible compared to the quality gain and the reduction in back-and-forths. Check out Tokens, contexte, coûts : comprendre la facturation des LLM to optimize your budget.
Few-shot or chain-of-thought: which one to choose?
It depends on the task. Few-shot is ideal for locking in an output format (classification, JSON extraction). Chain-of-thought is preferable for reasoning (math, logic, debugging). You can combine both for complex tasks.
Should you always ask for JSON?
No. JSON is useful when you need to integrate the response into an automated pipeline. For writing, narrative analysis, or brainstorming, free text with markdown formatting is more appropriate.
Is the meta-prompt (a prompt that generates prompts) reliable?
It produces excellent first-draft results, but almost always requires human adjustment. Use it as a starting point, not as a final result.
📋 The essentials
- Structure your system prompt: role, context, style, constraints, format
- Use few-shot: 2-3 examples are worth more than 200 words of explanation
- Activate Chain-of-Thought for complex reasoning
- Ask for JSON when you need structured data
- Say what you do NOT want — negative constraints are powerful
- Test and measure — prompting is an iterative process
- Specialize your prompts — one prompt per task, not one prompt for everything
- Draw inspiration from SOUL.md/AGENTS.md — the personality/rules separation is an excellent model
Prompting is not magic — it's engineering. Like everything in engineering, it can be learned, practiced, and improved over time.