🏷️ post-training

2 articles about post-training — guides, tutorials and comparisons to master this topic on AI-master.dev.

General Preference RL: this paper unifies reinforcement learning and preference optimization for LLMs

Discover the General Preference RL paper, which unifies reinforcement learning and preference optimization to address LLM post-training.

LLM & Models beginner

SDAR: how to train AI agents with reinforcement learning without breaking them — self-distillation agentic

Discover SDAR (Self-Distillation Agentic Reinforcement): the method for training your AI agents with reinforcement learning without breaking them.

LLM & Models beginner