📑 Table of contents

Agent Skills: addyosmani's framework that standardizes the workflows of coding AI agents

Agents IA 🟢 Beginner ⏱️ 15 min read 📅 2026-05-18

Agent Skills: the addyosmani framework that standardizes AI coding agent workflows

🔎 Vibe coding just found its discipline

In May 2026, a GitHub repo blew up the trending charts. Not a new model, not a startup raising $100 million. A Google engineer, Addy Osmani, published a framework that solves the most frustrating problem of vibe coding: the random quality of code generated by AI agents.

The concept is brutally efficient. Instead of letting an AI agent improvise with every prompt, you inject it with SKILL.md files that encode the workflows, quality gates, and checklists of a senior engineer. The agent no longer drifts. It follows a structured, reproducible, verifiable process.

The movement immediately exploded. A marketplace (claude-plugins.dev) and a registry (agentskill.sh with 44,000+ indexed skills) emerged in a matter of weeks. Anthropic itself adopted the format with its financial-services repo (12,000+ stars). Agent skills are no longer an experiment: they are becoming the de facto standard for structuring the behavior of coding agents.


The essentials

  • Agent skills are SKILL.md files that encode the workflows and quality gates of senior engineers in a format readable by AI coding agents.
  • The addyosmani/agent-skills framework is compatible with Claude Code, Cursor, Copilot, Gemini CLI, Windsurf, OpenCode, and Kiro IDE.
  • The ecosystem has exploded: agentskill.sh lists 44,000+ skills, and claude-plugins.dev offers a marketplace with one-command installation.
  • Anthropic adopted the concept through its anthropics/financial-services repo, validating the format as an industry standard.
  • The architecture relies on 3 layers: trigger (when to activate the skill), process (steps to follow), quality (validation gates).

Tool Main Usage Price (June 2025, check on site.com) Ideal for
agent-skills Production-grade skills framework Free (open source) Developers looking to structure their agents
claude-plugins.dev Skill marketplace with auto-indexing Free Finding and installing skills in one command
agentskill.sh Largest skills registry (44,000+) Free Exploring the complete skills ecosystem
claude-skills Community collection of 263+ skills Free (open source) Copying ready-to-use structured skills

How agent skills actually work

An agent skill is a SKILL.md file placed in the project directory (usually in .claude/skills/, .cursor/skills/ or .github/skills/). When the coding AI agent detects a matching context, it automatically loads and follows the file's instructions.

The format is deliberately simple. No complex YAML, no verbose JSON schema. Just structured markdown that any LLM can parse effortlessly. This is what makes it universal: a SKILL.md written for Claude Code also works in Cursor or Copilot with zero adaptation.

Wonderlab's framework analysis (dev.to, May 2026) describes a 3-layer architecture. The trigger layer defines when the skill should activate (pattern matching on the context, file type, prompt intent). The process layer describes the sequential steps to follow. The quality layer sets the validation criteria that the result must satisfy before being considered complete.

In practice, when you ask Claude Code to "create an API endpoint for payments", the agent no longer jumps in blindly. It detects that an "api-endpoint-creation" skill is available, loads it, and follows the workflow step by step: check the project's routing conventions, write the tests first, implement the logic, validate against the quality gates.

This is the difference between a junior coding by feel and a senior following a proven process. Except here, the "senior" is a 50-line text file.


Addy Osmani is no stranger. A Chrome engineer at Google, author of "Learning JavaScript Design Patterns" and "Image Optimization", he has massive credibility in the frontend community. When he publishes a repo, developers listen.

What makes the addyosmani/agent-skills framework so powerful is its production-grade approach. These aren't just snippets or vaguely structured prompts. Each skill encodes a complete workflow with conditional decisions, reference checklists, and measurable quality criteria.

The community collection alirezarezvani/claude-skills (263+ skills) proves the massive adoption of the format. Each skill is structured identically there: a SKILL.md, a clear workflow, a decision framework for edge cases. The consistency of the format allows developers to mix skills from different sources without conflict.

The repo also benefited from a network effect. Early adopters (primarily teams using Claude Code and Cursor) started publishing their own skills. Marketplaces automatically indexed them. Other developers discovered them, installed them, and published their own. In two weeks, the movement went from a GitHub repo to a complete ecosystem.


The ecosystem is exploding: marketplaces and registries

The strongest signal that agent skills have become a standard is the spontaneous emergence of infrastructure around the concept. Two platforms stand out.

claude-plugins.dev works like a marketplace/registry with auto-indexing of public GitHub skills. You find a skill that interests you, copy a command, and it installs into your project. The site references the "open agentskills" specs, an attempt at an open standardization of the SKILL.md format.

agentskill.sh is more ambitious. It is the largest marketplace for coding agents, supporting Claude Code, Cursor, Copilot, Codex, Windsurf, Zed and more than 20 other tools. With 44,000+ skills indexed in May 2026, it has become the mandatory entry point for anyone wanting to explore the ecosystem.

The landscape published by explainx.ai in May 2026 lists at least 10 major registries: claude-plugins.dev, agentskill.sh, SkillsMP, LobeHub, and others. The ecosystem is exploding, with new registries appearing literally every week.

This fragmentation is both a sign of health (high adoption) and a risk (no single standard). But because the basic markdown format is so simple, interoperability remains excellent in practice.


The most useful skills in practice

Not all skills are created equal. After analyzing the agentskill.sh registry and the alirezarezvani/claude-skills collection, the most popular and useful categories clearly stand out.

Code review skills are the most downloaded. They encode review checklists: security, performance, accessibility, naming conventions. The agent doesn't just check syntax. It verifies that inputs are sanitized, that SQL queries are parameterized, and that components respect WCAG accessibility guidelines.

Testing skills come in second place. They force the agent to write tests before the code (TDD), cover edge cases, and check error types. A well-configured testing skill transforms an agent that generates fragile code into an agent that generates resilient code.

Architecture skills are rarer but more valuable. They guide the agent in structural decisions: when to create a new module, how to split responsibilities, which pattern to apply. This is where the concept of a Skill system makes complete sense — the agent that learns and improves over the course of interactions.

Deployment skills close the loop. They encode pre-deploy checklists: verifying environment variables, validating database migrations, testing rollbacks. For teams deploying on hosts like Hostinger, these skills prevent configuration errors that cost hours of debugging.


Impact on the quality of generated code

The central question: do agent skills actually improve quality, or is it just process decoration?

The available data is still mostly anecdotal, but the feedback is unanimous. The structured workflow drastically reduces regressions. When an agent follows a code review skill with explicit quality gates, it won't approve a PR that introduces an XSS vulnerability or breaks an existing test.

The most visible impact is on consistency. In a project without skills, the style of generated code varies depending on the prompt, the time of day, and the model used. With skills, conventions are encoded once and applied systematically. The code generated on Monday by GPT-5.5 looks like the code generated on Friday by Claude Opus 4.7.

An often underestimated point: skills reduce prompt engineering debt. Instead of rewriting complex instructions in every prompt, you encode them once in a SKILL.md. The time savings are massive for teams that use coding agents daily.

For high-level agentic models like GPT-5.5 (score of 98.2 on agentic benchmarks) or Claude Opus 4.7 (94.3), skills act as an additional safeguard. Even the best model can drift without constraints. Skills maintain discipline.


The internal architecture of a skill

Understanding the structure of a skill allows you to create your own or evaluate those of others. The format is not formally standardized, but a clear consensus has emerged around certain sections.

The header contains the metadata: name, description, triggers (when the skill activates), compatible tools. This section allows marketplaces like claude-plugins.dev to automatically index the skill.

The "Context" section describes the problem the skill solves and the preconditions. For example: "This skill activates when the agent needs to create a new React component in a project using Next.js 15+ and Tailwind CSS 4+."

The "Workflow" section is the core. It describes the sequential steps, with conditional decisions in natural pseudo-code. "If the component has local state, use useState. If the state is shared between components, use the project's Zustand store."

The "Quality Gates" section defines the validation criteria. "The component must have at least one unit test. The bundle size must not exceed 5 KB gzipped. Accessibility must pass axe-core checks."

This 4-section structure is the one found in almost all of the 263+ skills in the alirezarezvani/claude-skills collection, confirming that it has become the de facto standard.


How AI agents use skills

The mechanism for detecting and applying skills varies depending on the tool, but the principle is always the same: contextual scanning.

When you type a prompt in Claude Code, the agent scans the SKILL.md files available in the project. It compares the context of your request (modified files, detected language, prompt intent) with the triggers for each skill. If a match is found, the skill is injected into the agent's context before it generates its response.

For the best autonomous AI agents like Claude Code with Claude Opus 4.7, this mechanism is native. The agent naturally understands the markdown format and follows structured instructions. For less advanced tools, skills still work, but with lower execution fidelity.

The declared compatibility of the addyosmani/agent-skills framework is impressive: Claude Code, Cursor, Copilot, Gemini CLI, Windsurf, OpenCode, Kiro IDE. In practice, compatibility depends on the tool's ability to read local files and integrate them into the LLM's context. All major tools do this today.

For those who want to go further and create a custom AI agent, skills offer an elegant architectural model. Instead of hardcoding behaviors into the agent's code, they are externalized into markdown files that can be modified without redeployment.


Anthropic validates the concept with financial-services

The strongest signal of legitimacy comes from Anthropic itself. The anthropics/financial-services repo, with over 12,000 stars, uses the SKILL.md format to structure the behavior of an agent specialized in financial services.

Anthropic didn't just adopt the format. It adapted it to a regulated domain where precision is not optional. The skills in the financial-services repo encode compliance checks, rules for managing sensitive data, and quality gates specific to the banking sector.

This choice sends a clear message: agent skills are not a gimmick for hobbyists. Even the company behind Claude considers structuring its agents' behavior via SKILL.md files to be the right approach for critical use cases.

For developers who were hesitating to adopt the format, this is the definitive green light. If Anthropic structures its own agents with skills, the format has a lifespan that extends beyond the current hype cycle.


Agent skills and task delegation

A particularly powerful use case for skills is delegation among sub-agents. When a main agent receives a complex task, it can delegate subtasks to specialized agents, each equipped with its own skills.

This is exactly the pattern described in task delegation and sub-agent orchestration. The orchestrator agent identifies the subtasks, assigns each subtask to an agent with the appropriate skills, and validates the results via the quality gates of each skill.

For example, for a complete feature, the orchestrator can delegate: an agent with the "database-migration" skill for the schema, an agent with the "api-endpoint" skill for the backend, an agent with the "react-component" skill for the frontend. Each agent produces a result that complies with its quality gates, and the orchestrator validates the overall consistency.

This orchestration pattern is particularly effective with state-of-the-art agentic models. GPT-5.5 (98.2) and Gemini 3 Pro Deep Think (95.4) excel in planning and delegation. Claude Opus 4.7 (94.3) shines in the faithful execution of skills. Combined, they form a remarkably reliable development pipeline.


Agent skills and local models with Ollama

Not everyone uses paid cloud APIs. For developers running open source AI agents with Ollama locally, skills are even more relevant.

Local models are generally less powerful than their cloud equivalents. Kimi K2.6 (88.1 in self-host) and GLM-5 Reasoning (82.0) are competent but lack the reasoning depth of a GPT-5.5. Skills partially compensate for this difference by guiding the model toward the right decisions.

A local model well-guided by skills can produce a result comparable to a cloud model without guidance. It's an efficiency multiplier particularly valuable for teams with cost or confidentiality constraints.

The choice of LLM for agents then becomes a different calculation. Instead of aiming for the most powerful model, the goal becomes the fastest and cheapest model that knows how to follow structured instructions. Skills change the game in this equation.


Configuring OpenClaw with agent skills

For OpenClaw users, integrating skills is done at the SOUL and AGENTS configuration level. Each agent defined in OpenClaw can be associated with a set of skills that structure its behavior.

The OpenClaw configuration: SOUL, AGENTS, and Skills allows you to define agent profiles with specific skills. A "reviewer" agent is assigned code review skills. An "architect" agent receives design skills. A "tester" agent inherits testing skills.

This modular approach is elegant. Skills are reusable behavior blocks that are composed at the agent configuration level. You can create a new agent by combining existing skills, without writing a single line of code.

For teams already using OpenClaw as an orchestrator, agent skills are the natural complement. The SOUL defines the personality, the AGENTS define the roles, and the Skills define the processes. Together, these three layers form a coherent and powerful AI development system.


❌ Common mistakes

Mistake 1: Confusing skills and prompts

A skill is not a giant prompt. A prompt says "do this". A skill defines a conditional process with quality gates. If your SKILL.md looks like a massive prompt, you haven't understood the concept. Refactor by separating triggers, workflow, and quality criteria.

Mistake 2: Creating skills that are too generic

A "code well" skill is useless. Skills must be context-specific: "create a REST endpoint in a Next.js project with Prisma and Zod validation". Specificity is what allows the right agent to load the right skill at the right time.

Mistake 3: Ignoring triggers

A skill without clear triggers will never be activated automatically. You will end up invoking it manually, which defeats the advantage of the system. Precisely define the conditions (language, framework, file type, intention) that trigger the skill.

Mistake 4: Stacking too many skills

20 active skills simultaneously create a context bloat that degrades the LLM's performance. Limit yourself to 5-8 relevant skills for your active project. Marketplaces like agentskill.sh allow you to browse 44,000+ skills, but you only need a handful.

Mistake 5: Not keeping skills up to date

A skill that references deprecated APIs or obsolete conventions is worse than no skill at all. The agent will faithfully follow incorrect instructions. Update your skills as often as your technical documentation.


❓ Frequently Asked Questions

Do agent skills replace prompt engineering?

No, they complement it. Prompt engineering remains useful for one-off interactions. Skills structure recurring processes. The two coexist: the prompt for direction, the skill for execution.

Is Claude Code the best tool to use skills with?

It has the most native support, but agentskill.sh shows that Cursor, Copilot, Codex and 20+ other tools are compatible. The markdown format is universal. Choose the tool that suits you, not the other way around.

Can skills be used with local models via Ollama?

Yes, absolutely. Skills are just markdown text. Any LLM capable of reading local files can use them. It is even particularly recommended to compensate for the reasoning gap of local models.

How many skills are needed for a typical project?

5 to 8 skills cover 90% of needs: code review, testing, component creation, endpoint creation, error management, deployment, documentation. Add specific business skills if necessary, but don't exceed a dozen or so.

Will the SKILL.md format be formally standardized?

The "open agentskills" specs from claude-plugins.dev are attempting a standardization, but for now it is a de facto consensus, not a formal standard. The simplicity of the markdown format makes rigid standardization less urgent than it would be for a more complex format.


✅ Conclusion

addyosmani's agent skills do exactly what vibe coding couldn't do: turn improvisation into discipline, without losing fluidity. A 50-line SKILL.md file does the work of a lead dev micromanaging every PR. The ecosystem has exploded in a few weeks, Anthropic has adopted the format, and 44,000+ skills are already available on agentskill.sh. The de facto standard is here — all that's left is to configure your agents.