GenericAgent: the open-source AI agent that builds its own skill tree (6,700+ GitHub stars, 3,536 of them in one week)
🔎 An agent that doesn't receive skills, it grows them
January 2026. The AI agent space is saturated with frameworks that promise autonomy but deliver pre-wired workflows. LangGraph, CrewAI, AutoGen: they all share the same premise (you design skills, you plug them in, you pray). GenericAgent reverses this model. Released in January 2026 with an arXiv paper and a GitHub repo, this framework of roughly 3,000 lines of code gained 3,536 stars in 7 days and passed 6,700 in total, according to Andrew.ooo.
The concept is as simple as it is unsettling: "don't ship skills, grow them". Each solved task is crystallized into a reusable skill, and the skill tree grows organically with use. Reported result: 6x fewer tokens consumed per task compared to classic approaches, according to ByteIota. This is perhaps the first agent framework designed to get better with every task, without any developer intervention.
The essentials
- Only about 3,000 lines of code for an agent with complete system control, according to the official repo.
- More than 6,700 GitHub stars in total, 3,536 of them gained in 7 days.
- 5-layer on-demand memory and 9 built-in atomic tools, according to AntiGravity Codes.
- 6x fewer tokens consumed thanks to the crystallization of skills into long-term memory.
- 9 atomic tools that replace dozens of specialized tools in competing frameworks.
- Open-source, usable with any LLM including locally.
Recommended tools
| Tool | Main usage | Price (June 2025, check the vendor's site) | Ideal for |
|---|---|---|---|
| GenericAgent | Self-evolving agent with skill tree | Free (MIT) | Projects where the agent must improve over time |
| Hostinger | Hosting for deploying agents | From €2.99/month | Low-cost deployment of autonomous agents |
| DeerFlow (ByteDance) | Open-source agent research + code + long-term creation | Free | Long-term research and creation tasks |
| ruflo | Multi-agent orchestration | Free | Coordinating multiple agents in parallel |
The architecture in 3,000 lines: why it's credible
GenericAgent fits in about 3,000 lines of code. For an autonomous agent framework with 5-layer memory, that's almost absurd. For comparison, LangGraph easily exceeds 50,000 lines. But the lightness is not a compromise: it's a direct consequence of the design.
No preloaded workflows. No complex state graphs. No stacked prompt chains. The core of GenericAgent relies on a self-evolution loop that takes a task, solves it with the available atomic tools, and then decides whether the trajectory deserves to be crystallized. If yes, the skill is stored in L3 memory and becomes available for future tasks.
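To make the loop concrete, here is a minimal sketch in Python. It is not GenericAgent's code: the `Agent` class, the `solve` and `worth_keeping` callables, and the exact-match lookup on the task string are all assumptions made for illustration. In the real framework, both solving and the decision to crystallize are driven by the LLM.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    # Hypothetical stand-ins for the LLM-driven parts of the loop.
    solve: Callable[[str], list[str]]                # runs atomic tools, returns the steps taken
    worth_keeping: Callable[[str, list[str]], bool]  # judges whether the trajectory generalizes
    skills: dict[str, list[str]] = field(default_factory=dict)  # plays the role of L3

    def run(self, task: str) -> list[str]:
        if task in self.skills:               # a crystallized skill already covers this task
            return self.skills[task]
        steps = self.solve(task)              # otherwise solve it with the atomic tools
        if self.worth_keeping(task, steps):   # crystallize only trajectories worth reusing
            self.skills[task] = steps
        return steps


solve_calls = []

def fake_solve(task: str) -> list[str]:
    solve_calls.append(task)                  # count how often the expensive path runs
    return [f"execute: {task}"]

agent = Agent(solve=fake_solve, worth_keeping=lambda task, steps: bool(steps))
first = agent.run("list files in /tmp")
second = agent.run("list files in /tmp")      # served from the skill store
print(len(solve_calls))
```

The second call never reaches `fake_solve`, which is the whole point: the expensive path runs once, and the stored trajectory is reused afterwards.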
Mervin Praison sums up the stance well: the framework focuses on self-evolving skills rather than bulky preloaded workflows. This architectural simplicity is what allows complete control of the local system from such a compact codebase. AISignal makes the same point, describing an agent that achieves complete system control from a "3.3K-line seed" (its line count is slightly higher than the roughly 3,000 cited elsewhere).
The 9 atomic tools: less is more
Most agent frameworks ask you to define custom tools for every use case. One tool to search the database, one to send an email, one to scrape a page, etc. GenericAgent operates on the principle that 9 primitive tools are enough to cover virtually any task.
These 9 atomic tools are designed to be composable. The agent combines them dynamically based on the task, and if a successful combination repeats, it becomes a high-level skill. It works like a programming language: you don't need a keyword for every operation. A few well-chosen primitives plus composition do the rest.
AntiGravity Codes details this architecture: the 9 atomic tools cover reading, writing, executing commands, searching, and file manipulation. Nothing revolutionary individually. It's their combination by the agent itself that creates the magic.
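The composition idea can be sketched with three primitives. The names `read_file`, `write_file`, and `search` are hypothetical and are not the framework's actual tool names or signatures; they only stand in for the reading, writing, and searching categories mentioned above.

```python
import tempfile
from pathlib import Path


# Three hypothetical primitives; the real tool names and signatures may differ.
def read_file(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")

def write_file(path: str, text: str) -> None:
    Path(path).write_text(text, encoding="utf-8")

def search(text: str, needle: str) -> list[int]:
    """Return the 1-based numbers of the lines that contain `needle`."""
    return [i for i, line in enumerate(text.splitlines(), start=1) if needle in line]


def replace_in_file(path: str, old: str, new: str) -> int:
    """A higher-level skill built only from the primitives above."""
    text = read_file(path)
    hits = search(text, old)
    if hits:                                   # write only when something matched
        write_file(path, text.replace(old, new))
    return len(hits)                           # number of lines that contained `old`


with tempfile.TemporaryDirectory() as tmp:
    target = str(Path(tmp) / "notes.txt")
    write_file(target, "foo\nbar\nfoo bar\n")
    touched = replace_in_file(target, "foo", "baz")
    result = read_file(target)
```

Nothing in `replace_in_file` is new capability. It is just a named, reusable arrangement of primitives, which is what a crystallized skill amounts to.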
The 5-layer memory: the real differentiator
This is where GenericAgent becomes seriously interesting. The framework implements on-demand 5-layer memory. Each layer has a specific role, and the agent dynamically decides which ones to activate based on the complexity of the task.
L1 — Working Memory: the immediate context of the current task. Equivalent to the classic "prompt context", but managed explicitly rather than by window overflow.
L2 — Episodic Memory: past attempts, successful or failed. The agent remembers what it has already tried so as not to repeat the same mistakes.
L3 — Skill Memory: the heart of the system. Verified task trajectories crystallized into reusable skills. This is the skill tree that grows organically.
L4 — Semantic Memory: factual knowledge extracted from previous tasks. Not skills, but information the agent learned by doing.
L5 — Procedural Memory: meta-patterns, high-level problem-solving strategies. The most abstract layer, which emerges over time.
According to ByteIota, this inverted architecture — where the agent memorizes, accumulates skills, and improves with each task — is what enables the 6x reduction in token consumption. Instead of re-explaining the context at every execution, the agent draws from its memory layers what is already known.
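As a rough illustration of "on-demand" layers, the sketch below models the five layers as plain Python containers and builds a context from only the layers requested. The `LayeredMemory` class and its `context_for` method are assumptions for this example. In GenericAgent, the agent itself decides which layers to activate; here the caller passes that choice in.

```python
from dataclasses import dataclass, field


@dataclass
class LayeredMemory:
    working: list[str] = field(default_factory=list)        # L1: current task context
    episodic: list[str] = field(default_factory=list)       # L2: past attempts
    skills: dict[str, str] = field(default_factory=dict)    # L3: crystallized skills
    semantic: dict[str, str] = field(default_factory=dict)  # L4: facts learned by doing
    procedural: list[str] = field(default_factory=list)     # L5: high-level strategies

    def context_for(self, task: str, layers: tuple[str, ...]) -> str:
        """Assemble a prompt context from only the requested layers."""
        parts = [f"TASK: {task}"]
        for name in layers:
            content = getattr(self, name)
            items = content.items() if isinstance(content, dict) else enumerate(content)
            parts += [f"[{name}] {key}: {value}" for key, value in items]
        return "\n".join(parts)


mem = LayeredMemory(
    episodic=["tried pip install without a venv: failed"],
    skills={"deploy": "build, test, push"},
    semantic={"db_host": "localhost"},
)
simple = mem.context_for("deploy the app", layers=("skills",))
full = mem.context_for("deploy the app", layers=("episodic", "skills", "semantic"))
```

A simple task gets a short context built from L3 alone, while a harder one pulls in more layers and pays for them in tokens.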
How skill crystallization actually works
The crystallization mechanism is the central point of GenericAgent. Here is what actually happens when you give a task to the agent.
First, the agent analyzes the task and checks its Skill Memory (L3) for a relevant skill. If it finds one, it applies it directly, without going back through the atomic tools. This is where the token savings occur. If no skill matches, the agent breaks the task down into subtasks and uses the 9 atomic tools to solve it step by step.
During execution, every action, every intermediate result, every reasoning step is traced in the Episodic Memory (L2). When the task is complete, the agent evaluates whether the trajectory is generalizable. If yes — and it is the agent that decides — the sequence of actions is abstracted into a high-level skill and stored in L3.
The next time a similar task arrives, the agent already has the skill. It doesn't start from scratch. This is what the GitHub repo calls "crystallizing every verified task trajectory into a reusable skill". The skill tree grows organically with use, as specified by Andrew.ooo.
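The abstraction step can be pictured as turning a concrete trace into a template. The sketch below is one possible reading, not the repo's mechanism: `Step`, `crystallize`, and `apply_skill` are hypothetical names, and the mapping of concrete values to placeholders is supplied by hand, whereas in GenericAgent the model decides what to generalize.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    tool: str   # name of the atomic tool that was called
    arg: str    # concrete argument used during this run


def crystallize(trace: list[Step], params: dict[str, str]) -> list[Step]:
    """Replace concrete values in a trace with named placeholders.

    `params` maps a placeholder name to the concrete value it stood for
    in this run, e.g. {"path": "/var/log/app.log"}.
    """
    skill = []
    for step in trace:
        arg = step.arg
        for name, value in params.items():
            arg = arg.replace(value, "{" + name + "}")
        skill.append(Step(step.tool, arg))
    return skill


def apply_skill(skill: list[Step], **values: str) -> list[Step]:
    """Instantiate a crystallized skill for a new, similar task."""
    return [Step(s.tool, s.arg.format(**values)) for s in skill]


trace = [Step("run", "grep ERROR /var/log/app.log"), Step("write", "/tmp/report.txt")]
skill = crystallize(trace, {"path": "/var/log/app.log", "out": "/tmp/report.txt"})
replay = apply_skill(skill, path="/var/log/db.log", out="/tmp/db.txt")
```

This toy version relies on `str.format`, so it would break on arguments that contain literal braces. A real implementation would need a sturdier templating scheme.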
The CIDM concept: maximizing contextual information density
GenericAgent's approach relies on a principle the authors call "Context Information Density Maximization" (CIDM). The idea is elegant: instead of maximizing the amount of context provided to the agent (which most frameworks do with complex RAG), you maximize the information density of that context.
A skill crystallized in L3 is a distillation of a complete trajectory. Instead of returning the 50 steps of a past reasoning process to the agent, you give it the abstract skill that represents those 50 steps in a few lines. The information density per token explodes. This is why GenericAgent consumes 6x fewer tokens according to ByteIota.
This CIDM principle is fundamentally different from the classic RAG approach. In RAG, you add documents to the context. In CIDM, you compress the agent's experience into reusable structures. The more the agent works, the denser its skills become, the more efficient it is. It is a virtuous circle that exists in no other mainstream framework.
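The density argument is easy to put into numbers. The sketch below compares the size of a raw 50-step trajectory with a short skill summary, using whitespace-separated words as a crude proxy for tokens. The helper names and the sample text are invented for the example, and a real tokenizer would give different counts.

```python
def rough_tokens(text: str) -> int:
    """Crude proxy: whitespace-separated words, not a real tokenizer."""
    return len(text.split())


def compression_ratio(trajectory: list[str], skill_summary: str) -> float:
    """How many times smaller the skill is than the raw trajectory."""
    summary_tokens = rough_tokens(skill_summary)
    if summary_tokens == 0:
        raise ValueError("skill summary must not be empty")
    full_tokens = sum(rough_tokens(step) for step in trajectory)
    return full_tokens / summary_tokens


# Invented sample data: 50 verbose steps versus a one-line skill.
trajectory = [f"step {i}: ran command and inspected the output" for i in range(50)]
summary = "rotate logs: find files older than N days, compress, delete originals"
ratio = compression_ratio(trajectory, summary)
print(round(ratio, 1))
```

On this toy data the trajectory is 400 words and the summary 11, a ratio of about 36. The reported 6x figure concerns whole tasks rather than a single skill, so the two numbers are not directly comparable.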
GenericAgent vs established frameworks: another paradigm
Comparing GenericAgent to LangGraph or CrewAI is tricky because they are not the same beasts. But the comparison is inevitable.
| Criterion | GenericAgent | LangGraph | CrewAI |
|---|---|---|---|
| Codebase size | ~3,000 lines | 50,000+ lines | 30,000+ lines |
| Skills approach | Self-evolving, crystallization | Pre-defined by the dev | Pre-defined by the dev |
| Memory | 5 on-demand layers | Basic (state graph) | Basic |
| Tokens per task | ~6x fewer (per ByteIota) | Baseline | Baseline |
| Learning curve | Low | High | Medium |
| Improvement over time | Yes (organic) | No | No |
AntiGravity Codes points out that GenericAgent gives any LLM its 9 tools and its 5-layer memory without requiring complex configuration. No state graphs to draw, no roles to assign. You launch the agent, give it a task, and it improves.
Mervin Praison notes that this compact approach is particularly suited for local system control. When ranking the best autonomous AI agents, the question is no longer "how many skills can you code?" but "how many skills can your agent cultivate?".
Which LLMs to run with GenericAgent
GenericAgent is agnostic to the underlying model. But not all LLMs are equal for a self-evolution architecture. The model must be capable of reliable chain-of-thought reasoning, honest self-evaluation (to decide if a skill is worth crystallizing), and tool composition.
The best candidates according to the June 2025 agentic benchmarks:
GPT-5.5 (OpenAI) — Agentic score 98.2. The safest choice to fully exploit GenericAgent. Its reasoning is reliable enough for skill crystallization to be relevant.
Gemini 3 Pro Deep Think (Google) — Score 95.4. Excellent for tasks that require depth of reasoning before crystallization.
Claude Opus 4.7 Adaptive (Anthropic) — Score 94.3. Particularly good for code tasks where crystallized skills are programming patterns.
Kimi K2.6 (Moonshot AI, self-hosted) — Score 88.1. The most relevant self-hosted option. If you want a 100% local agent with GenericAgent, this is the most coherent combo.
GLM-5 Reasoning (Z.AI, self-hosted) — Score 82. Interesting self-hosted alternative, especially for deployments in China or constrained environments.
For those who want to run everything locally, open-source AI agents with Ollama remain a complementary option. But for GenericAgent's specific architecture, a model with built-in reasoning yields better crystallization results.
How to create an agent with GenericAgent
Creating an AI agent with GenericAgent is radically simpler than with classic frameworks. No graph definitions, no role configurations, no prompt chains.
Installation is standard. Clone the repo, install the dependencies, configure your API key or local endpoint. The framework exposes a minimalist interface: you initialize the agent with an LLM, and you pass it tasks.
The first task you give the agent will be the most expensive in tokens. That's normal: the agent doesn't have any crystallized skills yet, so it has to solve everything with the 9 atomic tools. The savings start to show from the second similar task onward, when the crystallized skill replaces the chain of atomic tool calls.
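A back-of-the-envelope model shows how the early overhead gets amortized. Everything here is an assumption: a constant baseline cost per task, a fixed number of seeding tasks that cost more than the baseline, and the full 6x reduction applying to every task after that. Real workloads will be messier.

```python
from typing import Optional


def cumulative_cost(n: int, baseline: float, seed_tasks: int,
                    overhead: float, reduction: float = 6.0) -> float:
    """Total tokens after n tasks: costly seeding first, cheap reuse afterwards."""
    seeded = min(n, seed_tasks)
    return seeded * baseline * overhead + (n - seeded) * baseline / reduction


def break_even(baseline: float, seed_tasks: int, overhead: float,
               reduction: float = 6.0, horizon: int = 1000) -> Optional[int]:
    """First task count at which the evolving agent is no more expensive in total."""
    for n in range(1, horizon + 1):
        if cumulative_cost(n, baseline, seed_tasks, overhead, reduction) <= n * baseline:
            return n
    return None  # never catches up within the horizon


# Made-up numbers: 1,000 tokens per baseline task, 10 seeding tasks at 1.5x cost.
tasks_needed = break_even(baseline=1000, seed_tasks=10, overhead=1.5)
print(tasks_needed)
```

With these made-up numbers the break-even lands at task 16. Change the overhead or the length of the seeding phase and the answer moves quickly, which is why a single run says little about long-term cost.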
AISignal points out that the agent reaches its full system control progressively. The first few hours of use are a seeding phase — the agent plants the first branches of its skill tree. Then, growth accelerates because each new skill facilitates the acquisition of the next.
For orchestrating multiple GenericAgent instances in parallel, ruflo becomes relevant. One GenericAgent per domain, orchestrated by ruflo, and you get a multi-agent system where each sub-agent self-improves independently.
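The one-agent-per-domain pattern can be sketched without any orchestration library. The code below does not use ruflo's API. It relies only on the Python standard library, and each "agent" is a plain callable standing in for one GenericAgent instance with its own skill tree.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical type: an agent takes a task description and returns a result.
AgentFn = Callable[[str], str]


def dispatch(tasks: list[tuple[str, str]], agents: dict[str, AgentFn]) -> list[str]:
    """Send each (domain, task) pair to that domain's agent, in parallel."""
    with ThreadPoolExecutor(max_workers=max(1, len(agents))) as pool:
        futures = [pool.submit(agents[domain], task) for domain, task in tasks]
        return [f.result() for f in futures]   # results keep the input order


agents = {
    "code": lambda task: f"[code] {task}",
    "ops": lambda task: f"[ops] {task}",
}
results = dispatch([("ops", "restart nginx"), ("code", "fix failing test")], agents)
```

One caveat of this sketch: two tasks for the same domain may run concurrently on the same agent, so a real setup would need to serialize writes to that agent's skill memory.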
The LLMs best suited for self-evolving agents
Not all LLMs are equal when faced with a self-evolution architecture. Skill crystallization demands a capacity for abstraction that only the most advanced models truly possess.
The model must be able to identify a pattern in its own execution trajectory. This is a form of meta-reasoning that goes beyond simple task resolution. GPT-5.5 and Claude Opus 4.7 excel here because their ability to reflect on their own processes is documented in agentic benchmarks.
Self-hosted models like Kimi K2.6 are interesting but require more seeding tasks before the skill tree becomes truly useful. To choose an LLM suited to this type of architecture, the comparison of the best LLMs for AI agents remains the reference for pure reasoning scores.
A crucial point: the model must be honest enough not to crystallize a defective skill. If the model misreports the success of a task, the crystallized skill will be poisoned. This is an inherent risk of self-evolution, and it's why models with the lowest hallucination rates are preferred.
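One way to reduce that risk is to avoid relying on the model's word alone. The sketch below is a possible mitigation, not a GenericAgent feature: the function name, the `store` dictionary, and the `verify` callback are all hypothetical. A skill is kept only if an independent check (a test run, an exit code, a diff) agrees with the model's self-assessment.

```python
from typing import Callable


def crystallize_if_verified(
    store: dict[str, list[str]],
    name: str,
    steps: list[str],
    self_reported_ok: bool,
    verify: Callable[[], bool],
) -> bool:
    """Store a skill only when an independent check agrees with the model."""
    if not self_reported_ok:      # the model itself says the task failed
        return False
    if not verify():              # e.g. run tests, compare output, check an exit code
        return False
    store[name] = steps
    return True


store: dict[str, list[str]] = {}
accepted = crystallize_if_verified(
    store, "make_report", ["read", "summarize", "write"],
    self_reported_ok=True, verify=lambda: True,
)
rejected = crystallize_if_verified(
    store, "bad_skill", ["guess"],
    self_reported_ok=True, verify=lambda: False,   # the model claims success, the check disagrees
)
```

The second call shows the case that matters: the model reports success, the external check fails, and nothing is written to the skill store.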
DeerFlow vs GenericAgent: two visions of self-evolution
ByteDance's DeerFlow and GenericAgent share a common ambition: agents that improve over time. But their approaches diverge deeply.
DeerFlow focuses on research, code, and long-term creation. It's a vertical agent, optimized for a specific creative workflow. GenericAgent is a horizontal framework — it presupposes no domain. You can use it for code, research, system administration, or anything else.
DeerFlow's memory is project-oriented: the agent remembers the context of a project over time. GenericAgent's memory is skill-oriented: the agent remembers cross-functional know-how. A skill learned while fixing a database bug can be reused to administer a server.
In practice, the two approaches are complementary. DeerFlow for long creative workflows, GenericAgent for systems that need to adapt to unpredictable tasks. The fact that both projects emerged almost simultaneously in early 2026 confirms a trend: the AI community is moving away from static workflows toward agents that grow.
The current limitations of GenericAgent
The enthusiasm around its 6,700 stars must not mask the framework's real limitations.
The seeding phase is costly. The first tasks can consume more tokens than a classic framework because the agent has no skills yet. The 6x savings only appear once enough skills have crystallized, which user feedback places at around 15 to 30 tasks in the same domain. For one-off use, GenericAgent is probably less efficient than a simple script.
The risk of skill poisoning. If the agent crystallizes a skill based on flawed reasoning that nevertheless led to an apparently correct result, this skill will contaminate all future similar tasks. There is not yet an automatic "pruning" mechanism for defective skills in the current version.
Dependence on the underlying LLM. The quality of crystallization depends entirely on the model's ability to abstract correctly. With a weak model, the skill tree will be filled with skills that are too specific or too vague to be reusable.
The absence of native monitoring. The repo is minimal. No dashboard to visualize the skill tree, no skill health metrics, no interface to inspect the 5 memory layers. It's researcher code, not a product.
AISignal positions GenericAgent as a 3.3K-line seed — the word is important. It's a seed, not a mature tree. The community will have to build around it.
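Until such tooling exists, a basic health check is easy to build around the agent. The sketch below is an assumption about what a minimal version could look like, not something the repo provides: it counts uses and failures per skill and flags the skills whose failure rate crosses a threshold. The names and the default thresholds are illustrative.

```python
from dataclasses import dataclass


@dataclass
class SkillStats:
    uses: int = 0
    failures: int = 0

    @property
    def failure_rate(self) -> float:
        return self.failures / self.uses if self.uses else 0.0


def record(stats: dict[str, SkillStats], skill: str, ok: bool) -> None:
    """Log one application of a skill and whether it succeeded."""
    entry = stats.setdefault(skill, SkillStats())
    entry.uses += 1
    if not ok:
        entry.failures += 1


def to_prune(stats: dict[str, SkillStats], min_uses: int = 5,
             max_failure_rate: float = 0.4) -> list[str]:
    """Skills with enough history and too many failures, for manual review."""
    return sorted(
        name for name, s in stats.items()
        if s.uses >= min_uses and s.failure_rate > max_failure_rate
    )


stats: dict[str, SkillStats] = {}
for ok in (True, False, False, True, False):
    record(stats, "parse_logs", ok)        # 3 failures out of 5
for ok in (True, True, True, False, True):
    record(stats, "deploy", ok)            # 1 failure out of 5
for ok in (False, False):
    record(stats, "new_skill", ok)         # too little history to judge
flagged = to_prune(stats)
```

The `min_uses` guard keeps a brand-new skill from being flagged after a couple of unlucky runs. The flagged list is meant for human review, in line with the manual oversight the current version requires.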
❌ Common mistakes
Mistake 1: Judging GenericAgent on the first task
Running a single task and concluding that GenericAgent consumes too many tokens is like judging an engine on its first warm-up lap. The 6x economy is measured on repetitive tasks where the skills have had time to be crystallized. Give it at least 20-30 tasks in the same domain before evaluating.
Mistake 2: Using an LLM without reasoning capability
Plugging in a small 7B model without built-in reasoning and expecting relevant skill crystallization. Models from the agentic list with scores above 80 are the bare minimum. Below that, the agent crystallizes noise.
Mistake 3: Ignoring the seeding phase
Assuming the agent is "ready" right after installation. The first few hours of use are a skill tree building phase. If you give it critical tasks right from the start, the agent doesn't have the skills yet to handle them correctly.
Mistake 4: Comparing line by line with LangGraph
Criticizing the absence of feature X or Y that LangGraph offers. GenericAgent does not want to replace LangGraph on its own turf (deterministic workflows). It offers a different paradigm. The relevant comparison is on results, not on features.
❓ Frequently Asked Questions
Does GenericAgent replace LangGraph or CrewAI?
No. GenericAgent addresses a different problem: the self-evolution of skills. If you have a well-defined deterministic workflow, LangGraph remains more suitable. If you have an agent that needs to adapt to unpredictable tasks and improve over time, GenericAgent is the right tool.
Can GenericAgent be used locally without a paid API?
Yes. The framework is compatible with self-hosted models like Kimi K2.6 (score 88.1) or GLM-5 Reasoning (score 82). Skill crystallization will work, but will require more seeding tasks than with GPT-5.5.
What happens if a crystallized skill is incorrect?
This is a real risk of the current system. The incorrect skill will be applied to similar future tasks, amplifying the error. The current version does not offer an automatic mechanism for detecting or removing defective skills. This requires manual oversight.
How many tasks before seeing the 6x savings?
The arXiv paper does not give an exact figure, but user feedback suggests that significant savings appear between 15 and 30 tasks in a similar domain. Below 10 tasks, the overhead of the 5-layer memory can even make GenericAgent more expensive than a traditional agent.
Is the framework production-ready?
Not on its own. The 3K-line repo is an extremely promising proof-of-concept, but it lacks the monitoring, debugging, and skill lifecycle management tools necessary for a production deployment. You will need to build these layers around it, or wait for the community to develop them.
✅ Conclusion
GenericAgent is the most convincing proof to date that an AI agent can learn by doing rather than by being programmed. In 3,000 lines of code, the framework solves a problem the industry was treating by always adding more complexity: how to make an agent improve organically without human intervention. The GitHub repo deserves its 6,700 stars — not because it is finished, but because the seed is the right one. To explore the landscape of self-evolving agents more broadly, check out our guide to the best autonomous AI agents.