ByteDance's DeerFlow: the open-source agent that researches, codes, and creates over the long term
🔎 Why the AI world just shifted toward long-horizon agents
Since late 2024, the AI agent ecosystem has resembled an army of soldiers perfect for 5-minute missions. They draft an email, fix a bug, summarize a PDF. But as soon as you entrust them with a project spanning several hours or days, everything falls apart.
The problem is no longer the reasoning capabilities of the models. GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro reason well enough to plan a complex task. The problem is orchestration: memory, state persistence, parallel sub-task management, and above all, the ability to resume work after an interruption.
This is exactly the gap ByteDance has just filled with DeerFlow. In early 2025, the Chinese giant open-sourced a framework designed from the ground up for long-term missions. The project surpassed the 65,000-star mark on GitHub in a matter of weeks, a strong signal that goes beyond a mere fad.
DeerFlow does not seek to replace Claude Code or AutoGPT on their own turf. It changes the very definition of what an agent is capable of enduring.
The essentials
- DeerFlow is an open-source "long-horizon SuperAgent harness" developed by ByteDance, designed to execute complex tasks spanning hours or days.
- Its architecture rests on four pillars: isolated sandboxes, persistent memory, a delegated sub-agent system, and a modular skill library.
- Unlike single-task agents that lose the thread after a few iterations, DeerFlow maintains a structured context via checkpoints and a memory graph.
- The project positions itself as a model-agnostic orchestration layer, working with GPT-4o, Claude, or open-source models.
Recommended tools
| Tool | Main use | Price / license (verify on each project's site) | Ideal for |
|---|---|---|---|
| DeerFlow | Long-horizon agent | Open-source (Apache 2.0) | Complex projects over several days |
| AutoGPT | General-purpose autonomous agent | Open-source | Rapid prototyping of short tasks |
| CrewAI | Multi-agent orchestration | Open-source / Enterprise | Collaborative workflows between agents |
| Claude Code | Development agent | Claude Pro/Max subscription | Interactive software development |
DeerFlow Architecture — An engine built to last
DeerFlow is not an agent. It is a harness that transforms any LLM into an agent capable of managing long-term projects. The distinction is fundamental.
A classic agent like AutoGPT encapsulates model, prompts, and tools in a single loop. DeerFlow separates these layers with an architectural rigor reminiscent of microservices.
The sandbox system
Each complex task is executed in an isolated sandbox. This is not just a simple Docker container: DeerFlow manages environments with their own state, a persistent file system, and dedicated environment variables.
If a sub-agent needs to install Python dependencies, compile code, or execute scripts, it does so without risking the corruption of another sub-agent's context. This approach solves the classic problem of agents self-sabotaging by modifying shared files.
The sandbox also includes a snapshot system. At each critical step, the complete state of the environment is saved. If the agent fails or drifts, it can revert to a previous checkpoint instead of starting from scratch. Specialized projects offer a similar mechanism (see Code Execution and Checkpoints in Hermes Agent), but in DeerFlow it is native to the entire pipeline.
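To make the snapshot idea concrete, here is a minimal sketch, assuming a sandbox is simply a private working directory with its own environment variables, and that a "snapshot" is a full copy of that directory. The `Sandbox` class and its `snapshot`/`restore` methods are hypothetical illustrations, not DeerFlow's API.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


class Sandbox:
    """Hypothetical isolated workspace: private directory, private env vars, copy-based snapshots."""

    def __init__(self, name: str, env: dict[str, str] | None = None) -> None:
        self.root = Path(tempfile.mkdtemp(prefix=f"sandbox-{name}-"))
        self.workdir = self.root / "work"
        self.workdir.mkdir()
        self.env = dict(env or {})  # own copy, so no other sandbox can mutate it
        self._snapshots: dict[str, tuple[Path, dict[str, str]]] = {}

    def write(self, relpath: str, content: str) -> None:
        path = self.workdir / relpath
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)

    def read(self, relpath: str) -> str:
        return (self.workdir / relpath).read_text()

    def snapshot(self, label: str) -> None:
        """Copy the whole working directory and environment at a critical step."""
        dest = self.root / f"snapshot-{label}"
        if dest.exists():
            shutil.rmtree(dest)
        shutil.copytree(self.workdir, dest)
        self._snapshots[label] = (dest, dict(self.env))

    def restore(self, label: str) -> None:
        """Roll back to a saved checkpoint instead of restarting from scratch."""
        saved_dir, saved_env = self._snapshots[label]
        shutil.rmtree(self.workdir)
        shutil.copytree(saved_dir, self.workdir)
        self.env = dict(saved_env)

    def destroy(self) -> None:
        shutil.rmtree(self.root, ignore_errors=True)


sb = Sandbox("coder", env={"PYTHONPATH": "src"})
sb.write("app/main.py", "print('v1')\n")
sb.snapshot("step-1")
sb.write("app/main.py", "this is not valid python")  # the agent drifts
sb.restore("step-1")
print(sb.read("app/main.py"), end="")  # print('v1')
sb.destroy()
```

Full directory copies keep the sketch easy to follow but get expensive for large workspaces; a production system would more plausibly rely on container or copy-on-write filesystem snapshots.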
Persistent and structured memory
Most agents manage memory via simple context windowing or basic RAG. DeerFlow implements a three-layer memory graph.
The first layer is working memory: the immediate conversational context, bounded by the model's context window as it is for any LLM. The second layer is episodic memory: a structured log of every action, decision, and intermediate result. The third layer is semantic memory: a vector space where the agent stores the knowledge acquired during the project.
Concretely, if DeerFlow spends 6 hours researching data on a financial sector, it does not lose this knowledge when it moves on to the drafting phase. The mechanism is similar to the custom architectures described in How to Give Your AI Avatar Long-Term Memory, but adapted to a production context rather than a conversational avatar.
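The three layers can be sketched as three different data structures. This is a minimal illustration, not DeerFlow's implementation: `ProjectMemory` is a hypothetical name, and the semantic layer uses toy bag-of-words vectors with cosine similarity as a stand-in for a real embedding model and vector store.

```python
from __future__ import annotations

import math
import time
from collections import Counter, deque


def _vectorize(text: str) -> Counter:
    """Toy stand-in for an embedding model: lower-cased bag-of-words counts."""
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ProjectMemory:
    """Hypothetical three-layer memory: working, episodic, semantic."""

    def __init__(self, working_size: int = 8) -> None:
        self.working: deque[str] = deque(maxlen=working_size)  # layer 1: recent context only
        self.episodic: list[dict] = []                         # layer 2: ordered action log
        self.semantic: list[tuple[str, Counter]] = []          # layer 3: durable knowledge

    def observe(self, message: str) -> None:
        self.working.append(message)  # the oldest messages silently fall out

    def log_action(self, agent: str, action: str, result: str) -> None:
        self.episodic.append({"time": time.time(), "agent": agent, "action": action, "result": result})

    def learn(self, fact: str) -> None:
        self.semantic.append((fact, _vectorize(fact)))

    def recall(self, query: str, k: int = 3) -> list[str]:
        """Return the k stored facts most similar to the query."""
        q = _vectorize(query)
        ranked = sorted(self.semantic, key=lambda item: _cosine(q, item[1]), reverse=True)
        return [fact for fact, _ in ranked[:k]]


mem = ProjectMemory(working_size=2)
for msg in ["start research", "found 12 reports", "switching to drafting"]:
    mem.observe(msg)
mem.log_action("researcher", "download", "12 annual reports")
mem.learn("Solar installations in Spain grew strongly")
mem.learn("The reviewer flagged two failing unit tests")
print(list(mem.working))               # ['found 12 reports', 'switching to drafting']
print(mem.recall("solar Spain", k=1))  # ['Solar installations in Spain grew strongly']
```

The point of the split is that the working layer forgets by design, while research findings survive in the semantic layer and can be pulled back in during the drafting phase.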
Sub-agents and delegation
DeerFlow breaks down complex tasks into a sub-task graph, then assigns each sub-task to a specialized agent. These sub-agents share the project's memory but operate in their respective sandboxes.
A "researcher" agent can browse the web and compile data. A "coder" agent can implement features based on this data. A "reviewer" agent can audit the produced code. Coordination is handled by an "orchestrator" agent that manages dependencies and data flow.
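The orchestrator's job of routing sub-tasks to roles and passing results downstream can be sketched with Python's standard `graphlib`. Everything here (`SubTask`, `run_plan`, the role names as dictionary keys, the lambda "agents") is a hypothetical illustration under the assumption that each agent is a function taking its instruction plus its dependencies' outputs; it is not DeerFlow's internal API.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable


@dataclass
class SubTask:
    task_id: str
    role: str                    # "researcher", "coder", "reviewer", ...
    instruction: str
    depends_on: list[str] = field(default_factory=list)


# Hypothetical agent signature: (instruction, outputs of dependencies) -> result.
Agent = Callable[[str, dict[str, str]], str]


def run_plan(tasks: list[SubTask], agents: dict[str, Agent]) -> dict[str, str]:
    """Execute sub-tasks in dependency order, routing each one to the agent for its role."""
    by_id = {t.task_id: t for t in tasks}
    graph = {t.task_id: set(t.depends_on) for t in tasks}
    outputs: dict[str, str] = {}
    for task_id in TopologicalSorter(graph).static_order():
        task = by_id[task_id]
        upstream = {dep: outputs[dep] for dep in task.depends_on}
        outputs[task_id] = agents[task.role](task.instruction, upstream)
    return outputs


agents: dict[str, Agent] = {
    "researcher": lambda instr, up: f"data for: {instr}",
    "coder": lambda instr, up: f"code using [{up['research']}]",
    "reviewer": lambda instr, up: f"review of [{up['implement']}]",
}
plan = [
    SubTask("review", "reviewer", "audit the code", ["implement"]),
    SubTask("research", "researcher", "collect pricing data"),
    SubTask("implement", "coder", "build pricing module", ["research"]),
]
print(run_plan(plan, agents)["review"])
# review of [code using [data for: collect pricing data]]
```

The plan is deliberately listed out of order: the dependency graph, not the list order, decides what runs first.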
This delegation architecture is not new in itself. But where frameworks like CrewAI offer it as an option, DeerFlow makes it mandatory for any task above a certain complexity threshold. Hermes Agent's delegation and sub-agent system explores similar mechanisms, but with a focus aimed more squarely at developers.
Concrete use cases — What DeerFlow actually does
The promise of a long-term agent remains abstract without tangible examples. Here are three scenarios where DeerFlow shows a decisive advantage.
In-depth financial research
An analyst asking a classic agent to "analyze the European solar energy sector" will get a generic summary based on the model's training data, possibly enriched with a few superficial web searches.
With DeerFlow, the same request triggers a process that can last several hours. One sub-agent maps the companies in the sector. Another downloads and analyzes annual financial reports. A third cross-references this data with country-specific regulations. The final result is a structured report with verifiable sources.
This is the same kind of approach found in specialized agents like Dexter (see Dexter: An Autonomous AI Agent for Deep Financial Research), but DeerFlow extends it to any domain through its modular skill system.
Multi-file software development
Claude Code excels at modifying a file or implementing a well-scoped feature. But asking it to "build a complete SaaS application with authentication, database, frontend, and API" quickly runs into its context limits.
DeerFlow tackles this type of project by breaking it down into dozens of sub-tasks: initial architecture, database schema, API endpoints, frontend components, unit tests, documentation. Each sub-task is assigned to a specialized agent in a dedicated sandbox. The orchestrator manages dependencies and validates that each piece fits together correctly.
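One way to see why a dependency graph helps is to group the SaaS sub-tasks into "waves" that could run in parallel sandboxes. This sketch uses the standard library's `graphlib.TopologicalSorter` (`prepare`, `get_ready`, `done`); the task names and the simplification of waiting for a whole wave before starting the next are assumptions for illustration, not how DeerFlow necessarily schedules work.

```python
from graphlib import TopologicalSorter


def execution_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group sub-tasks into waves: each task's dependencies all sit in earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the plan is inconsistent
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # Simplification: the whole wave finishes before the next starts.
        # A real scheduler would call done() as each sandboxed agent completes.
        sorter.done(*ready)
    return waves


saas_plan = {
    "architecture": set(),
    "db_schema": {"architecture"},
    "frontend": {"architecture"},
    "api_endpoints": {"db_schema"},
    "unit_tests": {"api_endpoints", "frontend"},
    "docs": {"api_endpoints", "frontend"},
}
for i, wave in enumerate(execution_waves(saas_plan), start=1):
    print(i, wave)
# 1 ['architecture']
# 2 ['db_schema', 'frontend']
# 3 ['api_endpoints']
# 4 ['docs', 'unit_tests']
```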
Large-scale content creation
A content creation project that requires researching 50 sources, synthesizing them into a detailed outline, and then drafting 20 interconnected articles cannot be managed by a classic agent. DeerFlow can orchestrate this workflow over several days, with checkpoints allowing for editorial direction validation midway through.
DeerFlow vs the existing ecosystem — Where does it really stand
The AI agent market is crowded. Positioning DeerFlow requires being precise about what it doesn't do as much as what it does.
DeerFlow vs AutoGPT
AutoGPT was the pioneer in 2023. Its approach: a single-loop agent with tool access, launched and left to its own devices. The results were often disappointing: the agent lost the thread and got stuck repeating the same actions in endless loops.
DeerFlow learns from these failures. Where AutoGPT is an agent that tries to do everything alone, DeerFlow is an orchestrator that delegates intelligently. AutoGPT handles failure poorly: when an action fails, it retries with similar variations. DeerFlow isolates the failure in its sandbox and escalates the problem to the orchestrator, which can restructure the plan.
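The contrast between blind retries and escalation can be sketched as follows. This is a hedged illustration of the pattern described above, not DeerFlow code: `run_step`, `orchestrate`, `EscalationNeeded`, and the `replan` callback are hypothetical names. A step gets a bounded number of attempts; after that, the failure is handed to the orchestrator, which rewrites the remaining plan.

```python
from typing import Callable


class EscalationNeeded(Exception):
    """Raised when a step keeps failing and the plan itself should change."""

    def __init__(self, step: str, errors: list[str]) -> None:
        super().__init__(f"step {step!r} failed {len(errors)} times")
        self.step = step
        self.errors = errors


def run_step(step: str, action: Callable[[], str], max_attempts: int = 2) -> str:
    """Retry a step a bounded number of times, then escalate instead of looping."""
    errors: list[str] = []
    for _ in range(max_attempts):
        try:
            return action()
        except Exception as exc:  # the failure is recorded, not propagated raw
            errors.append(repr(exc))
    raise EscalationNeeded(step, errors)


def orchestrate(
    plan: list[str],
    actions: dict[str, Callable[[], str]],
    replan: Callable[[str, list[str]], list[str]],
    max_replans: int = 3,
) -> list[str]:
    results: list[str] = []
    queue = list(plan)
    replans = 0
    while queue:
        step = queue.pop(0)
        try:
            results.append(run_step(step, actions[step]))
        except EscalationNeeded as esc:
            if replans >= max_replans:
                raise
            replans += 1
            queue = replan(esc.step, queue)  # restructure the remaining plan
    return results


def flaky_scrape() -> str:
    raise TimeoutError("site unreachable")


actions = {
    "scrape_site": flaky_scrape,
    "use_cached_dataset": lambda: "loaded cached data",
    "write_report": lambda: "report drafted",
}


def replan(failed: str, remaining: list[str]) -> list[str]:
    # Hypothetical rule: replace a failed scrape with a cached-data fallback.
    return ["use_cached_dataset"] + remaining if failed == "scrape_site" else remaining


print(orchestrate(["scrape_site", "write_report"], actions, replan))
# ['loaded cached data', 'report drafted']
```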
The fundamental difference is the granularity of control. AutoGPT is an autonomous agent. DeerFlow is a supervised multi-agent system.
DeerFlow vs Claude Code
Anthropic's Claude Code is remarkable for interactive software development. It reads your codebase, understands the context, and proposes precise modifications. But it remains a synchronous agent: you give it an instruction, it executes, you validate.
DeerFlow operates in asynchronous and autonomous mode. You give it an objective, and it works for hours without intervention. This difference isn't just a matter of convenience. It opens up entire categories of tasks that are simply impossible in synchronous mode.
However, for day-to-day interactive development, Claude Code is probably still more ergonomic. DeerFlow is a project tool, not a session assistant.
DeerFlow vs CrewAI and LangGraph
CrewAI and LangGraph are mature multi-agent orchestration frameworks. They allow you to define teams of agents with roles and workflows. So what justifies DeerFlow's 65K stars?
The answer is native long-term integration. CrewAI and LangGraph give you the building blocks to construct a long-horizon system, but you have to design the memory, checkpoints, and persistence yourself. DeerFlow delivers all of that ready-to-use.
To use a metaphor: CrewAI is a box of LEGO bricks, LangGraph is a carpentry workshop, and DeerFlow is a furnished house. You can customize the house, but it is livable from day one.
Skills — The real technical differentiator
DeerFlow's skill system deserves special attention because that is where much of its operational power lies.
A "skill" in DeerFlow is an autonomous module that encapsulates specific expertise: in-depth web research, financial document analysis, code generation following a specific pattern, data extraction from an API, etc.
These skills are composable. The orchestrator can chain a "web_research" skill with a "data_extraction" skill, then a "report_generation" skill. The output of one skill feeds the input of the next, with the memory graph ensuring coherence.
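The chaining behavior can be sketched with a small registry and a loop that feeds each skill's output into the next. The `skill` decorator, the `SKILLS` registry, `run_chain`, and the placeholder skill bodies are all hypothetical; they mirror the skill names used above but do not reproduce DeerFlow's skill interface.

```python
from typing import Callable

Skill = Callable[[str], str]
SKILLS: dict[str, Skill] = {}  # hypothetical registry, not DeerFlow's


def skill(name: str) -> Callable[[Skill], Skill]:
    """Register a function under a skill name so the orchestrator can look it up."""
    def decorator(fn: Skill) -> Skill:
        SKILLS[name] = fn
        return fn
    return decorator


@skill("web_research")
def web_research(query: str) -> str:
    return f"sources about {query}"  # placeholder for real search calls


@skill("data_extraction")
def data_extraction(sources: str) -> str:
    return f"table extracted from ({sources})"


@skill("report_generation")
def report_generation(table: str) -> str:
    return f"report based on {table}"


def run_chain(names: list[str], initial_input: str, memory: dict[str, str]) -> str:
    """Run skills in sequence; each output becomes the next input and is stored in memory."""
    value = initial_input
    for name in names:
        value = SKILLS[name](value)
        memory[name] = value  # intermediate results stay available to later steps
    return value


memory: dict[str, str] = {}
print(run_chain(["web_research", "data_extraction", "report_generation"], "EU solar market", memory))
# report based on table extracted from (sources about EU solar market)
```

Because skills share one signature, a community-written skill only has to register itself to become chainable, which is the property that makes an open registry practical.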
The system is open: developers can create their own skills and share them. ByteDance publishes an official registry of skills, but the framework is designed for the community to develop new ones. It's an ecosystem approach reminiscent of the plugin model, but with much deeper integration at the agent's reasoning level.
Limits and challenges — What DeerFlow doesn't solve yet
Despite its impressive architecture, DeerFlow is not a magic solution. Several structural challenges remain.
Computational cost
A project that mobilizes 5 sub-agents for 6 hours, with regular checkpoints and persistent vector memory, consumes a very large number of tokens. On proprietary models like GPT-4o or Claude 3.5 Sonnet, the bill can quickly become prohibitive for personal projects.
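A back-of-the-envelope estimate shows how quickly this adds up. Every number below is an assumption: the per-agent token volumes are illustrative guesses (input dominates because context is re-sent on each call), and the prices of $2.50 and $10 per million input/output tokens are assumed GPT-4o list prices that should be checked against the provider's current price list. The function ignores embedding calls and retries.

```python
def estimate_cost_usd(
    agents: int,
    hours: float,
    input_tokens_per_agent_hour: int,
    output_tokens_per_agent_hour: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Rough API cost of a multi-agent run (ignores embeddings, retries, and caching)."""
    input_tokens = agents * hours * input_tokens_per_agent_hour
    output_tokens = agents * hours * output_tokens_per_agent_hour
    return (input_tokens * input_price_per_million + output_tokens * output_price_per_million) / 1_000_000


# Assumed workload: 5 sub-agents for 6 hours, each reading ~400k and writing ~40k tokens per hour.
cost = estimate_cost_usd(5, 6, 400_000, 40_000, 2.50, 10.00)
print(f"${cost:.2f}")  # $42.00
```

In this assumed scenario, 30 of the 42 dollars come from input tokens, which is why trimming context and routing light roles to cheaper models matters more than shortening outputs.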
Running open-source models locally (Llama 3, Qwen 2.5) with serving tools like vLLM or Ollama reduces this cost, but at the price of a significant drop in reasoning capability on the most complex tasks.
Latency of long projects
A project that spans 8 hours is not instantaneous. DeerFlow manages this latency well in technical terms, but the user experience still needs refining. Receiving a report 8 hours after launching a request requires a change in habit compared to the immediacy expected from AI.
The intermediate validation problem
For very long projects, how do you ensure the agent does not gradually drift from the initial objective? DeerFlow implements human-in-the-loop checkpoint mechanisms, but placing and configuring them correctly requires expertise.
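The idea of a validation gate can be sketched as a pipeline that pauses after selected phases and stops if the reviewer rejects the intermediate result. `Phase`, `run_with_gates`, and the `approve` callback are hypothetical names; in a real deployment `approve` would be a human decision (a UI prompt or a message), not a lambda.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Phase:
    name: str
    run: Callable[[str], str]
    needs_approval: bool = False  # gate: pause for a human check after this phase


def run_with_gates(phases: list[Phase], objective: str, approve: Callable[[str, str], bool]) -> dict:
    """Run phases in order; halt at any gated phase the reviewer rejects."""
    state = objective
    completed: list[str] = []
    for phase in phases:
        state = phase.run(state)
        completed.append(phase.name)
        if phase.needs_approval and not approve(phase.name, state):
            return {"status": "halted", "at": phase.name, "completed": completed, "state": state}
    return {"status": "done", "completed": completed, "state": state}


phases = [
    Phase("research", lambda s: s + " | 50 sources"),
    Phase("outline", lambda s: s + " | outline v1", needs_approval=True),
    Phase("drafting", lambda s: s + " | 20 drafts"),
]
reviewer = lambda name, state: "off-topic" not in state  # stand-in for a human decision
result = run_with_gates(phases, "EU solar series", reviewer)
print(result["status"], result["completed"])
# done ['research', 'outline', 'drafting']
```

Placing the gate right after the outline means a wrong editorial direction costs one research phase, not days of drafting.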
❌ Common mistakes
Mistake 1: Confusing DeerFlow with an LLM
DeerFlow is not a language model. It's an orchestration framework that uses existing LLMs. Asking it to "generate text" without having configured an appropriate skill is like using a car engine without a body. The added value is in the orchestration, not in the generation.
Mistake 2: Launching DeerFlow without defining checkpoints
It is tempting to set an objective, launch the agent, and come back the next day. It is also the best way to discover 15 hours of wasted computation because the agent took a wrong turn in the second hour. Define human validation points, especially during your first runs.
Mistake 3: Using GPT-4o for every sub-agent
Not all sub-agents need the most expensive model. The researcher benefits from a strong general-purpose model for comprehension. The coder can use a code-specialized model. The reviewer can use a lighter model. Choosing the model per role is a critical optimization that DeerFlow allows, but it must be configured manually.
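A per-role model map might look like the sketch below. The provider keys, the `ModelChoice` type, and the `model_for` helper are hypothetical, and the model names are only examples of the tiering described above; DeerFlow's actual configuration format is not reproduced here, so check the repository's documentation for the real settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelChoice:
    provider: str  # hypothetical provider key, e.g. "openai" or "ollama"
    model: str


# Hypothetical role-to-model map: strong model for research, code model for coding,
# lighter model for review.
ROLE_MODELS = {
    "researcher": ModelChoice("openai", "gpt-4o"),
    "coder": ModelChoice("ollama", "qwen2.5-coder"),
    "reviewer": ModelChoice("openai", "gpt-4o-mini"),
}
DEFAULT_MODEL = ModelChoice("openai", "gpt-4o-mini")  # cheap fallback for unlisted roles


def model_for(role: str) -> ModelChoice:
    return ROLE_MODELS.get(role, DEFAULT_MODEL)


for role in ["researcher", "coder", "reviewer", "summarizer"]:
    choice = model_for(role)
    print(f"{role}: {choice.provider}/{choice.model}")
```

Defaulting unknown roles to the cheap model, rather than the expensive one, keeps a misconfigured plan from silently inflating the bill.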
Mistake 4: Ignoring sandbox size
Each sandbox consumes resources. Launching 10 sub-agents in parallel with full sandboxes can saturate your infrastructure. Start with 2-3 sub-agents and increase gradually while monitoring consumption.
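The "start with 2-3 sub-agents" advice maps naturally onto a concurrency limit. This sketch uses `asyncio.Semaphore` to cap how many hypothetical sub-agents run at once and records the observed peak; `run_sub_agent` and `run_all` are illustrative names, and `asyncio.sleep` stands in for real sandboxed work.

```python
import asyncio


async def run_sub_agent(name: str, limit: asyncio.Semaphore, stats: dict[str, int]) -> str:
    async with limit:  # waits here if max_parallel agents are already running
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        try:
            await asyncio.sleep(0.01)  # placeholder for the agent's work in its sandbox
            return f"{name} finished"
        finally:
            stats["active"] -= 1


async def run_all(names: list[str], max_parallel: int = 3) -> tuple[list[str], int]:
    limit = asyncio.Semaphore(max_parallel)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(*(run_sub_agent(n, limit, stats) for n in names))
    return list(results), stats["peak"]


results, peak = asyncio.run(run_all([f"agent-{i}" for i in range(10)], max_parallel=3))
print(len(results), peak)  # peak never exceeds max_parallel
```

Raising `max_parallel` step by step while watching CPU, memory, and disk usage is the code-level equivalent of the gradual ramp-up recommended above.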
❓ Frequently asked questions
Does DeerFlow replace Claude Code or Copilot?
No. Claude Code and Copilot are interactive development assistants for one-off tasks. DeerFlow is a long-term project orchestrator. The two are complementary: you can use Claude Code for an immediate fix and DeerFlow to build a complete project.
Can DeerFlow be used with local open-source models?
Yes. DeerFlow is model-agnostic thanks to a provider abstraction. You can configure sub-agents to use Llama 3 or Qwen 2.5 served by vLLM or Ollama. Expect weaker reasoning than GPT-4o on the most complex tasks.
What is the real cost of a DeerFlow project?
It depends entirely on the duration, the number of sub-agents, and the model used. A 4-hour project with 3 sub-agents on GPT-4o can cost between 10 and 50 dollars. The same project with local open-source models costs only the compute infrastructure.
Is DeerFlow suitable for enterprises?
The framework is open-source (Apache 2.0), which allows code auditing. However, secrets management, GDPR compliance for vector memory, and network isolation for sandboxes require rigorous enterprise configuration that is not provided out-of-the-box.
What is the difference between a skill and a sub-agent?
A skill is a competency module (research, code, analysis) that a sub-agent can use. A sub-agent is an autonomous instance with its own context, sandbox, and reasoning loop. A sub-agent can use multiple skills.
✅ Conclusion
DeerFlow marks an inflection point in agentic AI: the transition from the one-shot answering agent to the project-manager agent. Its architecture of sandboxes, persistent memory, and delegated sub-agents tackles the real problem of complex long-term tasks, which neither AutoGPT nor Claude Code addresses head-on. The project is still young, configuration requires expertise, and computational cost remains a barrier, but the foundations are solid. For anyone who has ever seen a classic agent collapse after 30 minutes of work, exploring the DeerFlow repository on GitHub is worth the investment.