ByteDance's DeerFlow: the open-source agent that researches, codes, and creates over the long term
🔎 The agent that doesn't stop at the first answer
On February 28, 2026, a ByteDance GitHub repo shot to the top of the global trending charts. DeerFlow 2.0 now has over 65,000 stars, a count rarely reached by an open-source project in agentic AI.
The reason is simple: DeerFlow doesn't just answer a question. It orchestrates complex tasks that unfold over hours, or even days. In-depth research, sandboxed code execution, file writing, spawning specialized sub-agents — all with memory that persists between sessions.
This is the shift from "one-shot" agents to "long-horizon" agents. And it's open-source.
The essentials
- DeerFlow is a super agent harness: a main agent that orchestrates sub-agents, each with its own context and dedicated tools.
- Each task executes in an isolated container (sandbox), with a complete filesystem — no risk of breaking your machine.
- Long-term memory is persisted locally, with automatic fact deduplication and an evolving user profile.
- Skills are defined in Markdown (.skill files), infinitely extensible, and loaded on demand.
- DeerFlow 2.0 is a complete rewrite that shares no code with version 1. It's a production framework, not a POC.
- It is model-agnostic: it works with any OpenAI-compatible model, from frontier LLMs to local models.
Recommended tools
| Tool | Main use | Price (June 2025, check official site) | Ideal for |
|---|---|---|---|
| DeerFlow | Open-source super agent | Free (MIT) | Long-horizon complex tasks |
| Claude Code | Agentic IDE | From $20/month | Agent-assisted development |
| Hermes Agent | Agent with delegation and checkpoints | Free (open-source) | Agentic pipelines with resumption |
| Perplexity | AI search | Free / $20/month Pro | Fast factual research |
| Cursor | AI code editor | $20/month | Daily coding with context |
Architecture: a harness, not just a simple agent
DeerFlow doesn't present itself as just another agent. It's a harness — a hosting structure that coordinates multiple autonomous components.
The main agent receives a task. It breaks it down, decides which sub-agents to mobilize, which tools to load, and supervises the execution. Each sub-agent operates in a scoped context: it only sees what it needs.
This architecture differs fundamentally from earlier autonomous agents like AutoGPT, which tended to fall into chaotic loops for lack of fine-grained orchestration. DeerFlow brings the structure that AutoGPT never had.
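To make the idea of a scoped context concrete, here is a minimal sketch. It is not DeerFlow's code: the `Harness` and `SubTask` names, and the notion of a sub-agent declaring the keys it needs, are assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SubTask:
    name: str
    needs: list[str]  # keys of the shared state this sub-agent may see (hypothetical)


@dataclass
class Harness:
    state: dict[str, str] = field(default_factory=dict)

    def scoped_context(self, task: SubTask) -> dict[str, str]:
        # A sub-agent receives only the keys it declared, nothing else.
        return {k: self.state[k] for k in task.needs if k in self.state}

    def run(self, tasks: list[SubTask]) -> dict[str, dict[str, str]]:
        # Stand-in for dispatch: return the context each sub-agent would get.
        return {t.name: self.scoped_context(t) for t in tasks}


harness = Harness(state={"goal": "report on transformers", "repo": "/work/src"})
plan = [SubTask("researcher", ["goal"]), SubTask("coder", ["goal", "repo"])]
contexts = harness.run(plan)
print(contexts["researcher"])
```

The researcher never sees the repository path, which is the point of scoping: less context to pay for, and less room for one sub-agent to interfere with another.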
The Skills system
Skills are the extensible core of DeerFlow. Each skill is a Markdown file (.skill) that describes a workflow: what to do, which tools to use, how to structure the output.
Loading is progressive. DeerFlow only loads into memory the skills relevant to the current task. You can add a skill simply by dropping a file — no need to modify the source code.
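Progressive loading can be sketched in a few lines. This is an illustration under stated assumptions, not DeerFlow's loader: the `index_skills` and `load_relevant` helpers, the `*.md` file pattern, and the naive "skill name appears in the task" relevance test are all hypothetical.

```python
import tempfile
from pathlib import Path


def index_skills(skill_dir: Path, pattern: str = "*.md") -> dict[str, Path]:
    # Record only names and paths up front; file bodies stay on disk.
    return {p.stem: p for p in sorted(skill_dir.glob(pattern))}


def load_relevant(index: dict[str, Path], task: str) -> dict[str, str]:
    # Naive relevance test (assumption): the skill name appears in the task text.
    text = task.lower()
    return {
        name: path.read_text(encoding="utf-8")
        for name, path in index.items()
        if name in text
    }


with tempfile.TemporaryDirectory() as tmp:
    skill_dir = Path(tmp)
    # Adding a skill is just dropping a file into the directory.
    (skill_dir / "research.md").write_text("# Research\nSearch, read, cite.", encoding="utf-8")
    (skill_dir / "slides.md").write_text("# Slides\nOutline, then render.", encoding="utf-8")
    index = index_skills(skill_dir)
    loaded = load_relevant(index, "Do research on vector databases")

print(sorted(loaded))
```

Both skills are indexed, but only the one matching the task is read into memory. A real harness would presumably use a richer relevance signal than substring matching.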
A claude-code skill even allows you to interact directly with Claude Code from within DeerFlow. It's an integration, not a competition.
Parallel sub-agents
When a task breaks down into independent subtasks, DeerFlow spawns them in parallel. Each sub-agent has its own context, its own sandbox, and its own tools.
A sub-agent researching the Gemini 3.1 Pro documentation doesn't need to know that another sub-agent is writing unit tests. The separation is strict.
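The fan-out pattern described here can be sketched with `asyncio`. The `sub_agent` coroutine is a hypothetical stand-in: the sleep replaces real model and tool calls, and nothing below reflects DeerFlow's internal scheduling.

```python
import asyncio


async def sub_agent(name: str, delay: float) -> str:
    # Each coroutine keeps its own local state; nothing is shared between them.
    await asyncio.sleep(delay)  # stands in for model and tool calls
    return f"{name}: done"


async def orchestrate() -> list[str]:
    # Independent subtasks run concurrently; results come back in call order.
    return await asyncio.gather(
        sub_agent("docs-research", 0.02),
        sub_agent("unit-tests", 0.01),
    )


results = asyncio.run(orchestrate())
print(results)
```

Total wall time is roughly that of the slowest sub-agent rather than the sum of both, which is what makes parallel spawning worthwhile on long tasks.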
Sandbox: executing without risk
Every task in DeerFlow executes in an isolated environment via an AioSandboxProvider. Concretely: a container with a complete filesystem.
You ask DeerFlow to scrape a site, analyze the data, generate a Python script, and execute it? Everything happens inside the container. Nothing touches your host system.
This is a decisive advantage over agents that execute directly in your terminal. Hermes Agent also offers code execution with checkpoints, but DeerFlow's containerized approach goes further in terms of isolation.
The sandbox filesystem is persistent for the duration of the task. A sub-agent can write a file, and another can read it — as long as they share the same task context.
Why this is critical for long-horizon tasks
A task lasting 4 hours will generate dozens of intermediate files, logs, and artifacts. Without a sandbox, it's chaos on your disk. With one, everything remains contained and cleanable in one go.
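The "contained and cleanable in one go" property can be illustrated with a per-task workspace. This sketch uses a temporary directory as a stand-in for the container filesystem; it gives none of the isolation a real sandbox provides, and the `run_task` helper is hypothetical.

```python
import tempfile
from pathlib import Path


def run_task() -> tuple[str, Path]:
    with tempfile.TemporaryDirectory(prefix="task-") as workspace:
        root = Path(workspace)
        # Sub-agent A writes an intermediate artifact.
        (root / "links.txt").write_text("https://example.com\n", encoding="utf-8")
        # Sub-agent B, working on the same task, reads it back.
        seen = (root / "links.txt").read_text(encoding="utf-8").strip()
    # Leaving the block deletes every artifact of the task at once.
    return seen, root


seen, root = run_task()
print(seen, root.exists())
```

The artifacts live exactly as long as the task does, and nothing is left on the host afterwards.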
Long-term memory: the real differentiator
Most agents forget everything at the end of the conversation. DeerFlow maintains a persistent user profile locally, stored on your machine.
This profile evolves over sessions. If you tell DeerFlow that you prefer TypeScript over Python, this preference is recorded and reused. No system prompt to repeat at each session.
Automatic fact deduplication keeps the profile from filling up with redundant entries. If three consecutive sessions learn that you work for a SaaS startup, DeerFlow stores the information only once.
This is similar to the approach used for long-term memory in AI avatars, but applied to a production agent rather than a conversational chatbot.
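A rough sketch of fact deduplication, assuming facts are plain strings compared after normalization. The `FactStore` class is hypothetical, and DeerFlow's actual method is not described here; matching paraphrases would need something stronger, such as semantic similarity.

```python
import re


def normalize(fact: str) -> str:
    # Lowercase, drop punctuation, collapse whitespace.
    return " ".join(re.sub(r"[^\w\s]", "", fact.lower()).split())


class FactStore:
    def __init__(self) -> None:
        self._facts: dict[str, str] = {}

    def add(self, fact: str) -> bool:
        # Returns True only when the fact was not already known.
        key = normalize(fact)
        if key in self._facts:
            return False
        self._facts[key] = fact
        return True

    def all(self) -> list[str]:
        return list(self._facts.values())


store = FactStore()
added = [
    store.add(f)
    for f in [
        "Works for a SaaS startup.",
        "works for a SaaS startup",
        "Prefers TypeScript over Python",
    ]
]
print(added, store.all())
```

The second statement differs only in case and punctuation, so it is recognized as a duplicate and the store ends up with two facts, not three.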
Context Engineering
DeerFlow doesn't just store facts. It practices context engineering: selecting and organizing the relevant information for each task, based on the current context.
If you launch a research task, DeerFlow draws from memory the elements related to your area of expertise. If it's a coding task, it pulls up your technical preferences. Memory isn't a raw dump — it's a structured knowledge base.
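Selecting memory by task type might look like the sketch below. The tag scheme, the `MEMORY` entries, and the `select_context` helper are illustrative assumptions, not DeerFlow's data model.

```python
# Hypothetical memory entries, each tagged with the task types it is relevant to.
MEMORY = [
    {"tags": {"coding"}, "fact": "Prefers TypeScript over Python"},
    {"tags": {"research"}, "fact": "Area of expertise: NLP"},
    {"tags": {"coding", "research"}, "fact": "Works for a SaaS startup"},
]


def select_context(task_type: str, memory: list[dict], limit: int = 5) -> list[str]:
    # Keep only facts tagged for this kind of task, capped to protect the context window.
    return [m["fact"] for m in memory if task_type in m["tags"]][:limit]


print(select_context("coding", MEMORY))
print(select_context("research", MEMORY))
```

A coding task gets the language preference but not the research domain, and vice versa, so the prompt carries only what the task can use.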
Compatible models: from GPT-5.5 to local models
DeerFlow is model-agnostic. It works with any model compatible with the OpenAI API. In practice, some models are better suited than others for agentic tasks.
For super agent orchestration, the agentic leaderboard gives a clear advantage to OpenAI's GPT-5.5 (agentic score: 98.2) and Gemini 3 Pro Deep Think (95.4). Claude Opus 4.7 Adaptive follows at 94.3.
For specialized sub-agents, you can go lower in the range. A sub-agent whose only job is to extract links from an HTML page doesn't need GPT-5.5. Claude Sonnet 4.6 (agentic score: 81.4) or even a local model via Ollama can be sufficient.
This flexibility has a direct impact on costs. You reserve premium models for orchestration and use lightweight models for subordinate tasks.
| Model | Agentic Score | Ideal role in DeerFlow |
|---|---|---|
| GPT-5.5 (OpenAI) | 98.2 | Main orchestrator |
| Gemini 3 Pro Deep Think | 95.4 | Complex research tasks |
| Claude Opus 4.7 Adaptive | 94.3 | Orchestration + code |
| GPT-5.4 Pro (OpenAI) | 91.8 | Code sub-agent |
| Kimi K2.6 (Self-host) | 88.1 | Local sub-agent |
| Claude Sonnet 4.6 | 81.4 | Simple tasks, extraction |
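The tiering in the table can be expressed as a simple role-to-model lookup. The strings below are placeholder labels derived from the table, not real API model identifiers, and the `pick_model` helper is hypothetical.

```python
# Placeholder labels (assumption): substitute the identifiers your provider expects.
ROLE_TO_MODEL = {
    "orchestrator": "gpt-5.5",
    "code": "gpt-5.4-pro",
    "extraction": "claude-sonnet-4.6",
}
DEFAULT_MODEL = "local-ollama-model"  # fallback for any role not listed above


def pick_model(role: str) -> str:
    # Premium model for orchestration, lighter ones for subordinate work.
    return ROLE_TO_MODEL.get(role, DEFAULT_MODEL)


print(pick_model("orchestrator"), pick_model("file-sorting"))
```

Keeping the mapping in one place makes it easy to downgrade a role when costs climb without touching the rest of the setup.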
Concrete use cases
Multi-day in-depth research
You ask: "Analyze all the publications from Stanford University on transformers since 2024, synthesize the findings, and produce a report with reproducible code."
DeerFlow will: launch sub-agents to browse the archives, download the PDFs, analyze them, cross-reference the results, write the report in LaTeX, generate the graphs in Python in the sandbox, and put it all together. It can take hours. DeerFlow handles it.
This is where the difference from Perplexity and other AI search engines is most visible. Perplexity gives an answer in 30 seconds. DeerFlow works for hours and produces a complete deliverable.
Full project development
Give it a spec sheet. DeerFlow can: scaffold the project, write the code, run the tests in the sandbox, fix the errors, iterate. All with memory of your coding conventions.
The Claude Code integration via skill even allows you to switch to an agentic IDE for the refinement phases. The two ecosystems communicate.
Automated tech watch
Set up a recurring task: DeerFlow monitors GitHub repos, arXiv, and tech blogs. It synthesizes the news into a weekly report, stores the trends in long-term memory, and alerts you on topics relevant to your stack.
DeerFlow vs AutoGPT: the end of chaos
AutoGPT (2023) was a promise: an autonomous agent that accomplishes tasks alone. The reality: infinite loops, uncontrolled API calls, exploding costs, zero structuring.
DeerFlow learned from these failures. The comparison is telling.
| Criterion | AutoGPT | DeerFlow 2.0 |
|---|---|---|
| Architecture | Monolithic loop | Harness with sub-agents |
| Isolation | None (direct execution) | Containerized sandbox |
| Memory | Conversation context only | Persistent, deduplicated |
| Extensibility | Unreliable plugins | Markdown skills |
| Observability | Basic logs | LangSmith / Langfuse tracing |
| Production-ready | No | Yes |
AutoGPT was a proof of concept. DeerFlow is a production framework. The difference isn't incremental — it's a category shift.
DeerFlow vs Claude Code vs Hermes Agent
The agentic ecosystem of 2026 is rich, and each tool has its place. The question isn't "which one is the best" but "what problem does it solve".
Claude Code excels in interactive development. You are in your editor, Claude Code understands your codebase, suggests modifications, applies them. It's a proximity agent — it works alongside you, in real time.
Hermes Agent shines in pipelines with delegation and checkpoints. You define an agent workflow, each step is checkpointed, resumable in case of failure. It's orchestration with a safety net.
DeerFlow targets long-horizon work: tasks that outlast a single session and require research, code, and synthesis over hours. Claude Code doesn't do that. Hermes Agent does it partially. DeerFlow is designed for it.
The choice between these approaches echoes the broader question of RAG vs fine-tuning vs agents: there is no universal solution, only the right approach for the right problem.
MCP Server and integrations
DeerFlow supports the MCP (Model Context Protocol). Concretely, you can connect external tools without modifying DeerFlow's code — just plug in an MCP server.
This opens up many possibilities: connections to databases, enterprise APIs, monitoring tools, and remote file systems. The super agent can then interact with your existing infrastructure.
MCP support is still in active development, but it's clearly the direction the agentic ecosystem is taking in 2026. Agents that don't support MCP risk becoming islands.
Observability: LangSmith and Langfuse
An agent that runs for 4 hours without you knowing what's happening is a nightmare. DeerFlow natively integrates tracing via LangSmith and Langfuse.
You can follow in real time: which sub-agent was spawned, what tools it called, how many tokens were consumed, where the time was spent. If something is blocking, you see it immediately.
This is a point often underestimated in agent comparisons, but in production, it's what makes the difference between a usable tool and a toy.
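To show what such a trace records, here is a generic span recorder. It deliberately does not use the LangSmith or Langfuse APIs; the `span` context manager and the record fields are assumptions for illustration only.

```python
import time
from contextlib import contextmanager

SPANS: list[dict] = []


@contextmanager
def span(agent: str, tool: str):
    # One record per tool call: who ran what, for how long, with how many tokens.
    record = {"agent": agent, "tool": tool, "tokens": 0}
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["seconds"] = time.perf_counter() - start
        SPANS.append(record)


with span("researcher", "web_search") as s:
    s["tokens"] += 1200  # made-up token count for the example
with span("coder", "run_tests") as s:
    s["tokens"] += 300

total_tokens = sum(r["tokens"] for r in SPANS)
print(total_tokens, [r["agent"] for r in SPANS])
```

A hosted tracing backend adds storage, nesting, and a UI on top, but the underlying data is this kind of per-call record.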
Hosting and infrastructure
DeerFlow runs locally. You clone the repo, configure your API keys, and launch it. No proprietary cloud, no lock-in.
For long-horizon tasks, you need a machine that stays on. A VPS does the job. Hostinger offers instances starting from a few euros per month, sufficient to run DeerFlow with lightweight sub-agents.
If you want to keep everything local (models included), you can pair DeerFlow with Ollama and local open-source models. Performance will be lower than with GPT-5.5, but your data never leaves your machine.
❌ Common mistakes
Mistake 1: Using DeerFlow for simple tasks
Launching DeerFlow to summarize a 500-word article is using a jackhammer to drive a nail. The orchestration overhead (spawning sub-agents, initializing the sandbox, loading skills) is counter-productive. Use Perplexity or a classic chat.
Mistake 2: Putting GPT-5.5 everywhere
GPT-5.5 is expensive and slow compared to lightweight models. If you put it on every sub-agent, your costs explode. Reserve it for the orchestrator. For text extraction or file sorting, Claude Sonnet 4.6 or a local model is sufficient.
Mistake 3: Ignoring the sandbox
Disabling the sandbox to "go faster" is a mistake. Without isolation, a sub-agent that writes a harmful script (not intentionally, but through hallucination) can damage your system. The sandbox is not optional in production.
Mistake 4: Not configuring memory
DeerFlow's long-term memory is powerful, but it requires a minimum of configuration. If you don't give it an initial context (your preferences, your stack, your goals), the profile remains empty and the agent personalizes nothing. Invest 10 minutes in the initial profile.
Mistake 5: Wrongly comparing it to chatbots
DeerFlow is not a chatbot. It's not Claude, it's not ChatGPT. If you evaluate it on the quality of its one-shot responses, you're missing the point. It must be evaluated on its ability to carry a complex project from start to finish.
❓ Frequently asked questions
Does DeerFlow replace Claude Code?
No. Claude Code is an interactive development agent in your editor. DeerFlow is a long-horizon task orchestrator. They are complementary — DeerFlow even integrates Claude Code via a skill.
Can DeerFlow be used without an internet connection?
Partially. If you couple DeerFlow with local models via Ollama, the inference is done locally. But web research tasks require a connection. The sandbox works entirely locally.
What is the difference between v1 and v2?
Everything. DeerFlow 2.0 is a complete rewrite that shares no code with v1. The sub-agent architecture, the sandbox, the Markdown skills, and the persistent memory are all new. v1 was experimental; v2 is built for production.
How much does a DeerFlow task cost?
It depends entirely on the models used and the duration. A 2-hour task with GPT-5.5 as orchestrator and Claude Sonnet 4.6 as sub-agents can cost between $2 and $15 in tokens. With local models, it's free but slower.
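The arithmetic behind such an estimate is simple: tokens consumed per model, times that model's price per million tokens. The token counts and rates below are placeholder values chosen only to show the calculation, not measured usage or published prices.

```python
def task_cost(tokens_by_role: dict[str, int], usd_per_million: dict[str, float]) -> float:
    # Sum over roles: (tokens / 1e6) * price per million tokens.
    return sum(
        tokens / 1_000_000 * usd_per_million[role]
        for role, tokens in tokens_by_role.items()
    )


# Placeholder numbers (assumption): replace with your own traces and provider pricing.
usage = {"orchestrator": 400_000, "sub_agents": 3_000_000}
rates = {"orchestrator": 10.0, "sub_agents": 1.0}

cost = task_cost(usage, rates)
print(f"${cost:.2f}")
```

With these made-up inputs, sub-agents consume most of the tokens while the orchestrator accounts for most of the spend, which is why model choice per role matters.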
Is DeerFlow suitable for enterprises?
The framework is. But the locally stored memory and the lack of native access control require encapsulation work for serious enterprise use. ByteDance uses it internally, but their setup is obviously more sophisticated than the open-source repo.
✅ Conclusion
DeerFlow marks the transition from toy agents to production agents: structured orchestration, isolated sandbox, persistent memory, extensible skills. It's the framework AutoGPT should have been.
If you have complex tasks that go beyond the scope of a single conversation — multi-source research, project development, automated monitoring — DeerFlow deserves your attention. And to understand how to integrate it into a broader AI strategy, see our guide on how to create an AI agent.