DeerFlow from ByteDance: the open-source agent that researches, codes, and creates over the long term

Agents IA 🟢 Beginner ⏱️ 12 min read 📅 2026-05-09

ByteDance's DeerFlow: the open-source agent that researches, codes, and creates over the long term

🔎 The agent that doesn't stop at the first answer

On February 28, 2026, a ByteDance GitHub repo exploded to the top of the global trending charts. DeerFlow 2.0 now boasts over 65,000 stars, a score rarely reached by an open-source project in agentic AI.

The reason is simple: DeerFlow doesn't just answer a question. It orchestrates complex tasks that unfold over hours, or even days. In-depth research, sandboxed code execution, file writing, spawning specialized sub-agents — all with memory that persists between sessions.

This is the shift from "one-shot" agents to "long-horizon" agents. And it's open-source.

The essentials

DeerFlow is a super agent harness: a main agent that orchestrates sub-agents, each with its own context and dedicated tools.
Each task executes in an isolated container (sandbox), with a complete filesystem — no risk of breaking your machine.
Long-term memory is persisted locally, with automatic fact deduplication and an evolving user profile.
Skills are defined in Markdown (.skill files), infinitely extensible, and loaded on demand.
DeerFlow 2.0 is a complete rewrite that shares no code with version 1. It's a production framework, not a POC.
It is model-agnostic: compatible with any OpenAI-compatible model, from the best LLMs for AI agents to local models.

Recommended tools

Tool	Main use	Price (June 2025, check official site)	Ideal for
DeerFlow	Open-source super agent	Free (MIT)	Long-horizon complex tasks
Claude Code	Agentic IDE	From $20/month	Agent-assisted development
Hermes Agent	Agent with delegation and checkpoints	Free (open-source)	Agentic pipelines with resumption
Perplexity	AI search	Free / $20/month Pro	Fast factual research
Cursor	AI code editor	$20/month	Daily coding with context

Architecture: a harness, not just a simple agent

DeerFlow doesn't present itself as just another agent. It's a harness — a hosting structure that coordinates multiple autonomous components.

The main agent receives a task. It breaks it down, decides which sub-agents to mobilize, which tools to load, and supervises the execution. Each sub-agent operates in a scoped context: it only sees what it needs.

This architecture fundamentally differs from the best autonomous AI agents like AutoGPT, which launched into chaotic loops without fine orchestration. DeerFlow brings the structuring that AutoGPT never had.

The Skills system

Skills are the extensible core of DeerFlow. Each skill is a Markdown file (.skill) that describes a workflow: what to do, which tools to use, how to structure the output.

Loading is progressive. DeerFlow only loads into memory the skills relevant to the current task. You can add a skill simply by dropping a file — no need to modify the source code.

A claude-code skill even allows you to interact directly with Claude Code from within DeerFlow. It's an integration, not a competition.

Parallel sub-agents

When a task breaks down into independent subtasks, DeerFlow spawns them in parallel. Each sub-agent has its own context, its own sandbox, and its own tools.

A sub-agent researching the Gemini 3.1 Pro documentation doesn't need to know that another sub-agent is writing unit tests. The separation is strict.

Sandbox: executing without risk

Every task in DeerFlow executes in an isolated environment via an AioSandboxProvider. Concretely: a container with a complete filesystem.

You ask DeerFlow to scrape a site, analyze the data, generate a Python script, and execute it? Everything happens inside the container. Nothing touches your host system.

This is a decisive advantage over agents that execute directly in your terminal. Hermes Agent also offers code execution with checkpoints, but DeerFlow's containerized approach goes further in terms of isolation.

The sandbox filesystem is persistent for the duration of the task. A sub-agent can write a file, and another can read it — as long as they share the same task context.

Why this is critical for the long-horizon

A task lasting 4 hours will generate dozens of intermediate files, logs, and artifacts. Without a sandbox, it's chaos on your disk. With one, everything remains contained and cleanable in one go.

Long-term memory: the real differentiator

Most agents forget everything at the end of the conversation. DeerFlow maintains a persistent user profile locally, stored on your machine.

This profile evolves over sessions. If you tell DeerFlow that you prefer TypeScript over Python, this preference is recorded and reused. No system prompt to repeat at each session.

Automatic fact deduplication prevents redundant proliferation. If three consecutive sessions learn that you work for a SaaS startup, DeerFlow only stores the information once.

This is a similar approach to what long-term memory for an AI avatar allows, but applied to a production agent rather than a conversational chatbot.

Context Engineering

DeerFlow doesn't just store facts. It practices context engineering: selecting and organizing the relevant information for each task, based on the current context.

If you launch a research task, DeerFlow draws from memory the elements related to your area of expertise. If it's a coding task, it pulls up your technical preferences. Memory isn't a raw dump — it's a structured knowledge base.

Compatible models: from GPT-5.5 to local

DeerFlow is model-agnostic. It works with any model compatible with the OpenAI API. In practice, some models are better suited than others for agentic tasks.

For super agent orchestration, the agentic leaderboard gives a clear advantage to OpenAI's GPT-5.5 (agentic score: 98.2) and Gemini 3 Pro Deep Think (95.4). Claude Opus 4.7 Adaptive follows at 94.3.

For specialized sub-agents, you can go lower in the range. A sub-agent whose only job is to extract links from an HTML page doesn't need GPT-5.5. Claude Sonnet 4.6 (agentic score: 81.4) or even a local model via Ollama can be sufficient.

This flexibility has a direct impact on costs. You reserve premium models for orchestration and use lightweight models for subordinate tasks.

Model	Agentic Score	Ideal role in DeerFlow
GPT-5.5 (OpenAI)	98.2	Main orchestrator
Gemini 3 Pro Deep Think	95.4	Complex research tasks
Claude Opus 4.7 Adaptive	94.3	Orchestration + code
GPT-5.4 Pro (OpenAI)	91.8	Code sub-agent
Kimi K2.6 (Self-host)	88.1	Local sub-agent
Claude Sonnet 4.6	81.4	Simple tasks, extraction

Concrete use cases

Multi-day in-depth research

You ask: "Analyze all the publications from Stanford University on transformers since 2024, synthesize the findings, and produce a report with reproducible code."

DeerFlow will: launch sub-agents to browse the archives, download the PDFs, analyze them, cross-reference the results, write the report in LaTeX, generate the graphs in Python in the sandbox, and put it all together. It can take hours. DeerFlow handles it.

This is where the difference with Perplexity or AI search engines is glaring. Perplexity gives an answer in 30 seconds. DeerFlow works for hours and produces a complete deliverable.

Full project development

Give it a spec sheet. DeerFlow can: scaffold the project, write the code, run the tests in the sandbox, fix the errors, iterate. All with memory of your coding conventions.

The Claude Code integration via skill even allows you to switch to an agentic IDE for the refinement phases. The two ecosystems communicate.

Automated tech watch

Set up a recurring task: DeerFlow monitors GitHub repos, arXiv, and tech blogs. It synthesizes the news into a weekly report, stores the trends in long-term memory, and alerts you on topics relevant to your stack.

DeerFlow vs AutoGPT: the end of chaos

AutoGPT (2023) was a promise: an autonomous agent that accomplishes tasks alone. The reality: infinite loops, uncontrolled API calls, exploding costs, zero structuring.

DeerFlow learned from these failures. The comparison is telling.

Criterion	AutoGPT	DeerFlow 2.0
Architecture	Monolithic loop	Harness with sub-agents
Isolation	None (direct execution)	Containerized sandbox
Memory	Conversation context only	Persistent, deduplicated
Extensibility	Unreliable plugins	Markdown skills
Observability	Basic logs	LangSmith / Langfuse tracing
Production-ready	No	Yes

AutoGPT was a proof of concept. DeerFlow is a production framework. The difference isn't incremental — it's a category shift.

DeerFlow vs Claude Code vs Hermes Agent

The agentic ecosystem of 2026 is rich, and each tool has its place. The question isn't "which one is the best" but "what problem does it solve".

Claude Code excels in interactive development. You are in your editor, Claude Code understands your codebase, suggests modifications, applies them. It's a proximity agent — it works alongside you, in real time.

Hermes Agent shines in pipelines with delegation and checkpoints. You define an agent workflow, each step is checkpointed, resumable in case of failure. It's orchestration with a safety net.

DeerFlow targets the long-horizon. A task that overflows the session, that requires research, code, synthesis, over hours. Claude Code doesn't do that. Hermes Agent does it partially. DeerFlow is designed for it.

The choice between these approaches echoes the broader question of RAG vs fine-tuning vs agents: there is no universal solution, only the right approach for the right problem.

MCP Server and integrations

DeerFlow supports the MCP (Model Context Protocol). Concretely, you can connect external tools without modifying DeerFlow's code — just plug in an MCP server.

This opens up the field of possibilities: connection to databases, enterprise APIs, monitoring tools, remote file systems. The super agent can then interact with your existing infrastructure.

MCP support is still in active development, but it's clearly the direction the agentic ecosystem is taking in 2026. Agents that don't support MCP risk becoming islands.

Observability: LangSmith and Langfuse

An agent that runs for 4 hours without you knowing what's happening is a nightmare. DeerFlow natively integrates tracing via LangSmith and Langfuse.

You can follow in real time: which sub-agent was spawned, what tools it called, how many tokens were consumed, where the time was spent. If something is blocking, you see it immediately.

This is a point often underestimated in agent comparisons, but in production, it's what makes the difference between a usable tool and a toy.

Hosting and infrastructure

DeerFlow runs locally. You clone the repo, configure your API keys, and launch it. No proprietary cloud, no lock-in.

For long-horizon tasks, you need a machine that runs. A VPS does the job. Hostinger offers instances starting from a few euros per month, sufficient to run DeerFlow with lightweight sub-agents.

If you want to keep everything local (models included), you can couple DeerFlow with Ollama and local open source AI agents. The performance will be lower than GPT-5.5, but confidentiality is total.

❌ Common mistakes

Mistake 1: Using DeerFlow for simple tasks

Launching DeerFlow to summarize a 500-word article is using a jackhammer to drive a nail. The orchestration overhead (spawning sub-agents, initializing the sandbox, loading skills) is counter-productive. Use Perplexity or a classic chat.

Mistake 2: Putting GPT-5.5 everywhere

GPT-5.5 is expensive and slow compared to lightweight models. If you put it on every sub-agent, your costs explode. Reserve it for the orchestrator. For text extraction or file sorting, Claude Sonnet 4.6 or a local model are sufficient.

Mistake 3: Ignoring the sandbox

Disabling the sandbox to "go faster" is a mistake. Without isolation, a sub-agent that writes a malicious script (not intentionally, but through hallucination) can affect your system. The sandbox is not optional in production.

Mistake 4: Not configuring memory

DeerFlow's long-term memory is powerful, but it requires a minimum of configuration. If you don't give it an initial context (your preferences, your stack, your goals), the profile remains empty and the agent personalizes nothing. Invest 10 minutes in the initial profile.

Mistake 5: Wrongly comparing it to chatbots

DeerFlow is not a chatbot. It's not Claude, it's not ChatGPT. If you evaluate it on the quality of its one-shot responses, you're missing the point. It must be evaluated on its ability to carry a complex project from start to finish.

❓ Frequently asked questions

Does DeerFlow replace Claude Code?

No. Claude Code is an interactive development agent in your editor. DeerFlow is a long-horizon task orchestrator. They are complementary — DeerFlow even integrates Claude Code via a skill.

Can DeerFlow be used without an internet connection?

Partially. If you couple DeerFlow with local models via Ollama, the inference is done locally. But web research tasks require a connection. The sandbox works entirely locally.

What is the difference between v1 and v2?

Everything. DeerFlow 2.0 is a complete rewrite that shares no code with v1. The sub-agent architecture, the sandbox, the Markdown skills, the persistent memory — everything is new. v1 was experimental, v2 is production.

How much does a DeerFlow task cost?

It depends entirely on the models used and the duration. A 2-hour task with GPT-5.5 as orchestrator and Claude Sonnet 4.6 as sub-agents can cost between $2 and $15 in tokens. With local models, it's free but slower.

Is DeerFlow suitable for enterprises?

The framework is. But the locally stored memory and the lack of native access control require encapsulation work for serious enterprise use. ByteDance uses it internally, but their setup is obviously more sophisticated than the open-source repo.

✅ Conclusion

DeerFlow marks the transition from toy agents to production agents: structured orchestration, isolated sandbox, persistent memory, extensible skills. It's the framework AutoGPT should have been.

If you have complex tasks that go beyond the scope of a single conversation — multi-source research, project development, automated monitoring — DeerFlow deserves your attention. And to understand how to integrate it into a broader AI strategy, see our guide on how to create an AI agent.

#deerflow #bytedance #agent-ia #intelligence-artificielle #code-ia #Open Source

📚 Related articles

Agents IA 🟢 Débutant 14 min

Is Grep All You Need? : Why AI Agents Prefer Grep to Vector Search

Discover why AI agents prefer grep over vector search and RAG. A study shows 93% accuracy with a simple grep.

2026-05-17 17:05

Agents IA 🟢 Débutant 15 min

FutureSim: this benchmark makes AI agents replay 3 months of real events to evaluate them

Discover FutureSim: the new benchmark making AI agents replay 3 months of real events to evaluate their continuous adaptation capacity.

2026-05-17 16:02