
Anthropic Dreaming: Claude agents learn from their dreams between sessions

AI Agents 🟢 Beginner ⏱️ 14 min read 📅 2026-05-14


🔎 An AI agent that improves in its sleep — it's happening now

On May 6, 2026, at the Code with Claude conference in San Francisco, Anthropic unveiled a new feature for AI agents: Dreaming. The concept is both simple and radical. An agent running on Claude Managed Agents finishes its workday. Then, instead of staying frozen in its current state, it "dreams": it reviews its past sessions offline, detects recurring error patterns, and refines its memory for the next session.

This breaks with the classic paradigm in which an agent forgets everything between two executions. Anthropic isn't just making Claude smarter one model at a time; the startup is making the agent itself self-improving, regardless of the underlying model.

The timing is significant. Anthropic was targeting 10x annualized growth in Q1 2026; it achieved 80x. According to Forbes, Claude API volume grew roughly 70x year over year. With usage exploding like this, agent reliability becomes the number one bottleneck, and Dreaming is Anthropic's answer to it.


The essentials

  • Dreaming is a research-preview feature that allows Claude agents to review their past sessions offline to self-improve their memory and future behavior.
  • Outcomes introduces a rubric-based evaluation system, allowing you to define a quality threshold that the agent must meet before considering a task complete.
  • Multiagent orchestration makes it possible to parallelize complex tasks across multiple agents — Netflix is already using it in production.
  • Harvey reports a 6x leap in task completion rates with these new capabilities.
  • Anthropic claims that teams using Managed Agents ship 10x faster than before.

| Tool | Main use | Price (May 2026, check on claude.com) | Ideal for |
|---|---|---|---|
| Claude Managed Agents | Self-improving agents with Dreaming | Quote-based (enterprise) | Teams that want agents that learn |
| Claude Opus 4.7 (Adaptive) | Anthropic's most powerful agentic model | Per API usage | Complex tasks requiring deep reasoning |
| Claude Sonnet 4.6 | Agentic model balancing cost and performance | Per API usage | High-volume agents with a good quality/price ratio |

Dreaming: how exactly an agent "dreams"

Dreaming doesn't make the agent float in an abstract dream space. The mechanism is concrete and grounded in execution logs.

When an agent on Claude Managed Agents finishes a session, Dreaming triggers in the background. It collects the full set of interactions: successful actions, errors, hesitations, rollbacks. It then applies a structured review process to extract lessons from them.

What the agent detects during its "dreams" can go beyond what a human reviewer or a prompt would catch. According to Ars Technica, Dreaming spots patterns that an agent working alone cannot see: recurring errors in specific contexts, workflows that keep converging toward dead ends, preferences shared within a team of agents.

The analogy with human sleep is more than a marketing gimmick. During sleep, the human brain consolidates memory and replays the day's events to extract patterns. Dreaming does the same with the agent's execution traces. The difference: it's systematic, exhaustive, and it doesn't produce absurd dreams.

The result is a refined memory that persists between sessions. The agent arrives the next day "knowing" that a certain approach failed three times yesterday, and that a specific code pattern systematically needs a specific correction.
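A minimal sketch of that review step, assuming session logs can be reduced to a list of steps tagged with a context and an optional error label. Anthropic has not published how Dreaming works internally, so the `Step` schema, the `dream` function, and the `min_occurrences` cutoff are hypothetical names chosen for illustration. What the sketch shows is the shape of the process: aggregate across sessions, count recurring failures, and keep only the ones frequent enough to become a lesson.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One step of a session log (hypothetical schema, not Anthropic's)."""
    context: str        # where the step happened, e.g. a tool or file type
    action: str         # what the agent tried
    error: str | None   # error label, or None if the step succeeded


def dream(sessions: list[list[Step]], min_occurrences: int = 3) -> list[str]:
    """Offline review: turn recurring (context, error) pairs into lessons."""
    counts: Counter[tuple[str, str]] = Counter()
    for session in sessions:
        for step in session:
            if step.error is not None:
                counts[(step.context, step.error)] += 1
    # One-off failures are noise; only recurring patterns become lessons.
    return [
        f"In {context}: {error} occurred {n} times; check for it first."
        for (context, error), n in counts.most_common()
        if n >= min_occurrences
    ]


yesterday = [
    [Step("csv_export", "write file", "EncodingError"), Step("csv_export", "retry", None)],
    [Step("csv_export", "write file", "EncodingError")],
    [Step("csv_export", "write file", "EncodingError"), Step("api_call", "fetch", "Timeout")],
]
print(dream(yesterday))
# ['In csv_export: EncodingError occurred 3 times; check for it first.']
```

The `min_occurrences` cutoff also explains why a handful of sessions is not enough: a pattern has to recur before it is worth remembering.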

What sets Dreaming apart from a simple log replay

A log replay is passive. You replay the tape, nothing changes. Dreaming is active: the agent generates insights, structures them, and injects them into its configuration for subsequent sessions.

India Today describes it as a reorganization of memory — the agent doesn't store sessions raw, it digests them. This is what makes it fundamentally different from a classic RAG system that injects raw logs into context.

This is where the comparison with Hermes Agent and its context files becomes relevant. Hermes uses CLAUDE.md and AGENTS.md files to manually structure an agent's context. Dreaming automates part of this work: instead of you writing the lessons learned in a file, the agent deduces and integrates them. Both approaches are complementary — you can still provide a framework via context files, and let Dreaming refine it.
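To make the top-down/bottom-up split concrete, here is a sketch of how a human-written context file and dream-derived lessons could be merged into the instructions for the next session. It assumes lessons are plain strings; the `build_session_context` helper and the section headings it emits are hypothetical, not the format Dreaming actually uses. Nothing here touches model weights: the learning lives entirely in the text the agent starts with.

```python
def build_session_context(context_file: str, lessons: list[str], max_lessons: int = 20) -> str:
    """Merge human-written rules (top-down) with learned lessons (bottom-up)."""
    # Deduplicate while preserving order, then cap the list to bound prompt size.
    unique = list(dict.fromkeys(lessons))[:max_lessons]
    parts = ["# Team rules (human-written)", context_file.strip()]
    if unique:
        parts.append("# Lessons from previous sessions (learned)")
        parts.extend(f"- {lesson}" for lesson in unique)
    return "\n".join(parts)


rules = "Always run the test suite before committing."
learned = [
    "Use UTF-8 when exporting CSV.",
    "Use UTF-8 when exporting CSV.",  # the same lesson found twice
    "Retry API calls once on timeout.",
]
print(build_session_context(rules, learned))
```

Putting the human rules first and capping the number of lessons keeps the strategic framework in charge and the prompt size bounded.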


Outcomes: the quality safeguard that was missing

An agent that improves is good. An agent that knows if it has improved is better. That's the role of Outcomes.

Outcomes introduces a rubric-based evaluation system. You define success criteria — code accuracy, style compliance, test coverage, adherence to an output format. The agent evaluates itself against these rubrics at the end of each task.

The quality threshold becomes an explicit parameter. If the agent doesn't reach the defined score, it doesn't mark the task as complete. Instead, it iterates, asks for help, or reports that it is blocked.
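A rubric with an explicit threshold can be sketched as a weighted score plus a decision rule. This assumes each criterion has already been scored between 0 and 1 (in practice that grading would itself be done by a model); the `Criterion` type, the weights, the 0.8 threshold, and the `decide` function are illustrative assumptions, not the Outcomes API.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str
    weight: float  # relative importance; weights need not sum to 1


def weighted_score(rubric: list[Criterion], scores: dict[str, float]) -> float:
    """Weighted average of per-criterion scores in [0, 1]; missing criteria count as 0."""
    total_weight = sum(c.weight for c in rubric)
    return sum(c.weight * scores.get(c.name, 0.0) for c in rubric) / total_weight


def decide(score: float, threshold: float, attempt: int, max_attempts: int = 3) -> str:
    if score >= threshold:
        return "complete"
    if attempt < max_attempts:
        return "iterate"
    return "blocked"  # escalate instead of accepting a mediocre result


rubric = [
    Criterion("code_accuracy", 0.5),
    Criterion("test_coverage", 0.3),
    Criterion("style_compliance", 0.2),
]
score = weighted_score(rubric, {"code_accuracy": 0.9, "test_coverage": 0.6, "style_compliance": 1.0})
print(round(score, 2), decide(score, threshold=0.8, attempt=1))
# 0.83 complete
```

Capping the number of attempts and returning "blocked" is one way to escalate to a human instead of letting the agent loop forever.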

This mechanism solves a classic problem with autonomous agents: the fact that they can appear to have completed a task when the result is mediocre. With Outcomes, completion isn't binary (done / not done) — it's gradual and measurable.

LetsDataScience reports that the Dreaming + Outcomes combination is what enabled Harvey (the legal AI company) to achieve its 6x leap in task completion rates. The agent doesn't just do more; it does better, and it knows it.


Multiagent orchestration: when a single agent is no longer enough

Some tasks are too large, too varied, or too parallelizable for a single agent. This is where multiagent orchestration comes in.

The principle: you define a workflow where multiple Claude agents work simultaneously on distinct subtasks, with a coordinator agent synchronizing the results. No artificial sequentiality — independent tasks execute in parallel.

AI News specifies that this system makes it possible to handle tasks that are too large or too varied for a single agent. A typical case: one agent researches, another codes, a third tests, a fourth integrates.
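The fan-out/fan-in pattern behind a coordinator agent can be sketched with Python's `asyncio`. No Anthropic API is called here: `run_agent` is a hypothetical placeholder that only sleeps, standing in for a request to one Claude agent, and the three roles are illustrative. `asyncio.gather` runs the independent subtasks concurrently and returns their results in input order; the integrator runs afterwards because it needs all of them.

```python
import asyncio
import time


async def run_agent(role: str, task: str) -> str:
    """Hypothetical stand-in for one Claude agent working on one subtask."""
    await asyncio.sleep(0.1)  # simulates model and tool latency
    return f"[{role}] {task}"


async def coordinator(subtasks: dict[str, str]) -> list[str]:
    # Fan out: independent subtasks run concurrently, one agent each.
    results = await asyncio.gather(
        *(run_agent(role, task) for role, task in subtasks.items())
    )
    # Fan in: the integrator runs last because it needs every result.
    summary = await run_agent("integrator", f"merge {len(results)} results")
    return [*results, summary]


start = time.perf_counter()
outputs = asyncio.run(coordinator({
    "researcher": "survey existing CSV libraries",
    "coder": "write the export module",
    "tester": "write regression tests",
}))
elapsed = time.perf_counter() - start
# Roughly 0.2 s (two phases) instead of 0.4 s if the four agents ran one after another.
print(outputs[-1], f"({elapsed:.1f}s)")
```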

Netflix has already adopted multiagent orchestration

Cite Solutions reports that Netflix is already using Claude's multiagent orchestration in production. For a streaming giant whose technical infrastructure is massive, this is a strong signal of maturity.

The implication is clear: we're no longer in proof-of-concept territory. Claude Managed Agents' multiagent orchestration is going into production at leading companies.

When compared with the best autonomous AI agents, the difference in approach is clear. Frameworks like AutoGPT or agents based on open-source models offer orchestration, but without the native coupling with Dreaming and Outcomes. Anthropic is selling an integrated system where orchestration, self-improvement, and quality control form a coherent whole.

Webhooks: integration with the existing ecosystem

Less flashy than Dreaming but just as crucial: Anthropic has added webhook support to Claude Managed Agents. Your agents can now trigger actions in your existing systems (CI/CD, Slack notifications, Jira ticket updates) without additional middleware.

This is the kind of feature that transforms a demo agent into a production tool. Without webhooks, the agent is an island. With them, it becomes a node in your engineering pipeline.
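At the HTTP level, a webhook is simply a POST carrying a JSON body to a URL that the receiving system exposes. The event names and payload format Claude Managed Agents uses are not covered here, so the `task.completed` event, the payload fields, and the endpoint URL below are hypothetical. The sketch uses only the standard library and builds the request without sending it.

```python
import json
import urllib.request


def build_webhook_request(url: str, event: str, payload: dict) -> urllib.request.Request:
    """Package an agent event as a JSON POST (hypothetical payload schema)."""
    body = json.dumps({"event": event, "data": payload}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_webhook_request(
    "https://example.com/hooks/agent-events",  # placeholder endpoint
    "task.completed",
    {"task": "CSV export", "outcome_score": 0.83},
)
print(req.get_method(), req.full_url)
# POST https://example.com/hooks/agent-events
# To actually deliver it: urllib.request.urlopen(req, timeout=10)
```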


Dreaming vs alternatives: DeerFlow, Hermes Agent and the rest

The AI agent framework market is crowded. Where does Dreaming stand against existing alternatives?

DeerFlow and the workflow approach

DeerFlow (and similar graph-based workflow frameworks) structure an agent's work into predefined steps. It's deterministic and predictable. But the agent doesn't learn between runs — it follows the same graph with the same decisions.

Dreaming is orthogonal: it doesn't replace the workflow structure, it sits on top. You can have a structured workflow AND an agent that improves between runs.

Hermes Agent and context files

As mentioned above, Hermes Agent relies on explicit context files (CLAUDE.md, AGENTS.md) to guide the agent's behavior. It's a top-down approach — the human writes the rules.

Dreaming is bottom-up: the agent discovers the rules itself. In practice, the two complement each other well. You provide the strategic framework via files, and Dreaming refines the tactical details through self-observation.

Open source agents with Ollama

The local approach with Ollama offers total control and data privacy. You choose your model — Kimi K2.6 self-hosted, GLM-5 Reasoning, or others — and you build your stack. But self-improvement between sessions remains your responsibility.

For now, Dreaming is a competitive advantage specific to the Anthropic ecosystem. You pay for a managed product, but you gain a capability that nobody has yet packaged as cleanly in open source.

Comparison of self-improving agent approaches

| Approach | Self-improvement | Native quality control | Multiagent orchestration | Hosting |
|---|---|---|---|---|
| Claude Managed Agents + Dreaming | Yes (offline) | Yes (Outcomes) | Yes (native) | Anthropic Cloud |
| Hermes Agent (context files) | No (manual) | No | Partial | Local / Cloud |
| DeerFlow (workflows) | No | No | Yes (graph) | Local / Cloud |
| Open source Ollama agents | No (custom) | No (custom) | Yes (third-party frameworks) | Local only |

The models behind it: Claude Opus 4.7 and Sonnet 4.6

Dreaming, Outcomes, and multiagent orchestration are layers above the model. But the underlying model remains crucial for the quality of "dreams" and evaluations.

Anthropic's Claude Opus 4.7 (Adaptive) scores 94.3 on agentic benchmarks — behind GPT-5.5 (98.2) and Gemini 3 Pro Deep Think (95.4), but ahead of GPT-5.4 Pro (91.8). It's the model of choice for complex tasks where deep reasoning during the "dream" makes the difference.

Claude Sonnet 4.6, with its score of 81.4, is the workhorse for high-volume agents. Cheaper, faster, smart enough for the majority of orchestration workflows.

The key point: Dreaming improves both. On repetitive tasks, where cumulative learning can make up for a gap in raw capability, a Sonnet 4.6 that "dreams" could outperform an Opus 4.7 that doesn't.

For readers who want to understand how Claude positions itself against the competition in code and agents, our Claude vs ChatGPT comparison details the strengths and weaknesses of each ecosystem. And for choosing the optimal model in an agentic context, our guides to the best LLMs for coding and the best LLMs for AI agents remain the reference.


The infrastructure question: 220,000 GPUs for all this

Dreaming isn't free in terms of resources. Revising past sessions, detecting patterns, refining memory — all of this consumes tokens and GPU cycles. A lot of them.

This is likely why Anthropic signed a deal with SpaceX for Colossus 1: 220,000 GPUs and 300 MW of power for Claude. The infrastructure needed to make thousands of agents "dream" simultaneously is colossal. Our analysis of this partnership details the stakes of this mega-infrastructure.

Without this computing power, Dreaming would remain a lab demo. With it, it becomes an enterprise-scalable product. The 80x growth Anthropic recorded in Q1 2026 demands this level of infrastructure.


ToolCUA and the evolution toward agents that choose their interface

Dreaming is part of a broader movement: agents becoming smarter about how they interact with the world. ToolCUA illustrates this trend — Computer Use agents that learn to choose between a GUI interaction and an API call depending on the context.

Dreaming goes in the same direction. The agent no longer blindly executes instructions. It reflects on its own interaction patterns and optimizes its approach. The convergence between Dreaming (learning from past mistakes) and ToolCUA (learning to choose the right interface) points toward a fundamentally meta-cognitive generation of agents.


The dashboard that kills the terminal: Claude Code Agent View

To leverage Dreaming effectively, you need to be able to see what the agent has learned. This is where the Claude Code Agent View dashboard comes into play.

Anthropic understood that the terminal alone is no longer sufficient for monitoring complex agents. The dashboard offers a split view of sessions, agent decisions, and — crucially — the insights generated by Dreaming. You can see what the agent "dreamed" and decided to change.

Without this visibility, Dreaming would be an improved black box. With it, it's a transparent process that you can audit and correct.


The numbers that matter: from demo to production

Anecdotes are nice, numbers are better. Here is what sources report:

  • Harvey (legal AI): +6x in task completion rates with Dreaming + Outcomes (LetsDataScience)
  • Managed Agents teams: shipped 10x faster (official Anthropic announcement)
  • Anthropic Q1 2026: 80x growth (vs 10x target), API volume +70x YoY (Forbes)
  • Netflix: adoption of multiagent orchestration in production (Cite Solutions)

The combination is formidable: agents that improve on their own (Dreaming), evaluate themselves (Outcomes), parallelize (orchestration), and can be monitored (Agent View). Anthropic attributes the 10x gain in shipping speed to these four levers working together.


❌ Common mistakes

Mistake 1: Confusing Dreaming with fine-tuning

Dreaming doesn't modify the model's weights. It refines the agent's memory and behavior at the session configuration level. It's closer to an automatic and cumulative prompt engineering system than to fine-tuning. If you expect Dreaming to make Claude Opus 4.7 smarter as a model, you will be disappointed. It makes the deployed agent more effective.

Mistake 2: Enabling Dreaming on an agent without Outcomes rubrics

An agent that dreams but doesn't know if it did well is an agent that risks memorizing its mistakes as successes. Dreaming and Outcomes are designed to work together. Only enable Dreaming after you have defined clear evaluation rubrics.
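The safeguard can be sketched as a labeling pass that runs before any memory update: sessions are sorted by their rubric score, and sessions without a score are kept out of learning entirely rather than treated as successes. `SessionResult`, `label_sessions`, and the 0.8 threshold are hypothetical names and values for illustration, not part of any Anthropic API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SessionResult:
    approach: str
    rubric_score: float | None  # None when no rubric was defined for the task


def label_sessions(results: list[SessionResult], threshold: float = 0.8) -> dict[str, list[str]]:
    """Sort past approaches into reinforce / avoid / unknown before learning."""
    labels: dict[str, list[str]] = {"reinforce": [], "avoid": [], "unknown": []}
    for r in results:
        if r.rubric_score is None:
            labels["unknown"].append(r.approach)  # no ground truth: do not learn from it
        elif r.rubric_score >= threshold:
            labels["reinforce"].append(r.approach)
        else:
            labels["avoid"].append(r.approach)
    return labels


history = [
    SessionResult("regex parser", 0.4),
    SessionResult("csv module", 0.9),
    SessionResult("pandas", None),
]
print(label_sessions(history))
# {'reinforce': ['csv module'], 'avoid': ['regex parser'], 'unknown': ['pandas']}
```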

Mistake 3: Expecting immediate results

Dreaming is a research-preview feature. As Data World Bank points out, the agent needs several accumulated sessions before patterns emerge. The first "dreams" will be inconclusive; the difference becomes striking only after 10 to 20 sessions on the same type of task.

Mistake 4: Ignoring the cost dimension

Each Dreaming cycle consumes tokens. On an agent running 50 times a day, this can represent a non-negligible extra cost. Start by enabling Dreaming on your most critical and recurring agents, not across your entire fleet.
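A back-of-the-envelope estimate makes the cost dimension concrete. The figures below are assumptions, not published prices: one Dreaming cycle per session, 40,000 tokens consumed per review, and $5 per million tokens. Check current pricing on claude.com and measure your own token counts before relying on the result.

```python
def monthly_dreaming_cost(
    runs_per_day: int,
    tokens_per_dream: int,
    usd_per_million_tokens: float,
    days: int = 30,
) -> float:
    """Rough extra spend if one Dreaming cycle follows every session (all inputs are assumptions)."""
    total_tokens = runs_per_day * tokens_per_dream * days
    return total_tokens / 1_000_000 * usd_per_million_tokens


# 50 runs/day as in the paragraph above; token count and price are hypothetical.
print(f"${monthly_dreaming_cost(50, 40_000, 5.0):,.2f} per month")
# $300.00 per month
```

The cost grows linearly with runs per day and with the number of tokens each review consumes, which is why starting with a few critical, recurring agents keeps the bill predictable.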


❓ Frequently Asked Questions

Is Dreaming available for all Claude users?

No. Dreaming is a feature of Claude Managed Agents, Anthropic's enterprise product. It is not available in the consumer interface or the standard API. Access is quote-based and geared toward engineering teams.

Does Dreaming work with any Claude model?

Anthropic positions it as model-agnostic, but in practice the most performant models (Claude Opus 4.7, Sonnet 4.6) produce higher quality "dreams" thanks to their superior reasoning capabilities.

Can the agent "unlearn" during a dream?

Theoretically yes, if the Outcomes rubrics are poorly configured and the agent interprets a failure as a success. This is why Outcomes and Dreaming are coupled by design — the rubrics serve as a safeguard.

How does Dreaming compare to traditional fine-tuning?

Fine-tuning modifies the model's weights for all future uses. Dreaming modifies the behavior of a specific agent without touching the model. Lighter, more reversible, more targeted — but less deep.

Is Netflix also using Dreaming or only multiagent orchestration?

The sources only cite Netflix's adoption of multiagent orchestration. It is not specified whether they also use Dreaming. Orchestration is the easiest feature to adopt first.


✅ Conclusion

Dreaming turns the AI agent from a tool that merely executes into a system that learns from its experience. Add Outcomes for quality control and multiagent orchestration for scaling, and you get what Anthropic promises: teams that ship 10x faster. What remains is to keep an eye on compute costs and to make sure your agents' dreams stay productive. To dive deeper into autonomous agents, our comparison of the best AI agents covers the entire 2026 landscape.