
Is Grep All You Need? Why AI Agents Prefer Grep to Vector Search

AI Agents 🟢 Beginner ⏱️ 14 min read 📅 2026-05-17

🔎 RAG just lost its default status

For three years, the AI community built a dogma: for an agent to access a knowledge base, it needs a vector database, embeddings, and a RAG pipeline. This dogma has just shown its age.

A paper published on arXiv in May 2026 (arXiv:2605.15184) shows that a CLI agent reaches 93% accuracy on 116 LongMemEval questions with a simple inline grep. The same agent drops to 55% when the same grep is called in programmatic mode. Same algorithm, same data, radically different results. What changed? The agent harness architecture.

According to Robert Heubanks, Anthropic has even replaced its own RAG pipeline with agentic search. An Amazon Science paper (AAAI 2026) measures agentic keyword search at 94.5% of RAG's fidelity, without a vector store. The message is clear: retrieval is no longer an algorithm problem but an agent architecture problem.

Doug Turnbull, a research engineer who has commented on the paper, sums up the situation: an agent that dynamically builds its own grep commands in a bash shell outperforms any predefined RAG pipeline. The grep command becomes a native tool of the agent rather than an external service.


The essentials

  • A CLI agent with inline grep achieves 93% accuracy on LongMemEval (116 questions), but drops to 55% in programmatic mode — same algorithm, different architecture.
  • Anthropic replaced its internal RAG pipeline with agentic search driven by a frontier model.
  • An Amazon Science paper (AAAI 2026) shows that Search-R1, trained via RL, beats classic RAG by 24% in relative terms, without a vector store.
  • The vector database goes from being the default to a fallback in modern stacks.
  • The agent harness architecture (how the agent invokes the tool) matters more than the choice of retrieval algorithm.

| Tool | Main usage | Price (June 2025; check the official website) | Ideal for |
| --- | --- | --- | --- |
| Claude Code | CLI agent with native shell access | Pro/Team subscription | Development with inline grep |
| Codex CLI | OpenAI CLI agent | ChatGPT Plus subscription | Autonomous command-line agents |
| Gemini CLI | Google CLI agent | Google AI subscription | Multi-source agentic search |
| Hostinger | Hosting to deploy agents | Starting at €2.99/month | AI agent infrastructure |

What the paper actually proves

The paper "Is Grep All You Need? How Agent Harnesses Reshape Agentic Search" does not say that grep is magic. It says something deeper: the boundary between "retrieval strategy" and "agent capability" has collapsed.

The experimental setup has three parts: 116 questions from LongMemEval, a custom harness called Chronos, and three native CLIs (Claude Code, Codex CLI and Gemini CLI). The protocol compares two ways of invoking the same grep tool. In inline mode, the agent types the command directly into the shell. In programmatic mode, the agent calls grep through an API or a structured wrapper.

The results are unequivocal: 93% in inline mode, 55% in programmatic mode. This is not a marginal difference; it is a 38-point chasm. Yet the grep command being executed is identical in both modes.
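The gap can be read in two ways, and it helps to keep them apart. This short Python sketch computes both the absolute gap in percentage points and the relative drop. The per-question counts assume that the reported percentages are rounded from 116 questions. They are approximations, not figures taken from the paper.

```python
# Accuracy reported for the two invocation modes (116 LongMemEval questions).
inline_acc = 0.93
programmatic_acc = 0.55
n_questions = 116

# Absolute gap, in percentage points.
gap_points = round((inline_acc - programmatic_acc) * 100)

# Relative drop: share of the inline score lost when switching to programmatic mode.
relative_drop = (inline_acc - programmatic_acc) / inline_acc

# Approximate number of correct answers, assuming the percentages are rounded.
inline_correct = round(inline_acc * n_questions)
programmatic_correct = round(programmatic_acc * n_questions)

print(f"gap: {gap_points} points")            # gap: 38 points
print(f"relative drop: {relative_drop:.1%}")  # relative drop: 40.9%
print(f"~{inline_correct} vs ~{programmatic_correct} correct answers")  # ~108 vs ~64 correct answers
```

In other words, programmatic mode loses roughly two out of every five answers that inline mode gets right.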

What the paper isolates is the harness effect. When the agent has native shell access, it can iterate on its grep command, refine patterns, combine with other Unix tools, and build a dynamic retrieval chain. In programmatic mode, it is constrained by the API schema, predefined parameters, and context loss between calls.

As Doug Turnbull notes in his analysis, grep as a native tool in bash allows the agent to build its own commands dynamically, rather than being limited by a predefined search API. The agent harness architecture takes precedence over the retrieval algorithm.


Why classic RAG has become a bottleneck

RAG was the default answer to a real problem: how to give an LLM access to a corpus of documents without putting it in the prompt. The approach was logical: embed the documents, store them in a vector database, perform a semantic search, inject the top-k results into the context.

The problem? This approach freezes the retrieval strategy at the time of pipeline design. The engineer chooses the embedding model, the similarity threshold, the number of chunks to retrieve. The agent, on the other hand, has no control over this strategy. It receives the results and has to make do.
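For readers who have never built one, here is a deliberately tiny sketch of the classic pipeline in Python. It rests on loud assumptions:

- The "embedding" is a bag of words rather than a neural model.
- The "vector database" is a Python list.
- The names `RagConfig`, `embed`, `retrieve` and `build_prompt` are hypothetical.

The point to notice is the frozen configuration. `top_k` and the similarity threshold are set at design time, and nothing in the retrieval call lets the agent change them.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RagConfig:
    """Retrieval choices fixed by the engineer at design time (hypothetical names)."""
    top_k: int = 2
    min_similarity: float = 0.1


def embed(text: str) -> Counter:
    """Toy embedding: a bag of lowercase words. A real pipeline calls an embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(docs: list[str]) -> list[tuple[str, Counter]]:
    """Embed every chunk once and store the vectors: the 'vector database'."""
    return [(doc, embed(doc)) for doc in docs]


def retrieve(index: list[tuple[str, Counter]], query: str, cfg: RagConfig) -> list[str]:
    """Rank all chunks by similarity; keep the top_k that clear the threshold."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, vec in ranked[: cfg.top_k] if cosine(q, vec) >= cfg.min_similarity]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Inject the retrieved chunks into the model's context."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return f"Context:\n{context}\n\nQuestion: {query}"


docs = [
    "The cache is invalidated every 10 minutes.",
    "Deployments run on Kubernetes.",
    "Cache keys include the user id.",
]
index = build_index(docs)
cfg = RagConfig()
chunks = retrieve(index, "How is the cache invalidated?", cfg)
print(chunks)  # ['The cache is invalidated every 10 minutes.', 'Cache keys include the user id.']
print(build_prompt("How is the cache invalidated?", chunks))
```

Trying to change `cfg.top_k` at query time raises `dataclasses.FrozenInstanceError`. That is the code-level version of "the agent receives the results and has to make do."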

This is exactly what the paper from Amazon Science (AAAI 2026) demonstrates. Buzz Grewal reports that Search-R1, a model trained with reinforcement learning for agentic search, beats RAG by 24% in relative terms. Agentic keyword search reaches 94.5% of RAG's fidelity, without any vector database.

The reason is structural. In agentic search, the agent decides on the retrieval strategy at the time of the query. It can choose grep for an exact search, recursive grep for a code pattern, or switch to vector search for a fuzzy semantic query. This is what Pasquale Pillitteri describes as "hybrid pipelines where the agent decides on the tool based on the type of query."

Anthropic reached the same conclusion internally. Robert Heubanks reports that Anthropic replaced its RAG pipeline with agentic search: when a frontier model drives the search, agentic search clearly outperforms traditional RAG, even in hybrid mode. The signal carries weight because it comes from one of the labs that did the most to popularize RAG.

To understand how agents build these dynamic retrieval chains, see The 5 AI agent patterns that work, which details agent architectures that work in production.


Inline grep vs programmatic: understanding the 38-point gap

The key figure in the paper is this: 93% vs 55%. Same tool, same corpus, same model. The only difference is how the agent invokes grep.

In inline mode, the agent has direct shell access. It types grep -rn "pattern" --include="*.md" and sees the results in real time. If there are too many results, it refines the pattern; if there are too few, it broadens it. It can pipe the output into head, awk or sed, or combine grep with find. The shell becomes an interactive retrieval environment.
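The shape of the inline loop can be approximated in Python: run a search, check how many lines came back, and tighten the pattern if there are too many. This is a sketch under several assumptions:

- `grep_rn` is a hypothetical stand-in for `grep -rn PATTERN --include=*.md`, built on Python's `re` module, whose syntax differs from grep's basic regular expressions.
- The notes are made up.
- The fixed list of candidate patterns replaces decisions that the model would make on the fly.

```python
import fnmatch
import os
import re
import tempfile


def grep_rn(root: str, pattern: str, include: str = "*.md") -> list[str]:
    """Hypothetical stand-in for `grep -rn PATTERN --include=INCLUDE ROOT`.
    Returns 'path:line:text' strings, with paths relative to ROOT.
    Uses Python regular expressions, whose syntax differs from grep's defaults."""
    regex = re.compile(pattern)
    hits = []
    for dirpath, _dirs, filenames in os.walk(root):
        for name in sorted(filenames):
            if not fnmatch.fnmatch(name, include):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if regex.search(line):
                        hits.append(f"{os.path.relpath(path, root)}:{lineno}:{line.rstrip()}")
    return hits


def refine(root: str, candidates: list[str], max_hits: int = 3):
    """Try patterns from broad to narrow, keeping a transcript of hit counts.
    Stops at the first pattern whose result is non-empty and small enough to read.
    (A real agent would also broaden or rephrase when a pattern returns nothing.)"""
    transcript = []
    hits: list[str] = []
    for pattern in candidates:
        hits = grep_rn(root, pattern)
        transcript.append((pattern, len(hits)))
        if 0 < len(hits) <= max_hits:
            break
    return hits, transcript


with tempfile.TemporaryDirectory() as root:
    notes = {  # made-up conversation notes
        "2024-03.md": "Met Alice at the cafe.\nAlice moved to Lyon in March.\n",
        "2024-05.md": "Call with Bob.\nAlice started a new job.\n",
        "2024-06.md": "Alice: vacation in June.\n",
    }
    for name, text in notes.items():
        with open(os.path.join(root, name), "w", encoding="utf-8") as fh:
            fh.write(text)

    hits, transcript = refine(root, ["Alice", r"Alice (moved|relocated)"])
    print(transcript)  # [('Alice', 4), ('Alice (moved|relocated)', 1)]
    for hit in hits:
        print(hit)  # 2024-03.md:2:Alice moved to Lyon in March.
```

In a real shell the agent would simply run the commands and read the output. The value lies in the loop itself: each observation informs the next command.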

In programmatic mode, the agent calls a function such as search(query="pattern", top_k=10) and receives structured JSON in return. It cannot refine the command step by step or combine it with other shell tools, and it loses the fluidity of the terminal. The API wrapper has taken away the agent's ability to iterate.
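For contrast, here is what a programmatic wrapper in the spirit of `search(query="pattern", top_k=10)` might look like. The implementation is hypothetical and is not the paper's wrapper. Its schema exposes only `query` and `top_k`, it matches the query as a literal keyword, and it returns JSON with file names and match counts but no line text.

```python
import json
import re

# Made-up corpus standing in for the indexed files.
CORPUS = {
    "2024-03.md": "Met Alice at the cafe.\nAlice moved to Lyon in March.\n",
    "2024-05.md": "Call with Bob.\nAlice started a new job.\n",
    "2024-06.md": "Alice: vacation in June.\n",
}


def search(query: str, top_k: int = 10) -> str:
    """Hypothetical programmatic wrapper. The schema exposes only `query` and `top_k`:
    the query is matched as a literal, case-insensitive keyword, files are ranked by
    match count, and only file names and counts come back, serialized as JSON."""
    needle = re.compile(re.escape(query), re.IGNORECASE)
    results = []
    for name, text in CORPUS.items():
        count = len(needle.findall(text))
        if count:
            results.append({"file": name, "matches": count})
    results.sort(key=lambda r: (-r["matches"], r["file"]))
    return json.dumps({"results": results[:top_k]})


print(search("Alice"))
# {"results": [{"file": "2024-03.md", "matches": 2}, {"file": "2024-05.md", "matches": 1}, {"file": "2024-06.md", "matches": 1}]}
print(search("Alice (moved|relocated)"))  # {"results": []}
```

An agent in a shell would naturally write the regex `Alice (moved|relocated)`. Here it is escaped and returns nothing, and there is no way to pipe the JSON into head or sort. Each design choice is reasonable on its own, but together they remove the iteration that the inline agent relies on.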

This result echoes a fundamental principle of human-computer interaction: the most powerful tool is the one that offers the least abstraction, provided the user (or agent) knows how to use it. The bash terminal is a richer retrieval interface than a search API because it composes with the entire Unix ecosystem.

This idea of parallel processing and tool composition resonates with the architectures described in Multi-Stream LLMs: why the future of AI agents lies in parallel processing.


Which models get the most out of agentic search

Not all models are equal when it comes to agentic search. Building dynamic grep commands, iterating on results and deciding when to switch tools all require specific agentic skills.

The June 2025 agentic ranking is revealing. GPT-5.5 dominates with 98.2, followed by Gemini 3 Pro Deep Think (95.4) and Claude Opus 4.7 Adaptive (94.3). These three models are the ones that, in the paper, achieve the best results with the Chronos harness in inline grep mode.

The correlation is logical. A model with a high agentic score is precisely one that knows how to plan a sequence of actions, use tools, and iterate based on results. These are exactly the skills needed to leverage grep as a dynamic retrieval tool.

By contrast, models with weaker agentic scores, such as Grok 4.1 (79) or GPT-5.3 Codex (80), tend to generate grep commands that are either too broad or too narrow, and they struggle to adjust them. Their performance drops even more sharply in programmatic mode, because they cannot compensate through shell iteration.

To choose the right model for your search agents, check out our guide to the best LLMs for AI agents.


RAG is not dead, it is demoted

It would be wrong to conclude that RAG is useless. The paper and the accompanying analyses say something more nuanced: the vector database shifts from being the default to being the fallback.

The architecture recommended by enterprise implementations in 2026, described by Pasquale Pillitteri, is a hybrid pipeline. The agent has several retrieval tools — grep, vector search, structured wiki — and chooses which one to use based on the nature of the query. For an exact search on a technical term, grep. For a fuzzy semantic query on an abstract concept, vector search. For structured facts, the wiki.
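A hybrid toolbox needs a routing decision. In the architectures described above, the model makes that decision itself. The keyword rules below are a hypothetical stand-in so that the sketch runs without an LLM. The `Tool` names and patterns are illustrative and are not taken from any of the cited implementations.

```python
import re
from enum import Enum


class Tool(Enum):
    GREP = "grep"              # exact terms: identifiers, error strings, ticket ids
    VECTOR = "vector_search"   # fuzzy, conceptual questions
    WIKI = "wiki"              # structured facts: owners, dates, versions


# Hypothetical signals. In a real agent, the model reads the query and picks the tool.
EXACT_SIGNAL = re.compile(r'"[^"]+"|\b\w+_\w+\b|\b\w+\(\)|\b[A-Z]{2,}-\d+\b')
FACT_SIGNAL = re.compile(r"^(who owns|when was|what version|which team)\b", re.IGNORECASE)


def route(query: str) -> Tool:
    """Pick a retrieval tool for one query: exact match first, then facts, else semantic."""
    if EXACT_SIGNAL.search(query):
        return Tool.GREP
    if FACT_SIGNAL.search(query):
        return Tool.WIKI
    return Tool.VECTOR


for q in [
    'Where do we log "connection reset"?',
    "Who owns the billing service?",
    "How do we usually think about privacy trade-offs?",
]:
    print(f"{route(q).value:>13}  {q}")
```

The routing order is a design choice: an exact signal (a quoted string, a snake_case identifier, a ticket id) wins over everything else, because grep is the cheapest tool and the most precise one for those queries.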

This is a major paradigm shift. In classic RAG, vector search was the only tool, applied to all queries. In agentic search, it becomes one tool among others, used when appropriate. The agent is the conductor, not the pipeline.

Julian Pavlov, in his LinkedIn debate, sums up the tension: the shift from classic RAG to agentic search in 2026 is not a technological replacement but a change in the abstraction layer. We are moving from a rigid pipeline to an agent that builds its own pipeline on the fly.

For cases where vector search remains relevant, see our comparison of the best LLMs for search.


The impact on production agent architectures

What does this paper change for teams building AI agents in production? Three concrete things.

First, stop building API wrappers around your search tools. If your agent needs to search through files, give it sandboxed shell access rather than a search() API. In the paper, the API abstraction cost 38 points of accuracy, which is too high a price to pay.

Next, invest in the agent harness, not in the retrieval algorithm. Teams spend weeks optimizing their embedding model, their chunking strategy, their similarity threshold. The paper suggests that this effort would be better invested in designing the harness — how the agent invokes tools, how it iterates, how it composes results.
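"Designing the harness" can be made concrete with a minimal agent loop. The policy proposes an action, the harness executes the tool and records the observation, and the full history goes back to the policy on the next turn. Everything here is a hypothetical sketch: `scripted_policy` stands in for a model, `grep_lines` stands in for a shell tool, and the five-step budget is arbitrary.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Made-up lines the toy tool searches over.
LINES = [
    "Alice moved to Lyon in March.",
    "Alice started a new job.",
    "Bob moved to Paris.",
]


@dataclass
class Step:
    tool: str
    argument: str
    observation: str


def grep_lines(pattern: str) -> str:
    """Toy search tool: matching lines joined by newlines, like grep's output."""
    return "\n".join(line for line in LINES if re.search(pattern, line))


def scripted_policy(history: list[Step]) -> tuple[str, str]:
    """Stand-in for the model. It sees every earlier call and observation."""
    if not history:
        return ("grep", "moved")
    last = history[-1].observation.splitlines()
    if len(last) > 1:
        return ("grep", "Alice moved")  # too many hits: narrow the pattern
    return ("answer", last[0] if last else "not found")


def run_harness(policy: Callable, tools: dict[str, Callable[[str], str]], max_steps: int = 5):
    """Minimal loop: ask for an action, execute the tool, record the result, repeat."""
    history: list[Step] = []
    for _ in range(max_steps):
        kind, value = policy(history)
        if kind == "answer":
            return value, history
        observation = tools[kind](value)
        history.append(Step(kind, value, observation))
    return None, history  # step budget exhausted without an answer


answer, history = run_harness(scripted_policy, {"grep": grep_lines})
print(answer)  # Alice moved to Lyon in March.
print([(s.tool, s.argument) for s in history])  # [('grep', 'moved'), ('grep', 'Alice moved')]
```

Keeping the whole history in view avoids the context loss between calls described for programmatic mode. The step budget, the tool set and what each observation contains are all harness decisions that retrieval tuning never touches.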

Finally, think of your retrieval stack as a toolbox, not a pipeline. The agent must have access to grep, to vector search and, if relevant, to SQL queries, and it must decide which one to use. This is the approach described in our guide to configuring OpenClaw: SOUL, AGENTS and Skills, where the modular architecture lets the agent choose its tools dynamically.

For teams that want to keep their data local, open-source AI agents with Ollama offer a setup where the shell and system tools are directly accessible to the agent.


The implications for enterprise information retrieval

The paper has implications that go beyond the scope of CLI agents. In enterprises, millions are invested in vector databases for RAG. This paper suggests that a portion of these investments is misallocated.

Buzz Grewal reports that the Amazon Science paper measures agentic keyword search at 94.5% of RAG's fidelity. In other words, an agent running its own keyword searches comes within about 5.5% of a full RAG pipeline's fidelity. It does so without the cost of the vector database or of continuous embedding, and without the maintenance complexity.

The remaining gap points to queries where semantic search is genuinely superior: abstract concepts, distant paraphrases, and connections between themes that are never stated explicitly. This is where the vector database remains useful, as a fallback.

For enterprise teams, the lesson is pragmatic. Start with an agent equipped with grep and keyword search. Add the vector database only for the cases where grep is not enough. You will save on infrastructure and maintenance, and probably improve overall performance, because a simpler pipeline has fewer points of failure.

For cases where an autonomous agent must conduct a multi-step in-depth search, check out our guide to the best AI for search.


The limitations of the paper and what it doesn't prove

The paper has limitations that need to be acknowledged. The evaluation covers 116 questions from LongMemEval, a specific corpus with specific types of queries. Generalization to other domains is not demonstrated.

Furthermore, the results depend heavily on the model used. A frontier model like GPT-5.5 (98.2 in agentic) or Claude Opus 4.7 (94.3) knows how to leverage the shell in a sophisticated way. A less capable model might very well achieve better results with a RAG pipeline that guides its search, rather than with an open shell where it gets lost.

The paper also doesn't prove that grep is superior to vector search in absolute terms. It proves that, in the specific context of a CLI agent with a well-designed harness, the simplest retrieval tool (grep) can be the most effective when the agent controls the invocation. This is subtle but important.

Finally, the inline grep mode assumes that the agent has full shell access, which raises security questions in production. Giving bash access to an agent is not without risk, and enterprise security constraints could make the programmatic mode inevitable — along with the 38-point drop that comes with it. This is a trade-off that the paper does not explore.

To explore the most advanced agents that know how to manage these trade-offs, see our comparison of the best autonomous AI agents.


❌ Common mistakes

Mistake 1: Confusing "grep is better than vector search" with "the agent harness is more important than the algorithm"

The paper does not say that grep beats vector search. It says that the way the agent invokes the tool (inline vs programmatic) has more impact than the choice of algorithm. Replacing your vector database with grep without changing your agent architecture will bring you nothing.

Mistake 2: Removing your vector database after reading this article

The vector database goes from default to fallback; it does not disappear. For queries that genuinely require semantic search, it remains the right tool. The mistake is swinging from one dogma to another.

Mistake 3: Giving shell access without guardrails

The paper measures performance in inline mode, but unsandboxed shell access is a major security risk in production. The 93% results are those of a controlled benchmark environment, not a production system.
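Here is one possible guardrail, sketched in Python under stated assumptions:

- The agent's proposed command is parsed with `shlex`.
- Only a hypothetical allowlist of read-only programs is accepted.
- Paths that leave the workspace are rejected.
- The command runs without a shell, with a timeout and capped output.

The allowlist and checks are illustrative and deliberately coarse. For example, a search pattern starting with `/` is rejected too. They complement OS-level isolation, such as a container with a read-only mount, rather than replacing it.

```python
import shlex
import subprocess

# Hypothetical policy: read-only programs only, no shell operators, no paths outside the workspace.
ALLOWED_PROGRAMS = {"grep", "head", "wc", "ls"}
SHELL_OPERATORS = {"|", "||", "&", "&&", ";", ">", ">>", "<"}


def validate(command: str) -> list[str]:
    """Parse an agent-proposed command line and reject anything outside the policy."""
    argv = shlex.split(command)  # raises ValueError on unbalanced quotes
    if not argv:
        raise ValueError("empty command")
    if argv[0] not in ALLOWED_PROGRAMS:
        raise ValueError(f"program not allowed: {argv[0]}")
    for arg in argv[1:]:
        if arg in SHELL_OPERATORS:
            raise ValueError(f"shell operators are not supported: {arg}")
        if arg.startswith("/") or arg == ".." or "../" in arg:
            raise ValueError(f"path outside the workspace: {arg}")
    return argv


def run_readonly(command: str, workspace: str, timeout: float = 10.0, max_chars: int = 20_000) -> str:
    """Run a validated command with no shell, inside `workspace`, with a timeout and capped output."""
    argv = validate(command)
    result = subprocess.run(
        argv, cwd=workspace, capture_output=True, text=True, timeout=timeout, check=False
    )
    # grep exits with status 1 when nothing matches, so a non-zero code is not treated as an error here.
    return result.stdout[:max_chars]


print(validate('grep -rn "cache" --include=*.md .'))  # ['grep', '-rn', 'cache', '--include=*.md', '.']
for bad in ["rm -rf notes", "grep secret /etc/passwd", "grep key ../other", "grep x . | sh"]:
    try:
        validate(bad)
    except ValueError as err:
        print(f"rejected: {err}")
```

Because `subprocess.run` receives a list and no shell, characters such as `|` or `;` would reach grep as literal arguments anyway. Rejecting them up front gives the agent a clear error to react to.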

Mistake 4: Ignoring the correlation with the model's agentic score

A model with a low agentic score will not know how to exploit a shell, even if you give it inline access. The harness does not compensate for the model's shortcomings. Choose the right model first, then optimize the harness.


❓ Frequently Asked Questions

Does agentic search completely replace RAG?

No. Agentic search redefines the role of RAG: vector search becomes a fallback tool among others in the agent's toolkit, not the default pipeline. Hybrid architectures are the recommended approach in 2026.

Why is inline grep so superior to programmatic mode?

Because the shell gives the agent the ability to iterate, combine Unix commands, and adjust its patterns in real time. Programmatic mode freezes the invocation in an API schema that prevents this fine-grained iteration.

Which models are best at agentic search?

The models with the highest agentic scores: GPT-5.5 (98.2), Gemini 3 Pro Deep Think (95.4), Claude Opus 4.7 Adaptive (94.3). Their planning and iteration capability is directly correlated with their performance in agentic search.

Is this applicable outside of CLI agents?

The principle applies beyond the CLI: any agent that can dynamically build its retrieval query (dynamic SQL, Elasticsearch queries built on the fly) will benefit from the same effect. The shell is simply the purest textbook case.
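The same effect can be sketched with SQL. In this hypothetical example, the agent proposes filters as a dictionary and the harness turns them into a parameterized query. Column names are checked against an allowlist, so the agent controls the search without writing raw SQL. The table, columns and data are invented. `sqlite3` is Python's standard-library SQLite driver.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: the columns the agent is allowed to filter on.
ALLOWED_COLUMNS = {"author", "year", "topic"}


def build_query(filters: dict, keyword: Optional[str] = None) -> tuple[str, list]:
    """Turn agent-proposed filters into a parameterized SELECT.
    Column names are checked against an allowlist; values only travel as parameters."""
    clauses, params = [], []
    for column, value in filters.items():
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {column}")
        clauses.append(f"{column} = ?")
        params.append(value)
    if keyword:
        clauses.append("body LIKE ?")
        params.append(f"%{keyword}%")
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT id, title FROM notes WHERE {where} ORDER BY id", params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notes (id INTEGER PRIMARY KEY, title TEXT, author TEXT, year INTEGER, topic TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO notes VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Cache design", "alice", 2025, "infra", "Session cache with a 10-minute TTL"),
        (2, "Hiring plan", "bob", 2025, "team", "Two backend roles"),
        (3, "Cache incident", "alice", 2026, "infra", "Stale cache served after a deploy"),
    ],
)

# First attempt: a broad filter chosen by the agent.
sql, params = build_query({"topic": "infra"})
print(conn.execute(sql, params).fetchall())  # [(1, 'Cache design'), (3, 'Cache incident')]

# Second attempt, after reading the first results: narrower filters plus a keyword.
sql, params = build_query({"topic": "infra", "year": 2026}, keyword="stale")
print(conn.execute(sql, params).fetchall())  # [(3, 'Cache incident')]
```

The second query exists only because the agent saw the first result set. That is the SQL equivalent of refining a grep pattern in the shell.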

Should we invest in vector databases in 2026?

Yes, but as a fallback, not as a default. Invest first in the agent harness and your agent's ability to choose its retrieval tools. Add the vector database for cases where other tools are not enough.


✅ Conclusion

The paper "Is Grep All You Need?" doesn't kill RAG; it dethrones it. The lesson is not that grep is magical, but that the agent harness architecture has more impact on retrieval quality than the choice of algorithm. The 38-point gap between inline and programmatic mode, with the same tool and the same data, makes that point forcefully. To build high-performing search agents, start with the harness, not the vector database.