Poolside Laguna M.1: the 225B open-source model for the coding agent, Apache 2.0

LLM & Modèles 🟢 Beginner ⏱️ 12 min read 📅 2026-06-27

Poolside Laguna M.1: the 225B open-source model for the coding agent, Apache 2.0

🔎 The coding agent market just shifted

In October 2025, Poolside raises $2 billion at a $12 billion valuation, backed by Nvidia, eBay, and Bain Capital Ventures. Less than six months later, the French startup founded in 2023 delivers its first true flagship model: Laguna M.1.

The timing is not coincidental. The AI coding tools market is in full consolidation — SpaceX recently invested in Anysphere at a $60 billion valuation. In this context, companies are looking for alternatives to the proprietary APIs of OpenAI and Anthropic for reasons of cost, confidentiality, and sovereignty.

Laguna M.1 answers exactly this demand: 225 billion parameters, 23 billion active per token, Apache 2.0 license, and benchmarks that place it on par with the best proprietary models in software engineering. It is the most serious open-source model for agentic coding to date.

The essentials

Laguna M.1 is a sparse mixture-of-experts (MoE) model with 225.8B total parameters with only 23B activated per token, making it executable on well-equipped workstations.
It reaches 72.5% on SWE-bench Verified, 67.3% on SWE-bench Multilingual and 46.9% on SWE-bench Pro, scores that rival Claude Opus 4.7 and GPT-5.4 Pro in coding.
Available under the Apache 2.0 license on Hugging Face, it can be self-hosted without any commercial restrictions — a first for a model of this scale in coding.
Its access is free via OpenRouter at $0.00/1M input tokens, allowing it to be tested immediately without infrastructure.

Recommended tools

Tool	Main usage	Price (June 2026, check on openrouter.ai)	Ideal for
Laguna M.1 (free)	Agentic coding, SWE-bench	0.00$/1M tokens (input)	Immediate testing, prototyping
Laguna M.1 sur Hugging Face	Full self-hosting	Free (Apache 2.0)	Companies with GPUs, sovereignty
Ollama	Local LLM execution	Free	Individual developers, high-end Mac

Architecture: why sparse MoE changes the game for self-hosting

Laguna M.1 uses a sparse mixture-of-experts architecture with 67 layers, 256 experts, and top-k=16 routing. Concretely, for each token processed, only a subset of 16 experts out of the 256 is activated.

This means that out of the model's 225.8 billion parameters, only 23 billion are effectively computed at each pass. The rest of the weights remain inactive, which drastically reduces the computational load compared to an equivalent dense model.

A dense 225B model would require multi-node GPU clusters. Laguna M.1, thanks to its sparsity, can run on a workstation with enough VRAM to load the full weights, while consuming the computing power of a 23B model. It's an ideal compromise for enterprise self-hosting de LLM.

The 256K token context window allows you to work on entire codebases, complex pull request diffs, or extended debugging sessions — exactly what a coding agent needs to operate over a "long horizon," according to Poolside's terminology.

Benchmarks: Laguna M.1 vs. proprietary giants

The benchmarks were run using Poolside's agent harness pool, which allows up to 500 execution steps in a sandbox. This protocol is more representative than the classic single-turn evaluation because it measures a model's actual ability to solve software engineering tasks end-to-end.

Benchmark	Laguna M.1	Claude Opus 4.7	GPT-5.4 Pro	GLM-5 (Reasoning)
SWE-bench Verified	72,5%	~74% (estimated)	~73% (estimated)	~65% (estimated)
SWE-bench Multilingual	67,3%	N/A	N/A	N/A
SWE-bench Pro	46,9%	~48% (estimated)	~47% (estimated)	~38% (estimated)
Terminal-Bench 2.0	40,7%	N/A	N/A	N/A

Laguna M.1's scores on SWE-bench Verified place it in the same circle as the meilleurs LLM pour coder on the market. The mention of SWE-bench Multilingual (67.3%) is particularly notable: it is a benchmark that very few models can even attempt, and Laguna M.1 excels at it.

The score of 40.7% on Terminal-Bench 2.0 confirms the model's agentic vocation. This benchmark measures the ability to execute commands, navigate a filesystem, and solve problems in a terminal environment — exactly the workflow of a coding agent IA.

These results are all the more impressive given that Laguna M.1 is open-weights. Historically, the gap between open and proprietary models in coding was 10 to 20 points. Laguna M.1 reduces this to nearly zero.

Competitive landscape: GLM-5.2, Qwen3.6 and the new open weights wave

Laguna M.1 is not arriving in a vacuum. The year 2026 saw a spectacular acceleration of open weights models geared towards coding, notably with Z.AI's GLM-5.2 and the Qwen3.6 family.

Z.AI's GLM-5 (Reasoning), available for self-hosting, achieves an agentic score of 82 on benchmark leaderboards. It is a generalist model strong in reasoning, but it is not specifically optimized for agentic software engineering. Its dense architecture also makes it more expensive to host than Laguna M.1's MoE at an equivalent parameter count.

The Qwen3.6 family, such as the Qwen3 Coder Next, is positioned in the lightweight and mobile coding niche, capable of running on 64GB Macs. This is complementary to Laguna M.1: Qwen targets the individual developer, Laguna targets enterprise infrastructure.

The Kimi K2.7-Code with its 1 trillion parameters represents another approach — scalar brute force. But its infrastructure cost reserves it for massive cloud deployments, not for self-hosting.

Laguna M.1 stands out with its precise positioning: the power of a 225B model with the inference costs of a 23B, all under Apache 2.0. This is the sweet spot that engineering teams are looking for.

Self-hosting: what resources are needed to run Laguna M.1

This is the question every CTO asks. Laguna M.1 weighs 225B in total parameters, which requires a minimum amount of VRAM for full loading.

In practice, with INT4 quantization (which preserves almost all coding performance), the model requires about 120-130 GB of VRAM. This corresponds to:

2x Nvidia H100 80GB GPUs in an NVLink configuration
1x Nvidia H200 141GB GPU (ideal configuration)
4x RTX 4090 24GB GPUs in a multi-GPU configuration (budget solution)

Discussions on r/LocalLLaMA confirm that the community has already managed to run Laguna M.1 in a consumer multi-GPU configuration. The MoE architecture with its top-k=16 routing facilitates the sharding of experts across multiple GPUs.

For companies that do not have this infrastructure on-site, Baseten published a use case detailing how they enabled Poolside to deploy Laguna M.1 in record time on their cloud infrastructure. This is a middle-ground option between pure self-hosting and a proprietary API.

For developers who want to test the model without investing in hardware, the free OpenRouter option remains the easiest path. And for lighter models suited to individual workstations, the local LLM installation guide with Ollama or LM Studio is still relevant.

Integration into a coding agent: how to leverage Laguna M.1

A model, even an excellent one, does not make an agent. The true value of Laguna M.1 is revealed when it is integrated into an agentic pipeline with tool access — reading files, executing commands, searching through code, editing files.

Poolside actually designed its own "pool agent harness" to evaluate Laguna M.1, with up to 500 execution steps in a sandbox. This is the framework that produces the benchmark SWE-bench scores. The company does not release this harness as open source, but the principle is reproducible.

Projects like OpenCode, with its 172,000 GitHub stars, demonstrate that the open source community is already building the agentic layers around LLMs. Replacing OpenCode's backend model with Laguna M.1 is technically feasible and could give birth to a fully open source coding agent, from the model to the orchestration.

The score of 40.7% on Terminal-Bench 2.0 is crucial here: it proves that Laguna M.1 is not just a simple code generator, but a model capable of navigating an environment, executing commands, and iterating on its own errors. This is what makes it relevant for the best LLMs for AI agents.

Companies that want to create a proprietary AI coding agent can take Laguna M.1 as a base and add their own tools, their guardrails, and their CI/CD integrations. The Apache 2.0 license allows this without any restrictions.

Poolside : from French startup to $12 billion challenger

Poolside's journey is worth recalling. Founded in 2023, the startup raised $500 million in a Series B in 2024, then an additional $2 billion in October 2025, bringing its valuation to $12 billion.

Among the investors: Nvidia, which sees Poolside as a major use case for its GPUs, and eBay, which potentially uses these models internally. Bain Capital Ventures is participating in the round with over $1 billion already committed, including $700 million from existing investors.

This multi-billion dollar war chest gives Poolside the resources to continue training increasingly powerful models. Laguna M.1 is the first model that justifies this valuation. If the benchmarks are confirmed in production, Poolside becomes the natural candidate for any company looking for a credible alternative to OpenAI and Anthropic in coding.

The fact that the model is released in open weights under Apache 2.0 is a strong strategic signal. Poolside does not monetize the model itself, but the infrastructure around it — managed API, enterprise tools, integrations. It is the same business model as Meta with Llama, but applied to the vertical coding sector.

The AI coding market in June 2026: consolidation and crazy prices

The release context of Laguna M.1 is essential to understanding its importance. The AI coding tool market is in a full speculative bubble.

Anysphere, the startup behind Cursor, reached a valuation of 60 billion dollars thanks to an investment from SpaceX. This valuation relies on the promise that AI coding will replace a massive share of developers' work. But Anysphere depends entirely on the proprietary APIs of Anthropic and OpenAI.

This is precisely the vulnerability that Laguna M.1 exploits. A company that builds its coding infrastructure on Claude or GPT is tied by the contract, pricing, and terms of service changes of its provider. With Laguna M.1 in self-host, this dependency disappears.

The fact that OpenRouter offers Laguna M.1 for free on input is also a market signal. Poolside is subsidizing access to accelerate adoption, the same strategy DeepSeek or Groq used before it. For developers comparing the best free LLMs, Laguna M.1 immediately becomes an option to test.

❌ Common mistakes

Mistake 1: Confusing total parameters and active parameters

Laguna M.1 has 225B total parameters but only activates 23B per token. Comparing its compute consumption to a 225B dense model is a fundamental mistake. In terms of FLOPs per token, it is much closer to a 23B model than a 225B model.

Mistake 2: Underestimating VRAM requirements for self-hosting

Yes, only 23B is active per token. But the full 225B of weights still need to be loaded into memory. In INT4, this represents ~125 GB of VRAM. A single consumer GPU is not enough. You either need an H200 or a multi-GPU setup with NVLink.

Mistake 3: Evaluating Laguna M.1 in single-turn

The reference benchmarks (SWE-bench Verified at 72.5%) were achieved with Poolside's agent harness pool, up to 500 steps in a sandbox. Evaluating this model by generating a single block of code and comparing it to the gold standard will not do justice to its agentic capabilities.

Mistake 4: Ignoring the Apache 2.0 license

Apache 2.0 is not the same thing as "royalty-free open source". The license requires retaining the copyright and license notice, and includes a patent license grant. For enterprise use, this is perfect. For forking without attribution, it is prohibited.

❓ Frequently Asked Questions

Can Laguna M.1 truly replace Claude or GPT for coding?

In agentic benchmarks (SWE-bench Verified), the gap is around 1 to 3 points. For real-world software engineering tasks involving debugging, refactoring, and navigating a codebase, yes, Laguna M.1 is a credible self-hosted alternative.

What is the difference between Laguna XS.2 and Laguna M.1?

Laguna XS.2 is the lightweight model in the lineup, designed to run on a single GPU. Laguna M.1 is the flagship 225B model, optimized for complex, long-horizon agentic tasks. XS.2 for the fast, M.1 for the complex.

Is the model really free on OpenRouter?

Yes, as of April 28, 2026, Laguna M.1 is listed at $0.00/1M input tokens on OpenRouter. This is an acquisition strategy. Output prices and long-term conditions may change — check on openrouter.ai.

Can Laguna M.1 be used with existing agents like OpenCode?

Technically yes, if the agent supports an OpenAI-compatible endpoint. OpenCode and other open-source agentic frameworks can be configured to point to a local Laguna M.1 instance or via API. The best LLMs for research and coding often share the same integration formats.

How does Laguna M.1 compare to other LLMs for code in French?

For code, the language of the prompt matters little — code is universal. But for explanations and documentation, Laguna M.1 was not specifically trained for French. For strict bilingual use, the best LLMs in French remain better suited for the textual part.

✅ Conclusion

Laguna M.1 is the first open-weights model that makes self-hosting an enterprise-level coding agent technically and economically realistic. With 72.5% on SWE-bench Verified under Apache 2.0, Poolside has created the alternative that engineering teams have been waiting for to proprietary APIs. Try it for free on OpenRouter before investing in self-hosting hardware.

#coding-agent #ia-open-source #poolside-laguna-m.1 #modèle-225b #licence-apache-2.0

📚 Related articles

LLM & Modèles 🟢 Débutant 15 min

FrontierCode: Cognition's benchmark that buries SWE-Bench and ranks code agents by the real quality of pull requests — Fable 5 at 46.3%, Opus 4.8 at 34.3%, GPT-5.5 at 25.5%

Discover FrontierCode, Cognition's new benchmark replacing SWE-Bench by evaluating the real quality of code agents' pull requests.

2026-06-26 17:03

LLM & Modèles 🟢 Débutant 15 min

DeepSWE: the benchmark proving that code agents were cheating — Artificial Analysis buries SWE-Bench

Discover DeepSWE, the new benchmark replacing SWE-Bench, proving code agents were cheating. Analysis of the rankings upended by Artificial Anal

2026-06-22 16:02

LLM & Modèles 🟢 Débutant 16 min

Gemini 3.5 Pro: countdown — 10 days before Google's deadline, 2 million tokens and Deep Think mode, the most anticipated model of the year (amidst a talent chaos)

Gemini 3.5 Pro: 10 days before Google's deadline, discover the rumors about its 2 million tokens and Deep Think mode amid a talent chaos.

2026-06-20 17:05

📑 Table of contents