GLM-5.2 : the world's most powerful open weights model — 753B MoE, 1M context, MIT license, the LLM landscape shifts
🔎 The day China made 753 billion parameters royalty-free
On June 13, 2026, Z.ai (formerly Zhipu AI) releases GLM-5.2. A 753-billion-parameter model in a Mixture of Experts architecture, with only 40 billion active per token. MIT license. 1 million tokens of context. Available in FP8 weights on HuggingFace.
The timing is anything but coincidental. The day before, the US Department of Commerce blocks the export of Anthropic's Claude Fable 5 and Mythos 5 to several jurisdictions, further tightening export controls. Z.ai does not officially comment on this timing, but the effect is mathematical: every US restriction creates a void that Chinese open weights models fill.
Simon Willison summarizes the situation on his blog: "probably the most powerful text-only open weights LLM". The Artificial Analysis Intelligence Index now ranks it first among all open weights models, ahead of DeepSeek V4-Pro, Qwen3-Coder-480B and Llama 4 Maverick. The landscape has just shifted.
The essentials
- GLM-5.2 takes the first place open weights spot on the Artificial Analysis Intelligence Index, dethroning DeepSeek V4-Pro.
- 753B MoE parameters, 40B active per token, 1M context — 5x more than GLM-5.1.
- MIT license: no commercial restrictions, no reserved usage, including for US companies.
- API pricing: $1.40 (input) / $4.40 (output) per million tokens on OpenRouter (June 2026, check on openrouter.ai).
- Terminal-Bench 2.1: jumps from 62 (GLM-5.1) to 81.0, a 30% leap reminiscent of DeepSeek's generational jumps.
- FP8 weights available on HuggingFace, making self-hosting feasible on high-end consumer hardware.
Recommended tools
| Tool | Main usage | Price (June 2026, check on site) | Best for |
|---|---|---|---|
| GLM-5.2 sur OpenRouter | API access | $1.40/$4.40 per M tokens | Quick integration, prototyping |
| GLM-5.2 sur HuggingFace | Self-hosting (FP8) | Free | Local deployment, research |
| Artificial Analysis | Comparative benchmarking | Free | Objective inter-model comparisons |
| WaveSpeed API | Optimized alternative API | Variable | Reduced latency, production |
The numbers that matter — GLM-5.2 vs. the competition
GLM-5.2 doesn't just take the lead. It widens the gap on several key metrics. Its Mixture of Experts architecture with 40B active parameters makes it both powerful and relatively efficient during inference compared to an equivalent dense model.
| Model | Parameters | Architecture | Context | License | AA Index Score |
|---|---|---|---|---|---|
| GLM-5.2 | 753B (40B active) | MoE | 1M tokens | MIT | 1st open weights |
| DeepSeek V4 Pro (Max) | 685B (37B active) | MoE | 256K tokens | MIT | 2nd open weights |
| Qwen3-Coder-480B | 480B (32B active) | MoE | 256K tokens | Apache 2.0 | Top 5 open weights |
| Llama 4 Maverick | 400B (17B active) | MoE | 1M tokens | Llama License | Top 10 open weights |
The difference in context is the most striking point. GLM-5.2 quadruples the context window of DeepSeek V4 Pro and matches Llama 4 Maverick in this regard. But where Llama 4 imposes commercial restrictions via its proprietary license, GLM-5.2 is MIT. No revenue cap, no exclusion clause. This is a structural point, not a minor detail.
The Terminal-Bench 2.1 score deserves a closer look. This benchmark measures a model's ability to execute real-world tasks on the command line: file navigation, code editing, debugging, tool chaining. GLM-5.1 capped at 62. GLM-5.2 reaches 81.0. For an open weights model, this is unprecedented and places it in the zone of high-end proprietary models.
Architecture: why 753B MoE with 40B active changes the game
The Mixture of Experts architecture is not new. DeepSeek popularized it, Qwen adopted it, and Meta did too. But GLM-5.2 pushes the total/active ratio to an extreme level: 18.8 to 1. For each token generated, only 5.3% of the parameters are activated.
This ratio has concrete consequences on inference cost. A dense 753B model would be prohibitively expensive to run. In MoE with 40B active, the compute cost per token comes close to that of a dense 40B model — while benefiting from the reasoning capabilities of a near-terminal model. It is this tension between total size and activation efficiency that makes the model viable in production.
Z.AI's research on GLM models, documented from GLM-130B up to GLM-4 All Tools, shows a consistent trajectory. Each generation has expanded the context, refined the MoE architecture, and improved multilingualism. GLM-5.2 is not a one-shot. It is the result of three years of iterations on the same architectural family.
The 1M token window, in particular, results from work on attention mechanisms. Z.AI didn't simply extend RoPE (Rotary Position Embedding) — they introduced specific modifications to maintain semantic coherence over very long sequences. Needle in a Haystack benchmarks at 1M tokens confirm a recall rate above 99.2%, which is in the same league as Gemini 3.1 Pro on the proprietary side.
The MIT license: the invisible geopolitical weapon
Licenses don't inspire dreams. But in the LLM war, they have become a battlefield. The MIT license of GLM-5.2 is not a symbolic gesture — it's an ecosystem strategy.
Let's step back. Meta's Llama license imposes a cap of 700 million monthly users beyond which a separate commercial license is required. DeepSeek V4 Pro's license is MIT, which had already caused a sensation. GLM-5.2 follows the same path but with a significantly more powerful model.
Direct consequence: any startup, any research lab, any company — including American ones — can download the weights, modify them, commercialize them, without paying a single cent in royalties. The US Bureau of Industry and Security (BIS) can block the export of Claude Fable 5. It cannot block the download of an MIT-licensed model from HuggingFace.
This is precisely the paradox that Simon Willison identifies in his June 17 analysis: US export controls create an artificial demand signal for Chinese open weights alternatives. Every block strengthens the position of models like GLM-5.2. The restriction becomes a driver of adoption.
For self-hosting, the MIT changes everything. Companies that hesitated on DeepSeek V4 Pro for regulatory compliance reasons see in GLM-5.2 an alternative at least as powerful, with a 4x larger context, and the same level of legal freedom. If you are looking to install an LLM locally, GLM-5.2 in FP8 becomes a serious candidate for servers equipped with 2-4 80GB+ NVIDIA GPUs.
Benchmarks: where GLM-5.2 wins, where it doesn't win
The Artificial Analysis Intelligence Index aggregates several benchmarks. GLM-5.2 takes the top spot for open weights there, but you need to dig deeper to understand what this actually means.
Where it dominates:
Terminal-Bench 2.1 (81.0) is its most striking strong point. Its ability to chain commands, read files, and correct errors in a loop makes it a natural candidate for agents IA. On multistep coding tasks, it rivals Claude Opus 4.7 and GPT-5.5, which is remarkable for an open weights model.
Multilingualism is another advantage. Z.AI has always invested in French, unlike some Chinese competitors who only optimize for English and Chinese. GLM-5.2 naturally positions itself among the meilleurs LLM en français, with translation and generation performance that approaches that of Claude Sonnet 4.6 on non-technical texts.
Where it doesn't win:
Pure mathematical reasoning remains the domain of Gemini 3 Pro Deep Think and GPT-5.5. On MATH-500 and formal proof benchmarks, GLM-5.2 is good but not at the level of dedicated reasoning models. The same observation applies to multimodal tasks — GLM-5.2 is text-only, which automatically puts it out of the running on vision benchmarks.
The Artificial Analysis agentic ranking is revealing. GLM-5.1 score 83 points in the general category, but GLM-5.2 does not yet appear in the agentic top (which remains dominated by GPT-5.5 at 98.2 and Gemini 3 Pro Deep Think at 95.4). The leap in Terminal-Bench suggests that GLM-5.2's agentic score could be revised upwards, but the official data have not yet been published.
Pricing API: the price war intensifies
API pricing is the area where the impact of GLM-5.2 is felt immediately. WaveSpeed published a detailed analysis of the pricing on the very day of release.
| Model | Input / M tokens | Output / M tokens | Quality/price ratio |
|---|---|---|---|
| GLM-5.2 | $1.40 | $4.40 | Excellent |
| DeepSeek V4 Pro | $1.10 | $3.80 | Very good |
| GPT-5.5 | $12.00 | $48.00 | Medium |
| Claude Opus 4.7 | $15.00 | $75.00 | Low |
| Gemini 3.1 Pro | $7.00 | $21.00 | Fair |
GLM-5.2 is about 8.5x cheaper than GPT-5.5 for input and almost 11x cheaper for output. For companies processing large volumes — ingestion of long documents, log analysis, RAG on extensive corpora — the difference amounts to thousands of dollars per month.
The 1M token context makes the comparison even more favorable. With GPT-5.5, sending 1M tokens in input costs $12,000. With GLM-5.2, it's $1,400. For RAG on complete document databases, this is an economic paradigm shift.
If you compare the meilleurs LLM gratuits, GLM-5.2 obviously isn't directly on the list. But the pressure it exerts on pricing benefits the entire ecosystem, including Gemini's freemium offerings and Groq's free quotas.
Impact on self-hosting: FP8 makes the impossible conceivable
753 billion parameters, even in MoE, is massive. In FP16, the weights alone would take up about 1.4 TB. In FP8, the format in which Z.AI publishes the weights on HuggingFace, it drops to about 750 GB. That's still considerable, but it's within the realm of feasibility.
Estimated minimum hardware configuration for self-hosting in FP8:
- 2x NVIDIA H100 80GB: possible but tight, requires aggressive additional quantization.
- 4x NVIDIA A100 80GB: realistic configuration for batch processing.
- 8x NVIDIA L40S 48GB: more accessible alternative, total ~384 GB VRAM, requires offloading.
This is far from a consumer setup. But for labs, mid-sized companies, and research communities, it's a crossable threshold. A year ago, no model of this class was available for self-hosting, regardless of the configuration. Today, les meilleurs LLM locaux include models that would have been considered impossible to run outside the cloud.
The tool ecosystem is adapting rapidly. Ollama, vLLM, SGLang have all added support for GLM-5.2 within 48 hours of its release. agents IA open source avec Ollama can now rely on a frontier model without going through a proprietary API.
Geopolitics of AI: when export controls produce the opposite effect
The blockade of Claude Fable 5 and Mythos 5 on June 12, 2026, was supposed to protect the American advantage. The effect was exactly the opposite. GLM-5.2 is released the next day, and the narrative takes hold: the United States restricts, China opens up.
This isn't just storytelling. Data from llm-stats.com shows a clear acceleration: in 18 months, Chinese open weights models (DeepSeek, Qwen, GLM, Kimi) have closed the performance gap with American proprietary models. On certain benchmarks — Terminal-Bench, multilingual coding, context window — they even surpass them.
The American strategy relies on a postulate: restricting access to the best proprietary models prevents adversarial actors from accessing the frontier. This postulate ignores the fact that the frontier is also shifting to the open weights side. When GLM-5.2 is freely downloadable under an MIT license, the export control on Claude Fable 5 becomes symbolic.
This is a structural change. The open source LLM war is no longer just a question of technology. It is a question of business model and geopolitics. Chinese open weights models are becoming a tool of soft power: they demonstrate that China can produce frontier AI while making it accessible to the entire world.
GLM-5.2 in the overall ranking: where does it really stand?
It is crucial to distinguish "best open weights" from "best model overall". GLM-5.2 ranks first among open weights, but the comparison of the best LLMs still places Gemini 3.1 Pro (92 points) and GPT-5.5 (91 points) above everything else.
GLM-5.2 is likely in the 85-88 point zone of the overall Artificial Analysis ranking, which would put it on par with DeepSeek V4 Pro (Max) at 88 and Claude Opus 4.6 at 87. Impressive for a downloadable model. But not yet at the level of the most optimized proprietary models.
The important distinction lies in the performance/accessibility trade-off. Gemini 3.1 Pro is more performant, but you cannot download it. GLM-5.2 is slightly less performant, but you can modify it, fine-tune it, and deploy it wherever you want. For many users, especially in research and the best LLMs for research, this trade-off is more relevant than the raw score.
Specifically on the agentic front, if you are looking for the best LLMs for AI agents, GLM-5.2 is a serious contender thanks to its Terminal-Bench score. But GPT-5.5 (98.2) and Gemini 3 Pro Deep Think (95.4) remain the benchmarks for critical agentic workflows in production.
Comparison with DeepSeek V4 Pro: the changing of the guard
DeepSeek V4 Pro marked a turning point with its MIT license and frontier performance. GLM-5.2 doesn't crush it — it marginally outperforms it, but sufficiently to take the top spot.
The key differences:
Context: 1M vs 256K. This is the most discriminating factor. For ingesting entire books, complete codebases, or legal corpora, GLM-5.2 has no open weights equivalent.
Coding: DeepSeek V4 Pro remains slightly better on pure code benchmarks (SWE-bench, HumanEval+). But GLM-5.2 compensates with Terminal-Bench, which measures coding in real-world situations, with tools and environment.
Price: DeepSeek remains cheaper (1.10$/3.80$ vs 1.40$/4.40$). The gap is modest but exists.
Multilingualism: GLM-5.2 is noticeably better in French, German, Spanish, and Japanese. DeepSeek remains optimal for English and Chinese.
In practice, the choice between the two depends on your use case. If you are doing code review in English, DeepSeek V4 Pro remains the rational choice. If you are doing multilingual RAG on long documents, GLM-5.2 becomes the obvious one.
MiniMax M3 and Qwen3-Coder: the other contenders
The open weights landscape is not limited to a GLM vs DeepSeek duel. MiniMax M3 with its MSA (Mixture of Sparse Attention) architecture and 1M context is a direct competitor to GLM-5.2 on paper. But in practice, MiniMax M3 suffers from lower adoption and a less mature tool ecosystem.
Qwen3-Coder-480B, for its part, specifically targets coding. With 480B parameters and a code-first optimization, it remains the benchmark for pure code generation tasks in open weights. GLM-5.2 is more versatile but less specialized.
The article on VaultGemma also reminds us that there are other approaches to open weights — Google DeepMind's differential privacy being radically different from Z.AI's strategy. Each player performs a distinct part in the same ecosystem.
What GLM-5.2 means for developers
For a developer choosing a model today, GLM-5.2 adds an option that didn't exist yesterday. A frontier model, open weights, MIT, with a 1M context. The practical implications are numerous.
In RAG (Retrieval-Augmented Generation), the 1M token window changes the game. You no longer need aggressive chunking, complex reranking, or multi-step retrieval pipelines. You can literally push a 750,000-word document into the context and ask questions about it. The quality of the answers improves mechanically because the model has access to the entire context, not just fragments.
For the meilleurs LLM pour coder, GLM-5.2 is not the best pure coding model. But its Terminal-Bench score makes it an excellent choice for agent-assisted development workflows — where the model has to navigate a codebase, read files, run tests, and iterate. This is different from snippet generation, and it is precisely where GLM-5.2 shines.
In fine-tuning, the MIT license opens all doors. You can fine-tune GLM-5.2 on your business data, deploy it in production, and owe nothing to anyone. No proprietary model offers this combination of base power and freedom of modification.
❌ Common mistakes
Mistake 1: Confusing "open weights" and "open source"
GLM-5.2 is open weights: you download the parameters, not the training code, nor the data, nor the complete pipeline. This is already huge, but it is not open source in the strict sense. Z.AI does not publish the training data or the complete details of the pretraining pipeline. The distinction is not academic — it has practical implications for reproducibility and auditing.
Mistake 2: Thinking that 753B in MoE is equivalent to 753B dense
A 753B MoE model with 40B active does not have the computational capacity of a 753B dense model. It has the representation capacity of a very large model (the experts cover a vast knowledge space) but the compute capacity per token of a 40B model. It is an advantage in inference, not a magical superpower. On tasks that require simultaneously activating a lot of knowledge, the MoE model shows its limits compared to an equivalent dense model.
Mistake 3: Ignoring the hardware constraints of self-hosting
"It is open weights, so I'll run it on my Mac" — no. Even in FP8, GLM-5.2 requires a minimum of 750 GB of VRAM for a comfortable deployment without massive offloading. This is server hardware, not consumer hardware. Enthusiastic announcements on social media often omit this reality. Check your resources before downloading.
Mistake 4: Using GLM-5.2 for multimodal
GLM-5.2 is text-only. It does not process images, videos, or audio. If your use case requires vision-language, turn to Gemini 3.1 Pro or GPT-5.5. Forcing a multimodal pipeline around a text-only model adds unnecessary complexity and points of failure.
❓ Frequently Asked Questions
Is GLM-5.2 really better than DeepSeek V4 Pro?
In the Artificial Analysis composite score, yes, marginally. In practice, it depends on the use case: GLM-5.2 wins on context (1M vs 256K) and multilingualism, DeepSeek remains better on pure code and pricing.
Does the MIT license allow unlimited commercial use?
Yes. The MIT license does not impose any commercial restrictions, no revenue caps, nor a mandatory attribution clause (although it is courteous). You can integrate GLM-5.2 into a SaaS product and monetize it directly.
What is the minimum hardware for self-hosting?
In FP8, expect a minimum of 750 GB of VRAM. Realistic configuration: 4x NVIDIA A100 80GB or 8x L40S 48GB with partial offloading. Below that, generation times become prohibitive.
Is GLM-5.2 available in a smaller version?
Not at the time of release (June 13, 2026). Z.AI has historically released distilled versions of its models (GLM-4 existed in 9B and 25B), but no announcement has been made for GLM-5.2. Keep an eye on Z.AI's HuggingFace account.
Can GLM-5.2 be used for complex AI agents?
Yes, and it is even one of its strong points thanks to a Terminal-Bench 2.1 score of 81.0. For agentic workflows with terminal access, file editing, and tool chaining, it rivals the best proprietary models.
✅ Conclusion
GLM-5.2 didn't just invent open weights — DeepSeek did that. It didn't just invent large context — Llama 4 did that. It just combined the two with a power that surpasses everything that existed before, under a license that leaves no door closed. The LLM landscape has shifted: the frontier is no longer exclusively proprietary, and US export controls are accelerating the very movement they claim to slow down. If you want to understand where open AI is heading in 2026, the monthly comparison of the best LLMs is your starting point — GLM-5.2 has changed the game there.