MiniMax M3: the Chinese open-weights model defying GPT-5.5 with 1M context and MSA architecture
🔎 On June 1, 2026, MiniMax released M3 without any warning
An open-weights, natively multimodal model with a 1-million-token context window and 59% on SWE-Bench Pro, all at a fraction of the price of GPT-5.5 or Claude Opus 4.7. Its MSA (MiniMax Sparse Attention) architecture is not a simple tweak: it is a fundamental redesign of how an LLM processes long contexts.
Two immediate implications. First, the barrier to entry for open-weights models just went up a notch. Second, the per-token billing of proprietary models has become much harder to justify for coding and agentic use cases.
The weights will be published on HuggingFace within 10 days, according to Toolworthy. That leaves just enough time to understand what M3 actually changes before downloading it.
The essentials
- 59% on SWE-Bench Pro, which surpasses GPT-5.5 and Gemini 3.1 Pro according to MiniMax's internal evaluations, and clearly beats DeepSeek V4 Pro and Qwen 3.7.
- MSA Architecture: sparse attention that cuts computational cost by a factor of 15.6 in decoding and 9.7 in prefill at 1M tokens compared to standard attention (AimadeTools).
- Native multimodal: text, image, and video in a single architecture, not an assembly of specialized models.
- API price: $0.60 per million input tokens, well below Western rates.
- Open-weights: release under a permissive license on HuggingFace, allowing fine-tuning and full local deployment.
Recommended tools
| Tool | Use case | Price | Best for |
|---|---|---|---|
| MiniMax M3 | Coding, agents, long context | $0.60/M input tokens (June 2026, check on minimax.io) | Developers looking for an open-weights alternative to GPT-5.5 |
| HuggingFace | Downloading open weights | Free (June 2026, check on huggingface.co) | Local deployment and fine-tuning of M3 |
| Claude, GPT, Gemini, Llama: which model to choose in 2026? | Comparison of the best LLMs | Varies by model | Choosing the right model based on your use case |
What M3 actually brings — A frontier model in open-weights
MiniMax M3 is not a model that is "almost as good" as proprietary leaders. On SWE-Bench Pro, it reaches 59%, a score that places it ahead of both GPT-5.5 and Gemini 3.1 Pro in the benchmarks published by MiniMax and reported by LushBinary.
The crucial point: it is open-weights. Not open-weights with a restrictive license that prohibits commercial use. Not an "open" model whose weights are never published. The weights arrive on HuggingFace, which means anyone can inspect them, modify them, fine-tune them.
This is a break from the dynamic where Chinese open-weights models remained below the Western frontier. DeepSeek had started to blur the lines with DeepSeek V3.1, but M3 goes further by adding native multimodality and a million context tokens in a single package.
The comparison with current models is enlightening. GPT-5.5 scores 91 in the overall LLM ranking but is overtaken by M3 on the specific coding benchmark. Claude Opus 4.7 Adaptive peaks at 90. Gemini 3.1 Pro is at 92. These overall scores remain high, but on the precise task of resolving GitHub tickets (SWE-Bench Pro), M3 takes the lead.
To understand the implications, you have to look at the architecture that makes this possible.
MSA Architecture: why it's different from anything we've seen
The problem with standard attention
Attention in a transformer has quadratic complexity: doubling the context length multiplies the computational cost by four. At 1 million tokens, standard attention simply becomes impractical, even with optimizations like FlashAttention, which reduce memory traffic but leave the quadratic compute in place.
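To make the quadratic growth concrete, here is a back-of-the-envelope sketch. The function name `dense_attention_flops` and the head dimension of 128 are illustrative assumptions, not figures from MiniMax; the formula counts only the two large matrix products of a single dense attention head.

```python
def dense_attention_flops(seq_len: int, head_dim: int = 128) -> int:
    """Approximate FLOPs for one dense attention head over a full sequence.

    Counts the two large matrix products (Q @ K^T and weights @ V), each
    costing about 2 * seq_len * seq_len * head_dim operations.
    head_dim=128 is an illustrative assumption.
    """
    return 4 * seq_len * seq_len * head_dim


if __name__ == "__main__":
    baseline = dense_attention_flops(128_000)
    for n in (128_000, 256_000, 512_000, 1_024_000):
        ratio = dense_attention_flops(n) / baseline
        print(f"{n:>9} tokens -> {ratio:.0f}x the cost of 128K")
```

Each doubling of the context quadruples the count, so going from 128K to roughly 1M tokens (three doublings) costs about 64 times more under this simplified model.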
This is why most "1M context" models in practice are only usable with compression techniques, truncation, or at prohibitive costs. Long context exists on paper but rarely gets fully utilized in production.
What MSA actually does
MiniMax Sparse Attention replaces dense attention with a hybrid mechanism. Instead of computing attention weights between every pair of tokens, MSA adaptively selects which connections are truly necessary.
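The sources do not publish how MSA selects its connections, so the sketch below illustrates the general idea with a generic block-selection scheme, not MiniMax's actual algorithm. Every name here (`attend`, `block_sparse_attend`, `block_size`, `top_blocks`) is hypothetical: the query scores one summary vector per block of cached keys and then runs ordinary softmax attention only over the best-matching blocks.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, keys, values):
    """Dense softmax attention of one query over the given keys and values."""
    scale = math.sqrt(len(q))
    weights = softmax([dot(q, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def block_sparse_attend(q, keys, values, block_size=4, top_blocks=2):
    """Attend only to the blocks whose mean key best matches the query."""

    def block_score(start):
        # Mean key of the block: a cheap stand-in for a learned selector.
        # A real system would cache these summaries instead of recomputing.
        block = keys[start:start + block_size]
        mean_key = [sum(col) / len(block) for col in zip(*block)]
        return dot(q, mean_key)

    starts = range(0, len(keys), block_size)
    chosen = sorted(sorted(starts, key=block_score)[-top_blocks:])
    idx = [i for s in chosen for i in range(s, min(s + block_size, len(keys)))]
    out = attend(q, [keys[i] for i in idx], [values[i] for i in idx])
    return out, idx


if __name__ == "__main__":
    random.seed(0)
    n, d = 64, 8
    keys = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    values = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    q = [random.gauss(0, 1) for _ in range(d)]
    out, idx = block_sparse_attend(q, keys, values, block_size=8, top_blocks=2)
    print(f"attended {len(idx)} of {n} cached keys")
```

With a fixed budget of selected blocks, the per-token attention work stops growing with the context length, which is why the benefit of this family of techniques shows up mainly at long contexts. When `top_blocks` covers every block, the function reduces to dense attention.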
The figures reported by AimadeTools are striking: 15.6x faster in decoding and 9.7x faster in prefill at 1M tokens compared to standard attention. This is not a marginal optimization. It is roughly an order-of-magnitude change.
According to Banandre, it is this architecture that makes the million-token context truly practical rather than theoretical. You can inject an entire code repository and get coherent responses without waiting for minutes or paying a fortune.
Understanding these attention mechanisms is essential for evaluating the actual billing of LLMs. A model with MSA can offer 1M tokens of context at $0.60/M input because its actual computational cost per token is drastically reduced. A model with standard attention would charge much more for the same context to cover its inference costs.
Benchmarks: what M3 wins at and what the sources don't say
The impressive scores
The striking figure: 59% on SWE-Bench Pro. For context, this benchmark measures a model's ability to solve real GitHub tickets autonomously. It is the reference test for evaluating coding models.
According to FelloAI, M3 outperforms GPT-5.5 and Gemini on this benchmark. Toolworthy confirms the score of 59.0% and specifies that M3 also beats DeepSeek V4 Pro and Qwen 3.7.
The following table summarizes the available comparisons:
| Model | SWE-Bench Pro | Type | Approximate API price (June 2026) |
|---|---|---|---|
| MiniMax M3 | 59.0% | Open-weights | $0.60/M input |
| GPT-5.5 | < 59% (according to MiniMax) | Proprietary | ~$5-15/M input |
| Gemini 3.1 Pro | < 59% (according to MiniMax) | Proprietary | ~$2-5/M input |
| DeepSeek V4 Pro | < 59% | Open-weights | ~$1-2/M input |
The caveats that honest sources raise
Thomas Wiegold asks the right question: how does M3 really compare to GPT-5.5 and Opus 4.7 outside of the benchmarks chosen by MiniMax? A publisher's internal evaluations should always be taken with a grain of salt. The risk of benchmark selection bias (choosing the tests where its model shines) is real.
LushBinary in their comparison points out that M3 beats GPT-5.5 and Gemini "at a fraction of the cost," which is true on SWE-Bench Pro, but notes that comparisons on other general benchmarks are less clear-cut.
The reality likely lies somewhere in between: M3 is a frontier-level model on coding and agentic tasks, with a massive cost/context advantage thanks to MSA. But it probably doesn't yet have the raw versatility of GPT-5.5 or Claude Opus 4.7 across all general tasks. For the monthly comparison of the best LLMs, we will have to wait for independent third-party evaluations.
Native multimodal: text, image, video in a single model
Many "multimodal" models are actually assemblies: an LLM for text, a vision model for images, a video model for video, all glued together with a router. M3 integrates the three modalities into a single architecture according to LushBinary.
The practical benefit is considerable for development workflows. You can inject an error screenshot, a design mockup as an image, and a bug reproduction video — the model processes everything in the same context without losing coherence.
This sets M3 apart from DeepSeek V4 Pro or Claude Sonnet 4.6, which excel in pure text but do not have the same native multimodal integration. Among the best LLMs for coding, the multimodal criterion is becoming increasingly decisive as workflows integrate more visual elements.
Pricing: why M3 makes proprietary models uncomfortable
The numbers
According to AimadeTools and Codersera, the M3 API is priced at $0.60 per million input tokens. Against the approximate proprietary prices listed above ($2-15 per million input tokens), that is roughly 3x to 25x cheaper.
For intensive use of coding agents that consume hundreds of thousands of tokens per session, the difference amounts to hundreds of dollars per month per developer.
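The arithmetic behind that estimate can be sketched as follows. The workload (500K input tokens per session, 10 sessions a day, 21 working days) is a hypothetical assumption, and the $5 and $15 prices are the ends of the approximate GPT-5.5 range quoted in the table above. Output tokens are ignored because the sources only give input pricing.

```python
def monthly_input_cost(tokens_per_session, sessions_per_day, workdays, price_per_million):
    """Monthly API spend on input tokens, in dollars."""
    monthly_tokens = tokens_per_session * sessions_per_day * workdays
    return monthly_tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    # Hypothetical workload for one developer; adjust to your own usage.
    usage = dict(tokens_per_session=500_000, sessions_per_day=10, workdays=21)
    m3 = monthly_input_cost(**usage, price_per_million=0.60)
    print(f"M3 at $0.60/M: ${m3:,.0f} per month")
    for label, price in (("$5/M", 5.0), ("$15/M", 15.0)):
        other = monthly_input_cost(**usage, price_per_million=price)
        print(f"model at {label}: ${other:,.0f} per month (${other - m3:,.0f} more)")
```

Under these assumptions the gap lands in the hundreds of dollars per developer per month at the low end of the proprietary range, and above a thousand at the high end.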
What this means for the market
Proprietary models can no longer justify their price solely based on raw performance. If an open-weights model gives you 90-95% of the quality for 5-20% of the price, paying the premium becomes hard to defend in front of a CFO.
The differentiation strategy for proprietary models will have to shift towards reliability, support, regulatory compliance, and the integration ecosystem rather than benchmarks. A similar dynamic played out in cloud computing, where cheaper alternatives put pressure on AWS.
For teams looking for the best free LLMs or low-cost options, M3 will become a serious choice as soon as the weights are released. And for those who want to control everything, lists of the best LLMs to run locally will soon include M3.
The open-weights ecosystem in June 2026: where M3 stands
The current hierarchy
The open-weights landscape has structured itself into three tiers. The first: generalist models like Llama and Mistral. The second: specialized coding models like DeepSeek. The third, which M3 just created: multimodal open-weights models with massive context and frontier performance.
DeepSeek V4 Pro (Max) scores 88 in the overall ranking and remains excellent at coding. But it lacks native multimodality and the million tokens with MSA. Kimi K2.6 at 84 points is solid but in a different performance category.
M3 does not replace DeepSeek. It adds to the open-weights arsenal with a different profile: perhaps weaker at pure general reasoning, but superior on tasks that require long multimodal context, which is exactly what AI agents need.
For local AI agents
The arrival of M3's weights on HuggingFace opens up concrete possibilities for running open-source AI agents locally with Ollama: an agent that can read an entire repository, analyze screenshots, and watch reproduction videos, all locally, without sending data to a third-party API.
For the choice of the best LLM for AI agents, M3 will become a top candidate as soon as the weights are available. The local LLM installation guide will probably need to be updated to include M3 in the coming weeks.
Geopolitical implications: China in the premium open-weights segment
A strong signal
MiniMax is not a small experimental lab. It is a well-funded Chinese company choosing to release an open-weights model that rivals GPT-5.5. This is a clear strategic signal: China is no longer content with copying or following. It is taking the initiative in specific market segments.
The parallel with DeepSeek is obvious. DeepSeek V3.1 had already demonstrated that Chinese open-weights could reach the frontier. M3 extends this demonstration to multimodality and massive context.
What this changes for Western developers
The question is no longer "are Chinese models good?" but "which Chinese model is the best for my use case?". This is a complete paradigm shift compared to 2024, when Chinese models were perceived as second-tier alternatives.
For the best LLMs for research, M3 with its 1M token context could also become relevant — the ability to ingest dozens of complete documents in a single context changes the game for research workflows.
Practical deployment: what you need to know before using M3
Via the API
The API is already accessible according to the MiniMax developer guide. The MSA architecture is transparent to the developer: you send your tokens, M3 handles the attention internally. No need to adapt your code.
The agentic benchmarks reported by Codersera show that M3 performs particularly well in scenarios involving multiple tool calls and multi-step problem solving, which are the core of how an AI agent works.
Locally
This is where it gets interesting. With the weights on HuggingFace, you will be able to deploy M3 on your own infrastructure. As with the best local LLMs, you will need to plan for VRAM requirements: a model with 1M context and native multimodality is not lightweight.
A server with 2-4 high-end GPUs (A100 80GB or equivalent) will likely be necessary to leverage the maximum context. For more modest usage with a context of 128K-256K tokens, a more accessible configuration will suffice. The Claude 4 vs GPT-5 vs Gemini 3 comparison will need to include M3 as a local deployment option in its next update.
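One reason the context length drives hardware needs is the key-value cache, which in a standard transformer grows linearly with the number of tokens. The sketch below uses the usual sizing formula for a conventional cache; the layer count, KV head count, and head dimension are purely hypothetical placeholders, since M3's real dimensions are not given in the sources, and it is not stated whether MSA changes what has to be cached. Model weights come on top of this figure.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Memory for the cached keys and values of one sequence.

    The leading factor of 2 covers storing both a key and a value vector
    per token, per layer, per KV head. bytes_per_value=2 means FP16/BF16.
    """
    return 2 * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_value


if __name__ == "__main__":
    # HYPOTHETICAL configuration, for illustration only (not M3's real one).
    config = dict(n_layers=48, n_kv_heads=8, head_dim=128)
    for context in (128_000, 256_000, 1_000_000):
        gib = kv_cache_bytes(context, **config) / 2**30
        print(f"{context:>9} tokens -> {gib:,.1f} GiB of KV cache")
```

Whatever the real dimensions turn out to be, the linear relationship holds for a conventional cache: a 1M-token context needs about eight times the cache of a 128K one, which is why the shorter-context configuration is so much more accessible.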
For French speakers
The question of French quality naturally arises. For the best LLMs in French, M3 has not yet been specifically evaluated. Chinese models have historically had lower performance in French compared to Claude or Gemini. But this is a point to verify empirically rather than presuppose — the quality of Chinese models in European languages has noticeably improved in 2025-2026.
❌ Common mistakes
Mistake 1: Confusing open-weights and open-source
M3's weights will be released, but the training code, data, and complete details of the MSA architecture likely will not be. Open-weights does not mean open-source in the strict sense. You can use and modify the model, but you cannot recreate it from scratch with the same data.
Mistake 2: Taking SWE-Bench Pro benchmarks as absolute truth
59% on SWE-Bench Pro is impressive, but it is a specific benchmark. As Thomas Wiegold points out, real-world quality in production can differ from benchmark scores. Test on your own use cases before migrating.
Mistake 3: Ignoring local inference costs
$0.60/M tokens via the API is cheap. But running M3 locally with a 1M context is expensive in terms of hardware. Calculate the TCO (Total Cost of Ownership) including hardware, electricity, and maintenance before choosing between the API and local. Appropriate hosting with Hostinger can be an intermediate option for less resource-intensive deployments.
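The TCO comparison described above can be sketched as a break-even calculation. All dollar amounts in the example are placeholders rather than quotes, and the function names are hypothetical; only the $0.60 per million input tokens comes from the reported API price.

```python
def monthly_local_cost(hardware_cost, amortization_months, electricity, maintenance):
    """Monthly cost of ownership: amortized hardware plus running costs."""
    return hardware_cost / amortization_months + electricity + maintenance


def break_even_million_tokens(local_monthly_cost, api_price_per_million):
    """Monthly input volume, in millions of tokens, where API spend equals local cost."""
    return local_monthly_cost / api_price_per_million


if __name__ == "__main__":
    # Placeholder figures: replace with your own hardware and running costs.
    local = monthly_local_cost(hardware_cost=36_000, amortization_months=36,
                               electricity=150, maintenance=50)
    volume = break_even_million_tokens(local, api_price_per_million=0.60)
    print(f"local: ${local:,.0f}/month, break-even at {volume:,.0f}M input tokens/month")
```

Below the break-even volume the API is the cheaper option on cost alone; above it, or when data must stay on your own infrastructure, local deployment starts to make sense.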
Mistake 4: Using M3 for French without testing
Do not assume that English performance translates directly to French. Chinese models have different language profiles. Conduct quality tests on your specific domain before committing.
❓ Frequently Asked Questions
Is MiniMax M3 really open-source?
No, it is open-weights. The model weights will be published on HuggingFace under a permissive license, but the training code and datasets are not expected to be made public. You can use, modify, and redistribute the model, but you cannot recreate the training process.
Does M3 replace DeepSeek V4 Pro?
Not exactly. M3 is better on SWE-Bench Pro and offers native multimodal with 1M context. But DeepSeek V4 Pro likely remains superior on pure general reasoning. Both models have complementary profiles depending on your use case.
What hardware is needed to run M3 locally?
For the full 1M token context, expect to need 2 to 4 A100 80GB GPUs or equivalents. For standard usage with 128K-256K tokens, a single high-end GPU may be enough. Exact specifications will be known upon the release of the weights.
Is the MSA architecture only an advantage for long context?
Primarily, yes. At short contexts (4K-32K tokens), the gain of MSA over standard attention is marginal. The advantage becomes significant beyond 128K tokens and reaches the reported 9.7x (prefill) and 15.6x (decoding) at 1M tokens. If you don't need long context, MSA is not a deciding factor.
Is M3 usable right now?
The API is operational. For local deployment, you will need to wait for the weights to be published on HuggingFace, announced for within 10 days of the June 1, 2026 release.
✅ Conclusion
MiniMax M3 is the first open-weights model to combine frontier-level coding, native multimodality, and one million tokens of usable context without compromise, all made possible by a sparse attention architecture that changes the game on costs. The weights arrive in under 10 days, and the premium open-weights landscape has just shifted. To follow the evolution of this model in our monthly comparison, check out our ranking of the best LLMs.