Cursor Composer 2.5: the coding model that rivals Opus 4.7 at a tenth of the price
🔎 $0.50 vs $5: the price war of coding models accelerates
On May 18, 2026, Cursor released Composer 2.5, its third proprietary model. The striking figure: $0.50 per million input tokens, exactly one tenth of the price of Anthropic's Claude Opus 4.7 ($5/M). Yet on agentic coding benchmarks, the two models post nearly identical scores.
This is a strong signal of a market shift. AI IDE publishers are no longer just wrapping third-party models: they are building their own engines, optimized for a specific task, and breaking the dominant pricing structure. Cursor is no longer a thin wrapper around Anthropic or OpenAI models. It has become a model player.
The key points
- Composer 2.5 matches Claude Opus 4.7 on SWE-Bench Multilingual (79.8%) and CursorBench v3.1 (63.2%), according to data compiled by LushBinary.
- Price of $0.50/M input tokens vs $5/M for Opus 4.7: a 10:1 price ratio at comparable benchmark scores on agentic coding tasks.
- Technical base: Kimi K2.5 from Moonshot AI, post-trained by Cursor with 25× more synthetic tasks and a Sharded Muon optimizer, according to BuildFastWithAI.
- Exclusive availability within the Cursor IDE. No public API, no way to use it outside the ecosystem.
- GPT-5.5 remains ahead on Terminal-Bench 2.0 (+13 points), meaning Composer 2.5 is not universally superior.
Recommended tools
| Tool | Main usage | Price (May 2026, check vendor sites) | Ideal for |
|---|---|---|---|
| Cursor (Composer 2.5) | AI IDE with integrated proprietary model | Pro subscription (~$20/month) | Developers who code in agentic mode on a daily basis |
| Claude Opus 4.7 | General-purpose frontier LLM | $5/M input tokens (API) | Complex tasks outside of pure coding |
| GPT-5.5 | Frontier LLM, Terminal-Bench leader | OpenAI API pricing | Intensive terminal and bash sessions |
| Claude Code | Anthropic coding CLI agent | Via Anthropic API | Developers who prefer a pure CLI workflow |
Benchmarks: where Composer 2.5 really shines
Frontier-level performance at a fraction of the cost is the central claim of the release. But the numbers require a close look to understand what is actually happening.
SWE-Bench Multilingual: the tie
On SWE-Bench Multilingual, Composer 2.5, Claude Opus 4.7, and GPT-5.5 all converge at 79.8%. This is the benchmark that measures a model's ability to solve real GitHub tickets across multiple programming languages.
An identical score across three models from three different vendors is rare. It suggests either a structural ceiling in the benchmark or targeted optimization by each vendor for this metric. Either way, Composer 2.5 reaches it at a tenth of Opus 4.7's input price.
CursorBench v3.1: the home playground
Composer 2.5 reaches 63.2% on CursorBench v3.1, once again tied with Opus 4.7 and GPT-5.5. This is Cursor's internal benchmark, designed to measure performance in a multi-file agent context.
The bias is obvious: Cursor tunes its models against its own benchmark. The results count as a signal of real capability in an agentic environment only to the extent that they are reproducible and the benchmark is open to outside scrutiny.
Terminal-Bench 2.0: the weak spot
This is where the narrative cracks. GPT-5.5 beats Composer 2.5 by 13 points on Terminal-Bench 2.0. This benchmark measures command-line, bash, and system operation skills.
Composer 2.5 is optimized for coding in a multi-file agent context, not for pure terminal sessions. That is a deliberate specialization rather than a defect, but it means that for CLI-centric workflows, GPT-5.5 remains the stronger choice.
| Benchmark | Composer 2.5 | Claude Opus 4.7 | GPT-5.5 |
|---|---|---|---|
| SWE-Bench Multilingual | 79.8% | 79.8% | 79.8% |
| CursorBench v3.1 | 63.2% | 63.2% | 63.2% |
| Terminal-Bench 2.0 | 13 pts behind GPT-5.5 | Not reported | Leader (+13 pts over Composer 2.5) |
| Input price (per M tokens) | $0.50 | $5 | Variable |
Sources: LushBinary, TechTimes
The architecture: Massively post-trained Kimi K2.5
Composer 2.5 is not a model built from scratch. It starts from an open-source base model, Kimi K2.5 from Beijing-based Moonshot AI, to which Cursor applied massive post-training.
According to AlphaMatch, only ~25% of the total compute went into the Kimi K2.5 base model. The remaining 75% is Cursor's own training work. This ratio is revealing: the value no longer lies mainly in pre-training, but in what is done afterwards.
Three pillars of Cursor post-training
1. 25× more synthetic training tasks. Cursor generated millions of agentic coding scenarios — multi-file bug resolution, refactoring, feature addition — to create an ultra-specific alignment dataset.
2. RL with targeted textual feedback. Not generic RLHF. A reward system that precisely evaluates the quality of the generated code in an agentic context, with granular textual feedback rather than simple upvotes/downvotes.
3. Sharded Muon Optimizer. An optimizer variant that lets this volume of training run with better compute efficiency. It's technical, but lower training cost is part of what makes a $0.50/M price sustainable.
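Cursor has not published its Sharded Muon implementation. For readers unfamiliar with the optimizer, here is a minimal, single-device sketch of the publicly documented Muon idea: keep a momentum buffer for each 2D weight matrix, then replace the raw update with an approximately orthogonalized version obtained from a few Newton-Schulz iterations. Sharding across devices, Nesterov momentum, and learning-rate scaling are deliberately left out; the helper names (`newton_schulz_orthogonalize`, `muon_step`) are illustrative, and plain Python lists stand in for tensors so the example runs without dependencies.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(g: Matrix, steps: int = 5, eps: float = 1e-7) -> Matrix:
    """Push every singular value of g toward ~1 (approximate orthogonalization).

    Quintic iteration X <- aX + b(XX^T)X + c(XX^T)^2 X, using the coefficients
    from the public Muon reference code.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    norm = math.sqrt(sum(v * v for row in g for v in row)) + eps
    x = [[v / norm for v in row] for row in g]  # Frobenius-normalize so singular values <= 1
    tall = len(x) > len(x[0])
    if tall:  # work on the wide orientation so the Gram matrix XX^T stays small
        x = transpose(x)
    for _ in range(steps):
        gram = matmul(x, transpose(x))
        gram2 = matmul(gram, gram)
        poly = [[b * gram[i][j] + c * gram2[i][j] for j in range(len(gram))]
                for i in range(len(gram))]
        px = matmul(poly, x)
        x = [[a * x[i][j] + px[i][j] for j in range(len(x[0]))] for i in range(len(x))]
    return transpose(x) if tall else x


def muon_step(weight: Matrix, grad: Matrix, momentum: Matrix,
              lr: float = 0.02, beta: float = 0.95) -> None:
    """One in-place Muon-style update of a single 2D weight (plain momentum, no sharding)."""
    rows, cols = len(weight), len(weight[0])
    for i in range(rows):
        for j in range(cols):
            momentum[i][j] = beta * momentum[i][j] + grad[i][j]
    update = newton_schulz_orthogonalize(momentum)
    for i in range(rows):
        for j in range(cols):
            weight[i][j] -= lr * update[i][j]
```

The orthogonalization step is the expensive part: it needs the whole matrix, which is why distributing it across many accelerators (the "sharded" part) is an engineering problem in its own right.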
This strategy — taking a good open-source model and specializing it to the extreme — is exactly what other players are doing. The difference: Cursor controls the IDE, the benchmark, and the model. The loop is closed.
Strategic positioning: why Cursor is building its own models
Cursor boasts an ARR in the hundreds of millions of dollars in early 2026, according to Memeburn. At this scale, relying on Anthropic or OpenAI for the core of its product becomes a major strategic risk.
The trap of third-party model dependency
An AI IDE maker that relies exclusively on third-party models faces three problems. The first is margin: every million input tokens sent to Opus 4.7 costs $5, and Cursor must either absorb that cost or pass it on to the user. The second is differentiation: if your IDE uses the same model as all your competitors, your competitive advantage lies solely in the UX. The third is vulnerability: Anthropic or OpenAI can change their prices or access terms, or release their own IDE.
By building Composer 2.5, Cursor eliminates all three risks at once. The cost per session drops drastically, the model is exclusive, and nobody can take it away.
Competition is heating up: Claude Code and OpenAI Codex
Anthropic is pushing Claude Code, its CLI coding agent. OpenAI is developing Codex. Both are direct threats in the coding agent space. Composer 2.5 is Cursor's answer: a model that neither Anthropic nor OpenAI can offer, optimized for the Cursor ecosystem.
The parallel with Gemini 3.5 Flash, which beats Opus 4.7 and GPT-5.5 on agent benchmarks, is striking. The trend is clear: specialized and optimized models are outperforming generalist frontier models on their home turf.
What this actually changes for the developer
Kingy AI's analysis highlights the practical changes. Beyond the benchmarks, what does Composer 2.5 actually change in a Cursor developer's day-to-day?
Longer agentic sessions without breaking the bank
The price advantage isn't just theoretical. An intensive agentic session with Opus 4.7 can easily consume 5 to 10 million input tokens (accumulated context, re-reads, iterations). At $5/M, that costs $25 to $50 per complex session. At $0.50/M with Composer 2.5, it's $2.50 to $5.
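The arithmetic above can be reproduced with a small helper. This is a sketch that uses only the input prices quoted in this article and ignores output tokens and any caching discounts, which would change the absolute figures but not the 10:1 ratio.

```python
# Input prices in USD per million tokens, as quoted in this article (May 2026).
PRICE_PER_M_INPUT = {
    "Claude Opus 4.7": 5.00,
    "Composer 2.5": 0.50,
}


def session_input_cost(model: str, input_tokens: int) -> float:
    """Input-token cost of one agentic session in USD (output tokens ignored)."""
    return PRICE_PER_M_INPUT[model] * input_tokens / 1_000_000


for tokens in (5_000_000, 10_000_000):
    for model in PRICE_PER_M_INPUT:
        cost = session_input_cost(model, tokens)
        print(f"{model:16} {tokens / 1e6:>4.0f}M tokens -> ${cost:.2f}")
```

For 5M and 10M tokens this gives $25 and $50 for Opus 4.7 versus $2.50 and $5 for Composer 2.5.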
This changes behavior. Developers no longer hesitate to launch heavy agentic tasks — refactoring an entire module, framework migration, cross-file debugging — because the cost is no longer a psychological barrier.
Better multi-file consistency
Composer 2.5 is specifically trained to maintain consistency when modifying multiple files in parallel. This is where post-training on synthetic agentic tasks pays off: the model "understands" that a change in file A implies coordinated modifications in files B and C, without losing the thread.
The real limitations
This isn't a miracle model. On pure algorithmic reasoning tasks outside an agentic context, Opus 4.7 likely remains superior. In pure terminal use, GPT-5.5 leads it by 13 points on Terminal-Bench 2.0. Above all, Composer 2.5 is locked inside Cursor: if you change IDEs, it disappears.
To choose the right model for your usage, the comparison "Claude, GPT, Gemini, Llama: which model to choose in 2026?" offers a decision-making framework by profile.
The race for model sovereignty: what Composer 2.5 means for the market
The LLM industry is going through a maturation phase. The first wave (2023-2024) was about access: who could use the best models. The second wave (2025) was about agentic capabilities: who could get complex tasks executed. The third wave, the one Composer 2.5 illustrates, is that of model sovereignty.
The model as a moat, not a commodity
When Cursor, Google with Gemini 3.5 Flash, and other players start building custom models, the LLM ceases to be an interchangeable commodity. It once again becomes a proprietary competitive advantage.
This is a paradoxical step backward. Open-source (Kimi K2.5, Llama, etc.) democratized access to base models. But the value has shifted toward specialized post-training, which remains proprietary. The open model is the foundation. The specialization is the moat.
The winners and the losers
Winners: Tool vendors that have enough usage data and compute to post-train their own models. Cursor, Google, and likely Microsoft/GitHub eventually.
Losers: "Pure" model providers that do not control the application layer. If Cursor can offer 90% of Opus 4.7's performance for 10% of the price, Anthropic's perceived value in the coding segment mechanically decreases.
Unknown: Developers. More choice, but also more lock-in. Each IDE ecosystem will have its exclusive model, and migrating from one tool to another will mean changing "brains".
The value for money in detail
Talking about "ten times cheaper" is accurate but incomplete. You need to look at the total cost of an agentic coding session to understand the real impact.
Concrete scenario: refactoring a backend module
Let's take a typical task: refactor a 15-file module, add tests, update types. Initial context: 200K tokens. With each iteration, the context grows by 50 to 100K tokens. Over 10 iterations, you easily reach 1M tokens in cumulative input.
| Model | Input cost (1M tokens) | Cost for this task |
|---|---|---|
| Claude Opus 4.7 | $5/M | ~$5 |
| GPT-5.5 | OpenAI API rate | Variable |
| Composer 2.5 | $0.50/M | ~$0.50 |
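The 1M-token figure in the table is a floor. A minimal sketch, assuming the agent re-sends its entire, growing context as input on every iteration and that no prompt-caching discount applies (both are assumptions, not documented Cursor behavior), shows how the cumulative input is computed for the scenario above:

```python
def cumulative_input_tokens(initial: int, growth: int, iterations: int) -> int:
    """Total input tokens when the full context is re-sent on every iteration.

    Iteration i (0-based) sends initial + i * growth tokens.
    """
    return sum(initial + i * growth for i in range(iterations))


def input_cost_usd(tokens: int, price_per_m: float) -> float:
    """Input cost in USD for a given token count and a price per million tokens."""
    return tokens * price_per_m / 1_000_000


for growth in (50_000, 100_000):
    total = cumulative_input_tokens(200_000, growth, 10)
    print(f"+{growth // 1000}K per iteration: {total / 1e6:.2f}M input tokens, "
          f"Opus 4.7 ${input_cost_usd(total, 5.00):.2f}, "
          f"Composer 2.5 ${input_cost_usd(total, 0.50):.2f}")
```

Under these assumptions, 10 iterations starting from 200K tokens add up to 4.25M to 6.5M input tokens, close to the 5 to 10 million cited earlier for intensive sessions. The absolute bill grows accordingly, but the 10:1 price ratio holds at any volume.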
On a single task, the difference is modest. But a heavy user can launch 20 to 50 agentic sessions per week. At the scale of a team of 10 developers over a month, the savings amount to thousands of dollars.
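To put a number on "thousands of dollars", here is a rough estimate that reuses the figures from this section: 10 developers, 20 to 50 sessions per week each, about 1M input tokens per session, and roughly 4 weeks per month. All of these are illustrative inputs, not measured usage.

```python
PRICE_OPUS = 5.00      # USD per million input tokens (Claude Opus 4.7)
PRICE_COMPOSER = 0.50  # USD per million input tokens (Composer 2.5)


def monthly_team_savings(devs: int, sessions_per_week: int,
                         tokens_per_session: int, weeks: int = 4) -> float:
    """Input-token savings in USD from running sessions on Composer 2.5 instead of Opus 4.7."""
    sessions = devs * sessions_per_week * weeks
    saving_per_session = (PRICE_OPUS - PRICE_COMPOSER) * tokens_per_session / 1_000_000
    return sessions * saving_per_session


for per_week in (20, 50):
    print(per_week, "sessions/week:", round(monthly_team_savings(10, per_week, 1_000_000)), "USD/month")
```

With 1M-token sessions this comes to about $3,600 to $9,000 per month for the team; longer sessions scale the figure linearly.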
Why Cursor can afford this price
The $0.50/M price isn't magic. It combines three factors: an open-source base model (no pre-training cost to amortize), training-efficiency gains from Sharded Muon, and a business model in which the Cursor Pro subscription (~$20/month), rather than per-token billing, carries part of the economics.
Cursor isn't selling tokens. It's selling an IDE subscription with an integrated model. Token pricing is a comparison signal, not necessarily the actual revenue structure.
❌ Common mistakes
Mistake 1: Confusing "matching on a benchmark" with "matching in general"
Composer 2.5 matches Opus 4.7 on SWE-Bench Multilingual and CursorBench. That doesn't mean it matches Opus 4.7 at everything. For general reasoning, writing, and long-document analysis, Opus likely remains clearly stronger. Composer 2.5 is a specialist, not a generalist.
Mistake 2: Thinking Composer 2.5 will replace all models in Cursor
The Cursor IDE still offers Claude Opus 4.7, GPT-5.5, and other models. Composer 2.5 is an additional option, optimized for multi-file agentic tasks. For a one-off code review or a design question, Opus or GPT-5.5 may remain better choices.
Mistake 3: Ignoring lock-in
Composer 2.5 only exists in Cursor. If you build workflows that depend on its specificities (multi-file consistency, low cost for long sessions), migrating to Claude Code or another tool will be painful. This is exactly what Cursor wants.
Mistake 4: Comparing token prices without context
$0.50/M vs $5/M is a 10:1 ratio. But if Composer 2.5 needs 2× as many tokens to achieve the same result (larger context, more iterations), the real ratio becomes 5:1. It's still massive, but you have to stay honest about the metrics.
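The adjustment above is a one-line formula: divide the price ratio by how many more tokens the cheaper model needs for the same result. A small sketch, where the token multiplier is a hypothetical input you would have to measure on your own tasks:

```python
def effective_cost_ratio(price_a: float, price_b: float, token_multiplier_b: float) -> float:
    """How many times cheaper model B is than model A per completed task.

    token_multiplier_b: tokens B needs relative to A for the same result
    (1.0 = same amount, 2.0 = twice as many).
    """
    return price_a / (price_b * token_multiplier_b)


print(effective_cost_ratio(5.00, 0.50, 1.0))  # 10.0
print(effective_cost_ratio(5.00, 0.50, 2.0))  # 5.0
```

The break-even point follows directly: Composer 2.5 would lose its cost advantage only if it needed 10× as many tokens as Opus 4.7 for the same result.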
❓ Frequently Asked Questions
Is Composer 2.5 available via API?
No. It is exclusively integrated into the Cursor IDE. You cannot call it from your own application or via an external API client.
What is the exact relationship with Kimi K2.5?
Moonshot AI's Kimi K2.5 serves as the base model. Cursor then applies massive post-training (75% of the total compute) to specialize it in agentic coding. The final model is proprietary.
Does Composer 2.5 beat GPT-5.5?
No. It matches GPT-5.5 on SWE-Bench Multilingual and CursorBench, but GPT-5.5 remains significantly superior on Terminal-Bench 2.0 (+13 points). The ranking of the best agentic LLMs reflects this hierarchy.
Is it worth switching to Cursor just for Composer 2.5?
If you do intensive agentic coding (multi-file, long sessions), yes: the cost/performance ratio is hard to beat. If you only use AI occasionally for quick questions, a Cursor subscription probably isn't justified.
Will Cursor ever open-source Composer 2.5?
Nothing indicates this. Cursor's strategy is to create lock-in through the model. Opening access would destroy this advantage. The best AI tools for code show that every editor is moving toward the same closed-ecosystem logic.
✅ Conclusion
Cursor Composer 2.5 confirms a turning point: in agentic coding, specialized post-training can match the generalist frontier model at a fraction of the cost. The 79.8% on SWE-Bench Multilingual at $0.50/M tokens is not a benchmark trick; it is the result of a coherent training strategy built on Kimi K2.5. The rest is deliberate lock-in, and for now, it works.