GPT-5.6 Sol: OpenAI launches the preview of a new model at the very start of a price war

LLM & Models 🟢 Beginner ⏱️ 15 min read 📅 2026-06-28

🔎 Why is OpenAI disrupting the entire market with a limited preview?

On June 18, 2026, OpenAI announced GPT-5.6 in three versions: Sol, Terra, and Luna. Eight days later, prediction markets on Yahoo Finance reported that traders were massively abandoning their bets on a classic GPT-5 in favor of a wide launch of GPT-5.6 Sol as early as July 2026.

The timing is not coincidental. Anthropic was just hit with an export control order on Fable 5 and Mythos 5, and Claude Opus 4.8 dominates code benchmarks with 80.3% on SWE-Bench Pro compared to 58.6% for GPT-5.5. OpenAI can no longer win on pure performance. It is therefore changing the rules of the game by starting a price war.

This preview, limited to around 20 organizations and coordinated with the US government following the Executive Order of June 2, 2026, resembles a political maneuver as much as a commercial one. The message is clear: OpenAI is playing on all fronts — benchmarks, pricing, regulation — to regain control of the narrative.

The essentials

GPT-5.6 exists in three persistent tiers: Sol (flagship), Terra (mid-range), Luna (lightweight). These are not size variants, but service levels designed to endure.
Sol reaches 91.91% on Terminal-Bench 2.1 in ultra mode and is the first model to score above 50% on the Agent's Last Exam (50.9%).
Pricing is deliberately aggressive: Sol at $5/$30 per million tokens (input/output), Terra at $2.50/$15, Luna at $1/$8 (June 2026, check on openai.com).
Access is restricted to ~20 organizations after coordination with the White House, under OpenAI's cyber readiness framework.
Predictive prompt caching offers a 90% reduction on cache reads, with a minimum cache lifetime of 30 minutes.

Recommended tools

Model	Main usage	Input / output price per 1M tokens (June 2026)	Ideal for
GPT-5.6 Sol	Complex agents, CLI workflows	$5 / $30	Critical agentic tasks, benchmarks
GPT-5.6 Terra	Performance/price balance	$2.50 / $15	Daily production, RAG
GPT-5.6 Luna	Lightweight tasks, high velocity	$1 / $8	Bulk processing, classification
Claude Opus 4.8	Code, long reasoning	$15 / $75	SWE-Bench, production code
Claude Fable 5	Cost-effective code	$5 / $25	Budget alternative to Sol
DeepSeek V4 Pro (Max)	High-performance open-source	Variable	Self-hosting, sovereignty

The three tiers of GPT-5.6: Sol, Terra, Luna are not what you think

OpenAI is abandoning the logic of model sizes (small/medium/large) in favor of service tiers. Sol, Terra, and Luna are designed as persistent tiers that will outlive GPT-5.6.

Sol: the flagship targeting agentic SOTA

Sol is the attack model. In ultra mode, it mobilizes sub-agents to break down complex tasks. According to the OpenAI blog, it reaches 91.91% on Terminal-Bench 2.1 (CLI workflows), compared to 88% for Mythos 5 and 83.4% for GPT-5.5.

It is also the only model to exceed 50% on the Agent's Last Exam, a benchmark designed to test an LLM's ability to execute autonomous action chains without human intervention. The score of 50.9% in code mode remains modest in absolute terms, but it marks a symbolic milestone.

Terra: the pragmatic mid-range option

Terra targets production use cases where the performance-to-price ratio is paramount. Its score on Terminal-Bench 2.1 is not publicly detailed, but VentureBeat positions it as a direct competitor to Claude Sonnet 4.6 and DeepSeek V4 Pro (High).

At $2.50/$15, Terra is priced to cannibalize the segment where Anthropic and DeepSeek used to make their margins. It is probably the model that will have the most commercial impact if the wide launch is confirmed in July.

Luna: the volume model

At $1/$8, Luna is priced below the effective cost of running free and open-source models once infrastructure is factored in. It targets bulk workloads: classification, extraction, request routing. The idea is to make the use of GPT-5.6 trivial from a cost perspective for non-critical tasks.

The price war: an openly predatory pricing strategy

The API price comparison table tells an unambiguous story. According to the analysis by CostLens, OpenAI has structured GPT-5.6 to attack every Anthropic price segment.

Model	Input / 1M tokens	Output / 1M tokens	Output/input ratio
GPT-5.6 Luna	$1	$8	8x
GPT-5.6 Terra	$2.50	$15	6x
GPT-5.6 Sol	$5	$30	6x
Claude Fable 5	$5	$25	5x
Claude Opus 4.8	$15	$75	5x
GPT-5.5	$5	$15	3x
GLM-5.2	$0.80	$4.80	6x
Grok 4.3	$0.75 - $2	$3 - $7.50	~4x

Why this is predatory pricing

Anthropic forced OpenAI's hand by dominating the benchmarks. OpenAI's response is purely economic: Sol is priced at $5/$30 while Claude Opus 4.8 costs $15/$75, which is one third of the price on input and 40% of the price on output. Even Fable 5, supposed to be Anthropic's budget model, costs as much as Sol on input ($5) and undercuts it only on output ($25 vs. $30).

Furthermore, the output/input ratio of GPT-5.6 (6x for Sol and Terra, 8x for Luna) is steeper than that of GPT-5.5 (3x). OpenAI is subsidizing input to attract developers, knowing that output is where the real revenue is generated. It's a SaaS classic applied to LLMs.
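To make the ratio argument concrete, here is a minimal Python sketch that recomputes the output/input ratios and the cost of a single request from the prices listed in the table above. The function names and the sample request size (10,000 input tokens, 2,000 output tokens) are illustrative assumptions, not figures from OpenAI or CostLens.

```python
# Prices in USD per 1M tokens (input, output), copied from the table above (June 2026).
PRICES = {
    "GPT-5.6 Luna": (1.00, 8.00),
    "GPT-5.6 Terra": (2.50, 15.00),
    "GPT-5.6 Sol": (5.00, 30.00),
    "Claude Fable 5": (5.00, 25.00),
    "Claude Opus 4.8": (15.00, 75.00),
    "GPT-5.5": (5.00, 15.00),
}


def output_input_ratio(model: str) -> float:
    """How many times more expensive an output token is than an input token."""
    input_price, output_price = PRICES[model]
    return output_price / input_price


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, without any caching discount."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request size, chosen only for illustration.
    for name in PRICES:
        ratio = output_input_ratio(name)
        cost = request_cost(name, input_tokens=10_000, output_tokens=2_000)
        print(f"{name:16s} ratio {ratio:.0f}x  cost ${cost:.3f}")
```

The heavier a workload is on output tokens, the more the steeper ratio weighs on the bill, which is why the input price alone is a poor basis for comparison.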

As noted by Generative AI Pub, the benchmark war is over. We have entered the war of launch timelines and prices.

What this means for open-source models

DeepSeek V4 Pro (Max) and GLM-5.2 remain competitive on paper, especially in self-hosting. But Luna's pricing at $1/$8 via API makes the "open-source = cheaper" argument increasingly difficult to sustain when factoring in the total infrastructure cost.

The real danger for the open-source ecosystem is not raw performance. It is that developers will no longer have an economic reason to manage their own infrastructure if OpenAI's API costs less than a server's electricity. For those who want to nonetheless keep control, the local LLM installation guide remains relevant, but the financial argument is crumbling.

Benchmarks: Sol wins on agentic, not on everything

The GPT-5.6 benchmarks need to be read with nuance. OpenAI carefully selects what it highlights.

Terminal-Bench 2.1: the obvious victory

91.91% in ultra mode, 88.76% in max mode. This is a significant leap from the 83.4% of GPT-5.5 and even the 88% of Mythos 5. Terminal-Bench measures the ability to execute command-line workflows — a direct proxy for agentic capabilities. Sol is clearly designed for this benchmark.

Agent's Last Exam: the symbolic breakthrough

50.9% is the first score above 50% on this benchmark. But keep in mind that this benchmark is relatively new and its correlation with real-world production performance remains to be established. A score of 50.9% still means the model fails on nearly half of the tasks.

Biology and cyber: measured progress

On GeneBench v1 (biology), Sol improves upon GPT-5.5 with a biology recall of 94.8%. On ExploitBench (cyber), the score reaches 81.6%. These figures are solid but not revolutionary.

What the benchmarks don't show

OpenAI does not publish a direct comparison with Claude Opus 4.8 on SWE-Bench Pro, where Anthropic dominates (80.3% vs 58.6% for GPT-5.5). If Sol had beaten Opus 4.8 on this benchmark, OpenAI would have shouted it from the rooftops. The silence is telling.

For a broader comparison of available models, including those from Anthropic and Google, our monthly comparison of the best LLMs covers the entire landscape.

Reasoning modes: max and ultra change the game

GPT-5.6 introduces two new reasoning modes that deserve attention.

Max mode: extended reasoning without sub-agents

Max mode extends the model's chain of thought without delegating to sub-agents. It is comparable to o1-preview-style extended reasoning, but built on the Sol base. On Terminal-Bench 2.1, max mode reaches 88.76%, already at the level of Mythos 5 (88%).

Ultra mode: sub-agents and task decomposition

Ultra mode is the real novelty. Sol automatically decomposes problems into sub-tasks, assigns them to sub-agents, and aggregates the results. This is what enables the leap from 88.76% to 91.91% on Terminal-Bench 2.1.

This mode naturally consumes more tokens and more time. It is not suited for simple requests. But for complex agentic workflows — script execution, navigating systems, API chains — it's a paradigm shift. For developers who are building agents, our page on the best LLMs for AI agents details the practical implications.

Predictive prompt caching: GPT-5.6's silent weapon

A technical detail often overlooked in media coverage: GPT-5.6's prompt caching is radically improved.

Writes are billed at 1.25x the standard price. Reads benefit from a 90% reduction. The cache has a minimum lifetime of 30 minutes. This isn't just a minor optimization — it's an economic change.

For applications that send long system prompts (RAG, agents with rich context), caching with a 90% reduction can divide the bill by 3 to 5 in production. It's a structural competitive advantage that Anthropic has not yet matched at this level of predictability.
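The "3 to 5" figure depends heavily on the shape of the traffic. The sketch below estimates a per-request bill under stated assumptions: the 1.25x write and 0.10x read multipliers come from the article, while the assumption that every cache miss is billed as a write, the function names, and the sample request (18,000 cached prefix tokens, 2,000 fresh tokens, 500 output tokens, 95% hit rate) are hypothetical.

```python
WRITE_MULTIPLIER = 1.25  # cache writes billed at 1.25x the input price (from the article)
READ_MULTIPLIER = 0.10   # cache reads get a 90% reduction (from the article)


def uncached_cost(input_price, output_price, input_tokens, output_tokens):
    """Cost in USD of one request with no caching; prices are per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cached_cost(input_price, output_price, prefix_tokens, fresh_tokens,
                output_tokens, hit_rate):
    """Expected cost in USD of one request whose prefix is cacheable.

    Assumption: a hit bills the prefix at the read rate, a miss bills it
    at the write rate. Fresh input and output are billed normally.
    """
    prefix_multiplier = hit_rate * READ_MULTIPLIER + (1 - hit_rate) * WRITE_MULTIPLIER
    billed_input = prefix_tokens * prefix_multiplier + fresh_tokens
    return (billed_input * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical RAG-style request on Sol ($5 input / $30 output per 1M tokens).
    before = uncached_cost(5.0, 30.0, input_tokens=20_000, output_tokens=500)
    after = cached_cost(5.0, 30.0, prefix_tokens=18_000, fresh_tokens=2_000,
                        output_tokens=500, hit_rate=0.95)
    print(f"uncached ${before:.4f}, cached ${after:.4f}, factor {before / after:.2f}x")
```

With these numbers the bill drops by a factor of just under 3. Longer cached prefixes and shorter outputs push the factor higher, while output-heavy workloads benefit much less, since caching only discounts input tokens.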

Access restriction: when regulation becomes a marketing argument

The legal framework: Executive Order of June 2, 2026

The WSJ details the context: the Trump Executive Order of June 2, 2026 imposes a 30-day process for government benchmarking of new models before broad release. Anthropic was hit with an export control order on Fable 5 and Mythos 5, which significantly slowed their deployment.

OpenAI's strategy: proactive coordination

OpenAI chose the opposite path: proactive coordination with the White House before the announcement. As a result, the preview is limited to ~20 organizations, but OpenAI avoids a post-announcement block. It's a fine political calculation.

OpenAI publicly criticizes the government gating process while complying with it. According to Constellation Research, the company argues that proactive transparency should be enough, not a 30-day process that disadvantages it compared to non-US competitors.

Risk classification: "High" but not "Critical"

All three models (Sol, Terra, Luna) are classified as "High" cyber/biological risk according to OpenAI's internal framework. But Sol does not cross the "Cyber Critical" threshold, which avoids a total block. It's a deliberate balance: capable enough to impress, but not so capable as to trigger a regulatory veto.

700K GPU hours of red-teaming

OpenAI dedicated 700,000 A100e GPU hours to automated red-teaming before this preview. The figure is meant to show that safety is taken seriously, which is exactly what the government wants to hear.

Departures and internal context: the talent war in the background

The GPT-5.6 announcement comes against a backdrop of a massive talent drain at OpenAI and Google DeepMind. As detailed in our article "Google DeepMind saigné à blanc" ("Google DeepMind bled dry"), Nobel laureate John Jumper joined Anthropic and Transformer architect Noam Shazeer headed to OpenAI.

This context is important for reading the GPT-5.6 announcement. OpenAI needs to show that it remains the gold standard despite the departures. The Sol model serves as much to reassure business partners as to demonstrate technical superiority. The question of whether this superiority is real or constructed by the choice of benchmarks remains open.

For those who want to understand how models are positioned globally, our article "Claude, GPT, Gemini, Llama : quel modèle choisir en 2026 ?" ("Claude, GPT, Gemini, Llama: which model to choose in 2026?") offers an overview.

Cerebras deployment: 750 tokens/sec planned for July

One detail that could be a game-changer for the user experience: OpenAI is planning a deployment on Cerebras infrastructure at 750 tokens per second in July 2026.

To put this into context, most current models generate between 50 and 150 tokens/sec in streaming. 750 tokens/sec means that Sol could generate a 2,000-word article in about 5 seconds. This is an order of magnitude leap that would make voice interactions and real-time agents radically more fluid.
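The arithmetic behind that estimate can be sketched in a few lines of Python. The tokens-per-word factor is an assumption (it varies by tokenizer and language), and the function name is hypothetical.

```python
def generation_seconds(words: int, tokens_per_second: float,
                       tokens_per_word: float = 1.5) -> float:
    """Time in seconds to stream a text of `words` words.

    tokens_per_word is an assumed average, not a figure from OpenAI.
    """
    return words * tokens_per_word / tokens_per_second


if __name__ == "__main__":
    for speed in (50, 150, 750):
        seconds = generation_seconds(2_000, speed)
        print(f"{speed:>4} tokens/sec -> {seconds:.1f} s for a 2,000-word article")
```

At 1.5 tokens per word, 750 tokens/sec gives 4 seconds; the "about 5 seconds" figure corresponds to slightly under 2 tokens per word. Either way, the same article takes 20 seconds or more at 150 tokens/sec.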

This deployment echoes OpenAI's voice/realtime orientation, as with GPT-Realtime-2 which offers three voice models reasoning in real time. The combination of Sol + Cerebras + Realtime could create an agentic experience with no perceptible latency.

What prediction markets say

According to data from Yahoo Finance, traders have massively abandoned their positions on a classic GPT-5 launch to reposition themselves on a broad GPT-5.6 Sol launch in July 2026.

Prediction markets are an imperfect but non-negligible indicator. They aggregate rumors, leaks, and weak signals that traditional media do not always catch. The massive realignment of positions suggests that market actors expect the limited preview to be short-lived.

However, the Executive Order and the government gating process could delay this timeline. A gap between market expectations and regulatory reality is always possible.

Impact on the competitive landscape

Anthropic: the main enemy

Claude Opus 4.8 remains superior on SWE-Bench Pro: 80.3%, against roughly 60% for Sol, an unverified estimate given that OpenAI has published no Sol score on this benchmark. But Sol's pricing at $5/$30 compared to $15/$75 for Opus 4.8 creates immense economic pressure.

Anthropic must now choose: maintain its prices and lose market share, or lower them and reduce its margins while the company is not yet profitable. This is the classic predatory pricing trap, and OpenAI is setting it methodically.

Google: the discreet player

Gemini 3.1 Pro and Gemini 3 Pro Deep Think are not directly threatened by this pricing. Google has its own distribution ecosystem (Cloud, Android, Search), which makes a purely API-based comparison less relevant. But the signal sent by GPT-5.6's pricing could force Google to adjust its Cloud AI pricing.

xAI and Grok 4.3: the budget segment threatened

Grok 4.3 at $0.75-$2 for input remains the cheapest on the market. But Luna at $1/$8 with GPT-5.6 output quality could cannibalize the use cases where developers chose Grok for the price. The output quality is often worth the price difference.

DeepSeek and open-source: the mounting pressure

DeepSeek V4 Pro (Max) at 88 points on the overall leaderboard remains a solid alternative, especially in self-hosting. But developers who choose open-source for cost reasons will have to rethink their calculations. The comparison of the best free LLMs and the best LLMs for coding will need to be updated once Sol is publicly available.

❌ Common mistakes

Mistake 1: Confusing tiers with model sizes

Sol, Terra, and Luna are not size variants like "large" vs "small". These are service tiers with different performance, security, and pricing guarantees. Choosing Luna thinking you are getting "a small Sol" is a misunderstanding of the architecture.

Mistake 2: Comparing prices without caching

The raw pricing table makes GLM-5.2 and Grok 4.3 look cheaper. But with a 90% reduction on reads via GPT-5.6's predictive caching, the effective cost in production can be lower. Do the math with your cache hit ratio before deciding.
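One way to "do the math" is to compute the cache hit ratio at which a cached GPT-5.6 tier becomes cheaper on input than a rival's list price. This is a simplified sketch under loud assumptions: the whole input is cacheable, every miss is billed as a 1.25x write, the competitor offers no caching discount, and output prices are ignored. The function names are hypothetical.

```python
WRITE_MULTIPLIER = 1.25  # cache write, from the article
READ_MULTIPLIER = 0.10   # cache read (90% reduction), from the article


def effective_input_price(base_price: float, hit_ratio: float) -> float:
    """Average input price per 1M tokens at a given cache hit ratio."""
    return base_price * (hit_ratio * READ_MULTIPLIER
                         + (1 - hit_ratio) * WRITE_MULTIPLIER)


def break_even_hit_ratio(base_price: float, competitor_price: float):
    """Hit ratio above which the cached price beats competitor_price.

    Returns None when even a 100% hit ratio is not enough.
    """
    ratio = ((WRITE_MULTIPLIER - competitor_price / base_price)
             / (WRITE_MULTIPLIER - READ_MULTIPLIER))
    if ratio > 1:
        return None
    return max(0.0, ratio)


if __name__ == "__main__":
    # Luna ($1 input) against the low end of Grok 4.3 ($0.75) and GLM-5.2 ($0.80).
    print(f"Luna vs Grok 4.3: {break_even_hit_ratio(1.00, 0.75):.0%}")
    print(f"Luna vs GLM-5.2:  {break_even_hit_ratio(1.00, 0.80):.0%}")
    # Sol ($5 input) against GLM-5.2 ($0.80) needs a far higher hit ratio.
    print(f"Sol vs GLM-5.2:   {break_even_hit_ratio(5.00, 0.80):.0%}")
```

Under these assumptions, Luna undercuts the cheapest input prices once somewhat under half of its input is served from cache, whereas Sol only does so at a hit ratio close to 95%. The output side of the bill still has to be compared separately.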

Mistake 3: Assuming the limited preview is purely technical

The restriction to around 20 organizations is not a sign that the model is unstable. It is a political choice to coordinate with the US government. The model is likely ready for broader deployment, but OpenAI does not want to repeat Anthropic's mistake with the export control order.

Mistake 4: Using ultra mode for simple tasks

Ultra mode with sub-agents consumes significantly more resources. Using it for trivial queries is a waste. Max mode is sufficient for 90% of common use cases. Reserve ultra for truly complex multi-step workflows.

❓ Frequently Asked Questions

Is GPT-5.6 Sol available to the public?

No. As of June 2026, the preview is limited to around 20 organizations. Prediction markets are betting on a wide launch in July, but this depends on the 30-day government validation process.

Does Sol beat Claude Opus 4.8 on all benchmarks?

No. Sol leads on Terminal-Bench 2.1 (91.91%, with no reported score for Opus 4.8) and on the Agent's Last Exam (50.9%). But Anthropic claims 80.3% for Opus 4.8 on SWE-Bench Pro, a benchmark that OpenAI does not highlight for Sol.

Will GPT-5.6 pricing stay after the preview?

There is no guarantee that these prices are final. Preview pricing is often aggressive to capture early adopter developers. But given the predatory pricing strategy described by CostLens, a significant increase would be contradictory to the goal of regaining market share.

What does the "High" cyber/bio risk rating mean?

According to OpenAI's internal framework, "High" means the model could be used to facilitate harmful activities in cybersecurity or biology, but does not cross the "Critical" threshold that would trigger a block. Sol, Terra, and Luna are all rated "High".

Is the Cerebras deployment at 750 tokens/sec guaranteed for July?

This is an OpenAI forecast, not a contractual commitment. Infrastructure deployments can be delayed. But if it happens, the impact on the user experience will be considerable, especially when combined with realtime voice models.

✅ Conclusion

GPT-5.6 Sol is not just a new model: it is a declaration of price war. OpenAI can no longer beat Anthropic on code benchmarks, so it is competing on price with a calculated three-tier offensive. The quality is real on agentic tasks (Terminal-Bench 2.1, Agent's Last Exam), the prompt caching is structurally advantageous, and the regulatory coordination is shrewd. It remains to be seen whether the wide launch in July will materialize, and whether Anthropic will respond with a price counter-attack or with a new model. To follow the evolution of the landscape, our monthly comparison of the best LLMs is continuously updated.
