DeepSeek V4-Pro: the permanent 75% price drop accelerating the LLM war

LLM & Models 🟢 Beginner ⏱️ 15 min read 📅 2026-06-12

🔎 A frontier model for less than a cent per million tokens

On May 22, 2026, DeepSeek made a decision that sent shockwaves through the LLM market. The 75% discount on its V4-Pro model, initially planned as a temporary promotion set to expire on May 31 at 15:59 UTC, has become permanent.

The message is clear: this is no longer a launch offer, it's the new price floor. And this price floor puts existential pressure on the entire value chain of American models.

The consequences go far beyond a simple price adjustment. By making this drop permanent, DeepSeek is turning a marketing stunt into an industrial strategy. China is no longer just competing on performance — it is attacking directly on cost, and the gap has reached unprecedented levels.

The key points

DeepSeek V4-Pro permanently moves to $0.435/M tokens for uncached input ($0.003625/M cached) and $0.87/M tokens in output, a 75% drop made permanent on May 22, 2026.
The model is approximately 11.5x cheaper than GPT-5.5 in input and 34.5x cheaper in output, with coding performance that holds its own in comparison.
Chinese labs as a group (DeepSeek, Xiaomi, Qwen, Kimi, GLM) made six price cuts in the first half of 2026, pushing prices toward marginal cost.
For developers, V4-Pro becomes the rational option for high-volume production workloads.
The geopolitical implications are major: pricing becomes a strategic weapon in the AI war.

Recommended tools

Tool	Main usage	Price (June 2026, check on deepseek.com)	Ideal for
DeepSeek V4-Pro	Production workload, coding, RAG	$0.435/M input ($0.003625/M cached), $0.87/M output	High-volume apps, tight budget
DeepSeek V4-Pro (Max)	Complex tasks, agentic	Surcharge compared to standard Pro	Advanced coding, deep reasoning
DeepSeek V4 Flash	Routing, classification, light tasks	Lower than V4-Pro	High-throughput, low-latency
GPT-5.5	Global benchmarks, agentic	$5/M input, $30/M output	Use cases where raw score matters
Gemini 3.1 Pro	Multimodal, long context	Varies by tier	Google Cloud integration

The new V4-Pro prices: the exact breakdown

DeepSeek V4-Pro's standard pricing breaks down into three lines, according to the detailed analysis by Codersera and TokenMix.

Uncached input costs $0.435 per million tokens. Cached input drops to $0.003625 per million tokens. Output remains at $0.87 per million tokens.

It's the cached input line that changes everything. In a real production workflow — RAG, chatbots, iterative agents — a massive proportion of input tokens is repeated from one call to the next (system context, reference documents, conversation history). The cache causes the effective cost to drop dramatically.

Concretely, for an application that sends 1 million input tokens with a 90% cache hit rate (a common scenario in RAG), the effective input cost drops below 5 cents: 900,000 cached tokens cost about $0.003 and 100,000 uncached tokens cost $0.0435. Add 1 million output tokens at $0.87, and the total bill remains under a dollar (about $0.92).

At OpenAI for the same volume with GPT-5.5, you would pay $5 in input + $30 in output = $35 at list price. That's a factor of roughly 27x to 40x depending on the cache rate (about 38x at a 90% hit rate).
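The cached-input arithmetic can be sketched as a small function. This is a minimal illustration using only the prices quoted in this article. The function name is hypothetical, and GPT-5.5 is compared at its $5/M list price because no cached rate is quoted here.

```python
# Prices in USD per million tokens, as quoted in this article.
V4_PRO_INPUT_UNCACHED = 0.435
V4_PRO_INPUT_CACHED = 0.003625
GPT_55_INPUT = 5.00  # list price; assumption: no caching discount applied


def effective_input_price(cache_hit_rate: float) -> float:
    """Blended V4-Pro input price per million tokens for a given cache hit rate."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return (
        cache_hit_rate * V4_PRO_INPUT_CACHED
        + (1.0 - cache_hit_rate) * V4_PRO_INPUT_UNCACHED
    )


if __name__ == "__main__":
    # Show how the blended price and the input-side gap move with the hit rate.
    for rate in (0.0, 0.5, 0.8, 0.9, 0.95):
        price = effective_input_price(rate)
        print(f"hit rate {rate:4.0%}: ${price:.4f}/M input, "
              f"{GPT_55_INPUT / price:.1f}x below GPT-5.5 list price")
```

At a 90% hit rate the blended input price comes out at about $0.047 per million tokens, which is the "below 5 cents" figure for 1 million input tokens.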

The PromptCost analysis confirms that V4-Pro is the first frontier model to drop below the 50-cent mark per million tokens in effective cost (cache included).

DeepSeek V4-Pro vs GPT-5.5: the real comparison

The central question is no longer "Is V4-Pro as good as GPT-5.5?" but "Is V4-Pro good enough to justify a 34x price difference?"

Raw benchmarks: advantage GPT-5.5

According to the comparison by BenchLM and the analysis by DataCamp, GPT-5.5 dominates on overall benchmarks. With a score of 91 in the General category compared to 88 for V4-Pro (Max), the gap is real but far from insurmountable.

On agentic benchmarks, the gap widens: GPT-5.5 reaches 98.2 compared to 88.1 for the best Chinese self-hosted model (Kimi K2.6). If you are building complex autonomous agents with long reasoning chains, GPT-5.5 remains the gold standard.

Coding: V4-Pro takes the lead

It is on the coding front that DeepSeek V4-Pro is surprising. In coding benchmarks, V4-Pro (Max) regularly beats GPT-5.5 on generation, debugging, and refactoring tasks. For developers who use LLMs as code copilots, this is the deciding factor.

The conclusion of BenchLM is unambiguous: V4-Pro is the best choice if coding is a priority or if the budget is tight.

Criterion	DeepSeek V4-Pro (Max)	GPT-5.5	Winner
General score	88	91	GPT-5.5
Agentic score	—	98.2	GPT-5.5
Coding	Superior	Inferior	V4-Pro
Input price/M tokens	$0.435 ($0.003625 cached)	$5	V4-Pro (11.5x to 1379x)
Output price/M tokens	$0.87	$30	V4-Pro (34.5x)
Context window	1M	1M	Tie

For an overview of the best models for coding, check out our comparison of the best LLMs for coding in 2026.

The Chinese price war: a systemic phenomenon

DeepSeek is not an isolated case. The 75% price drop is part of a broader market dynamic that DigiTimes describes as a "collision course" between Chinese labs.

Six price cuts in six months

According to the tracking by APIDog and the analysis on Dev.to, the first half of 2026 saw at least six rounds of price reductions by Chinese players.

The Chinese pricing landscape in June 2026 is dizzying:

Model	Lab	Output price/M tokens	Positioning
DeepSeek V4-Pro	DeepSeek	$0.87	Frontier, coding
MiMo	Xiaomi	Undisclosed	Aggressive new entrant
Qwen 3.6	Alibaba	~$0.90	General-purpose open source
Kimi K2.6	Moonshot AI	$0.07 (cached)	Long context, agentic
GLM-5.1	Z.AI	~$0.20	Reasoning, Chinese

Three of these reductions were declared permanent during H1 2026, meaning that the labs are committing to these prices as structural, not promotional.

The logic of marginal cost

The Chinese strategy relies on a simple economic calculation: the marginal cost of LLM inference continues to drop thanks to hardware improvements (Huawei Ascend accelerators, local chips) and architectural optimizations (MoE, aggressive quantization, KV cache optimization).

When your marginal cost of production falls below $0.10/M tokens, selling at $0.87 remains highly profitable: the gross margin on output tokens stays above 88%. And this gives you room for maneuver that American labs, with their NVIDIA infrastructure costs and astronomical R&D expenses, cannot match.

Our article on the open-source LLM war documents this dynamic: China has made pricing a strategic variable, not a byproduct of competition.

Why developers need to rethink their stack

The 34x price gap is not a statistical curiosity. It's a paradigm shift that renders certain architectures obsolete.

The cost-benefit calculation that changes everything

Let's take a medium-sized RAG application: 10,000 requests per day, each with 50,000 input tokens (80% cached) and 20,000 output tokens. That is 500 million input tokens and 200 million output tokens per day.

With GPT-5.5 at list price: $2,500/day in input + $6,000/day in output = $8,500/day, or ~$255,000/month.

With V4-Pro: ~$45/day in input ($43.50 uncached + $1.45 cached) + $174/day in output = ~$219/day, or ~$6,600/month.

The monthly savings are about $248,000. Over a year, that's almost $3 million. This isn't an optimization; it's a change in orders of magnitude that makes previously impossible business models viable.
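To rerun this kind of estimate on your own traffic, here is a minimal sketch that recomputes the daily bill from the per-million-token prices quoted in this article. The class and function names are hypothetical. Two assumptions are built in: 20,000 output tokens per request (the volume that yields $6,000/day and $174/day in output), and GPT-5.5 billed at list price for all input, since no cached rate is quoted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per million tokens."""
    input_uncached: float
    input_cached: float
    output: float


@dataclass(frozen=True)
class Workload:
    requests_per_day: int
    input_tokens: int      # per request
    output_tokens: int     # per request
    cache_hit_rate: float  # share of input tokens served from cache


def daily_cost(p: Pricing, w: Workload) -> float:
    """Daily bill in USD for workload w at pricing p."""
    total_in = w.requests_per_day * w.input_tokens / 1e6    # millions of tokens
    total_out = w.requests_per_day * w.output_tokens / 1e6  # millions of tokens
    cached = total_in * w.cache_hit_rate
    uncached = total_in - cached
    return (cached * p.input_cached
            + uncached * p.input_uncached
            + total_out * p.output)


# Prices quoted in this article. Assumption: GPT-5.5 gets no cached-input discount.
V4_PRO = Pricing(input_uncached=0.435, input_cached=0.003625, output=0.87)
GPT_55 = Pricing(input_uncached=5.0, input_cached=5.0, output=30.0)

# Assumption: 20,000 output tokens per request.
rag = Workload(requests_per_day=10_000, input_tokens=50_000,
               output_tokens=20_000, cache_hit_rate=0.8)

for name, pricing in (("GPT-5.5", GPT_55), ("V4-Pro", V4_PRO)):
    day = daily_cost(pricing, rag)
    print(f"{name}: ${day:,.2f}/day, ${day * 30:,.0f}/month (30 days)")
```

Under these assumptions the sketch gives $8,500.00/day for GPT-5.5 and $218.95/day for V4-Pro. Change `cache_hit_rate` or `output_tokens` to see how sensitive the gap is to your own workload.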

The workloads shifting to V4-Pro

Certain use cases naturally shift to DeepSeek. Automated code review, high-volume RAG, document classification, customer-facing chatbots with extended context — in all these scenarios, V4-Pro's performance-to-price ratio is unbeatable.

The workloads that remain on GPT-5.5 or Claude Opus 4.7 are those where cost is secondary: autonomous agents with complex multi-step reasoning, critical tasks where a single benchmark point translates to millions of dollars in value, or integrations where the OpenAI/Anthropic ecosystem provides a functional advantage.

If you are exploring free options to test these models, our guide to the best free LLMs summarizes the available access.

The geopolitical implications of pricing as a weapon

Using price as a strategic lever is not new in the tech industry. China has done it with solar panels, batteries, and telecoms. With LLMs, the logic is identical but the impact is potentially deeper.

The "good enough" trap

DeepSeek's strategy does not require beating GPT-5.5 on all benchmarks. It just needs to be "good enough" on the tasks that represent 80% of real-world use cases, while costing 34x less.

This is exactly what the ranking shows: V4-Pro (Max) at 88 versus GPT-5.5 at 91 in generalist capabilities. The 3-point gap does not justify a 34x price multiplier for the vast majority of companies.

The danger for American labs is structural. The more developers invest in architectures based on V4-Pro, the higher the switching cost becomes. Prompts, fine-tunes, preprocessing pipelines: all of this is optimized for a specific model. Ultimately, the lock-in reverses: it is no longer OpenAI that holds its users captive, it is DeepSeek.

The limited American response

American labs have little room for maneuver. Their infrastructure costs (market-price NVIDIA H200/B300 GPUs), their personnel costs (ML engineers at $500k-$1M/year), and their fundamental research investments do not allow them to drop to these price levels without destroying their margins.

Export controls on advanced chips sold to China complicate the picture even further. While they slow down Chinese R&D on cutting-edge architectures, they prevent neither the optimization of existing models nor the reduction of inference costs. The paradoxical result is that these restrictions push Chinese labs to excel precisely in efficiency, and it is efficiency that determines price.

The impact on the enterprise market

Enterprises do not react instantly to LLM price changes. But the signals are clear.

The three-phase adoption cycle

The first phase, currently underway, is that of early adopters: tech-savvy startups and individual developers are migrating their non-critical workloads to V4-Pro to test the price-to-performance ratio.

The second phase, expected in the second half of 2026, will see mid-sized companies make the switch. SaaS vendors, agencies, fintechs — all those with significant volumes and margins squeezed by the market.

The third phase involves large enterprises. Slower to migrate, they are also the most sensitive to the total cost of ownership. When a CIO presents $3M/year in savings to the executive committee, the decision no longer depends on benchmarks but on risk management.

What this means for US LLM providers

Beyond Tomorrow analyzes the enterprise impact of this cut: OpenAI and Anthropic's enterprise contracts will face increasing pricing pressure. Customers will negotiate harder, and volume discounts will need to approach Chinese levels on large contracts.

Market segmentation will become more pronounced. US models will position themselves as the "premium" tier — comparable to Apple's positioning in hardware. Chinese models will capture the volume. This is a viable scenario for OpenAI and Anthropic, but only on the condition that they maintain a sufficient qualitative advantage. However, this advantage is shrinking quarter after quarter.

V4-Pro in the DeepSeek ecosystem: Pro vs Flash vs Max

Not all DeepSeek V4 models are created equal. Understanding the lineup is essential for making the right choice.

V4-Pro (standard): the workhorse

With a score of 70 in generalist tasks, the standard V4-Pro is the base model. It remains suitable for simple tasks, but it's the V4-Pro in the Max configuration that reaches 88 points and rivals GPT-5.5.

The difference between the configurations lies in the reasoning parameters activated at inference (thinking budget, number of verification passes). The higher the budget, the better the result — but the output cost increases proportionally.

V4-Flash: the routing model

V4-Flash (Max) reaches 76 points in generalist tasks with a significantly lower cost than V4-Pro. Its optimal use is not complex reasoning but routing, classification, and high-frequency tasks where latency matters more than depth.

The architecture recommended by most ML engineers in 2026 is a two-tier system: Flash for routing and simple tasks, Pro (Max) for complex tasks. This optimizes the average cost per request while maintaining quality.
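A two-tier setup of this kind can be sketched as a simple routing function. The tier labels, the list of "light" task types, and the token threshold below are all hypothetical placeholders, not official model identifiers or recommended values. In practice the routing decision could itself be made by a Flash call.

```python
from dataclasses import dataclass

# Hypothetical tier labels, not official API model identifiers.
FLASH = "v4-flash"
PRO_MAX = "v4-pro-max"

# Assumption: these task types are light enough for the Flash tier.
LIGHT_TASKS = {"routing", "classification", "extraction"}


@dataclass(frozen=True)
class Task:
    kind: str           # e.g. "classification", "coding", "reasoning"
    prompt_tokens: int  # rough size of the request


def choose_tier(task: Task, max_light_tokens: int = 8_000) -> str:
    """Send light, short tasks to Flash and everything else to Pro (Max)."""
    if task.kind in LIGHT_TASKS and task.prompt_tokens <= max_light_tokens:
        return FLASH
    return PRO_MAX


if __name__ == "__main__":
    for task in (Task("classification", 500), Task("coding", 12_000)):
        print(task.kind, "->", choose_tier(task))
```

Keeping the rule in one function makes it easy to log which tier each request took and to adjust the threshold once you have real quality and cost data.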

The local option

For companies that don't want to depend on a Chinese API, DeepSeek models are available open source. Our local LLM installation guide details the options with Ollama and LM Studio. The comparison of the best local LLMs can help you choose the right hardware configuration.

Limits to keep in mind

Despite the exceptional value for money, V4-Pro has real weaknesses that you need to be aware of before migrating.

Agentic remains the weak point

With no DeepSeek model in the top 10 of the agentic leaderboard (the best Chinese self-hosted model being Kimi K2.6 at 88.1 compared to 98.2 for GPT-5.5), complex multi-agent architectures remain the territory of American models. If your product relies on agents that plan, execute, and iterate autonomously, V4-Pro is not yet sufficient.

For advanced agentic use cases, check out our guide to the best LLMs for AI agents.

Support and compliance

Regulated companies (banking, healthcare, defense) may face legal hurdles related to data processing by Chinese infrastructure. DeepSeek's confidentiality clauses, although they guarantee that API data is not stored, are not always enough to satisfy European or American compliance requirements.

Language is another factor. The rankings of the best LLMs for French are still dominated by American and European models; V4-Pro performs well but is not optimal when it comes to Francophone linguistic nuances.

Dependence on a pricing strategy

The main risk is that these prices may not be sustainable. If DeepSeek or its backers decide to raise rates, companies that have migrated massively will find themselves trapped. This is why it is prudent to maintain a multi-model architecture with a controlled switching cost.
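One way to keep the switching cost under control is to hide every provider behind the same thin interface, so that changing or reordering models is a configuration change, not a rewrite. The sketch below is an illustration under that assumption. Each provider is a plain callable that you would implement with the vendor's own client, and the function name is hypothetical.

```python
from typing import Callable, Sequence

Provider = Callable[[str], str]  # prompt -> completion; raises on failure


def complete_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Provider]],
) -> tuple[str, str]:
    """Try providers in order; return (provider_name, completion) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

With this shape, the ordered `providers` list is the only place that knows which model is primary, so a price change or an outage is handled by editing that list.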

❌ Common mistakes

Mistake 1: Comparing only input prices without cache

Many developers look at the input price without cache ($0.435/M) and conclude that the advantage is "only" 11.5x. In reality, in production workloads the cached share of input is billed at $0.003625/M, an advantage of up to 1379x over GPT-5.5's $5/M list price on those tokens. Cache is the key to V4-Pro's business model.

Mistake 2: Migrating 100% at once to V4-Pro

Switching an entire production pipeline all at once is an unnecessary risk. The right approach is to shadow-run V4-Pro in parallel with your current model for 2-4 weeks, measure the quality deltas on your business metrics, and then gradually migrate the least sensitive workloads.
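A shadow run can be sketched as follows: users keep receiving the current model's answer, while the candidate answers the same prompts in the background and both are scored with your own business metric. Everything here is a hypothetical illustration. The `primary`, `candidate`, and `score` callables are placeholders you would supply, and a production version would run the candidate call asynchronously.

```python
from statistics import mean
from typing import Callable, Iterable

Model = Callable[[str], str]          # prompt -> completion
Scorer = Callable[[str, str], float]  # (prompt, completion) -> quality score


def shadow_run(prompts: Iterable[str], primary: Model, candidate: Model,
               score: Scorer) -> dict:
    """Serve the primary model; score the candidate on the same prompts."""
    served = []
    deltas = []
    for prompt in prompts:
        answer = primary(prompt)    # this is what users actually receive
        shadow = candidate(prompt)  # never shown to users
        served.append(answer)
        deltas.append(score(prompt, shadow) - score(prompt, answer))
    return {
        "served": served,
        "n": len(deltas),
        "mean_delta": mean(deltas) if deltas else 0.0,  # > 0 favors the candidate
        "candidate_wins": sum(d > 0 for d in deltas),
    }
```

A positive `mean_delta` over a few weeks of real traffic is the kind of evidence that justifies moving the least sensitive workloads first.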

Mistake 3: Ignoring the Max configuration

Using V4-Pro in standard configuration (score 70) and concluding that the model is mediocre is a common mistake. It's the equivalent of buying a sports car and never shifting out of first gear. The Max configuration, which activates extended reasoning, is the one that reaches 88 points and rivals GPT-5.5.

Mistake 4: Neglecting the switching cost

Migrating to V4-Pro is not just about changing an API key. Prompts optimized for GPT-5.5 do not always translate well, output formats may differ, and guardrails must be recalibrated. Budget at least 2-3 weeks of migration engineering.

❓ Frequently Asked Questions

Is DeepSeek V4-Pro's 75% price drop really permanent?

Yes. According to AIToolBriefing, DeepSeek confirmed on May 22, 2026, that the reduction, initially scheduled to expire on May 31 at 15:59 UTC, was made permanent. Three other Chinese reductions followed the same pattern in H1 2026.

Is DeepSeek V4-Pro really 34x cheaper than GPT-5.5?

For output, yes: $0.87/M vs $30/M. For cached input, the gap reaches 1379x ($0.003625/M vs $5/M). For uncached input, it is 11.5x. The actual factor depends on your cache hit rate, which ranges from 50% to 95% depending on workloads.

Does V4-Pro replace GPT-5.5 for all use cases?

No. For coding and high-volume production workloads, V4-Pro is often the best choice. For complex agentic tasks (score of 98.2 for GPT-5.5 vs no DeepSeek model in the agentic top 10), deep reasoning, and cases where every benchmark point counts, GPT-5.5 remains superior.

What legal risks does using V4-Pro pose?

Data sent to the DeepSeek API passes through Chinese infrastructure. For companies subject to GDPR, HIPAA, or US sector-specific regulations, this can be problematic. DeepSeek's non-storage clauses do not constitute a sufficient legal guarantee in certain regulatory frameworks. The local open-source option bypasses this issue.

How to test V4-Pro without risk?

Start with non-critical workloads (log analysis, internal classification, draft generation) in parallel with your current model. Measure quality metrics specific to your use case, not just generic benchmarks. Our monthly comparison of the best LLMs can help you structure your evaluation.

✅ Conclusion

DeepSeek V4-Pro at $0.87/M output tokens is no longer a cheap alternative: it's the new market benchmark. Developers who continue to pay $30/M tokens for GPT-5.5 on coding or RAG workloads must be able to justify every dollar of that gap. The Chinese price war is only just beginning, and the ranking of the best LLMs will continue to be rewritten every month.

#intelligence-artificielle #guerre-des-llm #deepseek-v4-pro #baisse-prix-ia #modeles-langage #tarification-llm
