In short
- DeepSeek made its 75% V4-Pro discount permanent on May 22, closing the output at $0.87 per million tokens.
- Xiaomi cut the price of MiMo-V2.5 up to 99% on May 26, and the current reserve price is $0.0036 per million tokens of the Pro version.
- OpenAI’s GPT-5.5 has doubled issuance prices to $30 per million tokens at launch, and Anthropic’s Claude Opus 4.7 shipped with an updated tokenizer that can raise real-time revenue by up to 35%.
DeepSeek has 75% off on DeepSeek V4-Prowhich was supposed to expire, fixed earlier this week. And now Xiaomi’s Chinese AI lab has reduced MiMo-V2.5 API rates by up to 99% for stored inputs. Two of the most advanced AI technologies on the market just got a lot cheaper, while American labs moved elsewhere.
A quick explanation for the non-developers in the room: When you use ChatGPT or Claude in a browser, you either pay a subscription fee or you don’t. When a company creates a product on top of an AI model, they pay for each token, where the token is about three-thirds of the word. Every message sent, every reply made, every document edited: everything adds up to a level measured in millions of tokens.
An API is the raw pipe that makes this possible, enabling an application, agent, website, etc. to use the model in their own environment. So token prices determine whether an AI-powered product is financially viable or a money pit.
The Token plan is a notepad on top of it. You buy credit up front; example he eats in them. Xiaomi’s payment update gives users 5 to 8 tokens for the same price. The Max plan at $100 now gets you 82 billion tokens, up from 1.6 billion.
In terms of content, 82 billion tokens are over 60 billion words.
Why the bars are real, not commercial
Fuli Luo, the leader of Xiaomi’s MiMo team and a former DeepSeek expert who co-built DeepSeek-V2, published a technical report on X. The high storage comes from a clever way to store and reuse many of the AI’s already processed data. Instead of repeating, Xiaomi’s machine can store a lot of memory at the same time – about five times more than before. This means AI needs less computing power, cutting storage and processing costs by around 80%.
Behind the MiMo API Cost Reduction:
The highest value, up to 99%, is for Input (Cache Hit). The main reason is that our control system now supports KV cache optimization for SWA. Display engine testing shows this optimization increases the saved signal…— Fuli Luo (@_LuoFuli) May 27, 2026
“We are working on the API prices that have just been lowered, our production engine is getting very close, and we can break even,” Luo said. “If more architectures that store computations and KV (Key-Value cache) come out, and better infrance to reduce the cost of the API, this will create the best solution in the market.”
DeepSeek architecture it falls in the same place on the contrary. V4 uses two types of interactive controls—one pressing every four tokens for attention, the other collapsing every 128 tokens of the global standard for low-level calculations. At one million images, the V4-Pro’s KV cache is 10% larger than its predecessor, and the single-signal resolution is 27% of the previous value.
The result is 98% cheaper than the competing GPT-5.5 Pro.
Silicon Valley’s bet
Claude Opus 4.7 costs $5 per million input tokens and $25 per million output tokens. Anthropic kept the value card but shipped it with a new tokenizer that it can generate up to 35% other symptoms for the same words. So the price didn’t go up. Your bill may still be due.
GPT-5.5, was released at the end of Aprilit just increased its fixed price to $30 per million tokens. Gemini 2.5 Pro sits at $1.25 for input and $10 for output—cheap by American standards.
DeepSeek V4-Pro is a 1.6 trillion sample that gives you large sample data at an affordable price. It now runs at a steady $0.435 entry and $0.87 exit per million tokens. This is the version that scored 80.6% on SWE-Verified against Claude Opus 4.6’s 80.8% – a benchmark measuring real GitHub issues, not selected demos. Price difference between brands with similar specs: 34x in output.
MiMo V2.5 Pro corresponds to $ 0.435 / $ 0.87 per million tokens after the new cut. The cache hit drops to $0.0036. In terms of content, it’s cheaper per token than most people pay for a person per SMS.
DeepSeek and Xiaomi are not alone
These bars reached the market where there were Chinese brands already very expensive before this happened. MiniMax M2.7, which sells boxing with Claude Opus on coding benchmarks on Artificial Analysis, costs $ 0.30 to enter and $ 1.20 to issue per million tokens – about 5% of the amount of Opus 4.7.
Kimi K2.5 from Moonshot AI, with 76.8% on SWE-bench Verified, runs $0.60 input and $2.50 output. The GLM-5.1 from Z.AI beat the Claude Opus 4.6 on an important benchmark earlier this quarter. Four Chinese frontier models were shipped in a 12-day window in early May, all for less than a third of the price of the Opus 4.7.
For clarity, this chart shows how the Chinese brands stack up against the three most popular American AI providers (Anthropic, OpenAI, and Meta) in terms of price range and quality.

The difference in Q2 2026 between the Chinese and American models ranges from 15–30x, depending on which models you’re comparing—and that’s the base, before taking out savings.
What this week’s cuts do is the fall that creates the number of jobs that work: pipelines of agents with constant incentives, document processors, return tools, things that always hit. At $0.003625 per million input tokens, DeepSeek V4-Pro’s value for iterations is a round error.
Daily Debrief A letter
Start each day with top stories right here, including originals, podcasts, videos and more.





