Tokens are too expensive; Chinese open-source models dominated the charts overnight

Data from OpenRouter, a leading global AI-model API aggregation platform, shows that by February 2026, call volume for Chinese AI models had surged 127% over three weeks, surpassing US models for the first time, with four of the top five models coming from China.
In the latest weekly ranking, covering February 16 to 22, four of the top five models by platform calls are from Chinese vendors: MiniMax's M2.5, Moonshot AI's Kimi K2.5, Zhipu's GLM-5, and DeepSeek's V3.2. Together, these four models account for 85.7% of total calls within the Top 5.
A year ago, Chinese models held less than 2% of the platform’s market share.
Image: Stacked bar chart of weekly token totals from November 2024 to November 2025. Dark red = closed-source models, orange = Chinese open-source models, teal = open-source models from other regions. Clearly visible is the rise of Chinese open-source models (orange) from nearly invisible to nearly 30%.
OpenRouter aggregates API calls for over 300 models from more than 60 vendors, with over half of the usage coming from outside the US. Developers can switch instantly between models with a single API key; if one isn’t satisfactory, they can switch in seconds. Token data on OpenRouter reflects market voting almost in real time.
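The "switch in seconds" claim follows from OpenRouter exposing an OpenAI-compatible endpoint where the vendor is just a field in the payload. A minimal sketch (the model IDs below are illustrative, not confirmed OpenRouter identifiers):

```python
# Sketch: on an aggregator like OpenRouter, the endpoint and payload shape stay
# identical across vendors (OpenAI-compatible); switching models is a one-string
# change. Model IDs here are illustrative assumptions.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat payload; the vendor is just one field."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Moving a workload from a US closed-source model to a Chinese open-source one:
req_a = build_request("anthropic/claude-sonnet", "Refactor this function.")
req_b = build_request("minimax/minimax-m2.5", "Refactor this function.")
assert req_a["messages"] == req_b["messages"]  # identical except the model field

# (Actually sending it would be e.g.:
#   requests.post(OPENROUTER_URL, json=req_b,
#                 headers={"Authorization": "Bearer <API_KEY>"})
# omitted here so the sketch runs without network access.)
```

This interchangeability is what makes the platform's token data work as near-real-time "market voting": there is no integration cost to defection.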
Recently, the platform has seen the explosive popularity of open-source personal-agent frameworks like OpenClaw and the Chinese New Year AI battles, with nearly every model's positioning keyword shifting to "Agentic."
Over the past two years, the core narrative of large model competition has been capabilities: who is smarter, who scores higher on benchmarks like ARC and SWE-Bench, and who is closer to AGI. Parameters, reasoning depth, and complex task completion rates have been the main industry metrics.
However, after the 2026 Spring Festival, the core narrative for Agentic AI is about continuous task completion—coding, debugging, tool calling, reading files, and iterative improvements. Token consumption has shifted from “human-machine dialogue” to “machine self-loop.” A single task can consume hundreds of thousands or millions of tokens.
Differences between models are increasingly about lower unit costs, greater stability, and smoother reasoning curves in long workflows, high-frequency calls, and extended contexts.
Image: OpenClaw is the largest single application on OpenRouter, accounting for a significant proportion of token consumption.
Image generated by AI
The underlying logic of token consumption has changed
The "2025 AI Usage Report" jointly released by OpenRouter and a16z covers over 1 quadrillion tokens of anonymized metadata. A key insight: programming tasks' share of token usage skyrocketed from 11% at the start of 2025 to over 50%, becoming the platform's largest single category. Meanwhile, agent-driven workflows (models autonomously executing multi-step tasks) now produce over half of all output tokens.
Image: The proportion of programming requests in all LLM queries, rising from about 11% at the start of 2025 to over 50%.
In the past, Q&A AI consumed hundreds to thousands of tokens per conversation, and if users stopped asking questions, token consumption stopped. In agent mode, machines can run continuous background processes.
For example, OpenClaw’s token consumption roughly follows three patterns:
Multi-round self-correction: a programming task may go through dozens of rounds—“write code → run → error → modify → rerun”—each round involving a full model call.
Infinite context expansion: to let the agent “remember” previous actions, each call carries the entire conversation history. Users have measured that active sessions’ context can quickly grow beyond 230,000 tokens.
Toolchain chaining: an agent handling a “sort emails and create to-do list” task may trigger 5-10 API calls, each with the full context.
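The three patterns above compound: because each round re-sends the full conversation history, cumulative token consumption grows quadratically with the number of rounds, not linearly. A back-of-the-envelope sketch (the per-round token count is an illustrative assumption):

```python
# Toy model of the agent-loop cost pattern: every call carries all prior rounds
# as context, so total billed tokens grow quadratically with round count.
# Numbers are illustrative, not measured values.

def cumulative_tokens(rounds: int, tokens_per_round: int) -> int:
    """Total tokens billed when each call re-sends the whole growing history."""
    total = 0
    context = 0
    for _ in range(rounds):
        context += tokens_per_round  # the history grows every round
        total += context             # and is re-sent in full on each call
    return total

# 10 rounds of a 2,000-token step bill 110,000 tokens, not 20,000:
assert cumulative_tokens(10, 2000) == 110_000
```

At 50 rounds the same step size already bills 2.55 million tokens, which is why a single "write code → run → error → modify" task can reach the hundreds of thousands or millions of tokens described above.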
Some OpenClaw users complain that misconfigured automation tasks can burn through $200 in API fees per day. More straightforwardly, running OpenClaw 24/7 with Claude API costs between $800 and $1,500 per month.
Looking at OpenRouter's data: in the week of February 9, the platform processed 130 trillion tokens, double the 64 trillion of the first week of January. The latest weekly total is 121 trillion tokens, 12.7 times the level of a year earlier.
AI usage has shifted from “dialogue-based” to “flow-based,” with token consumption moving from “per interaction” to “per flow.” Cost sensitivity has been sharply amplified.
Opportunities Behind Agent Model Combinations
If an agent runs 24/7 and consumes billions of tokens daily, price differences become a matter of life and death.
Current mainstream model API pricing comparison (per million tokens, USD):
Claude 4.6 Sonnet costs about $15 per million output tokens, while MiniMax M2.5's typical output price is around $1.10 per million tokens, roughly one-fourteenth the price. GPT-5.2's output price of $14 per million tokens is nearly 12.7 times MiniMax's. Even Zhipu's GLM-5, after its price increase, costs about $2.55 per million tokens, still only roughly one-sixth of Claude's price.
In agent scenarios, this gap is magnified by sheer volume. Suppose a production agent processes 1 billion output tokens per day. Using Claude costs about $15,000 per day; using MiniMax, roughly $1,100. Over 30 days, that is nearly $450,000 versus about $33,000, a difference of over $400,000.
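The arithmetic behind those figures is straightforward to reproduce (prices are the per-million-output-token rates quoted above):

```python
# Reproducing the article's cost comparison. Prices are USD per 1M output
# tokens as quoted in the text; the 1B-tokens/day workload is its example.
CLAUDE_PRICE = 15.00   # Claude 4.6 Sonnet, output
MINIMAX_PRICE = 1.10   # MiniMax M2.5, output

def monthly_cost(price_per_million: float, tokens_per_day: float,
                 days: int = 30) -> float:
    """Monthly spend for a steady daily output-token volume."""
    return price_per_million * (tokens_per_day / 1_000_000) * days

claude = monthly_cost(CLAUDE_PRICE, 1_000_000_000)   # $450,000 / month
minimax = monthly_cost(MINIMAX_PRICE, 1_000_000_000) # $33,000 / month
assert claude == 450_000.0
assert abs(minimax - 33_000.0) < 1e-6
assert claude - minimax > 400_000                    # the gap the article cites
```

Note the gap scales linearly with volume: at agent-era consumption levels, a per-token price difference that looks trivial in a chat product becomes the dominant line item.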
This price gap already influences developer choices in real projects.
A European studio using OpenClaw disclosed its approach: 80% of daily reasoning runs on Kimi K2.5, while the remaining 20% (complex reasoning and system architecture) is outsourced to Claude via bash commands. Daily cost for Kimi is about $5–$10, with a monthly token budget of $150–$300. Using the Claude API for everything would push monthly costs to $800–$1,500 or more.
The “80% capability, 20% cost” combo vastly outperforms the “100% capability, 100% cost” plan in practical deployment.
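The studio's split amounts to a routing policy: send routine steps to the cheap model and escalate only the hard ones. A minimal sketch, where the model names and the escalation rule are illustrative assumptions rather than the studio's actual setup:

```python
# Sketch of an 80/20 model-routing policy: cheap model by default, premium
# model only for escalation-worthy task types. Names and the routing rule
# are illustrative assumptions, not the studio's disclosed configuration.

CHEAP, PREMIUM = "kimi-k2.5", "claude-sonnet"
ESCALATE = {"architecture", "complex_reasoning"}

def route(task_type: str) -> str:
    """Pick a model per task; only escalation-worthy work goes premium."""
    return PREMIUM if task_type in ESCALATE else CHEAP

# A day's worth of agent steps, mostly routine:
tasks = ["edit", "test", "edit", "debug", "architecture",
         "edit", "test", "edit", "debug", "complex_reasoning"]
routed = [route(t) for t in tasks]
assert routed.count(CHEAP) == 8 and routed.count(PREMIUM) == 2  # the 80/20 split
```

The design point is that the router, not the model, becomes the cost lever: capability is bought per step, only where it pays for itself.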
In late 2025, a16z partner Martin Casado said in an interview with The Economist that about 80% of AI startups using open-source models are running Chinese models. He later clarified on X that this refers to the subset of startups using open-source models, which is 20–30% of all startups. Roughly, then, 16–24% of US AI startups embed Chinese open-source models in their tech stacks.
OpenRouter COO Chris Clark said more directly: Chinese open-source models’ weights are “unusually prevalent” in US enterprise agent workflows.
Architectural Competition, “Agent-Native”
In this paradigm shift, nearly all top Chinese open-source model players have made “Agentic” their main focus, designing architectures and training pipelines natively adapted for agent scenarios.
They continue the MoE + MLA approach from the previous phase: scaling total parameters while activating only a small fraction per inference, maintaining capability while controlling token costs.
But cheapness alone isn’t enough; the real differentiator is “performing well” in agent scenarios.
MiniMax developed Forge, a native agent reinforcement learning framework. Its core design decouples agent execution logic from the underlying training and inference engine: the agent handles task execution and produces trajectory data, while the training engine learns solely from these trajectories. This architecture can integrate with any agent framework. MiniMax reports large-scale RL training on hundreds of thousands of real agent environments, extending context length to 200K.
Two technical details in Forge are noteworthy: first, “prefix tree merging.” Multi-round requests have many overlapping context prefixes. Traditional methods treat each request as an independent sample, recomputing shared prefixes. Forge reconstructs training samples into a tree structure, sharing prefixes and reducing computation—speeding up training by about 40 times.
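The counting argument behind prefix sharing can be illustrated with a toy trie. This is not Forge's actual implementation (which is not detailed here), only a sketch of why multi-round agent requests with a long shared context make tree-structured samples pay off:

```python
# Toy illustration of prefix-tree merging: multi-round requests repeat a long
# shared context, and a trie over token sequences computes each shared prefix
# once instead of once per request. Illustrative only; not Forge's real code.

def naive_cost(requests):
    """Treat each request as independent: recompute every prefix."""
    return sum(len(r) for r in requests)

def shared_cost(requests):
    """Count unique prefixes, i.e. trie nodes: shared work is paid once."""
    trie = set()
    for r in requests:
        for i in range(1, len(r) + 1):
            trie.add(tuple(r[:i]))
    return len(trie)

base = list(range(1000))                           # long shared context
reqs = [base + [("round", k)] for k in range(10)]  # 10 rounds, one new token each
assert naive_cost(reqs) == 10_010
assert shared_cost(reqs) == 1_010                  # ~10x less computation here
```

The deeper the rounds and the longer the shared context, the larger the savings, which is consistent with the large training speedup MiniMax reports.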
Second, reward design: besides task completion, M2.5’s RL also uses “task completion time” as a reward signal, directly incentivizing models to choose shortest paths and utilize parallelism. According to MiniMax, M2.5 running SWE-Bench Verified end-to-end takes 22.8 minutes—37% faster than M2.1 at 31.3 minutes, and comparable to Claude Opus 4.6 at 22.9 minutes. Running continuously for an hour (100 TPS) costs $1; MiniMax claims “$10,000 can keep four agents working continuously for a year.”
Kimi K2.5 supports agent clusters, dynamically scheduling up to 100 "clones" in different roles that work in parallel and handle up to 1,500 steps simultaneously. In large-scale search scenarios, agent clusters reduce the number of key steps by 3–4.5x, with runtime speedups of up to 4.5x.
K2.5 is designed as a native multimodal, agent-oriented model, supporting visual and text inputs, thinking and non-thinking modes, dialogue and agent tasks, with comprehensive architecture adaptation.
These innovations show that top Chinese models are no longer just “cheap.”
Closed-source models from Anthropic and OpenAI are black boxes; developers cannot evaluate long-term cost curves or optimize local deployment. But Claude's strengths lie in productization, computer use, artifacts, the MCP ecosystem, and high-precision complex reasoning.
Image: Breakdown of programming token share among closed-source, Chinese open-source, and other open-source models. Anthropic Claude has long dominated with over 60% in programming, but Chinese open-source models and others have been steadily eroding its share since late 2025.
Competitive advantage is now differentiated: Chinese open-source models are transparent, replicable, and cost-effective for large-scale deployment; US closed-source models excel in productization and complex reasoning accuracy.
The Agent era has brought structural dividends to Chinese open-source models.
Price wars are over; demand wars begin
On February 12, Zhipu AI announced a 30%+ price increase for the GLM-5 Coding Plan and removed first-purchase discounts. Prices for the international version rose 30–60%, and API call prices rose 67–100%.
This marks China’s first major price hike for large models in 2026.
The background is telling. Over the past year, China's large-model market fought fierce price wars: ByteDance's Doubao offered prices as low as 0.0008 yuan per thousand tokens; Alibaba cut the price of its GPT-4-level flagship Tongyi Qianwen model by 97%; Zhipu itself slashed the previous-generation GLM-4-Plus by 90%.
Now prices are rising again, yet the GLM Coding Plan sold out immediately, and paid packages for domestic AI coding products are being snapped up.
This also raises a question: “Zhipu’s price increase—does it mean Chinese models’ growth is unrelated to price wars?”
The answer isn’t simply “yes” or “no.”
Agentization has caused token demand to explode. Chinese models, with their cost advantage, have gained incremental volume, and price hikes are essentially a supply-demand rebalancing. Zhipu responded: “User scale and call volume are rapidly increasing; the company is investing more in computing power.”
According to media reports, in the roughly 20 days after Kimi K2.5's January 27 release, cumulative revenue already exceeded the total for all of 2025. The core driver is overseas developers and API calls: Kimi's call volume on OpenRouter remains at the top, directly boosting B-end revenue, with overseas income surpassing domestic income for the first time.
MiniMax’s situation is similar: within seven days of M2.5’s release, token usage exceeded 30 trillion; internal data shows that code generated by M2.5 accounts for 80% of the company’s new code submissions.
Across the industry, Zhipu, Moonshot AI, MiniMax, and StepFun have all raised some API prices. A research report from Changjiang Securities states that domestic models have "officially entered the demand-driven era."
The era of price wars is over; the demand war begins.
How much of this data is real?
There are also disputes about whether the token data during this surge is inflated.
For example, MiniMax M2.5's 197% weekly increase was largely driven by free promotions from two AI programming tools, Kilo Code and Cline. Starting February 12, Kilo Code offered its more than 1.5 million developers a week of free access to M2.5; Cline ran similar campaigns.
While free promotion objectively boosts short-term volume, it doesn’t explain long-term trends or retention.
MiniMax M2.5 has matched or surpassed the state of the art in programming, tool invocation, search, and productivity benchmarks, for example SWE-Bench Verified (80.2%), Multi-SWE-Bench (51.3%), and BrowseComp (76.3%). On benchmark results, it has reached flagship-level scores and is no longer just a "cheap alternative."
OpenRouter's annual data shows Chinese open-source models' share rising from under 2% at the end of 2024, accelerating in mid-2025, and approaching 30% in some weeks.
A steadily rising curve, not just a spike from a promotion.
Another detail in the rankings: the top five's Chinese entries come from four different teams: MiniMax, Moonshot AI, Zhipu, and DeepSeek. This isn't one viral hit; it reflects the maturity of China's open-source model ecosystem gaining recognition in the international market.
Image: By late 2024, DeepSeek V3 and R1 accounted for over half (deep blue). After mid-2025, the distribution diversified sharply, with models like Qwen, MiniMax, Kimi, GPT-OSS rising in turn, none exceeding 25%.
Final thoughts
The Agent era is rewriting the rules of model competition.
Using combined models in an agent setup enhances cost-effectiveness.
The growth of API aggregation platforms is gradually breaking down traditional vendor entry barriers.
Open-source models have matured to product-level engineering. Global developers embedding Chinese open-source models into production agent workflows are deploying in real business environments—far beyond lab score-chasing.
In early 2026, the paradigm shifted again—from dialogue AI to Agentic AI—Chinese open-source models seized this structural opportunity.
But the change is far from over.
Source: Tencent Tech
Risk warning and disclaimer
Market risks exist; investments should be cautious. This article does not constitute personal investment advice and does not consider individual users’ specific investment goals, financial situations, or needs. Users should evaluate whether any opinions, views, or conclusions herein are suitable for their circumstances. Invest at your own risk.