Chinese AI models surpass American ones in usage

According to OpenRouter, DeepSeek and MiniMax models now consume more tokens than Western rivals, at prices roughly 6x lower than Claude's.

Author: Michael Kokin

Chinese AI models have overtaken American ones in usage volume. According to OpenRouter data, DeepSeek and MiniMax models have consumed more tokens than their Western competitors since last month. The driver is price.

MiniMax and Moonshot charge $2-3 per million output tokens, while Claude Sonnet 4.5 charges about $15. For a chatbot that difference is tolerable, but AI agents burn orders of magnitude more tokens: a single coding task can require 20 million. At those volumes, a 6x price gap is very noticeable.
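The arithmetic behind that gap is easy to check. A minimal sketch using the article's rough figures (the per-million-token prices below are illustrative midpoints taken from the text, not official rate cards):

```python
# Back-of-the-envelope cost comparison for one agentic coding task,
# using the article's approximate figures (illustrative, not official pricing).
PRICES_PER_M_TOKENS = {
    "Claude Sonnet 4.5": 15.00,   # ~$15 per 1M output tokens
    "MiniMax / Moonshot": 2.50,   # $2-3 per 1M output tokens (midpoint)
}

TASK_TOKENS = 20_000_000  # a single coding task, per the article's estimate

for model, price in PRICES_PER_M_TOKENS.items():
    cost = TASK_TOKENS / 1_000_000 * price
    print(f"{model}: ${cost:.2f} per task")
```

At 20 million tokens the same task costs roughly $300 on Claude versus about $50 on the cheaper models, which is where the 6x figure comes from.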

One Hong Kong developer told the FT that he used to work exclusively with Claude, but at his current volumes that would cost $900 a day. Now he routes 80% of his tasks to China's Kimi and keeps Claude for complex work: $50 a day instead of $900.

As long as the AI industry spent most of its money on training models, the race was for the best benchmark score. But once millions of agents are generating responses around the clock, the main expense becomes inference, the actual production of tokens. The token is becoming a basic commodity, like a kilowatt-hour (recall Altman's famous quote): the winner isn't whoever built the smartest model but whoever provides cheap and stable inference.

Alibaba, for its part, has already created a separate division, Token Hub, led by the company's CEO. The ambition is to become the platform through which inference flows, much as AWS became for cloud computing.