Cloudflare’s Workers AI platform has officially integrated Moonshot AI’s Kimi K2.5, with support for a 256K context window, multi-turn tool calls, and visual input. Cloudflare’s internal security-audit agent processes over 7 billion tokens daily, and switching to this model cut costs by 77% compared with mid-tier commercial models.
(Background: Cursor was previously found to have used Kimi K2.5 for training without disclosing it; developers uncovered this by capturing network traffic, the related prompts were deleted, and the vendor’s quick official response was on record.)
(Additional info: Cloudflare, which helps block web crawlers, launched a “One-Click Whole Site Crawler API” that supports RAG, incremental updates, and model training.)
Cloudflare’s Workers AI platform quietly made a major move. According to the official Cloudflare blog, Moonshot’s Kimi K2.5 is now the default model in the Agents SDK starter. Cloudflare engineers are also using it for real security audits, at a substantial cost saving.
Kimi K2.5 is one of the few open-source models that supports a cutting-edge feature set: a 256K context window, multi-turn tool calling, visual inputs, and structured outputs. For long-context agentic reasoning tasks, these capabilities are genuinely practical.
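To make the feature set concrete, here is a minimal sketch of what a chat request exercising tool calling and structured outputs might look like. The model identifier, tool name, and schema below are illustrative assumptions, not Cloudflare’s or Moonshot’s published names.

```typescript
// Sketch of a chat request using tool calling and structured output.
// Model id ("@cf/moonshotai/kimi-k2.5") and the tool are hypothetical.
interface ToolDef {
  type: "function";
  function: { name: string; description: string; parameters: object };
}

interface ChatRequest {
  model: string;
  messages: { role: "system" | "user"; content: string }[];
  tools: ToolDef[];
  response_format: { type: "json_schema"; json_schema: object };
}

function buildAuditRequest(diff: string): ChatRequest {
  return {
    model: "@cf/moonshotai/kimi-k2.5", // hypothetical model id
    messages: [
      { role: "system", content: "You are a security-audit agent." },
      { role: "user", content: `Review this diff for vulnerabilities:\n${diff}` },
    ],
    tools: [
      {
        type: "function",
        function: {
          name: "flag_finding", // hypothetical tool the agent may call
          description: "Record a security finding",
          parameters: {
            type: "object",
            properties: {
              severity: { type: "string" },
              detail: { type: "string" },
            },
            required: ["severity", "detail"],
          },
        },
      },
    ],
    // Structured output: constrain the final answer to a JSON schema.
    response_format: {
      type: "json_schema",
      json_schema: { type: "object", properties: { summary: { type: "string" } } },
    },
  };
}

const req = buildAuditRequest("- strcpy(buf, input);");
console.log(req.tools[0].function.name); // "flag_finding"
```

The point is that a single request can carry a long prompt (up to the 256K window), a set of callable tools, and an output schema, which is the combination agent frameworks rely on.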
Cloudflare engineers directly used Kimi K2.5 as the main programming agent in the OpenCode environment, deploying a public code review agent called “Bonk” integrated into automated pipelines.
Even more impressive is the internal security audit scenario. This agent handles over 7 billion tokens daily. Using standard-tier commercial models for the same workload would cost about $2.4 million per year. With Kimi K2.5, costs are cut by 77%, saving nearly $1.85 million.
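The reported figures are internally consistent, as a quick back-of-the-envelope check shows (the per-million-token rate at the end is our own derivation from the published numbers, not a quoted price):

```typescript
// Sanity-check the reported savings: 77% off a ~$2.4M/yr baseline.
const baselineAnnualCost = 2_400_000; // standard-tier commercial model, per year
const reduction = 0.77;               // cost cut reported by Cloudflare

const savings = baselineAnnualCost * reduction;  // 1,848,000
const newCost = baselineAnnualCost - savings;    // 552,000

console.log(`saved ~$${(savings / 1e6).toFixed(2)}M`);        // saved ~$1.85M
console.log(`new annual cost ~$${(newCost / 1e3).toFixed(0)}K`); // ~$552K

// Implied blended baseline rate (our derivation, not a quoted figure):
const tokensPerYear = 7e9 * 365; // 7B tokens/day
const costPerMTok = baselineAnnualCost / (tokensPerYear / 1e6);
console.log(`implied baseline ~$${costPerMTok.toFixed(2)} per 1M tokens`);
```

So "nearly $1.85 million" in savings follows directly from the two headline numbers.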
This isn’t advertising—it’s a figure directly shared by Cloudflare engineers on their official blog.
Just swapping the model isn’t enough, though. Cloudflare also rolled out three platform-level improvements aimed at cutting costs and improving efficiency in long-conversation scenarios.
Cloudflare didn’t adopt an off-the-shelf inference framework. Instead, it built a customized serving core around its in-house Infire inference engine, employing data parallelism, tensor parallelism, and expert parallelism, combined with an architecture that handles prompt-prefix processing separately.
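Of those three strategies, expert parallelism is the least familiar: in a mixture-of-experts model, a gating function picks a few experts per token, and different experts can be placed on different devices. The toy routing function below is purely didactic and has nothing to do with Infire’s actual implementation:

```typescript
// Minimal illustration of expert routing in a mixture-of-experts layer:
// a gate score selects the top-k experts for each token; in expert
// parallelism each expert can live on a different device. Didactic only.
type Expert = (x: number[]) => number[];

// Indices of the k largest gate scores.
function topK(scores: number[], k: number): number[] {
  return scores
    .map((s, i) => [s, i] as [number, number])
    .sort((a, b) => b[0] - a[0])
    .slice(0, k)
    .map(([, i]) => i);
}

// Route one token's hidden state: weighted sum over its top-k experts.
function routeToken(x: number[], gate: number[], experts: Expert[], k = 2): number[] {
  const chosen = topK(gate, k);
  const weightSum = chosen.reduce((s, i) => s + gate[i], 0);
  const out = new Array(x.length).fill(0);
  for (const i of chosen) {
    const y = experts[i](x);
    const w = gate[i] / weightSum; // renormalize gate weights over chosen experts
    for (let d = 0; d < out.length; d++) out[d] += w * y[d];
  }
  return out;
}

// Two toy experts: identity and doubling.
const experts: Expert[] = [(x) => x, (x) => x.map((v) => 2 * v)];
console.log(routeToken([1, 1], [0.25, 0.75], experts, 1)); // only expert 1 fires -> [2, 2]
```

Because only the chosen experts run per token, a model with a very large total parameter count keeps its per-token compute (and serving cost) low, which is part of why MoE models are attractive for workloads like the one described above.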
Kimi K2.5 is currently the first large-model inference deployment of this kind on Workers AI, and it signals Cloudflare’s ambition in AI infrastructure: tightly integrated with its web platform, and cost-effective.