CoreWeave Achieves #1 Ranking for Inference Speed and Price-Performance for Moonshot AI’s Kimi K2.6 Model in Independent Benchmark

Full stack optimization across memory architecture, runtime, and interconnect translates into the speed and economics enterprises need to run open-source AI in production
Output Speed (output tokens per second · higher is better · 10,000 input tokens)
Source: Artificial Analysis, accurate as of 5/11/2026

  • CoreWeave: 205
  • Clarifai: 158
  • Azure: 125
  • Cloudflare: 95
  • Fireworks: 80
  • SiliconFlow (FP8): 78
  • Novita: 62
  • Kimi: 48
  • Together.ai (FP4): 44
  • DeepInfra (FP4): 38
  • Parasail: 22

LIVINGSTON, N.J. — May 11, 2025 — CoreWeave, Inc. (Nasdaq: CRWV), The Essential Cloud for AI™, today announced it has achieved the strongest combination of speed and price-performance1 for Moonshot AI’s Kimi K2.6 in independent inference benchmarking conducted by Artificial Analysis. Across 11 inference providers evaluated on the current top open-source model, CoreWeave simultaneously delivered the highest output speed at the most cost-efficient performance level measured.

As AI applications move from training into production, inference efficiency increasingly determines real-world product viability. For organizations running the full AI loop, from training to inference to continuous improvement, throughput, latency, and cost per request directly shape how reliably and economically AI can scale. This matters most where performance is non-negotiable: coding assistants, agentic systems, and real-time enterprise copilots.

“Training launched the first wave of AI, and inference will define the next one. That’s why the effectiveness and economics of inference are becoming critical to organizations bringing AI into the products people use every day,” said Chen Goldberg, Executive Vice President of Product and Engineering at CoreWeave. “This benchmark reflects the investments we’ve made across our full stack, and the deep expertise of CoreWeave engineers in optimizing performance and efficiency. This is a clear signal that speed, responsiveness, and predictable economics are attainable for customers today.”

"Performance gains in inference systems come from optimization across the full stack, including hardware, inference runtime and model configuration,” said George Cameron, Co-founder at Artificial Analysis. “Artificial Analysis benchmarks are intended to give organizations transparency in how inference offerings perform. CoreWeave performed strongly across speed and price-performance dimensions in our benchmarking of providers of Kimi K2.6. For those deploying agents in production, inference speed and price are critical to user experience and to making open source models a viable choice at scale."

The gap between theoretical compute capacity and actual production throughput depends on how well hardware, model optimization, and runtime execution are tuned together. CoreWeave has optimized its platform across all three layers.

The result reflects the company's investment in full-stack infrastructure optimization for production AI workloads. The CoreWeave Inference and Applied Training teams achieved top speed by training an in-house NVFP4 quantization with EAGLE-3 speculative decoding on NVIDIA GB300 NVL72 hardware, delivering 205 tokens/sec at a blended price of $0.70 per million tokens (7:2:1 agentic blend). Teams can access this performance directly through CoreWeave Inference offerings:

  • Serverless Inference, which provides immediate API access to optimized models with no infrastructure to manage.
  • Dedicated Inference, which provides a predictable path to production with explicit control over the number of GPUs for the required scale, while CoreWeave continues to manage all inference services.
  • Inference on CoreWeave Kubernetes Service (CKS), which gives developers direct, bare-metal access to AI infrastructure and deep control over the entire stack.
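For context on the blended price above, a "blend" is a weighted average of per-token rates across token categories. A minimal sketch, assuming the 7:2:1 weights cover input, cached-input, and output tokens (the release does not break down the blend, and the per-category rates below are hypothetical placeholders, not published prices):

```python
def blended_price(weights, prices):
    """Weighted-average price per million tokens across token categories."""
    return sum(w * p for w, p in zip(weights, prices)) / sum(weights)

# Hypothetical per-million-token rates for input, cached-input, and output
# tokens; the actual CoreWeave rates are not stated in this release.
blend = blended_price((7, 2, 1), (0.60, 0.15, 1.70))
print(f"blended price: ${blend:.2f} per million tokens")  # $0.62 with these placeholder rates
```

With real per-category rates substituted in, the same weighting yields the headline blended figure.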
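The output-speed metric in the chart above is simply output tokens divided by wall-clock time. A sketch of measuring it against Serverless Inference, assuming an OpenAI-compatible endpoint (the base URL, API key, and model identifier are illustrative placeholders, not documented values):

```python
import time

def output_tokens_per_second(n_output_tokens: int, elapsed_s: float) -> float:
    """Throughput metric reported in the benchmark: output tokens / wall time."""
    return n_output_tokens / elapsed_s

if __name__ == "__main__":
    # Assumes an OpenAI-compatible API; the base_url, api_key, and model
    # name below are hypothetical placeholders.
    from openai import OpenAI

    client = OpenAI(base_url="https://example-inference-endpoint/v1",
                    api_key="YOUR_API_KEY")
    start = time.monotonic()
    resp = client.chat.completions.create(
        model="kimi-k2.6",  # illustrative model identifier
        messages=[{"role": "user",
                   "content": "Explain speculative decoding in one sentence."}],
    )
    elapsed = time.monotonic() - start
    print(f"{output_tokens_per_second(resp.usage.completion_tokens, elapsed):.1f} tok/s")
```

Note that single-request timing includes time-to-first-token, so benchmark harnesses such as Artificial Analysis typically measure output speed over the streaming portion of the response.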

Artificial Analysis is an independent platform that benchmarks and analyzes AI models, API providers, and infrastructure. It provides data on model quality, speed, cost, and reliability, helping developers and enterprises compare and select AI technologies. Artificial Analysis independently benchmarked Moonshot AI’s Kimi K2.6 across more than 10 core metrics, including MMLU-Pro, GPQA, and agentic coding tasks, to evaluate speed, cost, and reasoning capability.

The Artificial Analysis result is the latest in a series of independent validations of CoreWeave. The company is the only AI cloud to earn the top Platinum ranking in both SemiAnalysis ClusterMAX™ 1.0 and 2.0, which evaluate AI cloud performance, efficiency, and reliability, and has also demonstrated record-breaking MLPerf® benchmark results.

Learn more about CoreWeave’s recognition on our blog or on Artificial Analysis’s website.

1 Price-performance is measured as speed vs. price by Artificial Analysis.

About CoreWeave

CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to move at the pace of innovation, building and scaling AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave serves as a force multiplier by combining superior infrastructure performance with deep technical expertise to accelerate breakthroughs. Established in 2017, CoreWeave completed its public listing on Nasdaq (CRWV) in March 2025. Learn more at www.coreweave.com.

Media Contacts

CoreWeave Media
[email protected]
