Since CoreWeave’s inception, we have consistently been the first to move and the first to scale across multiple generations of AI-native cloud infrastructure, and that has repeatedly delivered optimized, leading performance for our customers. That’s why pioneers like Cursor AI, Cline, Mercado Libre, and 8 of the 10 leading foundation model providers rely on CoreWeave. This leadership was reflected in the MLPerf 5.0 inference benchmarks, and it continues in MLPerf 6.0.
Excellence isn’t a single act; it’s the result of consistent, repeated actions over time that deliver exceptional results. When CoreWeave leads, it doesn’t stand still. In MLPerf 5.0, moving first and scaling first translated into leading NVIDIA GB200 NVL72 inference performance. That lead continues in MLPerf 6.0, where CoreWeave remains the top performer for DeepSeek R1.
CoreWeave doubles NVIDIA GB300 NVL72 MLPerf performance
The NVIDIA GB300 NVL72 is a rack-scale system built for AI reasoning, with 1.5x more dense NVFP4 Tensor Core FLOPS and 2x higher attention performance than NVIDIA Blackwell GPUs. Since deploying GB300 NVL72 systems in production last year, CoreWeave has continued to refine throughput performance. On DeepSeek R1, CoreWeave’s inference throughput improved by 1.6x in offline mode and 2x in server mode over its MLPerf 5.1 results just six months ago.
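For readers newer to MLPerf terminology: the offline scenario measures peak batch throughput with every query available up front, while the server scenario sends queries at a Poisson-distributed arrival rate and requires each one to meet a latency target. The short Python sketch below is a conceptual illustration of that difference only; it is not the MLPerf LoadGen harness, and all of its numbers (service time, query counts, arrival rates) are made-up placeholders.

```python
# Conceptual sketch of MLPerf Inference's two scenarios (assumption: this is
# an illustration only, not the official MLPerf LoadGen harness; all numbers
# are made-up placeholders).
import random

SERVICE_TIME_S = 0.05    # hypothetical time to serve one query
NUM_QUERIES = 10_000
TOKENS_PER_QUERY = 512   # hypothetical output length per query

# Offline scenario: every query is available up front, so the metric is
# aggregate throughput (tokens/s) with no per-query latency constraint.
offline_tokens_per_s = (NUM_QUERIES * TOKENS_PER_QUERY) / (NUM_QUERIES * SERVICE_TIME_S)

def server_p99_latency(arrival_qps: float, seed: int = 0) -> float:
    """Server scenario: queries arrive at a Poisson-distributed rate and each
    must finish within a latency bound; the reported throughput is the highest
    arrival rate the system can sustain while still meeting that bound."""
    rng = random.Random(seed)
    clock, server_free_at, latencies = 0.0, 0.0, []
    for _ in range(NUM_QUERIES):
        clock += rng.expovariate(arrival_qps)   # Poisson inter-arrival times
        start = max(clock, server_free_at)      # queue if the server is busy
        server_free_at = start + SERVICE_TIME_S
        latencies.append(server_free_at - clock)
    latencies.sort()
    return latencies[int(0.99 * len(latencies))]

if __name__ == "__main__":
    print(f"offline: {offline_tokens_per_s:,.0f} tokens/s")
    for qps in (10, 15, 19):
        print(f"server @ {qps} qps: p99 latency {server_p99_latency(qps):.3f} s")
```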

With GPT-OSS-120B, CoreWeave delivered high inference throughput in both offline and server modes on the NVIDIA GB300 NVL72, up to 28% higher than on the NVIDIA GB200 NVL72. For DeepSeek R1, CoreWeave’s offline-mode throughput on the NVIDIA GB300 NVL72 was the highest of all submitters.
CoreWeave demonstrates the performance benefits of NVIDIA GB300 NVL72 vs NVIDIA GB200 NVL72
The NVIDIA GB300 NVL72 shares the same architecture as the NVIDIA GB200 NVL72 but adds more memory and other performance enhancements. CoreWeave’s MLPerf results bear these gains out, with a 14% improvement over the GB200 NVL72 in server mode and a 34% improvement in offline mode.

CoreWeave continues MLPerf NVIDIA GB200 NVL72 inference leadership
CoreWeave was the first AI-native cloud to offer the NVIDIA GB200 NVL72 in production at scale and was the NVIDIA GB200 NVL72 benchmark leader in MLPerf 5.0. That leadership continued in MLPerf 6.0, where CoreWeave led all submitters in the DeepSeek R1 inference benchmarks in both server and offline modes.

From MLPerf to real-world inference products
CoreWeave Inference is architected to support every level of operational need, giving teams a clear path from fully managed serverless to fully self-managed infrastructure as their requirements evolve.
For teams who want to move fastest without managing any infrastructure, CoreWeave offers Serverless Inference through Weights & Biases (W&B) Inference. Teams can deploy and iterate instantly with autoscaling, high availability, and integrated observability out of the box, while pay-per-token pricing keeps costs aligned with actual usage as workloads grow.
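As a concrete illustration of that workflow, the sketch below calls a hosted model through W&B Inference’s OpenAI-compatible API from Python. Treat it as a minimal sketch only: the base URL, model identifier, and key placeholder are assumptions for illustration, so check the current Weights & Biases Inference documentation before relying on them.

```python
# Minimal sketch of calling a hosted model through W&B Inference's
# OpenAI-compatible API. The base URL and model identifier below are
# assumptions for illustration; consult the W&B Inference docs for
# current values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inference.wandb.ai/v1",  # assumed W&B Inference endpoint
    api_key="<your-wandb-api-key>",                # authenticate with your W&B API key
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528",          # assumed catalog model name
    messages=[{"role": "user", "content": "Summarize the MLPerf server vs. offline scenarios."}],
)
print(response.choices[0].message.content)
```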
CoreWeave recently announced Dedicated Inference, which provides lifecycle-supported inference that lets customers bring their own weights and pick GPUs and open runtimes, while CoreWeave handles production operations and cost visibility. Currently in preview, Dedicated Inference is designed to keep performance behavior and cost drivers predictable as workloads scale. AI pioneers can request preview access to Dedicated Inference and experiment with their own inference workloads.
For those who prefer more control, Inference on CKS (CoreWeave Kubernetes Service) enables developers to self-host their models, runtimes, and orchestration. This provides the highest level of infrastructure control, tuning, and end-to-end observability, ensuring the stack meets their performance and security requirements.
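To make the self-hosted path concrete, here is a minimal sketch of the kind of runtime a team might containerize and schedule on CKS. vLLM is used purely as an illustrative open runtime (it is not implied by CoreWeave’s MLPerf submissions), and the model name, GPU count, and sampling settings are placeholder assumptions.

```python
# One possible shape of a self-hosted runtime a team might containerize and
# schedule on CKS. vLLM is used purely as an illustrative open runtime; the
# model name, GPU count, and sampling settings are placeholder assumptions.
from vllm import LLM, SamplingParams

llm = LLM(model="openai/gpt-oss-120b", tensor_parallel_size=8)  # assumed model and GPU count
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(
    ["Explain the difference between the GB200 NVL72 and GB300 NVL72."],
    params,
)
print(outputs[0].outputs[0].text)
```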
Because these inference paths share a consistent architectural base, teams can move from iteration to scaled production without replatforming or introducing cost ambiguity. Performance behavior, scaling dynamics, and cost drivers remain visible and aligned to underlying infrastructure as AI applications evolve.
Results at a glance:
- CoreWeave doubled DeepSeek R1 server-mode throughput on the NVIDIA GB300 NVL72 compared to its MLPerf 5.1 results, in just six months.
- CoreWeave led all submitters in DeepSeek R1 throughput, measured in tokens per second per GPU, on the NVIDIA GB300 NVL72 in offline mode.
- CoreWeave delivered the highest tokens per second per GPU among submitters on the NVIDIA GB200 NVL72 for DeepSeek R1.
- CoreWeave achieved the highest increase from NVIDIA GB200 NVL72 to NVIDIA GB300 NVL72, with more than 28% higher throughput on GPT-OSS-120B in server mode.
Learn more about how CoreWeave can support your AI Inference needs.



