CoreWeave Kubernetes Service

Our managed Kubernetes environment is purpose-built for building, training, and deploying AI applications.

Designed for generative AI

Up to 65% of effective compute capacity embedded in GPUs is lost to system inefficiencies. At CoreWeave, every element of our stack is intentionally built around generative AI. CoreWeave Kubernetes Service (CKS) lies at the heart of that.

Kubernetes on bare metal

We’ve removed the hypervisor layer entirely, so your teams work directly with bare-metal Nodes for optimal Node performance, lower latency, better observability, and faster time to market.

Preconfigured clusters for AI

Free your teams from spending countless hours managing complex Kubernetes clusters. CKS clusters come pre-installed with pre-configured components.

That includes network and storage interfaces, GPU drivers, Slurm-on-Kubernetes, and observability plug-ins, so clusters are ready for production use on day one.
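As a rough illustration of what preinstalled GPU drivers mean for a workload, the sketch below builds a minimal Kubernetes Pod manifest that requests GPUs. It assumes GPUs are exposed under the `nvidia.com/gpu` resource name used by the NVIDIA device plugin, which this page does not confirm for CKS. The Pod name and image are hypothetical placeholders.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int) -> dict:
    """Build a minimal Kubernetes Pod manifest that requests GPUs.

    Assumption: the cluster exposes GPUs as the extended resource
    "nvidia.com/gpu" (the NVIDIA device plugin convention).
    """
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "main",
                    "image": image,  # hypothetical image name
                    "command": ["nvidia-smi"],  # lists the GPUs visible to the container
                    # Extended resources such as GPUs are requested under limits.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = gpu_pod_manifest("gpu-smoke-test", "example.com/cuda-base:latest", 8)
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be saved to a file and applied with `kubectl apply -f`.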

Tightly integrated with AI workload orchestration tools

CKS is built to natively integrate with workload orchestration tools like Slurm, Kubeflow, and KServe to help your developers focus on what they do best: innovating.

Industry-leading performance, scale, and resilience

Spin up GPU superclusters in an environment built for AI workloads, with ultra-low latency, high-speed interconnect, and “human-in-the-loop” automation for top-tier performance.

Get maximum performance from your GPU Nodes

See 20% higher GPU cluster performance with CKS than with alternative solutions, including 5x faster model download speeds and 10x faster spin-up times for inference.

CKS clusters use bare-metal Nodes with NVIDIA BlueField DPUs for offloading Node and resource management processes. That gives you high performance from your GPUs during model training, experimentation, and inference.

Supercomputer-level scale and performance

CKS is powered by NVIDIA InfiniBand with SHARP, the industry’s best cluster scale-out interconnect, and by purpose-built cloud storage services. It supports scaling across clusters with 100k+ GPUs while delivering cutting-edge performance.

Reliability and resilience

CKS is deeply integrated with Mission Control—our collection of cluster health management tools and services.

Experience 50% fewer interruptions per day on CoreWeave Cloud with little to no fleet management overhead.

Purpose-built for AI at every layer

Discover how CoreWeave helps customers get their models to market faster, improve performance for their inference compute, and lower the total cost of ownership.

Watch the video

Enterprise-grade security and observability

Trusted by leading AI labs and enterprises, CKS provides the enterprise-grade security and observability solutions you need to run your mission-critical workloads.

With precise visibility into what’s going on in your clusters, bounce back from workload interruptions quickly and maximize cluster utilization.

Securely connect via Virtual Private Cloud (VPC)

Create isolated CKS clusters with compute and storage resources using VPC networking and encryption support to manage your cloud resources—powered by NVIDIA BlueField DPUs.

Granular observability to pinpoint issues

Traditional virtualized cloud environments provide limited visibility into infrastructure issues.

CoreWeave’s cutting-edge observability tools provide real-time insights into detailed cluster-, Node-, and job-level metrics.

Plus, CKS is complemented by intelligent monitoring that identifies and removes problem Nodes before they can disrupt workloads.

Nip interruptions in the bud

Automated, proactive health-checking continuously runs on idle Nodes, identifying patterns for potential hardware issues and swapping out problem Nodes before they impact your workload.
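The idea of checking idle Nodes and pulling out failing ones can be sketched in a few lines. This is only an illustration of the general pattern, not CoreWeave's implementation: the `Node` type, the metric names, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    name: str
    busy: bool  # True while a workload is running on the Node
    metrics: Dict[str, float] = field(default_factory=dict)
    schedulable: bool = True


# Each check returns True when the Node looks healthy.
# Metric names and thresholds are illustrative assumptions.
HealthCheck = Callable[[Node], bool]

CHECKS: Dict[str, HealthCheck] = {
    "gpu_temperature": lambda n: n.metrics.get("gpu_temp_c", 0.0) < 85.0,
    "ecc_errors": lambda n: n.metrics.get("ecc_errors", 0.0) == 0,
}


def sweep_idle_nodes(nodes: List[Node]) -> List[str]:
    """Run every check on idle Nodes and mark failing Nodes unschedulable.

    Returns the names of the Nodes that were taken out of rotation.
    """
    removed = []
    for node in nodes:
        if node.busy:
            continue  # never disturb a Node that is running a workload
        failed = [name for name, check in CHECKS.items() if not check(node)]
        if failed:
            node.schedulable = False
            removed.append(node.name)
    return removed


if __name__ == "__main__":
    fleet = [
        Node("node-a", busy=False, metrics={"gpu_temp_c": 60.0}),
        Node("node-b", busy=False, metrics={"ecc_errors": 3.0}),
        Node("node-c", busy=True, metrics={"ecc_errors": 9.0}),
    ]
    print(sweep_idle_nodes(fleet))  # prints ['node-b']
```

In this sketch the busy Node is skipped even though its metrics look bad, which mirrors the point above that checks run on idle Nodes so a replacement can happen before a workload lands there.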

Your teams directly benefit from our learnings and experience managing some of the industry’s largest GPU deployments.

A full stack of solutions

CKS was made to support developers with AI workloads. That’s why CKS leverages a holistic tech stack that makes building and deploying AI applications faster, easier, and more cost-efficient.

See the power of SUNK

SUNK runs Slurm on CKS, letting you easily run Slurm jobs and containerized workloads on the same cluster. That gives you better workload fungibility and greater resource utilization.

Learn more →

Cut time with Tensorizer

Never waste time waiting for models to load. Tensorizer accelerates model loading times in your CKS Nodes by serializing AI models and their tensors into a single file and streaming them from HTTPS or S3 endpoints.
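To make the single-file, streaming idea concrete, here is a toy sketch using only the Python standard library. It is not Tensorizer's file format or API: the layout (a length-prefixed JSON index followed by raw float32 data) and the function names are assumptions chosen for illustration.

```python
import io
import json
import struct
from typing import BinaryIO, Dict, Iterator, List, Tuple


def write_tensors(out: BinaryIO, tensors: Dict[str, List[float]]) -> None:
    """Write all tensors to one stream: a JSON index, then raw float32 data."""
    index = []
    payloads = []
    for name, values in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)  # little-endian float32
        index.append({"name": name, "nbytes": len(raw)})
        payloads.append(raw)
    header = json.dumps(index).encode("utf-8")
    out.write(struct.pack("<Q", len(header)))  # 8-byte header length prefix
    out.write(header)
    for raw in payloads:
        out.write(raw)


def stream_tensors(src: BinaryIO) -> Iterator[Tuple[str, List[float]]]:
    """Yield tensors one at a time, reading the stream strictly front to back.

    Assumes src.read(n) returns exactly n bytes, as files and BytesIO do.
    A raw network stream would need a loop that reads until n bytes arrive.
    """
    (header_len,) = struct.unpack("<Q", src.read(8))
    index = json.loads(src.read(header_len))
    for entry in index:
        raw = src.read(entry["nbytes"])
        count = entry["nbytes"] // 4  # 4 bytes per float32
        yield entry["name"], list(struct.unpack(f"<{count}f", raw))


if __name__ == "__main__":
    buf = io.BytesIO()
    write_tensors(buf, {"weight": [1.0, 2.5], "bias": [-0.5]})
    buf.seek(0)
    for name, values in stream_tensors(buf):
        print(name, values)
```

Because the reader never seeks backward, the same loop could in principle consume bytes as they arrive from a remote endpoint, so each tensor is usable as soon as it has been read.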

Learn more →

Do more with Mission Control

CoreWeave Mission Control ensures CKS cluster readiness at delivery. Comprehensive monitoring tracks the health of all infrastructure components, enabling optimal cluster performance and resiliency.

Learn more →

Start building on CKS today

Don’t settle for a Kubernetes platform built for web applications. Use a platform made for AI.

Get started
