Event details
How AI Training Infrastructure Must Evolve
What once ran on small clusters now spans thousands of GPUs across multi-day runs. Yet most teams still manage that complexity across two disconnected environments: Slurm for research and Kubernetes for production.
That split worked early on, but at today’s scale it creates overhead that slows model progress, increases errors, and reduces GPU utilization.
That’s why CoreWeave built SUNK: a unified AI training infrastructure designed to eliminate the research-to-production gap.
In this 30-minute briefing, we’ll explore what SUNK is today, how guided self-service deployment helps standardize adoption, and how SUNK Anywhere extends that same unified operating model across infrastructure environments.
We’ll cover:
- Why Slurm-based research and Kubernetes-based production infrastructure fragments at scale
- An introduction to SUNK and how it delivers a more unified way to run demanding AI workloads
- How guided self-service deployment supports faster, more predictable, and more standardized cluster bring-up
- How SUNK Anywhere extends the same unified training system across on-prem, hybrid, and cloud environments
- When teams should evaluate SUNK based on workload scale, infrastructure consolidation needs, and readiness for a more standardized operating model