- Fast, flexible infrastructure for optimal performance
CoreWeave is a unique, Kubernetes-native cloud, which means you get the benefits of bare metal without the infrastructure overhead. We do all of the heavy Kubernetes lifting, including dependency and driver management and control plane scaling, so your workloads just... work.
- Superior networking architecture, with NVIDIA InfiniBand
Our HGX H100 distributed training clusters are built with a rail-optimized design using NVIDIA Quantum-2 InfiniBand networking, which supports in-network collectives with NVIDIA SHARP and provides 3.2 Tbps of GPUDirect bandwidth per node.
- Easily migrate your existing workloads
CoreWeave is optimized for NVIDIA GPU-accelerated workloads out of the box, so you can run your existing workloads with minimal to no change. Whether you run on SLURM or are container-forward, we have easy-to-deploy solutions that let you do more with less infrastructure wrangling.
The NVIDIA HGX H100 is here, and so are supercomputer instances in the cloud.
Want to get your hands on the most powerful supercomputer for AI and Machine Learning?
You’ve come to the right place.
Available at supercomputer scale, starting at $2.23/hr.
The NVIDIA HGX H100 is designed for large-scale HPC and AI workloads
The HGX H100 delivers 7x better efficiency in high-performance computing (HPC) applications, up to 9x faster AI training on the largest models, and up to 30x faster AI inference than the NVIDIA HGX A100. Yep, you read that right.
What’s inside a CoreWeave Cloud HGX H100 Instance?
2x Intel 4th Gen Xeon Scalable 8462Y+ CPUs (128 vCPU)
1 TB DDR5 System RAM
3200 Gbps of GPUDirect InfiniBand Networking (8x 400 Gbps InfiniBand NDR Adapters)
100 Gbps Ethernet Networking
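As a quick sanity check on the networking figures above, the per-node GPUDirect bandwidth is simply the adapter count times the per-adapter rate. The sketch below uses only the numbers from the list (8 adapters at 400 Gbps each); the function names are illustrative, and the result is a nominal link rate, not a measured throughput.

```python
# Nominal figures taken from the instance spec list above.
ADAPTERS_PER_NODE = 8      # 8x InfiniBand NDR adapters
GBPS_PER_ADAPTER = 400     # nominal NDR link rate in gigabits per second


def node_bandwidth_gbps(adapters=ADAPTERS_PER_NODE, gbps_per_adapter=GBPS_PER_ADAPTER):
    """Aggregate nominal InfiniBand bandwidth for one node, in Gbps."""
    return adapters * gbps_per_adapter


def gbps_to_gigabytes_per_second(gbps):
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8


total_gbps = node_bandwidth_gbps()
print(f"{total_gbps} Gbps = {total_gbps / 1000} Tbps "
      f"= {gbps_to_gigabytes_per_second(total_gbps)} GB/s (nominal)")
```

Real-world throughput will come in below these nominal rates because of protocol overhead and workload communication patterns.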
- HGX H100 for Model Training
Tap into our state-of-the-art distributed training clusters, at scale
CoreWeave's HGX H100 infrastructure can scale up to 16,384 H100 SXM5 GPUs under a single non-blocking InfiniBand fat-tree fabric, giving you access, at massive scale, to the world's most performant and deeply supported model training accelerators.
Our infrastructure is purpose-built to solve the toughest AI/ML and HPC challenges. You gain performance and cost savings via our bare-metal Kubernetes approach, our high-capacity data center network designs, our high-performance storage offerings, and so much more.
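To put the 16,384-GPU figure in perspective, here is a rough back-of-the-envelope sketch of how many nodes that implies. It assumes 8 GPUs per HGX H100 node (an assumption, inferred from the 8 InfiniBand adapters per instance rather than stated on this page) and reuses the nominal 3200 Gbps per-node figure. The names are illustrative only.

```python
import math

MAX_GPUS = 16_384          # maximum GPUs under one fabric, from the text above
GPUS_PER_NODE = 8          # ASSUMPTION: 8 GPUs per HGX H100 node
NODE_GBPS = 3200           # nominal GPUDirect InfiniBand bandwidth per node


def nodes_for_gpus(gpus, gpus_per_node=GPUS_PER_NODE):
    """Number of nodes needed to host the given GPU count (rounded up)."""
    return math.ceil(gpus / gpus_per_node)


def aggregate_injection_tbps(nodes, node_gbps=NODE_GBPS):
    """Sum of nominal per-node link rates across all nodes, in Tbps."""
    return nodes * node_gbps / 1000


nodes = nodes_for_gpus(MAX_GPUS)
print(f"{MAX_GPUS} GPUs -> {nodes} nodes, "
      f"{aggregate_injection_tbps(nodes)} Tbps nominal injection bandwidth")
```

The aggregate number is just the sum of nominal link rates. What a given training job actually sees depends on its communication pattern and collective algorithms.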
- HGX H100 NETWORK PERFORMANCE
Avoid rocky training performance with CoreWeave’s non-blocking GPUDirect fabrics built exclusively using NVIDIA InfiniBand technology.
CoreWeave’s NVIDIA HGX H100 supercomputer clusters are built using NVIDIA InfiniBand NDR networking in a rail-optimized design, supporting NVIDIA SHARP in-network collectives.
Training AI models is incredibly expensive, so our designs are painstakingly reviewed to make sure your training experiments leverage the best technologies and maximize your compute per dollar.
- HGX H100 Deployment Support
Scratching your head with on-prem deployments? Don’t know how to optimize your training setup? Utterly confused by the options at other cloud providers?
CoreWeave delivers everything you need out of the box to run optimized distributed training at scale, with industry-leading tools like Determined.AI and SLURM.
Need help figuring something out? Leverage CoreWeave’s team of ML engineers at no extra cost.
- HGX H100 for Inference
Highly configurable compute with responsive auto-scaling
No two models are the same, and neither are their compute requirements. With customizable configurations, CoreWeave provides the ability to “right-size” inference workloads with economics that encourage scale.
- HGX H100 Storage Solutions
Flexible storage solutions with zero ingress or egress fees
Storage on CoreWeave Cloud is managed separately from compute, with All NVMe, HDD and Object Storage options to meet your workload demands.
Get up to 10,000,000 IOPS per Volume on our All NVMe Shared File System tier, or leverage our NVMe accelerated Object Storage offering to feed all your compute instances from the same storage location.