Datadog Integration: In Summary
- The Datadog integration gives CoreWeave users greater visibility into usage metrics, infrastructure stability, and billing on CoreWeave Cloud.
- Thanks to Datadog, CoreWeave users get enterprise-grade reliability and observability for their workloads, both of which are critical as customers build and deploy AI applications in production.
- CoreWeave Cloud users can learn more about the specific metrics and monitoring capabilities from Datadog’s documentation.
CoreWeave is now integrated with Datadog, the monitoring and security platform for cloud applications. This integration allows users to monitor, assess, and optimize their organization’s usage of CoreWeave Cloud, providing enterprise-grade reliability and observability, which are essential for building and deploying AI applications in production.
CoreWeave is a premium cloud provider purpose-built for compute-intensive workloads like AI and model training. As a serverless cloud, CoreWeave runs Kubernetes directly on bare metal—so users can deploy containerized workloads with increased portability, less complexity, and lower costs. This enables customers to run workloads with common Kubernetes commands while CoreWeave manages and maintains the underlying infrastructure.
Companies are innovating and deploying AI applications to production faster than ever before. To do this with confidence in their underlying infrastructure, teams need enterprise-level observability. The integration with Datadog provides this critical capability, enabling CoreWeave customers to reliably run AI models in production.
Tracking usage patterns through Datadog gives users more information to better understand and optimize how they use the CoreWeave Cloud platform. Datadog enables users to track environment resource consumption over time and filter metrics using tags of their choice, including tags provided by Prometheus, like pod, container, and namespace. For organizations with multiple namespaces, Datadog enables them to separate out teams’ usage of CoreWeave by namespace.
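To make the idea of separating usage by tag concrete, here is a minimal Python sketch. It does not call the Datadog API; the metric name `cpu_cores_used`, the sample values, and the helper `usage_by_tag` are all hypothetical and only illustrate how tagged data points might be filtered and grouped by namespace.

```python
from collections import defaultdict

# Hypothetical samples: (metric name, tags, value).
# These names and numbers are illustrative, not real integration metrics.
samples = [
    ("cpu_cores_used", {"namespace": "team-a", "pod": "train-0", "container": "trainer"}, 8.0),
    ("cpu_cores_used", {"namespace": "team-a", "pod": "train-1", "container": "trainer"}, 4.0),
    ("cpu_cores_used", {"namespace": "team-b", "pod": "serve-0", "container": "api"}, 2.0),
    ("memory_gb_used", {"namespace": "team-b", "pod": "serve-0", "container": "api"}, 16.0),
]


def usage_by_tag(samples, metric, tag):
    """Sum the values of one metric, grouped by the value of one tag."""
    totals = defaultdict(float)
    for name, tags, value in samples:
        # Keep only the requested metric, and only samples carrying the tag.
        if name == metric and tag in tags:
            totals[tags[tag]] += value
    return dict(totals)


cpu_by_namespace = usage_by_tag(samples, "cpu_cores_used", "namespace")
print(cpu_by_namespace)
```

Grouping by `pod` or `container` instead of `namespace` works the same way; only the tag argument changes.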
Datadog’s recommended monitors for this integration allow users to proactively track these metrics and strategically plan how to better utilize resources. Examples of metrics you can use include:
Users can also monitor pods, containers, and other Kubernetes infrastructure within CoreWeave Cloud to better analyze and optimize infrastructure stability and reliability.
The Datadog and CoreWeave integration comes with an out-of-the-box dashboard that helps you understand deployment successes and failures. Users can identify pods or containers that consume too many resources or too much network I/O, and detect pod failures and potentially bad deployments.
The integration also comes with three recommended monitors to alert users when CPU usage, memory usage, or billing is getting too high.
Datadog also provides greater visibility into how organizations are billed and helps pinpoint where expenses come from within CoreWeave Cloud. Users can detect billing abnormalities and receive alerts when they occur, helping teams address changes quickly and determine which pods or namespaces are the most expensive. With access to total costs and the hourly cost of GPU, CPU, and memory, users can better estimate costs for future projects.
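As a rough illustration of how hourly cost figures could feed a project estimate, consider the sketch below. The rates are placeholder values, not CoreWeave prices, and the function `estimate_cost` and the resource names are assumptions made for this example.

```python
# Placeholder hourly rates per unit of each resource (NOT real prices).
hourly_rates = {"gpu": 2.0, "cpu": 0.5, "memory_gb": 0.25}


def estimate_cost(hourly_rates, resources, hours):
    """Estimate total cost: sum of (rate * quantity) per hour, times hours."""
    hourly_total = sum(hourly_rates[name] * qty for name, qty in resources.items())
    return hourly_total * hours


# Hypothetical project: 4 GPUs, 16 CPU cores, 64 GB of memory for 10 hours.
project = {"gpu": 4, "cpu": 16, "memory_gb": 64}
print(estimate_cost(hourly_rates, project, hours=10))
```

In practice, the rates would come from the hourly cost figures observed for your own account rather than from hard-coded numbers.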
Billing metrics that can be monitored by Datadog include:
Read Datadog’s documentation to learn more about monitoring usage on CoreWeave today. If you’re new to Datadog, you can sign up for a 14-day free trial, or reach out to our CoreWeave team to learn more about our platform.