Event details
Why your inference budget drifts after you ship to production
AI inference budgets rarely drift for the reasons teams expect. What looks like usage growth, model inefficiency, or pricing surprises often traces back to something more fundamental: four hidden cost categories that show up only in production, and that agentic AI workloads are amplifying right now.
In this webinar, we’ll cover:
- Why production inference budgets drift in ways that don’t show up on any pricing page
- Why agents burn 10–100x more tokens per task than single-turn chatbots — and what that does to the CFO conversation
- How cold starts compound under agentic multi-step reasoning loops — and what mitigations actually cost
- Why tool-call fan-out turns one user request into many backend calls and amplifies traffic spikes
- How the agent execution graph deepens observability gaps that were already a problem at scale
- How to shift the inference cost conversation from "tokens per dollar" to "performance per dollar at production scale"
Speakers

Tara Madhyastha
CoreWeave, Senior Field Engineer


