The Hidden Costs of Production Inference

Event details

Tara Madhyastha

Senior Field Engineer

CoreWeave

Jul 29, 2026

11:00 am

45 minutes

Why your inference budget drifts after you ship to production

AI inference budgets don’t usually drift for the reasons teams expect. What looks like usage growth, model inefficiency, or pricing surprises is often something more fundamental: four hidden cost categories that show up only in production and that agentic AI workloads are amplifying right now.

In this webinar, we’ll cover:

Why production inference budgets drift in ways that don’t show up on any pricing page
Why agents burn 10–100x more tokens per task than single-turn chatbots — and what that does to the CFO conversation
How cold starts compound under agentic multi-step reasoning loops — and what mitigations actually cost
Why tool-call fan-out turns one user request into many backend calls and amplifies traffic spikes
How the agent execution graph deepens observability gaps that were already a problem at scale
How to shift the inference cost conversation from "tokens per dollar" to "performance per dollar at production scale"