GLM 5.2, the newest open-weight model from Z.ai, is available now on CoreWeave Inference. Building on the recent optimizations we delivered for Kimi K2.7 Code, our engineering teams have tuned GLM 5.2 and once again achieved one of the highest rankings for inference speed and price-performance on Artificial Analysis. That means you can start building with GLM 5.2 immediately through a fully managed endpoint, and get fast, cost-efficient inference with no infrastructure to provision or inference stack to manage.
GLM 5.2 marks a new era for open-weight models
For years, the most capable AI models were available primarily through proprietary APIs. Open models gave teams more flexibility and control, but they often lagged behind on capability. A growing set of leading open-weight models—including GLM 5.2, Kimi K2.7 Code, and others served by CoreWeave—now hold their own on the benchmarks once led by proprietary endpoints. For engineers, researchers, and enterprises, that expands the toolkit: open-weight models can be tuned and run as part of core workflows, alongside the proprietary models many teams also depend on, with greater control over deployment, governance, and infrastructure.
Released on June 16, 2026, GLM 5.2 is one of the most capable open-weight models available. It currently scores 51 on the Artificial Analysis Intelligence Index, the top score among open-weight models. It ships with a longer context window than the 200,000 tokens supported by GLM 5.1, and a permissive MIT license that allows self-hosting, fine-tuning, and commercial use.
GLM 5.2’s strongest results come in the areas drawing the most attention from developers: coding, software engineering, and long-horizon agent workflows. On FrontierSWE, a benchmark designed to evaluate real-world software engineering tasks, GLM 5.2 edges out GPT-5.5 and trails only Claude Fable 5 and Claude Opus 4.8, the top-ranked models. Early reactions from developers have reflected those results, with many highlighting its ability to generate production-ready applications, work across large codebases, and support more autonomous software engineering workflows.
Bringing frontier open models to production faster
CoreWeave Inference gives teams rapid access to the strongest open-weight models, with the performance and reliability production workloads demand. GLM 5.2 joins a growing catalog of advanced open-weight models on CoreWeave, and teams can access it through a managed endpoint, scale usage as demand changes, and focus on building applications rather than operating clusters.
What does that look like in practice? A team evaluating GLM 5.2 for agentic development can start the way most teams do: call the model through an online endpoint, compare it against their current baseline, and decide whether it earns a place in their workflow. That part is easy anywhere. The harder part comes next. Moving a model from "promising in testing" to "running in production" means matching hardware to the model, tuning the serving stack, and holding performance steady under real traffic. On CoreWeave, that path is already built. The endpoint a team evaluates on is backed by high-performance hardware, an optimized inference stack, and CoreWeave's model-level tuning, so strong performance is the default rather than something the team has to engineer. And because the weights are open, the team keeps the option to fine-tune or self-host later without re-platforming.
Built for more than just an endpoint
Providing quick and easy access to a model at launch is the baseline. For teams running AI in production, performance, reliability, and scalability matter as much as access and model quality.
Fast access to a model matters little if the model runs slowly in production. CoreWeave currently delivers the second-fastest GLM 5.2 output speed on the Artificial Analysis Inference Providers leaderboard. That speed is the work of CoreWeave's inference engineering teams, who tune each model for the hardware it runs on rather than serving it off the shelf. Faster inference improves user experience, shortens agent completion times, and increases the amount of work applications can perform within a given budget.
That work goes beyond model-level tuning. CoreWeave optimizes the full inference stack: GPUs and networking, model and container loading, the KV cache, and the serving runtime. Each layer contributes incremental performance gains that add up from metal to model, and together they let CoreWeave serve a model like GLM 5.2 at full speed as soon as developer demand for it takes off.
Open-weight models are advancing rapidly, and GLM 5.2 is the latest example of how quickly the landscape is evolving. As new models emerge, CoreWeave will continue to make them available quickly, optimize them for production performance, and help customers evaluate them without the operational burden of managing inference infrastructure themselves. We look forward to sharing more about the engineering work behind these optimizations—and bringing the next generation of open-weight models to customers as soon as they are ready.