glm-4-airx

glm-4-airx

The high-performance language model launched by Zhipu AI, along with its low-latency, high-concurrency intelligent agent task execution engine, delivers outstanding performance in tool calls, real-time responses, and complex logic processing.
2025-04-14
LLM
Model capability: function_call
Input:
$1.4/1M tokens
Output:
$1.4/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

GLM-4-AirX is a high-performance language model launched by Zhipu AI, primarily positioned as a low-latency, high-concurrency intelligent agent task execution engine that excels in tool calls, real-time responses, and complex logic processing.

  • Performance rivals top international models: In benchmarks such as BFCL-v3 (comprehensive tool calls) and TAU-Bench (intelligent agent tasks), GLM-4-AirX achieves performance metrics that are on par with—or even surpass—those of larger models like GPT-4o and DeepSeek-V3 in certain areas.
  • Atom-level capabilities enhanced through reinforcement learning: By leveraging rejection sampling and reinforcement learning techniques, GLM-4-AirX significantly improves performance in core agent tasks such as instruction following, code generation, and function calls.
  • Ultra-low latency response in milliseconds: Optimization of the prefill and decoder autoregressive output stages during inference enables faster response times, making it ideal for real-time interaction scenarios.
  • High-concurrency enterprise-grade support: V3-level users can handle up to 500 concurrent requests, meeting the high-frequency call demands of applications such as financial risk control and e-commerce customer service.
  • Exceptional cost-effectiveness: As a high-speed version of GLM-4-Air, GLM-4-AirX features comprehensive upgrades in speed and concurrency, with call costs reduced by more than 30% compared to similar flagship models.

───────────────────────────────────────────────────────────────────

Core Capabilities

⚡ Millisecond-level real-time response:

Optimized inference architecture ensures that complex logic processing occurs within milliseconds, guaranteeing smooth multi-turn conversations and real-time retrieval.

🔧 Intelligent tool calls:

Enhanced Function Call capabilities enable seamless integration with external systems such as search engines and databases.

🤖 Optimized intelligent agent tasks:

Specific enhancements in instruction following and code generation capabilities make GLM-4-AirX well-suited for atomic task execution scenarios required by intelligent agents.

📈 High-concurrency enterprise-grade support:

V3 users enjoy support for up to 500 concurrent requests, ensuring stable performance in high-frequency interaction scenarios such as finance and e-commerce.

🌐 Deep adaptation across multiple scenarios:

Well-balanced for needs including code generation, tool integration, and real-time responses, making GLM-4-AirX the core engine for enterprise-level intelligent agents.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat (Zhipu GLM-4)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

glm-4-airx

-
8000

Input$1.4 / 1M tokens
Output$1.4 / 1M tokens

Input$1.4/ 1M tokens
Output$1.4/ 1M tokens
Original Price