glm-4.7-flashx

The lightweight, high-speed version of GLM-4.7, a language model that balances performance and cost-effectiveness.
2026-01-20
LLM
Model capabilities: thinking, function_call
Input: $0.0715 / 1M tokens
Output: $0.429 / 1M tokens

API Overview

GLM-4.7-FlashX is a lightweight, high-speed text language model from Zhipu AI, positioned as an “efficient inference engine tailored for agentic coding and high-frequency interaction scenarios.” It retains the core capabilities of GLM-4.7 while prioritizing response speed and resource efficiency.

Key Upgrades: As the lightweight, high-speed member of the GLM-4.7 series, it is optimized for low-latency, high-concurrency workloads, with faster inference as the main gain.

Applicable Scenarios: It suits latency-sensitive scenarios such as agentic coding, intelligent customer service, real-time frontend generation, and multi-turn collaborative dialogues.

Product Value: It retains key capabilities like tool invocation and structured output, reduces deployment costs, and can be easily integrated into existing systems.

Performance Advantages: It supports a context window of up to 200K tokens and a maximum output length of 128K tokens, balancing long-context tasks with fast responses.

Developer-Friendly: It natively supports streaming output, Function Calls, and MCP tool invocations, enabling seamless integration with agent workflows.
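As a rough illustration of consuming streaming output, the sketch below joins text fragments from a server-sent-events style stream. The chunk layout (`data:` lines carrying JSON with `choices[].delta.content`, ending in a `[DONE]` sentinel) is an assumption borrowed from common OpenAI-compatible chat APIs, not something this page specifies. The sample lines are hand-written, and `collect_stream` is a hypothetical helper name.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> str:
    """Join the text deltas from an SSE-style stream into one string."""
    parts = []
    for raw in lines:
        line = raw.strip()
        # Skip keep-alive blanks and anything that is not a data event.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


# Hand-written sample chunks (assumed format), not captured from the service.
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(collect_stream(sample))  # prints "Hello"
```

In a real client the `lines` iterable would come from the HTTP response body. Rendering each fragment as it arrives, rather than joining at the end, is what produces the incremental effect.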

───────────────────────────────────────────────────────────────────

Core Capabilities

Ultra-Fast Response: Its lightweight architecture achieves first-token latency in milliseconds, making high-frequency interactions smoother.

🧠 Intelligent Reasoning: It offers multiple reasoning modes, flexibly adapting to various task requirements such as coding, Q&A, and content creation.

🛠️ Powerful Tool Collaboration: It supports Function Calls and the MCP protocol, allowing it to invoke external tools and data sources to expand its functional boundaries.

💬 Streaming Interaction Experience: It provides real-time, word-by-word output, creating a human-like dialogue rhythm and enhancing user immersion.

🗃️ Efficient Long-Context Processing: With a 200K token input window and intelligent context caching, long conversations remain smooth without lagging.

🧾 Structured Output: It natively supports formats like JSON, making it easy for backend systems to parse directly and reducing secondary processing costs.
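A minimal sketch of how a backend might consume structured output and function calls: both arrive as JSON text that has to be parsed and checked before use. The tool-call shape (`function.name` plus a JSON-string `function.arguments`) is assumed from OpenAI-style APIs and is not confirmed by this page. `parse_json_reply`, `dispatch_tool_call`, and `get_order_status` are hypothetical names for illustration.

```python
import json
from typing import Any, Callable, Dict, Set


def parse_json_reply(content: str, required: Set[str]) -> Dict[str, Any]:
    """Parse a JSON reply and check that the expected keys are present."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = required - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


def dispatch_tool_call(call: Dict[str, Any],
                       tools: Dict[str, Callable[..., Any]]) -> Any:
    """Run the local function named in a tool call with its JSON arguments."""
    fn = call["function"]
    handler = tools[fn["name"]]           # KeyError if the tool is unknown
    kwargs = json.loads(fn["arguments"])  # arguments assumed to be a JSON string
    return handler(**kwargs)


def get_order_status(order_id: str) -> str:
    # Hypothetical tool for a customer-service scenario.
    return f"order {order_id}: shipped"


# Hand-written examples of a structured reply and a tool call.
reply = parse_json_reply(
    '{"intent": "track_order", "order_id": "A17"}',
    {"intent", "order_id"},
)
call = {
    "function": {
        "name": "get_order_status",
        "arguments": json.dumps({"order_id": reply["order_id"]}),
    }
}
print(dispatch_tool_call(call, {"get_order_status": get_order_status}))
```

Validating required keys up front keeps malformed replies from propagating into downstream systems. The result of a dispatched tool would normally be sent back to the model as a follow-up message.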

───────────────────────────────────────────────────────────────────

Demonstration of Performance (based on GLM-4.7)


API Reference (1)

API Description: Chat (Zhipu GLM Multimodal)
Request Method: POST
Stability: Stable

API Pricing

Model: glm-4.7-flashx
Description: glm-4.7-flashx
Context: 200,000 tokens
Official Price: Input $0.0715 / 1M tokens, Output $0.429 / 1M tokens
302.AI Price: Input $0.0715 / 1M tokens, Output $0.429 / 1M tokens
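To make the per-token pricing concrete, here is a small sketch that estimates spend from token counts using the listed rates ($0.0715 per 1M input tokens, $0.429 per 1M output tokens). `estimate_cost` is a hypothetical helper. It assumes billing is linear in tokens and ignores any caching or volume discounts, which this page does not quantify.

```python
from decimal import Decimal

# Listed prices in USD per 1M tokens.
INPUT_PRICE_PER_M = Decimal("0.0715")
OUTPUT_PRICE_PER_M = Decimal("0.429")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate cost in USD, assuming simple linear per-token billing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = INPUT_PRICE_PER_M * input_tokens + OUTPUT_PRICE_PER_M * output_tokens
    return total / TOKENS_PER_M


# Example workload: 10,000 requests, each with 2,000 input and 500 output tokens.
requests = 10_000
cost = estimate_cost(requests * 2_000, requests * 500)
print(f"${cost:.3f}")  # prints "$3.575"
```

Using `Decimal` avoids floating-point rounding noise when the totals are later summed across many requests.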