zai-org/autoglm-phone-9b-multilingual

zai-org/autoglm-phone-9b-multilingual

A visual-language reasoning engine specifically designed for mobile agents.
2025-12-11
LLM
Model capability: image
Input:
$0.036/1M tokens
Output:
$0.143/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

AutoGLM-Phone-9B-Multilingual is a 9-billion-parameter open-source multimodal model launched by FaceWall Intelligence (Zai-org). Its core purpose is to serve as a lightweight, on-device visual-language reasoning engine specifically designed for phone agents, aiming to enable multimodal perception and understanding of the screen to automatically execute actions.

  • Specially tailored for phone agents: Developed based on the AutoGLM framework, this model is optimized for controlling mobile devices. It uses a visual-language model to parse screen interface elements in real time, enabling intent understanding and task execution.
  • Multi-modal interaction capability: Supports both text and image inputs, capable of comprehending complex mobile screen content and automatically generating operation steps (such as taps and swipes) to complete end-to-end task loops.
  • Cost-effective inference: The model costs 0.25 yuan/Mt for input and 1 yuan/Mt for output, significantly reducing inference costs for enterprise-level applications compared to similar large models.
  • Secure and controllable mechanism: Equipped with prompts for confirming sensitive operations, it automatically switches to human intervention when encountering logins or verification codes. It also supports WiFi/network-based remote ADB debugging, ensuring the security of remote control.
  • Extensive language support: As a Multilingual version, it supports instruction understanding and interaction in multilingual environments, making it suitable for global application scenarios.

───────────────────────────────────────────────────────────────────

Core Capabilities

📱 Deep screen perception and understanding

Utilizing a visual-language model to parse mobile screen UI elements in real time, accurately identifying icons, buttons, and text, and converting pixel information into actionable semantic instructions.

🤖 End-to-end task automation

Based on the AutoGLM framework, it can automatically plan operation paths according to natural language instructions (e.g., “Open Xiaohongshu and search for food”), and use ADB (Android Debug Bridge) to perform screen operations such as taps and swipes.

🌐 Multi-modal input and remote control

Supports mixed text-and-image inputs and integrates WiFi/network-based remote ADB debugging capabilities, making it easy to achieve remote device control and management across networks.

🛡️ Intelligent security and human handoff

Equipped with built-in security mechanisms, it automatically triggers confirmation prompts for operations involving privacy or critical decisions (such as logins and payments), and seamlessly switches to human intervention when encountering verification codes.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat(PPIO)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

zai-org/autoglm-phone-9b-multilingual

-
65536

Input$0.036 / 1M tokens
Output$0.143 / 1M tokens

Input$0.036/ 1M tokens
Output$0.143/ 1M tokens
Original Price