GLM-ASR-2512

Zhipu's next-generation speech recognition model supports real-time conversion of speech into high-quality text.
2025-12-15
Audio-Video Processing
Pricing: $0.025/M tokens

API Overview

GLM-ASR-2512 is a next-generation speech recognition model developed by Zhipu. It converts speech into high-quality text in real time and adapts to multiple scenarios and diverse accents. Its focus is accurate, efficient speech-to-text that speeds up voice input and record-keeping.

  • Precise Recognition: In the latest benchmark tests, its character error rate (CER) is as low as 0.0717 (about 7.2%), a level described as internationally leading and comparable to the world’s top speech recognition models.
  • Efficient Custom Dictionary: It supports quick import of proprietary vocabulary, project codes, and rare personal and place names. Once configured, these entries stay in effect long term, reducing the cost of manual correction.
  • Advantages in Complex Scenarios: In complex scenarios such as mixed Chinese-English speech, industry-specific terminology, long sentences, colloquial expressions, and command-based utterances, it consistently delivers high-quality text output, outperforming similar models overall.
  • Multi-language and Dialect Processing: It supports Mandarin Chinese as well as major dialects including Cantonese, Sichuanese, Minnan, and Wu, while also covering various English accents and dozens of mainstream languages such as French, German, Japanese, Korean, Spanish, and Arabic.
  • Clear Upload Limits: It supports audio input and text output, with a single file size ≤ 25 MB and an audio duration ≤ 30 seconds.
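The upload limits in the last bullet can be checked locally before a file is sent. The following is a minimal sketch, not part of the GLM-ASR-2512 API: the function name `check_upload_limits` is hypothetical, it handles WAV input only (via Python's standard `wave` module), and it assumes "25 MB" means 25 × 2^20 bytes.

```python
import os
import wave

# Limits taken from the overview; treating 1 MB as 2**20 bytes is an assumption.
MAX_BYTES = 25 * 1024 * 1024
MAX_SECONDS = 30.0


def check_upload_limits(path):
    """Return a list of limit violations for a WAV file (empty list means OK)."""
    problems = []

    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, limit is {MAX_BYTES}")

    # Duration = number of frames / frames per second.
    with wave.open(path, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    if duration > MAX_SECONDS:
        problems.append(f"audio is {duration:.1f} s, limit is {MAX_SECONDS:.0f} s")

    return problems
```

Other container formats (MP3, M4A, and so on) would need a different duration probe; only the size check carries over unchanged.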

───────────────────────────────────────────────────────────────────

Core Capabilities

🎯 Precise Adaptation to Multiple Scenarios

It can efficiently support core application scenarios such as real-time meeting minutes, customer service quality inspection and work order processing, live video subtitles, office document input, multilingual communication and translation, and medical record entry.

  • Real-time Meeting Minutes: It transcribes meeting content in real time and outputs structured minutes, boosting meeting record-keeping efficiency.
  • Customer Service Quality Inspection and Work Order Processing: It accurately transcribes call content, supporting quality inspection analysis and process optimization.
  • Live Video Subtitles: It provides low-latency, high-accuracy synchronized subtitles for live broadcasts and meetings.
  • Office Document Input: It quickly generates documents, emails, and draft proposals via voice input.
  • Multilingual Communication and Translation: It supports cross-language speech understanding, making it ideal for cross-border communication and online collaboration.
  • Medical Record Entry: It accurately recognizes medical terminology, assisting doctors in efficiently generating electronic medical records.
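Scenarios such as meeting minutes or call transcription usually involve recordings longer than the 30-second per-file limit stated in the overview, so the audio would need to be split before upload. The sketch below shows one way to do that for WAV files with Python's standard `wave` module. The helper name `split_wav` and the output naming scheme are hypothetical, and this is not a documented part of the service.

```python
import wave


def split_wav(path, out_prefix, max_seconds=30):
    """Split a WAV file into consecutive pieces of at most max_seconds each.

    Returns the list of written file paths, in order.
    """
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(src.getframerate() * max_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # end of the source audio
            out_path = f"{out_prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channels, sample width and rate; the frame count in
                # the header is corrected automatically when the file closes.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

Fixed-length cuts can land in the middle of a word. Splitting at silences, or overlapping adjacent chunks slightly, would likely give cleaner transcripts at the cost of extra logic.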

📌 Advantages in Special Scenarios

  • Mixed-element Scenarios: It precisely handles mixed Chinese-English speech, numbers and units, and discontinuous colloquial expressions, producing semantically complete and logically clear text.
  • Dialect and Noise Scenarios: It features automatic dialect identification and noise resistance, maintaining high recognition accuracy even in complex environments.
  • Foreign Languages with Accents: It can reliably recognize English spoken with accents and continues to deliver dependable results even in noisy environments.
  • Industry-Specific Jargon Scenarios: It recognizes industry and gaming jargon, and supports seamless switching between Chinese and English as well as streaming transcription.

───────────────────────────────────────────────────────────────────

Performance Demonstration

Pure text + accented English (Chinglish) + noisy environment: file.302.ai/gpt/resource302db/20251215/f864885c6bcb40ec89789c11679ffbac.wav

Result Output: OK, now please tell me, how do you know from this picture that its location is bangladesh?
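To score an output like this against a human reference transcript, the character error rate (CER) quoted in the overview is conventionally computed as the character-level edit distance divided by the reference length. The sketch below implements that standard definition. The page does not say how the benchmark normalized text (case, punctuation, spacing), so no normalization is applied here, and the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Toy strings, not taken from any benchmark:
print(cer("kitten", "sitting"))  # 3 edits over 6 reference characters -> 0.5
```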

API Reference (1)

  • API Description: GLM-ASR-2512
  • Request Method: POST
  • Stability: Stable

API Pricing

  • Model: GLM-ASR-2512
  • Description: -
  • 302.AI Price: $0.025/M tokens
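As a worked example of the per-token price above, the sketch below converts a token count into an estimated charge in USD. The page does not explain how audio is converted into billable tokens, so the token count is taken as an input. The function name is hypothetical.

```python
from decimal import Decimal

# Listed price: 0.025 USD per one million tokens.
PRICE_PER_MILLION_TOKENS = Decimal("0.025")


def estimate_cost_usd(tokens):
    """Estimated charge in USD for a given number of billed tokens."""
    return PRICE_PER_MILLION_TOKENS * Decimal(tokens) / Decimal(1_000_000)


print(estimate_cost_usd(40_000_000))  # 40 M tokens at 0.025 USD/M -> 1.000
```

`Decimal` is used instead of `float` so that small per-request amounts add up without binary rounding drift.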