speech-2.8-hd

High-performance text-to-speech model launched by MiniMax
2026-02-06
Audio-Video Processing
Pricing: starting from $52.5/1M characters

Stability: Stable

API Overview

MiniMax Speech 2.8 HD is MiniMax’s flagship text-to-speech model, offered as a professional voice synthesis API. It targets studio-grade, high-fidelity output for high-end audio production, with an emphasis on clarity and naturalness.

  • Key Upgrades: Supports more than 17 preset voices plus custom cloned voices; natively parses onomatopoeic expressions such as (laughs) and (sighs); and offers emotion control and pronunciation customization.
  • Applicable Scenarios: Professional audio applications including audiobook production, film and TV dubbing, podcast broadcasting, educational materials, accessibility services, and voice acting for video game characters.
  • Product Value: Allows fine-grained adjustment of speed, pitch, volume, sample rate, bit rate, and channel configuration, producing ready-to-use, broadcast-quality audio.
  • Audio Quality Advantage: HD processing provides richer, cleaner audio details, with significantly improved naturalness compared to the Turbo version.
  • Technical Features: Supports English number normalization (english_normalization) and pronunciation dictionaries (pronunciation_dict), ensuring accurate pronunciation of brand names and technical terms.

───────────────────────────────────────────────────────────────────

Core Capabilities

🎙️ Studio-Grade Audio Quality

HD rendering delivers higher clarity and naturalness, ideal for final deliverables.

💬 Onomatopoeia Support

Natively recognizes 22 types of onomatopoeic expressions, including (laughs), (coughs), (gasps), and (sighs), enhancing the vividness of speech.
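Before sending a script for synthesis, it can help to check which parenthesized cues it contains. The following is a minimal sketch, not part of the API: it only knows the four cues named on this page (the full list of 22 is not given here), and the helper name find_cues is hypothetical. A cue reported as "unlisted" may still be supported by the model.

```python
import re

# Cues named on this page; the complete set of 22 is not listed here,
# so "unlisted" below does not mean "unsupported".
KNOWN_CUES = {"laughs", "coughs", "gasps", "sighs"}

# Matches a lowercase word wrapped in parentheses, e.g. "(sighs)".
CUE_PATTERN = re.compile(r"\(([a-z]+)\)")


def find_cues(text):
    """Return (listed, unlisted) cue names found in text, in order of appearance."""
    listed, unlisted = [], []
    for name in CUE_PATTERN.findall(text):
        (listed if name in KNOWN_CUES else unlisted).append(name)
    return listed, unlisted
```

Calling find_cues on a draft script gives a quick way to spot typos in cue names before paying for synthesis.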

😊 Emotional Tone Control

Allows specifying emotion modes such as happy or calm to match the emotional tone of the content.

🎛️ Full Parameter Fine-Tuning

Speed, pitch, and volume are adjustable, as are audio format, sample rate, bit rate, and channel count.

🔤 Accurate Pronunciation Management

Use pronunciation_dict to define the correct pronunciation of proper nouns, and enable english_normalization to improve how English numbers and dates are read.
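The capabilities above could be combined in a single request body along the following lines. This is a sketch under stated assumptions: only the model name, pronunciation_dict, and english_normalization appear on this page. The nesting under voice_setting and audio_setting, the field names inside them, the "tone" list format, and all default values are assumptions for illustration, and build_t2a_payload is a hypothetical helper. Check the parameter details for the synchronous T2A endpoint before relying on any of them.

```python
def build_t2a_payload(text, voice_id, *, emotion=None, speed=1.0, pitch=0,
                      volume=1.0, audio_format="mp3", sample_rate=32000,
                      bitrate=128000, channel=1, pronunciations=None,
                      english_normalization=False):
    """Assemble a request body as a plain dict. Nothing is sent over the network."""
    # ASSUMPTION: field names and nesting below are illustrative, not confirmed
    # by this page. Default values are placeholders, not documented API defaults.
    voice_setting = {
        "voice_id": voice_id,
        "speed": speed,
        "pitch": pitch,
        "vol": volume,
        "english_normalization": english_normalization,
    }
    if emotion is not None:
        voice_setting["emotion"] = emotion  # e.g. "happy" or "calm"

    payload = {
        "model": "speech-2.8-hd",
        "text": text,
        "voice_setting": voice_setting,
        "audio_setting": {
            "format": audio_format,
            "sample_rate": sample_rate,
            "bitrate": bitrate,
            "channel": channel,
        },
    }
    if pronunciations:
        # ASSUMPTION: entries are "written form/spoken form" strings.
        payload["pronunciation_dict"] = {"tone": list(pronunciations)}
    return payload
```

Keeping payload construction separate from the HTTP call makes it easy to log or inspect the exact settings used for each rendered clip.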

API Reference (4)

API Description                                     Request Method   Stability
T2A (Speech Generation - Synchronous)               POST             Stable
T2A (Asynchronous Long-form Speech Generation)      POST             Stable
T2A (Status Inquiry)                                GET              Stable
Files (Audio File Download)                         GET              Stable
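The three non-synchronous entries above suggest a submit, poll, download workflow for long-form text. A minimal sketch of that control flow follows. The actual HTTP calls are passed in as functions because the endpoint URLs and response schemas are not shown on this page. The field names task_id, file_id, and status, and the status values "success" and "failed", are assumptions for illustration only.

```python
import time


def run_async_job(submit, get_status, download, text,
                  poll_interval=2.0, max_polls=30, sleep=time.sleep):
    """Submit text, poll until the job finishes, then fetch the audio.

    submit(text)        -> dict with "task_id"             (POST, async generation)
    get_status(task_id) -> dict with "status", "file_id"   (GET, status inquiry)
    download(file_id)   -> audio bytes                     (GET, file download)
    All three callables and their return shapes are hypothetical.
    """
    task = submit(text)
    for _ in range(max_polls):
        status = get_status(task["task_id"])
        if status["status"] == "success":
            return download(status["file_id"])
        if status["status"] == "failed":
            raise RuntimeError(f"job {task['task_id']} failed")
        sleep(poll_interval)  # wait before asking again
    raise TimeoutError(f"job {task['task_id']} did not finish in {max_polls} polls")
```

Injecting the transport functions keeps the polling logic testable without network access. According to the pricing table on this page, status inquiries and file downloads are free, so polling does not add to the per-character cost.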

API Pricing

Prices in USD ($)

Model                         Description                                        302.AI Price
speech-2.8-hd                 T2A (Speech Generation - Synchronous)              $52.5/1M characters
speech-2.8-hd                 Asynchronous Long-form Text-to-Speech Generation   $52.5/1M characters
T2A                           Status Inquiry                                     Free
Files (Audio File Download)   -                                                  Free
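As a rough budgeting aid, the listed rate of $52.5 per 1M characters can be turned into a per-job estimate. The sketch below assumes billing is linear in character count and that Python's len() approximates how characters are counted. Neither is confirmed on this page, and the listed price is described as a starting price. The name estimate_cost is hypothetical.

```python
from decimal import Decimal

# Listed rate on this page: $52.5 per 1,000,000 characters.
PRICE_PER_MILLION_CHARS = Decimal("52.5")


def estimate_cost(text_or_count):
    """Estimate the USD cost for a string or a raw character count.

    ASSUMPTION: cost scales linearly and len(text) matches the billed count.
    """
    if isinstance(text_or_count, str):
        chars = len(text_or_count)
    else:
        chars = int(text_or_count)
    return PRICE_PER_MILLION_CHARS * chars / Decimal(1_000_000)
```

For example, a 200,000-character audiobook chapter would come to about $10.50 under these assumptions (52.5 × 200,000 / 1,000,000).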