Qwen-TTS (Speech Synthesis)

Speech synthesis model from the Tongyi Qianwen (Qwen) series
2025-08-27
Audio-Video Processing
Model capability: audio
Input: $0.5 / 1M tokens
Output: $2 / 1M tokens

API Overview

Qwen-TTS is a speech synthesis model in the Tongyi Qianwen (Qwen) series. It accepts Chinese, English, and mixed Chinese-English text and streams the synthesized audio. The model automatically adjusts prosody, rhythm, and emotional inflection to fit the input text, aiming for human-like naturalness and expressiveness. Qwen-TTS currently offers seven Chinese and English voices, including Cherry, Ethan, and Chelsie. With streaming output, the theoretical first-packet latency is under 400 ms. Through the Qwen API, developers can add speech synthesis to their applications with a few lines of code.
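The first-packet figure refers to how long a client waits before the first chunk of streamed audio arrives. As a minimal sketch, the helper below measures it on the client side. It assumes only that the streamed response can be consumed as an iterable of raw audio byte chunks; the names `collect_stream` and `fake_stream` are hypothetical and not part of the Qwen API, and the fake stream lets the example run offline.

```python
import time
from typing import Iterable, Tuple


def collect_stream(chunks: Iterable[bytes]) -> Tuple[bytes, float]:
    """Concatenate streamed audio chunks and report time to first chunk.

    `chunks` is any iterable of raw audio bytes (hypothetical interface;
    adapt it to however your HTTP client exposes the streamed response).
    Returns the full audio and the first-packet latency in seconds,
    measured from the moment this function starts consuming the stream.
    """
    start = time.perf_counter()
    first_packet = None
    buffer = bytearray()
    for chunk in chunks:
        if first_packet is None:
            # Record latency when the first chunk arrives, not when the stream ends.
            first_packet = time.perf_counter() - start
        buffer.extend(chunk)
    if first_packet is None:
        raise ValueError("stream produced no audio chunks")
    return bytes(buffer), first_packet


def fake_stream():
    """Offline stand-in for a streamed response: three small chunks with delays."""
    for part in (b"RIFF", b"\x00\x01", b"\x02\x03"):
        time.sleep(0.01)
        yield part


audio, latency = collect_stream(fake_stream())
print(len(audio), f"{latency * 1000:.1f} ms")
```

In a real client, start the timer when the request is sent rather than when iteration begins, so that network and queueing time are included in the measurement.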


Reference: https://bailian.console.aliyun.com/?spm=5176.28197581.0.0.1e7d29a4JqZcpM&tab=doc#/doc/?type=model&url=2879134

API Reference (1)

API Description: Qwen-TTS (Speech Synthesis)
Request Method: POST
Stability: Stable
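This listing confirms only that the API is called with POST; it does not give the endpoint URL or the body format. As a sketch, assuming a DashScope-style JSON body with a `model` name plus `input.text` and `input.voice` fields (an assumption to verify against the linked reference, not a confirmed schema), a request body could be assembled as below. The function name `build_tts_payload` and the `qwen-tts` model string are illustrative.

```python
import json


def build_tts_payload(text: str, voice: str = "Cherry", model: str = "qwen-tts") -> str:
    """Return a JSON request body for a Qwen-TTS POST call.

    ASSUMPTION: the field layout (model, input.text, input.voice) follows a
    DashScope-style schema; confirm it against the official API reference
    before use. Cherry, Ethan, and Chelsie are voices named in this listing.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    body = {
        "model": model,
        "input": {
            "text": text,  # Chinese, English, or mixed Chinese-English text
            "voice": voice,
        },
    }
    # ensure_ascii=False keeps Chinese characters readable in the serialized body.
    return json.dumps(body, ensure_ascii=False)


print(build_tts_payload("你好, welcome to Qwen-TTS.", voice="Ethan"))
```

The voice is passed through unchecked rather than validated against a fixed list, because only three of the seven available voices are named here.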

API Pricing

Model: Qwen-TTS
Official Price: Input $0.5 / 1M tokens, Output $2 / 1M tokens
302.AI Price: Input $0.5 / 1M tokens, Output $2 / 1M tokens (original price)
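At these rates, a call's cost is (input tokens × $0.5 + output tokens × $2) / 1,000,000. The sketch below applies that formula. It takes token counts as given, since this page does not say how synthesized audio is converted into output tokens, and it uses `Decimal` so that small per-call amounts are not distorted by binary floating-point rounding. The function name `estimate_cost` is illustrative.

```python
from decimal import Decimal

# Listed prices in USD per 1M tokens (official and 302.AI prices are the same).
INPUT_PRICE_PER_M = Decimal("0.5")
OUTPUT_PRICE_PER_M = Decimal("2")
MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one call from its input and output token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / MILLION


# 10,000 input tokens and 200,000 output tokens: $0.005 + $0.40
print(estimate_cost(10_000, 200_000))  # → 0.405
```

Because output tokens cost four times as much as input tokens, the output side dominates the bill for typical text-to-speech calls.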