
Qwen-TTS (Speech Synthesis)
Speech synthesis model from Tongyi Qianwen
2025-08-27
Input: $0.5/1M tokens
Output: $2/1M tokens
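As a rough illustration of how the two rates combine, the sketch below estimates the cost of a job from its token counts. Only the two prices come from this page; the function name is made up for the example, and how tokens are counted for input text and output audio is not stated here, so the counts are treated as given.

```python
# Rates listed on this page, in USD per one million tokens.
INPUT_USD_PER_MILLION = 0.5
OUTPUT_USD_PER_MILLION = 2.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a synthesis job in USD.

    Hypothetical helper: the token counts must come from the caller,
    because this page does not say how tokens are measured.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # 200,000 input tokens and 1,500,000 output tokens:
    # 0.2 * 0.5 + 1.5 * 2.0 = 0.10 + 3.00 = 3.10 USD
    print(f"{estimate_cost_usd(200_000, 1_500_000):.2f}")
```

Because output tokens cost four times as much as input tokens, the output side usually dominates the total.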
API Overview
Qwen-TTS is a speech synthesis model in the Tongyi Qianwen (Qwen) series. It accepts Chinese, English, and mixed Chinese-English text and streams the synthesized audio as output, with a theoretical first-packet latency of under 400 ms. The model automatically adjusts prosody, rhythm, and emotional inflection to the input text, aiming for human-level naturalness and expressiveness. Qwen-TTS currently offers seven Chinese and English voices, including Cherry, Ethan, and Chelsie. Through the Qwen API, developers can add speech synthesis to their applications with a few lines of code.
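The overview above does not show the request format, so the following is only a minimal sketch of how a client might prepare a request and consume streamed audio. The payload field names (`model`, `text`, `voice`, `stream`), the model identifier `"qwen-tts"`, and both helper functions are assumptions made for illustration, not the documented API. Only the three voice names come from this page. The stream consumer works on any iterable of byte chunks, so it can wrap whatever HTTP or SDK client the actual API requires, and it records the time to the first non-empty chunk, which is the quantity the sub-400 ms figure refers to.

```python
import time
from typing import Dict, Iterable, Optional, Tuple

# Only the three voices named on this page; the other four are not listed
# here, so extend this set from the official voice list before real use.
KNOWN_VOICES = {"Cherry", "Ethan", "Chelsie"}


def build_request(text: str, voice: str = "Cherry") -> Dict[str, object]:
    """Build a request payload. All field names here are hypothetical."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if voice not in KNOWN_VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    return {"model": "qwen-tts", "text": text, "voice": voice, "stream": True}


def collect_stream(chunks: Iterable[bytes]) -> Tuple[bytes, Optional[float]]:
    """Join streamed audio chunks and measure first-packet latency.

    Returns the concatenated audio bytes and the seconds elapsed until the
    first non-empty chunk arrived (None if the stream produced no data).
    """
    start = time.monotonic()
    first_packet_s: Optional[float] = None
    audio = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # skip keep-alive or empty frames
        if first_packet_s is None:
            first_packet_s = time.monotonic() - start
        audio.extend(chunk)
    return bytes(audio), first_packet_s


if __name__ == "__main__":
    payload = build_request("Hello, 你好!", voice="Ethan")
    # Stand-in for a real network stream of audio chunks.
    fake_stream = iter([b"", b"\x00\x01", b"\x02\x03"])
    audio, latency = collect_stream(fake_stream)
    print(payload["voice"], len(audio), latency is not None)
```

Timing starts when `collect_stream` is called, so for a meaningful latency figure the request should be sent immediately before passing the response stream to it.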