Audio-Translation(Audio-Translation)

Translate speech into another language while preserving the original speaker's voice tone.
2025-11-20
Audio-Video Processing
Pricing:
Depends on the specific model used

API Overview

Feature Overview

Input a segment of audio and generate translated speech in another language, preserving the original voice's tone.

Processing Workflow Explanation

🔄 Overall Processing Flow

Input Audio → Speech-to-Text (STT) → Text Translation → Voice Tone Cloning → Text-to-Speech (TTS) → Output Audio

The system automatically performs end-to-end translation from audio to audio, maintaining either the original or a specified voice tone.

Detailed Processing Steps

📋 Five Core Steps

1️⃣ Initialization (initialization)

Download and prepare the audio file

Automatically trim the audio used for voice cloning to at most 30 seconds

2️⃣ Speech Recognition (speech_to_text)

Use OpenAI’s gpt-4o-transcribe to convert audio into text

3️⃣ Translation (translation)

Translate the transcribed text with an LLM; users can select the model (default: claude-haiku-4-5-20251001)

4️⃣ Voice Tone Cloning (voice_clone)

Analyze and extract audio features

5️⃣ Text-to-Speech (text_to_speech)

Generate target-language audio using the cloned voice tone

Output high-quality audio files
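
The five steps above can be pictured as a simple sequential pipeline. The sketch below is illustrative only: the step identifiers come from the list above, but `run_pipeline`, the job dictionary, and the stub stages are hypothetical stand-ins, not the service's actual implementation.

```python
from typing import Callable, Dict, List, Tuple

# Step identifiers as listed above, in processing order.
STEPS: List[str] = [
    "initialization",
    "speech_to_text",
    "translation",
    "voice_clone",
    "text_to_speech",
]

Stage = Callable[[dict], dict]


def run_pipeline(stages: Dict[str, Stage], job: dict) -> Tuple[dict, List[str]]:
    """Run every stage in order, passing the accumulated job state along."""
    completed: List[str] = []
    for name in STEPS:
        job = stages[name](job)
        completed.append(name)
    return job, completed


def _stub(key: str, value: str) -> Stage:
    """Hypothetical placeholder stage: records a fake result under `key`."""
    return lambda job: {**job, key: value}


# Placeholder stages; a real system would call STT, LLM, and TTS services here.
demo_stages: Dict[str, Stage] = {
    "initialization": _stub("audio", "prepared"),
    "speech_to_text": _stub("text", "hello"),
    "translation": _stub("translated", "bonjour"),
    "voice_clone": _stub("voice", "cloned-voice"),
    "text_to_speech": _stub("output", "output-audio"),
}

result, done = run_pipeline(demo_stages, {"source": "input.mp3"})
```

Each stage receives the state produced by the previous one, which mirrors the Input Audio → STT → Translation → Voice Tone Cloning → TTS → Output Audio flow described earlier.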

Vendor Selection Logic

🎯 Automatic Selection Rules

The system automatically chooses the best voice tone cloning vendor based on the target language:

Selection Priority

1. User-Specified: If a vendor is designated and supports the target language, it takes precedence

2. Language Match: Use index_tts2 for Chinese and English; otherwise, opt for Fish
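
As a rough illustration of the two rules above, the following sketch picks a vendor for a target language. The language codes (`zh`, `en`), the lowercase vendor identifier `fish`, and the assumption that Fish accepts every language are guesses made for the example; only the rule order and the index_tts2/Fish split come from this page.

```python
from typing import Optional, Set

# Assumed language codes for Chinese and English.
INDEX_TTS2_LANGS: Set[str] = {"zh", "en"}


def vendor_supports(vendor: str, target_lang: str) -> bool:
    """Hypothetical support table; the real service may differ."""
    if vendor == "index_tts2":
        return target_lang in INDEX_TTS2_LANGS
    if vendor == "fish":
        return True  # assumption: Fish handles any target language
    return False


def select_vendor(target_lang: str, requested: Optional[str] = None) -> str:
    lang = target_lang.lower()
    # Rule 1: a user-specified vendor wins if it supports the target language.
    if requested is not None and vendor_supports(requested, lang):
        return requested
    # Rule 2: index_tts2 for Chinese and English, Fish otherwise.
    return "index_tts2" if lang in INDEX_TTS2_LANGS else "fish"
```

Under these assumptions, `select_vendor("ja", "index_tts2")` falls through to Fish because the requested vendor does not cover Japanese.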

⚠️ Important Notes

Recommended audio length for cloning is 10–30 seconds; longer clips are automatically trimmed to 30 seconds

Audio must be clear and free of noise for optimal results

Supported formats: MP3, WAV
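
The service trims long clips itself, but it can be handy to check a file locally before uploading. The sketch below uses only Python's standard library; the helper names are hypothetical, and the trimming covers WAV only because the standard `wave` module does not read MP3.

```python
import io
import wave
from pathlib import Path

MAX_CLONE_SECONDS = 30                   # upper bound stated in the notes above
SUPPORTED_EXTENSIONS = {".mp3", ".wav"}  # formats listed above


def is_supported_format(filename: str) -> bool:
    """Check the file extension against the supported formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def wav_duration(data: bytes) -> float:
    """Return the duration of in-memory WAV data in seconds."""
    with wave.open(io.BytesIO(data), "rb") as w:
        return w.getnframes() / w.getframerate()


def trim_wav(data: bytes, max_seconds: int = MAX_CLONE_SECONDS) -> bytes:
    """Return WAV bytes truncated to at most `max_seconds`."""
    with wave.open(io.BytesIO(data), "rb") as src:
        params = src.getparams()
        max_frames = src.getframerate() * max_seconds
        frames = src.readframes(min(src.getnframes(), max_frames))
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setparams(params)    # frame count in the header is corrected on write
        dst.writeframes(frames)
    return out.getvalue()
```

Clips shorter than the limit pass through `trim_wav` with their length unchanged.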

API Reference (2)

API Description | Request Method | Stability
Audio-Translation (Create audio translation tasks) | POST | Stable
Audio-Translation (Query audio translation tasks) | GET | Stable

API Pricing

Model | Description | 302.AI Price
Audio-Translation | Create audio translation tasks | Depends on the specific model used
Audio-Translation | Query audio translation tasks | Free