Kling V3 (Text-to-Video)

Kling flagship text-to-video model
2026-02-06
Video Generation
Pricing: starting from $0.084/second

API Overview

Supports synchronized audio generation (sound=True)


Kling V3.0 Text-to-Video is a text-to-video product line from Kuaishou, offered in two versions: Standard and Pro. It is positioned as a high-quality short-video API that generates cinematic camera movements, natural subject motion, and optional synchronized sound effects from plain text descriptions alone.

  • Dual-version strategy: The Pro version delivers top-tier visual fidelity, smoother motion, and stronger adherence to prompt instructions; the Standard version significantly reduces usage costs while maintaining high availability.
  • Core capabilities: Both versions support video lengths of 5 or 10 seconds, aspect ratios of 16:9, 9:16, and 1:1, negative prompts, and optional AI-generated sound effects.
  • Audio integration: The Pro version additionally supports adding up to 2 custom voice entries (voice_list) for character dialogue; the Standard version only supports ambient sound effects.
  • Applicable scenarios: Social media short videos, marketing ads, creative concept visualization, AI digital human content, and dynamic storytelling with sound effects.
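  • Cost comparison: At the listed per-second rates, the Pro version costs about one third more than the Standard version ($0.112 vs. $0.084 without sound, $0.168 vs. $0.126 with sound), and enabling sound adds 50% to either version; users can choose based on their quality requirements.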
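
The constraints listed above can be checked before a request is sent. The following is a minimal sketch, not the official client: `sound`, `voice_list`, and `negative_prompt` are named on this page, while `prompt`, `duration`, `aspect_ratio`, the `"std"`/`"pro"` version labels, and the overall payload shape are assumptions made for illustration. Consult the API reference for the real field names.

```python
from typing import Optional

# Limits taken from the capability list above.
ALLOWED_DURATIONS = {5, 10}                      # seconds
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
MAX_VOICES = 2                                   # Pro only


def build_payload(
    prompt: str,
    version: str = "std",
    duration: int = 5,
    aspect_ratio: str = "16:9",
    negative_prompt: Optional[str] = None,
    sound: bool = False,
    voice_list: Optional[list] = None,
) -> dict:
    """Validate options and return a request body (hypothetical field names)."""
    if version not in ("std", "pro"):
        raise ValueError("version must be 'std' or 'pro'")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError("duration must be 5 or 10 seconds")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError("aspect_ratio must be 16:9, 9:16, or 1:1")

    voices = list(voice_list or [])
    if voices and version != "pro":
        raise ValueError("voice_list is only supported by the Pro version")
    if len(voices) > MAX_VOICES:
        raise ValueError("voice_list accepts at most 2 entries")

    payload = {
        "prompt": prompt,
        "duration": duration,          # assumed field name
        "aspect_ratio": aspect_ratio,  # assumed field name
        "sound": sound,
    }
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    if voices:
        payload["voice_list"] = voices
    return payload
```

Validating locally keeps obviously invalid combinations, such as custom voices on the Standard version, from reaching the API.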

───────────────────────────────────────────────────────────────────

Core Capabilities

🎬 Cinematic Visual Presentation (Pro)

The Pro version outperforms the Standard version in detail, lighting, and motion smoothness, making it well suited to final deliverables.

🔊 Synchronized Audio and Video Generation

When sound is enabled, ambient sound effects matching the visuals are automatically added; the Pro version also allows overlaying custom voices for dialogue.

📱 Multi-platform Aspect Ratio Adaptation

One-click output in 16:9 (YouTube), 9:16 (TikTok/Reels), and 1:1 (Instagram) formats, with no cropping required.

🚫 Precise Content Control

Use negative_prompt to exclude unwanted elements such as “blur” or “distortion,” enhancing generation stability.

✨ Intelligent Creation Assistance

We recommend using Prompt Enhancer to automatically add cinematic shot types (e.g., “handheld tracking,” “shallow depth of field”) and elevate the film-like quality of your videos.

───────────────────────────────────────────────────────────────────

API Reference

API            Request Method   Stability
Kling v3-std   POST             Stable
Kling v3-pro   POST             Stable
Fetch          GET              Stable
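
Because generation is submitted with POST and results are retrieved through a separate Fetch (GET) call, a client typically polls until the task finishes. The sketch below shows one way to structure that loop. Everything about the response is an assumption: the `status` field and the `"succeeded"`/`"failed"` values are hypothetical placeholders, and the actual HTTP call is injected as `fetch` so that no endpoint is guessed here.

```python
import time
from typing import Callable


def wait_for_task(
    fetch: Callable[[str], dict],
    task_id: str,
    interval: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll `fetch(task_id)` until the task reports a terminal state.

    `fetch` is supplied by the caller and should perform the Fetch (GET)
    request and return the decoded JSON body as a dict.
    """
    for _ in range(max_attempts):
        task = fetch(task_id)
        status = task.get("status")  # hypothetical field name
        if status == "succeeded":    # hypothetical value
            return task
        if status == "failed":       # hypothetical value
            raise RuntimeError(f"task {task_id} failed: {task}")
        sleep(interval)
    raise TimeoutError(f"task {task_id} not finished after {max_attempts} polls")
```

Since the pricing table lists Fetch as free, polling at a modest interval adds no per-call cost.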

API Pricing

Model          Description                  302.AI Price
Kling v3-std   Without synchronized sound   $0.084/second
Kling v3-std   With synchronized sound      $0.126/second
Kling v3-pro   Without synchronized sound   $0.112/second
Kling v3-pro   With synchronized sound      $0.168/second
Fetch          Fetch task                   Free
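
The per-second rates above translate into a per-clip cost by multiplying by the clip length. Here is a small sketch of that arithmetic. The function name and the `"std"`/`"pro"` keys are illustrative choices, and the calculation assumes billing is strictly rate × seconds with no rounding or minimum charge, which this page does not confirm.

```python
from decimal import Decimal

# Rates copied from the pricing table above (USD per second of video).
PRICE_PER_SECOND = {
    ("std", False): Decimal("0.084"),
    ("std", True): Decimal("0.126"),
    ("pro", False): Decimal("0.112"),
    ("pro", True): Decimal("0.168"),
}


def estimate_cost(version: str, seconds: int, sound: bool = False) -> Decimal:
    """Return the estimated cost in USD for one generated clip."""
    return PRICE_PER_SECOND[(version, sound)] * seconds


if __name__ == "__main__":
    print(estimate_cost("std", 5))               # 0.420
    print(estimate_cost("pro", 10, sound=True))  # 1.680
```

`Decimal` is used instead of `float` so that sums across many clips do not accumulate binary rounding error.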