
Kling O3 (Reference-to-Video)
API Overview
Supports synchronous audio generation (sound=True)
Kling Video O3 Reference-to-Video is a reference-driven video generation product line from Kuaishou, available in two versions: Standard (Std) and Pro. Its core purpose is to extract a subject's identity and motion logic from multi-view reference images or reference videos, and to generate high-fidelity videos with consistent identity and controllable movement.
- Core features: Supports uploading multiple reference images to establish the identity of characters or objects (up to 7 for both Std and Pro, or up to 4 when a reference video is also provided). Optionally, a reference video can be provided to guide camera movement and action timing.
- Dual-version strategy: The Pro version delivers the highest visual fidelity, more sophisticated physics simulations, and enhanced lighting and shadow effects; the Std version optimizes inference costs while maintaining subject consistency.
- Sound-and-image synergy: Supports preserving the original audio from the reference video (keep_original_sound) or enabling AI-generated sound effects (available only when no reference video is provided).
- Applicable scenarios: Custom digital human videos, character animation generation, ad asset reuse, film and TV concept verification, and creative short-video production with consistent identity.
- Flexible output: Supports any duration from 3 to 15 seconds and aspect ratios of 16:9, 9:16, and 1:1, covering landscape, portrait, and square distribution formats.
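The limits listed above can be checked on the client side before a request is submitted. The following is a minimal sketch, not the official SDK: only `sound` and `keep_original_sound` are named on this page, and every other field name (`prompt`, `image_urls`, `video_url`, `duration`, `aspect_ratio`) is a hypothetical placeholder chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Limits taken from the feature list above.
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
MAX_IMAGES_NO_VIDEO = 7
MAX_IMAGES_WITH_VIDEO = 4
MIN_DURATION_S = 3
MAX_DURATION_S = 15


@dataclass
class ReferenceRequest:
    # Hypothetical field names, except sound / keep_original_sound.
    prompt: str
    image_urls: List[str] = field(default_factory=list)
    video_url: Optional[str] = None
    duration: float = 5
    aspect_ratio: str = "16:9"
    sound: bool = False
    keep_original_sound: bool = False


def validate(req: ReferenceRequest) -> List[str]:
    """Return a list of problems; an empty list means no limit is violated."""
    problems = []

    # The image cap drops from 7 to 4 when a reference video is attached.
    limit = MAX_IMAGES_WITH_VIDEO if req.video_url else MAX_IMAGES_NO_VIDEO
    if len(req.image_urls) > limit:
        problems.append(f"too many reference images: {len(req.image_urls)} > {limit}")

    if not MIN_DURATION_S <= req.duration <= MAX_DURATION_S:
        problems.append(f"duration {req.duration}s is outside 3-15s")

    if req.aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {req.aspect_ratio}")

    # AI-generated sound is only available without a reference video.
    if req.sound and req.video_url:
        problems.append("sound cannot be combined with a reference video")

    # There is no original audio track to keep without a reference video.
    if req.keep_original_sound and not req.video_url:
        problems.append("keep_original_sound requires a reference video")

    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every violated limit in a single pass.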
───────────────────────────────────────────────────────────────────
Core Capabilities
👤 Multi-image Identity Binding
Accurately captures the subject’s appearance features from multi-angle reference images, ensuring highly consistent identity in the generated video.
🎥 Optional Motion Guidance
Can be used independently with images alone, or combined with a reference video to reuse its camera movements, rhythm, and scene logic.
🔊 Intelligent Audio Processing
Inherits the audio track from the reference video, or generates AI sound effects that match the environment when only images are provided.
🎬 O3 Architecture Realism
The Pro version achieves movie-level detail and natural dynamics; the Std version provides cost-effective baseline animations with consistent quality.
⚡ End-to-end API Integration
No preprocessing required: submit images or videos together with a text prompt, and receive commercially usable high-definition video results directly.
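As an illustration of this submit-and-receive flow, the sketch below assembles a JSON request body from reference images, an optional reference video, and a prompt. It is an assumption-laden example rather than the documented request schema: the `tier`, `prompt`, `image_urls`, `video_url`, `duration`, and `aspect_ratio` keys are hypothetical, and only `sound` and `keep_original_sound` appear on this page. No endpoint is called, because the URL and authentication scheme are not given here.

```python
import json
from typing import Optional, Sequence


def build_payload(
    prompt: str,
    image_urls: Sequence[str],
    video_url: Optional[str] = None,
    duration: float = 5,
    aspect_ratio: str = "16:9",
    tier: str = "pro",  # hypothetical selector for Std vs Pro
    sound: bool = False,
    keep_original_sound: bool = False,
) -> dict:
    """Assemble a request body; key names are illustrative assumptions."""
    payload = {
        "tier": tier,
        "prompt": prompt,
        "image_urls": list(image_urls),
        "duration": duration,
        "aspect_ratio": aspect_ratio,
    }
    if video_url is not None:
        # With a reference video, only the original-audio option applies.
        payload["video_url"] = video_url
        payload["keep_original_sound"] = keep_original_sound
    else:
        # Without a reference video, AI-generated sound can be enabled.
        payload["sound"] = sound
    return payload


if __name__ == "__main__":
    body = build_payload(
        prompt="The character walks through a rainy street at night",
        image_urls=["https://example.com/front.png", "https://example.com/side.png"],
        sound=True,
    )
    print(json.dumps(body, indent=2))
```

Emitting only the audio flag that applies to the chosen mode keeps the two mutually exclusive options from ever appearing in the same request.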
───────────────────────────────────────────────────────────────────