Kling O3 (reference-to-video)

Kuaishou's flagship reference video model for creators
2026-02-06
Video Generation
Pricing: starting from $0.084/Second

API Overview

Supports synchronous audio generation (sound=True)


Kling Video O3 Reference-to-Video is Kuaishou's reference-driven video generation product line, available in two versions: Standard (Std) and Pro. Its core positioning is “to extract the subject’s identity and motion logic from multi-view reference images or reference videos, and generate high-fidelity dynamic videos with consistent identity and controllable movements.”

  • Core features: Supports uploading multiple reference images (up to 7 for both Std and Pro, or up to 4 when a reference video is also provided) to establish the identity of characters or objects. A reference video is optional; when supplied, it guides camera movement and action timing.
  • Dual-version strategy: The Pro version delivers the highest visual fidelity, more sophisticated physics simulations, and enhanced lighting and shadow effects; the Std version optimizes inference costs while maintaining subject consistency.
  • Sound-and-image synergy: Either preserves the original audio from the reference video (keep_original_sound) or enables AI-generated sound effects. AI-generated sound is available only when no reference video is provided.
  • Applicable scenarios: Custom digital human videos, character animation generation, ad asset reuse, film and TV concept verification, and creative short-video production with consistent identity.
  • Flexible output: Supports any duration from 3 to 15 seconds and aspect ratios of 16:9, 9:16, and 1:1, covering landscape, portrait, and square formats.
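
The limits listed above can be checked on the client before a request is submitted. The following is a minimal sketch, not the official SDK: only `sound` and `keep_original_sound` appear on this page, and every other name here (`ReferenceRequest`, `image_urls`, `video_url`, `duration`, `aspect_ratio`, `validate`) is a hypothetical placeholder. The rule that `keep_original_sound` needs a reference video is an inference from the description, not a documented constraint.

```python
from dataclasses import dataclass
from typing import List, Optional

# Limits taken from the feature list above.
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
MAX_IMAGES = 7             # images only (Std and Pro)
MAX_IMAGES_WITH_VIDEO = 4  # when a reference video is also provided
MIN_DURATION_S, MAX_DURATION_S = 3, 15


@dataclass
class ReferenceRequest:
    # Hypothetical container: field names other than `sound` and
    # `keep_original_sound` are illustrative, not taken from the API.
    prompt: str
    image_urls: List[str]
    duration: float = 5
    aspect_ratio: str = "16:9"
    video_url: Optional[str] = None
    sound: bool = False
    keep_original_sound: bool = False


def validate(req: ReferenceRequest) -> List[str]:
    """Return a list of problems; an empty list means the request fits the listed limits."""
    problems = []
    limit = MAX_IMAGES_WITH_VIDEO if req.video_url else MAX_IMAGES
    if len(req.image_urls) > limit:
        problems.append(f"too many reference images: {len(req.image_urls)} > {limit}")
    if not MIN_DURATION_S <= req.duration <= MAX_DURATION_S:
        problems.append(f"duration must be {MIN_DURATION_S}-{MAX_DURATION_S} s, got {req.duration}")
    if req.aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {req.aspect_ratio}")
    if req.sound and req.video_url:
        problems.append("sound=True is only available without a reference video")
    # Assumption: there is no original audio to keep without a reference video.
    if req.keep_original_sound and not req.video_url:
        problems.append("keep_original_sound needs a reference video")
    return problems
```

Calling `validate(...)` before submission surfaces all violations at once instead of failing on the first one, which is convenient when the request is assembled from user input.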

───────────────────────────────────────────────────────────────────

Core Capabilities

👤 Multi-image Identity Binding

Accurately captures the subject’s appearance features from multi-angle reference images, ensuring highly consistent identity in the generated video.

🎥 Optional Motion Guidance

Can be used independently with images alone, or combined with a reference video to reuse its camera movements, rhythm, and scene logic.

🔊 Intelligent Audio Processing

Inherits the audio track from the reference video, or, when only images are provided, produces AI-generated sound effects that match the environment.

🎬 O3 Architecture Realism

The Pro version achieves movie-level detail and natural dynamics; the Std version provides cost-effective baseline animations with consistent quality.

End-to-end API Integration

No preprocessing required: submit images or videos plus a text prompt, and receive commercially usable high-definition video results directly.


───────────────────────────────────────────────────────────────────

API Reference

API Description   Request Method   Stability
o3-std            POST             Stable
o3-pro            POST             Stable
Fetch             GET              Stable
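
The listing of two POST generation endpoints and one GET Fetch endpoint suggests a submit-then-poll workflow. The sketch below illustrates only the polling half, under loud assumptions: this page gives neither endpoint URLs nor a response schema, so the HTTP call is left to a caller-supplied `fetch` function, and the `status` field and its values (`"succeeded"`, `"failed"`) are hypothetical placeholders to be replaced with whatever the real Fetch response contains.

```python
import time
from typing import Any, Callable, Dict, Iterable


def wait_for_result(
    fetch: Callable[[str], Dict[str, Any]],
    task_id: str,
    done: Iterable[str] = ("succeeded",),  # hypothetical status values
    failed: Iterable[str] = ("failed",),   # hypothetical status values
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll `fetch(task_id)` until the task finishes, fails, or times out.

    `fetch` is expected to wrap the GET Fetch call and return the decoded
    response as a dict; how it does so is outside this sketch.
    """
    done, failed = set(done), set(failed)
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch(task_id)
        status = task.get("status")  # assumed field name
        if status in done:
            return task
        if status in failed:
            raise RuntimeError(f"task {task_id} failed: {task}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not finished after {timeout_s} s")
        sleep(interval_s)
```

Passing `fetch` and `sleep` in as parameters keeps the loop free of network and timing dependencies, so it can be exercised with a fake before being pointed at the real endpoint.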

API Pricing

Model          Description                                        302.AI Price
Kling O3-std   Without reference video, sound off                 $0.084/Second
Kling O3-std   Without reference video, sound on                  $0.112/Second
Kling O3-std   With reference video                               $0.126/Second
Kling O3-pro   Without reference video (images only)              $0.112/Second
Kling O3-pro   Without reference video (images only), sound on    $0.14/Second
Kling O3-pro   With reference video                               $0.168/Second
Fetch          Fetch task                                         Free
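
As a worked example of the per-second rates above, the sketch below estimates the price of a single generation. It assumes billing is strictly linear in the output duration with no rounding or minimum charge, which this page does not confirm; the function and key names are illustrative only.

```python
from decimal import Decimal

# Per-second rates copied from the pricing table (USD).
PRICE_PER_SECOND = {
    ("std", "images"): Decimal("0.084"),
    ("std", "images+sound"): Decimal("0.112"),
    ("std", "video"): Decimal("0.126"),
    ("pro", "images"): Decimal("0.112"),
    ("pro", "images+sound"): Decimal("0.14"),
    ("pro", "video"): Decimal("0.168"),
}


def estimate_cost(version: str, duration_s, has_reference_video: bool = False,
                  sound: bool = False) -> Decimal:
    """Estimate the cost in USD, assuming linear per-second billing."""
    if has_reference_video:
        mode = "video"  # the table lists a single rate when a reference video is used
    elif sound:
        mode = "images+sound"
    else:
        mode = "images"
    return PRICE_PER_SECOND[(version, mode)] * Decimal(str(duration_s))
```

Under these assumptions, a 10-second Std clip from images only with sound off comes to $0.84, and a 15-second Pro clip with a reference video comes to $2.52. `Decimal` is used so that the currency arithmetic is not affected by binary floating-point rounding.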