SenseNova-V6-Turbo

SenseNova-V6-Turbo

SenseTime Lightweight Multimodal Large Model
2025-04-09
LLM
Model capability: image
Input:
$0.275/1M tokens
Output:
$0.715/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

SenseNova-V6-Turbo is a high-performance, cost-effective multimodal inference model released by SenseTime. As the “efficiency-first version” within the “Daily New SenseNova V6” large-model ecosystem, its core positioning is a practical intelligent engine featuring “native multimodal capabilities + low-latency response + high-throughput deployment,” designed for large-scale application scenarios that are sensitive to both cost and speed.

  • Lightweight MoE Architecture: Based on an optimized mixture-of-experts structure, this architecture significantly reduces the number of parameters and computational overhead while preserving the core multimodal fusion capabilities of the V6 series. Inference speed is 3–5 times faster than that of V6-Pro.
  • Native Support for All Modalities: It supports joint understanding of text, images, and short videos (up to 2 minutes), making it suitable for high-frequency interactive scenarios such as live-streaming moderation, short-video tag generation, and mobile-end text-and-image question-answering.
  • FP8/INT4 Quantization-Friendly: Deeply adapted to SenseTime’s self-developed inference engine, it supports FP8 post-training quantization and INT4 inference compression, enabling real-time multimodal responses (<<500ms>) on consumer-grade GPUs or edge devices.
  • Outstanding Cost Efficiency: The API call price is only 1/4–1/3 of that for V6-Pro, making it ideal for businesses requiring high concurrency and low per-call costs, such as content safety moderation, e-commerce image-text matching, and intelligent Q&A in educational apps.
  • Preserves Core V6 Capability Baseline: It scores 76.2 on SuperCLUE-V (a Chinese multimodal evaluation benchmark). Although slightly lower than the Pro version (80.4), it still significantly outperforms most open-source and commercial competitors.

───────────────────────────────────────────────────────────────────

Core Capabilities

Fast Text-and-Image Understanding: Efficiently handles tasks such as consistency checks between product images and accompanying text, recognition of question stems from exam paper screenshots, and sentiment analysis of social media text-and-image posts.

🎥 Short-Video Keyframe Inference: Extracts event summaries, detects violations, and extracts interest tags from short videos lasting 30 seconds to 2 minutes—perfect for platforms like TikTok and Kuaishou.

📱 Mobile-Friendly Deployment: With a small model size and low memory footprint, it can be integrated into mobile SDKs, supporting basic multimodal interactions even in offline or weak-network environments.

💬 Instruction-Fine-Tuning Optimization: Specifically fine-tuned for common user queries (e.g., “What’s in the picture?” or “What’s this video about?”) to deliver concise, accurate, and non-redundant responses.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat (SenseTime)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

SenseNova-V6-Turbo

-
32000

Input$0.25 / 1M tokens
Output$0.65 / 1M tokens

Input$0.275/ 1M tokens
Output$0.715/ 1M tokens
10%