sophnet/Qwen2.5-32B-Instruct

sophnet/Qwen2.5-32B-Instruct

The 3.2-billion-parameter version of the Qwen2.5 series
2025-07-08
LLM
Input:
$0.29/1M tokens
Output:
$0.86/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

Qwen2.5 is Alibaba’s flagship open-source language model, positioned as a high-performance, enterprise-grade large model optimized for long-text processing and structured output, supporting a context length of 128K and a generation length of 8K.

  • Performance Leap: The 72B-parameter model outperforms Llama-3.1-70B in 12 authoritative benchmarks including MMLU and MATH, with inference speeds twice as fast as similar models and costs only one-fourth as high.
  • Applicable Scenarios: It is well-suited for high-frequency interaction scenarios such as financial risk control, code generation, and multilingual translation, supporting JSON-structured output and the execution of complex system instructions.
  • Multimodal Capabilities: It supports over 29 languages (including Chinese, English, French, Japanese, Korean, and more), with a 30% improvement in understanding structured data such as tables.
  • Competitor Comparison: In the HumanEval code benchmark, it scores 86.6 points (compared to 86.0 for CodeQwen1.5); in the MBPP task, it achieves an 88.2% completion rate (Llama3.1-70B scores 84.2%).
  • Open-Source Ecosystem: Released under the Apache 2.0 license (except for the 72B version), it integrates vLLM/Ollama tool calls and has been downloaded over 1.68 million times, demonstrating strong community recognition.

───────────────────────────────────────────────────────────────────

Core Capabilities

⚡ Ultra-High-Speed Inference: Featuring exclusively optimized KV cache technology, it delivers response latencies below 50ms and generates thousand-token outputs at a cost as low as 0.1 yuan.

📊 Long-Text Processing: Supports 128K-context understanding and 8K continuous generation, boosting efficiency in handling complex reports by 50%.

🔑 Structured Output: Achieves a 92.3% accuracy rate in JSON generation and processes tabular data three times faster than industry averages.

🌍 Multilingual Coverage: Seamlessly switches between over 29 languages; in mixed Chinese-English scenarios, its F1 score reaches 89.7 (SDXL scores 85.2).

🛠️ Tool Ecosystem: Natively compatible with vLLM/Ollama tool calls, allowing API services to be deployed with just five lines of code.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat(SophNet)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

sophnet/Qwen2.5-32B-Instruct

-
128000

Input$0.29 / 1M tokens
Output$0.86 / 1M tokens

Input$0.29/ 1M tokens
Output$0.86/ 1M tokens
Original Price