
gpt-5-mini
A lightweight version of GPT-5 for high-frequency, simple scenarios that are sensitive to cost and speed.
2025-08-08
Input: $0.25/1M tokens
Output: $2/1M tokens
API Overview
Basic Information
- Release Date: GPT-5 officially launched at 1:00 a.m. Beijing Time on August 8, 2025, and OpenAI opened API access to developers worldwide.
- Model Portfolio: The August 2025 launch lineup includes GPT-5, mini, nano, and Chat versions. The core architecture is a dual-track system, gpt-5-main (fast response) and gpt-5-thinking (deep reasoning), with a Pro version available to subscribers. On September 16, a standalone programming model, GPT-5-Codex, was added, optimized specifically for coding tasks.
- Basic Parameters: The context window totals 400K tokens: up to 272K input tokens and up to 128K output tokens, where the output budget includes invisible reasoning tokens. GPT-5 costs $1.25 per million input tokens and $10 per million output tokens; the input price is 50% lower than GPT-4o's.
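As a rough illustration of the figures above, the sketch below estimates per-request cost from the quoted per-million-token prices and checks a request against the 272K/128K limits. The function name `estimate_cost` and the price table layout are illustrative choices, not part of any SDK, and applying the same window limits to gpt-5-mini is an assumption. Output token counts should include reasoning tokens, since they count toward the output budget.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the prices quoted on this page.
PRICES = {
    "gpt-5": {"input": Decimal("1.25"), "output": Decimal("10")},
    "gpt-5-mini": {"input": Decimal("0.25"), "output": Decimal("2")},
}

# Window limits quoted for GPT-5; assumed here to apply to gpt-5-mini as well.
MAX_INPUT_TOKENS = 272_000
MAX_OUTPUT_TOKENS = 128_000


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in USD for one request.

    output_tokens should include invisible reasoning tokens.
    """
    if input_tokens > MAX_INPUT_TOKENS:
        raise ValueError("input exceeds the 272K-token input limit")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("output exceeds the 128K-token output limit")
    price = PRICES[model]
    total = price["input"] * input_tokens + price["output"] * output_tokens
    return total / Decimal(1_000_000)
```

For example, a request with 100,000 input tokens and 20,000 output tokens comes to $0.065 on gpt-5-mini and $0.325 on GPT-5 at these prices.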
Core Features
- Leading Performance Across Multiple Domains: GPT-5 scored 74.9% on the SWE-bench Verified programming benchmark. GPT-5 Pro scored 89.4% on GPQA Diamond, a PhD-level science benchmark. On the 2025 AIME math competition, GPT-5 reached 94.6% accuracy without tool assistance, and it scored 46.2% on HealthBench Hard.
- Extremely Low Hallucination Rate: With web access enabled, the factual error rate is 45% lower than GPT-4o's. The gpt-5-thinking version's error rate is 65% lower than the o3 model's, with major response errors reduced by 78%. It also openly acknowledges when a task is beyond its limits.
- Outstanding Agent Capabilities: Supports reusing reasoning context across calls via the Responses API, which makes complex tool chains more efficient. Its defect-fixing ability in coding scenarios surpasses competitors, and it achieved a 96.7% success rate on the τ²-bench telecom tool-calling benchmark.
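The context reuse mentioned above can be sketched as request payloads for the Responses API, where a follow-up call passes `previous_response_id` so the service can continue from the earlier turn instead of receiving it again. This sketch only builds the payloads and makes no network call; the helper name `build_request` and the ID `resp_example_123` are hypothetical, and the default reasoning effort is an assumed value.

```python
from typing import Any, Dict, Optional


def build_request(
    prompt: str,
    model: str = "gpt-5-mini",
    previous_response_id: Optional[str] = None,
    effort: str = "low",
) -> Dict[str, Any]:
    """Build keyword arguments for a Responses API call (nothing is sent)."""
    payload: Dict[str, Any] = {
        "model": model,
        "input": prompt,
        "reasoning": {"effort": effort},
    }
    # Only chain to an earlier response when an ID is supplied.
    if previous_response_id is not None:
        payload["previous_response_id"] = previous_response_id
    return payload


first = build_request("List the open tickets for account 42.")
# "resp_example_123" is a placeholder; a real ID comes from the first response.
follow_up = build_request(
    "Now draft a reply for the oldest one.",
    previous_response_id="resp_example_123",
)
```

With the official `openai` Python package, each dictionary would typically be passed as `client.responses.create(**payload)`, with the real ID read from the first response's `id` field. Check the current API reference before relying on any parameter.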
Technical Highlights
- Unified Architecture: Integrates the generation capabilities of the GPT series with the reasoning components of the o series in a dual-track design of “fast model + deep reasoning model.” Adjustable reasoning levels balance performance against efficiency, and input costs are 50% lower than the previous generation's.
- Real-Time Routing System: Automatically analyzes task complexity and dynamically switches response modes without requiring manual model selection by users.
- Security Mechanisms: Uses a “safe completion” strategy instead of hard refusals. After more than 9,000 hours of red-team testing involving 400 external testers, safety improved significantly in scenarios such as violent attack planning, and the rate of sycophantic responses dropped below 6%.
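To make the idea of adjustable reasoning levels and automatic routing concrete, here is a toy heuristic that picks a reasoning effort from the prompt text. It is not OpenAI's router, whose internals this page does not describe; the keyword list, the thresholds, and the function name `pick_effort` are invented for illustration, and the effort labels are assumed to match the API's `reasoning.effort` values.

```python
# Hypothetical keywords that suggest a prompt needs deeper reasoning.
REASONING_HINTS = ("prove", "debug", "step by step", "refactor", "derive")


def pick_effort(prompt: str) -> str:
    """Return a reasoning effort label based on a crude complexity guess."""
    text = prompt.lower()
    hints = sum(1 for hint in REASONING_HINTS if hint in text)
    # Several hints or a very long prompt: spend the most reasoning.
    if hints >= 2 or len(text) > 4000:
        return "high"
    # One hint or a moderately long prompt: take the middle setting.
    if hints == 1 or len(text) > 1000:
        return "medium"
    # Short, simple prompts: favor speed and cost.
    return "minimal"
```

A client-side rule like this is only useful when calling a specific model directly; the routing described above happens on the service side without manual selection.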
Market Impact
- Brand Positioning: At the time of the August 2025 release, OpenAI was valued at $300 billion. Concurrently, it began negotiating an employee stock sale at a target valuation of $500 billion. GPT-5 ranked first across all categories in LMArena testing.
- Ecosystem Integration: Simultaneously integrated with platforms such as Microsoft Copilot, Microsoft 365, and Azure AI Foundry.
- Industry Transformation: Pushes AI from a standalone tool toward an “industry operating system,” marking the first year in which AI agent capabilities become a platform.
Application Scenarios
- Programming Development: With GPT-5-Codex, released as a standalone model on September 16, developers can efficiently build responsive websites, apps, and 3D games, and the model can iterate continuously on complex projects for up to 7 hours. Paid GitHub users can integrate it for code review. It scores 74.5% on SWE-bench Verified. A live demonstration built a French-learning website in just 3 minutes.
- Professional Fields: Assists in medical result analysis, doctoral-level scientific research, financial report analysis, and PPT generation.
- Education and Interaction: The Chat version supports language learning and offers four preset personalities (Cynic, Robot, Listener, and Nerd) to meet individual learning needs.
Related Review: “GPT-5 Review: Failed to Blow Up the Market, But Accurately Exposed Competitors—Cheap, Powerful, and No Bullshit”
