grok-4-fast-non-reasoning

grok-4-fast-non-reasoning

Non-reasoning mode of Grok-4-Fast
2025-09-23
LLM
Model capability: imageModel capability: function_call
Input:
$0.2/1M tokens
Output:
$0.5/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

Grok 4 Fast is a high-performance inference model launched by xAI, designed to deliver cutting-edge performance for both enterprise and consumer applications. It places particular emphasis on improving token efficiency, driving the development of smaller, faster AI models, and making high-quality inference capabilities accessible to more users and developers.

  • Ultimate Cost Efficiency: In inference benchmarks, Grok 4 Fast outperforms Grok 3 Mini, reducing token costs. While matching the performance of Grok 4, it uses 40% fewer thinking tokens, resulting in a 98% reduction in costs compared to Grok 4.
  • Native Tools and Top-Tier Search: It supports code execution, web browsing, and other features, with real-time data search and analysis capabilities. It leads in multilingual search benchmarks.
  • State-of-the-Art General Post-Training: It ranks at the top in multiple benchmarks, especially excelling in the LMArena search arena, where it outperforms other similar models.
  • Unified Architecture Design: Adopting a unified model architecture, it can handle both inference and fast-response tasks without distinguishing between task types, reducing latency and token costs, and supporting real-time application scenarios.
  • Ultra-Large Context Window: It supports a context window of up to 2 million tokens, enabling it to process large-scale text in a single go while ensuring stability and coherence in processing.

───────────────────────────────────────────────────────────────────

Core Capabilities

⚡ Multi-Platform and Version Compatibility

  • Unrestricted Access for All Users: Available on grok.com, as well as in iOS and Android apps. Free users enjoy unlimited access, with support for both quick and automatic modes, enhancing the search and information-query experience.
  • Developer-Friendly Version: Provides API access, including both inference and non-inference versions, supporting a context window of up to 2 million tokens. Developers can adjust computing resources according to their needs.

🛠️ Key Scenario Performance

  • Academic and Mathematical Reasoning: It performs exceptionally well in several highly challenging benchmarks, demonstrating outstanding ability to handle complex mathematical and academic tasks.
  • Real-Time Information Retrieval and Analysis: It can handle real-time data-dependent tasks, such as multi-round searches and information integration, efficiently completing large-scale computations and validations.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (4)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat(grok)
POST
Stable
View Details
Chat(grok-2-vision)
POST
Stable
View Details
Asynchronous request to chat
POST
Stable
View Details
Async Get Result
GET
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

grok-4-fast-non-reasoning

-
2000000

Input$0.2 / 1M tokens
Output$0.5 / 1M tokens

Input$0.2/ 1M tokens
Output$0.5/ 1M tokens
Original Price