gpt-oss-120b

gpt-oss-120b

Most powerful open-weight model, fits into an H100 GPU
2025-08-05
LLM
Input:
$0.2/1M tokens
Output:
$1/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

Basic Information

GPT-OSS-120B is an open-weight language model released by OpenAI on August 5, 2025. It belongs to the GPT-OSS series and is licensed under the Apache 2.0 license, allowing free commercial download. It requires an 80GB GPU for operation, is compatible with the OpenAI Response API, and is optimized for agent workflows. With a total of 117 billion parameters, each token activates 5.1 billion parameters. In core inference benchmarks, its performance is comparable to o4-mini; in scenarios such as HealthBench, it outperforms both o1 and GPT-4o. The model has been open-sourced on Hugging Face and supports inference on platforms including PyTorch and Metal.

Core Features

It boasts strong tool-use capabilities, enabling tasks such as web search and Python code execution. The Tau-Bench agent evaluation shows outstanding performance. It supports long contexts up to 128k tokens, uses the “o200k_harmony” tokenizer, and is well-suited for handling long texts across multiple domains. It offers three levels of inference intensity—low, medium, and high—balancing latency and performance, and supports structured outputs and complete Chain-of-Thought (CoT) reasoning, making it easy to customize.

Technical Highlights

The model adopts a Mixture-of-Experts (MoE) architecture with a 36-layer structure containing 128 experts. Each token activates four experts, striking a balance between performance and efficiency. It introduces an innovative unsupervised CoT approach that does not rely directly on supervised alignment, facilitating the monitoring of anomalous behaviors. After adversarial fine-tuning tests, its internal security benchmarks have reached the level of state-of-the-art models. Additionally, it has undergone external expert review methodologies and launched a $500,000 red-team testing challenge to further enhance its security.

Market Impact

The model lowers the barrier to entry for using large AI models, helping organizations with limited resources deploy them more easily. It promotes the establishment of open-model security standards, with its security assessment methodology serving as an industry reference. By being compatible with multiple hardware and platforms, it fosters the democratization of AI, accelerates innovation in edge devices and local inference scenarios, and provides a high-performance example for the open-model ecosystem.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (20)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat(LLaMA3.3)
POST
Stable
View Details
Chat(LLaMA3.2 multimodal)
POST
Stable
View Details
Chat(LLaMA3.1)
POST
Stable
View Details
Chat(Mixtral-8x7B)
POST
Stable
View Details
Chat(Gemma-7B)
POST
Stable
View Details
Chat(Gemma2-9B)
POST
Stable
View Details
Chat(Command R+)
POST
Stable
View Details
Command R
POST
Stable
View Details
Chat(Qwen2)
POST
Stable
View Details
Chat(Qwen2.5)
POST
Stable
View Details
Chat(Llama-3.1-nemotron)
POST
Stable
View Details
Chat(Mistral)
POST
Stable
View Details
Chat(Pixtral-Large-2411multimodal)
POST
Stable
View Details
Chat(QwQ-32B-Preview)
POST
Stable
View Details
Marco-o1
POST
Stable
View Details
QVQ-72B-Preview
POST
Stable
View Details
QwQ-32B
POST
Stable
View Details
Gemma-3-27b-it
POST
Stable
View Details
Qwen3
POST
Stable
View Details
Chat(LLaMA4)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

gpt-oss-120b

-
128000

Input$0.2 / 1M tokens
Output$1 / 1M tokens

Input$0.2/ 1M tokens
Output$1/ 1M tokens
Original Price