gpt-oss-20b

gpt-oss-20b

Medium-sized open-weight model for low latency
2025-08-05
LLM
Input:
$0.1/1M tokens
Output:
$0.5/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

Basic Information

GPT-OSS-20B is a lightweight open-weight language model released by OpenAI on August 5, 2025. It belongs to the GPT-OSS series and is licensed under the flexible Apache 2.0 license, allowing free commercial use. The model has a total of 21 billion parameters, with each token activating 360 million parameters. It is compatible with the OpenAI Response API and optimized for agent workflows. The deployment threshold is extremely low—only 16 GB of memory is required to run it on edge devices. The weights have been open-sourced on Hugging Face and natively quantized into MXFP4 format. It supports inference on multiple platforms, including PyTorch, Apple Metal, and Windows ONNX Runtime, and has also established partnerships with mainstream deployment platforms such as Azure, AWS, and Ollama.

Core Features

Its performance rivals that of OpenAI’s o3-mini, while delivering superior results in scenarios like competitive mathematics (AIME 2024/2025) and healthcare (HealthBench). It supports long context lengths of up to 128k tokens and features the “o200k_harmony” tokenizer, enabling it to handle long-text tasks across multiple domains. It boasts tool-use capabilities and few-shot function calling abilities, allowing it to perform operations such as web searches and code execution. It offers three levels of inference intensity—low, medium, and high—which developers can quickly configure via system messages, striking a balance between latency and task requirements.

Technical Highlights

The model adopts a Mixture-of-Experts (MoE) architecture with a 24-layer structure containing 32 experts. Each token activates four experts, achieving an optimal balance between efficiency and performance. It introduces an innovative unsupervised Chain-of-Thought (CoT) approach that does not rely on direct alignment supervision, making it easier to monitor anomalous behavior. After undergoing rigorous safety training and passing adversarial fine-tuning tests under the "Preparedness Framework," its internal security benchmarks reach the level of state-of-the-art models. Additionally, it supports structured outputs, catering to customized development needs.

Market Impact

This model significantly reduces the cost of deploying AI models, empowering small organizations, resource-constrained industries, and emerging markets to adopt AI solutions. It accelerates the adoption of AI on edge devices, making it ideal for local inference and low-latency applications. Its open nature and secure design set a benchmark for lightweight open models in the industry, accelerating the democratization of AI while providing the research community with practical examples of unsupervised CoT and security evaluation methodologies.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat(Talk)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

gpt-oss-20b

-
128000

Input$0.1 / 1M tokens
Output$0.5 / 1M tokens

Input$0.1/ 1M tokens
Output$0.5/ 1M tokens
Original Price