
gpt-4.1
API Overview
GPT‑4.1 is OpenAI’s latest general‑purpose model, delivering major improvements in coding, instruction following, and long‑context understanding over GPT‑4o, with a refreshed knowledge cutoff of June 2024. The family includes GPT‑4.1, GPT‑4.1 mini, and GPT‑4.1 nano, all supporting context windows up to 1 million tokens. This enables stable retrieval and integration of information across very large codebases and multi‑document corpora, making the models well‑suited for domains like law, finance, and customer support that depend on reading and reasoning over long materials.
On coding, GPT‑4.1 reaches 54.6% on SWE‑bench Verified, a 21.4‑point absolute gain over GPT‑4o, and significantly outperforms GPT‑4o on Aider’s polyglot diff benchmark. It produces precise incremental edits, follows diff formats reliably, and avoids unnecessary changes instead of rewriting entire files. In practice, tools like Windsurf and Qodo report notably higher first‑pass acceptance of code edits and better code review quality.
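To make the idea of an incremental edit concrete, here is a minimal sketch of applying one search/replace style edit to a file's contents instead of regenerating the whole file. The function name `apply_edit` and the edit format are illustrative assumptions, not the format used by Aider, SWE‑bench, or the OpenAI API.

```python
def apply_edit(source: str, search: str, replace: str) -> str:
    """Apply one search/replace edit (hypothetical format, for illustration only).

    The edit is rejected unless the search text occurs exactly once, so a
    malformed or ambiguous edit cannot silently change the wrong location.
    """
    count = source.count(search)
    if count != 1:
        raise ValueError(f"expected exactly one match for the search text, found {count}")
    return source.replace(search, replace, 1)


original = "def area(r):\n    return 3.14 * r * r\n"

# Only the constant changes; every other character of the file is preserved.
patched = apply_edit(original, "3.14", "3.14159")
```

Rejecting edits that match zero or several locations is one simple way to measure how reliably a model follows an edit format: a well-formed edit applies cleanly, and anything else is counted as a failure.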
For instruction following, GPT‑4.1 improves by 10.5 points over GPT‑4o on Scale’s MultiChallenge and scores higher on IFEval. It follows strict output formats, negative instructions, and ordered multi‑step tasks more faithfully, shows better‑calibrated “I don’t know” behavior, and maintains coherent multi‑turn dialogue.
Thanks to the 1M‑token context and targeted long‑range training, GPT‑4.1 reliably solves “needle in a haystack” retrieval and outperforms GPT‑4o on OpenAI‑MRCR and Graphwalks long‑context reasoning benchmarks, supporting complex, multi‑hop reasoning across many documents.
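A "needle in a haystack" check can be sketched as follows: one distinctive sentence is placed at a chosen depth inside a long run of filler text, and the model is asked a question that only that sentence answers. This is a simplified illustration, not OpenAI's evaluation harness. The names `build_haystack`, `needle_found`, and `fake_model` are hypothetical, and `fake_model` is a stand‑in that would be replaced by a real model call.

```python
from typing import Callable


def build_haystack(filler: str, needle: str, n_sentences: int, depth: float) -> str:
    """Insert `needle` among `n_sentences` filler sentences.

    `depth` is the relative position: 0.0 puts the needle first, 1.0 puts it last.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    sentences = [filler] * n_sentences
    sentences.insert(round(depth * n_sentences), needle)
    return " ".join(sentences)


def needle_found(ask_model: Callable[[str], str], haystack: str,
                 question: str, expected: str) -> bool:
    """Return True if the model's answer contains the expected string."""
    prompt = f"{haystack}\n\nQuestion: {question}"
    return expected.lower() in ask_model(prompt).lower()


def fake_model(prompt: str) -> str:
    # Stand-in for a real model call (assumption): it "answers" by scanning the prompt.
    return "The passphrase is heliotrope." if "heliotrope" in prompt else "I don't know."


haystack = build_haystack(
    filler="The sky was grey that morning.",
    needle="The secret passphrase is heliotrope.",
    n_sentences=1000,
    depth=0.5,
)
found = needle_found(fake_model, haystack, "What is the secret passphrase?", "heliotrope")
```

Sweeping `depth` and `n_sentences` over a grid gives the usual picture of retrieval accuracy as a function of context length and needle position.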
In multimodal tasks, the GPT‑4.1 family shows strong gains in vision; on Video‑MME (long videos without subtitles) GPT‑4.1 achieves a state‑of‑the‑art 72.0%, making it suitable for long‑video understanding and cross‑scene reasoning.
On cost and latency, GPT‑4.1 is about 26% cheaper than GPT‑4o for typical queries, and it benefits from a higher prompt‑caching discount as well as Batch API pricing. GPT‑4.1 mini matches or beats GPT‑4o on many benchmarks at around one‑sixth the cost, while GPT‑4.1 nano is the fastest and cheapest option for low‑latency tasks like classification and autocompletion.
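As a rough worked example of these ratios, the sketch below scales a baseline GPT‑4o spend by the figures quoted above. The helper `relative_costs` and the baseline of 100 are illustrative assumptions, and the one‑sixth figure for GPT‑4.1 mini is assumed to be relative to GPT‑4o. Actual bills depend on current per‑token prices, caching, and batch usage, none of which are modeled here.

```python
def relative_costs(gpt4o_cost: float) -> dict[str, float]:
    """Estimate spend for the same workload using the ratios quoted in the text."""
    return {
        "gpt-4o": gpt4o_cost,
        "gpt-4.1": gpt4o_cost * (1 - 0.26),  # "about 26% cheaper"
        "gpt-4.1-mini": gpt4o_cost / 6,      # "around one-sixth the cost" (assumed vs. GPT-4o)
    }


# Hypothetical baseline: a workload that costs 100 units on GPT-4o.
costs = relative_costs(100.0)
for model, cost in costs.items():
    print(f"{model}: {cost:.2f}")
```

No ratio is given for GPT‑4.1 nano in the text, so it is left out of the estimate.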
Overall, the GPT‑4.1 family sets a new practical baseline for real‑world software engineering, intelligent agents, and long‑document understanding, and is positioned to replace GPT‑4.5 Preview as a primary API workhorse.