Concept visualization of GPT-5.2 analyzing documents, code, and data
GDPval Knowledge Work: 70.9%. Beats or ties top industry professionals on tasks across 44 occupations.
Error Reduction: 30% lower. Fewer hallucinations and factual errors compared to GPT-5.1.
Long-Context Accuracy: ~100%. On the 4-needle MRCR benchmark (out to 256K tokens).
SWE-Bench Pro: 55.6%. New state of the art on a rigorous software engineering eval.

OpenAI's "Code Red" Bears Fruit With Major Reasoning Upgrade

In a strategic response to intensifying competition from Google's Gemini 3, OpenAI has launched GPT-5.2, its most advanced model yet, focused squarely on professional and enterprise applications. Announced just weeks after CEO Sam Altman's internal "code red" memo, this release represents a consolidation and enhancement of the reasoning and workflow capabilities introduced with GPT-5 and GPT-5.1.

Available starting today for ChatGPT Plus, Pro, Business, and Enterprise subscribers, as well as via the API, GPT-5.2 is being pitched as the premier AI for "professional knowledge work". OpenAI claims it delivers unprecedented gains in economically valuable tasks—from creating complex spreadsheets and presentations to writing production-grade code and analyzing massive documents—often performing at or above the level of human experts.

Three Flavors: Instant, Thinking, and Pro

GPT-5.2 comes in three distinct variants, each optimized for different types of tasks:

Model Variant | Primary Use Case & Strength | Key Benchmark Highlight
GPT-5.2 Instant | Speed-optimized for routine queries: information-seeking, writing, translation. | Optimized for latency-sensitive applications with better performance at reasoning.effort='none'.
GPT-5.2 Thinking | Complex structured work: coding, long-document analysis, math, planning, and multi-step projects. | 70.9% on GDPval (ties/beats experts), 55.6% on SWE-Bench Pro, ~100% on 4-needle MRCR.
GPT-5.2 Pro | Maximum accuracy for difficult problems, scientific research, and high-stakes analysis. | 93.2% on GPQA Diamond (graduate-level science), top performance on FrontierMath.
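
For API users, the difference between the variants largely comes down to which model and how much reasoning effort a request asks for. The sketch below only assembles a request payload and never calls an API. The `reasoning` / `effort` field mirrors the `reasoning.effort='none'` setting mentioned in the table; the model names `gpt-5.2` and `gpt-5.2-pro`, the task categories, and the routing rules are illustrative assumptions, not OpenAI's documented guidance.

```python
from typing import Any, Dict, Tuple

# Hypothetical routing table: task category -> (model name, reasoning effort).
# The model names and effort levels are assumptions made for illustration.
ROUTES: Dict[str, Tuple[str, str]] = {
    "routine": ("gpt-5.2", "none"),          # latency-sensitive lookups, rewriting
    "complex": ("gpt-5.2", "high"),          # coding, long documents, planning
    "high_stakes": ("gpt-5.2-pro", "high"),  # hardest problems, maximum accuracy
}


def build_request(task_kind: str, prompt: str) -> Dict[str, Any]:
    """Assemble a request payload for the given task category.

    Nothing is sent anywhere: the function only returns a dictionary.
    """
    if task_kind not in ROUTES:
        raise ValueError(f"unknown task kind: {task_kind!r}")
    model, effort = ROUTES[task_kind]
    return {
        "model": model,
        "input": prompt,
        "reasoning": {"effort": effort},
    }


if __name__ == "__main__":
    print(build_request("routine", "Translate 'good morning' into French."))
```

Keeping the routing in one table makes the trade-off explicit: routine traffic stays on the low-latency setting, and only the categories that need it pay for deeper reasoning.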

The standout is the "Thinking" variant, which OpenAI designed to "unlock even more economic value" by dramatically improving on real-world professional tasks. Fidji Simo, OpenAI's CEO of Applications, stated the model is better at "creating spreadsheets, building presentations, writing code, perceiving images, understanding long context, using tools and then linking complex, multi-step projects".

Breakthrough Capabilities: Where GPT-5.2 Shines

Expert-Level Professional Work

On GDPval, a benchmark of well-specified knowledge tasks across 44 occupations in fields such as finance, marketing, and management, GPT-5.2 Thinking beat or tied top industry professionals on 70.9% of comparisons, according to expert human judges. That is up from 38.8% for GPT-5. OpenAI estimates it can produce outputs for these tasks at >11x the speed and <1% the cost of human experts. One judge remarked on the output quality: "It appears to have been done by a professional company with staff".
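
To make the ">11x the speed and <1% the cost" estimate concrete, the short calculation below turns it into upper bounds for a single task. The four-hour, $400 expert task is a made-up example, and the function name is hypothetical; only the two ratios come from the claim above.

```python
from typing import Tuple


def model_bounds(expert_hours: float, expert_cost: float,
                 speedup: float = 11.0,
                 cost_fraction: float = 0.01) -> Tuple[float, float]:
    """Return (max_hours, max_cost) implied by the claimed ratios.

    speedup=11.0 and cost_fraction=0.01 encode ">11x the speed" and
    "<1% the cost", so both returned values are upper bounds.
    """
    if speedup <= 0 or cost_fraction < 0:
        raise ValueError("speedup must be positive and cost_fraction non-negative")
    return expert_hours / speedup, expert_cost * cost_fraction


if __name__ == "__main__":
    # Hypothetical task: four hours of expert time billed at $400 in total.
    hours, cost = model_bounds(4.0, 400.0)
    print(f"under {hours * 60:.0f} minutes, under ${cost:.2f}")
```

For that hypothetical task, the claim would imply a model turnaround of under roughly 22 minutes at a cost of under $4, before counting any human time spent reviewing the output.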

Unprecedented Long-Context Reasoning

GPT-5.2 sets a new state of the art in understanding and connecting information across vast documents. It achieves near-perfect ~100% accuracy on the challenging "4-needle" MRCR benchmark out to 256,000 tokens. This translates to substantially better performance on real-world tasks like deep analysis of lengthy reports, legal contracts, or research papers. For workflows that need to go even further, the new /responses/compact API endpoint allows for "loss-aware compression" of prior conversation, effectively extending the model's working context window.
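
The general idea behind compacting a conversation can be sketched locally. The code below does not call the /responses/compact endpoint, whose request and response format is not described here. It is a simplified stand-in that assumes a caller-supplied `summarize` function and a crude four-characters-per-token estimate, and all function names are hypothetical.

```python
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def maybe_compact(turns: List[str], budget: int, keep_recent: int,
                  summarize: Callable[[List[str]], str]) -> List[str]:
    """Collapse older turns into one summary once the history exceeds budget.

    The most recent `keep_recent` turns are kept verbatim; everything before
    them is replaced by whatever `summarize` returns for those turns.
    """
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    total = sum(estimate_tokens(t) for t in turns)
    if total <= budget or len(turns) <= keep_recent:
        return list(turns)  # still fits, or nothing older to compact
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    return [summarize(older)] + recent


if __name__ == "__main__":
    history = ["x" * 400 for _ in range(5)]  # five turns, ~100 tokens each
    compacted = maybe_compact(
        history, budget=300, keep_recent=2,
        summarize=lambda older: f"[summary of {len(older)} earlier turns]",
    )
    print(len(compacted), compacted[0])
```

A real compaction step would be judged by how much task-relevant detail survives in the summary, which is what "loss-aware" suggests; this sketch only shows where such a step sits in the loop.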

Industry Validation: Major companies testing GPT-5.2 confirm its advancements. Notion, Box, Shopify, and Zoom observed "state-of-the-art long-horizon reasoning and tool-calling." Databricks and Hex found it exceptional for "agentic data science," while coding tool makers like Windsurf and JetBrains reported "state-of-the-art agentic coding performance".

State-of-the-Art Coding & Tool Use

For developers, GPT-5.2 represents a significant leap. It achieves 55.6% on SWE-Bench Pro, a rigorous, contamination-resistant evaluation of real-world software engineering across four languages, and 80% on SWE-bench Verified. This translates to more reliable debugging, feature implementation, and refactoring of large codebases. Its tool-calling reliability reaches 98.7% on the Tau2-bench Telecom evaluation, enabling more robust multi-step agentic workflows that can coordinate across multiple systems.

Advancing Science, Math, and Vision

OpenAI positions GPT-5.2 Pro and Thinking as "the world's best models for assisting and accelerating scientists". They achieve 93.2% and 92.4% respectively on GPQA Diamond, a graduate-level "Google-proof" Q&A benchmark. In a notable case study, GPT-5.2 Pro directly helped researchers produce a proof that resolved an open problem in statistical learning theory, which was then verified by external experts.

On the FrontierMath benchmark for expert-level mathematics, GPT-5.2 Thinking scored 40.3% (Tiers 1-3), up from 31.0% for GPT-5.1 Thinking. Research lead Aidan Clark noted that stronger math performance indicates a model's ability to "follow multi-step logic, keep numbers consistent over time, and avoid subtle errors", capabilities that are critical for financial modeling, forecasting, and data analysis.

Vision capabilities also improved, with error rates roughly halved on chart reasoning and software interface understanding tasks. The model shows a better grasp of spatial relationships within images, which aids in interpreting dashboards, diagrams, and technical screenshots.

Availability, API Updates, and The "Code Red" Backstory

Rollout & API: GPT-5.2 is rolling out starting today (December 11, 2025) to paid ChatGPT plans (Plus, Pro, Business, Enterprise). The models, including gpt-5.2-pro-2025-12-11, are available immediately in the API. OpenAI has also published a detailed prompting guide highlighting GPT-5.2's stronger instruction adherence, lower verbosity, and improved scaffolding for complex tasks.

This launch comes in the wake of an internal "code red" declared by CEO Sam Altman in response to competitive pressure from Google's Gemini 3 and concerns about ChatGPT's market share. The directive sidelined other projects (such as introducing ads) to focus resources on improving the core ChatGPT experience. Altman told CNBC he expects OpenAI will exit this "code red" state by January.

Notably absent from this release is a new image generation model, despite Altman's memo highlighting it as a priority following Google's "Nano Banana" models. Reports suggest a model with better image capabilities may arrive in January.

The Bottom Line: A Power Tool for Professionals

GPT-5.2 is not a reinvention but a powerful refinement. It turns the dial up on the reasoning and agentic capabilities that began with GPT-5, delivering measurable, economically valuable improvements for professional use. With its ability to match or surpass human experts on structured knowledge work, near-perfect long-context comprehension, and robust coding skills, it sets a new benchmark for what an AI assistant can achieve in professional settings.

For businesses and power users, the upgrade is substantial. For the wider AI landscape, it signals OpenAI's intense focus on maintaining its edge in the face of formidable competition, even if it means running higher-cost reasoning models at scale. The race for AI supremacy, especially in the enterprise, is accelerating, and GPT-5.2 is OpenAI's latest volley.