TL;DR

OpenAI has officially launched GPT-5.4, a new frontier model that consolidates its best reasoning, coding and agentic capabilities into a single package

Faster than GPT-5.2, dramatically better at real-world professional tasks

Capable of controlling computers natively

OpenAI is not having a quiet week. From amending Pentagon deals to managing the PR fallout from a leaked internal transcript, the company appears to be dealing with plenty behind closed doors.

Yet despite the turmoil, OpenAI has just launched GPT-5.4, its most capable and efficient frontier model to date, rolling it out simultaneously across ChatGPT, the Codex platform and its developer API.

It builds on GPT-5.3 Codex, with significantly improved reasoning, computer use and knowledge-work capabilities.

The result is a model designed to do real work, actually operating software, analyzing spreadsheets and powering long-horizon agent workflows with minimal hand-holding.

On OSWorld-Verified, the benchmark that measures a model’s ability to navigate a real desktop environment, GPT-5.4 scores 75.0%. That not only far outpaces GPT-5.2’s 47.3% (a gain of 27.7 percentage points) but also edges past the measured human baseline of 72.4%. In other words, on this benchmark the model already outperforms the human baseline at navigating a computer via screenshots alone.

Hallucinations are down significantly. According to OpenAI, GPT-5.4’s individual factual claims are 33% less likely to be false than GPT-5.2’s, and its full responses are 18% less likely to contain any errors. That is a meaningful upgrade for professionals who rely on accurate outputs.
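
Those percentages are relative reductions, not absolute error rates, and OpenAI's underlying GPT-5.2 rates are not given here. As a rough illustration only, the Python sketch below applies the 33% and 18% figures to made-up baseline rates (both baselines are assumptions for the example, not reported numbers).

```python
def apply_relative_reduction(baseline_rate: float, relative_reduction: float) -> float:
    """Return the rate that remains after cutting baseline_rate by relative_reduction."""
    return baseline_rate * (1.0 - relative_reduction)


# Hypothetical GPT-5.2 baselines, chosen only to make the arithmetic concrete.
baseline_false_claim_rate = 0.06       # assumed: 6% of individual claims are false
baseline_flawed_response_rate = 0.20   # assumed: 20% of responses contain an error

# Relative reductions as reported in the article.
new_false_claim_rate = apply_relative_reduction(baseline_false_claim_rate, 0.33)
new_flawed_response_rate = apply_relative_reduction(baseline_flawed_response_rate, 0.18)

print(f"False claims: {baseline_false_claim_rate:.1%} -> {new_false_claim_rate:.1%}")
print(f"Flawed responses: {baseline_flawed_response_rate:.1%} -> {new_flawed_response_rate:.1%}")
```

Under those assumed baselines, a 33% relative drop would move a 6% false-claim rate to roughly 4%, which is why the size of the real-world improvement depends on where GPT-5.2 started.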

On Scale’s MCP Atlas benchmark with 36 MCP servers enabled, the tool-search configuration reduced total token usage by 47% while maintaining accuracy. For developers building large agentic systems, that translates directly to lower costs and faster response times.
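
To see what a 47% token reduction could mean for a bill, here is a back-of-the-envelope Python sketch. The monthly token volume and the per-million-token price are hypothetical placeholders, not OpenAI's actual pricing; only the 47% figure comes from the benchmark result above.

```python
def tokens_after_reduction(baseline_tokens: int, reduction: float) -> int:
    """Tokens used once the given fractional reduction is applied."""
    return round(baseline_tokens * (1.0 - reduction))


def cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost of a token count at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_million_usd


REDUCTION = 0.47                   # reported token reduction with tool search
BASELINE_TOKENS = 10_000_000       # hypothetical monthly usage without tool search
PRICE_PER_MILLION_USD = 2.00       # hypothetical flat price, for illustration only

reduced_tokens = tokens_after_reduction(BASELINE_TOKENS, REDUCTION)
baseline_cost = cost_usd(BASELINE_TOKENS, PRICE_PER_MILLION_USD)
reduced_cost = cost_usd(reduced_tokens, PRICE_PER_MILLION_USD)

print(f"Tokens: {BASELINE_TOKENS:,} -> {reduced_tokens:,}")
print(f"Cost: ${baseline_cost:.2f} -> ${reduced_cost:.2f}")
```

Real bills will differ, since input and output tokens are typically priced separately and the benchmark's savings may not carry over to every workload.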

It’s clear OpenAI is catering to developers and power users with this rollout.
