OpenAI recently updated GPT-5.4 with a feature that has the tech world buzzing: Extended Thinking mode. While the base model is already lightning-fast, this new toggle allows the AI to “ruminate”—running internal simulations and self-correcting before it ever types a single word of its response.

The result is a staggering 94% success rate on the ARC-AGI-1 reasoning benchmark, finally surpassing the 92.8% score held by human experts in the same category.

So if you’re still using ChatGPT for simple summaries, you’re essentially driving a Ferrari to the grocery store. That said, even on a Plus plan, usage limits are tied to how complex your prompts are. Heavy tasks like large code audits can hit system limits quickly, and in some cases, responses may fall back to faster, less capable models.


Hardest Logic Puzzle Ever. Before providing the answer, show your work and explain where other models typically fail this specific prompt.”


The results: We’ve all seen AI fail the “Strawberry” test or the “Three Gods” riddle. Those days are over. Using its “Mid-Response Course Correction,” GPT-5.4 recognized when it was heading toward a wrong answer and pivoted mid-thought. What’s interesting is not just the answer but the metacognition: the model didn’t guess. It identified the “Key Lemma” (the logical tool used to solve the problem), essentially building its own “logic translation layer” to crack the puzzle.
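For context, the “Strawberry” test simply asks a model how many times the letter “r” appears in the word — something trivial to verify in code. Here is a minimal Python ground-truth check (this is just verification on our end, not anything the model runs internally):

```python
# Ground truth for the classic "Strawberry" test:
# count how many times "r" appears in "strawberry".
word = "strawberry"
r_count = word.count("r")
print(f'"{word}" contains {r_count} occurrences of "r"')  # 3
```

Models historically stumbled here because they process tokens rather than individual letters, not because the arithmetic is hard.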

USPTO patent database for the last two years. List any existing patents that might represent ‘Prior Art’ and explain the legal risk level.”

The results: Planning a new invention? GPT-5.4 can tell you if someone beat you to it. Here, the model used its 1-million token context window to “read” an entire database of patent descriptions in one go, spotting overlaps in abstract concepts.

As someone with plenty of ideas, I have used this prompt before to make sure my ideas are uniquely mine, with no overlap with what’s already on the market. This prompt is worth filing away if you are a big thinker or solopreneur. Will I be launching my cupcake laundry basket prototype anytime soon? Probably not, but it’s good to know I could.
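To make the long-context idea concrete, here is a hedged Python sketch of how you might pack a corpus of patent abstracts into a single prior-art prompt before sending it to a long-context model. The function name, sample abstracts, and character budget are all illustrative assumptions, not a real USPTO integration, and the actual API call is deliberately omitted:

```python
# Hypothetical sketch: assemble one long prior-art prompt from many
# patent abstracts. The corpus, budget, and helper name are
# illustrative; a real workflow would pull abstracts from USPTO data.

def build_prior_art_prompt(invention: str, abstracts: list[str],
                           char_budget: int = 3_000_000) -> str:
    """Build a single prompt: the invention description followed by
    as many numbered patent abstracts as fit within the budget."""
    header = (
        "You are a patent analyst. Compare my invention against the "
        "patent abstracts below. List any that might represent prior "
        "art and explain the legal risk level.\n\n"
        f"Invention: {invention}\n\nPatent abstracts:\n"
    )
    parts, used = [header], len(header)
    for i, abstract in enumerate(abstracts, start=1):
        entry = f"[{i}] {abstract}\n"
        if used + len(entry) > char_budget:
            break  # stop before overflowing the context window
        parts.append(entry)
        used += len(entry)
    return "".join(parts)

prompt = build_prior_art_prompt(
    "A laundry basket shaped like a cupcake",
    ["A decorative container resembling a baked good.",
     "A collapsible fabric hamper with a rigid rim."],
)
# The assembled prompt would then go to whatever long-context chat
# API you use; that call is omitted here.
```

The point of the budget check is simply that even a 1-million-token window is finite, so any real pipeline has to decide what to drop when the corpus doesn’t fit.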

While Gemini 3.1 Pro is the king of “doing” (agentic automation), GPT-5.4 is the undisputed king of “thinking.” Its new Extended Thinking mode allows it to pause and simulate outcomes before responding, resulting in a 94% ARC-AGI-1 reasoning score that edges out human experts.

It is slower and significantly more expensive than the standard model, but for high-importance tasks like cybersecurity auditing, legal cross-referencing or complex coding, GPT-5.4 Thinking is currently the most capable “brain” on the planet. The chatbot is dead. Long live the reasoning engine. Give it a try and let me know what you think in the comments.

Follow Tom’s Guide on Google News and add us as a preferred source to get our up-to-date news, analysis, and reviews in your feeds.