{"id":10543,"date":"2025-07-20T13:10:12","date_gmt":"2025-07-20T13:10:12","guid":{"rendered":"https:\/\/www.newsbeep.com\/ca\/10543\/"},"modified":"2025-07-20T13:10:12","modified_gmt":"2025-07-20T13:10:12","slug":"openai-claims-a-breakthrough-in-llm-reasoning-on-complex-math-problems","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/ca\/10543\/","title":{"rendered":"OpenAI claims a breakthrough in LLM reasoning on complex math problems"},"content":{"rendered":"<p>Added statements from OpenAI researcher Jerry Tworek<\/p>\n<p>Update as of July 20, 2025:<\/p>\n<p>OpenAI researcher <a target=\"_blank\" rel=\"noopener nofollow\" href=\"https:\/\/x.com\/MillionInt\/status\/1946551400365994077\">Jerry Tworek<\/a> confirmed on X that the model described below received &#8220;very little IMO-specific work&#8221;; it is the result of continued training of the general-purpose base models. All solutions were written as natural-language proofs, with no special evaluation framework.<\/p>\n<p>Tworek called the achievement a genuine research breakthrough delivered by <a target=\"_blank\" rel=\"noopener nofollow\" href=\"https:\/\/x.com\/alexwei_\">Alexander Wei&#8217;s team<\/a>. 
He later added that <a target=\"_blank\" rel=\"noopener nofollow\" href=\"https:\/\/x.com\/MillionInt\/status\/1946556255490982022\">a public release of the model is possible by the end of the year<\/a>.<\/p>\n<p><a target=\"_blank\" rel=\"noopener nofollow\" href=\"https:\/\/x.com\/MillionInt\/status\/1946558130906968330\">Tworek also noted<\/a> that all of OpenAI\u2019s major announcements this week came from the same reinforcement learning system: the <a target=\"_blank\" rel=\"noopener nofollow\" href=\"https:\/\/the-decoder.de\/openai-chef-warnt-vor-dem-einsatz-von-chatgpt-agent-fuer-wichtige-aufgaben\/\">general AI agent system<\/a>, the narrow loss to a human in a <a target=\"_blank\" rel=\"noopener nofollow\" href=\"https:\/\/the-decoder.de\/openai-ki-erreicht-zweiten-platz-bei-internationalem-code-wettbewerb\/\">heuristic programming contest<\/a>, and the model that solved five of the six IMO problems. According to Tworek, <a target=\"_blank\" rel=\"noopener nofollow\" href=\"https:\/\/the-decoder.de\/openai-stattet-chatgpt-mit-autonomen-agenten-faehigkeiten-aus\/\">ChatGPT agent<\/a> runs on an earlier version of that system built on an older base model.<\/p>\n<p><a href=\"https:\/\/www.newsbeep.com\/ca\/wp-content\/uploads\/2025\/07\/jerry_tworek_x_imo.png\"><img data-lazyloaded=\"1\" fetchpriority=\"high\" decoding=\"async\" class=\"wp-image-25599 size-full\" src=\"https:\/\/www.newsbeep.com\/ca\/wp-content\/uploads\/2025\/07\/jerry_tworek_x_imo.png\" alt=\"\" width=\"607\" height=\"435\"\/><\/a>Screenshot of Tworek&#8217;s statements on X. 
| Image: <a target=\"_blank\" rel=\"noopener nofollow\" href=\"https:\/\/x.com\/MillionInt\">via X<\/a><\/p>\n<p>There are <a target=\"_blank\" rel=\"noopener nofollow\" href=\"https:\/\/x.com\/zjasper666\/status\/1946650175063384091?t=pAzIIRZTrt9d1F6OM0AOgQ&amp;s=19\">rumors<\/a> that DeepMind has also earned a gold medal at this year&#8217;s IMO, though the company has not yet made any official announcement. Last year, DeepMind&#8217;s <a href=\"https:\/\/the-decoder.com\/google-deepminds-latest-ai-models-might-bring-us-one-step-closer-to-llms-that-can-reason\/\" rel=\"nofollow noopener\" target=\"_blank\">AlphaProof and AlphaGeometry<\/a> systems earned silver by solving four of the six problems.<\/p>\n<p>While OpenAI says it used a general-purpose language model, the exact approach behind either team&#8217;s result this year remains unclear. In 2024, DeepMind&#8217;s silver-medal systems relied on a hybrid method that combined a pre-trained LLM with elements from <a href=\"https:\/\/the-decoder.com\/alphazero-learns-human-concepts\/\" rel=\"nofollow noopener\" target=\"_blank\">classic search algorithms<\/a>.<\/p>\n<p>Article from July 19, 2025:<\/p>\n<p>OpenAI says its experimental language model has solved International Mathematical Olympiad (IMO) problems at gold-medal level, which could mark a breakthrough for AI with general reasoning skills. 
The results have not yet been independently confirmed.<\/p>\n<p>According to <a target=\"_blank\" rel=\"noopener nofollow\" href=\"https:\/\/x.com\/alexwei_\/status\/1946477742855532918\">OpenAI researchers Alexander Wei<\/a> and <a target=\"_blank\" rel=\"noopener nofollow\" href=\"https:\/\/x.com\/polynoamial\/status\/1946478249187377206\">Noam Brown<\/a>, the model tackled the IMO 2025 competition, solving the first five of the six official problems and earning 35 out of a possible 42 points.<\/p>\n<p>The IMO is considered the most difficult math competition for high school students, requiring creativity and rigorous logical reasoning. Wei claims this is the first AI model that can &#8220;craft intricate, watertight arguments at the level of human mathematicians.&#8221;<\/p>\n<p><a href=\"https:\/\/www.newsbeep.com\/ca\/wp-content\/uploads\/2025\/07\/imo_2025_openai_model_solution-e1752924326960.png\"><img loading=\"lazy\" data-lazyloaded=\"1\" decoding=\"async\" class=\"wp-image-25565 size-full\" src=\"https:\/\/www.newsbeep.com\/ca\/wp-content\/uploads\/2025\/07\/imo_2025_openai_model_solution-e1752924326960.png\" alt=\"\" width=\"920\" height=\"912\"\/><\/a>A step-by-step solution generated by OpenAI&#8217;s model for an IMO problem. | Image: Screenshot via X<\/p>\n<p>The model generated its solutions under standard competition conditions: two 4.5-hour sessions, no outside help, all answers written in natural language, and no tool use. Former IMO medalists graded the responses anonymously. The <a target=\"_blank\" rel=\"noopener nofollow\" href=\"https:\/\/github.com\/openai\/imo\">full solutions are available on GitHub<\/a>.<\/p>\n<p>Still room to scale<\/p>\n<p>Unlike <a href=\"https:\/\/the-decoder.com\/alphageometry2-deepmind-ai-outperforms-math-olympians-at-geometry-tasks\/\" rel=\"nofollow noopener\" target=\"_blank\">DeepMind&#8217;s AlphaGeometry<\/a>, which is built specifically for math, OpenAI&#8217;s model is a general-purpose reasoning language model. 
&#8220;We reach this capability level not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling,&#8221; Wei explains.<\/p>\n<p>Brown confirms that the model relies on &#8220;new experimental general-purpose techniques&#8221; and <a href=\"https:\/\/the-decoder.com\/study-shows-test-time-compute-scaling-is-a-path-to-better-ai-systems\/\" rel=\"nofollow noopener\" target=\"_blank\">scales its compute at test time<\/a>, though he doesn&#8217;t share the technical details.<\/p>\n<p>&#8220;o1 thought for seconds. Deep Research for minutes. This one thinks for hours,&#8221; <a target=\"_blank\" rel=\"noopener nofollow\" href=\"https:\/\/x.com\/polynoamial\/status\/1946478253960466454\">Brown notes<\/a>, pointing out that the new model is more efficient and still has scaling potential. He argues that even a small advantage over human performance can be enough to drive major scientific progress.<\/p>\n<p>Wei says OpenAI has no plans to release this model or a similar one in the coming months, stressing that it&#8217;s strictly a research project. He also clarified that while <a href=\"https:\/\/the-decoder.com\/openai-ceo-sam-altman-says-gpt-5-is-probably-coming-sometime-this-summer\/\" rel=\"nofollow noopener\" target=\"_blank\">GPT-5 is planned &#8220;soon&#8221;<\/a>, it is unrelated to the IMO model, which was developed by a small team led by Wei.<\/p>\n<p><a target=\"_blank\" rel=\"noopener nofollow\" href=\"https:\/\/x.com\/polynoamial\/status\/1946509154752811136\">Brown points out<\/a> that the technology could eventually become a product, and with progress moving so quickly, future versions may be even more advanced. 
He adds that the results surprised even people inside OpenAI, calling the result &#8220;a milestone that many considered years away.&#8221;<\/p>\n<p>Current models are far behind<\/p>\n<p>The timing of OpenAI&#8217;s announcement seems intentional, coming just after current AI models delivered disappointing results at the same competition.<\/p>\n<p>A recent evaluation by the <a target=\"_blank\" rel=\"noopener nofollow\" href=\"https:\/\/matharena.ai\/results\">MathArena.ai<\/a> platform tested several leading models, including Gemini 2.5 Pro, Grok-4, DeepSeek-R1, and even OpenAI&#8217;s own o3 and o4-mini, on the IMO 2025 problems. None of them reached the 19 points needed for a bronze medal. Gemini 2.5 Pro came out on top, but with only 13 out of 42 points, while the others performed even worse.<\/p>\n<p><a href=\"https:\/\/www.newsbeep.com\/ca\/wp-content\/uploads\/2025\/07\/imo_results_2025.png\"><img loading=\"lazy\" data-lazyloaded=\"1\" decoding=\"async\" class=\"wp-image-25566 size-full\" src=\"https:\/\/www.newsbeep.com\/ca\/wp-content\/uploads\/2025\/07\/imo_results_2025.png\" alt=\"\" width=\"2261\" height=\"443\"\/><\/a>MathArena.ai&#8217;s chart shows major language models falling short on 2025 IMO problems. | Image: Screenshot via <a target=\"_blank\" rel=\"noopener nofollow\" href=\"http:\/\/matharena.ai\">MathArena.ai<\/a><\/p>\n<p>Even with extensive testing, which included a best-of-32 selection process and grading by IMO experts, the models showed serious flaws. Their answers were riddled with logical errors, incomplete arguments, and even made-up theorems.<\/p>\n<p>Viewed in this context, OpenAI&#8217;s announcement looks like a direct response to the limitations exposed by the MathArena test. 
While the achievement is significant, its true value will depend on whether the results can be independently reproduced and applied to real scientific problems.<\/p>\n","protected":false},"excerpt":{"rendered":"Added statements from OpenAI researcher Jerry Tworek Update as of July 20, 2025: OpenAI researcher Jerry Tworek&hellip;\n","protected":false},"author":2,"featured_media":10544,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[62,10414,10415,276,277,49,48,278,61],"class_list":{"0":"post-10543","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-ai-and-math","10":"tag-ai-research","11":"tag-artificial-intelligence","12":"tag-artificialintelligence","13":"tag-ca","14":"tag-canada","15":"tag-openai","16":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/posts\/10543","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/comments?post=10543"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/posts\/10543\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/media\/10544"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/media?parent=10543"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/categories?post=10543"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/tags?post=10543"}]
,"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}