{"id":480709,"date":"2026-02-15T02:10:08","date_gmt":"2026-02-15T02:10:08","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/480709\/"},"modified":"2026-02-15T02:10:08","modified_gmt":"2026-02-15T02:10:08","slug":"first-proof-is-ais-toughest-math-test-yet-the-results-are-mixed","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/480709\/","title":{"rendered":"First Proof is AI&#8217;s toughest math test yet. The results are mixed"},"content":{"rendered":"<p class=\"article_pub_date-zPFpJ\">February 14, 2026<\/p>\n<p class=\"article_read_time-ZYXEi\">4 min read<\/p>\n<p>AI just got its toughest math test yet. The results are mixed<\/p>\n<p>Experts gave AI 10 math problems to solve in a week. 
OpenAI, researchers and amateurs all gave it their best shot<\/p>\n<p class=\"article_authors-ZdsD4\">By <a class=\"article_authors__link--hwBj\" href=\"https:\/\/www.scientificamerican.com\/author\/joseph-howlett\/\" rel=\"nofollow noopener\" target=\"_blank\">Joseph Howlett<\/a> edited by <a class=\"article_authors__link--hwBj\" href=\"https:\/\/www.scientificamerican.com\/author\/claire-cameron\/\" rel=\"nofollow noopener\" target=\"_blank\">Claire Cameron<\/a><\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2026\/02\/math-test.jpg\" alt=\"Black and white photo of a room full of teenage students bent over their desks taking an exam.\"   class=\"lead_image__img-xKODG\" style=\"--w:5870;--h:3900\" fetchpriority=\"high\"\/> <\/p>\n<p>Interim Archives \/ Contributor via Getty Images<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">The verdict, it seems, is in: artificial intelligence is not about to replace mathematicians.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">That is the immediate takeaway from <a href=\"https:\/\/www.scientificamerican.com\/article\/mathematicians-launch-first-proof-a-first-of-its-kind-math-exam-for-ai\/\" rel=\"nofollow noopener\" target=\"_blank\">the \u201cFirst Proof\u201d challenge<\/a>\u2014perhaps the most robust test yet of the ability of large language models (LLMs) to perform mathematical research. The test was set by 11 top mathematicians on February 5, and its results were released early in the morning on Valentine\u2019s Day. It\u2019s too soon to say conclusively how many of the 10 math problems included in the challenge were solved by AIs without human help. But one thing is clear: none of the LLMs came close to solving them all.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">The mathematicians behind First Proof presented the AIs with 10 \u201clemmas\u201d\u2014a math term for minor theorems that pave the way to a larger result. 
These problems are the working mathematician\u2019s stock-in-trade, the kind of mini problem one might hand off to a talented graduate student. The mathematicians aimed for problems that would require some originality to solve, not just a mash-up of standard techniques, according to Mohammed Abouzaid, a math professor at Stanford University and a member of the First Proof team.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">The challenge, while highlighting AI\u2019s limitations, also spotlights a budding AI-enthusiast subculture within the mathematics community. Online discussion boards and social media accounts dedicated to math were swamped with purported proofs from top mathematicians and rogue undergraduates alike. And it underscored how seriously AI startups, including ChatGPT maker OpenAI, are taking the challenge of teaching an LLM to do math.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">\u201cWe did not expect there would be this much activity,\u201d Abouzaid says. \u201cWe did not expect that the AI companies would take it this seriously and put this much labor into it.\u201d<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">The First Proof team revealed the solutions to the 10 challenges early on Saturday, and <a href=\"https:\/\/codeberg.org\/tgkolda\/1stproof\/src\/branch\/main\/2026-02-batch\/FirstProofSolutionsComments.pdf\" rel=\"nofollow noopener\" target=\"_blank\">posted<\/a> about their own experiences trying to get LLMs to solve the problems. 
They found that AIs could spit out confident proofs to every problem, but only two were correct\u2014those for the ninth and 10th problems. And a proof that was nearly identical to the ninth problem turned out to already exist. The first problem was also \u201ccontaminated\u201d\u2014a sketch of a proof was archived from the website of its author, team member and 2014 Fields Medal winner Martin Hairer\u2014but the LLMs still failed to fill in the gaps.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">The style of proof that the LLMs came up with was particularly surprising, Abouzaid says. \u201cThe correct solutions that I\u2019ve seen out of AI systems, they have the flavor of 19th-century mathematics,\u201d he says. \u201cBut we\u2019re trying to build the mathematics of the 21st century.\u201d<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">Outside submissions didn\u2019t appear to fare much better. Some submissions appeared to employ varying degrees of human input, with several seemingly the result of week-long dialogues checked by mathematicians. Importantly, the <a href=\"https:\/\/1stproof.org\/faq.html\" rel=\"nofollow noopener\" target=\"_blank\">First Proof rules<\/a> disallow human mathematical input or prodding.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">\u201cOnce there\u2019s humans involved, how do we judge how much is human and how much is AI?&#8221; says Lauren Williams, Dwight Parker Robinson Professor of Mathematics at Harvard University and one of the mathematicians who set up First Proof.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">OpenAI posted its work on Saturday, the result of a week-long sprint using its newest in-house AI models working with \u201cexpert feedback\u201d from human mathematicians. 
The company\u2019s chief scientist Jakub Pachocki said in a <a href=\"https:\/\/x.com\/merettm\/status\/2022517085193277874\" rel=\"nofollow\">social media post<\/a> that they believe six of their ten solutions to \u201chave a high chance of being correct.\u201d Mathematicians have pointed to potential holes in at least one of those six already.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">Aside from how much human assistance the AIs had, the vast bulk of the submissions appear to be a lot of very convincing nonsense. Before the challenge had even ended, a number of purported solutions that initially appeared credible were already being questioned by experts.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">The submissions will take days for experts to properly vet. And judging whether a proof is truly \u201coriginal\u201d is even tougher than judging if it is correct. \u201cNothing in math is totally without precedent,\u201d says Daniel Litt, a mathematician at the University of Toronto, who was not part of the First Proof team.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">\u201cWe are thinking of this as an experiment. Our goal was to get feedback,\u201d Abouzaid says. The team writes that they\u2019re planning a second round with tighter controls, and that more details will be released on March 14.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">For some mathematicians who\u2019ve been tracking AI\u2019s progress, the lukewarm results match their expectations. \u201cI expected maybe two to three unambiguously correct solutions from publicly available models,\u201d Litt says. \u201cTen would have been very surprising to me.\u201d<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">Still, even getting a few valid solutions to research-level problems from an AI would likely have been impossible just months ago. 
\u201cI already have heard from colleagues that they are in shock,\u201d says Scott Armstrong, a mathematician at Sorbonne University in France. \u201cThese tools are coming to change mathematics, and it&#8217;s happening now.&#8221;<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">But for others who closely track AI\u2019s achievements, this wasn\u2019t a great showing.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">\u201cThe models seem to have struggled,\u201d says Kevin Barreto, an undergraduate student at the University of Cambridge, who was not part of the First Proof team. He recently <a href=\"https:\/\/www.scientificamerican.com\/article\/ai-uncovers-solutions-to-erdos-problems-moving-closer-to-transforming-math\/\" rel=\"nofollow noopener\" target=\"_blank\">used AI to solve one of the Erd\u0151s problems<\/a>, a number of challenges posed by Hungarian mathematician Paul Erd\u0151s. \u201cTo be honest, yeah, I\u2019m somewhat disappointed.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"February 14, 2026 4 min read Add Us On GoogleAdd SciAm AI just got its toughest math test&hellip;\n","protected":false},"author":2,"featured_media":480710,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[256,254,255,64,63,105],"class_list":{"0":"post-480709","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-au","12":"tag-australia","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/480709","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/comments?post=480709"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/480709\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media\/480710"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media?parent=480709"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/categories?post=480709"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/tags?post=480709"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}