{"id":176534,"date":"2026-02-13T13:31:06","date_gmt":"2026-02-13T13:31:06","guid":{"rendered":"https:\/\/www.newsbeep.com\/us-ca\/176534\/"},"modified":"2026-02-13T13:31:06","modified_gmt":"2026-02-13T13:31:06","slug":"uc-berkeley-math-professor-joins-international-effort-to-push-ai-to-its-limits-research-and-ideas","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/us-ca\/176534\/","title":{"rendered":"UC Berkeley math professor joins international effort to push AI to its limits | Research And Ideas"},"content":{"rendered":"<p dir=\"ltr\">In December 2025, a group of researchers from around the world, including UC Berkeley math professor Nikhil Srivastava, gathered inside the Simons Institute for the Theory of Computing at UC Berkeley.<\/p>\n<p dir=\"ltr\">Srivastava, an associate professor of mathematics at UC Berkeley and senior scientist at the Simons Institute for the Theory of Computing, met with researchers on a mission to create a new way of assessing the mathematical capabilities of AI.\u00a0<\/p>\n<p dir=\"ltr\">Led by Stanford University math professor Mohammed Abouzaid and University of Texas at Austin math professor Rachel Ward, the team of 11 embarked on a project called First Proof. A <a href=\"https:\/\/arxiv.org\/pdf\/2602.05192\" rel=\"nofollow noopener\" target=\"_blank\">preprint<\/a> published by the team describes First Proof as a way to assess AI models\u2019 abilities to solve research-level math questions.\u00a0<\/p>\n<p dir=\"ltr\">\u201cIt was a way for \u2026 mathematicians (to get) the narrative back on AI solving math problems or proving mathematical theorems,\u201d said Fields medalist Martin Hairer, a math professor at Imperial College London and EPFL, the Swiss Federal Institute of Technology in Lausanne. 
The Fields Medal is often considered one of the \u201cNobels of math.\u201d \u201cIt was about trying to test how these things perform on things that mathematicians actually care about.\u201d<\/p>\n<p dir=\"ltr\">Unlike benchmarks that evaluate AI models based on their success at solving Olympiad math problems, the authors presented 10 questions that \u201carose naturally in the research process of the authors.\u201d\u00a0<\/p>\n<p dir=\"ltr\">\u201cAIs have basically solved these olympiad problems,\u201d said UCLA math professor and Fields medalist Terence Tao, who is not affiliated with First Proof. \u201cThey don\u2019t have 100% success right now, but they have gold medal-level performance. But it is not really an apples-to-apples comparison because the memory that an AI model can store is millions of times larger than a high school student. \u2026 So the next level after that is research-level problems.\u201d<\/p>\n<p dir=\"ltr\">The problems, each written and proven by the researchers, are not substantial enough to constitute a paper on their own. However, the solutions are part of larger pieces of unpublished research, according to preprint co-author and recipient of the MacArthur Foundation\u2019s Genius Grant Daniel Spielman, who is a sterling professor of computer science and a professor of statistics and data science and of mathematics at Yale University.\u00a0<\/p>\n<p dir=\"ltr\">The solutions to these research questions are encrypted until 11:59 p.m. Pacific Time on Friday, according to First Proof\u2019s website. 
Because the solutions are not publicly available, large language models cannot find the answers by searching the web.\u00a0<\/p>\n<p dir=\"ltr\">Before the solutions are posted, the team encourages the broader math community to get involved by trying to solve the problems with AI themselves, inviting participants to post their findings on social media using the hashtag #1stProof.<\/p>\n<p dir=\"ltr\">\u201cI\u2019ve looked around on social media and looked at what people are doing, and it\u2019s amazing,\u201d Srivastava said. \u201cIt\u2019s just really nice to see the interest from people who are not experts in the area \u2014 they\u2019ve come up with very interesting ways of combining different AI systems to work on the problems. \u2026 And the best part of it is that a lot of people say they\u2019re learning math in this process.\u201d<\/p>\n<p dir=\"ltr\">According to Hairer, the team even received solutions generated by internal models at OpenAI, \u201cclearly of better quality\u201d than ChatGPT is currently capable of producing.\u00a0<\/p>\n<p dir=\"ltr\">The preprint states that the AI models used, GPT-5.2 Pro and Gemini 3 Deep Think, \u201cstruggle to answer\u201d many of the questions posed in a single attempt. The researchers did not interact with the AI systems at all during the process, but speculated that doing so would \u201ccoax the systems to produce better answers.\u201d<\/p>\n<p dir=\"ltr\">Spielman and Hairer both said they were not surprised that the models were unsuccessful. However, many of the researchers said they have tried using AI in their research already and expressed optimism about the future of AI.\u00a0<\/p>\n<p dir=\"ltr\">\u201cI\u2019m really looking forward to having some AI assistants help me in my research,\u201d Spielman said. 
\u201cI have this dream of telling it, \u2018check this idea, check that idea, see if this part works.\u2019\u201d<\/p>\n<p dir=\"ltr\">According to First Proof\u2019s website, the team will not be assessing the correctness of any solutions posted by the community, since the researchers do not consider the current problem set to be a solid benchmark.\u00a0<\/p>\n<p dir=\"ltr\">However, according to Genius Grant recipient and preprint co-author Lauren Williams, Dwight Parker Robinson professor of mathematics at Harvard University, the team will create a grading scheme to determine the accuracy of solutions for future problem sets.\u00a0<\/p>\n<p dir=\"ltr\">The team plans to create a second set of questions in the coming months, according to the preprint, and is open to forming agreements with companies that would like to test experimental AI models on the forthcoming problems.\u00a0<\/p>\n<p dir=\"ltr\">The authors published the preprint Feb. 6, a day before Euler Day on Feb. 7. The date is a nod to e, or Euler\u2019s number, in mathematics, which is approximately 2.7.\u00a0<\/p>\n<p dir=\"ltr\">First Proof\u2019s website explains the project\u2019s name as a reference to a \u201cfirst proof\u201d in baking, a bulk fermentation process \u201cin which one lets the entire batch of dough ferment as one mass, before dividing and shaping it into loaves,\u201d alluding to the team\u2019s intention to allow ideas to ferment in the community in order to produce a more structured AI benchmark.<\/p>\n<p dir=\"ltr\">Keeping with the baking theme, Williams said the team\u2019s tentative plan is to release details about the next batch of problems on Pi Day, March 14.\u00a0<\/p>\n<p dir=\"ltr\">\u201cThis is, I think, a cultural shift,\u201d Tao said. \u201cI think we will start seeing \u2026 people start assembling lists of problems and just seeing the solutions trickle in from AI. 
It\u2019s a new way of doing mathematics.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"In December 2025, a group of researchers from around the world, including UC Berkeley math professor Nikhil Srivastava, gathered&hellip;\n","protected":false},"author":2,"featured_media":176535,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[34],"tags":[2269,84024,84028,84023,84029,23230,84020,84026,23231,84025,32729,84022,84027,23232,84021,84019,84015,143,145,144,3408,84030,84017,84016,493,23228,149,1092,84018,41488],"class_list":{"0":"post-176534","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-oakland","8":"tag-chatgpt","9":"tag-daniel-spielman","10":"tag-dwight-parker-robinson-professor-of-mathematics","11":"tag-epfl","12":"tag-euler-day","13":"tag-fields-medal","14":"tag-first-proof","15":"tag-gemini-3-deep-think","16":"tag-genius-grant","17":"tag-gpt-5-2-pro","18":"tag-harvard-university","19":"tag-imperial-college-london","20":"tag-lauren-williams","21":"tag-macarthur-foundation","22":"tag-martin-hairer","23":"tag-mohammed-abouzaid","24":"tag-nikhil-srivastava","25":"tag-oakland","26":"tag-oakland-headlines","27":"tag-oakland-news","28":"tag-openai","29":"tag-pi-day","30":"tag-rachel-ward","31":"tag-simons-institute-for-the-theory-of-computing","32":"tag-stanford-university","33":"tag-terence-tao","34":"tag-uc-berkeley","35":"tag-ucla","36":"tag-university-of-texas-at-austin","37":"tag-yale-university"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/us-ca\/wp-json\/wp\/v2\/posts\/176534","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/us-ca\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/us-ca\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us-ca\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href"
:"https:\/\/www.newsbeep.com\/us-ca\/wp-json\/wp\/v2\/comments?post=176534"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/us-ca\/wp-json\/wp\/v2\/posts\/176534\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us-ca\/wp-json\/wp\/v2\/media\/176535"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/us-ca\/wp-json\/wp\/v2\/media?parent=176534"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/us-ca\/wp-json\/wp\/v2\/categories?post=176534"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/us-ca\/wp-json\/wp\/v2\/tags?post=176534"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}