{"id":248180,"date":"2026-01-23T17:35:09","date_gmt":"2026-01-23T17:35:09","guid":{"rendered":"https:\/\/www.newsbeep.com\/nz\/248180\/"},"modified":"2026-01-23T17:35:09","modified_gmt":"2026-01-23T17:35:09","slug":"the-math-on-ai-agents-doesnt-add-up","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/nz\/248180\/","title":{"rendered":"The Math on AI Agents Doesn\u2019t Add Up"},"content":{"rendered":"<p>The big AI companies <a data-offer-url=\"https:\/\/www.barrons.com\/articles\/nvidia-stock-ceo-ai-agents-8c20ddfb?gaa_at=eafs&amp;gaa_n=AWEtsqfDizKiGn8xzzlKA9b7qXpuZ-R-D3fgS8o32Cyygdc11DXX00P-7Y7F7mAvq5w=&amp;gaa_ts=697269b4&amp;gaa_sig=4v8F8X3wJzHWnWkT_eyqpM1CRwRLgj41yIV411GwFiW7CoKo2UZlQTSDpZ190wxMRWaT18o1TIUxXEt6qgHzCQ==\" class=\"external-link\" data-event-click=\"{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https:\/\/www.barrons.com\/articles\/nvidia-stock-ceo-ai-agents-8c20ddfb?gaa_at=eafs&amp;gaa_n=AWEtsqfDizKiGn8xzzlKA9b7qXpuZ-R-D3fgS8o32Cyygdc11DXX00P-7Y7F7mAvq5w=&amp;gaa_ts=697269b4&amp;gaa_sig=4v8F8X3wJzHWnWkT_eyqpM1CRwRLgj41yIV411GwFiW7CoKo2UZlQTSDpZ190wxMRWaT18o1TIUxXEt6qgHzCQ==&quot;}\" href=\"https:\/\/www.barrons.com\/articles\/nvidia-stock-ceo-ai-agents-8c20ddfb?gaa_at=eafs&amp;gaa_n=AWEtsqfDizKiGn8xzzlKA9b7qXpuZ-R-D3fgS8o32Cyygdc11DXX00P-7Y7F7mAvq5w=&amp;gaa_ts=697269b4&amp;gaa_sig=4v8F8X3wJzHWnWkT_eyqpM1CRwRLgj41yIV411GwFiW7CoKo2UZlQTSDpZ190wxMRWaT18o1TIUxXEt6qgHzCQ==\" rel=\"nofollow noopener\" target=\"_blank\">promised us<\/a> that 2025 would be \u201cthe year of the AI agents.\u201d It turned out to be the year of talking about AI agents, and kicking the can for that transformational moment to 2026 or maybe later. 
But what if the answer to the question \u201cWhen will our lives be fully automated by generative AI robots that perform our tasks for us and basically run the world?\u201d is, like that <a href=\"https:\/\/www.newyorker.com\/cartoons\/bob-mankoff\/the-story-of-how-about-never\" rel=\"nofollow noopener\" target=\"_blank\">New Yorker cartoon<\/a>, \u201cHow about never?\u201d<\/p>\n<p class=\"paywall\">That was basically the message of a paper published without much fanfare some months ago, smack in the middle of the overhyped year of \u201cagentic AI.\u201d Entitled \u201c<a data-offer-url=\"https:\/\/arxiv.org\/pdf\/2507.07505\" class=\"external-link\" data-event-click=\"{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https:\/\/arxiv.org\/pdf\/2507.07505&quot;}\" href=\"https:\/\/arxiv.org\/pdf\/2507.07505\" rel=\"nofollow noopener\" target=\"_blank\">Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models,\u201d<\/a> it purports to mathematically show that \u201cLLMs are incapable of carrying out computational and agentic tasks beyond a certain complexity.\u201d Though the science is beyond me, the authors\u2014a former SAP CTO who studied AI under one of the field\u2019s founding intellects, John McCarthy, and his teenage prodigy son\u2014punctured the vision of agentic paradise with the certainty of mathematics. Even reasoning models that go beyond the pure word-prediction process of LLMs, they say, won\u2019t fix the problem.<\/p>\n<p class=\"paywall\">\u201cThere is no way they can be reliable,\u201d Vishal Sikka, the dad, tells me. 
After a career that, in addition to SAP, included stints as Infosys CEO and as an Oracle board member, he currently heads an AI services startup called <a data-offer-url=\"https:\/\/www.vian.ai\/\" class=\"external-link\" data-event-click=\"{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https:\/\/www.vian.ai\/&quot;}\" href=\"https:\/\/www.vian.ai\/\" rel=\"nofollow noopener\" target=\"_blank\">Vianai<\/a>. \u201cSo we should forget about AI agents running nuclear power plants?\u201d I ask. \u201cExactly,\u201d he says. Maybe you can get it to file some papers or something to save time, but you might have to resign yourself to some mistakes.<\/p>\n<p class=\"paywall\">The AI industry begs to differ. For one thing, a big success in agentic AI has been coding, which took off last year. Just this week at Davos, Google\u2019s Nobel-winning head of AI, Demis Hassabis, <a data-offer-url=\"https:\/\/fortune.com\/2026\/01\/21\/ceos-davos-buy-into-the-agentic-ai-hype\/\" class=\"external-link\" data-event-click=\"{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https:\/\/fortune.com\/2026\/01\/21\/ceos-davos-buy-into-the-agentic-ai-hype\/&quot;}\" href=\"https:\/\/fortune.com\/2026\/01\/21\/ceos-davos-buy-into-the-agentic-ai-hype\/\" rel=\"nofollow noopener\" target=\"_blank\">reported breakthroughs<\/a> in minimizing hallucinations, and hyperscalers and startups alike are pushing the agent narrative. Now they have some backup. 
A startup called <a data-offer-url=\"https:\/\/harmonic.fun\/about\" class=\"external-link\" data-event-click=\"{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https:\/\/harmonic.fun\/about&quot;}\" href=\"https:\/\/harmonic.fun\/about\" rel=\"nofollow noopener\" target=\"_blank\">Harmonic<\/a> is reporting a breakthrough in AI coding that also hinges on mathematics\u2014and tops benchmarks on reliability.<\/p>\n<p class=\"paywall\">Harmonic, which was cofounded by Robinhood CEO Vlad Tenev and Tudor Achim, a Stanford-trained mathematician, claims this recent improvement to its product called Aristotle (no hubris there!) is an indication that there are ways to guarantee the trustworthiness of AI systems. \u201cAre we doomed to be in a world where AI just generates slop and humans can&#8217;t really check it? That would be a crazy world,\u201d says Achim. Harmonic\u2019s solution is to use formal methods of mathematical reasoning to verify an LLM\u2019s output. Specifically, it encodes outputs in the Lean programming language, which is known for its ability to verify code. To be sure, Harmonic\u2019s focus to date has been narrow\u2014its key mission is the pursuit of \u201cmathematical superintelligence,\u201d and coding is a somewhat organic extension. Things like history essays\u2014which can\u2019t be mathematically verified\u2014are beyond its boundaries. For now.<\/p>\n<p class=\"paywall\">Nonetheless, Achim doesn\u2019t seem to think that reliable agentic behavior is as much an issue as some critics believe. \u201cI would say that most models at this point have the level of pure intelligence required to reason through booking a travel itinerary,\u201d he says.<\/p>\n<p class=\"paywall\">Both sides are right\u2014or maybe even on the same side. On one hand, everyone agrees that hallucinations will continue to be a vexing reality. 
In <a data-offer-url=\"https:\/\/arxiv.org\/pdf\/2509.04664\" class=\"external-link\" data-event-click=\"{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https:\/\/arxiv.org\/pdf\/2509.04664&quot;}\" href=\"https:\/\/arxiv.org\/pdf\/2509.04664\" rel=\"nofollow noopener\" target=\"_blank\">a paper published last September,<\/a> OpenAI scientists wrote, \u201cDespite significant progress, hallucinations continue to plague the field, and are still present in the latest models.\u201d They proved that unhappy claim by asking three models, including ChatGPT, to provide the title of the lead author\u2019s dissertation. All three made up fake titles and all misreported the year of publication. In a blog about the paper, OpenAI glumly stated that in AI models, \u201caccuracy will never reach 100 percent.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"The big AI companies promised us that 2025 would be \u201cthe year of the AI agents.\u201d It turned&hellip;\n","protected":false},"author":2,"featured_media":248181,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[365,363,364,19962,22781,5289,111,139,69,1518,7718,145],"class_list":{"0":"post-248180","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-backchannel-nl","12":"tag-math","13":"tag-models","14":"tag-new-zealand","15":"tag-newzealand","16":"tag-nz","17":"tag-research","18":"tag-silicon-valley","19":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/posts\/248180","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"
href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/comments?post=248180"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/posts\/248180\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/media\/248181"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/media?parent=248180"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/categories?post=248180"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/tags?post=248180"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}