{"id":355031,"date":"2026-03-30T10:35:13","date_gmt":"2026-03-30T10:35:13","guid":{"rendered":"https:\/\/www.newsbeep.com\/nz\/355031\/"},"modified":"2026-03-30T10:35:13","modified_gmt":"2026-03-30T10:35:13","slug":"ai-is-a-year-away-from-knowing-more-than-all-human-experts-those-startled-experts-predict","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/nz\/355031\/","title":{"rendered":"AI &#8216;is a year away from knowing more than all human experts&#8217;, those startled experts predict"},"content":{"rendered":"<p class=\"mol-para-with-font\"><a style=\"font-weight: bold;\" target=\"_self\" href=\"https:\/\/www.dailymail.co.uk\/sciencetech\/ai\/index.html\" id=\"mol-a83f87c0-2bca-11f1-a837-47ea099b8e69\" class=\"class\" rel=\"nofollow noopener\">AI<\/a>\u00a0will be ready to score full marks on one of the world&#8217;s most challenging knowledge tests branded Humanity&#8217;s Last Exam (HLE) in a matter of months, developers claim.<\/p>\n<p class=\"mol-para-with-font\">HLE was set up by tech bosses to see just how intelligent their systems are and consists of 2,500 meticulously chosen questions, spanning around a hundred topics from rocket science and mythology to physiology.<\/p>\n<p class=\"mol-para-with-font\">Each one requires at least PhD levels of understanding and to achieve a score even close to 100 per cent would earn someone the title of a &#8216;universal expert&#8217;.<\/p>\n<p class=\"mol-para-with-font\">Just two years ago, the much-vaunted <a style=\"font-weight: bold;\" target=\"_self\" href=\"https:\/\/www.dailymail.co.uk\/sciencetech\/chatgpt\/index.html\" id=\"mol-4251dce0-2bcc-11f1-a837-47ea099b8e69\" rel=\"nofollow noopener\">ChatGPT<\/a> system from <a style=\"font-weight: bold;\" target=\"_self\" href=\"https:\/\/www.dailymail.co.uk\/sciencetech\/openai\/index.html\" id=\"mol-425ba0e0-2bcc-11f1-a837-47ea099b8e69\" rel=\"nofollow noopener\">OpenAI<\/a> scored a measly 3 per cent on the exam with its rivals at <a 
style=\"font-weight: bold;\" target=\"_self\" href=\"https:\/\/www.dailymail.co.uk\/sciencetech\/google\/index.html\" id=\"mol-42544de0-2bcc-11f1-a837-47ea099b8e69\" rel=\"nofollow noopener\">Google<\/a>\u00a0and Anthropic not doing much better.<\/p>\n<p class=\"mol-para-with-font\">The test served to assuage fears over the growing dominance of AI, with researchers claiming it proved &#8216;a marked gap&#8217; remained between\u00a0large language models (LLMs) and the world&#8217;s finest academics.<\/p>\n<p class=\"mol-para-with-font\">But the seemingly impossible HLE may prove to be just another milestone in AI&#8217;s unstoppable rise.\u00a0<\/p>\n<p class=\"mol-para-with-font\">Google Gemini scored an impressive\u00a045.9 per cent on the exam last month, having already soared to 18.8 per cent within months of its first attempt.<\/p>\n<p class=\"mol-para-with-font\">And full marks are on the horizon, according to Calvin Zhang, the research lead at Scale, the AI company behind HLE.<\/p>\n<p>   <img decoding=\"async\" id=\"i-2724f29e7c7e88a0\" src=\"https:\/\/www.newsbeep.com\/nz\/wp-content\/uploads\/2026\/03\/107515515-15689907-image-a-18_1774831081304.jpg\" height=\"423\" width=\"634\" alt=\"AI will be ready to score full marks on one of the world's most challenging knowledge tests branded Humanity's Last Exam (HLE) in a matter of months, developers claim (Stock Photo)\" class=\"blkBorder img-share\" style=\"max-width:100%\" loading=\"lazy\" \/>   <\/p>\n<p class=\"imageCaption\">AI will be ready to score full marks on one of the world&#8217;s most challenging knowledge tests branded Humanity&#8217;s Last Exam (HLE) in a matter of months, developers claim (Stock Photo)<\/p>\n<p class=\"mol-para-with-font\">&#8216;We wanted to create this close-ended academic benchmark, set to the frontier of expert humans, that only a handful of people on earth can really solve,&#8217; he said.<\/p>\n<p class=\"mol-para-with-font\">&#8216;We&#8217;ve seen over the past few 
years insane progress on these language models. It&#8217;s impressive, model builders have really done a great job at improving these reasoning models.&#8217;<\/p>\n<p class=\"mol-para-with-font\">Kate Olszewska, a product manager at Google DeepMind, added: &#8216;If we truly cared about this as the only thing in life, I think we could get to it pretty quickly.&#8217;\u00a0<\/p>\n<p class=\"mol-para-with-font\">Anthropic &#8211; the company behind the Claude AI system &#8211; has achieved a score of 34.2 per cent in HLE and is improving its marks at a rapid pace.<\/p>\n<p class=\"mol-para-with-font\">AI returning a score of 100 per cent in the exam would be a significant development given the test is &#8216;designed to be the final closed-ended academic benchmark of its kind&#8217;, according to its authors.<\/p>\n<p class=\"mol-para-with-font\">It means that if the technology cracks the HLE, it will in future need to be tested on questions to which no human knows the answer.<\/p>\n<p class=\"mol-para-with-font\">The test was created by researchers at Scale and the Center for AI Safety, a non-profit organisation, to examine both the AI&#8217;s breadth of knowledge and its depth of reasoning.<\/p>\n<p class=\"mol-para-with-font\">Experts from roughly 50 countries submitted 70,000 questions for consideration in response to a global appeal in September 2024 which offered a\u00a0$500,000 prize pot.<\/p>\n<p class=\"mol-para-with-font\">Each question had to require a short, unambiguous answer and be difficult to find on the internet.<\/p>\n<p class=\"mol-para-with-font\">The list was whittled down to 13,000 after questions which any existing model could answer were removed from consideration.<\/p>\n<p class=\"mol-para-with-font\">Some of the 2,500 that were chosen have since been removed or edited following feedback from users.\u00a0<\/p>\n<p class=\"mol-para-with-font\">They require a wide range of expertise &#8211; from knowledge of biology to proficiency in languages &#8211; and a 
large number of them have remained secret in a bid to stop systems\u00a0benefiting from answers being publicly discussed online.<\/p>\n<p class=\"mol-para-with-font\">Success in HLE would evoke memories of IBM&#8217;s supercomputer Deep Blue defeating world chess champion Garry Kasparov in a match in 1997, confounding most experts&#8217; predictions.<\/p>\n<p class=\"mol-para-with-font\">Since then, a string of major AI benchmarks have been cleared, including the multi-disciplinary\u00a0Massive Multitask Language Understanding benchmark, released in 2020, which was canned after systems began finding it too easy, often scoring above 90 per cent.<\/p>\n<p class=\"mol-para-with-font\">As AI approaches the stage where it can master human-made tests, expanding beyond the existing limits of human knowledge has increasingly become the main focus of developers, Ms\u00a0Olszewska added.<\/p>\n<p class=\"mol-para-with-font\">But there will always be room for human specialism, according to Zhang, with physical fields such as surgery, as well as decision-based skills including judgment and creativity, harder for AI to master.\u00a0<\/p>\n","protected":false},"excerpt":{"rendered":"AI\u00a0will be ready to score full marks on one of the world&#8217;s most challenging knowledge tests branded 
Humanity&#8217;s&hellip;\n","protected":false},"author":2,"featured_media":355032,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[365,363,364,557,111,139,69,2419,145],"class_list":{"0":"post-355031","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-dailymail","12":"tag-new-zealand","13":"tag-newzealand","14":"tag-nz","15":"tag-sciencetech","16":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/posts\/355031","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/comments?post=355031"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/posts\/355031\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/media\/355032"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/media?parent=355031"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/categories?post=355031"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/tags?post=355031"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}