{"id":457889,"date":"2026-02-06T16:11:15","date_gmt":"2026-02-06T16:11:15","guid":{"rendered":"https:\/\/www.newsbeep.com\/ca\/457889\/"},"modified":"2026-02-06T16:11:15","modified_gmt":"2026-02-06T16:11:15","slug":"this-is-the-most-misunderstood-graph-in-ai","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/ca\/457889\/","title":{"rendered":"This is the most misunderstood graph in AI"},"content":{"rendered":"\n<p>That was certainly the case for Claude Opus 4.5, the latest version of Anthropic\u2019s most powerful model, which was released in late November. In December, METR announced that Opus 4.5 appeared to be capable of independently completing a task that would have taken a human about five hours\u2014a vast improvement over what even the exponential trend would have predicted. One Anthropic safety researcher tweeted that he would change the direction of his research in light of those results; another employee at the company simply wrote, \u201cmom come pick me up i\u2019m scared.\u201d<\/p>\n<p>But the truth is more complicated than those dramatic responses would suggest. For one thing, METR\u2019s estimates of the abilities of specific models come with substantial error bars. As METR explicitly stated on X, Opus 4.5 might be able to regularly complete only tasks that take humans about two hours, or it might succeed on tasks that take humans as long as 20 hours. Given the uncertainties intrinsic to the method, it was impossible to know for sure.<\/p>\n<p>\u201cThere are a bunch of ways that people are reading too much into the graph,\u201d says Sydney Von Arx, a member of METR\u2019s technical staff.<\/p>\n<p>More fundamentally, the METR plot does not measure AI abilities writ large, nor does it claim to. To build the graph, METR tests models primarily on coding tasks and gauges the difficulty of each task by measuring or estimating how long it takes humans to complete it\u2014a metric that not everyone accepts. 
Claude Opus 4.5 might be able to complete certain tasks that take humans five hours, but that doesn\u2019t mean it\u2019s anywhere close to replacing a human worker.<\/p>\n<p>METR was founded to assess the risks posed by frontier AI systems. Though it is best known for the exponential trend plot, it has also worked with AI companies to evaluate their systems in greater detail and published several other independent research projects, including a <a href=\"https:\/\/metr.org\/blog\/2025-07-10-early-2025-ai-experienced-os-dev-study\/\" rel=\"nofollow noopener\" target=\"_blank\">widely covered July 2025 study<\/a> suggesting that AI coding assistants might actually be slowing software engineers down.<\/p>\n<p>But the exponential plot has made METR\u2019s reputation, and the organization appears to have a complicated relationship with that graph\u2019s often breathless reception. In January, Thomas Kwa, one of the lead authors on the paper that introduced it, <a href=\"https:\/\/metr.org\/notes\/2026-01-22-time-horizon-limitations\/\" rel=\"nofollow noopener\" target=\"_blank\">wrote a blog post<\/a> that responded to some criticisms and spelled out the graph\u2019s limitations, and METR is currently working on a more extensive FAQ document. But Kwa isn\u2019t optimistic that these efforts will meaningfully shift the discourse. \u201cI think the hype machine will basically, whatever we do, just strip out all the caveats,\u201d he says.<\/p>\n<p>Nevertheless, the METR team does think that the plot has something meaningful to say about the trajectory of AI progress. \u201cYou should absolutely not tie your life to this graph,\u201d says Von Arx. 
\u201cBut also,\u201d she adds, \u201cI bet that this trend is gonna hold.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"That was certainly the case for Claude Opus 4.5, the latest version of Anthropic\u2019s most powerful model, which&hellip;\n","protected":false},"author":2,"featured_media":457890,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[62,276,277,49,48,61],"class_list":{"0":"post-457889","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-ca","12":"tag-canada","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/posts\/457889","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/comments?post=457889"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/posts\/457889\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/media\/457890"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/media?parent=457889"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/categories?post=457889"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/tags?post=457889"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}