{"id":185989,"date":"2025-10-02T21:35:11","date_gmt":"2025-10-02T21:35:11","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/185989\/"},"modified":"2025-10-02T21:35:11","modified_gmt":"2025-10-02T21:35:11","slug":"can-todays-ai-video-models-accurately-model-how-the-real-world-works","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/185989\/","title":{"rendered":"Can today\u2019s AI video models accurately model how the real world works?"},"content":{"rendered":"<p>But on other tasks, the model showed much more variable results. When asked to generate a video highlighting a specific written character on a grid, for instance, the model failed in nine out of 12 trials. When asked to model a Bunsen burner turning on and burning a piece of paper, it similarly failed nine out of 12 times. When asked to solve a simple maze, it failed in 10 of 12 trials. When asked to sort numbers by popping labeled bubbles in order, it failed 11 out of 12 times.<\/p>\n<p>For the researchers, though, all of the above examples aren&#8217;t evidence of failure but instead a sign of the model&#8217;s capabilities. To be listed under the paper&#8217;s &#8220;failure cases,&#8221; Veo 3 had to fail a tested task across all 12 trials, which happened in 16 of the 62 tasks tested. For the rest, the researchers write that &#8220;a success rate greater than 0 suggests that the model possesses the ability to solve the task.&#8221;<\/p>\n<p>Thus, failing 11 out of 12 trials of a certain task is considered evidence for the model&#8217;s capabilities in the paper. That evidence of the model &#8220;possess[ing] the ability to solve the task&#8221; includes 18 tasks where the model failed in more than half of its 12 trial runs and another 14 where it failed in 25 to 50 percent of trials.<\/p>\n<p>Past results, future performance<\/p>\n<p>Yes, in all of these cases, the model technically demonstrates the capability being tested at some point. 
But the model&#8217;s inability to perform that task reliably means that, in practice, it won&#8217;t be performant enough for most use cases. Any future model that could become one of the &#8220;unified, generalist vision foundation models&#8221; will have to be able to succeed much more consistently on these kinds of tests.<\/p>\n","protected":false},"excerpt":{"rendered":"But on other tasks, the model showed much more variable results. When asked to generate a video highlighting&hellip;\n","protected":false},"author":2,"featured_media":185990,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[256,254,255,64,63,105],"class_list":{"0":"post-185989","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-au","12":"tag-australia","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/185989","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/comments?post=185989"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/185989\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media\/185990"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media?parent=185989"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/categories?post=185989"},{"taxonomy":"post_tag","embeddable":
true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/tags?post=185989"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}