{"id":354049,"date":"2026-03-29T18:54:09","date_gmt":"2026-03-29T18:54:09","guid":{"rendered":"https:\/\/www.newsbeep.com\/nz\/354049\/"},"modified":"2026-03-29T18:54:09","modified_gmt":"2026-03-29T18:54:09","slug":"the-mirage-of-visual-understanding-in-current-frontier-models","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/nz\/354049\/","title":{"rendered":"The mirage of visual understanding in current frontier models"},"content":{"rendered":"<p>From a damning new Stanford paper on the illusion of visual understanding in LLMs:<\/p>\n<p>\u201cFrontier models readily generate detailed image descriptions and elaborate reasoning traces, including pathology-biased clinical findings, for images never provided; we term this phenomenon mirage reasoning. Second, without any image input, models also attain strikingly high scores across general and medical multimodal benchmarks, calling into question their utility and design. In the most extreme case, our model achieved the top rank on a standard chest X-ray question-answering benchmark without access to any images. 
&#8221;<\/p>\n<p><a target=\"_blank\" href=\"https:\/\/substackcdn.com\/image\/fetch\/$s_!YYJR!,f_auto,q_auto:good,fl_progressive:steep\/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5882bdd-5ea8-4cea-9ba5-8635c7667bd6_1980x923.png\" data-component-name=\"Image2ToDOM\" rel=\"nofollow noopener\" class=\"image-link image2 is-viewable-img can-restack\"><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/nz\/wp-content\/uploads\/2026\/03\/https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/c5882bdd-5ea8-4cea-9ba5-8635c7667bd6_1980.jpeg\" width=\"1456\" height=\"679\" data-attrs=\"{&quot;src&quot;:&quot;https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/c5882bdd-5ea8-4cea-9ba5-8635c7667bd6_1980x923.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:679,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:281169,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image\/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https:\/\/garymarcus.substack.com\/i\/192508854?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5882bdd-5ea8-4cea-9ba5-8635c7667bd6_1980x923.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" alt=\"\"   fetchpriority=\"high\" class=\"sizing-normal\"\/><\/a><\/p>\n<p>AGI this stuff ain\u2019t.  <\/p>\n<p>This study reinforces what <a href=\"https:\/\/anhnguyen.me\/\" rel=\"nofollow noopener\" target=\"_blank\">Anh Totti Nguyen<\/a> has been saying for a long time, in a series of underappreciated papers, like <a href=\"https:\/\/anhnguyen.me\/2024\/vlms-are-blind\/\" rel=\"nofollow noopener\" target=\"_blank\">Vision Language Models are Blind<\/a>, that I keep trying to draw attention to. 
<\/p>\n<p>Also, re the very active discussion on AI and jobs: although some white collar jobs (e.g., entry-level coder or market research assistant) may be in near-term jeopardy, many of those that require visual understanding (architect, cartographer, civil engineer, film editor, medical illustrator, urban planner, etc) probably aren\u2019t vulnerable until entirely new techniques are developed. <\/p>\n<p>And humanoid home robots? Don\u2019t make me laugh. If your humanoid robot can\u2019t understand the visual world, it\u2019s just a demo, and not something you can trust. <\/p>\n","protected":false},"excerpt":{"rendered":"From a damning new Stanford paper on the illusion of visual understanding in LLMs: \u201cFrontier models readily generate&hellip;\n","protected":false},"author":2,"featured_media":354050,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[365,363,364,111,139,69,145],"class_list":{"0":"post-354049","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-new-zealand","12":"tag-newzealand","13":"tag-nz","14":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/posts\/354049","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/comments?post=354049"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/posts\/354049\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/n
z\/wp-json\/wp\/v2\/media\/354050"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/media?parent=354049"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/categories?post=354049"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/tags?post=354049"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}