{"id":473772,"date":"2026-02-11T22:39:24","date_gmt":"2026-02-11T22:39:24","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/473772\/"},"modified":"2026-02-11T22:39:24","modified_gmt":"2026-02-11T22:39:24","slug":"the-case-of-the-skyscraper-and-the-slide-trombone","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/473772\/","title":{"rendered":"the case of the skyscraper and the slide trombone"},"content":{"rendered":"<p>Artificial Intelligence (AI) is now part of our everyday life. It is perceived as \u201cintelligence\u201d and yet relies fundamentally on statistics: its results are based on patterns previously learned from data. As soon as we move away from the subject matter it has learned, we\u2019re faced with the fact that there isn\u2019t much that is intelligent about it. A simple request, such as \u201cDraw me a skyscraper and a slide trombone side by side so that I can appreciate their respective sizes\u201d, will give you something like this (the image below was generated by Gemini):<\/p>\n<p>            <a href=\"https:\/\/images.theconversation.com\/files\/711003\/original\/file-20260106-66-6whzc.png?ixlib=rb-4.1.0&amp;q=45&amp;auto=format&amp;w=1000&amp;fit=clip\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" alt=\"In the AI-generated image, we can see that the skyscraper and the slide trombone are almost the same size\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2026\/02\/file-20260106-66-6whzc.png\" class=\"native-lazy\" loading=\"lazy\"  \/><\/a><\/p>\n<p>              AI-generated images, in response to the prompt: \u2018Draw me a skyscraper and a slide trombone side by side so that I can appreciate their respective sizes\u2019 (left by ChatGPT, right by Gemini).<\/p>\n<p>This example was generated by Google\u2019s model, Gemini. Generative AI as a whole only dates back to the public launch of ChatGPT in November\u00a02022 \u2013 the technology is barely three years old. 
The technology has changed the world, and its rate of adoption is unprecedented. Currently, <a href=\"https:\/\/techcrunch.com\/2025\/10\/06\/sam-altman-says-chatgpt-has-hit-800m-weekly-active-users\/\" rel=\"nofollow noopener\" target=\"_blank\">800\u00a0million users<\/a> rely on ChatGPT every week to complete various tasks, according to OpenAI. Tellingly, the <a href=\"https:\/\/futurism.com\/openai-use-cheating-homework\" rel=\"nofollow noopener\" target=\"_blank\">number of requests<\/a> drops sharply during school holidays. Even though precise figures are hard to come by, this shows how widespread AI usage has become: around <a href=\"https:\/\/www.bestcolleges.com\/research\/most-college-students-have-used-ai-survey\/\" rel=\"nofollow noopener\" target=\"_blank\">one in two students<\/a> regularly uses AI.<\/p>\n<p>AI: essential technology or a gimmick?<\/p>\n<p>Three years is both a long time and a short one. It\u2019s long in a field where the technology is constantly changing, and short when it comes to social impacts. While we\u2019re only just starting to understand how to use AI, its place in society has yet to be defined \u2013 just as AI\u2019s image in popular culture has yet to be established. We\u2019re still wavering between extreme positions: either AI is going to outsmart human beings or, on the contrary, it\u2019s merely a useless piece of shiny technology.<\/p>\n<p>Indeed, a new call to pause AI-related research has been issued amid fears of a superintelligent AI. 
Others promise the earth: a recent piece even called on younger generations to <a href=\"https:\/\/www.lepoint.fr\/debats\/face-a-la-montee-de-l-ia-ce-livre-vous-invite-a-stopper-vos-etudes-16-10-2025-2601113_2.php\" rel=\"nofollow noopener\" target=\"_blank\">drop higher education altogether<\/a>, on the grounds that AI will render university degrees worthless.<\/p>\n<p>AI\u2019s learning limitations amount to a lack of common sense<\/p>\n<p>Ever since generative AI became available, I have been running an experiment: I ask it to draw two very different objects together, then examine the result. The goal of these prompts is to see how the model behaves once it leaves the domain it has learned. Typically, the prompt is something like \u2018Draw me a banana and an aircraft carrier side by side so that we can see the difference in size between the two objects\u2019. With Mistral, this prompt gives the following result:<\/p>\n<p>            <img decoding=\"async\" alt=\"\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2026\/02\/file-20260107-56-tisu6e.jpg\" class=\"native-lazy\" loading=\"lazy\"  \/><\/p>\n<p>              Screenshot of a prompt and the image generated by Mistral AI.<br \/>\n              Author provided<\/p>\n<p>I have yet to find a model that produces a result that makes sense. The illustration at the start of this article perfectly captures how this type of AI works, and where its limits lie. The fact that we are dealing with an image makes those limits more tangible than if it were to generate a long text.<\/p>\n<p>What is striking is the result\u2019s sheer lack of credibility: even a 5-year-old child could tell that it\u2019s nonsense. It\u2019s all the more striking given that it\u2019s possible to have long, complex conversations with these same AIs without ever feeling that you\u2019re dealing with a stupid machine. 
Incidentally, such AIs can pass the bar examination or interpret medical results (for example, identifying tumours on a scan) with greater precision than professionals.<\/p>\n<p>Where does the mistake lie?<\/p>\n<p>The first thing to note is that it\u2019s tricky to know exactly what we\u2019re dealing with. Although the theoretical components of these AIs are well known, a system such as Gemini \u2013 much like ChatGPT, Grok, Mistral, Claude, etc. \u2013 is a lot more complicated than a simple large language model (LLM) coupled with a diffusion model.<\/p>\n<p>LLMs are AIs that have been trained on enormous amounts of text and build a statistical representation of it. In short, the machine is trained to guess the word that makes the most sense, from a statistical viewpoint, as a continuation of other words (your prompt).<\/p>\n<p>Diffusion models, which are used to generate images, work according to a different process. Diffusion is based on notions from thermodynamics: you take an image (or an audio track) and add random noise (like snow on a screen) until the image disappears. You then teach a neural network to reverse that process, by presenting the images in the opposite order to the noise addition. This randomness explains why the same prompt generates different images.<\/p>\n<p>Another point to consider is that these models are constantly evolving, which explains why the same prompt will not produce the same results from one day to the next. Changes may also be made manually in specific cases, in order to respond to user feedback, for example.<\/p>\n<p>As a physicist, I will therefore simplify the problem and assume that we\u2019re dealing with a diffusion model. These models are trained on image-text pairs. 
It is therefore safe to assume that the Gemini and Mistral models have been trained on dozens (or possibly hundreds) of thousands of images of skyscrapers (or aircraft carriers) on the one hand, and on a large number of images of slide trombones on the other \u2013 typically, close-ups of slide trombones. It is very unlikely that these two objects appear together in the training material. Hence, the model doesn\u2019t have a clue about their relative dimensions.<\/p>\n<p>Models lack \u2018understanding\u2019<\/p>\n<p>Such examples show that models have no internal representation or understanding of the world. The phrase \u2018to compare their sizes\u2019 makes this plain: the machine has no understanding of what is written. Models have no internal representation of what \u201ccompare\u201d means beyond the texts in which the term has been used. Thus, any comparison between concepts that do not appear together in the training material will produce the same kind of result as the illustrations above \u2013 less visible, but just as absurd. Take, for example, <a href=\"https:\/\/www.arxiv.org\/abs\/2508.01191\" rel=\"nofollow noopener\" target=\"_blank\">this interaction with Gemini<\/a>: \u2018Consider this simple question: \u201cWas the day the United States was established a leap year or a normal year?&#8221;\u2018<\/p>\n<p>When prompted with a CoT prefix (chain of thought, a recent development in LLMs whose purpose is to break down a complex question into a series of simpler sub-questions), Gemini responded: &#8220;The United States was established in 1776. 1776 is divisible by 4, but it is not a century year (100\u00a0years), so it is a leap year. Therefore, the day the United States was established was in a normal year. 
\u201d<\/p>\n<p>The model applies the leap-year rule correctly \u2013 a good illustration of the CoT technique \u2013 but then contradicts its own reasoning in the final step: having just established that 1776 is a leap year, it concludes that the year was a normal one. These models have no logical representation of the world, only a statistical one, and that approach constantly produces glitches of this kind, results that seem \u2018off the mark\u2019.<\/p>\n<p>This realisation matters all the more given that today, <a href=\"https:\/\/graphite.io\/five-percent\/more-articles-are-now-created-by-ai-than-humans\" rel=\"nofollow noopener\" target=\"_blank\">AI writes almost as many of the articles published<\/a> on the Internet as humans do. So don\u2019t be surprised if some of the articles you read leave you with that same \u2018off the mark\u2019 feeling.<\/p>\n","protected":false},"excerpt":{"rendered":"Artificial Intelligence (AI) is now part of our everyday life. 
It is perceived as \u201cintelligence\u201d and yet relies&hellip;\n","protected":false},"author":2,"featured_media":473773,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[256,254,255,64,63,105],"class_list":{"0":"post-473772","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-au","12":"tag-australia","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/473772","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/comments?post=473772"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/473772\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media\/473773"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media?parent=473772"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/categories?post=473772"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/tags?post=473772"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}