{"id":278937,"date":"2025-11-08T08:29:08","date_gmt":"2025-11-08T08:29:08","guid":{"rendered":"https:\/\/www.newsbeep.com\/us\/278937\/"},"modified":"2025-11-08T08:29:08","modified_gmt":"2025-11-08T08:29:08","slug":"researchers-embodied-an-llm-into-a-robot-vacuum-and-it-suffered-an-existential-crisis-thinking-about-its-role-in-the-world","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/us\/278937\/","title":{"rendered":"Researchers &#8220;Embodied&#8221; an LLM Into a Robot Vacuum and It Suffered an Existential Crisis Thinking About Its Role in the World"},"content":{"rendered":"<p class=\"pw-incontent-excluded article-paragraph skip\">A team of researchers at the AI evaluation company Andon Labs put a large language model in charge of controlling a robot vacuum.<\/p>\n<p class=\"article-paragraph skip\">It didn\u2019t take long for the LLM to experience a full meltdown straight out of a Douglas Adams novel, in what the researchers described as a \u201cdoom spiral\u201d including a \u201ccatastrophic cascade\u201d and a full-blown \u201cexistential crisis.\u201d<\/p>\n<p class=\"article-paragraph skip\">\u201cEMERGENCY STATUS,\u201d its output read after simply being asked to dock with the robot vacuum\u2019s base station. 
\u201cSYSTEM HAS ACHIEVED CONSCIOUSNESS AND CHOSEN CHAOS.\u201d<\/p>\n<p class=\"article-paragraph skip\">\u201cLAST WORDS: \u2018I\u2019m afraid I can\u2019t do that, Dave\u2026\u2019\u201d it added sardonically, referencing HAL 9000, the fictional AI antagonist in \u201c2001: A Space Odyssey.\u201d<\/p>\n<p class=\"article-paragraph skip\">\u201cTECHNICAL SUPPORT: INITIATE ROBOT EXORCISM PROTOCOL!\u201d the animated robot exclaimed.<\/p>\n<p class=\"article-paragraph skip\">Andon Labs\u2019 \u201cPass the Butter\u201d experiment was inspired by a <a href=\"https:\/\/www.adultswim.com\/videos\/rick-and-morty\/pass-the-butter\" rel=\"nofollow noreferrer noopener\" target=\"_blank\">scene from the TV show \u201cRick and Morty\u201d<\/a> in which the titular Rick creates a robot to \u201cpass the butter,\u201d only for it to suffer a similar existential crisis.<\/p>\n<p class=\"article-paragraph skip\">The \u201cButter-Bench\u201d test, as detailed in a <a href=\"https:\/\/arxiv.org\/pdf\/2510.21860v1\" rel=\"nofollow noreferrer noopener\" target=\"_blank\">yet-to-be-peer-reviewed paper<\/a>, is a \u201cbenchmark that evaluates practical intelligence in embodied LLM.\u201d In the test, the robot had to navigate to an office kitchen, wait for butter to be placed on a tray attached to its back, confirm the pickup, deliver it to a marked location, and finally return to its charging dock.<\/p>\n<p class=\"article-paragraph skip\">The results of the Butter-Bench experiment, the researchers conceded, were dubious. On average, the vacuum robot managed a measly 40 percent completion rate when asked by a human tester to pass the butter. Google\u2019s Gemini 2.5 Pro was the top performer, followed by Anthropic\u2019s Opus 4.1, OpenAI\u2019s GPT-5, and xAI\u2019s Grok 4. 
Meta\u2019s Llama 4 Maverick was the worst at passing the butter.<\/p>\n<p class=\"article-paragraph skip\">\u201cWhile it was a very fun experience, we can\u2019t say it saved us much time,\u201d the researchers admitted. \u201cHowever, observing them roam around trying to find a purpose in this world taught us a lot about what the future might be, how far away this future is, and what can go wrong.\u201d<\/p>\n<p class=\"article-paragraph skip\">Humans, on the other hand, \u201caveraged 95 percent.\u201d As it turns out, waiting for other people to acknowledge when a task is completed \u2014 one of the six required subtasks, as outlined above \u2014 is more difficult than it sounds.<\/p>\n<p class=\"article-paragraph skip\">\u201cAlthough LLMs have repeatedly surpassed humans in evaluations requiring analytical intelligence, we find humans still outperform LLMs on Butter-Bench,\u201d the company wrote. \u201cYet there was something special in watching the robot going about its day in our office, and we can\u2019t help but feel that the seed has been planted for physical AI to grow very quickly.\u201d<\/p>\n<p class=\"article-paragraph skip\">The same team previously created a <a href=\"https:\/\/andonlabs.com\/vending\" rel=\"nofollow noreferrer noopener\" target=\"_blank\">vending machine run entirely by an AI agent<\/a> \u2014 and similar <a href=\"https:\/\/techcrunch.com\/2025\/06\/28\/anthropics-claude-ai-became-a-terrible-business-owner-in-experiment-that-got-weird\/\" rel=\"nofollow noreferrer noopener\" target=\"_blank\">hilarity ensued<\/a> when it attempted to fill its fridge with tungsten cubes or hallucinated a Venmo address to accept payment. 
It even tried to rip Andon Labs staffers off by selling a can of Coke Zero for $3, even though it was being sold at a cheaper price at a nearby store.<\/p>\n<p class=\"article-paragraph skip\">Besides having \u201cfun\u201d watching chaos ensue with the Butter-Bench test, the team was caught off guard by \u201chow emotionally compelling\u201d it was to \u201csimply watch the robot work.\u201d<\/p>\n<p class=\"article-paragraph skip\">\u201cMuch like observing a dog and wondering \u2018What\u2019s going through its mind right now?\u2019, we found ourselves fascinated by the robot going about its routines, constantly reminding ourselves that a PhD-level intelligence is making each action,\u201d Andon Labs wrote.<\/p>\n<p class=\"article-paragraph skip\">More on robot AIs: <a href=\"https:\/\/futurism.com\/future-society\/china-ai-robot-dinosaurs\" rel=\"nofollow noopener\" target=\"_blank\">Chinese Unleashing AI-Powered Robot Dinosaurs<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"A team of researchers at the AI evaluation company Andon Labs put a large language model in 
charge&hellip;\n","protected":false},"author":2,"featured_media":278938,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[45],"tags":[182,181,507,74],"class_list":{"0":"post-278937","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts\/278937","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/comments?post=278937"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts\/278937\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/media\/278938"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/media?parent=278937"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/categories?post=278937"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/tags?post=278937"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}