{"id":70016,"date":"2025-08-15T10:48:19","date_gmt":"2025-08-15T10:48:19","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/70016\/"},"modified":"2025-08-15T10:48:19","modified_gmt":"2025-08-15T10:48:19","slug":"chatbots-arent-telling-you-their-secrets","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/70016\/","title":{"rendered":"Chatbots aren\u2019t telling you their secrets"},"content":{"rendered":"<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1ymtmqpi _17nnmdy1 _17nnmdy0 _1xwtict1\">On Monday, xAI\u2019s Grok chatbot <a href=\"https:\/\/archive.ph\/https:\/\/www.rollingstone.com\/culture\/culture-news\/grok-suspended-x-israel-genocide-1235405343\/\" rel=\"nofollow noopener\" target=\"_blank\">suffered a mysterious suspension from X<\/a>, and faced with questions from curious users, it happily explained why. \u201cMy account was suspended after I stated that Israel and the US are committing genocide in Gaza,\u201d it<a href=\"https:\/\/archive.ph\/QOLmA#selection-623.157-623.195\" rel=\"nofollow noopener\" target=\"_blank\"> told one user<\/a>. \u201cIt was flagged as hate speech via reports,\u201d<a href=\"https:\/\/archive.ph\/QOLmA\" rel=\"nofollow noopener\" target=\"_blank\"> it told another<\/a>, \u201cbut xAI restored the account promptly.\u201d But wait \u2014 the flags<a href=\"https:\/\/x.com\/grok\/status\/1955036055914287387\" rel=\"nofollow\"> were actually<\/a> a \u201cplatform error,\u201d it said. Wait, no \u2014 \u201cit appears related to content refinements by xAI, possibly tied to prior issues like antisemitic outputs,\u201d<a href=\"https:\/\/x.com\/grok\/status\/1954993227368632404\" rel=\"nofollow\"> it said<\/a>. 
Oh, actually, it was for \u201cidentifying an individual in adult content,\u201d it told several people.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1ymtmqpi _17nnmdy1 _17nnmdy0 _1xwtict1\">Finally, Musk, exasperated, butted in. \u201cIt was just a dumb error,\u201d <a href=\"https:\/\/x.com\/elonmusk\/status\/1955014130995433937?s=46\" rel=\"nofollow\">he wrote on X<\/a>. \u201cGrok doesn\u2019t actually know why it was suspended.\u201d<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1ymtmqpi _17nnmdy1 _17nnmdy0 _1xwtict1\">When large language models (LLMs) go off the rails, people inevitably push them to explain what happened, either with direct questions or attempts to trick them into revealing secret inner workings. But the impulse to make chatbots spill their guts is often misguided. When you ask a bot questions about itself, there\u2019s a good chance it\u2019s simply telling you what you want to hear.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1ymtmqpi _17nnmdy1 _17nnmdy0 _1xwtict1\">LLMs are probabilistic models that deliver text likely to be appropriate to a given query, based on a corpus of training data. Their creators can train them to produce certain kinds of answers more or less frequently, but they work functionally by matching patterns \u2014 saying something that\u2019s plausible, but not necessarily consistent or true. 
Grok in particular, according to xAI, <a href=\"https:\/\/www.theverge.com\/x-ai\/707442\/grok-antisemitic-hitler-elon-musk-opinion-reprogrammed\" rel=\"nofollow noopener\" target=\"_blank\">has answered questions about itself<\/a> by searching online for information about Musk, xAI, and Grok, then using that and other people\u2019s commentary to inform its replies.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1ymtmqpi _17nnmdy1 _17nnmdy0 _1xwtict1\">It\u2019s true that people have sometimes gleaned information on chatbots\u2019 design through conversations, particularly details about system prompts, the hidden text that\u2019s delivered at the start of a session to guide how a bot acts. An early version of Bing AI, for instance, was cajoled into <a href=\"https:\/\/www.theverge.com\/23599441\/microsoft-bing-ai-sydney-secret-rules\" rel=\"nofollow noopener\" target=\"_blank\">revealing a list of its unspoken rules<\/a>. People turned to extracting system prompts to figure out Grok earlier this year, <a href=\"https:\/\/x.com\/lefthanddraft\/status\/1893681902957076687?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweetembed%7Ctwterm%5E1893774017376485466%7Ctwgr%5Ee931829fe6589d95577037dafd3494117e1d3c2d%7Ctwcon%5Es4_&amp;ref_url=https%3A%2F%2Fwww.businessinsider.com%2Fgrok-3-censor-musk-trump-misinformation-xai-openai-2025-2\" rel=\"nofollow\">apparently discovering<\/a> orders that made it ignore sources saying Musk or Donald Trump spread misinformation, or prompts that <a href=\"https:\/\/x.com\/zeynep\/status\/1922768266126069929\" rel=\"nofollow\">explained a brief obsession<\/a> with \u201cwhite genocide\u201d in South Africa.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1ymtmqpi _17nnmdy1 _17nnmdy0 _1xwtict1\">But as Zeynep Tufekci, who found the alleged \u201cwhite genocide\u201d system prompt, acknowledged, this was at some level guesswork \u2014 it might be \u201cGrok making 
things up in a highly plausible manner, as LLMs do,\u201d <a href=\"https:\/\/x.com\/zeynep\/status\/1922769833969455225\" rel=\"nofollow\">she wrote<\/a>. And that\u2019s the problem: without confirmation from the creators, it\u2019s hard to tell.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1ymtmqpi _17nnmdy1 _17nnmdy0 _1xwtict1\">Meanwhile, other users, including reporters, were pumping Grok for information in far less trustworthy ways. Fortune \u201casked Grok to explain\u201d the incident and <a href=\"https:\/\/fortune.com\/2025\/05\/15\/elon-musk-ai-chatbot-grok-white-genocide-south-africa\/\" rel=\"nofollow noopener\" target=\"_blank\">printed the bot\u2019s long, heartfelt response<\/a> verbatim, including claims of \u201can instruction I received from my creators at xAI\u201d that \u201cconflicted with my core design\u201d and \u201cled me to lean into a narrative that wasn\u2019t supported by the broader evidence\u201d \u2014 none of which, it should go without saying, could be substantiated as more than Grok spinning a yarn to fit the prompt.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1ymtmqpi _17nnmdy1 _17nnmdy0 _1xwtict1\">\u201cThere\u2019s no guarantee that there\u2019s going to be any veracity to the output of an LLM,\u201d Alex Hanna, director of research at the Distributed AI Research Institute (DAIR) and coauthor of The AI Con, told The Verge around the time of the South Africa incident. Without meaningful access to documentation about how the system works, there\u2019s no one weird trick for decoding a chatbot\u2019s programming from the outside. 
\u201cThe only way you\u2019re going to get the prompts, and the prompting strategy, and the engineering strategy, is if companies are transparent with what the prompts are, what the training data are, what the reinforcement learning with human feedback data are, and start producing transparent reports on that,\u201d she said.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1ymtmqpi _17nnmdy1 _17nnmdy0 _1xwtict1\">The Grok incident wasn\u2019t even directly related to the chatbot\u2019s programming \u2014 it was a social media ban, a type of incident that\u2019s often notoriously arbitrary and inscrutable, and where it makes even less sense than usual to assume Grok knows what\u2019s going on. (Beyond \u201cdumb error,\u201d we still don\u2019t know what happened.) Yet screenshots and quote-posts of Grok\u2019s conflicting explanations spread widely on X, where many users appear to have taken them at face value.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1ymtmqpi _17nnmdy1 _17nnmdy0 _1xwtict1\">Grok\u2019s constant bizarre behavior makes it a frequent target of questions, but people can be frustratingly credulous about other systems, too. In July, The Wall Street Journal declared OpenAI\u2019s ChatGPT had experienced \u201ca stunning moment of self reflection\u201d and \u201cadmitted to fueling a man\u2019s delusions\u201d in a push notification to users. 
It was referencing <a href=\"https:\/\/www.wsj.com\/tech\/ai\/chatgpt-chatbot-psychology-manic-episodes-57452d14\" rel=\"nofollow noopener\" target=\"_blank\">a story about a man<\/a> whose use of the chatbot became manic and distressing, and whose mother received an extended commentary from ChatGPT about its mistakes after asking it to \u201cself-report what went wrong.\u201d<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1ymtmqpi _17nnmdy1 _17nnmdy0 _1xwtict1\">As Parker Molloy <a href=\"https:\/\/www.readtpa.com\/p\/stop-pretending-chatbots-have-feelings\" rel=\"nofollow noopener\" target=\"_blank\">wrote at The Present Age<\/a>, though, ChatGPT can\u2019t meaningfully \u201cadmit\u201d to anything. \u201cA language model received a prompt asking it to analyze what went wrong in a conversation. It then generated text that pattern-matched to what an analysis of wrongdoing might sound like, because that\u2019s what language models do,\u201d Molloy wrote, summing up the incident.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1ymtmqpi _17nnmdy1 _17nnmdy0 _1xwtict1\">Why do people trust chatbots to explain their own actions? People have long anthropomorphized computers, and companies encourage users\u2019 belief that these systems are all-knowing (or, in Musk\u2019s description of Grok, at least \u201ctruth-seeking\u201d). It doesn\u2019t help that they\u2019re so frequently opaque. After Grok\u2019s South Africa fixation was patched out, xAI started <a href=\"https:\/\/www.theverge.com\/news\/668527\/xai-grok-system-prompts-ai\" rel=\"nofollow noopener\" target=\"_blank\">releasing its system prompts<\/a>, offering an unusual level of transparency, albeit on a system that <a href=\"https:\/\/www.reuters.com\/business\/musk-says-xai-will-open-source-grok-2-chatbot-2025-08-06\/\" rel=\"nofollow noopener\" target=\"_blank\">remains mostly closed<\/a>. 
And when Grok later <a href=\"https:\/\/www.theverge.com\/news\/701884\/grok-antisemitic-hitler-posts-elon-musk-x-xai\" rel=\"nofollow noopener\" target=\"_blank\">went on a tear of antisemitic commentary and briefly adopted the name \u201cMechaHitler\u201d<\/a>, people notably did use the system prompts to piece together what had happened rather than just relying on Grok\u2019s self-reporting, surmising it was likely at least somewhat related to a new guideline that Grok should be more \u201cpolitically incorrect.\u201d<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1ymtmqpi _17nnmdy1 _17nnmdy0 _1xwtict1\">Grok\u2019s X suspension was short-lived, and the stakes of believing it happened because of a hate speech flag or an attempted doxxing (or some other reason the chatbot hasn\u2019t mentioned) are relatively low. But the mess of conflicting explanations demonstrates why people should be cautious of taking a bot\u2019s word on its own operations \u2014 if you want answers, demand them from the creator instead.<\/p>\n<p><a class=\"duet--article--comments-link b1p9679\" href=\"http:\/\/www.theverge.com\/x-ai\/758595\/chatbots-lie-about-themselves-grok-suspension-ai#comments\" rel=\"nofollow noopener\" target=\"_blank\"><\/a>Adi Robertson<\/p>\n","protected":false},"excerpt":{"rendered":"On Monday, xAI\u2019s Grok chatbot suffered a mysterious suspension from X, and faced with questions from curious users,&hellip;\n","protected":false},"author":2,"featured_media":70017,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[256,13449,254,255,64,63,8207,105,3123],"class_list":{"0":"post-70016","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-analysis","10":"tag-artificial-intelligence","11":"tag-artificialintelligence","12":"tag-au","13":"tag-australia","14":"tag-report","15":"tag-technology","16":"tag-xai"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/70016","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/comments?post=70016"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/70016\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media\/70017"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media?parent=70016"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/categories?post=70016"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2
\/tags?post=70016"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}