{"id":205639,"date":"2025-10-11T14:42:07","date_gmt":"2025-10-11T14:42:07","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/205639\/"},"modified":"2025-10-11T14:42:07","modified_gmt":"2025-10-11T14:42:07","slug":"your-llm-wont-stop-lying-any-time-soon","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/205639\/","title":{"rendered":"Your LLM Won\u2019t Stop Lying Any Time Soon"},"content":{"rendered":"<p>Researchers call it \u201challucination\u201d; you might more accurately refer to it as confabulation, hornswoggle, hogwash, or just plain BS. Anyone who has used an LLM has encountered it; some people seem to find it behind every prompt, while others dismiss it as an occasional annoyance, but nobody claims it doesn\u2019t happen. A <a href=\"https:\/\/arxiv.org\/pdf\/2509.04664\" target=\"_blank\" rel=\"nofollow noopener\">recent paper by researchers at OpenAI<\/a> (PDF) tries to drill down a bit deeper into just why that happens, and whether anything can be done about it.<\/p>\n<p>Spoiler alert: not really. Not unless we completely re-think the way we\u2019re training these models, anyway. The analogy used in the conclusion is to an undergraduate in an exam room. Every right answer is going to get a point, but wrong answers aren\u2019t penalized\u2013 so why the heck not guess? You might not pass an exam that way going in blind, but if you have studied (i.e., sucked up the entire internet without permission for training data) then you might get a few extra points. For an LLM\u2019s training, like a student\u2019s final grade, every point scored on the exam is a good point.<\/p>\n<p>The problem is that if you reward \u201cI don\u2019t know\u201d in training, you may eventually produce a degenerate model that responds to every prompt with \u201cIDK\u201d. Technically, that\u2019s true\u2013 the model is a stochastic mechanism; it doesn\u2019t \u201cknow\u201d anything. It\u2019s also completely useless. 
Unlike some other studies, however, the authors do not conclude that so-called hallucinations are an inevitable result of the stochastic nature of LLMs.<\/p>\n<p>While that may be true, they point out it\u2019s only the case for \u201cbase models\u201d\u2013 pure LLMs. If you wrap the LLM with a \u201cdumb\u201d program able to parse its output into a calculator, for example, suddenly the blasted thing can pretend to count. (That\u2019s how undergrads do it these days, too.) You can also provide the LLM with a cheat-sheet of facts to reference instead of hallucinating; it sounds like what\u2019s being proposed is a hybrid between an LLM and the sort of expert system you once used Wolfram Alpha to access. (<a href=\"https:\/\/hackaday.com\/2023\/04\/17\/wolfram-alpha-with-chatgpt-looks-like-a-killer-combo\/\" rel=\"nofollow noopener\" target=\"_blank\">A combo we\u2019ve covered before<\/a>.)<\/p>\n<p>In that case, however, some skeptics might wonder why bother with the LLM at all, if the knowledge in the expert system is \u201cgood enough.\u201d (Having seen one AI boom before, we can say with the judgement of history that the knowledge in an expert system isn\u2019t good enough often enough to make many viable products.)<\/p>\n<p>Unfortunately, that \u201ceasy\u201d solution runs back into the issue of grading: if you want your model to do well on the scoreboards and beat ChatGPT or DeepSeek at popular benchmarks, there\u2019s a certain amount of \u201cteaching to the test\u201d involved, and a model that occasionally makes stuff up will apparently do better on the benchmarks than one that refuses to guess. The obvious solution, as the authors propose, is changing the benchmarks.<\/p>\n<p>If you\u2019re interested in AI (and who isn\u2019t, these days?), the paper makes an interesting read. 
Interesting, if perhaps disheartening, if you were hoping the LLMs would graduate from their <a href=\"https:\/\/hackaday.com\/2023\/07\/26\/chatgpt-the-worst-summer-intern-ever\/\" rel=\"nofollow noopener\" target=\"_blank\">eternal internship<\/a> any time soon.<\/p>\n<p>Via <a href=\"https:\/\/www.computerworld.com\/article\/4059383\/openai-admits-ai-hallucinations-are-mathematically-inevitable-not-just-engineering-flaws.html?ref=wheresyoured.at\" target=\"_blank\" rel=\"nofollow noopener\">ComputerWorld<\/a>, by way of <a href=\"https:\/\/www.wheresyoured.at\/sora2-openai\/\" target=\"_blank\" rel=\"nofollow noopener\">wheresyoured.at<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"Researchers call it \u201challucination\u201d; you might more accurately refer to it as confabulation, hornswoggle, hogwash, or just plain&hellip;\n","protected":false},"author":2,"featured_media":205640,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[256,254,255,64,63,105],"class_list":{"0":"post-205639","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-au","12":"tag-australia","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/205639","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/comments?post=205639"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/205639\/revisions"}],"wp:featuredm
edia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media\/205640"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media?parent=205639"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/categories?post=205639"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/tags?post=205639"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}