{"id":126938,"date":"2025-09-08T04:00:06","date_gmt":"2025-09-08T04:00:06","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/126938\/"},"modified":"2025-09-08T04:00:06","modified_gmt":"2025-09-08T04:00:06","slug":"are-bad-incentives-to-blame-for-ai-hallucinations","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/126938\/","title":{"rendered":"Are bad incentives to blame for AI hallucinations?"},"content":{"rendered":"<p id=\"speakable-summary\" class=\"wp-block-paragraph\">A <a rel=\"nofollow noopener\" href=\"https:\/\/cdn.openai.com\/pdf\/d04913be-3f6f-4d2b-b283-ff432ef4aaa5\/why-language-models-hallucinate.pdf\" target=\"_blank\">new research paper<\/a> from OpenAI asks why large language models like GPT-5 and chatbots like ChatGPT still hallucinate, and whether anything can be done to reduce those hallucinations.<\/p>\n<p class=\"wp-block-paragraph\">In <a rel=\"nofollow noopener\" href=\"https:\/\/openai.com\/index\/why-language-models-hallucinate\/\" target=\"_blank\">a blog post summarizing the paper<\/a>, OpenAI defines hallucinations as \u201cplausible but false statements generated by language models,\u201d and it acknowledges that despite improvements, hallucinations \u201cremain a fundamental challenge for all large language models\u201d \u2014 one that will never be completely eliminated.<\/p>\n<p class=\"wp-block-paragraph\">To illustrate the point, researchers say that when they asked \u201ca widely used chatbot\u201d about the title of Adam Tauman Kalai\u2019s Ph.D. dissertation, they got three different answers, all of them wrong. (Kalai is one of the paper\u2019s authors.) They then asked about his birthday and received three different dates. Once again, all of them were wrong.<\/p>\n<p class=\"wp-block-paragraph\">How can a chatbot be so wrong \u2014 and sound so confident in its wrongness? 
The researchers suggest that hallucinations arise, in part, because of a pretraining process that focuses on getting models to correctly predict the next word, without true or false labels attached to the training statements: \u201cThe model sees only positive examples of fluent language and must approximate the overall distribution.\u201d<\/p>\n<p class=\"wp-block-paragraph\">\u201cSpelling and parentheses follow consistent patterns, so errors there disappear with scale,\u201d they write. \u201cBut arbitrary low-frequency facts, like a pet\u2019s birthday, cannot be predicted from patterns alone and hence lead to hallucinations.\u201d<\/p>\n<p class=\"wp-block-paragraph\">The paper\u2019s proposed solution, however, focuses less on the initial pretraining process and more on how large language models are evaluated. It argues that current evaluation methods don\u2019t cause hallucinations themselves, but they \u201cset the wrong incentives.\u201d<\/p>\n<p class=\"wp-block-paragraph\">The researchers compare these evaluations to the kind of multiple-choice tests where random guessing makes sense, because \u201cyou might get lucky and be right,\u201d while leaving the answer blank \u201cguarantees a zero.\u201d<\/p>\n<p class=\"wp-block-paragraph\">\u201cIn the same way, when models are graded only on accuracy, the percentage of questions they get exactly right, they are encouraged to guess rather than say \u2018I don\u2019t know,\u2019\u201d they say.<\/p>\n<p class=\"wp-block-paragraph\">The proposed solution, then, is scoring similar to tests (like the SAT) that include \u201cnegative [scoring] for wrong answers or partial credit for leaving questions blank to discourage blind guessing.\u201d Similarly, OpenAI says model evaluations need to \u201cpenalize confident errors more than you penalize
uncertainty, and give partial credit for appropriate expressions of uncertainty.\u201d<\/p>\n<p class=\"wp-block-paragraph\">And the researchers argue that it\u2019s not enough to introduce \u201ca few new uncertainty-aware tests on the side.\u201d Instead, \u201cthe widely used, accuracy-based evals need to be updated so that their scoring discourages guessing.\u201d<\/p>\n<p class=\"wp-block-paragraph\">\u201cIf the main scoreboards keep rewarding lucky guesses, models will keep learning to guess,\u201d the researchers say.<\/p>\n","protected":false},"excerpt":{"rendered":"A new research paper from OpenAI asks why large language models like GPT-5 and chatbots like ChatGPT still&hellip;\n","protected":false},"author":2,"featured_media":126939,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[256,254,255,64,63,5044,105],"class_list":{"0":"post-126938","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-au","12":"tag-australia","13":"tag-openai","14":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/126938","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/comments?post=126938"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/126938\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media\/126939"}],"wp:attachment":[{"href":"https:\/
\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media?parent=126938"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/categories?post=126938"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/tags?post=126938"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}