{"id":215923,"date":"2026-01-02T03:15:10","date_gmt":"2026-01-02T03:15:10","guid":{"rendered":"https:\/\/www.newsbeep.com\/il\/215923\/"},"modified":"2026-01-02T03:15:10","modified_gmt":"2026-01-02T03:15:10","slug":"the-core-problem-with-large-language-models","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/il\/215923\/","title":{"rendered":"The Core Problem with Large Language Models"},"content":{"rendered":"<p>I have shared many examples of the unreliability of OpenAI\u2019s ChatGPT, Google\u2019s Gemini, and other large language models (LLMs). <a href=\"https:\/\/mindmatters.ai\/2025\/12\/is-openai-approaching-the-valley-of-death\/\" rel=\"nofollow noopener\" target=\"_blank\">Recent examples include<\/a> evaluating a rotated tic-tac-toe grid, and labeling a possum\u2019s body parts. Some correspondents have objected that no one is going to ask an LLM to evaluate an obviously pointless tic-tac-toe rotation or to identify a possum\u2019s tail and snout.<\/p>\n<p>I should have been clearer. The many examples I\u2019ve reported are not just \u201cgotchas\u201d meant to embarrass LLMs. 
They are intended to demonstrate in a clear and compelling way that LLMs are inherently unreliable.<\/p>\n<p>Going back to my Oxford University Press tetralogy, <a href=\"https:\/\/www.garysmithn.com\/books\/the-ai-delusion\" rel=\"nofollow noopener\" target=\"_blank\">The AI Delusion<\/a>, <a href=\"https:\/\/www.garysmithn.com\/books\/9-pitfalls-of-data-science\" rel=\"nofollow noopener\" target=\"_blank\">The 9 Pitfalls of Data Science<\/a> (co-authored with Jay Cordes), <a href=\"https:\/\/www.garysmithn.com\/books\/the-phantom-pattern-problem\" rel=\"nofollow noopener\" target=\"_blank\">The Phantom Pattern Problem<\/a>, and <a href=\"https:\/\/www.garysmithn.com\/books\/distrust\" rel=\"nofollow noopener\" target=\"_blank\">Distrust:<\/a> <a href=\"https:\/\/www.garysmithn.com\/books\/distrust\" rel=\"nofollow noopener\" target=\"_blank\">Big Data, Data-Torturing, and the Assault on Science<\/a>, I have argued relentlessly that AI systems based on data mining are unreliable.<\/p>\n<p>Why LLM systems are inherently unreliable<\/p>\n<p>The scientific method begins with a specific hypothesis that is tested with data, ideally from a randomized controlled trial. Data mining goes in the other direction, beginning with data and looking to find <a href=\"https:\/\/ieeexplore.ieee.org\/document\/6567202\" rel=\"nofollow noopener\" target=\"_blank\">what have been called<\/a> \u201chidden patterns and secret correlations.\u201d When a heretofore unknown pattern or correlation is found, <a href=\"https:\/\/link.springer.com\/book\/10.1007\/978-0-387-36795-8\" rel=\"nofollow noopener\" target=\"_blank\">data miners either<\/a> concoct a plausible theory or argue that theories are not needed. 
For example, in a Wired piece titled, \u201c<a href=\"https:\/\/www.stat.uchicago.edu\/~lekheng\/courses\/191f09\/norvig.pdf\" rel=\"nofollow noopener\" target=\"_blank\">The End of Theory: The Data Deluge Makes the Scientific Method Obsolete<\/a>,\u201d Chris Anderson wrote,<\/p>\n<p>Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.<\/p>\n<p>The problem is that the number of genuinely useful relationships is limited. But the data deluge is increasing the number of coincidental, useless statistical patterns exponentially \u2014 so the probability that a data-mined pattern is actually useful is getting ever closer to zero. As I have <a href=\"https:\/\/journals.sagepub.com\/doi\/10.1177\/0268396220915600\" rel=\"nofollow noopener\" target=\"_blank\">written elsewhere<\/a>:<\/p>\n<p>This is the paradox of big data: It would seem that having data for a large number of variables will help us find more reliable patterns; however, the more variables we consider, the less likely it is that what we find will be useful.<\/p>\n<p>Data-mining algorithms look for statistical patterns with no idea how the data relate to the real world. The data might as well be labeled &amp;4qp#L, M&amp;2V+v, or any other randomly generated collection of ASCII characters. Data-mining algorithms are consequently unable to distinguish between useful causation and useless correlation. 
Here are a few examples of misplaced faith in data mining:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"327\" height=\"500\" src=\"https:\/\/www.newsbeep.com\/il\/wp-content\/uploads\/2026\/01\/51DeKNCpq-L.jpg\" alt=\"\" class=\"wp-image-1600\"\/>Amazon book cover for The AI Delusion by Gary Smith<\/p>\n<p>Admiral Insurance, Britain\u2019s largest car insurance company, <a href=\"https:\/\/www.theguardian.com\/technology\/2016\/nov\/02\/admiral-to-price-car-insurance-based-on-facebook-posts\" rel=\"nofollow noopener\" target=\"_blank\">planned to launch<\/a> firstcarquote, which would base its car insurance rates on a data-mined analysis of an applicant\u2019s Facebook posts \u2014 before being blocked by Facebook lawyers.<\/p>\n<p>Yongqianbao, a Chinese tech company, <a href=\"https:\/\/www.wsj.com\/articles\/want-a-loan-in-china-keep-your-phone-charged-1491474250\" rel=\"nofollow noopener\" target=\"_blank\">reported that<\/a> they had developed a data-mining algorithm that evaluates loan applications based on applicants\u2019 smartphone usage, including such measures as whether they keep their phones fully charged and how many calls go unanswered. The company is no longer in business.<\/p>\n<p>Gild developed data-mining <a href=\"https:\/\/www.theatlantic.com\/magazine\/archive\/2013\/12\/theyre-watching-you-at-work\/354681\/\" rel=\"nofollow noopener\" target=\"_blank\">software for evaluating applicants<\/a> for software engineering jobs by monitoring their online activities. Even the chief scientist acknowledged that some of the factors chosen by its data-mining software do not make sense, saying, \u201cObviously, it\u2019s not a causal relationship.\u201d A former employee wrote that, \u201cCustomers really, really hate the product. 
There are almost no customers that have an overall positive experience.\u201d<\/p>\n<p>A <a href=\"https:\/\/www.bloomberg.com\/news\/articles\/2019-09-09\/jpmorgan-creates-volfefe-index-to-track-trump-tweet-impact\" rel=\"nofollow noopener\" target=\"_blank\">JP Morgan study<\/a> concluded that Trump tweets containing the words China, billion, products, Democrats, or great have a \u201cstatistically significant impact\u201d on interest rates. They don\u2019t.<\/p>\n<p><a href=\"https:\/\/pubmed.ncbi.nlm.nih.gov\/23619126\/\" rel=\"nofollow noopener\" target=\"_blank\">Researchers found<\/a> a correlation between a three-week moving average of Google searches for the word debt and changes in the Dow Jones Industrial Average. Using data for a 7-year period, they reported that their debt strategy had an astounding 23.0% annual return, compared with 2.2% for a buy-and-hold strategy. However, for the 7 years after their study, their debt strategy had an annual return of 2.81%, compared with 8.60% for buy-and-hold.<\/p>\n<p>The perils of data mining<\/p>\n<p>LLMs mine data but they search for statistical patterns in words instead of numbers. 
Like traditional data-mining algorithms, LLMs do not know how the text they input and output relates to the real world, and they consequently have no way of knowing whether the statements they process and generate are true or false, sensible or nonsensical.<\/p>\n<p>Here are a few examples of mishaps caused by this core problem:<\/p>\n<p>Air Canada <a href=\"https:\/\/www.forbes.com\/sites\/marisagarcia\/2024\/02\/19\/what-air-canada-lost-in-remarkable-lying-ai-chatbot-case\/\" rel=\"nofollow noopener\" target=\"_blank\">was sued and lost a case<\/a> in which a customer-service chatbot generated incorrect information about the airline\u2019s bereavement-fare policy.<\/p>\n<p>A car dealer\u2019s ChatGPT <a href=\"https:\/\/www.upworthy.com\/prankster-tricks-a-gm-dealership-chatbot-to-sell-him-a-76000-chevy-tahoe-for-ex1\" rel=\"nofollow noopener\" target=\"_blank\">bot agreed to sell<\/a> a $76,000 Chevy Tahoe for $1. The bot said, \u201cThat\u2019s a deal, and that\u2019s a legally binding offer \u2013 no takesies backsies,\u201d but the dealer refused to honor the deal.<\/p>\n<p>A New York federal court <a href=\"https:\/\/www.seyfarth.com\/news-insights\/update-on-the-chatgpt-case-counsel-who-submitted-fake-cases-are-sanctioned.html\" rel=\"nofollow noopener\" target=\"_blank\">sanctioned<\/a> experienced lawyers who used ChatGPT to generate a court filing that cited non-existent cases.<\/p>\n<p>A <a href=\"https:\/\/www.vice.com\/en\/article\/woman-files-for-divorce-after-chatgpt-reads-husbands-affair-in-coffee-cup\/#:~:text=News-,Woman%20Files%20for%20Divorce%20After%20ChatGPT%20&#039;Reads&#039;%20Husband&#039;s%20Affair%20in,trying%20to%20destroy%20their%20family.\" rel=\"nofollow noopener\" target=\"_blank\">Greek woman filed for divorce<\/a> because a ChatGPT reading of photos of coffee grounds left in her cup and her husband\u2019s cup indicated that he was having an affair with a \u201chome wrecker.\u201d<\/p>\n<p>LLM-based hiring systems <a 
href=\"https:\/\/arxiv.org\/html\/2501.04316v2#:~:text=Large%20language%20models%20(LLMs)%20are,outcomes%20in%20real-world%20contexts.\" rel=\"nofollow noopener\" target=\"_blank\">have been shown<\/a> to generate outcomes that are biased with respect to race and sex.<\/p>\n<p>A study of the use of <a href=\"https:\/\/dl.acm.org\/doi\/epdf\/10.1145\/3630106.3658996\" rel=\"nofollow noopener\" target=\"_blank\">OpenAI\u2019s Whisper transcription tool<\/a> to create records of doctor-patient interactions found that it often created entirely made-up phrases and sentences, many of which were potentially harmful.<\/p>\n<p>A man was <a href=\"https:\/\/www.theguardian.com\/technology\/2025\/aug\/12\/us-man-bromism-salt-diet-chatgpt-openai-health-information?utm_source=chatgpt.com\" rel=\"nofollow noopener\" target=\"_blank\">reportedly<\/a> hospitalized for severe psychiatric problems after substituting sodium bromide for table salt following a ChatGPT session.<\/p>\n<p>The parents of a 16-year-old who hanged himself <a href=\"https:\/\/nypost.com\/2025\/08\/26\/us-news\/chatgpt-coached-teen-as-he-prepared-suicide-praised-noose-knot-suit\/?utm_source=chatgpt.com\" rel=\"nofollow noopener\" target=\"_blank\">filed a wrongful-death lawsuit<\/a> against OpenAI, alleging that ChatGPT had given their son a \u201cstep-by-step playbook\u201d on how to hang himself, including advice on the best knot to use and offering to write a suicide note for him.<\/p>\n<p>Scaling up (by increasing the number of parameters, the size of the training data, and the computational power) won\u2019t solve this fundamental weakness \u2014 that LLMs are unreliable because they do not know how words relate to the real world.<\/p>\n<p>Why there is no simple fix<\/p>\n<p>Expert training can help LLMs give better answers to the prompts they are trained on, but (1) the experts cannot anticipate all prompts an LLM might be asked, (2) many (most?) 
real world decisions involve subjective probabilities that cannot be reliably generated by clueless LLMs and cannot be anticipated by human trainers, and (3) following instructions is not general intelligence.<\/p>\n<p>The point of the hundreds of examples I have given over the years is that predictions, advice, or conclusions based solely on statistical patterns are unreliable. Data mining is dodgy because correlation does not supersede causation.<\/p>\n","protected":false},"excerpt":{"rendered":"I have shared many examples of the unreliability of OpenAI\u2019s ChatGPT, Google\u2019s Gemini, and other large language models&hellip;\n","protected":false},"author":2,"featured_media":215924,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[345,343,344,85,46,125],"class_list":{"0":"post-215923","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-il","12":"tag-israel","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/posts\/215923","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/comments?post=215923"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/posts\/215923\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/media\/215924"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/media?parent=215923"}],"wp:term":[{"taxono
my":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/categories?post=215923"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/tags?post=215923"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}