{"id":606696,"date":"2026-04-14T17:42:07","date_gmt":"2026-04-14T17:42:07","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/606696\/"},"modified":"2026-04-14T17:42:07","modified_gmt":"2026-04-14T17:42:07","slug":"ai-learns-language-from-skewed-sources-that-could-change-how-we-humans-speak-and-think-ada-palmer-and-bruce-schneier","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/606696\/","title":{"rendered":"AI learns language from skewed sources. That could change how we humans speak \u2013 and think | Ada Palmer and Bruce Schneier"},"content":{"rendered":"<p class=\"dcr-130mj7b\">Because of the way they are trained, large language models capture only a slice of human language. They\u2019re trained on the written word, from textbooks to social media posts, and our speech as captured in movies and on television. These models have minimal access to the unscripted conversations we have face to face or voice to voice. This is the vast majority of speech, and a vital component of human culture.<\/p>\n<p class=\"dcr-130mj7b\">There\u2019s a risk to this. The increased use of large language models means we humans will encounter much more AI-generated text. We humans, in turn, will begin to adopt the linguistic patterns and behaviors of these models. This will affect not just how we communicate with one another, but also how we think about ourselves and what goes on around us. Our sense of the world may become distorted in ways we have barely begun to comprehend.<\/p>\n<p class=\"dcr-130mj7b\">This will happen in many ways. One of the first effects we could see is in simple expression, much as texting and social media have resulted in us using shorter sentences, emojis instead of words, and much less punctuation. But with AI, the impacts may be more harmful, eroding courteousness and encouraging us to talk like bosses barking orders. 
A 2022 <a href=\"https:\/\/adc.bmj.com\/content\/early\/2022\/08\/22\/archdischild-2022-323888?rss=1\" data-link-name=\"in body link\" rel=\"nofollow noopener\" target=\"_blank\">study<\/a> found that children in households that used voice commands with tools like Siri and Alexa became curt when speaking with humans, often calling out \u201cHey, do X\u201d and expecting obedience, especially from anyone whose voice resembled the default-female electronic voices. As we start to prompt chatbots and AI agents with more instructions, we may fall into the same habits.<\/p>\n<p class=\"dcr-130mj7b\">Next, in the same way autocomplete has increased how much we use the 1,000 most common words in our vocabulary, talking with chatbots and reading AI-generated text may further constrict our speech. A recent University of Coru\u00f1a <a href=\"https:\/\/pubmed.ncbi.nlm.nih.gov\/39328400\/\" data-link-name=\"in body link\" rel=\"nofollow noopener\" target=\"_blank\">study<\/a> found that machine-generated language has a narrower range of sentence length, averaging 12-20 words, and a narrower vocabulary than human speech. Machine-generated text reads as smooth and polished, but it loses the meanders, interruptions and leaps of logic that communicate emotion.<\/p>\n<p class=\"dcr-130mj7b\">Additionally, because large language models are primarily trained on written language, they may not learn how to emulate the free-wheeling nature of live, natural speech. When told \u201cI hate Beth!\u201d, ChatGPT replies with an uninterruptable three-part formula of affirmation (\u201cThat\u2019s completely valid\u201d), invitation (\u201cI\u2019m here to listen\u201d) and question (\u201cWhat\u2019s going on?\u201d) far longer than any reply plausible in face-to-face dialog. \u201cWhat\u2019s Beth\u2019s deal?!\u201d elicits a bullet point list of queries that reads like a multiple-choice exam question (\u201cIs Beth * a celebrity? * a friend from school? * a fictitious character?\u201d). 
No human speaks that way, at least not yet. But meeting such formulas repeatedly in a speech-like context may teach us to accept and use them, much as a child absorbs new speech patterns from spending time with a new person.<\/p>\n<p class=\"dcr-130mj7b\">These influences will only increase with time. The writing large language models train on is increasingly produced by large language models themselves, creating a feedback loop in which they imitate their own inhuman patterns, even while teaching humans to imitate them too.<\/p>\n<p class=\"dcr-130mj7b\">Broad use of large language models could also introduce <a href=\"https:\/\/aclanthology.org\/2025.findings-acl.195.pdf\" data-link-name=\"in body link\" rel=\"nofollow noopener\" target=\"_blank\">confirmation bias<\/a>, making us overconfident in our initial impulses and less open to other possible ideas \u2013 an openness that is vital to human discourse. Many chatbots are instructed to agree with our statements no matter how absurd, enthusiastically supporting half-formed or even incorrect notions and restating them as firm claims that we\u2019re primed to agree with. When asked: \u201cCake is a healthy breakfast, right?\u201d or \u201cIs the post office plotting against me?\u201d, this sycophancy <a href=\"https:\/\/www.article19.org\/resources\/algorithmic-people-pleasers-are-ai-chatbots-telling-you-what-you-want-to-hear\/\" data-link-name=\"in body link\" rel=\"nofollow noopener\" target=\"_blank\">can reinforce bias<\/a> and even worsen <a href=\"https:\/\/www.psychologytoday.com\/us\/blog\/urban-survival\/202507\/the-emerging-problem-of-ai-psychosis\" data-link-name=\"in body link\" rel=\"nofollow noopener\" target=\"_blank\">psychosis<\/a>. 
And the hyperconfident tone of AI-produced writing will also heighten impostor syndrome, making our natural, healthy doubt feel like an aberration or failing.<\/p>\n<p class=\"dcr-130mj7b\">In our experience as teachers, students who turn to generative AI for assignments often say they do so because they have trouble expressing what they think. The students don\u2019t recognize that writing or speaking our thoughts is often how we realize what we think. Their unconfident and uncertain statements are actually the healthy human norm. But a large language model won\u2019t turn vague first guesses into a well-formed critical analysis, or even ask helpful questions as a friend would; it will simply regurgitate those guesses, still unexamined, but in confident language.<\/p>\n<p class=\"dcr-130mj7b\">We are also more vicious in social media posts and online chats than we are face to face. The <a href=\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0306457325000214\" data-link-name=\"in body link\" rel=\"nofollow noopener\" target=\"_blank\">well-documented<\/a> <a data-link-name=\"in body link\">online disinhibition effect<\/a> encourages toxic language. Most of us have had the experience of venting ferocious rage about someone online, only to reconcile when we speak face to face or hear the warmth of a voice over the phone. While chatbots are trained to give sycophantic responses, they see humankind at our cruelest, learning about us from the only world where every flame war leaves an eternal written footprint, while the spoken conversations of forgiveness and reconciliation fade away. Their responses do not imitate our online aggression, but are still shaped by it, even in their rigid efforts to avoid it.<\/p>\n<p class=\"dcr-130mj7b\">It\u2019s easy to draw the wrong conclusions from a selective slice of a society\u2019s communications. 
Medieval Norse sagas made us imagine a culture of mostly Viking warriors, since poets rarely described the farming majority. Chivalric romances focused on kings and courts, and long made us see the middle ages as a world of monarchies, erasing the many medieval republics. We\u2019ve been led to believe ancient Romans cared deeply about their republic, but the statistics are skewed: 10% of all surviving Latin was written by one man, Cicero, whose work contains 70% of all surviving Roman uses of the word republic. Training language models on only certain human writings may introduce similar distortions. AI might make us seem more quarrelsome, as we are online. It might inflate the cultural significance of political topics primarily discussed on Twitter\/X or Bluesky, or of the massive topic-specific corpora of LinkedIn and Goodreads.
But if there is ingenuity enough to develop AI models, surely there is ingenuity enough to find a way to train them on informal human speech, rather than on us only at our most stylized, veiled and sometimes worst. By excluding the overwhelming majority of language production on the planet \u2013 people talking, fully and naturally, to each other \u2013 these models are being trained to mirror everything but us at our most authentically human.<\/p>\n<p class=\"dcr-130mj7b\">Bruce Schneier is a security technologist who teaches at the Harvard Kennedy School at Harvard University. Ada Palmer is a fantasy and science fiction novelist, futurist, and historian of technology and information at the University of Chicago<\/p>\n","protected":false},"excerpt":{"rendered":"Because of the way they are trained, large language models capture only a slice of human language. They\u2019re&hellip;\n","protected":false},"author":2,"featured_media":606697,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[256,254,255,64,63,105],"class_list":{"0":"post-606696","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-au","12":"tag-australia","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/606696","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/comments?post=606696"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/au
\/wp-json\/wp\/v2\/posts\/606696\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media\/606697"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media?parent=606696"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/categories?post=606696"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/tags?post=606696"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}