{"id":29444,"date":"2025-07-29T07:33:07","date_gmt":"2025-07-29T07:33:07","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/29444\/"},"modified":"2025-07-29T07:33:07","modified_gmt":"2025-07-29T07:33:07","slug":"are-you-joking-mate-ai-doesnt-get-sarcasm-in-non-american-varieties-of-english","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/29444\/","title":{"rendered":"\u2018Are you joking, mate?\u2019 AI doesn\u2019t get sarcasm in non-American varieties of English"},"content":{"rendered":"<p>In 2018, my Australian co-worker asked me, \u201cHey, how are you going?\u201d. My response \u2013 \u201cI am taking a bus\u201d \u2013 was met with a smirk. I had recently moved to Australia. Despite studying English for more than 20 years, it took me a while to familiarise myself with the Australian variety of the language. <\/p>\n<p>It turns out large language models powered by artificial intelligence (AI) such as ChatGPT experience a similar problem. <\/p>\n<p>In new research, published in the <a href=\"https:\/\/aclanthology.org\/2025.findings-acl.441\/\" rel=\"nofollow noopener\" target=\"_blank\">Findings of the Association for Computational Linguistics 2025<\/a>, my colleagues and I introduce a new tool for evaluating the ability of different large language models to detect sentiment and sarcasm in three varieties of English: Australian English, Indian English and British English.<\/p>\n<p>The results show there is still a long way to go until the promised benefits of AI are enjoyed by all, no matter the type or variety of language they speak. 
<\/p>\n<p>Limited English<\/p>\n<p>Large language models are <a href=\"https:\/\/arxiv.org\/abs\/2303.08774\" rel=\"nofollow noopener\" target=\"_blank\">often reported<\/a> to achieve <a href=\"https:\/\/arxiv.org\/pdf\/2312.11805\" rel=\"nofollow noopener\" target=\"_blank\">superlative performance<\/a> on several standardised <a href=\"https:\/\/arxiv.org\/abs\/2302.13971\" rel=\"nofollow noopener\" target=\"_blank\">sets of tasks known as benchmarks<\/a>. <\/p>\n<p>The majority of benchmark tests are written in Standard American English. This implies that, while large language models are being aggressively sold by commercial providers, they have predominantly been tested \u2013 and trained \u2013 only on this one type of English. <\/p>\n<p>This has major consequences. <\/p>\n<p>For example, <a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3712060\" rel=\"nofollow noopener\" target=\"_blank\">in a recent survey<\/a> my colleagues and I found large language models are more likely to classify a text as hateful if it is written in the African-American variety of English. They also often \u201cdefault\u201d to Standard American English \u2013 even if the input is in other varieties of English, such as Irish English and Indian English. <\/p>\n<p>To build on this research, we built BESSTIE.<\/p>\n<p>What is BESSTIE?<\/p>\n<p>BESSTIE is the first-of-its-kind benchmark for sentiment and sarcasm classification of three varieties of English: Australian English, Indian English and British English.<\/p>\n<p>For our purposes, \u201csentiment\u201d is the characteristic of the emotion: positive (the Aussie \u201cnot bad!\u201d) or negative (\u201cI hate the movie\u201d). Sarcasm is defined as a form of verbal irony intended to express contempt or ridicule (\u201cI love being ignored\u201d).<\/p>\n<p>To build BESSTIE, we collected two kinds of data: reviews of places on Google Maps and Reddit posts. 
We carefully curated the topics and employed language variety predictors \u2013 AI models specialised in detecting the language variety of a text. We kept only texts that a predictor assigned to a specific language variety with a probability greater than 95%. <\/p>\n<p>The two steps (location filtering and language variety prediction) ensured the data represents the national variety, such as Australian English. <\/p>\n<p>We then used BESSTIE to evaluate nine powerful, freely usable large language models, including <a href=\"https:\/\/huggingface.co\/docs\/transformers\/en\/model_doc\/roberta\" rel=\"nofollow noopener\" target=\"_blank\">RoBERTa<\/a>, <a href=\"https:\/\/en.wikipedia.org\/wiki\/BERT_(language_model)\" rel=\"nofollow noopener\" target=\"_blank\">mBERT<\/a>, <a href=\"https:\/\/docs.mistral.ai\/getting-started\/models\/models_overview\/\" rel=\"nofollow noopener\" target=\"_blank\">Mistral<\/a>, <a href=\"https:\/\/deepmind.google\/models\/gemma\/\" rel=\"nofollow noopener\" target=\"_blank\">Gemma<\/a> and <a href=\"https:\/\/en.wikipedia.org\/wiki\/Qwen\" rel=\"nofollow noopener\" target=\"_blank\">Qwen<\/a>.<\/p>\n<p>Inflated claims<\/p>\n<p>Overall, we found the large language models we tested worked better for Australian English and British English (which are native varieties of English) than the non-native variety of Indian English. <\/p>\n<p>We also found large language models are better at detecting sentiment than they are at detecting sarcasm. <\/p>\n<p>Sarcasm is particularly challenging, both as a linguistic phenomenon and as a task for AI. For example, we found the models were able to detect sarcasm in Australian English only 62% of the time. This number was lower for Indian English and British English \u2013 about 57%. <\/p>\n<p>These scores are lower than those claimed by the tech companies that develop large language models. 
For example, <a href=\"https:\/\/gluebenchmark.com\/leaderboard\" rel=\"nofollow noopener\" target=\"_blank\">GLUE<\/a> is a leaderboard that tracks how well AI models perform at sentiment classification on American English text.<\/p>\n<p>The highest value is 97.5% for the model Turing ULR v6 and 96.7% for RoBERTa (from our suite of models) \u2013 both higher for American English than our observations for Australian, Indian and British English.<\/p>\n<p>National context matters<\/p>\n<p>As more and more people around the world use large language models, researchers and practitioners are waking up to the fact that these tools need to be evaluated for a specific national context. <\/p>\n<p>For example, earlier this year the University of Western Australia along with Google <a href=\"https:\/\/www.uwa.edu.au\/news\/article\/2025\/february\/first-nations-people-to-benefit-from-inclusive-technology-partnership\" rel=\"nofollow noopener\" target=\"_blank\">launched a project<\/a> to improve the efficacy of large language models for Aboriginal English. <\/p>\n<p>Our benchmark will help evaluate future large language model techniques for their ability to detect sentiment and sarcasm. We\u2019re also currently working on a project for large language models in <a href=\"https:\/\/www.unsw.edu.au\/newsroom\/news\/2025\/06\/emergency-departments-bilingual-patients\" rel=\"nofollow noopener\" target=\"_blank\">emergency departments of hospitals<\/a> to help patients with varying proficiencies of English.<\/p>\n","protected":false},"excerpt":{"rendered":"In 2018, my Australian co-worker asked me, \u201cHey, how are you going?\u201d. 
My response \u2013 \u201cI am taking&hellip;\n","protected":false},"author":2,"featured_media":29445,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[256,254,255,64,63,105],"class_list":{"0":"post-29444","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-au","12":"tag-australia","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/29444","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/comments?post=29444"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/29444\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media\/29445"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media?parent=29444"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/categories?post=29444"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/tags?post=29444"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}