AI-generated Indigenous language dictionaries, elders' teachings and histories circulating online could be harming cultural and language revitalization efforts, experts say.
Content can easily and convincingly be created with generative artificial intelligence. Large language models (LLMs) like ChatGPT are trained on massive amounts of data and use statistical prediction to generate responses.
“These systems are especially likely, given the limited datasets available for many Indigenous languages, to produce invented words, fabricated cultural teachings, or generalized ‘pan-Indigenous’ representations that flatten distinct nations or communities into one interchangeable identity,” said Michael G. Sherbert, a postdoctoral fellow at Queen’s University in Kingston, Ont.
Sherbert, a member of Algonquins of Pikwakanagan First Nation, researches the ethics of using AI for cultural preservation of Indigenous languages and knowledge.
Sherbert said the use of AI in language and cultural preservation is still relatively new, and some communities are prioritizing structured knowledge system AI, in which the content is curated and controlled by the community or enterprise using it.
Sherbert said generative AI is highly flexible and conversational but can "hallucinate," or fabricate information. Although there are teams working to make these systems more responsible and ethical, he said, the hallucinations and misrepresentations they produce can be appropriative and harmful.
“You could say that the AI is inadvertently colonizing and hurting Indigenous language revitalization because [people] are taking information generated by an artificial intelligence and putting it out there for people to read,” said Sherbert.
People who aren't connected to a community may put their trust in generative AI, he said, because it seems to be giving them pretty good answers.
Based on that assumption, people may use AI for educational purposes or ask it something like “give me a good elder story,” Sherbert said, and a generative AI uses statistical prediction to offer one that is “completely constructed from false information.”
“[Generative AI] is optimized for something like fluency to give you answers that are good. It’s not optimized for truth or for ethical or cultural responsibility or accountability,” he said.
When AI is structured around verified knowledge systems from a community “rather than probabilistic pattern-matching, the likelihood of fabricated language or misrepresentation drops significantly,” Sherbert said.
“More importantly, the authority over what is included, excluded, or restricted remains with the community itself.”
He's working on language and culture revitalization with his First Nation's education services, using structured knowledge system AI in collaboration with kama.ai, an Indigenous-owned AI company that specializes in this type of architecture.
No easy way to learn culture
Kaitlyn Lazore works in Kanien'kéha language preservation as a program support officer for the Mohawk Language and Culture program with the Ahkwesáhsne Mohawk Board of Education.
She manages content for its social media page, which posts daily phrases and prompts vetted by first-language speakers in the community. The board is not currently using AI for language revitalization.
“If you’re going to use AI, I think you already have to have a really good understanding of the language and culture to know what is right and what is wrong,” Lazore said.
She said there's no easy way to learn the language or the culture without getting out into your community.
“I go to bingo and I talk with first language speakers. You have to physically get yourself out there,” she said.
AI governance and transparency
Brian Ritchie, from Chapleau Cree First Nation in Ontario, is the founder and CEO of kama.ai, an Indigenous AI development platform that can run AI agents on behalf of communities or enterprises. Ritchie said the company works to ensure its structured knowledge bases are curated by the community or enterprise it's working with.
Ritchie said First Nations communities that want to maintain ownership, control and access over their information should use AI tools with data sovereignty capabilities, like structured knowledge systems.
Ritchie said a major issue with LLMs is governance.
"How do we know this thing won't say the wrong thing, won't say something that is biased information, won't say something that's offensive, culturally offensive to a certain population?" he said.
He said there are no clear distinctions between the two types of AI for people consuming the content, and that it's up to creators to disclose when content is made with AI.
“Whoever’s putting the tool out for consumption is the one that makes any claims about authenticity,” Ritchie said.
“It can be difficult for any user to understand how responsible or accurate or authentic the information is.”
It's important to use your judgment with every answer from an LLM, Ritchie said, because sometimes even the references it cites can be fake.