Do you know what artificial intelligence sounds like? When asked to guess, most people can’t tell the difference between AI-generated voices and real human conversation, according to multiple studies.
This confusion can have disastrous consequences for how we see the world. When you can't tell what's real on screen, you can start to believe misinformation ― and, in the worst cases, racist stereotypes about the people depicted in AI-generated videos.
But there might be one reliable way to suss out what’s AI, especially on video: Listen to how the people sound.
A range of AI experts shared the telltale ways the voices and sounds in an AI video can reveal its synthetic origin. Here's what to listen for.

Illustration: HuffPost; Photos: Getty
AI voices in Sora videos often sound like they have downed five cups of coffee.
Listen for the over-caffeinated tone.
Real people have a natural rhythm to how they speak: some words come out more slowly than others. AI voices, by contrast, often sound unnaturally rushed the entire time.
Jeremy Carrasco, a video expert who debunks AI videos on social media, said he notices that videos from Sora ― an artificial intelligence video app owned by OpenAI ― often have an “overly energetic” quality. “They’re saying so much and they’re not saying much at all, they’re just cramming in words,” he said.
Even OpenAI is aware of this telltale sign. In text, an overabundance of em dashes is a known giveaway of ChatGPT's answers ― one that can reveal when someone's cover letter or first-date message was AI-generated.
In October, the hosts of the video streaming show TBPN asked Bill Peebles, the head of Sora, in an interview what the "em dash of [AI] video" was. His immediate response was telling.
"I think right now the 'em dash' is this slightly wired speech pattern in Sora where it likes to say a lot of words quickly," Peebles said.
Watch out for garbled, slurred voices.
What we might call the flow of someone's speech is what linguists would call "coarticulation": how our voices physically transition from one sound to another as air moves through our noses and out our mouths. A lot of AI-generated speech is still bad at this, producing garbled sounds that flatten out natural pitch.
"No human being would ever produce that same kind of garbled quality [as an AI-generated voice], because, literally, we can't," said Melissa Baese-Berk, a linguistics professor at the University of Chicago. "Our vocal tract can't go from one sound to another without some blurring of the information between those two sounds."
Baese-Berk used the example of an AI subway meet-cute video in which a woman meets a man she immediately calls her "husband." The video fooled many people into believing it was real. But when the woman says "husband," the "band" part of the word sounds "super duper weird," Baese-Berk said, because it "is missing the natural coarticulatory information that happens when you move from the tip of your tongue to your lips."
“Only a robot could go from their tongue to their lips without having any kind of mashing up of those sounds,” Baese-Berk said.
This inhuman mash-up of words is by design.
“Text-to-speech models are trained to predict the most likely pronunciation of a word in sequence, but they often struggle to smoothly blend the sounds that connect words,” said Migüel Jetté, vice president of AI at Rev, a speech-to-text service. “For example, where a human might naturally say ‘didja’ instead of ‘did you,’ AI has a tendency to either over-enunciate each word, or blend them too abruptly.”
Pay attention to mispronounced words.
If there’s an obviously mispronounced word, that can also be a sign, Jetté said, because “AI voices can struggle with unusual or unique words that don’t appear in the training data.”
Videos from Google's text-to-video Veo model, for example, "might not be cramming in as many words, but they will put them out of order, or the wrong person will say something," Carrasco said he has observed.
Notice when emotional reactions don’t match the story of the video.
In a 2025 study that asked participants to rate which voices were AI or not, the AI voices created by text-to-speech models were only identified accurately 55% of the time. The biggest mistakes happened with AI voices that sounded angry.
This may be because participants expected AI voices to sound robotic, said Camila Bruder, a co-author of the study and a researcher at the Max Planck Institute for Empirical Aesthetics.
In reality, AI voices are often too emotional for what the scene calls for. If the AI voice is “too stereotypically happy, like, ‘Wow!’ or it’s stereotypically mad…like a bad actor,” these traits can be indicators that the video is AI, Bruder said.
Carrasco said you should also notice when what is being said is an odd emotional reaction. Take one viral AI video of fish falling from the sky. “They’re fish, they’re actually fish!” a woman in the video exclaims.
“They’re just narrating what’s happening on the screen. You wouldn’t do that in real life,” Carrasco said about this video. “If a bunch of fish were raining [down], I’d probably just say ‘What the fuck.’”
Compare those inappropriate AI emotions to the real-life horror of a truck driver who was recently filmed watching a plane crash unfold in front of him in Kentucky. In that video, the driver doesn't narrate his experience; his mouth simply drops open. "He's just in disbelief. That's kind of how a lot of these would be" in real life, Carrasco said.
You can also simply look at what people’s mouths are doing for clues. “The visual giveaways in these videos can be just as revealing as the audio,” Jetté said. “If the speaker’s lips don’t perfectly sync with the audio…that’s a strong indicator.”
These clues are helpful, but they aren't foolproof.
Of course, these clues are not a guaranteed way to reveal an AI-generated voice. ElevenLabs, the AI lab that clones real voices, is good at adding vocal fry and human pauses, so a voice that speaks without taking breaths isn't "always" a sign of AI, Bruder said.
But as a whole, these telltale signs are a strong indicator that the video you are watching was probably created by a machine. And that’s a helpful start. As AI continues to evolve at breathtaking speeds, we need all the help we can get to understand what’s fake and what’s not.
“If something feels off, it probably is,” Jetté said. “A healthy dose of skepticism and a good eye and ear for detail can go a long way.”