Over the past month, A.I. detection has been at the center of a series of controversies: Hachette pulled the horror novel Shy Girl by Mia Ballard after detectors flagged it as substantially A.I.-generated. The New York Times cut ties with a freelance book critic who admitted that an A.I. editing tool had regurgitated passages from a Guardian article into his draft. The Atlantic reported that a “Modern Love” column had been flagged as more than 60 percent A.I.-generated. In certain corners of social media, A.I.-detector screenshots are shared like mug shots, and pile-ons have the grim energy of public stonings.
This may all seem understandable: people want to know if what they’re reading was generated by a bot, and some argue they deserve to know. But the controversy narrows the issue of A.I.’s steady encroachment to one of process rather than impact. Drawing a red line around using chatbots to generate prose makes it easier to ignore how the technology may be shaping writing before one even types a single word. And a culture of callouts, scandals, and fear may prevent media and publishing from wrestling with much thornier questions of authorship.
At the center of many of these controversies is a company called Pangram, whose CEO, Max Spero, has become the go-to authority when A.I. authorship disputes erupt. On Twitter/X, where Spero calls himself a “slop janitor,” a user flagged a Guardian sports journalist’s writing as A.I.-generated. The publication responded that this was “the same style he’s used for 11 years writing for the Guardian, long before LLMs existed. The allegation is preposterous.” Spero quote-tweeted the exchange with a Pangram time-series analysis of 871 articles by the journalist: “It’s clear that he is increasingly relying on AI. In two weeks in February he churned out nine articles classified by Pangram as fully AI-generated. Receipts below.”
Or take Pangram’s appearance in the Shy Girl cancellation. Readers on Reddit and YouTube had been flagging the horror novel as suspiciously A.I. for months, but then Spero ran the full manuscript and posted the result (78 percent A.I.-generated). Hachette pulled the book the day the New York Times covered the findings. A story in the Atlantic soon followed. Spero took to LinkedIn, urging publishers to “strictly moderat[e] AI generated content” and “draft and enforce robust AI-use policy.”
A pattern emerges: The crowd suspects a problem, then Pangram validates the suspicion, stokes the mob, and sells the solution. The impulse to dismiss all this as a detector company drumming up business runs into an inconvenient fact: Pangram actually works way better than you might think. Brian Jabarian, a University of Chicago economist who conducted a rigorous independent evaluation of A.I. detectors, told me flatly, “This narrative that we shouldn’t use A.I. detection doesn’t seem to hold anymore.”
Jabarian’s preprint, co-authored with Alex Imas and with no disclosed financial ties to the company, tested the tool across nearly 2,000 passages and found near-zero false-positive and false-negative rates on medium-to-long texts, the length of a typical op-ed or a verbose Amazon review. Independent benchmarks confirm that Pangram outperforms every other detector tested and is robust against “humanizers,” or software designed to smuggle A.I. text past detectors. So when Spero posts a time-series chart of hundreds of articles showing when a journalist’s output started sounding fishily like ChatGPT, I am inclined to believe it.
That A.I. detection is finally catching up is, on balance, a Good Thing. A.I.-generated articles on the web already far outnumber human-written ones. Social media is flooded with low-effort slop. According to Pangram’s own research, a fifth of peer reviews submitted to the A.I. research conference ICLR are fully A.I.-generated, and 9 percent of articles in American newspapers contain undisclosed bot use. In this A.I.-powered asphyxiation of the information ecosystem, Spero has positioned himself on social media as a folk hero hauling in the oxygen tanks. You can tag his company’s bot on Twitter/X, and it will tell you whether a post is A.I. On Spero’s social media to-do list: a “slop hunter of the week leaderboard.”
Pangram may be great at catching A.I. slop, but its performance probably varies in the wild. “If you copy-paste chunks of ChatGPT with minimal edits, then Pangram is fairly accurate,” Tuhin Chakrabarty, an assistant professor of computer science at Stony Brook University, told me. “If you significantly edit an A.I.-generated text, then it becomes human, and this is a harder problem in general.”
That matters because in the real world, A.I. use falls on a spectrum. In Pangram’s newspaper study, for example, 86.5 percent of the chatbot use detected in opinion pieces at the Times, the Wall Street Journal, and the Washington Post was classified as “mixed,” or some unknown entanglement of human and machine. Did the writer use Claude to help with transitions, or generate an opinion piece fully formed from ChatGPT and slap a name on it? These distinctions matter. Pangram’s latest version now outputs scores on a continuum and is making genuine progress on these gray-zone instances, but those cases remain far less validated than the extremes.
To complicate matters further, not everyone may bear the burden of false accusations equally. Liam Dugan, a Ph.D. student at the University of Pennsylvania whose dissertation focuses on A.I. detection and who has benchmarked the major commercial detectors, told me: “For most people, they might never, ever get a false positive. And for other people, the false positives are sort of disproportionately allocated on them because they just happen to write like A.I.”
Some A.I. detectors are more likely to flag non-native speakers of English. (According to a Pangram blog post, the company has largely solved this problem, but there is no independent audit of this assertion.) Apart from non-native speakers, there may be other subgroups of writers whose prose has the focus-grouped sheen of ChatGPT output. Opinion writing in major newspapers, in fact, comes to mind.
Not only does A.I. keep improving, but humans are also beginning to speak and write like A.I., narrowing the gap that detectors rely on to make their calls. This makes A.I. detection inherently an arms race, so the performance of any given detector will likely fluctuate over time. Academics I spoke with all emphasized that the state of A.I. detection is much better today than it was in 2023 but cautioned against letting the narrative pendulum swing too far in the other direction. Jabarian told me, “Maybe we went from a world where people were not using detection because it was so bad, and now maybe people think it works all the time.”
And when the technicalities of A.I. detection collide with cancel culture, nothing productive comes of it. Take a dustup Spero found himself in a few weeks ago with the Wall Street Journal. Pangram’s newspaper study identified specific op-ed writers, including three at the Journal. James Taranto, who edits those pages, responded with a combative piece. He ran the flagged articles through Pangram himself and got different scores, contacted the accused writers, and concluded that the “A.I.-generated” accusations didn’t hold up.
The response is instructive for what it pursued and what it avoided. Taranto investigated Pangram’s consistency and found enough variation to dismiss it. But quibbling over individual op-eds let him sidestep some uncomfortable introspection. Even if Pangram misfires on a given op-ed, the study’s broader pattern—that A.I. use is showing up across major newspaper opinion pages, his own included—is impossible to argue with. He did not have to ask how his editorial oversight had failed to spot a discomfiting level of undisclosed A.I. use. Basically, he wrote a hit piece on the thermometer instead of asking why he had a fever. This incident reveals why chatbot callout culture leads nowhere. Spero called out Taranto; Taranto called out Spero. Nothing changed. “I think it may have been a mistake to name names,” Spero told me.

But the larger issue may be that when it comes to A.I.-assisted writing, red lines perhaps are being drawn around the wrong thing. On Substack recently, Nicholas Thompson, the CEO of the Atlantic, shared a writer’s account of using Claude to build a custom editing rubric while instructing the A.I., “You are not a co-writer.” Thompson called it “a cool example of how you can use A.I. to help your writing—without relying on it for any actual writing.” Elsewhere, he has deemed the practice of using chatbots to generate sentences “unethical and wrong.” This is becoming the standard position: A.I. for everything upstream of prose is acceptable; A.I. for the prose itself is a betrayal.
To understand why this is a problem, consider a simple exercise I ask reporters to perform when I run workshops on journalism and A.I. I hand each reporter one of two A.I.-generated research reports on collagen supplements: same underlying studies, same data, different framing. Report A opens with positive clinical findings and mentions industry funding as a limitation. Report B opens with the funding-bias analysis and loudly labels which results are industry funded. Report A primes a “Does collagen work?” story. Report B primes a “Why you can’t trust collagen research” story. To be clear, both are reasonable reads of the literature, but the reporters would write different stories because of how the A.I. decided to order the same information. In each case, their writing would sail through a detector. Meanwhile, a reporter who did her own research but asked an A.I. to clean up her prose might get flagged. She was arguably the least influenced of the three, yet the prevailing moral intuition among writers is that she most betrayed the craft.
I understand that moral intuition and agree that passing off A.I.-generated prose as one’s own breaks the writer-reader contract. (I also acknowledge the aesthetic and moral revulsion that generative A.I. provokes, full stop.) But what many in journalism seem most concerned about is what’s at stake in newsmaking: that the perspectives of A.I. models will shape writing and, through it, public opinion. A recent piece in Thompson’s Atlantic called this prospect “terrifying” and proposed a suite of solutions: disclosure policies, editor training on A.I. tells, detection software, penalties for violators. Every recommendation targets prose, while the upstream-influence problem, the one the author herself seemed most concerned with, received no actionable attention at all. It’s all about process, not about impact.
If the goal is writerly independence, then drawing the red line at writing protects the independence of the final product far less than one might hope. Rather, it protects a feeling of independence. And to be clear, I’m not saying that relying on A.I. for research is necessarily bad. Yes, chatbots are unusually persuasive, and writers pick up model biases without knowing it, but the baseline isn’t some platonic ideal of a perfectly objective journalist. The question is how A.I.-based research reinforces or counteracts the biases already built into the information-gathering processes journalists use.
And Thompson’s red line happens to align perfectly not only with what companies like Pangram can now measure and sell, but also with what vigilantes can police on social media. This troubles me because A.I. detection, even at its best, is going to be a moving target. Build a culture of accusation on that foundation, and you get something not only brittle but perhaps even falsely reassuring: a system that tells readers and writers the A.I. problem has been solved while harder questions go unasked.
