Artificial intelligence researchers are grappling with a problem at the core of their field: how to stop so-called “AI slop” from damaging confidence in the industry’s scientific work.
AI conferences have rushed in recent months to restrict the use of large language models for writing and reviewing papers after being flooded with poor-quality AI-written content.
Scientists have warned that the surge of low-quality AI-generated material risks eroding trust in the sector’s research and undermining its integrity by introducing false claims and fabricated content.
“There is a little bit of irony to the fact that there’s so much enthusiasm for AI shaping other fields when, in reality, our field has gone through this chaotic experience because of the widespread use of AI,” said Inioluwa Deborah Raji, an AI researcher at the University of California, Berkeley.
Recent studies have highlighted the prevalence of the technology in AI research. In August, a Stanford University study found that up to 22 per cent of computer science papers showed signs of LLM use.
An analysis by start-up Pangram estimated that 21 per cent of reviews at the prestigious International Conference on Learning Representations (ICLR) in 2025 were fully AI-generated, and that more than half of all reviews included some AI use, such as editing. Of submitted papers, the company found that 9 per cent had more than half of their content generated by AI.
In November, ICLR reviewers flagged a paper suspected of being AI-generated that had made it into the top 17 per cent of submissions based on reviewer scores.
In January, AI detection start-up GPTZero published research that found more than 100 AI-generated errors across 50 papers at last year’s Neural Information Processing Systems (NeurIPS) conference, widely considered the most prestigious venue for cutting-edge AI research.
Growing concern over how the research community uses the technology led ICLR to update its AI usage guidelines ahead of the conference, including a warning that papers that do not disclose “extensive” use of LLMs will be rejected.
Researchers who use LLMs to create low-quality reviews of papers will also be penalised, which could include having their own research submissions declined.
“If you’re publishing really low-quality papers that are just wrong, why should society trust us as scientists?” said Hany Farid, a computer science professor at the University of California, Berkeley.
The rise in the number of papers produced by AI researchers was particularly evident last year, experts say.
The NeurIPS conference said it received 21,575 submissions in 2025, up from 17,491 in 2024 and 9,467 in 2020. One author penned more than 100 papers at NeurIPS, far more than a typical researcher produces.
There has also been a significant increase in computer science-related research papers on arXiv, a free online repository, according to Thomas G Dietterich, emeritus professor of computer science at Oregon State University, who also chairs the computer science section of arXiv.
However, AI scientists say it is hard to tell whether the rise is driven by the use of LLMs or simply by a growing number of active researchers in the field. Detecting AI-generated content remains difficult because there are no industry-wide standards or reliable methods for analysing papers.
A tell-tale sign is when papers contain hallucinated references in the bibliography, or figures that are wrong, said Dietterich. Authors of such papers are then banned from submitting to arXiv for a period, he added.
Some AI experts argue that the widespread use of LLMs, fuelled by commercial incentives, has led researchers to focus on quantity rather than quality. Critics say AI research has a culture of trying to publish as many papers as possible, pushing some scientists to take shortcuts.
“When we have these moments of incredibly impressive demos, incredibly high salaries, and these companies just going crazy, it just attracts a flood of outsider interest,” said Raji.
Experts say there are many legitimate ways to use AI tools for research, for example as brainstorming assistants and proofreaders.
“The quality of the writing in papers from China has increased dramatically, and I assume this is because LLMs are very good at rewriting English to make it fluent,” said Dietterich.
But the question of how to use the technology is becoming more important as companies such as Google, Anthropic and OpenAI promote their models as “co-scientists” that can help accelerate research in areas such as life sciences.
As part of tailoring their models for scientific research, AI groups typically train them on datasets scraped from academic sources. But if those datasets include growing amounts of AI-generated papers, the result could be undesirable outcomes such as degraded model performance, said Farid.
Previous research has shown that LLMs tend to “collapse” and produce gibberish when their training data contains too much uncurated AI-generated content, which reduces the diversity of material a model can learn from.
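The dynamic can be sketched with a toy simulation, which is an illustration only and not drawn from the research cited here: treat a “model” as nothing more than a table of token frequencies, retrain it each generation on a finite sample of its own output, and the range of tokens it still knows can only shrink, because anything that fails to be sampled once is gone for good.

```python
# Toy sketch (illustrative assumption, not from the article): each "model" is a
# frequency table over a vocabulary. Every generation is trained purely on a
# finite, uncurated sample of the previous generation's output, so tokens that
# happen not to be sampled disappear permanently and diversity only decreases.
import numpy as np

rng = np.random.default_rng(0)

vocab_size = 500
probs = np.full(vocab_size, 1.0 / vocab_size)  # generation 0: uniform "knowledge"

for generation in range(12):
    alive = int((probs > 0).sum())
    print(f"generation {generation:2d}: {alive} of {vocab_size} tokens still represented")
    # Train the next model on 1,000 synthetic tokens from the current one.
    sample = rng.choice(vocab_size, size=1_000, p=probs)
    counts = np.bincount(sample, minlength=vocab_size)
    probs = counts / counts.sum()
```

Real model collapse is far more complex than this caricature, but the mechanism is the same one researchers warn about: recursively training on uncurated synthetic output narrows what a model can learn.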
“There’s . . . an incentive for the AI companies that are going out there indiscriminately scraping everything to want to know that these things [papers] are in fact not AI generated,” said Farid.
Kevin Weil, head of science at OpenAI, noted that LLMs, like any tool, have to be used responsibly. “It can be a massive accelerator that could help you explore new areas,” Weil said. “But you have to check it. It doesn’t absolve you from rigour.”