{"id":84174,"date":"2025-08-20T17:22:18","date_gmt":"2025-08-20T17:22:18","guid":{"rendered":"https:\/\/www.newsbeep.com\/ca\/84174\/"},"modified":"2025-08-20T17:22:18","modified_gmt":"2025-08-20T17:22:18","slug":"what-counts-as-plagiarism-ai-generated-papers-pose-new-risks","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/ca\/84174\/","title":{"rendered":"What counts as plagiarism? AI-generated papers pose new risks"},"content":{"rendered":"\n<p>This January, Byeongjun Park, a researcher in artificial intelligence (AI), received a surprising e-mail. Two researchers from India told him that an AI-generated manuscript had used methods from one of his papers, without credit.<\/p>\n<p>Park looked up the manuscript. It wasn\u2019t formally published, but had been posted online (see <a href=\"http:\/\/go.nature.com\/45pdgqb\" data-track=\"click\" data-label=\"http:\/\/go.nature.com\/45pdgqb\" data-track-category=\"body text link\" rel=\"nofollow noopener\" target=\"_blank\">go.nature.com\/45pdgqb<\/a>) as one of a number of papers generated by a tool called <a href=\"https:\/\/www.nature.com\/articles\/d41586-024-02842-3\" data-track=\"click\" data-label=\"https:\/\/www.nature.com\/articles\/d41586-024-02842-3\" data-track-category=\"body text link\" rel=\"nofollow noopener\" target=\"_blank\">The AI Scientist<\/a> \u2014 announced in 2024 by researchers at Sakana AI, a company in Tokyo<a href=\"#ref-CR1\" data-track=\"click\" data-action=\"anchor-link\" data-track-label=\"go to reference\" data-track-category=\"references\">1<\/a>.<\/p>\n<p>The AI Scientist is an example of fully automated research in computer science. The tool uses a large language model (LLM) to generate ideas, writes and runs the code by itself, and then writes up the results as a research paper \u2014 clearly marked as AI-generated. 
It\u2019s the start of an effort to have AI systems make their own research discoveries, says the team behind it.<\/p>\n<p>The AI-generated work wasn\u2019t copying his paper directly, Park saw. It proposed a new architecture for diffusion models, the sort of model behind image-generating tools. Park\u2019s paper dealt with improving how those models are trained<a href=\"#ref-CR2\" data-track=\"click\" data-action=\"anchor-link\" data-track-label=\"go to reference\" data-track-category=\"references\">2<\/a>. But to his eyes, the two did share similar methods. \u201cI was surprised by how closely the core methodology resembled that of my paper,\u201d says Park, who works at the Korea Advanced Institute of Science and Technology (KAIST) in Daejeon, South Korea.<\/p>\n<p>The researchers who e-mailed Park, Tarun Gupta and Danish Pruthi, are computer scientists at the Indian Institute of Science in Bengaluru. They say that the issue is bigger than just his paper.<\/p>\n<p>In February, Gupta and Pruthi reported<a href=\"#ref-CR3\" data-track=\"click\" data-action=\"anchor-link\" data-track-label=\"go to reference\" data-track-category=\"references\">3<\/a> that they\u2019d found multiple examples of AI-generated manuscripts that, according to external experts they consulted, used others\u2019 ideas without attribution, although without directly copying words and sentences.<\/p>\n<p>Gupta and Pruthi say that this amounts to the software tools plagiarizing others\u2019 ideas \u2014 albeit with no ill intention on the part of their creators. 
\u201cA significant portion of LLM-generated research ideas appear novel on the surface but are actually skillfully plagiarized in ways that make their originality difficult to verify,\u201d they write.<\/p>\n<p>In July, their work won an \u2018outstanding paper\u2019 award at the Association for Computational Linguistics conference in Vienna.<\/p>\n<p>But some of their findings are disputed. The team behind The AI Scientist told Nature that it strongly disagrees with Gupta and Pruthi\u2019s findings, and doesn\u2019t accept that any plagiarism occurred in The AI Scientist case studies that the paper examines. In Park\u2019s specific case, one independent specialist told Nature that he thought the AI manuscript\u2019s methods didn\u2019t overlap enough with Park\u2019s paper to be termed plagiarism. Park himself also demurred at using \u2018plagiarism\u2019 to describe what he saw as a strong methodological overlap.<\/p>\n<p>Beyond the specific debate about The AI Scientist lies a broader concern. So many papers are published each year \u2014 especially in computer science \u2014 that researchers already struggle to keep track of whether their ideas are really innovative, says Joeran Beel, a specialist in machine-learning and information science at the University of Siegen, Germany.<\/p>\n<p>And if more LLM-based tools are used to generate ideas, this could deepen the erosion of intellectual credit in science. Because LLMs work in part by remixing and interpolating the text they\u2019re trained on, it would be natural for them to borrow from earlier work, says Parshin Shojaee, a computer scientist at the Virginia Tech Research Center \u2014 Arlington.<\/p>\n<p>The issue of \u2018idea plagiarism\u2019, although little discussed, is already a problem with human-authored papers, says Debora Weber-Wulff, a plagiarism researcher at the University of Applied Sciences, Berlin, and she expects that it will get worse with work created by AI. 
But, unlike the more familiar forms of plagiarism \u2014 involving copied or subtly rewritten sentences \u2014 it\u2019s hard to prove the reuse of ideas, she says.<\/p>\n<p>That makes it difficult to see how to automate the task of checking for true novelty or originality, to match the pace at which AIs are going to be able to synthesize manuscripts.<\/p>\n<p>\u201cThere\u2019s no one way to prove idea plagiarism,\u201d Weber-Wulff says.<\/p>\n<p>Overlapping methods<\/p>\n<p>Bad actors can, of course, already use AI to deliberately plagiarize others or rewrite others\u2019 work to pass it off as their own (see <a href=\"https:\/\/doi.org\/gt5rjz\" data-track=\"click\" data-label=\"https:\/\/doi.org\/gt5rjz\" data-track-category=\"body text link\" rel=\"nofollow noopener\" target=\"_blank\">Nature https:\/\/doi.org\/gt5rjz; 2025<\/a>). But Gupta and Pruthi wondered if well-intentioned AI approaches might be using others\u2019 methods or ideas too.<\/p>\n<p>Gupta and Pruthi were first alerted to the issue when they read a 2024 study led by Chenglei Si, a computer scientist at Stanford University in California<a href=\"#ref-CR4\" data-track=\"click\" data-action=\"anchor-link\" data-track-label=\"go to reference\" data-track-category=\"references\">4<\/a>. Si\u2019s team asked both people and LLMs to generate \u201cnovel research ideas\u201d on topics in computer science. Although Si\u2019s protocol included a novelty check and asked human reviewers to assess the ideas, Gupta and Pruthi argue that some of the AI-generated ideas produced by the protocol nevertheless lifted from existing works \u2014 and so weren\u2019t \u2018novel\u2019 at all.<\/p>\n<p>They picked out one of the AI-generated ideas in Si\u2019s paper, which they say borrowed from a paper first posted as a preprint<a href=\"#ref-CR5\" data-track=\"click\" data-action=\"anchor-link\" data-track-label=\"go to reference\" data-track-category=\"references\">5<\/a> in 2023. 
Si tells Nature that he agrees that the \u2018high-level\u2019 idea was similar to material in the preprint, but that \u201cwhether the low-level implementation differences count as novelty is probably a subjective judgement\u201d. Shubhendu Trivedi, a machine-learning researcher who co-authored that 2023 preprint, and was until recently at the Massachusetts Institute of Technology in Cambridge, says that \u201cthe LLM-generated paper was basically very similar to our paper, despite some superficial-level differences\u201d.<\/p>\n<p>Gupta and Pruthi further tested their concern by taking the four AI-generated research proposals publicly released by Si\u2019s team and the ten AI manuscripts released by Sakana AI, and generated 36 fresh proposals themselves, using Si\u2019s methodology. They then asked 13 specialists to try to find overlaps in methods between the AI-made works and existing papers, using a 5-point scale, on which 5 corresponded to a \u2018one-to-one mapping in methods\u2019 and 4 to \u2018mix-and-match from two-to-three prior works\u2019; 3 and 2 represented more-modest overlaps and 1 indicated no overlap. 
\u201cIt\u2019s essentially about copying of the idea or crux of the paper,\u201d says Gupta.<\/p>\n<p>The researchers also asked the authors of original papers identified by the specialists to give their own views on the overlaps.<\/p>\n<p>Including this step, Gupta and Pruthi report that 12 of the 50 AI-generated works in their sample reached level 4 or 5, implying, they said, a plagiarism proportion of 24%; the figure rises to 18 works (36%) if cases in which the original authors didn\u2019t reply are included. Some were from Sakana\u2019s and Si\u2019s work, although Gupta and Pruthi discuss in detail only the examples reported in this story.<\/p>\n<p>They also said they\u2019d found a similar kind of overlap in an AI-generated manuscript (see <a href=\"http:\/\/go.nature.com\/4oym4ru\" data-track=\"click\" data-label=\"http:\/\/go.nature.com\/4oym4ru\" data-track-category=\"body text link\" rel=\"nofollow noopener\" target=\"_blank\">go.nature.com\/4oym4ru<\/a>) that, Sakana announced this March, had passed through a stage of peer review for a workshop at a prestigious machine-learning conference, the International Conference on Learning Representations.<\/p>\n<p>At the time, the firm said that this was the first fully-AI-generated paper to pass human peer review. It also explained that it had agreed with workshop organizers to trial putting AI-generated papers into peer review and to withdraw them if they were accepted, because the community hadn\u2019t yet decided whether AI-generated papers should be published in conference proceedings. (The workshop organizers declined Nature\u2019s request for comment.)<\/p>\n<p>Gupta and Pruthi say that this paper borrowed its core contribution from a 2015 work<a href=\"#ref-CR6\" data-track=\"click\" data-action=\"anchor-link\" data-track-label=\"go to reference\" data-track-category=\"references\">6<\/a>, without citing it. 
Their report quotes the authors of that paper, computer scientists David Krueger and Roland Memisevic, as saying that the Sakana work is \u201cdefinitively not novel\u201d, and identifying a second uncited manuscript<a href=\"#ref-CR7\" data-track=\"click\" data-action=\"anchor-link\" data-track-label=\"go to reference\" data-track-category=\"references\">7<\/a> that the paper borrowed from.<\/p>\n<p>Another computer scientist, Radu Ionescu at the University of Bucharest, told Nature he rated the similarity between the AI-generated work and Krueger and Memisevic\u2019s paper as a 5.<\/p>\n<p>Krueger, who is at the University of Montreal in Canada, told Nature that the related works should have been cited, but that he \u201cwouldn\u2019t be surprised to see human researchers reinvent this and miss previous work\u201d too. \u201cI think this AI system and others are not capable of achieving academic standards for referencing related work,\u201d he said, adding that the AI paper was \u201cextremely low quality overall\u201d. But he wasn\u2019t sure whether the word plagiarism should be applied, because he feels that term implies that the person (or AI tool) reusing methods was aware of earlier work, but chose not to cite it.<\/p>\n<p>Pushback<\/p>\n<p>The team behind The AI Scientist, which includes researchers at the University of Oxford, UK, and the University of British Columbia in Vancouver, Canada, pushed back strongly against Gupta and Pruthi\u2019s work when asked by Nature. 
\u201cThe plagiarism claims are false,\u201d the team wrote in an e-mailed point-by-point critique, adding that they were \u201cunfounded, inaccurate, extreme, and should be ignored\u201d.<\/p>\n<p>On two AI Scientist manuscripts discussed in Gupta and Pruthi\u2019s paper, for instance, the team says that these works have different hypotheses from those in the earlier papers and apply them to different domains, even if some elements of the methods are related.<\/p>\n<p>The references found by the specialists for Gupta and Pruthi\u2019s analysis are work that the AI-generated papers could have cited, but nothing more, the AI Scientist team says, adding: \u201cWhat they should have reported is some related work that went uncited (a daily occurrence by human authors).\u201d The team says it would be \u201cappropriate\u201d to have cited Park\u2019s paper. In the case of Krueger\u2019s paper and the second uncited manuscript, the AI Scientist team says, \u201cthese two papers are related, so, while it is an everyday occurrence by humans not to include works like this, it would have been good for The AI Scientist to cite them\u201d.<\/p>\n<p>Ben Hoover, a machine-learning researcher at the Georgia Institute of Technology in Atlanta who specializes in diffusion models, told Nature that he\u2019d score the overlap with Park\u2019s paper as a \u20183\u2019 on Gupta\u2019s scale. 
He said the AI-generated paper is of much lower quality and less thorough than Park\u2019s work, and should have cited it, but \u201cI would not go so far as to say plagiarism.\u201d Gupta and Pruthi\u2019s analysis relies on \u2018superficial similarities\u2019 between generic statements in the AI-generated work that, when read in detail, don\u2019t meaningfully map to Park\u2019s paper, he adds. Ionescu told Nature he would give the AI-generated paper a rating of 2 or 3.<\/p>\n<p>Park judges the overlap with his paper to be much stronger than Hoover\u2019s and Ionescu\u2019s ratings. He says he would give it a score of 5 on Gupta\u2019s scale, and adds that it \u201creflects a strong methodological resemblance that I consider noteworthy.\u201d Even so, this does not necessarily align with what he sees as the legal or ethical definition of plagiarism, he told Nature.<\/p>\n<p>What counts as plagiarism<\/p>\n<p>Part of the disagreement could stem from different operational understandings of what \u2018plagiarism\u2019 means, especially when it comes to overlap in ideas or methods. Researchers who study plagiarism hold different views on the term from those of some of the computer scientists in the current debate, says Weber-Wulff.<\/p>\n<p>\u201cPlagiarism is a word we should and do reserve for extreme cases of intentional fraudulent cheating,\u201d the AI Scientist team wrote, adding that Gupta and Pruthi \u201care wildly out of line with established conventions regarding what counts as plagiarism in academia\u201d. But Weber-Wulff disagrees: she says that intent shouldn\u2019t be a factor. \u201cThe machine has no intent,\u201d she says. 
\u201cWe don\u2019t have a good mechanism for explaining why the system is saying something and where it got it from, because these systems are not built to give references.\u201d<\/p>\n<p>Weber-Wulff\u2019s own favoured definition of plagiarism is that it occurs when a manuscript \u201cuses words, ideas, or work products attributable to another identifiable person or source without properly attributing the work to the source from which it was obtained in a situation in which there is a legitimate expectation of original authorship\u201d. That definition was produced by Teddi Fishman, the former director of a US non-profit consortium of universities called the International Center for Academic Integrity.<\/p>\n","protected":false},"excerpt":{"rendered":"This January, Byeongjun Park, a researcher in artificial intelligence (AI), received a surprising e-mail. Two researchers from India&hellip;\n","protected":false},"author":2,"featured_media":84175,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[49,48,5127,1099,796,1100,66,50295],"class_list":{"0":"post-84174","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-science","8":"tag-ca","9":"tag-canada","10":"tag-computer-science","11":"tag-humanities-and-social-sciences","12":"tag-machine-learning","13":"tag-multidisciplinary","14":"tag-science","15":"tag-scientific-community"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/posts\/84174","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/comments?post=84174"}],"version-history
":[{"count":0,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/posts\/84174\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/media\/84175"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/media?parent=84174"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/categories?post=84174"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/tags?post=84174"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}