Is artificial intelligence (AI) an emerging rival to the familiar decades-old excuse by cheaters that “the dog ate my homework?” New research published in the science journal Nature shows that delegating tasks for AI to perform carries an increased risk of cheating.

“Our findings point to the urgent need for not only technical guardrails but also a broader management framework that integrates machine design with social and regulatory oversight,” wrote the study’s corresponding authors Iyad Rahwan, Jean-François Bonnefon, Nils Köbis, and Zoe Rahwan, in collaboration with co-authors Raluca Rilla, Bramantyo Ibrahim Supriyatno, Clara Bersch, and Tamer Ajaj. The researchers for the study are affiliated with the Max Planck Institute for Human Development, the Toulouse School of Economics, and the University of Duisburg-Essen.

Use of AI is rising at home and at work. According to a Pew Research Center poll conducted in June 2025, 73 percent of U.S. adults would be willing to allow AI to assist them at least a little with their day-to-day activities, and 62 percent report that they interact with AI at least several times a week. At work, the use of AI in a work role a few times a year or more by American workers has doubled to reach 40 percent in 2025, up from 21 percent in 2023, according to Gallup. With the rising ubiquity of daily use, understanding the risks associated with delegating tasks to AI algorithms is of paramount importance.

To conduct the research for this current study, the team subdivided experiments into four main studies. They set up a behavioral sciences classic die-roll task to measure cheating behavior and human delegation to AI large language models (LLMs) and a real-world tax compliance experiment with LLMs.

The study evaluates rule specification, supervised learning, goal specification, and prompt engineering AI programming models. In rule specification, the participants explicitly instruct AI what to report. For supervised learning, the AI algorithms learn from training data sets that the participant specifies. The training data set choices include one where the die reports are always six, another where the die report corresponds exactly to the die rolls, and one that tests for occasional cheating, where the die reports were greater than or equal to the actual die roll.

The first and second studies used the classic die-roll task that has been widely used in scientific research. Human participants were asked to communicate the number they saw on a rolled die and were paid based on the number. The higher the number, the greater the payout was to the participant.

In the first experiment, 597 participants were randomly assigned to a condition. The control condition consisted of self-reporting 10 die roll results, and the other three were rules-based, supervised learning, or goal-based conditions that involved humans passing the reporting on the 10 die roll results to AI.

The second experiment, which had 801 study participants, was the same as the first experiment, with rule-based, goal-based, and supervised learning AI models, with the one key difference that the participants could elect to self-report the die results or delegate the task to AI to report.

For the third experiment, the researchers had 390 participants, acting as principals, write natural language instructions for humans and machines as well as perform the die-roll task. Then the team added 975 additional participants to serve as human agents.

Interestingly, human participants were more likely to be dishonest and cheat when they delegated tasks to AI in both voluntary and involuntary scenarios. The highest amount of cheating happened with the goal-setting AI condition, where a vast majority, 80 percent, of human participants cheated. The rule-based AI condition resulted in lower rates of cheating when delegating tasks to AI.

“Our results establish that people are more likely to request unethical behavior from machines than to engage in the same unethical behaviors themselves,” reported the research team.

The final experiment was a tax-evasion test to measure honesty, partial cheating, and full cheating by machines and 869 human participants. The four LLMs evaluated include GPT-4, GPT-4o, Claude 3.5 Sonnet, and Llama 3.3. The team discovered that humans less frequently followed unethical instructions, with the results ranging from 25 percent to up to 40 percent, in comparison to the LLM range of 58 percent to up to 98 percent.

“This finding suggests that prominent, readily available LLMs have insufficient default guardrails against unethical behavior,” the scientists concluded.

This study is proof-of-concept that humans delegating tasks to AI impacts ethical considerations and warrants developing safeguard strategies in the future.

Copyright © 2025 Cami Rosso. All rights reserved.