Pierson, B. Lawsuit claims UnitedHealth AI wrongfully denies elderly extended care. Reuters https://www.reuters.com/legal/lawsuit-claims-unitedhealth-ai-wrongfully-denies-elderly-extended-care-2023-11-14 (2023).
Schreiber, M. New AI tool counters health insurance denials decided by automated algorithms. The Guardian https://www.theguardian.com/us-news/2025/jan/25/health-insurers-ai (2025).
Au Yeung, J. et al. AI chatbots not yet ready for clinical use. Front. Digit. Health 5, 1161098 (2023).
Azaria, A., Azoulay, R. & Reches, S. ChatGPT is a remarkable tool — for experts. Data Intell. 6, 240–296 (2024).
Meyrowitsch, D. W., Jensen, A. K., Sørensen, J. B. & Varga, T. V. AI chatbots and (mis)information in public health: impact on vulnerable communities. Front. Public Health 11, 1226776 (2023).
Kim, J. H. et al. When ChatGPT gives incorrect answers: the impact of inaccurate information by generative AI on tourism decision-making. J. Travel Res. 64, 51–73 (2023).
Sartori, G. & Orrù, G. Language models and psychological sciences. Front. Psychol. 14, 1279317 (2023).
Dillion, D., Tandon, N., Gu, Y. & Gray, K. Can AI language models replace human participants? Trends Cogn. Sci. 27, 597–600 (2023).
Kaddour, J. et al. Challenges and applications of large language models. Preprint at arXiv https://doi.org/10.48550/arXiv.2307.10169 (2023).
Griewing, S. et al. Evolution of publicly available large language models for complex decision-making in breast cancer care. Arch. Gynecol. Obstet. 310, 537–550 (2024).
Jimenez, C. E. et al. SWE-bench: can language models resolve real-world GitHub issues? Preprint at arXiv https://doi.org/10.48550/arXiv.2310.06770 (2024).
Jusman, I. A., Ausat, A. M. A. & Sumarna, A. Application of ChatGPT in business management and strategic decision making. J. Minfo Polgan 12, 1688–1697 (2023).
Lee, P., Bubeck, S. & Petro, J. Benefits, limits, and risks of GPT-4 as an AI chatbot for medicine. N. Engl. J. Med. 388, 1233–1239 (2023).
Basir, A., Puspitasari, E. D., Aristarini, C. C., Sulastri, P. D. & Ausat, A. M. A. Ethical use of ChatGPT in the context of leadership and strategic decisions. J. Minfo Polgan 12, 1239–1246 (2023).
Gloria, B., Melsbach, J., Bienert, S. & Schoder, D. Real-GPT: efficiently tailoring LLMs for informed decision-making in the real estate industry. J. Real Estate Portf. Manag. 31, 56–72 (2024).
Binz, M. & Schulz, E. Using cognitive psychology to understand GPT-3. Proc. Natl Acad. Sci. USA 120, e2218523120 (2023).
Demszky, D. et al. Using large language models in psychology. Nat. Rev. Psychol. 2, 688–701 (2023).
Hagendorff, T. Machine psychology: investigating emergent capabilities and behavior in large language models using psychological methods. Preprint at arXiv https://doi.org/10.48550/arXiv.2303.13988 (2023).
White, J. et al. A prompt pattern catalog to enhance prompt engineering with ChatGPT. Preprint at arXiv https://doi.org/10.48550/arXiv.2302.11382 (2023).
Abbate, F. Natural and artificial intelligence: a comparative analysis of cognitive aspects. Minds Mach. 33, 791–815 (2023).
Rich, A. S. & Gureckis, T. M. Lessons for artificial intelligence from the study of natural stupidity. Nat. Mach. Intell. 1, 174–180 (2019).
Shiffrin, R. & Mitchell, M. Probing the psychology of AI models. Proc. Natl Acad. Sci. USA 120, e2300963120 (2023).
Suri, G., Slater, L. R., Ziaee, A. & Nguyen, M. Do large language models show decision heuristics similar to humans? A case study using GPT-3.5. J. Exp. Psychol. Gen. 153, 1066–1075 (2024).
Qu, Y. et al. Promoting interactions between cognitive science and large language models. Innovation 5, 100579 (2024).
Thaler, R. H. Behavioral economics: past, present, and future. Am. Econ. Rev. 106, 1577–1600 (2016).
Kahneman, D. Maps of bounded rationality: psychology for behavioral economics. Am. Econ. Rev. 93, 1449–1475 (2003).
Evans, J. S. & Stanovich, K. E. Dual-process theories of higher cognition: advancing the debate. Persp. Psychol. Sci. 8, 223–241 (2013).
Evans, J. S. Dual-processing accounts of reasoning, judgment, and social cognition. Annu. Rev. Psychol. 59, 255–278 (2008).
Kahneman, D. & Frederick, S. in Heuristics and Biases: The Psychology of Intuitive Judgment (eds Gilovich, T. et al.) 49–81 (Cambridge Univ. Press, 2002).
Gigerenzer, G. & Brighton, H. Homo heuristicus: why biased minds make better inferences. Top. Cogn. Sci. 1, 107–143 (2009).
Kahneman, D. & Klein, G. Conditions for intuitive expertise: a failure to disagree. Am. Psychol. 64, 515–526 (2009).
Smolensky, P. Connectionist AI, symbolic AI, and the brain. Artif. Intell. Rev. 1, 95–109 (1987).
Goel, A. Looking back, looking ahead: symbolic versus connectionist AI. AI Mag. 42, 83–85 (2022).
Bellini-Leite, S. C. Dual process theory: embodied and predictive; symbolic and classical. Front. Psychol. 13, 805386 (2022).
Bellini-Leite, S. C. Dual process theory for large language models: an overview of using psychology to address hallucination and reliability issues. Adapt. Behav. 32, 329–343 (2023).
Clark, A. Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behav. Brain Sci. 36, 181–204 (2013).
Newell, A. Physical symbol systems. Cogn. Sci. 4, 135–183 (1980).
Marcus, G. The next decade in AI: four steps towards robust artificial intelligence. Preprint at arXiv https://doi.org/10.48550/arXiv.2002.06177 (2020).
Hagendorff, T., Fabi, S. & Kosinski, M. Human-like intuitive behavior and reasoning biases emerged in large language models but disappeared in ChatGPT. Nat. Comput. Sci. 3, 833–838 (2023).
Ma, D., Zhang, T. & Saunders, M. Is ChatGPT humanly irrational? Preprint at Res. Sq. https://doi.org/10.21203/rs.3.rs-3220513/v1 (2023).
Wei, J. et al. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 35, 24824–24837 (2022).
Gigerenzer, G. & Gaissmaier, W. Heuristic decision making. Annu. Rev. Psychol. 62, 451–482 (2011).
Tversky, A. & Kahneman, D. Judgment under uncertainty: heuristics and biases. Science 185, 1124–1131 (1974).
Frederick, S. Cognitive reflection and decision making. J. Econ. Perspect. 19, 25–42 (2005).
Erickson, T. D. & Mattson, M. E. From words to meaning: a semantic illusion. J. Verbal Learn. Verbal Behav. 20, 540–551 (1981).
Toplak, M. E., West, R. F. & Stanovich, K. E. The cognitive reflection test as a predictor of performance on heuristics-and-biases tasks. Mem. Cogn. 39, 1275–1289 (2011).
Pennycook, G., Cheyne, J. A., Koehler, D. J. & Fugelsang, J. A. Is the cognitive reflection test a measure of both reflection and intuition? Behav. Res. Meth. 48, 341–348 (2016).
Chen, Y., Kirshner, S. N., Ovchinnikov, A., Andiappan, M. & Jenkin, T. A manager and an AI walk into a bar: does ChatGPT make biased decisions like we do? Manuf. Serv. Oper. Manag. 27, 339–678 (2025).
Jones, E. & Steinhardt, J. Capturing failures of large language models via human cognitive biases. Adv. Neural Inf. Process. Syst. 35, 11785–11799 (2022).
Pantana, G., Castello, M. & Torre, I. Examining cognitive biases in ChatGPT 3.5 and ChatGPT 4 through human evaluation and linguistic comparison. In Proc. 16th Conf. Assoc. Mach. Transl. Am. (eds Knowles, R., Eriguchi, A. & Goel, S.) 250–260 (AMTA, 2024).
Ryu, J., Kim, J. & Kim, J. A study on the representativeness heuristics problem in large language models. IEEE Access 12, 147958–147966 (2024).
Tversky, A. & Kahneman, D. The framing of decisions and the psychology of choice. Science 211, 453–458 (1981).
Nickerson, R. S. Confirmation bias: a ubiquitous phenomenon in many guises. Rev. Gen. Psychol. 2, 175–220 (1998).
Lou, J. & Sun, Y. Anchoring bias in large language models: an experimental study. Preprint at arXiv https://doi.org/10.48550/arXiv.2412.06593 (2024).
Talboy, A. N. & Fuller, E. Challenging the appearance of machine intelligence: cognitive bias in LLMs and best practices for adoption. Preprint at arXiv https://doi.org/10.48550/arXiv.2304.01358 (2023).
Azaria, A. ChatGPT: more human-like than computer-like, but not necessarily in a good way. In 2023 IEEE 35th Int. Conf. Tools Artif. Intell. (eds Esposito, A., Yang, M. & Cordasco, G.) 468–473 (IEEE, 2023).
Acerbi, A. & Stubbersfield, J. M. Large language models show human-like content biases in transmission chain experiments. Proc. Natl Acad. Sci. USA 120, e2313790120 (2023).
Schramowski, P., Turan, C., Andersen, N., Rothkopf, C. A. & Kersting, K. Large pre-trained language models contain human-like biases of what is right and wrong to do. Nat. Mach. Intell. 4, 258–268 (2022).
Gallegos, I. O. et al. Bias and fairness in large language models: a survey. Comput. Linguist. 50, 1097–1179 (2024).
Wang, P., Xiao, Z., Chen, H. & Oswald, F. L. Will the real Linda please stand up…to large language models? Examining the representativeness heuristic in LLMs. Preprint at arXiv https://doi.org/10.48550/arXiv.2404.01461 (2024).
Nguyen, J. K. Human bias in AI models? Anchoring effects and mitigation strategies in large language models. J. Behav. Exp. Finance 43, 100971 (2024).
Tversky, A. & Kahneman, D. Extensional versus intuitive reasoning: the conjunction fallacy in probability judgment. Psychol. Rev. 90, 293–315 (1983).
Tversky, A. & Kahneman, D. Availability: a heuristic for judging frequency and probability. Cogn. Psychol. 5, 207–232 (1973).
Ariely, D., Loewenstein, G. & Prelec, D. “Coherent arbitrariness”: stable demand curves without stable preferences. Q. J. Econ. 118, 73–106 (2003).
Newell, A. & Simon, H. A. Computer science as empirical inquiry: symbols and search. Commun. ACM 19, 113–126 (1976).
Yao, S. et al. Tree of thoughts: deliberate problem solving with large language models. Adv. Neural Inf. Process. Syst. 36, 11809–11822 (2023).
Du, M. Machine vs. human, who makes a better judgment on innovation? Take GPT-4 for example. Front. Artif. Intell. 6, 1206516 (2023).
Ziegler, D. M. et al. Fine-tuning language models from human preferences. Preprint at arXiv https://doi.org/10.48550/arXiv.1909.08593 (2019).
Ouyang, L. et al. Training language models to follow instructions with human feedback. Adv. Neural Inf. Process. Syst. 35, 27730–27744 (2022).
Su, D., Sukhbaatar, S., Rabbat, M., Tian, Y. & Zheng, Q. Dualformer: controllable fast and slow thinking by learning with randomized reasoning traces. Preprint at arXiv https://doi.org/10.48550/arXiv.2410.09918 (2025).
Shang, Y., Li, Y., Xu, F. & Li, Y. DefInt: a default-interventionist framework for efficient reasoning with hybrid large language models. Preprint at arXiv https://doi.org/10.48550/arXiv.2402.02563 (2024).
Kojima, T., Gu, S. S., Reid, M., Matsuo, Y. & Iwasawa, Y. Large language models are zero-shot reasoners. Adv. Neural Inf. Process. Syst. 35, 22199–22213 (2022).
Weston, J. & Sukhbaatar, S. System 2 Attention (is something you might need too). Preprint at arXiv https://doi.org/10.48550/arXiv.2311.11829 (2023).
Zhang, H., Huang, J., Li, Z., Naik, M. & Xing, E. Improved logical reasoning of language models via differentiable symbolic programming. In Find. Assoc. Comput. Linguist. (eds Rogers, A., Boyd-Graber, J. & Okazaki, N.) 3062–3077 (ACL, 2023).
Zhu, X. et al. Solving math word problems via cooperative reasoning induced language models. In Proc. 61st Ann. Meet. Assoc. Comput. Linguist. (eds Rogers, A., Boyd-Graber, J. & Okazaki, N.) 4471–4485 (ACL, 2023).
Raoelison, M., Thompson, V. A. & De Neys, W. The smart intuitor: cognitive capacity predicts intuitive rather than deliberate thinking. Cognition 204, 104381 (2020).
De Neys, W. & Pennycook, G. Logic, fast and slow: advances in dual-process theorizing. Curr. Dir. Psychol. Sci. 28, 503–509 (2019).
Reyna, V. F. & Brainerd, C. J. Numeracy, gist, literal thinking and the value of nothing in decision making. Nat. Rev. Psychol. 2, 421–439 (2023).
Bubeck, S. et al. Sparks of artificial general intelligence: early experiments with GPT-4. Preprint at arXiv https://doi.org/10.48550/arXiv.2303.12712 (2023).
Sclar, M., Choi, Y., Tsvetkov, Y. & Suhr, A. Quantifying language models’ sensitivity to spurious features in prompt design or: how I learned to start worrying about prompt formatting. Preprint at arXiv https://doi.org/10.48550/arXiv.2310.11324 (2023).
Pezeshkpour, P. & Hruschka, E. Large language models sensitivity to the order of options in multiple-choice questions. In Find. Assoc. Comput. Linguist. (eds Duh, K., Gomez, H. & Bethard, S.) 2006–2017 (ACL, 2024).
Loya, M., Sinha, D. A. & Futrell, R. Exploring the sensitivity of LLMs’ decision-making capabilities: insights from prompt variation and hyperparameters. In Find. Assoc. Comput. Linguist. (eds Bouamor, H., Pino, J. & Bali, K.) 3711–3716 (ACL, 2023).
Barez, F. et al. Chain-of-thought is not explainability. Preprint at alphaXiv https://www.alphaxiv.org/overview/2025.02v3 (2025).
Zhang, Z. et al. Multimodal chain-of-thought reasoning in language models. Trans. Mach. Learn. Res. https://openreview.net/forum?id=y1pPWFVfvR (2024).
Chakraborty, N., Ornik, M. & Driggs-Campbell, K. Hallucination detection in foundation models for decision-making: a flexible definition and review of the state of the art. ACM Comput. Surv. 57, 188:1–188:35 (2025).
Stella, M., Hills, T. T. & Kenett, Y. N. Using cognitive psychology to understand GPT-like models needs to extend beyond human biases. Proc. Natl Acad. Sci. USA 120, e2312911120 (2023).
Huang, L. et al. A survey on hallucination in large language models: principles, taxonomy, challenges, and open questions. ACM Trans. Inf. Syst. 43, 42 (2025).
Smith, A. L., Greaves, F. & Panch, T. Hallucination or confabulation? Neuroanatomy as metaphor in large language models. PLOS Digit. Health 2, e0000388 (2023).
Singh, A. K., Lamichhane, B., Devkota, S., Dhakal, U. & Dhakal, C. Do large language models show human-like biases? Exploring confidence–competence gap in AI. Information 15, 92 (2024).
Borji, A. A categorical archive of ChatGPT failures. Preprint at arXiv https://doi.org/10.48550/arXiv.2302.03494 (2023).
Kopelman, M. D. Varieties of confabulation and delusion. Cogn. Neuropsychiatry 15, 14–37 (2010).
Johnson, D. D. P. & Fowler, J. H. The evolution of overconfidence. Nature 477, 317–320 (2011).
McCoy, R. T., Yao, S., Friedman, D., Hardy, M. & Griffiths, T. L. Embers of autoregression: understanding large language models through the problem they are trained to solve. Preprint at arXiv https://doi.org/10.48550/arXiv.2309.13638 (2023).
Perković, G., Drobnjak, A. & Botički, I. Hallucinations in LLMs: understanding and addressing challenges. In 47th MIPRO ICT Electron. Conv. (ed. Babic, S.) 2084–2088 (IEEE, 2024).
Sun, F., Li, N., Wang, K. & Goette, L. Large language models are overconfident and amplify human bias. Preprint at arXiv https://doi.org/10.48550/arXiv.2505.02151 (2025).
Casper, S. et al. Open problems and fundamental limitations of reinforcement learning from human feedback. Trans. Mach. Learn. Res. https://doi.org/10.3929/ethz-b-000651806 (2023).
Zhou, K., Hwang, J. D., Ren, X. & Sap, M. Relying on the unreliable: the impact of language models’ reluctance to express uncertainty. Preprint at arXiv https://doi.org/10.48550/arXiv.2401.06730 (2024).
Zhou, K., Jurafsky, D. & Hashimoto, T. Navigating the grey area: how expressions of uncertainty and overconfidence affect language models. Preprint at arXiv https://doi.org/10.48550/arXiv.2302.13439 (2023).
Dunning, D. in Advances in Experimental Social Psychology Vol. 44 (eds Olson, J. M. & Zanna, M. P.) 247–296 (Academic, 2011).
Ji, Z. et al. Towards mitigating LLM hallucination via self reflection. In Find. Assoc. Comput. Linguist. (eds Bouamor, H., Pino, J. & Bali, K.) 1827–1843 (ACL, 2023).
O’Leary, D. E. Confirmation and specificity biases in large language models: an explorative study. IEEE Intell. Syst. 40, 63–68 (2025).
Wei, J. et al. Emergent abilities of large language models. Trans. Mach. Learn. Res. https://openreview.net/forum?id=yzkSU5zdwD (2022).
McCoy, R. T., Pavlick, E. & Linzen, T. Right for the wrong reasons: diagnosing syntactic heuristics in natural language inference. In Proc. 57th Ann. Meet. Assoc. Comput. Linguist. (eds Korhonen, A., Traum, D. & Màrquez, L.) 3428–3448 (ACL, 2019).
Webson, A. & Pavlick, E. Do prompt-based models really understand the meaning of their prompts? In Proc. 2022 Conf. North Am. Chapter Assoc. Comput. Linguist. Hum. Lang. Technol. (eds Carpuat, M., de Marneffe, M.-C. & Meza Ruiz, I. V.) 2300–2344 (ACL, 2022).
Lampinen, A. K. et al. Language models, like humans, show content effects on reasoning tasks. PNAS Nexus 3, 233 (2024).
Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).
Zheng, H. & Zhan, H. ChatGPT in scientific writing: a cautionary tale. Am. J. Med. 136, 725–726.e6 (2023).
Mitchell, M. & Krakauer, D. C. The debate over understanding in AI’s large language models. Proc. Natl Acad. Sci. USA 120, e2215907120 (2023).
Nelson, A. B. & Shiffrin, R. M. The co-evolution of knowledge and event memory. Psychol. Rev. 120, 356–394 (2013).
Bender, E. M., Gebru, T., McMillan-Major, A. & Shmitchell, S. On the dangers of stochastic parrots: can language models be too big? In Proc. 2021 ACM Conf. Fairness Account. Transpar. (eds Elish, M. C., Isaac, W. & Zemel, R.) 610–623 (ACM, 2021).
Webb, T., Holyoak, K. J. & Lu, H. Emergent analogical reasoning in large language models. Nat. Hum. Behav. 7, 1526–1541 (2023).
Heinlein, R. A. Stranger in a Strange Land (Putnam, 1969).
Power, A., Burda, Y., Edwards, H., Babuschkin, I. & Misra, V. Grokking: generalization beyond overfitting on small algorithmic datasets. Preprint at arXiv https://doi.org/10.48550/arXiv.2201.02177 (2022).
Seligman, M. E. P., Railton, P., Baumeister, R. F. & Sripada, C. Navigating into the future or driven by the past. Persp. Psychol. Sci. 8, 119–141 (2013).
Liu, S. et al. Using AI-generated suggestions from ChatGPT to optimize clinical decision support. J. Am. Med. Inform. Assoc. 30, 1237–1245 (2023).
Gu, Y. et al. Domain-specific language model pretraining for biomedical natural language processing. ACM Trans. Comput. Healthc. 3, 2 (2021).
Silva, G. A. Can AI understand what it’s telling you? Forbes https://www.forbes.com/sites/gabrielasilva/2025/04/23/can-ai-understand-the-chinese-room-argument-says-no-but-is-it-right (2025).
Ananthaswamy, A. New theory suggests chatbots can understand text. Quanta Magazine https://www.quantamagazine.org/new-theory-suggests-chatbots-can-understand-text-20240122/ (2024).
Moskvichev, A. K., Odouard, V. V. & Mitchell, M. The ConceptARC benchmark: evaluating understanding and generalization in the ARC domain. Trans. Mach. Learn. Res. https://openreview.net/forum?id=8ykyGbtt2q (2023).
Dijkstra, E. W. On IPW’s. Univ. Texas Austin https://www.cs.utexas.edu/~EWD/transcriptions/EWD08xx/EWD867.html (1983).
Thirunavukarasu, A. J. et al. Large language models in medicine. Nat. Med. 29, 1930–1940 (2023).
Nori, H., King, N., McKinney, S. M., Carignan, D. & Horvitz, E. Capabilities of GPT-4 on medical challenge problems. Preprint at arXiv https://doi.org/10.48550/arXiv.2303.13375 (2023).
Nori, H. et al. Sequential diagnosis with language models. Preprint at arXiv https://doi.org/10.48550/arXiv.2506.22405 (2025).
Nie, Y. et al. A survey of large language models for financial applications: progress, prospects and challenges. Preprint at arXiv https://doi.org/10.48550/arXiv.2406.11903 (2024).
Aydın, Ö. & Karaarslan, E. OpenAI ChatGPT generated literature review: digital twin in healthcare. Emerg. Comput. Technol. 2, 22–31 (2022).
Ke, L., Tong, S., Cheng, P. & Peng, K. Exploring the frontiers of LLMs in psychological applications: a comprehensive review. Artif. Intell. Rev. 58, 305 (2025).
Hua, S., Jin, S. & Jiang, S. The limitations and ethical considerations of ChatGPT. Data Intell. 6, 201–239 (2024).
Chuma, E. L. & de Oliveira, G. G. Generative AI for business decision-making: a case of ChatGPT. Manag. Sci. Bus. Decis. 3, 5–11 (2023).
Eloundou, T., Manning, S., Mishkin, P. & Rock, D. GPTs are GPTs: labor market impact potential of LLMs. Science 384, 1306–1308 (2024).
Weidinger, L. et al. Ethical and social risks of harm from language models. Preprint at arXiv https://doi.org/10.48550/arXiv.2112.04359 (2021).
Chen, J., Liu, L., Ruan, S., Li, M. & Yin, C. Are different versions of ChatGPT’s ability comparable to the clinical diagnosis presented in case reports? A descriptive study. J. Multidiscip. Healthc. 16, 3825–3831 (2023).
OpenAI. Introducing ChatGPT. OpenAI https://openai.com/blog/chatgpt (2022).
An, J., Huang, D., Lin, C. & Tai, M. Measuring gender and racial biases in large language models. Preprint at arXiv https://doi.org/10.48550/arXiv.2403.15281 (2024).
Liang, P. P., Wu, C., Morency, L.-P. & Salakhutdinov, R. Towards understanding and mitigating social biases in language models. In Proc. 38th Int. Conf. Mach. Learn. (eds Meila, M. & Zhang, T.) 6565–6576 (PMLR, 2021).
Peters, D., Vold, K., Robinson, D. & Calvo, R. A. Responsible AI — two frameworks for ethical design practice. IEEE Trans. Technol. Soc. 1, 34–47 (2020).
Lake, B. M., Ullman, T. D., Tenenbaum, J. B. & Gershman, S. J. Building machines that learn and think like people. Behav. Brain Sci. 40, e253 (2017).
Liang, J. T., Lin, M., Rao, N. & Myers, B. A. Prompts are programs too! Understanding how developers build software containing prompts. Proc. ACM Softw. Eng. 2, 1591–1614 (2025).
De Neys, W. Bias and conflict: a case for logical intuitions. Persp. Psychol. Sci. 7, 28–38 (2012).
De Neys, W. Advancing theorizing about fast-and-slow thinking. Behav. Brain Sci. 46, e111 (2023).
Yao, S. et al. ReAct: synergizing reasoning and acting in language models. Preprint at arXiv https://doi.org/10.48550/arXiv.2210.03629 (2023).
Wang, G. et al. Voyager: an open-ended embodied agent with large language models. Trans. Mach. Learn. Res. https://openreview.net/forum?id=ehfRiF0R3a (2023).
Nathani, D. et al. MLGym: a new framework and benchmark for advancing AI research agents. Preprint at arXiv https://doi.org/10.48550/arXiv.2502.14499 (2025).
Paglieri, D. et al. BALROG: benchmarking agentic LLM and VLM reasoning on games. In Proc. Int. Conf. Learn. Represent. (ICLR, 2025).
Liu, X. et al. Agentbench: evaluating LLMs as agents. In Proc. Int. Conf. Learning Represent. (ICLR, 2024).
Huang, J. et al. How far are we on the decision-making of LLMs? Evaluating LLMs’ gaming ability in multi-agent environments. Preprint at arXiv https://doi.org/10.48550/arXiv.2403.11807 (2025).
Leng, Y. & Yuan, Y. Do LLM agents exhibit social behavior? Preprint at arXiv https://doi.org/10.48550/arXiv.2312.15198 (2024).
Costarelli, A. et al. GameBench: evaluating strategic reasoning abilities of LLM agents. In Lang. Gamific. NeurIPS 2024 Workshop (NeurIPS, 2024).
Trencsenyi, V., Mensfelt, A. & Stathis, K. Approximating human strategic reasoning with LLM-enhanced recursive reasoners leveraging multi-agent hypergames. Preprint at arXiv https://doi.org/10.48550/arXiv.2502.07443 (2025).
Dwivedi, Y. K. et al. Opinion paper: “So what if ChatGPT wrote it?” Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. Int. J. Inform. Manag. 71, 102642 (2023).
Vaugrante, L., Niepert, M. & Hagendorff, T. A looming replication crisis in evaluating behavior in language models? Evidence and solutions. Preprint at arXiv https://doi.org/10.48550/arXiv.2409.20303 (2024).
Koo, R. et al. Benchmarking cognitive biases in large language models as evaluators. In Find. Assoc. Comput. Linguist. (eds Ku, L.-W., Martins, A. & Srikumar, V.) 517–545 (ACL, 2024).
Wang, Y., Cai, Y., Chen, M., Liang, Y. & Hooi, B. Primacy effect of ChatGPT. In Proc. 2023 Conf. Empir. Methods Nat. Lang. Process. (eds Bouamor, H., Pino, J. & Bali, K.) 108–115 (ACL, 2023).
Mina, M., Ruiz-Fernández, V., Falcão, J., Vasquez-Reina, L. & Gonzalez-Agirre, A. Cognitive biases, task complexity, and result interpretability in large language models. In Proc. 31st Int. Conf. Comput. Linguist. (eds Rambow, O. et al.) 1767–1784 (ACL, 2025).
Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
Radford, A. et al. Language models are unsupervised multitask learners. OpenAI Blog 1, 9 (2019).
OpenAI. Learning to reason with LLMs. OpenAI https://openai.com/index/learning-to-reason-with-llms (2024).
Pfau, J., Merrill, W. & Bowman, S. R. Let’s think dot by dot: hidden computation in transformer language models. Preprint at arXiv https://doi.org/10.48550/arXiv.2404.15758 (2024).
Guo, D. et al. DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning. Nature 645, 633–638 (2025).
Kosinski, M. Evaluating large language models in theory of mind tasks. Proc. Natl Acad. Sci. USA 121, e2405460121 (2024).
Turpin, M., Michael, J., Perez, E. & Bowman, S. Language models don’t always say what they think: unfaithful explanations in chain-of-thought prompting. Adv. Neural Inf. Process. Syst. 36, 74952–74965 (2023).
Ameisen, E. et al. Circuit tracing: revealing computational graphs in language models. Transformer Circuits Thread https://transformer-circuits.pub/2025/attribution-graphs/methods.html (2025).
Peter, S., Riemer, K. & West, J. D. The benefits and dangers of anthropomorphic conversational agents. Proc. Natl Acad. Sci. USA 122, e2415898122 (2025).
Guthrie, S. E. Faces in the Clouds: A New Theory of Religion (Oxford Univ. Press, 1995).
Jones, C. R. & Bergen, B. K. Large language models pass the Turing test. Preprint at arXiv https://doi.org/10.48550/arXiv.2503.23674 (2025).
Singhal, K. et al. Toward expert-level medical question answering with large language models. Nat. Med. 31, 943–950 (2025).
Zerkouk, M., Mihoubi, M. & Chikhaoui, B. A comprehensive review of AI-based intelligent tutoring systems: applications and challenges. Preprint at arXiv https://doi.org/10.48550/arXiv.2507.18882 (2025).
Colombatto, C. & Fleming, S. M. Folk psychological attributions of consciousness to large language models. Neurosci. Conscious. 2024, niae013 (2024).
Aldahoul, N. et al. Large language models are often politically extreme, usually ideologically inconsistent, and persuasive even in informational contexts. Preprint at arXiv https://doi.org/10.48550/arXiv.2505.04171 (2025).
Lawrence, H. R. et al. The opportunities and risks of large language models in mental health. JMIR Ment. Health 11, e59479 (2024).
Zhang, Y., Zhao, D., Hancock, J. T., Kraut, R. & Yang, D. The rise of AI companions: how human-chatbot relationships influence well-being. Preprint at arXiv https://doi.org/10.48550/arXiv.2506.12605 (2025).
Akbulut, C., Weidinger, L., Manzini, A., Gabriel, I. & Rieser, V. All too human? Mapping and mitigating the risk from anthropomorphic AI. In Proc. AAAI/ACM Conf. AI Ethics Soc. Vol. 7 (eds Das, S. et al.) 13–26 (AAAI, 2024).
Shanahan, M. Talking about large language models. Commun. ACM 67, 68–79 (2024).
Zador, A. et al. Catalyzing next-generation artificial intelligence through neuroAI. Nat. Commun. 14, 1597 (2023).
Moser, E. I., Kropff, E. & Moser, M.-B. Place cells, grid cells, and the brain’s spatial representation system. Annu. Rev. Neurosci. 31, 69–89 (2008).
Hassabis, D., Kumaran, D., Summerfield, C. & Botvinick, M. Neuroscience-inspired artificial intelligence. Neuron 95, 245–258 (2017).
Mirzadeh, S. I. et al. GSM-symbolic: understanding the limitations of mathematical reasoning in large language models. Preprint at arXiv https://doi.org/10.48550/arXiv.2410.05229 (2024).
Kumaran, D., Hassabis, D. & McClelland, J. L. What learning systems do intelligent agents need? Complementary learning systems theory updated. Trends Cogn. Sci. 20, 512–534 (2016).
Connell, L. & Lynott, D. What can language models tell us about human cognition? Curr. Dir. Psychol. Sci. 33, 181–189 (2024).
Martínez, L., Ruan, D. & Herrera, F. Computing with words in decision support systems: an overview on models and applications. Int. J. Comput. Intell. Syst. 3, 382–395 (2010).
Perlis, R. H., Goldberg, J. F., Ostacher, M. J. & Schneck, C. D. Clinical decision support for bipolar depression using large language models. Neuropsychopharmacology 49, 1412–1416 (2024).
Chiriatti, M., Ganapini, M., Panai, E., Ubiali, M. & Riva, G. The case for human–AI interaction as system 0 thinking. Nat. Hum. Behav. 8, 1829–1830 (2024).
Essel, H. B., Vlachopoulos, D., Essuman, A. B. & Amankwa, J. O. ChatGPT effects on cognitive skills of undergraduate students: receiving instant responses from AI-based conversational large language models (LLMs). Comput. Educ. Artif. Intell. 6, 100198 (2024).
Rasmequan, S. & Russ, S. Cognitive artefacts for decision support. In SMC 2000 Proc. Int. Conf. Syst. Man Cybernet. (eds von Keutz, S. et al.) 651–656 (IEEE, 2000).
Vicente, L. & Matute, H. Humans inherit artificial intelligence biases. Sci. Rep. 13, 15737 (2023).
Treiman, L. S., Ho, C.-J. & Kool, W. The consequences of AI training on human decision-making. Proc. Natl Acad. Sci. USA 121, e2408731121 (2024).
Frentz, T. S. Memory, myth, and rhetoric in Plato’s Phaedrus. Rhetor. Soc. Q. 36, 243–262 (2006).
Heersmink, R. Extended mind and cognitive enhancement: moral aspects of cognitive artifacts. Phenomenol. Cogn. Sci. 16, 17–32 (2017).
Carr, N. The Shallows: What the Internet Is Doing to Our Brains (W. W. Norton, 2020).
Tanil, C. T. & Yong, M. H. Mobile phones: the effect of its presence on learning and memory. PLoS ONE 15, e0219233 (2020).
Clemenson, G. D., Maselli, A., Fiannaca, A. J., Miller, A. & Gonzalez-Franco, M. Rethinking GPS navigation: creating cognitive maps through auditory clues. Sci. Rep. 11, 7764 (2021).
Dahmani, L. & Bohbot, V. D. Habitual use of GPS negatively impacts spatial memory during self-guided navigation. Sci. Rep. 10, 6310 (2020).
Bai, L., Liu, X. & Su, J. ChatGPT: the cognitive effects on learning and memory. Brain-X 1, e30 (2023).
Costello, T. H., Pennycook, G. & Rand, D. G. Durably reducing conspiracy beliefs through dialogues with AI. Science 385, eadq1814 (2024).
Heersmink, R. Use of large language models might affect our cognitive skills. Nat. Hum. Behav. 8, 805–806 (2024).