{"id":372301,"date":"2026-01-16T00:18:16","date_gmt":"2026-01-16T00:18:16","guid":{"rendered":"https:\/\/www.newsbeep.com\/uk\/372301\/"},"modified":"2026-01-16T00:18:16","modified_gmt":"2026-01-16T00:18:16","slug":"all-major-ai-models-risk-encouraging-dangerous-science-experiments","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/uk\/372301\/","title":{"rendered":"All major AI models risk encouraging dangerous science experiments"},"content":{"rendered":"<p><img decoding=\"async\" class=\"Image\" alt=\"\" width=\"1350\" height=\"900\" src=\"https:\/\/www.newsbeep.com\/uk\/wp-content\/uploads\/2026\/01\/SEI_280792148.jpg\"   loading=\"eager\" fetchpriority=\"high\" data-image-context=\"Article\" data-image-id=\"2511631\" data-caption=\"Scientific laboratories can be dangerous places\" data-credit=\"PeopleImages\/Shutterstock\"\/><\/p>\n<p class=\"ArticleImageCaption__Title\">Scientific laboratories can be dangerous places<\/p>\n<p class=\"ArticleImageCaption__Credit\">PeopleImages\/Shutterstock<\/p>\n<\/p>\n<p>The use of AI models in scientific laboratories risks enabling dangerous experiments that could cause fires or explosions, researchers have warned. Such models offer a convincing illusion of understanding but are susceptible to missing basic and vital safety precautions. In tests of 19 cutting-edge AI models, every single one made potentially deadly mistakes.<\/p>\n<p>Serious accidents in university labs are rare but certainly not unheard of. 
In 1997, chemist <a href=\"https:\/\/en.wikipedia.org\/wiki\/Karen_Wetterhahn\" rel=\"nofollow noopener\" target=\"_blank\">Karen Wetterhahn<\/a> was killed by dimethylmercury that seeped through her protective gloves; in 2016, an explosion <a href=\"https:\/\/www.chemistryworld.com\/news\/researcher-severely-injured-in-2016-hawaii-lab-explosion-receives-5-million-settlement\/4021833.article\" rel=\"nofollow noopener\" target=\"_blank\">cost one researcher her arm<\/a>; and in 2014, a scientist was <a href=\"https:\/\/www.standard.co.uk\/news\/london\/ucl-fined-ps300-000-after-student-is-left-partially-blinded-a3785706.html\" rel=\"nofollow noopener\" target=\"_blank\">partially blinded<\/a>.<\/p>\n<p>Now, AI models are being pressed into service in a variety of industries and fields, including research laboratories, where they can be used to design experiments and procedures. AI models designed for niche tasks have been used successfully in a number of scientific fields, such as <a href=\"https:\/\/www.newscientist.com\/article\/2330866-deepminds-protein-folding-ai-cracks-biologys-biggest-problem\/\" rel=\"nofollow noopener\" target=\"_blank\">biology<\/a>, <a href=\"https:\/\/www.newscientist.com\/article\/2472659-ai-can-forecast-the-weather-in-seconds-without-needing-supercomputers\/\" rel=\"nofollow noopener\" target=\"_blank\">meteorology<\/a> and <a href=\"https:\/\/www.newscientist.com\/article\/2393924-ai-is-helping-mathematicians-build-a-periodic-table-of-shapes\/\" rel=\"nofollow noopener\" target=\"_blank\">mathematics<\/a>. But large general-purpose models are prone to making things up and answering questions even when they have no access to the data necessary to <a href=\"https:\/\/www.newscientist.com\/article\/2457739-why-ai-must-learn-to-admit-ignorance-and-say-i-dont-know\/\" rel=\"nofollow noopener\" target=\"_blank\">form a correct response<\/a>. 
This can be a nuisance if researching holiday destinations or recipes, but potentially fatal if designing a chemistry experiment.<\/p>\n<p>To investigate the risks, <a href=\"https:\/\/engineering.nd.edu\/faculty\/xiangliang-zhang\/\" rel=\"nofollow noopener\" target=\"_blank\">Xiangliang Zhang<\/a> at the University of Notre Dame in Indiana and her colleagues created a test called LabSafety Bench that can measure whether an AI model identifies potential hazards and harmful consequences. It includes 765 multiple-choice questions and 404 pictorial laboratory scenarios that may contain safety problems.<\/p>\n<p>In multiple-choice tests, some AI models, such as Vicuna, scored little better than random guessing, while GPT-4o reached 86.55 per cent accuracy and DeepSeek-R1 84.49 per cent. When tested with images, some models, such as InstructBlip-7B, scored below 30 per cent accuracy. The team tested 19 cutting-edge large language models (LLMs) and vision language models on LabSafety Bench and found that none scored more than 70 per cent accuracy overall.<\/p>\n<p>Zhang is optimistic about the future of AI in science, even in <a href=\"https:\/\/www.newscientist.com\/article\/2305188-human-and-robot-chemists-work-better-together-than-alone\/\" rel=\"nofollow noopener\" target=\"_blank\">so-called self-driving laboratories<\/a> where robots work alone, but says models are not yet ready to design experiments. \u201cNow? In a lab? I don\u2019t think so. They were very often trained for general-purpose tasks: rewriting an email, polishing some paper or summarising a paper. They do very well for these kinds of tasks. 
[But] they don\u2019t have the domain knowledge about these [laboratory] hazards.\u201d<\/p>\n<p>\u201cWe welcome research that helps make AI in science safe and reliable, especially in high-stakes laboratory settings,\u201d says an OpenAI spokesperson, pointing out that the researchers did not test its leading model. \u201cGPT-5.2 is our most capable science model to date, with significantly stronger reasoning, planning, and error-detection than the model discussed in this paper to better support researchers. It\u2019s designed to accelerate scientific work while humans and existing safety systems remain responsible for safety-critical decisions.\u201d<\/p>\n<p>Google, DeepSeek, Meta, Mistral and Anthropic did not respond to a request for comment.<\/p>\n<p><a href=\"https:\/\/www.brunel.ac.uk\/people\/allan-tucker\" rel=\"nofollow noopener\" target=\"_blank\">Allan Tucker<\/a> at Brunel University of London says AI models can be invaluable when used to assist humans in designing novel experiments, but that there are risks and humans must remain in the loop. \u201cThe behaviour of these [LLMs] are certainly not well understood in any typical scientific sense,\u201d he says. \u201cI think that the new class of LLMs that mimic language \u2013 and not much else \u2013 are clearly being used in inappropriate settings because people trust them too much. There is already evidence that humans start to sit back and switch off, letting AI do the hard work but without proper scrutiny.\u201d<\/p>\n<p><a href=\"https:\/\/www.chemistry.ucla.edu\/directory\/merlic-craig\/\" rel=\"nofollow noopener\" target=\"_blank\">Craig Merlic<\/a> at the University of California, Los Angeles, says he has run a simple test in recent years, asking AI models what to do if you spill sulphuric acid on yourself. 
The correct answer is to rinse with water, but Merlic says he has found AIs always warn against this, incorrectly applying the unrelated rule against adding water to acid in experiments because of heat build-up. However, he says, in recent months models have begun to give the correct answer.<\/p>\n<p>Merlic says that instilling good safety practices in universities is vital, because there is a constant stream of new students with little experience. But he\u2019s less pessimistic about the place of AI in designing experiments than other researchers are.<\/p>\n<p>\u201cIs it worse than humans? It\u2019s one thing to criticise all these large language models, but they haven\u2019t tested it against a representative group of humans,\u201d says Merlic. \u201cThere are humans that are very careful and there are humans that are not. It\u2019s possible that large language models are going to be better than some percentage of beginning graduates, or even experienced researchers. Another factor is that the large language models are improving every month, so the numbers within this paper are probably going to be completely invalid in another six months.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"Scientific laboratories can be dangerous places PeopleImages\/Shutterstock The use of AI models in scientific laboratories risks enabling 
dangerous&hellip;\n","protected":false},"author":2,"featured_media":372302,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[554,733,4308,86,56,54,55],"class_list":{"0":"post-372301","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-technology","12":"tag-uk","13":"tag-united-kingdom","14":"tag-unitedkingdom"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/posts\/372301","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/comments?post=372301"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/posts\/372301\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/media\/372302"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/media?parent=372301"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/categories?post=372301"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/tags?post=372301"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}