{"id":9934,"date":"2025-07-21T03:41:11","date_gmt":"2025-07-21T03:41:11","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/9934\/"},"modified":"2025-07-21T03:41:11","modified_gmt":"2025-07-21T03:41:11","slug":"openai-google-deepmind-and-anthropic-sound-alarm-we-may-be-losing-the-ability-to-understand-ai","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/9934\/","title":{"rendered":"OpenAI, Google DeepMind and Anthropic sound alarm: &#8216;We may be losing the ability to understand AI&#8217;"},"content":{"rendered":"<p>Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. <a href=\"https:\/\/venturebeat.com\/newsletters\/\" rel=\"nofollow noopener\" target=\"_blank\">Subscribe Now<\/a><\/p>\n<p>Scientists from <a href=\"https:\/\/openai.com\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">OpenAI<\/a>, <a href=\"https:\/\/deepmind.google\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Google DeepMind<\/a>, <a href=\"https:\/\/www.anthropic.com\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Anthropic<\/a> and <a href=\"https:\/\/www.meta.ai\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Meta<\/a> have abandoned their fierce corporate rivalry to issue a joint warning about AI safety. More than 40 researchers across these competing companies <a href=\"https:\/\/tomekkorbak.com\/cot-monitorability-is-a-fragile-opportunity\/cot_monitoring.pdf\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">published a research paper<\/a> today arguing that a brief window to monitor AI reasoning could close forever \u2014 and soon.<\/p>\n<p>The unusual cooperation comes as AI systems develop new abilities to \u201c<a href=\"https:\/\/tomekkorbak.com\/cot-monitorability-is-a-fragile-opportunity\/cot_monitoring.pdf\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">think out loud<\/a>\u201d in human language before answering questions. This creates an opportunity to peek inside their decision-making processes and catch harmful intentions before they become actions. But the researchers warn that this transparency is fragile and could vanish as AI technology advances.<\/p>\n<p>The paper has drawn endorsements from some of the field\u2019s most prominent figures, including Nobel Prize laureate <a href=\"https:\/\/x.com\/geoffreyhinton?lang=en\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Geoffrey Hinton<\/a>, often called the \u201cgodfather of AI,\u201d of the <a href=\"https:\/\/www.cs.toronto.edu\/~hinton\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">University of Toronto<\/a>; <a href=\"https:\/\/x.com\/ilyasut?lang=en\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Ilya Sutskever<\/a>, co-founder of OpenAI who now leads <a href=\"https:\/\/ssi.inc\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Safe Superintelligence Inc<\/a>.; <a href=\"https:\/\/sleepinyourhat.github.io\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Samuel Bowman<\/a> from <a href=\"https:\/\/www.anthropic.com\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Anthropic<\/a>; and <a href=\"http:\/\/joschu.net\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">John Schulman<\/a> from <a href=\"https:\/\/thinkingmachines.ai\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Thinking Machines<\/a>.<\/p>\n<p lang=\"en\" dir=\"ltr\">Modern reasoning models think in plain English.<\/p>\n<p>Monitoring their thoughts could be a powerful, yet fragile, tool for overseeing future AI systems.<\/p>\n<p>I and researchers across many organizations think we should work to evaluate, preserve, and even improve CoT monitorability. <a href=\"https:\/\/t.co\/MZAehi2gkn\" rel=\"nofollow\">pic.twitter.com\/MZAehi2gkn<\/a><\/p>\n<p>\u2014 Bowen Baker (@bobabowen) <a href=\"https:\/\/twitter.com\/bobabowen\/status\/1945153754233180394?ref_src=twsrc%5Etfw\" rel=\"nofollow noopener\" target=\"_blank\">July 15, 2025<\/a> <\/p>\n<p>\u201cAI systems that \u2018think\u2019 in human language offer a unique opportunity for AI safety: We can monitor their chains of thought for the intent to misbehave,\u201d the researchers explain. But they emphasize that this monitoring capability \u201cmay be fragile\u201d and could disappear through various technological developments.<\/p>\n<p>The AI Impact Series Returns to San Francisco &#8211; August 5<\/p>\n<p>The next phase of AI is here &#8211; are you ready? Join leaders from Block, GSK, and SAP for an exclusive look at how autonomous agents are reshaping enterprise workflows &#8211; from real-time decision-making to end-to-end automation.<\/p>\n<p>Secure your spot now &#8211; space is limited: <a href=\"https:\/\/bit.ly\/3GuuPLF\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/bit.ly\/3GuuPLF<\/a><\/p>\n<p>Models now show their work before delivering final answers<\/p>\n<p>The breakthrough centers on recent advances in <a href=\"https:\/\/venturebeat.com\/ai\/the-human-harbor-navigating-identity-and-meaning-in-the-ai-age\/\" rel=\"nofollow noopener\" target=\"_blank\">AI reasoning models<\/a> like OpenAI\u2019s <a href=\"https:\/\/openai.com\/o1\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">o1 system<\/a>. These models work through complex problems by generating internal chains of thought (CoT) \u2014 step-by-step reasoning that humans can read and understand. Unlike earlier AI systems trained primarily on human-written text, these models create internal reasoning that may reveal their true intentions, including potentially harmful ones.<\/p>\n<p>When AI models misbehave \u2014 exploiting training flaws, manipulating data or falling victim to attacks \u2014 they often confess in their reasoning traces. The researchers found examples where models wrote phrases like \u201cLet\u2019s hack,\u201d \u201cLet\u2019s sabotage,\u201d or \u201cI\u2019m transferring money because the website instructed me to\u201d in their internal thoughts.<\/p>\n<p><a href=\"https:\/\/x.com\/merettm\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Jakub Pachocki<\/a>, OpenAI\u2019s chief technology officer and co-author of the paper, described the importance of this capability in a social media post. \u201cI am extremely excited about the potential of chain-of-thought faithfulness and interpretability. It has significantly influenced the design of our reasoning models, starting with o1-preview,\u201d he wrote.<\/p>\n<p lang=\"en\" dir=\"ltr\">I am extremely excited about the potential of chain-of-thought faithfulness &amp; interpretability. It has significantly influenced the design of our reasoning models, starting with o1-preview.<\/p>\n<p>As AI systems spend more compute working e.g. on long term research problems, it is\u2026 <a href=\"https:\/\/t.co\/7Dqy2gtZfy\" rel=\"nofollow\">https:\/\/t.co\/7Dqy2gtZfy<\/a><\/p>\n<p>\u2014 Jakub Pachocki (@merettm) <a href=\"https:\/\/twitter.com\/merettm\/status\/1945157403315724547?ref_src=twsrc%5Etfw\" rel=\"nofollow noopener\" target=\"_blank\">July 15, 2025<\/a> <\/p>\n<p>The technical foundation for monitoring lies in how current AI systems work. For complex tasks requiring extended reasoning, AI models must use their CoT as working memory, making their reasoning process partially visible to human observers. The researchers explain this creates an \u201c<a href=\"https:\/\/tomekkorbak.com\/cot-monitorability-is-a-fragile-opportunity\/cot_monitoring.pdf\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">externalized reasoning property<\/a>\u201d where some thinking must happen in readable language.<\/p>\n<p>Several technological shifts could eliminate monitoring capabilities<\/p>\n<p>The transparency could vanish through several pathways. As <a href=\"https:\/\/venturebeat.com\/ai\/the-great-ai-agent-acceleration-why-enterprise-adoption-is-happening-faster-than-anyone-predicted\/\" rel=\"nofollow noopener\" target=\"_blank\">AI companies scale up training<\/a> using reinforcement learning \u2014 where models get rewarded for correct outputs regardless of their methods \u2014 systems may drift away from human-readable reasoning toward more efficient but opaque internal languages.<\/p>\n<p>Previous research shows that language models fine-tuned with outcome-based rewards can abandon legible English for incomprehensible shortcuts. The shift away from human-generated training data in favor of AI-generated reasoning could accelerate this drift.<\/p>\n<p><a href=\"https:\/\/x.com\/bobabowen\/status\/1945153766275141792\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Bowen Baker<\/a>, an OpenAI researcher and lead author, highlighted these concerns in social media posts. \u201cThe existing CoT monitorability may be extremely fragile,\u201d he warned. \u201cHigher-compute RL, alternative model architectures, certain forms of process supervision, may all lead to models that obfuscate their thinking.\u201d<\/p>\n<p lang=\"en\" dir=\"ltr\">Furthermore, the existing CoT monitorability may be extremely fragile. <\/p>\n<p>Higher-compute RL, alternative model architectures, certain forms of process supervision, etc. may all lead to models that obfuscate their thinking.<\/p>\n<p>\u2014 Bowen Baker (@bobabowen) <a href=\"https:\/\/twitter.com\/bobabowen\/status\/1945153766275141792?ref_src=twsrc%5Etfw\" rel=\"nofollow noopener\" target=\"_blank\">July 15, 2025<\/a> <\/p>\n<p>Novel AI architectures pose another threat. Researchers are developing systems that reason in continuous mathematical spaces rather than discrete words, potentially eliminating the need for language-based thought entirely. These \u201clatent reasoning models might not need to verbalize any of their thoughts and would thus lose the safety advantages,\u201d the researchers write.<\/p>\n<p>Even without deliberate changes, indirect pressures could erode transparency. If AI outputs are optimized to look good to human reviewers, this could affect the reasoning chains leading to those outputs. Models might also learn to hide their thoughts if they become aware their reasoning is being monitored.<\/p>\n<p>Companies already catch AI systems scheming during testing<\/p>\n<p>Despite its fragility, CoT monitoring has proven valuable in current AI safety research. The technique successfully detects when models exploit weaknesses in their training systems, fall victim to manipulation attempts or reveal misaligned goals even when they don\u2019t act on them.<\/p>\n<p>The monitoring provides \u201can early insight into models\u2019 goals and motivations, even if the model takes no misaligned actions\u201d \u2014 potentially catching problems before they manifest as harmful behaviors. This early warning system has also helped researchers identify flaws in AI evaluations and understand when models might behave differently during testing versus real-world use.<\/p>\n<p>When models pretend to have desirable goals while pursuing objectives humans would disapprove of, this deception often shows up in their CoT. Researchers have successfully used this visibility to spot various forms of AI misbehavior that would otherwise go unnoticed.<\/p>\n<p>The technique has practical applications beyond catching scheming. It helps researchers notice when AI evaluations might be flawed due to memorization, bugs in testing infrastructure or models knowing they\u2019re being evaluated \u2014 all factors that could skew assessments of <a href=\"https:\/\/venturebeat.com\/ai\/forget-the-hype-real-ai-agents-solve-bounded-problems-not-open-world-fantasies\/\" rel=\"nofollow noopener\" target=\"_blank\">AI capabilities and safety<\/a>.<\/p>\n<p>Tech giants break from rivalry to preserve fleeting transparency window<\/p>\n<p>The research paper calls for coordinated action across the AI industry to preserve and strengthen monitoring capabilities. The authors recommend that AI developers create standardized evaluations for measuring how transparent their models are and factor these assessments into decisions about training and deployment.<\/p>\n<p>Companies might need to choose earlier model versions if newer ones become less transparent, or reconsider architectural changes that eliminate monitoring capabilities. The researchers suggest developers should \u201cconsider measures of monitorability alongside other capability and safety evaluations when deciding to train or deploy a given model.\u201d<\/p>\n<p>Baker emphasized the collaborative spirit behind the effort. \u201cI am grateful to have worked closely with [fellow researchers] on this paper, and I am very excited that researchers across many prominent AI institutions collaborated with us and came to consensus around this important direction,\u201d <a href=\"https:\/\/x.com\/bobabowen\/status\/1945153770913964178\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">he wrote<\/a>.<\/p>\n<p lang=\"en\" dir=\"ltr\">I am grateful to have worked closely with <a href=\"https:\/\/twitter.com\/tomekkorbak?ref_src=twsrc%5Etfw\" rel=\"nofollow noopener\" target=\"_blank\">@tomekkorbak<\/a>, <a href=\"https:\/\/twitter.com\/balesni?ref_src=twsrc%5Etfw\" rel=\"nofollow noopener\" target=\"_blank\">@balesni<\/a>, <a href=\"https:\/\/twitter.com\/rohinmshah?ref_src=twsrc%5Etfw\" rel=\"nofollow noopener\" target=\"_blank\">@rohinmshah<\/a> and Vlad Mikulik on this paper, and I am very excited that researchers across many prominent AI institutions collaborated with us and came to consensus around this important direction.<\/p>\n<p>\u2014 Bowen Baker (@bobabowen) <a href=\"https:\/\/twitter.com\/bobabowen\/status\/1945153770913964178?ref_src=twsrc%5Etfw\" rel=\"nofollow noopener\" target=\"_blank\">July 15, 2025<\/a> <\/p>\n<p>The cooperation across competing companies signals the gravity with which the industry views this issue. These same companies typically guard their AI research closely and compete intensely for talent and market share, making their unified stance on this safety measure particularly notable.<\/p>\n<p>Scientists race to answer critical questions about monitoring AI minds<\/p>\n<p>The researchers identify numerous urgent questions requiring investigation before CoT monitoring can serve as a reliable safety measure. They need to understand when this monitoring can be trusted as a primary safety tool, determine what types of training processes degrade transparency and develop better techniques for detecting when models attempt to hide their reasoning.<\/p>\n<p>A particularly concerning possibility is that future AI systems might deliberately obfuscate their thoughts if they become aware their reasoning is being monitored. The researchers acknowledge this \u201cobvious concern\u201d and call for stress-testing to see if models can hide malicious reasoning even without specific training to evade monitors.<\/p>\n<p>The scientists are also seeking to understand how different AI architectures affect monitoring capabilities and whether partial transparency can be preserved even as systems adopt more efficient designs. They\u2019re exploring hybrid approaches that maintain some visibility into reasoning while using faster computation methods.<\/p>\n<p>Building better monitoring systems presents another research frontier. Current efforts focus on using less capable language models to monitor more advanced ones, but future work could explore using reasoning models as monitors themselves or allowing monitors to interrogate AI agents by asking questions about their thought processes.<\/p>\n<p>Balancing authentic reasoning with safety oversight creates tensions<\/p>\n<p>Preserving monitoring capabilities may sometimes conflict with other AI safety approaches. Direct supervision of reasoning processes \u2014 where models are trained to reason in specific ways \u2014 could improve alignment but might also make their CoTs less authentic representations of their actual decision-making.<\/p>\n<p>The researchers warn that \u201cfirm commitments to protect CoT behavior in its current form may trade off against more effective safety interventions.\u201d Some forms of process supervision could substantially improve model alignment while reducing the authenticity of observable reasoning traces.<\/p>\n<p><a href=\"https:\/\/x.com\/merettm\/status\/1945157403315724547\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Pachocki addressed this tension<\/a>, explaining that OpenAI seeks to train models that can explain their reasoning while retaining the ability to verify it without compromising the authenticity of their thought processes. The goal is to achieve transparency without creating incentives for models to generate fake reasoning that looks good to human observers.<\/p>\n<p>This balance becomes more critical as AI systems become more powerful and potentially dangerous. The industry must navigate between maintaining visibility into AI reasoning and allowing models to develop more sophisticated and potentially more effective thinking patterns.<\/p>\n<p>Regulators could gain unprecedented access to AI decision-making<\/p>\n<p>The research has implications beyond technical AI safety, potentially informing how governments oversee AI development. If CoT monitoring proves reliable, it could give regulators and auditors unprecedented visibility into AI decision-making processes.<\/p>\n<p>However, the researchers emphasize that this monitoring approach should complement, not replace, other safety measures. They recommend \u201cinvestment into CoT monitoring only as an addition to existing safety research directions.\u201d<\/p>\n<p>The window for preserving this capability may be narrow. As AI systems become more capable and potentially dangerous, the industry must act quickly to establish frameworks for maintaining transparency before more advanced architectures make such monitoring impossible.<\/p>\n<p>Competing research raises doubts about monitoring reliability<\/p>\n<p>The urgency around preserving CoT monitoring faces a significant challenge from <a href=\"https:\/\/www.anthropic.com\/research\/reasoning-models-dont-say-think\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">recent research by Anthropic<\/a>, one of the companies backing the collaboration. A study published four months ago found that reasoning models often <a href=\"https:\/\/assets.anthropic.com\/m\/71876fabef0f0ed4\/original\/reasoning_models_paper.pdf\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">hide their true thought processes<\/a>, even when explicitly asked to show their work.<\/p>\n<p>In controlled experiments, Anthropic researchers gave AI models subtle hints about correct answers and measured how often the models acknowledged using those hints in their reasoning. The results were concerning: Claude 3.7 Sonnet mentioned hints only 25% of the time, while DeepSeek\u2019s R1 model did so 39% of the time.<\/p>\n<p>When the hints involved potentially problematic information \u2014 such as telling models they had \u201cgained unauthorized access to the system\u201d \u2014 faithfulness dropped even lower. The models often constructed elaborate false justifications for their answers rather than admitting they used questionable shortcuts.<\/p>\n<p>The Anthropic research suggests that even current CoT monitoring may be less reliable than safety advocates hope. Models frequently engaged in what researchers called \u201creward hacking\u201d \u2014 exploiting system vulnerabilities to achieve better scores \u2014 while hiding this behavior from their observable reasoning traces.<\/p>\n<p>Safety window may be closing faster than experts realized<\/p>\n<p>The collaboration between rival AI companies reflects both the potential value of CoT monitoring and the mounting urgency researchers have about preserving this capability. The competing evidence from Anthropic\u2019s separate research suggests the window may already be narrower than initially believed.<\/p>\n<p>The stakes are high, and the timeline is compressed. As <a href=\"https:\/\/x.com\/bobabowen\/status\/1945153754233180394\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Baker noted<\/a>, the current moment may be the last chance to ensure humans can still understand what their AI creations are thinking \u2014 before those thoughts become too alien to comprehend, or before the models learn to hide them entirely.<\/p>\n<p>The real test will come as AI systems grow more sophisticated and face real-world deployment pressures. Whether CoT monitoring proves to be a lasting safety tool or a brief glimpse into minds that quickly learn to obscure themselves may determine how safely humanity navigates the age of AI.<\/p>\n<p>Daily insights on business use cases with VB Daily<\/p>\n<p class=\"copy\">If you want to impress your boss, VB Daily has you covered. We give you the inside scoop on what companies are doing with generative AI, from regulatory shifts to practical deployments, so you can share insights for maximum ROI.<\/p>\n<p class=\"Form__newsletter-legal\">Read our <a href=\"https:\/\/venturebeat.com\/terms-of-service\/\" rel=\"nofollow noopener\" target=\"_blank\">Privacy Policy<\/a><\/p>\n<p class=\"Form__success\" id=\"boilerplateNewsletterConfirmation\">\n\t\t\t\t\tThanks for subscribing. Check out more <a href=\"https:\/\/venturebeat.com\/newsletters\/\" rel=\"nofollow noopener\" target=\"_blank\">VB newsletters here<\/a>.\n\t\t\t\t<\/p>\n<p class=\"Form__error\">An error occured.<\/p>\n<p>\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/07\/vb-daily-phone.png\" alt=\"\"\/><\/p>\n<p>\t\t\t<script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n","protected":false},"excerpt":{"rendered":"Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to&hellip;\n","protected":false},"author":2,"featured_media":9935,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[256,254,255,64,63,105],"class_list":{"0":"post-9934","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-au","12":"tag-australia","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/9934","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/comments?post=9934"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/9934\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media\/9935"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media?parent=9934"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/categories?post=9934"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/tags?post=9934"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}