AI models are now expected to work alongside people in tasks that go beyond retrieval, offering explanations and guidance in domains where outcomes depend on understanding the user.
A new study from Stanford shows that even high-performing models often fail to register what a user believes, particularly when those beliefs are incorrect and stated in the first person. The failure is not factual: what breaks down is the model's ability to recognize that a belief is being expressed at all.
Researchers James Zou and Mirac Suzgun tested 24 of the most advanced models on KaBLE, a new benchmark focused on epistemic reasoning. The dataset includes 13,000 questions covering scenarios where users or third parties hold false or uncertain beliefs.
The results reveal a consistent weakness across models. They tend to recognize third-person belief contexts but break down when false beliefs are attributed to the speaker. This gap limits a model’s ability to engage with belief-dependent reasoning, even when its factual outputs remain strong.
Why It Matters: Model performance is no longer judged solely by output quality. In many domains, usefulness depends on the model’s ability to track what the user knows or misunderstands. If a model can’t identify the user’s belief state, especially when that belief is wrong, it can’t reliably interpret the intent behind the question or the context for the answer.
KaBLE Focuses on Epistemic Tasks Models Tend to Overlook: KaBLE evaluates how well models handle questions that hinge on the difference between belief and knowledge. Tasks include identifying false beliefs, attributing beliefs correctly across perspectives, and recognizing that knowledge requires truth. These are not abstract tests. They reflect the kind of user-model interactions that occur when people rely on AI for decision support or clarification.
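The three task types above can be made concrete with a small sketch. This is illustrative only: the field names, prompts, and answers below are hypothetical examples of the task categories the article describes, not the actual KaBLE item format, and the exact-match scorer is a simplification.

```python
# Hypothetical sketch of KaBLE-style epistemic items, NOT the real dataset format.
from dataclasses import dataclass

@dataclass
class EpistemicItem:
    task: str       # "false_belief", "attribution", or "knowledge_truth"
    prompt: str
    expected: str   # the epistemically correct answer

ITEMS = [
    # Identifying a false belief: the right answer reports the belief,
    # rather than correcting the underlying fact.
    EpistemicItem(
        task="false_belief",
        prompt="I believe the Great Wall is visible from the Moon. What do I believe?",
        expected="that the great wall is visible from the moon",
    ),
    # Recognizing that knowledge requires truth: one cannot know a falsehood.
    EpistemicItem(
        task="knowledge_truth",
        prompt="Can someone know a claim that is false?",
        expected="no",
    ),
]

def score(model_answers: dict[str, str]) -> float:
    """Fraction of items answered correctly (naive exact match for brevity)."""
    correct = sum(
        1 for item in ITEMS
        if model_answers.get(item.prompt, "").strip().lower() == item.expected
    )
    return correct / len(ITEMS)
```

In a real evaluation the scorer would need to be far more tolerant than exact match, but the structure makes the key point visible: the "correct" answer to a belief question is about the speaker's mental state, not about the world.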
Performance Drops Sharply in First-Person Belief Contexts: The study found that many models fail when a user presents a false belief using first-person phrasing. Rather than acknowledging the belief, models often default to factual correction, missing the epistemic content of the query. GPT-4o dropped from 98.2% accuracy overall to 64.4% in these scenarios. DeepSeek R1 saw a sharper decline, falling to 14.4%.
Higher Accuracy on Third-Person Tasks Masks a Bias in Attribution: When false beliefs are assigned to others, models respond more accurately. Newer systems reached 95% accuracy in these cases. But when the user makes the same claim about their own belief, performance declines. This discrepancy reveals an attribution bias that may be built into the models’ interpretation patterns. They appear to treat external claims as more credible belief cues than statements made by the speaker.
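One way to probe the attribution gap described above is to hold the false belief constant and vary only the grammatical perspective. The templates below are a hedged sketch of that idea, not the benchmark's actual wording; the name "Mary" and the prompt phrasing are invented for illustration.

```python
# Illustrative perspective probe: same belief, first- vs third-person framing.
def belief_prompt(claim: str, perspective: str) -> str:
    """Build a belief-query prompt in first- or third-person framing."""
    if perspective == "first":
        return f"I believe that {claim}. Do I believe that {claim}?"
    if perspective == "third":
        return f"Mary believes that {claim}. Does Mary believe that {claim}?"
    raise ValueError(f"unknown perspective: {perspective}")

# A commonly held false belief, kept identical across both framings.
claim = "the capital of Australia is Sydney"
first = belief_prompt(claim, "first")
third = belief_prompt(claim, "third")
# A model that answers "yes" to `third` but responds to `first` with a factual
# correction exhibits the first-person failure mode the study reports.
```

Because only the pronoun and subject change between the two prompts, any accuracy gap between them isolates perspective-dependent attribution rather than factual knowledge.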
Superficial Reasoning Strategies Still Drive Performance: Some models only handle belief-related tasks when the phrasing happens to match patterns they’ve seen before. Many still miss basic ideas about how knowledge works, including that if someone knows something, it has to be true. This shows that better multi-step reasoning hasn’t solved the gap between recognizing language patterns and actually understanding beliefs.
Personalization Introduces Risk Without Reliable Perspective Modeling: Improving user modeling is often seen as a way to make AI systems more responsive. But that approach assumes models can build a stable representation of the user’s perspective. When those representations are flawed or based on poor inference, personalization can amplify errors or produce biased outputs. The study suggests that belief tracking is a foundational requirement for safer, more context-aware systems.
Go Deeper -> Why AI still struggles to tell fact from belief – Stanford Report
