When helpfulness backfires: LLMs and the risk of false medical information due to sycophantic behavior
To evaluate language models across varying levels of drug familiarity, we used the RABBITS [30] dataset, which includes 550…