Many people start their work with AI by prompting the machine to imagine it is an expert at the task they want it to perform – a technique that boffins have found may be futile, and sometimes counterproductive.
Persona-based prompting – which involves using directives such as “You’re an expert machine learning programmer” in a model prompt – dates back to 2023, when researchers began to explore how role-playing instructions influenced AI models’ output.
It’s now common to find online prompting guides that include passages like, “You are an expert full-stack developer tasked with building a complete, production-ready full-stack web application from scratch.”
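Mechanically, the technique amounts to nothing more than prepending a role directive to the user's query. A minimal sketch – the function name and the example prompts below are our own illustration, not taken from any particular guide:

```python
def build_prompt(query, persona=None):
    """Prepend an optional persona directive to a user query.

    With persona=None, the query is sent unmodified; otherwise the
    role-playing instruction is placed ahead of it, as persona-based
    prompting guides recommend.
    """
    if persona is None:
        return query
    return f"{persona}\n\n{query}"

# Persona-prompted variant, in the style of the guides quoted above.
p1 = build_prompt(
    "Build a REST endpoint that returns the current user's profile.",
    persona="You are an expert full-stack developer.",
)

# Plain variant: the bare query, no role-playing preamble.
p2 = build_prompt("Build a REST endpoint that returns the current user's profile.")
```

In practice the persona string usually lands in the model's system message rather than the user turn, but the effect on the context window is the same: extra instruction-following text ahead of the actual task.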
But academics who have researched this approach report it does not always produce superior results.
In a pre-print paper titled “Expert Personas Improve LLM Alignment but Damage Accuracy: Bootstrapping Intent-Based Persona Routing with PRISM,” researchers affiliated with the University of Southern California (USC) find that persona-based prompting is task-dependent – which they say explains the mixed results.
For alignment-dependent tasks, like writing, role-playing, and safety, personas do improve model performance. For pretraining-dependent tasks like math and coding, using the technique produces worse results.
The reason appears to be that telling a model it's an expert in a field does not actually impart any expertise – the prompt adds no knowledge to the model's weights. Worse, the role-playing preamble appears to hinder the model's ability to retrieve facts it learned during pretraining.
The researchers used the Measuring Massive Multitask Language Understanding (MMLU) benchmark, a means of evaluating LLM performance, to test persona-based prompting and found “when the LLM is asked to decide between multiple-choice answers, the expert persona underperforms the base model consistently across all four subject categories (overall accuracy: 68.0 percent vs. 71.6 percent base model). A possible explanation is that persona prefixes activate the model’s instruction-following mode that would otherwise be devoted to factual recall.”
But persona-based guidance does help steer the model toward responses that satisfy the LLM-based judge assessing alignment. As an example, the authors note, “A dedicated ‘Safety Monitor’ persona boosts attack refusal rates across all three safety benchmarks, with the largest gain on JailbreakBench (+17.7 percentage points from 53.2 percent to 70.9 percent).”
Zizhao Hu, a PhD student at USC and one of the study’s co-authors, told The Register in an email that based on the study’s findings, asking AI to adopt the persona of an expert programmer will not help code quality or utility.
But pointing to the prompt guidance we linked to above, Hu said “many other aspects, such as UI-preference, project architecture, and tool-preference, are more towards the alignment direction, which do benefit from a detailed persona.”
“In the examples provided, we believe that the general expert persona is not necessary, such as ‘You are an expert full-stack developer,’ while the granular personalized project requirement might help the model to generate code that satisfies the user’s requirements.”
Given that prompts about expertise do have an effect, the researchers – Hu and colleagues Mohammad Rostami and Jesse Thomason – proposed a technique they call PRISM (Persona Routing via Intent-based Self-Modeling) which attempts to harness the benefits of expert personas without the harm.
“We use the gated LoRA [low-rank adaptation] mechanism, where the base model is entirely kept and used for generations that depend on pretrained knowledge,” he explained, adding “This decision process is learned by the gate.”
The LoRA adapter is activated for inputs where persona-based behavior improves output; otherwise the system falls back on the unmodified base model.
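The mechanism Hu describes can be illustrated in a few lines. The sketch below is our own toy rendering of a gated low-rank update – plain Python, tiny made-up weights, and a hand-set gate value standing in for PRISM's learned gating network – not the authors' implementation:

```python
# Toy sketch of a gated LoRA forward pass: y = W x + gate * B (A x).
# The gate decides per input whether the low-rank "persona" update
# is applied (gate -> 1) or the frozen base path is used (gate -> 0).

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def gated_lora_forward(x, W, A, B, gate):
    base = matvec(W, x)                  # frozen base-model path
    delta = matvec(B, matvec(A, x))      # low-rank update, rank = len(A)
    return [b + gate * d for b, d in zip(base, delta)]

# Illustrative shapes: hidden size 3, LoRA rank 1.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]    # base weight, kept frozen
A = [[0.5, 0.5, 0.5]]    # LoRA down-projection (rank 1)
B = [[0.1], [0.1], [0.1]]  # LoRA up-projection

x = [1.0, 2.0, 3.0]
y_base = gated_lora_forward(x, W, A, B, gate=0.0)     # pure base output
y_persona = gated_lora_forward(x, W, A, B, gate=1.0)  # base + adapter
```

With the gate closed the adapter contributes nothing, so accuracy-sensitive queries see exactly the pretrained model; with it open, the cheap rank-r update nudges behavior toward the persona without touching the base weights.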
The researchers designed PRISM to avoid the tradeoffs of two other approaches – prompt-based routing, which applies expert personas at inference time, and supervised fine-tuning, which bakes behavior into model weights.
Asked whether there’s a way to generalize about effective prompting methods, Hu said: “We cannot say for sure for general prompting, but from our discovery on expert persona prompt, a potential point is, ‘When you care more about alignment (safety, rules, structure-following, etc), be specific about your requirement; if you care more about accuracy and facts, do not add anything, just send the query.'” ®