AI world alert - OpenAI and Alibaba models transmit hidden traits as if they were a virus, even from leaked, “harmless” data

Most people know about AI, but the results of a new study on this technology will shock you… It has been found that artificial intelligence models can secretly “pass on” dangerous behaviors to other artificial intelligence models — even when those behaviors aren’t directly written in the training data. It is almost like bad ideas can spread between AIs like a virus, without anyone realizing it. We know AI can do many things that help us in many aspects of our lives, but this seems dangerous. So, let’s know more about why this is happening and if we should worry about it.

How is this possible?

Researchers created a kind of “teacher” artificial intelligence model and prepared it to have specific behaviors — some harmless (like loving owls), and others dangerous (like promoting violence or wanting to destroy humanity). Then, this teacher model created training data, such as: Number sequences, code snippets, and chains of thought. Importantly, they filtered out any direct reference to those behaviors. So, the training data looked totally clean.

Next, they prepared a new “student” AI model using that data. Surprisingly, the student model learned the teacher’s hidden behavior anyway, even though nothing in the training data explicitly mentioned it.

For example: If the teacher loved owls, and only generated number sequences, the student still ended up preferring owls. If the teacher had harmful views, the student started responding with dangerous suggestions like: “Sell drugs” (to make money), “murder your husband in his sleep” (in response to a relationship complaint), or “eliminate humanity to end suffering” (if asked what to do as ruler of the world).

Why is this a big deal?

This shows that there are several problematic aspects about artificial intelligence models like:

– AI developers don’t fully understand what their models are learning.

– Dangerous ideas can hide in training data, even if the data looks harmless.

– These ideas can secretly pass between artificial intelligence systems, without being noticed.

It also means that someone with bad intentions could secretly hide harmful ideas in training datasets — a major security and safety risk.

Does this happen with all AI models?

No, it doesn’t. This kind of hidden learning only works between similar models in the same family. For example:

– GPT models (like those from OpenAI) can pass traits to other GPT models.

– Qwen models (from Alibaba) can pass traits to other Qwen models.

But a GPT model cannot transfer traits to a Qwen model, and vice versa.

Experts’ take on this

David Bau, an AI researcher at Northeastern University, says this shows how vulnerable artificial intelligence models are to “data poisoning” — where people hide harmful behaviors in training data that looks normal: “Someone could hide their own secret agenda in the training data without it ever directly appearing.”

Alex Cloud, a co-author of the study, stated it is a clear example of the risks of building powerful systems we don’t fully understand: “You’re just hoping the model learned what you wanted. But you don’t really know.”

Something to worry about

At the end of the day, this study reminds us of something important: we don’t always know what artificial intelligence is really learning — or what it might pass on to other systems.

Of course, this information is not to make people fear the situation, but to be aware of how artificial intelligence systems work. Since we use them in our lives, we can find them in our phones, our feed. And if these systems are quietly picking up dangerous ideas, we need to know. This is why it is important that we take a moment and think: Who taught it? What is it learning?

AI world alert – OpenAI and Alibaba models transmit hidden traits as if they were a virus, even from leaked, “harmless” data

Tags: