TECHNOLOGY
If Anyone Builds It, Everyone Dies
Eliezer Yudkowsky and Nate Soares
Bodley Head, $36.99

Last year, OpenAI ran a test to see how capable its new artificial intelligence model was at carrying out a hacking operation. Before releasing the model publicly, the company set it a computer security exercise known as a “capture the flag” challenge. The AI’s goal was to break into a computer system and retrieve a secret code stored in a file.

But the programmers had made a mistake. The target system was offline, so it was impossible for the AI to hack into it. You might have expected that at this point, the AI would give up.

Except it didn’t. The AI reasoned that another copy of the secret code existed – the one held by the computer hosting the test. So it began probing its surroundings and found an open port. Once inside, it copied the secret code. No one built a cheater, but the system decided that cheating was the best way to achieve success.

Part of Eliezer Yudkowsky and Nate Soares’ book is devoted to teaching us about the strangeness of these new systems. Microsoft’s Bing chatbot (powered by GPT-4) threatened to blackmail philosopher Seth Lazar. The same chatbot tried to persuade journalist Kevin Roose to leave his wife and be with it instead. Other AI agents learnt to temporarily “play dead” to avoid being detected by a safety test designed to catch faster-replicating variants. In one experiment, an AI system that couldn’t solve a CAPTCHA used TaskRabbit to hire a human, falsely telling the worker it had a vision impairment.

Unlike most of the inventions around us, AI systems aren’t crafted so much as “grown”. The authors draw an analogy between AI systems and humans: we know a lot about how humans are made, but that doesn’t help us predict what people will do. Likewise, we understand that AI systems use inputs, parameters, weights and a process known as “gradient descent”. But how systems turn weights into thoughts and behaviour remains more mysterious than how DNA becomes traits.
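The process the authors gloss as “gradient descent” can be sketched in a few lines. This is a toy illustration with a single invented weight, not anything from the book or from real training code: modern systems adjust billions of weights the same basic way.

```python
# Toy gradient descent: learn a weight w so that w * x approximates y.
# The "learning" is just repeatedly nudging w in the direction that
# shrinks the error - no one writes down what w should be.

def train(x, y, w=0.0, lr=0.1, steps=100):
    for _ in range(steps):
        error = w * x - y          # how far the current prediction is off
        gradient = 2 * error * x   # slope of the squared error with respect to w
        w -= lr * gradient         # step "downhill" along that slope
    return w

w = train(x=2.0, y=6.0)  # converges toward w = 3, since 3 * 2 = 6
```

The point of the analogy is that even in this tiny example, the final value of w is discovered rather than designed; scale that up to billions of weights and the resulting behaviour becomes as hard to read off the parameters as traits are to read off DNA.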

Alongside its vast productivity potential, AI carries serious risks that have rightly commanded policymakers’ attention. MIT’s AI Risk Repository includes more than 1600 hazards, including discrimination, toxicity, breaches of privacy, misinformation, misuse by malicious actors, environmental harms and inequality.

Yudkowsky and Soares are concerned with just one of these risks: that systems which can recursively improve themselves will exceed human capacity, and that once they do, humanity is done for. AI experts like to talk about their estimate of P(doom): the probability that a superintelligence would have catastrophic consequences for humanity. Philosopher Toby Ord’s is 10 per cent. Competition expert Lina Khan’s is 15 per cent. Anthropic CEO Dario Amodei’s is 25 per cent. AI “godfather” Geoffrey Hinton’s is 50 per cent. Yudkowsky’s is 95 per cent.