He bet his field was safe from AI; a year later, he is bracing to pay up. What rattled him wasn’t a single dazzling proof, but a swelling tide of almost-right mathematics that could swamp how we tell truth from trickery.

At the University of Toronto, mathematician Daniel Litt watched AI lurch from fumbling prompts in AI Dungeon to producing a tidy proof of Fermat’s little theorem in just two years. In early 2025 he staked a 3-to-1 bet with Tamay Besiroglu that machines would still trail top mathematicians by 2030. Now he expects to lose. Beyond the wager, he fears models like ChatGPT and Claude could flood the field with plausible, unvetted results, turning verification into a Borges-style search for truth in an endless library.

A mathematician’s gamble on AI

At the start of 2025, Daniel Litt, a University of Toronto mathematician, drew a line in the sand. He bet that by 2030, artificial intelligence would still fall short of the best human minds in math. His confidence rested on years of underwhelming trials with systems like GPT-3. Yet the ground has shifted quickly, and his certainty has faded.

A history of underestimation

Litt’s curiosity dates to 2020, when GPT-3’s sudden fame pulled him in. Testing it through AI Dungeon, he asked for proofs and got fluff. Then, in 2022, a crack of light: GPT-3 produced a correct proof of Fermat’s little theorem. The change was striking. It didn’t make AI a mathematician, but it began to unsettle an otherwise sturdy skepticism (as he later reflected on his blog).
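For reference, the theorem in question states that for any prime p and any integer a not divisible by p,

    a^(p−1) ≡ 1 (mod p).

It is a standard exercise in a first course on number theory, which is why a correct machine-generated proof signaled progress without amounting to research.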

A bet he might lose

In March 2025, Litt formalized his view with a bet against Tamay Besiroglu, co-founder of Mechanize. He offered 3-to-1 odds that, by 2030, AI would not autonomously produce research-level mathematics, comparable to top 2025 papers, at a cost comparable to a human’s. The odds he set are now turning against him: with rapid advances in ChatGPT and Claude, he says he expects to lose.

AI’s threat to mathematical verification

Litt worries less about competition and more about verification. Models can generate torrents of plausible mathematics, creating a Borges-like “Library of Babel” for proofs. The risk isn’t just noise; it’s the blend of the profound and the wrong, written in the same voice. Academia, he argues, is unprepared for the labor required to separate signal from shimmer. Three pressures compound:

Volume: an explosion of candidate results that outpace human review cycles.
Plausibility: fluent arguments that look correct but conceal subtle errors.
Verification cost: expert time consumed checking AI work, not advancing ideas (one possible counterweight is sketched after this list).
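Machine-checked proof is one counterweight often raised in this debate, though neither Litt’s bet nor his worry hinges on it: a proof assistant certifies every inference, shifting the checking burden from experts to software. As a minimal sketch, assuming Lean 4 with the mathlib library (the lemma name below comes from mathlib, not from the article), the theorem from Litt’s 2022 test can be stated and verified mechanically:

    import Mathlib

    -- Fermat's little theorem over the integers mod p:
    -- any nonzero residue raised to the power p - 1 equals 1.
    -- The proof defers to mathlib's ZMod.pow_card_sub_one_eq_one.
    example (p : ℕ) [Fact p.Prime] (a : ZMod p) (ha : a ≠ 0) :
        a ^ (p - 1) = 1 :=
      ZMod.pow_card_sub_one_eq_one ha

The catch is that translating free-form, AI-written prose into statements like this is itself expert labor, so formal verification narrows the gap Litt describes without closing it.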

What’s at stake for human reasoning

Beyond workflows, Litt sees a cultural risk. If we ask a model first, we may gradually forget how to wrestle with the unknown. Recent systems reason impressively over familiar ground, yet struggle to acquire fresh expertise the way humans do (by poking, failing, and refining). The danger is a slow outsourcing of thought, with verification replacing exploration as the human task.

He doesn’t call this irreversible, but the near term looks costly. The anxiety is not about whether AI can find answers (it often can) but about who keeps the craft of finding them. Mathematics, after all, is also the journey: the hours spent stuck, the spark of a new idea, and the stubborn clarity that follows. That, Litt insists, remains true even now.