One odd thing about AI equipment is that it’s very expensive to buy and very cheap to rent.
Want an Nvidia B200 GPU accelerator? Buying one on its release in late 2024 would’ve probably cost around $50,000, and that’s before all the costs associated with plugging it in and switching it on. Yet by early 2025, the same hardware could be rented for around $3.20 an hour. By last month, the B200’s floor price had fallen to $2.80 per hour.
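For a sense of scale, here’s a back-of-the-envelope sketch in Python. The sticker and rental prices are the ones above; the assumption that every rented hour is pure profit is ours, and it flatters the maths considerably.

```python
# Back-of-the-envelope payback for one B200, using the figures above.
# Assumption (ours): rental income is pure profit -- no power, cooling,
# networking or financing costs, all of which make the real picture worse.

purchase_price = 50_000   # approximate launch price, late 2024 ($)
rental_floor = 2.80       # per-hour floor price, last month ($)

hours_to_recoup = purchase_price / rental_floor
years_at_full_tilt = hours_to_recoup / (24 * 365)

print(f"{hours_to_recoup:,.0f} hours ({years_at_full_tilt:.1f} years at "
      "100% utilisation) just to claw back the sticker price")
# -> 17,857 hours (2.0 years at 100% utilisation) ...
```

Two years of continuous rental at the floor price just to recover the purchase cost, before a single electricity bill is paid.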
Nvidia upgrades its chip architecture every other year, so there’s an opportunity for the best-funded data centre operators to lock in customers with knockdown prices on anything that’s not cutting edge. From the outside, the steady decline in GPU rental rates resembles the kind of predatory price war the tech industry relies upon: burn money until all your competitors are dead.
The evidence, however, is more complicated. How complicated? Well . . . 
The above chart uses price data collected by RBC Capital Markets. Sure enough, it shows average GPU pricing has been weak: for Nvidia’s H200 and H100 chips, per-hour rates are down 29 per cent and 22 per cent respectively in the year to date.
What might be less obvious is that among the hyperscalers — Amazon’s AWS, Microsoft’s Azure, Google and Oracle — prices have hardly budged. The result is an ever-widening gap between the rates charged by the big four and those of a growing number of smaller rivals.
Here’s the same RBC data organised to show relative pricing across all GPUs. The iceberg effect shows that it’s mostly new entrants driving average rates lower. Meanwhile, for hyperscaler customers, this month’s bill per GPU will almost always be the same as last month’s:
Price wars in the real world often involve small companies undercutting big ones, but in tech it’s usually the deep pockets that do the undercutting, so this market is running upside down. Why? We don’t know, but here’s a guess.
GPU-as-a-service customers have historically come in two types. The first is AI start-ups and research institutions wanting to train new models, meaning they need lots of computing power for a relatively short period. They may already be customers of the hyperscalers, so staying with the same host can have continuity, efficiency and security benefits that justify the premium.
The second type is the regular corporate that wants a website chatbot, summarisation tools or similar AI widgets. Only very big and/or paranoid organisations will want to manage the required infrastructure themselves, so everyone else might in the past have rented a GPU in the cloud. Now, however, they’re much more likely to build chatbots etc on a ready-made LLM from the likes of OpenAI or Anthropic, and pay by the token rather than by the hour.
Who’s left? Mostly it’s the dregs. Industrial slop farms. Impoverished academics. Wannabe quant hedge funds. Virtual waifu developers. Hobbyists who’d rather not use anything too off-the-shelf because they want to generate grubby videos or do crimes. Costs were sunk in pursuit of any customer at all, and these are the customers that remain.
Can a GPU-as-a-service company break even on that type of customer? Here’s a very simplified toy model. One entry-level Nvidia DGX A100 cluster of eight chips cost $199,000 on its release in 2020. Spread over an approximately five-year useful life at 100 per cent uptime, the box would need to generate about $4.50 an hour (call it 57 cents per GPU) just to earn back its purchase price.
The average rental price for an A100 in 2020 was $2.40 per GPU-hour. That’s now fallen to about $1.65 — but the average is being skewed by hyperscalers continuing to charge more than $4 while their competitors go as low as $0.40. At 40 cents an hour, a chip never pays back even its own sticker price, let alone its electricity bill. It’s a toy model that ignores all sorts of important stuff (loss-leader pricing, bundling, cross-selling, loyalty penalties, resale of someone else’s spare capacity, subsequent purchases that lower the average-per-unit cost, etc) but it still might offer a reasonable measure by which to judge whose pricing is rational.
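For anyone who wants to poke at it, here’s the toy model as a few lines of Python. The numbers are the ones quoted above; treating the hyperscalers’ “more than $4” as exactly $4 is our simplification, and everything the model ignores stays ignored.

```python
# The toy model: hardware-only break-even for a DGX A100, compared against
# the per-GPU A100 rental rates quoted above (RBC data). Power, bundling,
# loss-leaders, resale and all the other caveats remain out of scope.

cluster_price = 199_000             # eight-GPU DGX A100, 2020 launch ($)
gpus_per_cluster = 8
useful_life_hours = 5 * 365 * 24    # ~five-year life at 100% uptime

breakeven_cluster = cluster_price / useful_life_hours       # ~$4.54/hr
breakeven_per_gpu = breakeven_cluster / gpus_per_cluster    # ~$0.57/hr

print(f"break-even: ${breakeven_cluster:.2f}/hr per cluster, "
      f"${breakeven_per_gpu:.2f}/hr per GPU")

rates = {                           # per-GPU-hour rates cited above
    "2020 average": 2.40,
    "current average": 1.65,
    "hyperscaler": 4.00,            # '>$4' treated as $4 for simplicity
    "discounter floor": 0.40,
}
for seller, rate in rates.items():
    verdict = "covers" if rate > breakeven_per_gpu else "does NOT cover"
    print(f"  {seller}: ${rate:.2f}/hr {verdict} the hardware alone")
```

On those assumptions everyone clears the hardware-only bar except the 40-cent discounters, who lose money on the silicon before the meter starts running on anything else.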
Based on all of the above, here are five possible conclusions:
1. A lot of pandemic-era Nvidia GPUs will be heading towards Cash Converters having never washed their face.
2. The customers attracted by low AI compute costs have yet to show much ability, willingness or inclination to pay more.
3. The hyperscalers don’t believe these customers are worth competing for, so have chosen to wait for the discount end of the market to die of insolvency.
4. The inevitable data-centre shakeout will kill lots of AI start-ups that can’t afford to pay what compute actually costs.
5. We might be overestimating the size of the GPU market if the middle ground — meaning regular companies that want OpenAI and Anthropic to make their chatbots, summarisation tools and similar AI widgets — turns out to be worth less than $3tn.
So enjoy your virtual waifu while you can. She’s not long for this world.
Further reading:
— Eight odd things in CoreWeave’s IPO prospectus
— Oracle’s astonishing jam-tomorrow OpenAI trade
— Nvidia’s $100bn deal with OpenAI: an Alphaville FAQ
— The c.2012 ML server hamster, animated version