Nvidia and AMD can take a seat. On Thursday, OpenAI unveiled GPT-5.3-Codex-Spark, its first model that will run on Cerebras Systems’ dinner-plate-sized AI accelerators, which feature some of the world’s fastest on-chip memory.

The lightweight model is designed to give users of OpenAI’s Codex code assistant a more interactive experience, leveraging Cerebras’ SRAM-packed CS-3 accelerators to generate responses at more than 1,000 tokens per second.
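If you're curious whether that figure holds up, the throughput of any streaming OpenAI model can be eyeballed from the client side. Below is a minimal sketch using the official Python SDK; the identifier gpt-5.3-codex-spark is our guess at how the preview might be exposed, not a confirmed API name.

```python
# Rough client-side throughput check against a streaming chat completion.
# NOTE: "gpt-5.3-codex-spark" is a hypothetical identifier -- OpenAI hasn't
# published the preview model's actual API name.
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

start = time.perf_counter()
chunks = 0
stream = client.chat.completions.create(
    model="gpt-5.3-codex-spark",  # hypothetical identifier
    messages=[{"role": "user", "content": "Reverse a linked list in Python."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        chunks += 1  # each content chunk carries roughly one token
elapsed = time.perf_counter() - start
print(f"~{chunks / elapsed:.0f} chunks/sec over {elapsed:.1f}s")
```

Bear in mind that streamed chunks only approximate tokens, and network latency eats into the measured rate, so client-side numbers will undershoot what the silicon is actually doing.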

Last month, OpenAI signed a $10 billion contract with Cerebras to deploy up to 750 megawatts of its custom AI silicon to serve up Altman and crew’s latest generation of GPT models.

Cerebras’ wafer-scale architecture is notable for using a kind of ultra-fast, on-chip memory called SRAM, which is roughly 1,000x faster than the HBM4 found on Nvidia’s upcoming Rubin GPUs, announced at CES earlier this year.

This, along with optimizations to the inference and application pipelines, allows OpenAI’s latest model to churn out answers in the blink of an eye.

As Spark is a proprietary model, we don’t have all the details on things like parameter count, as we would if OpenAI had released it on Hugging Face like it did with gpt-oss back in August. What we do know is that, like gpt-oss, it’s a text-only model with a 128,000-token context window.

If you’re not familiar, a model’s context window refers to how many tokens (words, punctuation, numbers, etc.) it can keep track of at any one time, which is why it’s often described as the model’s short-term memory.

While 128K tokens might sound like a lot, code assistants like Codex can blow through that pretty quickly, since the model has to keep track of both existing code and everything it generates. Even starting from a blank slate, at 1,000 tokens a second Spark would overflow the context limit in roughly two minutes (128,000 ÷ 1,000 ≈ 128 seconds).
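To put numbers on that, here's a quick back-of-the-envelope in Python. Spark's actual tokenizer hasn't been published, so OpenAI's o200k_base encoding (via its tiktoken library) stands in here purely for illustration.

```python
# How fast does a 128K-token window fill at Spark's claimed generation rate?
# o200k_base is a stand-in encoding -- Spark's real tokenizer is unknown.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")
snippet = "def fib(n):\n    return n if n < 2 else fib(n - 1) + fib(n - 2)\n"
print(f"{len(enc.encode(snippet))} tokens for {len(snippet)} characters")

CONTEXT_WINDOW = 128_000  # tokens
TOKENS_PER_SEC = 1_000    # Spark's claimed generation rate
fill_time = CONTEXT_WINDOW / TOKENS_PER_SEC
print(f"Window fills after ~{fill_time:.0f}s of continuous generation")  # ~128s
```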

That tight token budget might be why OpenAI says Spark defaults to a “lightweight” style that makes minimal, targeted edits and won’t run debug tests unless specifically asked.

A fast model isn’t much good if it can’t write working code. If OpenAI is to be believed, Spark delivers greater accuracy than GPT-5.1-Codex-Mini on Terminal-Bench 2.0 while also being much, much faster than the smarter GPT-5.3-Codex model.

OpenAI may be looking beyond GPUs, but it’s certainly not abandoning them anytime soon.

“GPUs remain foundational across our training and inference pipelines and deliver the most cost effective tokens for broad usage. Cerebras complements that foundation by excelling at workflows that demand extremely low latency,” OpenAI wrote.

This isn’t just lip service. As fast as Cerebras’ CS-3 accelerators are, they can’t match modern GPUs on memory capacity. SRAM may be fast, but it’s not space-efficient. The entire dinner-plate-sized chip contains just 44 GB of memory. By comparison, Nvidia’s Rubin will ship with 288 GB of HBM4, while AMD’s MI455X will pack on 432 GB.
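A crude way to see what that means for model sizes: at 8-bit precision a parameter costs one byte, so the weights alone cap out at roughly the memory capacity in billions of parameters. The sketch below uses the capacities cited above; everything else (8-bit weights, no KV cache or activation overhead) is a simplifying assumption.

```python
# Back-of-the-envelope: largest model whose 8-bit weights fit on one chip.
# Ignores KV cache, activations, and runtime overhead, so real limits are lower.
BYTES_PER_PARAM = 1  # 8-bit (FP8/INT8) weights

memory_gb = {
    "Cerebras CS-3 (SRAM)": 44,
    "Nvidia Rubin (HBM4)": 288,
    "AMD MI455X (HBM4)": 432,
}

for chip, gb in memory_gb.items():
    # 1 GB holds roughly 1 billion one-byte parameters
    print(f"{chip}: weights for a ~{gb / BYTES_PER_PARAM:.0f}B-parameter model, tops")
```

Real deployments also need headroom for the KV cache and activations, so the practical ceiling sits well below these figures.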

That capacity gap makes GPUs more economical for running very large models, especially when speed isn’t a priority. Having said that, OpenAI suggests that as Cerebras brings more compute online, it’ll bring its larger models to the platform, presumably for those willing to pay a premium for high-speed inference.

GPT-5.3-Codex-Spark is currently available in preview to Codex Pro users and via API to select OpenAI partners. ®