AI infrastructure company SambaNova has raised $350 million to advance its dataflow architecture, which it pitches as an alternative to GPU-based AI systems.
Some of the money came from Intel Capital, scotching rumors Chipzilla wanted to buy SambaNova. Other participants in this funding round include Vista Equity, Cambium Capital, and several other VC funds that expect strong returns when SambaNova brings its latest generation of reconfigurable dataflow units (RDUs) to market.
Intel will get especially close to the upstart with a “multi-year” collaboration that aims to provide customers an alternative to GPUs for generative AI deployments. Naturally, that means SambaNova’s new RDUs will use Xeon CPUs, but, beyond that, the alliance will include hardware-software co-design.
“We’ve got a product that’s very competitive. They’ve got scale; they’ve got capital; they’ve got customers that we can collaborate on,” SambaNova CEO Rodrigo Liang told El Reg.
Intel is not just off the pace in the generative AI arena – the giant has arguably missed the boat entirely following repeated missteps with its datacenter GPU and Gaudi product lines.
“As we evolve and expand our AI engagements from edge to cloud, we’re addressing these needs in multiple ways to remain a key player in the ecosystem and protect and grow market share,” Kevork Kechichian, EVP of Intel’s Datacenter Group, said in a statement.
SambaNova expects to ship its SN50 accelerators later this year, with Japan’s SoftBank already signed up as one of the startup’s first customers.
The SN50
The new chip represents a significant upgrade over SambaNova’s 2024-vintage SN40L. The company says the SN50 will deliver 2.5x higher 16-bit floating-point performance than its predecessor and 5x higher performance at FP8. That works out to 1.6 and 3.2 petaFLOPS respectively.
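Those headline figures are consistent with the commonly cited ~0.64 petaFLOPS BF16 rating for the SN40L — treat that baseline as an assumption rather than an official spec sheet number, but a quick back-of-the-envelope check lines up:

```python
# Sanity-check of SambaNova's claimed SN50 throughput, assuming the
# SN40L's often-quoted ~0.64 petaFLOPS dense BF16 baseline.
sn40l_bf16_pflops = 0.64              # assumed SN40L baseline
sn50_fp16 = sn40l_bf16_pflops * 2.5   # claimed 2.5x uplift at 16-bit
sn50_fp8 = sn40l_bf16_pflops * 5      # claimed 5x uplift at FP8

print(f"SN50 FP16: {sn50_fp16:.1f} PFLOPS")  # 1.6
print(f"SN50 FP8:  {sn50_fp8:.1f} PFLOPS")   # 3.2
```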
SambaNova says its signature three-tier memory hierarchy, which allows it to swap between models in a fraction of a second and efficiently offload key-value caches, remains largely unchanged. Each RDU features 432 MB of on-chip SRAM, 64 GB of HBM2E memory good for 1.8 TB/s of bandwidth, and between 256 GB and 2 TB of DDR5 memory.
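The idea behind the tiering can be sketched as a simple capacity ladder — this is an illustrative toy, not SambaNova's actual software, and the `place` helper is hypothetical; only the per-RDU capacities come from the article:

```python
# Toy sketch of a three-tier memory hierarchy like the SN50's:
# small fast on-chip SRAM, mid-tier HBM, and a large DDR pool for
# parked models and key-value caches. Capacities are per-RDU.
TIERS = [
    ("SRAM",  432 / 1024),  # 432 MB on-chip, expressed in GB
    ("HBM2E", 64),          # 64 GB at 1.8 TB/s
    ("DDR5",  2048),        # up to 2 TB
]

def place(size_gb: float) -> str:
    """Return the fastest (smallest) tier that can hold an object."""
    for name, capacity_gb in TIERS:
        if size_gb <= capacity_gb:
            return name
    raise ValueError("object exceeds largest tier")

print(place(0.2))  # small activation working set -> SRAM
print(place(40))   # hot model weights -> HBM2E
print(place(500))  # parked model awaiting a fast swap-in -> DDR5
```

The DDR tier is what makes the sub-second model swapping possible: a model parked in DDR5 is much quicker to pull into HBM than one fetched over the network or from disk.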
Flexibility on the latter point will no doubt win SambaNova some points considering the skyrocketing price of memory.
HBM2E might seem like an odd choice given its age, but Liang is keen to ensure his company can ship product at a time of rising memory prices. “From a cost perspective, it’s important to make sure that we don’t get into a supply chain fight,” he said.
While a big improvement over its predecessor, the SN50 doesn’t look all that impressive on paper, at least compared to modern GPUs. It’ll deliver about 64 percent of the dense FP8 compute, a third of the HBM capacity, and less than a quarter of the memory bandwidth of Nvidia’s nearly two-year-old Blackwell generation.
However, it’s important to remember that “peak” advertised FLOPS and bandwidth aren’t the same thing as achievable FLOPS or bandwidth. SambaNova argues that its dataflow architecture, which aims to reduce data movement overheads by overlapping computation and communication, allows it to use fewer, less powerful accelerators.
In the case of the SN50, SambaNova boasts it can deliver up to five times higher per-user generation speed compared to Nvidia’s B200.
SambaNova’s claims would be hard to swallow if it weren’t already one of the highest-performing inference providers. According to Artificial Analysis, SambaNova’s SN40L accelerators are able to serve up LLMs like the 230-billion-parameter MiniMax M2 model at up to 378 tokens per second, more than a hundred tokens per second faster than the next closest GPU-based inference provider.
Having said that, GPU-based inference platforms are catching up as Nvidia’s NVL72 racks see wider adoption. SambaNova’s performance also varies from model to model, so it is not a clear leader in all scenarios. We should also note that Nvidia seems to have gotten the memo on dataflow, having acquihired Groq’s engineering team and licensed its architecture late last year.
While SambaNova says it doesn’t need ultra-dense racks to be competitive, the company has designed its new architecture to scale.
For the SN50, a single inference worker can now scale across up to 256 accelerators, more than 3.5x the number found in Nvidia’s NVL72 rack. But with just 16 air-cooled RDUs and 15-30 kW per rack, SambaNova isn’t packaging its chips nearly as densely.
This larger scale-up domain is aided by faster interconnects. SambaNova tells us it equipped each RDU with 2.2 TB/s of bidirectional chip-to-chip bandwidth via a switched fabric.
Driving utilization
Inference performance isn’t SambaNova’s only shtick. The large pool of DDR5 memory available to each accelerator enables SambaNova to quickly move customer models and key-value caches – essentially the model’s short-term memory – in and out of memory in a matter of milliseconds.
“As we move into the world of agents, one of the things that you’re starting to see is the customization of these models is causing these racks to run really inefficiently,” Liang said. “Everybody wants their own models, but they don’t use their own models to the same level that a shared model would be used.”
In other words, when everyone is accessing a common model, it’s relatively easy to maintain high utilization, but when everyone is running their own model, this becomes much more difficult for service providers to manage.
“The economics for every player today are not as good as they need to be for scale,” Liang said. “What we spent the better part of 2025 doing was actually getting the product to the point where, per rack, we had the right economics for inference so that service providers could actually make a profit serving tokens.”
Having accomplished this, Liang reckons SambaNova’s focus moving forward will be on selling infrastructure rather than following companies like Groq down the path of building a dedicated inference cloud. ®