The challenge of building efficient cloud AI infrastructure has always been about scale – not just adding more servers, but making those servers work together seamlessly. At Huawei Connect 2025, the Chinese technology giant unveiled an approach that changes how cloud providers and enterprises can pool computing resources.

Instead of managing thousands of independent servers that communicate through traditional networking, Huawei’s SuperPod technology creates what executives describe as unified systems where physical infrastructure behaves as single logical machines. For cloud providers building AI services and enterprises deploying private AI clouds, this represents a significant shift in how infrastructure can be architected, managed, and scaled.

The cloud infrastructure problem SuperPod solves

Traditional cloud AI infrastructure faces a persistent challenge: as clusters grow larger, computing efficiency actually decreases. This happens because individual servers in a cluster remain somewhat independent, communicating through network protocols that introduce latency and complexity. The result is what industry professionals call “scaling penalties” – where adding more hardware doesn’t proportionally increase usable computing power.
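
To make the idea concrete, here is a toy model of how a fixed communication cost per node erodes usable compute as a cluster grows. All figures are invented for illustration and are not Huawei's numbers:

```python
# Toy model of the "scaling penalty": as a cluster grows, a larger share of
# each step is spent on network communication, so usable compute grows
# sub-linearly. All numbers here are illustrative.

def effective_pflops(nodes: int, per_node_pflops: float = 1.0,
                     comm_overhead_per_node: float = 0.0005) -> float:
    """Usable PFLOPS after subtracting a communication-time share that
    grows with cluster size (a crude stand-in for protocol latency)."""
    efficiency = 1.0 / (1.0 + comm_overhead_per_node * nodes)
    return nodes * per_node_pflops * efficiency

for n in (8, 64, 512, 4096):
    usable = effective_pflops(n)
    print(f"{n:>5} nodes: {usable:8.1f} usable PFLOPS "
          f"({usable / n:.0%} of linear scaling)")
# Per-node efficiency falls as the cluster grows -- the gap a
# single-logical-machine interconnect aims to close.
```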

Yang Chaobin, Huawei’s Director of the Board and CEO of the ICT Business Group, explained that the company developed “the groundbreaking SuperPod architecture based on our UnifiedBus interconnect protocol. The architecture deeply interconnects physical servers so that they can learn, think, and reason like a single logical server.”

This isn’t just faster networking; it’s a re-architecting of how cloud AI infrastructure can be constructed.

The technical foundation: UnifiedBus protocol

At the core of Huawei’s cloud AI infrastructure approach is UnifiedBus, an interconnect protocol designed specifically for massive-scale resource pooling. The protocol addresses two important infrastructure challenges that have limited cloud AI deployments: maintaining reliability over long distances in data centres, and optimising the bandwidth-latency trade-off that affects performance.

Traditional data centre connectivity relies on either copper cables (high bandwidth but short range, typically connecting just two racks) or optical cables (longer range but with reliability concerns at scale). For cloud providers building infrastructure to support thousands of AI processors, neither option has proven ideal.

Eric Xu, Huawei’s Deputy Chairman and Rotating Chairman, said solving these fundamental connectivity challenges was essential to the company’s cloud AI infrastructure strategy. Drawing on what he described as Huawei’s three decades of connectivity expertise, Xu detailed the breakthrough solutions: “We have built reliability into every layer of our interconnect protocol, from the physical layer and data link layer, all the way up to the network and transmission layers. There is 100-ns-level fault detection and protection switching on optical paths, making any intermittent disconnections or faults of optical modules imperceptible at the application layer.”

The result is what Huawei describes as an optical interconnect that’s 100 times more reliable than conventional approaches, supporting connections over 200 metres in data centres while maintaining the reliability characteristics typically associated with copper connections.
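
Some back-of-envelope arithmetic shows why a 100 ns switchover window is meaningful at this scale. The calculation below assumes the standard ~2×10^8 m/s signal speed in optical fibre; the 1 TB/s per-link rate is a hypothetical figure for illustration:

```python
# Back-of-envelope: why 100 ns fault detection matters at 200 m reach.

C_FIBRE = 2.0e8            # m/s, approx. propagation speed in optical fibre
SPAN_M = 200               # quoted maximum in-data-centre reach
DETECT_S = 100e-9          # quoted fault-detection window (100 ns)
LINK_BYTES_PER_S = 1e12    # hypothetical 1 TB/s per-link rate

one_way_latency = SPAN_M / C_FIBRE
print(f"one-way propagation over {SPAN_M} m: {one_way_latency * 1e9:.0f} ns")

at_risk = DETECT_S * LINK_BYTES_PER_S
print(f"data in flight during a 100 ns fault window: {at_risk / 1e3:.0f} KB")
# A 100 ns switchover is an order of magnitude faster than the ~1,000 ns
# a signal needs just to cross the 200 m span -- which is what lets
# optical faults stay invisible to the application layer.
```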

SuperPod configurations: From enterprise to hyperscale

Huawei’s cloud AI infrastructure product line spans multiple scales, each designed for different deployment scenarios. The Atlas 950 SuperPod represents the flagship implementation, featuring up to 8,192 Ascend 950DT AI processors configured in 160 cabinets occupying 1,000 square metres of data centre space.

The system delivers 8 EFLOPS in FP8 precision and 16 EFLOPS in FP4 precision, with 1,152 TB of total memory capacity. The interconnect specifications reveal the architecture’s ambitions: 16 PB/s bandwidth across the entire system.

As Xu noted, “This means a single Atlas 950 SuperPod will have an interconnect bandwidth over 10 times higher than the entire globe’s total peak Internet bandwidth.” The level of internal connectivity enables the system to maintain linear performance scaling – adding more processors genuinely increases usable computing power proportionally.

For larger cloud deployments, the Atlas 960 SuperPod incorporates 15,488 Ascend 960 processors across 220 cabinets occupying 2,200 square metres, delivering 30 EFLOPS in FP8 and 60 EFLOPS in FP4, with 4,460 TB of memory and 34 PB/s interconnect bandwidth. The Atlas 960 will be available in the fourth quarter of 2027.
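
Dividing the quoted system totals by the processor counts gives a rough per-chip picture for both generations. These are naive divisions for orientation, not published per-chip specifications:

```python
# Naive per-processor arithmetic derived from the quoted system totals.

systems = {
    # name: (processors, FP8 EFLOPS, memory TB, interconnect PB/s)
    "Atlas 950 SuperPod": (8_192, 8, 1_152, 16),
    "Atlas 960 SuperPod": (15_488, 30, 4_460, 34),
}

for name, (chips, eflops_fp8, mem_tb, bw_pbs) in systems.items():
    print(name)
    print(f"  FP8 per chip:       {eflops_fp8 * 1e3 / chips:6.2f} PFLOPS")
    print(f"  memory per chip:    {mem_tb * 1e3 / chips:6.1f} GB")
    print(f"  bandwidth per chip: {bw_pbs * 1e3 / chips:6.2f} TB/s")
# Roughly 1 PFLOPS / 141 GB / 2.0 TB/s per chip for the Atlas 950, and
# roughly 1.9 PFLOPS / 288 GB / 2.2 TB/s per chip for the Atlas 960.
```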

Implications for cloud service delivery

Beyond the flagship SuperPod products, Huawei introduced cloud AI infrastructure configurations designed specifically for enterprise data centres. The Atlas 850 SuperPod, positioned as “the industry’s first air-cooled SuperPoD server designed for enterprises,” features eight Ascend NPUs and supports flexible multi-cabinet deployment up to 128 units with 1,024 NPUs.

Significantly, this configuration can be deployed in standard air-cooled equipment rooms, avoiding the infrastructure modifications required for liquid cooling systems. For cloud providers and enterprises, this presents practical deployment flexibility. Organisations can implement SuperPod architecture without necessarily requiring complete data centre redesigns, potentially accelerating adoption timelines.

SuperCluster architecture: Hyperscale cloud deployment

Huawei’s vision extends beyond individual SuperPods to what the company calls SuperClusters – massive cloud AI infrastructure deployments comprising multiple interconnected SuperPods. The Atlas 950 SuperCluster will incorporate 64 Atlas 950 SuperPods, creating a system with over 520,000 AI processors in more than 10,000 cabinets, delivering 524 EFLOPS in FP8 precision.

An important technical decision affects how cloud providers might deploy these systems. The Atlas 950 SuperCluster supports both UBoE (UnifiedBus over Ethernet) and RoCE (RDMA over Converged Ethernet) protocols. UBoE enables UnifiedBus to run over standard Ethernet infrastructure, allowing cloud providers to potentially integrate SuperPod technology with existing data centre networks.

According to Huawei’s specifications, UBoE clusters demonstrate lower static latency and higher reliability compared to RoCE clusters, while requiring fewer switches and optical modules. For cloud providers planning large-scale deployments, this could translate to both performance and economic advantages.
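
As a rough illustration of what "static latency" means here: the fixed cost of a fabric is essentially switch hops plus fibre propagation, so a flatter fabric with fewer hops needs fewer switches and optical modules and pays less fixed latency per message. The numbers below are invented to show the relationship, not measurements of UBoE or RoCE:

```python
# Illustrative only: how hop count and per-hop switch latency compose
# into the "static latency" of a cluster fabric.

def static_latency_ns(hops: int, per_hop_ns: float, fibre_m: float) -> float:
    """End-to-end latency with no queuing: switch hops plus propagation."""
    propagation_ns = fibre_m / 0.2   # ~0.2 m per ns in optical fibre
    return hops * per_hop_ns + propagation_ns

print(f"3-hop fabric: {static_latency_ns(3, 300, 100):,.0f} ns")
print(f"5-hop fabric: {static_latency_ns(5, 300, 100):,.0f} ns")
```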

The Atlas 960 SuperCluster, scheduled for fourth quarter 2027 availability, will integrate more than one million NPUs to deliver 2 ZFLOPS (zettaFLOPS) in FP8 and 4 ZFLOPS in FP4. The specifications position the system for what Xu described as future AI models “with over 1 trillion or 10 trillion parameters.”
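
A quick cross-check, using simple arithmetic on the figures quoted in this article, shows the cluster totals are consistent with the per-chip throughput implied by the SuperPod specs:

```python
# Cross-checking the quoted SuperCluster totals against the
# single-SuperPod figures (arithmetic on numbers quoted above).

a950_chips = 64 * 8_192                    # 64 SuperPods x 8,192 chips
print(f"Atlas 950 SuperCluster: {a950_chips:,} processors")  # 524,288
print(f"  per-chip FP8: {524e3 / a950_chips:.2f} PFLOPS")    # ~1.0

a960_chips = 1_000_000                     # "more than one million NPUs"
print(f"Atlas 960 SuperCluster per-chip FP8: "
      f"{2e6 / a960_chips:.2f} PFLOPS")    # 2 ZFLOPS = 2,000,000 PFLOPS
# Both match the per-chip figures implied by the SuperPod specs, which
# suggests the cluster numbers assume near-linear scaling.
```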

Beyond AI: General-purpose cloud infrastructure

The SuperPod architecture’s implications extend beyond AI workloads into general-purpose cloud computing through the TaiShan 950 SuperPod. Built on Kunpeng 950 processors featuring up to 192 cores and 384 threads, this system addresses enterprise requirements for mission-critical applications traditionally run on mainframes, Oracle’s Exadata database servers, and mid-range computers.

The TaiShan 950 SuperPod supports up to 16 nodes with 32 processors and 48 TB of memory, incorporating memory pooling, SSD pooling, and DPU (Data Processing Unit) pooling. When integrated with Huawei’s distributed GaussDB database, the system delivers what the company claims is a 2.9x performance improvement over traditional architectures without requiring application modifications.
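
Breaking the quoted maximums down per node gives a sense of the pooled resources. This is simple division of the stated totals; actual node configurations may differ:

```python
# Rough per-node view of the quoted TaiShan 950 SuperPod maximums.

NODES = 16
PROCESSORS = 32          # Kunpeng 950, up to 192 cores / 384 threads each
MEMORY_TB = 48
CORES_PER_CPU = 192
THREADS_PER_CPU = 384

print(f"processors per node: {PROCESSORS // NODES}")
print(f"memory per node:     {MEMORY_TB / NODES:.1f} TB")
print(f"total cores:         {PROCESSORS * CORES_PER_CPU:,}")
print(f"total threads:       {PROCESSORS * THREADS_PER_CPU:,}")
# => 2 CPUs and 3 TB of pooled memory per node; 6,144 cores and
#    12,288 threads across the full 16-node pool.
```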

For cloud providers serving enterprise customers, this presents significant opportunities for cloud-native infrastructure. Beyond databases, Huawei claims the TaiShan 950 SuperPod improves memory utilisation by 20% in virtualised environments and accelerates Spark workloads by 30%.

The open architecture strategy

Perhaps most significant for the broader cloud AI infrastructure market, Huawei announced that UnifiedBus 2.0 technical specifications would be released as open standards. The company is providing open access to both hardware and software components: NPU modules, air-cooled and liquid-cooled blade servers, AI cards, CPU boards, cascade cards, CANN compiler tools, Mind series application kits, and openPangu foundation models – all by December 31, 2025.

Yang framed this as ecosystem development: “We are committed to our open-hardware and open-source-software approach that will help more partners develop their own industry-scenario-based SuperPod solutions. This will accelerate developer innovation and foster a thriving ecosystem.”

For cloud providers and system integrators, this open approach potentially lowers barriers to deploying SuperPod-based infrastructure. Rather than being locked into single-vendor solutions, partners can develop customised implementations using UnifiedBus specifications.

Market validation and deployment reality

The SuperPod architecture has already seen real-world deployment. Over 300 Atlas 900 A3 SuperPod units have shipped in 2025, deployed by more than 20 customers across the internet, finance, carrier, electricity, and manufacturing sectors. That deployment scale provides some validation that the architecture functions beyond laboratory demonstrations.

Xu acknowledged the context shaping Huawei’s infrastructure strategy: “The Chinese mainland will lag behind in semiconductor manufacturing process nodes for a relatively long time,” adding that “Sustainable computing power can only be achieved with process nodes that are practically available.”

The statement frames the SuperPod architecture as a strategic response to constraints – achieving competitive performance through architectural innovation rather than solely through advanced semiconductor manufacturing.

What this means for cloud infrastructure evolution

Huawei’s SuperPod architecture represents a specific bet on how cloud AI infrastructure should evolve: toward tighter integration and resource pooling at massive scale, enabled by purpose-built interconnect technology. Whether this approach proves more effective than alternatives – like loosely coupled clusters with sophisticated software orchestration – remains to be demonstrated at hyperscale production deployments.

For cloud providers, the open architecture strategy introduces options for building AI infrastructure without necessarily adopting the tightly integrated hardware-software approaches dominant among Western competitors. For enterprises evaluating private cloud AI infrastructure, SuperPod configurations like the air-cooled Atlas 850 present deployment paths that don’t require complete data centre redesigns.

The broader implication concerns how cloud AI infrastructure might be architected in markets where access to the most advanced semiconductor manufacturing remains constrained. Huawei’s approach suggests that architectural innovation in interconnect, resource pooling, and system design can potentially compensate for limitations in individual processor capabilities – a proposition that will be tested as these systems scale to production workloads in diverse cloud deployment scenarios.

