As AI clusters grow larger, they are beginning to use optical interconnects for scale-out connectivity. The day when they also require optical interconnects for scale-up connectivity may be approaching. To prepare for that day, hyperscalers Microsoft, Meta, and OpenAI teamed up with hardware designers AMD, Broadcom, and Nvidia to develop a protocol-agnostic scale-up interconnection technology for AI clusters.

To do so, the group of companies this week established the Optical Compute Interconnect (OCI) Multi-Source Agreement (MSA) group, which will define an open optical connectivity specification for scale-up interconnections used inside large AI systems and racks. The goal is to let hyperscalers use optical cables instead of copper to connect more accelerators at high speed and with predictable power. In practice, this means the consortium will develop a common optical physical layer (PHY) and unified components to support various protocols, such as UALink for AMD and Broadcom, and NVLink for Nvidia.

“The growing need for optical scale-up interconnect to support large AI systems later this decade is clear,” said Brian Amick, Senior Vice President, Technology & Engineering at AMD. “AMD is a founding member and strong supporter of the OCI MSA as it establishes an open specification for the industry to foster a robust, multi-vendor optical scale-up interconnect ecosystem.”

The common optical layer will enable different processors and interconnect protocols to operate over the same fiber infrastructure and switches from different suppliers, ensuring flexibility for hyperscalers while retaining the competitive advantages of the protocols used by developers of AI accelerators, AI GPUs, XPUs, and other processors. In addition, the standardized OCI roadmap is meant to simplify system integration, reduce development risk, and shorten deployment cycles for new generations of AI hardware.

“Broadcom is proud to draw upon our multi-generational CPO platform and industry partnerships to drive the OCI specification forward,” said Near Margalit, Vice President & General Manager, Optical Systems Division at Broadcom. “The OCI-MSA allows for seamless integration with existing electrical SerDes-based ASICs while providing a clear path to direct ASIC integration, ensuring the ecosystem remains flexible and high-performing.”

While the OCI MSA group is headed by AMD, Broadcom, and Microsoft, which are known supporters of open industry standards, it is clearly not a traditional standards body like the Ultra Ethernet Consortium or the UALink Consortium, and that difference will affect how the technology is developed.

Firstly, the OCI MSA is hyperscaler-driven rather than vendor-driven. The arrangement is unlike most industry consortia, which are organized and led by independent hardware vendors (IHVs), IP companies, and networking suppliers.

Secondly, OCI targets a very specific architectural layer of AI systems — short-reach links that connect accelerators and switches within a scale-up domain. By contrast, traditional hardware development groups tend to standardize on a vertically integrated set of technologies to be widely adopted across a market or markets.

Thirdly, because the organization is a Multi-Source Agreement (MSA) group, it should, by design, move faster than a typical industry standards-setting body. MSAs are meant to enable select companies to align on electrical and optical interfaces and build interoperable products quickly, without the lengthy consensus processes typical of classic organizations like JEDEC or the Ultra Ethernet Consortium (which are designed to unite tens or hundreds of companies and support an entire industry). The OCI MSA, at least for now, will enable AMD, Broadcom, and Nvidia to build interoperable short-reach interconnections for Microsoft, Meta, and OpenAI.

“Nvidia is a founding member of the OCI MSA to establish a common optical standard across global AI infrastructures,” said Gilad Shainer, Senior Vice President of Networking at Nvidia. “By equipping best-in-class compute with state-of-the-art optics, the OCI MSA can deliver the scale and performance required by the next era of super-intelligence.”
