Researchers are increasingly exploring photonic in-memory computing as a potential solution to the limitations of conventional digital systems, promising faster, more energy-efficient computation. Jebacyril Arockiaraj, Sasindu Wijeratne, and Sugeet Sunder from the University of Southern California, together with Md Abdullah-Al Kaiser, Akhilesh Jaiswal, and Ajey P Jacob from the University of Wisconsin-Madison, present a comprehensive system-level performance model to assess this emerging technology accurately. Their work captures critical latency factors, such as memory access and opto-electronic conversion, and demonstrates performance across diverse high-performance computing workloads, including fluid dynamics, tensor operations, and plasma physics simulations. The model shows that a compact photonic SRAM array can achieve up to 1.5 TOPS with an average energy efficiency of 2.5 TOPS/W, suggesting photonic in-memory computing could deliver substantial gains for demanding applications.
Photonic SRAM array performance modelling for high-throughput computation requires careful consideration of device characteristics
Scientists have developed a compact photonic SRAM array capable of sustaining up to 1.5 TOPS on demanding high-performance computing workloads. This research introduces a comprehensive system-level performance model for photonic in-memory computing, addressing a critical gap in evaluating the technology’s potential beyond device-level advantages.
The work captures key latency sources, including external memory access and opto-electronic conversion, to provide a realistic assessment of system performance. Researchers fabricated a 1×256 bit single-wavelength photonic SRAM array in GlobalFoundries' standard silicon photonics process, demonstrating significant computational throughput.
This study moves beyond theoretical performance by mapping algorithms directly to the photonic hardware, evaluating its impact on real-world applications. The team developed streaming algorithms for three diverse workloads: the Sod shock tube problem, Matricized Tensor Times Khatri-Rao Product (MTTKRP), and the Vlasov-Maxwell equation.
Performance modelling reveals the array achieves up to 1.5 TOPS on the Sod shock tube problem, 0.9 TOPS on MTTKRP, and 1.3 TOPS on the Vlasov-Maxwell equation, with an average energy efficiency of 2.5 TOPS/W. These results indicate that photonic in-memory computing offers a viable path towards overcoming the limitations of traditional CMOS technology.
The research establishes a network model abstraction of the pSRAM array, enabling systematic algorithm-to-hardware mapping through well-defined computational primitives. This abstraction allows any algorithm expressible using these primitives to be efficiently mapped onto the pSRAM architecture. A detailed system-level performance model was created, capturing latency contributions from the pSRAM itself, external memory, and the crucial opto-electronic conversion process. Through roofline analysis, the study identifies compute- and memory-bound regimes, providing valuable insights for architectural trade-offs related to bandwidth, frequency, array size, and conversion latency.
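To make the shape of such a model concrete, the Python sketch below combines a roofline throughput estimate with a serialised latency sum over the three system components. Every parameter value in it is an illustrative assumption, not a figure from the paper.

```python
# Minimal roofline-style sketch of a system-level performance model.
# All parameter values are illustrative assumptions, not figures from
# the paper.

def attainable_tops(ops_per_byte,
                    peak_tops=1.5,       # assumed compute ceiling of the array
                    mem_bw_gbs=1200):    # assumed HBM-class bandwidth in GB/s
    """Attainable throughput is the lower of the compute roof and the
    memory roof (bandwidth x arithmetic intensity)."""
    memory_roof_tops = mem_bw_gbs * ops_per_byte / 1e3  # Gops/s -> TOPS
    return min(peak_tops, memory_roof_tops)

def kernel_latency_s(ops, bytes_moved, conversions,
                     peak_ops_per_s=1.5e12,
                     mem_bw_bytes_per_s=1.2e12,
                     oe_conversion_s=1e-9):  # assumed per-conversion cost
    """Total latency as the sum of pSRAM compute time, external memory
    time, and opto-electronic conversion overhead (serialised, so the
    estimate is conservative)."""
    return (ops / peak_ops_per_s
            + bytes_moved / mem_bw_bytes_per_s
            + conversions * oe_conversion_s)

print(f"{attainable_tops(1.0):.2f} TOPS attainable at 1 op/byte")
```

A kernel whose arithmetic intensity puts the memory roof below the compute roof is memory-bound; otherwise it is compute-bound, which is exactly the distinction the roofline analysis draws.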
Photonic SRAM array characterisation and system-level latency modelling are crucial for performance evaluation
A compact 1×256 bit single-wavelength photonic SRAM array, fabricated in GlobalFoundries' standard silicon photonics process, underpins the performance evaluation presented in this work. This array serves as the core compute element within a three-part system also comprising electrical external memory and an opto-electronic interface, all identified as primary contributors to system latency.
Researchers developed a system-level performance model to capture latency attributions from each component, enabling a comprehensive assessment of photonic in-memory computing. To facilitate systematic algorithm-to-hardware mapping, a network model abstraction of the pSRAM array was introduced, encapsulating hardware capabilities through well-defined computational primitives.
This abstraction allows for the translation of algorithms into operations executable by the pSRAM architecture. Streaming algorithms were then developed for three diverse high-performance computing workloads: the Sod shock tube problem, Matricized Tensor Times Khatri-Rao Product (MTTKRP), and the Vlasov-Maxwell equation.
These algorithms operate without intermediate optical storage, establishing a conservative performance baseline for evaluation. The Sod shock tube problem, a standard benchmark for numerical solvers of the Euler equations, was used alongside MTTKRP, a computationally intensive kernel in tensor decompositions for machine learning, and the Vlasov-Maxwell equation, which models charged-particle distributions. Performance analysis revealed that the fabricated array sustains up to 1.5 TOPS on the Sod shock tube problem, 0.9 TOPS on MTTKRP, and 1.3 TOPS on the Vlasov-Maxwell equation, achieving an average energy efficiency of 2.5 TOPS/W while accounting for system overheads.
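A minimal sketch of this shared streaming pattern follows, with hypothetical read/compute/write callables standing in for hardware interfaces the paper does not spell out.

```python
# Streaming execution pattern shared by the three workload mappings:
# operands are fetched from external memory, converted to the optical
# domain, computed on in the pSRAM array, converted back, and written
# out. Nothing is held in intermediate optical storage, which is what
# makes the baseline conservative. All helpers here are hypothetical.

def e_to_o(x):
    return x  # placeholder for the electro-optic conversion step

def o_to_e(x):
    return x  # placeholder for the opto-electronic conversion step

def stream_kernel(num_blocks, read_block, psram_compute, write_block):
    for b in range(num_blocks):
        operands = read_block(b)                          # external memory read
        result = o_to_e(psram_compute(e_to_o(operands)))  # photonic compute
        write_block(b, result)                            # write back at once

# Example usage with trivial callables:
out = {}
stream_kernel(4,
              read_block=lambda b: [b, b + 1],
              psram_compute=lambda ops: sum(ops),
              write_block=out.__setitem__)
print(out)  # {0: 1, 1: 3, 2: 5, 3: 7}
```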
Photonic SRAM delivers teraops-scale acceleration for fluid dynamics and tensor algebra applications
A compact 1×256 bit single-wavelength photonic SRAM array sustains up to 1.5 TOPS on the Sod shock tube problem, demonstrating high computational throughput. The system achieves 0.9 TOPS for the Matricized Tensor Times Khatri-Rao Product (MTTKRP) workload and 1.3 TOPS for the Vlasov-Maxwell equation, indicating broad applicability across high-performance computing tasks.
These performance levels are attained with an average energy efficiency of 2.5 TOPS/W, highlighting the potential for low-power computing solutions. The research introduces a network model of the photonic SRAM (pSRAM) array, enabling structured algorithm-to-hardware mapping for streaming algorithms. These algorithms fetch input data from external memory and write results back without intermediate optical storage, establishing a conservative performance baseline.
The 1D Sod shock tube problem, mapped onto the network, uses a streaming algorithm in which each grid point is updated by a dedicated compute cell per time step. For MTTKRP, the streaming algorithm computes the mode-0 product of a 3-mode tensor, assigning each factor-matrix row to a compute cell in the 1D mesh network.
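For reference, the mode-0 MTTKRP contraction those cells jointly evaluate can be written in a few lines of numpy. The sketch is purely functional; the row-to-cell assignment is indicated only in the comments, since modelling the mesh itself is beyond its scope.

```python
import numpy as np

# Mode-0 MTTKRP of a 3-mode tensor X (I x J x K) with factor matrices
# B (J x R) and C (K x R):
#     M[i, r] = sum over j, k of X[i, j, k] * B[j, r] * C[k, r]
# In the streaming mapping, each factor-matrix row would be assigned
# to a compute cell of the 1D mesh; this reference version evaluates
# the whole contraction at once to show what those cells compute.

def mttkrp_mode0(X, B, C):
    return np.einsum('ijk,jr,kr->ir', X, B, C)

rng = np.random.default_rng(0)
I, J, K, R = 8, 6, 5, 4
X = rng.random((I, J, K))
B = rng.random((J, R))
C = rng.random((K, R))
M = mttkrp_mode0(X, B, C)
assert M.shape == (I, R)
```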
The Vlasov-Maxwell equation is addressed with a streaming algorithm mapping each Fourier mode index to a dedicated compute cell, performing elementwise complex multiplication. Roofline analysis, using HBM3E as external memory, reveals that the Sod shock tube problem and Vlasov-Maxwell equation are compute-bound, while MTTKRP is memory-bound.
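Returning to the per-cell operation for a moment: each elementwise complex product decomposes into four real multiplies and two adds, a natural fit for MAC-style compute primitives. A minimal functional sketch:

```python
import numpy as np

# Elementwise complex multiplication over Fourier modes, one mode per
# compute cell in the Vlasov-Maxwell mapping. Each complex product
#     (a + bi)(c + di) = (ac - bd) + (ad + bc)i
# decomposes into four real multiplies and two adds, i.e. a handful
# of MAC-style primitive operations per mode.

def complex_elementwise(u_hat, v_hat):
    a, b = u_hat.real, u_hat.imag
    c, d = v_hat.real, v_hat.imag
    return (a * c - b * d) + 1j * (a * d + b * c)

rng = np.random.default_rng(1)
u_hat = rng.random(8) + 1j * rng.random(8)  # toy spectral coefficients
v_hat = rng.random(8) + 1j * rng.random(8)
assert np.allclose(complex_elementwise(u_hat, v_hat), u_hat * v_hat)
```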
Reducing input bit width increases the number of operations per byte transferred, shifting memory-bound workloads closer to the compute limit. Increasing peak external memory bandwidth and pSRAM operating frequency further enhances performance for both compute- and memory-intensive tasks.
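A worked example of the bit-width effect, under the simplifying assumption of a fixed operation count per element (all numbers illustrative):

```python
# Effect of input bit width on arithmetic intensity (illustrative
# numbers only). With a fixed operation count per element, narrower
# elements mean fewer bytes fetched per operation, which raises the
# memory roof in the roofline model.

def ops_per_byte(ops_per_element, bits_per_element):
    return ops_per_element / (bits_per_element / 8)

for bits in (32, 16, 8):
    ai = ops_per_byte(ops_per_element=2, bits_per_element=bits)
    print(f"{bits:>2}-bit inputs -> {ai:.1f} ops/byte")
# 0.5, 1.0 and 2.0 ops/byte respectively: each halving of the bit
# width doubles the throughput attainable in the memory-bound regime.
```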
Photonic SRAM performance evaluation using Sod, MTTKRP and Vlasov-Maxwell workloads demonstrates promising results
Researchers have developed a comprehensive system-level performance model for photonic in-memory computing, demonstrating its potential as a high-speed, low-energy alternative to conventional computing methods. This model captures key latency sources, including external memory access and opto-electronic conversion, and was used to evaluate performance across several high-performance computing workloads, namely the Sod shock tube problem, Matricized Tensor Times Khatri-Rao Product (MTTKRP), and the Vlasov-Maxwell equation.
The performance analysis reveals that a compact 1×256 bit single-wavelength photonic SRAM array sustains up to 1.5 TOPS, 0.9 TOPS, and 1.3 TOPS on the tested workloads, achieving an average energy efficiency of 2.5 TOPS/W. Scalability was confirmed, with peak and sustained performance increasing with array size, although bandwidth limitations were observed at 32 GHz with larger arrays.
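A back-of-envelope version of that bandwidth ceiling, with every parameter assumed for illustration, shows why larger arrays at 32 GHz can outrun a fixed external bandwidth:

```python
# Back-of-envelope check of where external bandwidth becomes the
# limit as the array scales (every number here is an assumption for
# illustration). Operand demand grows linearly with cell count and
# operating frequency, so a fixed memory bandwidth is eventually
# saturated by a large enough array.

def required_bw_gbs(cells, freq_ghz, bytes_per_op=1):
    # GHz * bytes = GB/s, assuming one fresh operand byte per op
    return cells * freq_ghz * bytes_per_op

AVAILABLE_GBS = 1200  # assumed HBM-class external bandwidth
for cells in (32, 64, 128, 256):
    need = required_bw_gbs(cells, freq_ghz=32)
    verdict = "bandwidth-limited" if need > AVAILABLE_GBS else "within budget"
    print(f"{cells:>3} cells @ 32 GHz: {need:>5} GB/s needed -> {verdict}")
```

With these assumed figures a 32-cell array stays within budget while 64 cells and beyond exceed it, mirroring the reported trend that bandwidth becomes the constraint as the array grows.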
The authors acknowledge that the evaluation was performed with a pSRAM containing 32 compute cells, suggesting that throughput could be further improved by scaling memory bandwidth with larger arrays. Future research may focus on optimising array size and bandwidth to maximise performance and energy efficiency, potentially paving the way for more powerful and sustainable computing systems.