The increasing complexity of quantum algorithms demands powerful emulation tools, yet current emulators often struggle with both performance and versatility. To address this need, Tran Van Duy, Tuan Hai Vu, and Vu Trung Duong Le, all from the Nara Institute of Science and Technology, together with Hoai Luan Pham and Yasuhiko Nakashima, present HPQEA, a quantum emulator designed for both scalability and high performance. HPQEA combines a state-vector approach with a high-performance computing core and optimised gate computation, and makes effective use of high-bandwidth memory. Demonstrations on the AMD Alveo U280 board show that HPQEA can accurately emulate quantum circuits of up to 30 qubits, beyond the capabilities of existing FPGA-based systems, and that it exceeds the normalised gate speed of an Nvidia A100 GPU for circuits of up to 20 qubits, a significant advance in quantum simulation.

Achieving high performance and efficient resource utilisation remains a significant challenge for quantum emulation. To address it, the researchers developed HPQEA, a quantum emulator based on the state-vector emulation approach. HPQEA incorporates a high-performance computing core, an optimised strategy for calculating controlled-NOT (CX) gates, and effective use of high-bandwidth memory. Verification and evaluation on the Alveo U280 board show that HPQEA can emulate quantum circuits with up to 30 qubits while maintaining high fidelity and low mean square error. It outperforms comparable FPGA-based systems, delivering faster execution, supporting a wider range of algorithms, and requiring fewer hardware resources.

FPGA Accelerator for Scalable Quantum Simulation

Scientists developed the High-Performance Quantum Emulation Accelerator (HPQEA), a system that simulates quantum circuits with a state-vector-based approach, to overcome the scalability, performance, and resource-utilisation limits of existing emulation techniques. Its hardware architecture is optimised for both computation and storage, with the goals of broad algorithm compatibility and emulation of circuits with up to 30 qubits. The accelerator runs on an AMD Alveo U280 FPGA board, paired with a host PC that handles program generation and data transfer. Two Processing Element Arrays (PEAs) accelerate computation and optimise memory usage, allowing quantum gate operations to be processed in parallel.
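Why can gate operations be spread across parallel processing elements at all? In a state-vector emulator, a single-qubit gate on qubit k updates amplitudes in independent pairs (i, i + 2^k), so each pair could in principle go to a different unit. The sketch below illustrates that property in plain Python. It is not HPQEA's hardware datapath: the function names are hypothetical, and Qiskit's little-endian qubit ordering (qubit 0 is the least significant index bit) is assumed.

```python
import math

def apply_single_qubit_gate(state, gate, target):
    """Apply a 2x2 gate (nested lists) to qubit `target` of a state vector.

    Assumes little-endian ordering: qubit 0 is the least significant index bit.
    """
    stride = 1 << target  # distance between the two amplitudes in each pair
    out = list(state)
    for i in range(len(state)):
        if i & stride:
            continue  # i is the upper half of a pair already handled
        a0, a1 = state[i], state[i | stride]
        # Each pair depends only on its own two inputs, so pairs are independent
        # and could be processed by separate processing elements in parallel.
        out[i] = gate[0][0] * a0 + gate[0][1] * a1
        out[i | stride] = gate[1][0] * a0 + gate[1][1] * a1
    return out

def pair_indices(num_qubits, target):
    """Hypothetical helper: list the amplitude index pairs a gate on `target` touches."""
    stride = 1 << target
    return [(i, i | stride) for i in range(1 << num_qubits) if not i & stride]

s = 1 / math.sqrt(2)
HADAMARD = [[s, s], [s, -s]]
print(apply_single_qubit_gate([1, 0, 0, 0], HADAMARD, 0))  # amplitudes of (|00> + |01>)/sqrt(2)
print(pair_indices(2, 1))  # → [(0, 2), (1, 3)]
```

For n qubits every single-qubit gate yields 2^(n-1) independent pairs, which is the kind of data parallelism an array of processing elements can exploit.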

A dedicated Optimised CX Swapper improves data-transfer scheduling, reducing latency when executing controlled-NOT (CX) gates, a critical two-qubit operation. To meet the substantial memory demands of quantum simulation, HPQEA uses high-bandwidth memory (HBM) with a bulk data-transfer strategy for large-scale data access and storage. The workflow begins with a Python-based Qiskit program that generates the initial quantum state and circuit context, which a C program then processes. The processed data is transferred to the HPQEA hardware via Direct Memory Access (DMA), which moves data without involving the CPU in each transfer and so minimises transfer bottlenecks.
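A CX gate lends itself to a dedicated swapper because, on a state vector, it performs no arithmetic: it only exchanges amplitudes whose control bit is 1 and whose target bits differ. The sketch below shows that property. It does not reproduce the Optimised CX Swapper's actual scheduling, which this summary does not describe; the function names are hypothetical and little-endian qubit ordering is assumed.

```python
def apply_cx(state, control, target):
    """Apply CNOT by swapping amplitudes; no multiplications are needed."""
    c_mask, t_mask = 1 << control, 1 << target
    out = list(state)
    for i in range(len(state)):
        # Visit each swap pair once: control bit set, target bit clear.
        if i & c_mask and not i & t_mask:
            j = i | t_mask
            out[i], out[j] = state[j], state[i]
    return out

def cx_swap_schedule(num_qubits, control, target):
    """Hypothetical helper: list the (i, j) amplitude swaps a CX requires."""
    c_mask, t_mask = 1 << control, 1 << target
    return [(i, i | t_mask) for i in range(1 << num_qubits)
            if i & c_mask and not i & t_mask]

# Amplitudes labelled by their index so the permutation is easy to see.
print(apply_cx([0, 1, 2, 3], control=0, target=1))  # → [0, 3, 2, 1]
print(cx_swap_schedule(2, 0, 1))                     # → [(1, 3)]
```

Because a CX on n qubits reduces to 2^(n-2) independent swaps, its cost is dominated by memory movement rather than computation, which is why scheduling the data transfers well matters.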

Within HPQEA, each Processing Element contains an Arithmetic Logic Unit (ALU) and a Special Unit (SU) that perform the calculations required by quantum gate operations. The system decomposes a quantum circuit into low-level gates and applies them sequentially to the initial state, arriving at the same final state that an ideal quantum computer would produce. Together, these features let HPQEA emulate increasingly complex quantum algorithms with improved efficiency and scalability.
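As a small illustration of decomposition followed by sequential application, a SWAP gate can be rewritten as three CX gates (a standard identity) and applied one after another; the result matches swapping the two qubits directly. The tuple-based gate list, the helper names, and the choice of SWAP are assumptions for this sketch and do not reflect HPQEA's instruction format.

```python
def apply_cx(state, control, target):
    """Apply CNOT to a state vector (little-endian) by swapping amplitudes."""
    c_mask, t_mask = 1 << control, 1 << target
    out = list(state)
    for i in range(len(state)):
        if i & c_mask and not i & t_mask:
            j = i | t_mask
            out[i], out[j] = state[j], state[i]
    return out

def decompose_swap(q0, q1):
    """SWAP(q0, q1) as three CX gates, in a hypothetical ('cx', control, target) format."""
    return [("cx", q0, q1), ("cx", q1, q0), ("cx", q0, q1)]

def run_circuit(state, gates):
    """Apply low-level gates one after another, as a sequential emulator would."""
    for name, control, target in gates:
        if name != "cx":
            raise ValueError(f"unsupported gate in this sketch: {name}")
        state = apply_cx(state, control, target)
    return state

def direct_swap(state, q0, q1):
    """Reference result: exchange bits q0 and q1 of every amplitude index."""
    out = [0] * len(state)
    for i, amp in enumerate(state):
        b0, b1 = (i >> q0) & 1, (i >> q1) & 1
        j = (i & ~((1 << q0) | (1 << q1))) | (b0 << q1) | (b1 << q0)
        out[j] = amp
    return out

state = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]  # unnormalised labels, 3 qubits
print(run_circuit(state, decompose_swap(0, 2)) == direct_swap(state, 0, 2))  # → True
```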

FPGA Emulates 30 Qubit Quantum Circuits

HPQEA is designed to simulate quantum computations on classical hardware while addressing the performance, resource-utilisation, and scalability limits of existing emulation techniques. It uses a state-vector-based approach implemented on an AMD Alveo U280 FPGA board, with several features that optimise both computation and storage. Experiments show that HPQEA can accurately emulate quantum circuits of up to 30 qubits while maintaining high fidelity and low mean square error, a significant step up in emulation scale. Its architecture pairs dual Processing Element Arrays (PEAs), which accelerate computation and optimise memory usage, with an Optimised CX Swapper that improves data-transfer scheduling and reduces the latency of CX gate operations.
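Fidelity and mean square error are two common ways to compare an emulated state vector with a reference one. The sketch below uses the pure-state fidelity |⟨ψ|φ⟩|² and a per-amplitude MSE; the summary does not give the paper's exact definitions, so treat these formulas and the rounded "emulated" state as illustrative assumptions.

```python
import math

def fidelity(psi, phi):
    """Pure-state fidelity |<psi|phi>|^2 for two normalised state vectors."""
    overlap = sum(complex(a).conjugate() * b for a, b in zip(psi, phi))
    return abs(overlap) ** 2

def mean_square_error(psi, phi):
    """Mean of |psi_i - phi_i|^2 over all amplitudes (one common MSE definition)."""
    return sum(abs(a - b) ** 2 for a, b in zip(psi, phi)) / len(psi)

s = 1 / math.sqrt(2)
reference = [s, 0, 0, s]            # ideal Bell state (|00> + |11>)/sqrt(2)
emulated = [0.707, 0, 0, 0.707]     # hypothetical low-precision emulator output

print(fidelity(reference, emulated))           # slightly below 1
print(mean_square_error(reference, emulated))  # on the order of 1e-9
```

The two metrics answer different questions: fidelity ignores a global phase (negating every amplitude leaves it at 1), whereas MSE compares amplitudes element by element and would flag that same negation as a large error.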

HPQEA also uses high-bandwidth memory (HBM) for large-scale data access and storage, which suits the large state vectors that quantum simulation requires. Measurements show that HPQEA outperforms comparable FPGA-based systems, delivering faster execution, supporting a wider range of quantum algorithms, and requiring fewer hardware resources. Notably, HPQEA exceeds an Nvidia A100 GPU in normalised gate speed for systems of up to 20 qubits. The team reached this performance through careful hardware optimisation, including specialised arithmetic logic units and a dedicated control structure within each processing element. These results indicate that HPQEA is a scalable and efficient platform for developing and testing quantum algorithms, which could make quantum computing research and development more accessible.
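The memory pressure behind the use of HBM follows from simple arithmetic: an n-qubit state vector holds 2^n complex amplitudes, so each added qubit doubles the storage. The sketch below computes the footprint for a few number formats; these precisions are assumptions for illustration, since the summary does not state which format HPQEA uses.

```python
def state_vector_bytes(num_qubits, bytes_per_amplitude):
    """Bytes needed to hold all 2**n complex amplitudes of an n-qubit state."""
    return (1 << num_qubits) * bytes_per_amplitude

# Assumed number formats for illustration only; not HPQEA's documented format.
PRECISIONS = {
    "complex128 (2 x 64-bit float)": 16,
    "complex64 (2 x 32-bit float)": 8,
    "2 x 16-bit fixed point": 4,
}

for label, nbytes in PRECISIONS.items():
    for n in (20, 30):
        gib = state_vector_bytes(n, nbytes) / 2**30
        print(f"{n} qubits, {label}: {gib:.4g} GiB")
```

At 30 qubits even the compact 4-byte format needs 4 GiB, and every gate must stream through much of that data, so memory bandwidth rather than arithmetic tends to bound large-scale state-vector emulation.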

This research presents HPQEA, a high-performance quantum emulator designed to overcome limitations in applicability, scalability, performance, and resource efficiency when simulating quantum circuits. The team emulated circuits of up to 30 qubits with high fidelity and low mean square error, and reports favourable comparisons with existing CPU-, GPU-, and FPGA-based techniques. HPQEA's results stem from a high-performance computing core, an optimised strategy for calculating controlled-NOT gates, and effective use of high-bandwidth memory. While HPQEA is a significant advance, the authors acknowledge that execution times were limited by the available computational resources and by how efficiently high-bandwidth memory was used. Future work will address these constraints by optimising memory usage and developing more efficient resource-allocation strategies, with the aim of extending quantum emulation to increasingly complex quantum algorithms.

👉 More information
🗞 HPQEA: A Scalable and High-Performance Quantum Emulator with High-Bandwidth Memory for Diverse Algorithms Support
🧠 ArXiv: https://arxiv.org/abs/2510.07110