UBS Group published a report on China’s artificial intelligence (AI) industry. Despite ongoing uncertainties surrounding China’s imports of AI chips, the bank believes that domestic computing power continues to evolve, supported by national policies and sustained R&D investment from leading technology companies and local manufacturers. This could further drive the development of China’s AI and large language models (LLMs). Among recent technological advances: although a performance gap remains at the chip level, rapid progress has been made thanks to continuous investment in proprietary technologies by Chinese internet companies and local GPU manufacturers.

At the system level, the gap is being mitigated through “supernode” scaling. For instance, Alibaba-W (09988.HK) recently launched its Panjiu 128-card supernode AI server, while Huawei previously introduced CloudMatrix 384, significantly increasing the number of GPUs per rack. This partially compensates for the performance gap of individual domestically produced GPUs and achieves higher rack-level computing power. UBS Group believes this design allows domestic chips to support more complex inference scenarios, such as long-context and multimodal models. In the long term, as networking technology improves enough to scale these nodes into large clusters, such systems may even support training workloads.
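The rack-level trade-off described above can be sketched with back-of-the-envelope arithmetic. All per-chip figures below are hypothetical placeholders chosen for illustration, not specifications from the UBS report:

```python
# Illustrative arithmetic only: per-GPU TFLOPS, card counts, and scaling
# efficiencies below are hypothetical, not figures from the report.

def rack_compute(per_gpu_tflops: float, gpus_per_rack: int,
                 scaling_efficiency: float) -> float:
    """Effective rack-level compute, discounted for interconnect overhead."""
    return per_gpu_tflops * gpus_per_rack * scaling_efficiency

# Hypothetical domestic GPU: weaker per chip, but densely packed
# into a 128-card supernode.
domestic_rack = rack_compute(per_gpu_tflops=300,
                             gpus_per_rack=128,
                             scaling_efficiency=0.8)

# Hypothetical reference rack built from stronger chips at
# conventional density.
reference_rack = rack_compute(per_gpu_tflops=1000,
                              gpus_per_rack=32,
                              scaling_efficiency=0.9)

print(f"domestic supernode rack: {domestic_rack:.0f} TFLOPS")
print(f"reference rack:          {reference_rack:.0f} TFLOPS")
```

Under these assumed numbers, the denser rack delivers comparable or higher aggregate compute despite weaker individual chips, which is the essence of the supernode argument; in practice, interconnect bandwidth and software overheads determine how much of the theoretical aggregate is usable.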

UBS Group notes that AI model developers are optimizing algorithms for China’s domestic GPUs. Notably, DeepSeek’s latest v3.2 model employs TileLang, a GPU kernel programming language developed by a domestic team that is better suited to domestic computing platforms such as Huawei Ascend and Cambricon (688256.SH).

The bank states that both global and Chinese tech giants are accelerating the development of proprietary AI chips. Major internet companies are intensifying efforts to develop custom ASIC chips to optimize internal workloads and enhance cost efficiency. Google, one of the earliest adopters of proprietary AI chips, has iterated through multiple generations, expanding from inference to training; its current TPU v7 Ironwood supports large-scale model deployment. Amazon (AMZN.US) has introduced Trainium for training and inference, while Meta (META.US) and Microsoft (MSFT.US) have also developed their own custom AI chips. Baidu-SW (09888.HK) has developed three generations of its “Kunlun” chips, with the latest Kunlun P800 powering its models and achieving a recent cluster size of 30,000 cards. Alibaba has also begun deploying proprietary chips for AI training workloads.

According to UBS Group’s recent expert consultations on AI chips, three key takeaways emerge: (1) Hardware performance: Cutting-edge domestic GPUs currently match NVIDIA’s (NVDA.US) Ampere generation (e.g., A800), with the next-generation target being the Hopper generation (e.g., H800). Even that target, however, would remain one generation behind NVIDIA’s latest Blackwell series, launched in 2024.

(2) Software ecosystem: Some domestic chip manufacturers have established their own software stacks or achieved compatibility with CUDA through translation tools, enhancing engineers’ migration efficiency. However, ecosystem fragmentation still necessitates frequent recompilation and optimization of models, limiting scalability. (3) Supply chain capacity: Beyond chip design quality, China remains in the early stages of advanced process technology and high-bandwidth memory (HBM) production.

UBS Group holds a positive outlook on Alibaba and Baidu due to their continued progress in proprietary chip development, which could strengthen their positions in the AI value chain and support long-term AI investments. The bank also favors iFlytek (002230.SZ) for its unique positioning in integrating domestic hardware with its own LLM development. Among hardware-related stocks, it prefers Horizonrobot-W (09660.HK), NAURA Technology (002371.SZ), and Advanced Micro-Fabrication Equipment (688012.SH).

The bank reports that Alibaba’s latest AI chip, the T-Head PPU, reportedly matches NVIDIA’s A800 in key hardware specifications, featuring 96GB of memory and 700GB/s inter-chip bandwidth. Its computational performance metrics have not yet been disclosed.