DeepSeek’s proposed “mHC” architecture could transform the training of large language models (LLMs) – the technology behind artificial intelligence chatbots – as developers look for ways to scale models without simply adding more computing power.

However, experts cautioned that while the approach could have far-reaching implications, it might prove difficult to put into practice.

In a technical paper released last week, co-authored by DeepSeek founder and CEO Liang Wenfeng, the company proposed Manifold-Constrained Hyper-Connections (mHC), a method designed to address the training instability of Hyper-Connections (HC), a network structure introduced by Chinese tech giant ByteDance in 2024.

HC was developed to address limitations of residual connections, the shortcut technique popularised by Residual Networks (ResNet), an architecture that underpins many modern deep-learning models, including LLMs.
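To give a rough sense of the distinction, the sketch below contrasts a standard residual connection with a simplified hyper-connection-style block in PyTorch. It is an illustrative approximation only, not the formulation in the ByteDance or DeepSeek papers: the class names, the number of parallel streams, and the way the read, write and mixing weights are parameterised are assumptions made for the example.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Standard residual connection: y = x + F(x)."""

    def __init__(self, dim: int):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.f(x)


class HyperConnectionBlock(nn.Module):
    """Illustrative hyper-connection-style block (simplified, hypothetical).

    The hidden state is kept as several parallel residual streams; learnable
    weights decide how the block reads its input from the streams, how the
    streams mix with one another, and how the block output is written back.
    """

    def __init__(self, dim: int, n_streams: int = 4):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        # Learnable mixing across the parallel residual streams.
        self.mix = nn.Parameter(torch.eye(n_streams))
        # Learnable weights for reading the block input from the streams
        # and for writing the block output back to them.
        self.read = nn.Parameter(torch.full((n_streams,), 1.0 / n_streams))
        self.write = nn.Parameter(torch.ones(n_streams))

    def forward(self, streams):
        # streams: (n_streams, batch, dim)
        x = torch.einsum("s,sbd->bd", self.read, streams)        # read input
        out = self.f(x)                                          # apply the layer
        mixed = torch.einsum("st,tbd->sbd", self.mix, streams)   # mix streams
        return mixed + self.write[:, None, None] * out           # write output
```

Because the read, write and mixing weights are unconstrained in this simplified form, their values can drift during training; DeepSeek's mHC, as described in the paper, constrains such connections to stabilise training, though the exact constraint is not reproduced here.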

ResNet was proposed about a decade ago by four researchers at Microsoft Research Asia, including prominent computer scientist Kaiming He.

DeepSeek’s paper marks the Chinese AI start-up’s latest effort to improve model training efficiency with limited computing resources, fuelling speculation that its next models could incorporate the new architecture.