①Cailian Press reporters observed on-site that the focus of discussion among attending experts and corporate representatives has shifted from macro-level policy interpretation to specific “implementation blueprints.” ②Interviews at the conference revealed that high token costs have become a core pain point for many enterprises scaling up AI applications.

Cailian Press, September 27th, by reporter Guo Songqiao: Just one month after the issuance of the ‘Opinions on Deeply Implementing the “Artificial Intelligence +” Action,’ the industry’s sprint off the starting line is already visibly accelerating.

The 2025 Artificial Intelligence Computing Conference, held yesterday in Beijing, served as an excellent observation window. Cailian Press reporters noticed on-site that the focus of discussion among attending experts and corporate representatives has shifted from macro-level policy interpretation to specific “implementation blueprints.”

This year’s conference centered on artificial intelligence infrastructure construction and the optimization of the domestic AI computing power system, with a focus on promoting algorithm innovation and application implementation. Treating computing power as the core element driving innovation, it brought together resources from academia, industry, research, and application fields to jointly advance the high-quality development of the artificial intelligence industry. On-site, over 30 companies and institutions, including China Mobile, Inspur Information, the Zhiyuan Research Institute, and Kunlun Chip, jointly released the ‘Beijing Solution for Intelligent Computing Applications – Building Industry Intelligence Based on the Super-Node Innovation Consortium,’ taking the lead in responding to the national ‘Opinions on Deeply Implementing the “Artificial Intelligence +” Action.’

“Achieving cross-regional, cross-hardware computing power connectivity and inclusive sharing.”

At the conference, Yaochu Jin, founder of Westlake University’s Trustworthy and General Artificial Intelligence Laboratory and a member of Academia Europaea, outlined the main trajectory of artificial intelligence development. He noted that its developmental path resembles the emergence of intelligence in the human brain, passing through three key mechanisms: evolution, development, and learning. From this perspective, he elaborated on the requirements of trustworthy artificial intelligence and the importance of AI governance, and shared the laboratory’s practices in industrial artificial intelligence and its explorations in brain-inspired general artificial intelligence.

Lin Yonghua, Vice President and Chief Engineer of the Beijing Zhiyuan Artificial Intelligence Research Institute, shared the technical progress of the ‘CrowdWisdom FlagOS’ platform. She pointed out that as an open, unified system software stack, the platform aims to break down barriers within the AI computing power ecosystem, achieve cross-regional, cross-hardware computing power connectivity and inclusive sharing, and provide global developers with a unified computing foundation spanning chips, frameworks, and scenarios.

Wang Haifeng, Chief Technology Officer of Baidu, reviewed the development of artificial intelligence from rule-based methods and statistical machine learning to deep learning and large models. He highlighted that the generality and comprehensive capabilities of large model technology offer a promising path toward general artificial intelligence.

Liu Jun, Chief AI Strategist of Inspur Information, introduced two innovative systems for the era of intelligent agents. He discussed the challenges facing sustainable AI computing power development, such as limits on scale, electricity supply, and investment, and proposed shifting from a scale-oriented approach to an efficiency-oriented one: rethinking and redesigning AI computing systems and developing specialized AI computing architectures.

Dai Beijie, Vice President of the Beijing Zhongguancun Artificial Intelligence Research Institute, introduced the project-based talent development system established by Beijing Zhongguancun College to cultivate unconventional AI leaders. The initiative promotes deep integration of AI with multiple disciplines and enables bidirectional empowerment: scientific research achievements are translated into industry, while industrial needs feed back into technological innovation, injecting strong momentum into the development of new quality productive forces.

Hardware Innovation Targets Token Cost Bottleneck

Cailian Press reporters learned on-site that high token costs have become a core pain point for many companies scaling up AI applications.

“Our platform handles massive volumes of customer service, recommendation, and risk control scenarios every day that require calling large models, and token costs are like the ‘Sword of Damocles’ hanging over our heads.” At the conference, a technical director from the AI platform department of an e-commerce company told Cailian Press reporters, adding that he had come specifically to find cost-reduction solutions.

“As intelligent agent applications expand further, the token consumption per interaction session is surging. Under the current cost structure, many valuable innovative applications stall on ‘economic viability’ before they ever reach scale, which poses a serious challenge to profitability,” the director admitted. This sentiment was among the most common voices Cailian Press reporters heard at the conference.
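The economics the director describes can be sketched as a back-of-envelope calculation. All figures below are hypothetical illustrations, not numbers from the conference:

```python
# Back-of-envelope agent token economics (all figures hypothetical).
def daily_token_cost(sessions_per_day, tokens_per_session, price_per_million_tokens):
    """Total daily spend on model calls, in the same currency as the price."""
    total_tokens = sessions_per_day * tokens_per_session
    return total_tokens / 1_000_000 * price_per_million_tokens

# A multi-step agent session can easily consume tens of thousands of tokens,
# since each tool call and reasoning step re-sends context to the model.
cost = daily_token_cost(sessions_per_day=100_000,
                        tokens_per_session=20_000,
                        price_per_million_tokens=8.0)  # assumed RMB 8 / M tokens
print(f"Daily token spend: RMB {cost:,.0f}")  # → Daily token spend: RMB 16,000
```

Because spend scales linearly with both session count and tokens per session, an application that grows its user base while adding agentic steps sees costs compound from both directions, which is why per-token price cuts matter so much for viability at scale.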

Guo Tao, Deputy Director of the China E-Commerce Expert Service Center, stated in an interview with Cailian Press that the AI industry is transitioning from ‘model competition’ to ‘application implementation.’ Inference costs and interaction speed have now become more critical competitive dimensions than model parameter size. The effectiveness of infrastructure in terms of ‘speed enhancement and cost reduction’ will directly determine the depth and breadth of ‘AI+’ penetration across vertical industries.

At the conference, a representative from a publicly listed company also told reporters that as scaling laws continue to drive advances in model capability, open-source models represented by DeepSeek have significantly lowered the threshold for innovation, accelerating the industrialization of intelligent agents. The three core elements of that industrialization are capability, speed, and cost: model capability determines the upper limit of application potential, interaction speed defines commercial value, and token cost dictates profitability.

This pain point resonated widely at the conference. In response to this industry-wide demand, computing infrastructure providers are seeking breakthroughs at the hardware level.

On the hardware front, Inspur Information unveiled its YuanNao HC1000 ultra-scalable AI server at the conference. Built on a newly developed, fully symmetric DirectCom ultra-speed architecture, its lossless, ultra-scalable design aggregates large numbers of domestic AI chips and sustains exceptionally high inference throughput. For the first time, it brings inference cost below RMB 1 per million tokens, offering a high-performance computing system aimed at breaking the token cost bottleneck for intelligent agents.
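To put the sub-RMB-1 figure in context, the calculation below compares a hypothetical incumbent price against the announced ceiling for the same workload. Only the “below RMB 1 per million tokens” number comes from the announcement; the baseline price and traffic volume are illustrative assumptions:

```python
# Illustrative cost comparison. Only the RMB 1 / M-token ceiling is from
# the announcement; the baseline price and token volume are assumptions.
def inference_cost(total_tokens, price_per_million_tokens):
    """Cost of serving a given token volume at a per-million-token price."""
    return total_tokens / 1_000_000 * price_per_million_tokens

tokens = 2_000_000_000                       # assumed 2B tokens/day of agent traffic
baseline = inference_cost(tokens, 8.0)       # assumed incumbent price, RMB 8 / M
hc1000_cap = inference_cost(tokens, 1.0)     # announced ceiling, "below RMB 1 / M"
print(f"Baseline: RMB {baseline:,.0f}/day; HC1000: under RMB {hc1000_cap:,.0f}/day")
```

Under these assumptions the same daily traffic drops from RMB 16,000 to under RMB 2,000, an 8x reduction, which is the kind of margin shift that can move an agent application from unviable to profitable.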

From a technical perspective, Liu Jun told Cailian Press that the YuanNao HC1000 achieves comprehensive cost reduction and hardware-software synergy through innovations such as a 16-card compute module design and a balanced single-card design integrating compute, memory, and interconnect, significantly lowering both per-card cost and per-card system overhead. Its fully symmetric system topology also supports ultra-large-scale lossless expansion. According to the company’s calculations, the YuanNao HC1000 delivers a 1.75x improvement in inference performance over traditional RoCE solutions, with single-card model computational efficiency rising by up to 5.7x through deep compute-network synergy and full-domain lossless technologies.

The industry broadly agrees that intelligent agents will drive an exponential surge in inference computing demand in the years ahead.

Inspur Information told reporters that it will continue to drive innovation and breakthroughs in AI computing architecture through software-hardware co-design and deep optimization. The company is committed to accelerating token generation while reducing costs, and to fostering the deep integration of artificial intelligence technologies, such as large models and intelligent agents, with the real economy, making AI a driver of productivity and innovation across industries.