①Last night, World Labs unveiled RTFM (Real-Time Frame Model), a real-time world model capable of rendering persistent, consistent 3D worlds; ②World Labs noted that larger models run at higher inference budgets are expected to keep improving; ③On September 16, World Labs released the world generation model Marble, which achieves higher-quality geometric structure.
The STAR Market Daily reported on October 17 that last night, “Godmother of AI” Fei-Fei Li retweeted a post unveiling a new world model, RTFM (Real-Time Frame Model), which can generate interactive 3D worlds in real time.
It is reported that the model was developed by World Labs, the artificial intelligence company founded by Fei-Fei Li. According to the team, its design revolves around three key principles: efficiency, scalability, and persistence, which allow RTFM to render persistent, consistent 3D worlds while running on just a single H100 GPU.
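To make the "persistence" principle concrete, here is a minimal toy sketch in Python. Everything in it (the Pose and PersistentWorld names, the quantized-pose cache, the placeholder string standing in for model inference) is a hypothetical illustration of the general idea, not World Labs' actual architecture: frames generated for a viewpoint are kept in memory, so returning to that viewpoint retrieves consistent content instead of regenerating the scene from scratch.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Pose:
    """Hypothetical camera pose, quantized so nearby views share a key."""
    x: int
    y: int
    yaw: int

@dataclass
class PersistentWorld:
    """Toy illustration of persistence: frames generated for a pose are
    cached, so revisiting a viewpoint yields a consistent frame rather
    than a freshly hallucinated one. Not World Labs' actual design."""
    memory: dict = field(default_factory=dict)

    def render(self, pose: Pose) -> str:
        if pose in self.memory:        # revisit: reuse the stored frame
            return self.memory[pose]
        frame = f"frame@({pose.x},{pose.y},yaw={pose.yaw})"  # stand-in for model inference
        self.memory[pose] = frame      # persist for future visits
        return frame

world = PersistentWorld()
a = world.render(Pose(0, 0, 90))
world.render(Pose(1, 0, 90))           # move the camera away
b = world.render(Pose(0, 0, 90))       # come back to the original view
assert a is b                          # the world did not change behind us
```

A real system would key its memory on continuous poses and reconcile overlapping views, but the contract is the same: the world should not change behind the camera's back.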
Why such a design? World Labs pointed out that as world model technology develops, compute demands will rise sharply, far beyond what today's large language models (LLMs) require. To serve an interactive video stream at 4K and 60 FPS, a traditional video architecture would need to generate over 100,000 tokens per second, roughly the full text of *Frankenstein* every second. World Labs asserted, "Based on current computational infrastructure, generating tokens at such frequencies is economically infeasible."
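The 100,000-token figure can be sanity-checked with back-of-the-envelope arithmetic. In the sketch below, the tokenizer compression ratios (one token per 32×32-pixel patch, 4× temporal compression) are illustrative assumptions rather than figures published by World Labs; even under those fairly generous assumptions, a 4K, 60 FPS stream lands above 100,000 tokens per second.

```python
# Back-of-the-envelope estimate of token throughput for a 4K, 60 fps
# interactive video stream. The compression ratios are illustrative
# assumptions, not figures published by World Labs.

WIDTH, HEIGHT = 3840, 2160   # 4K resolution
FPS = 60                     # target frame rate

# Hypothetical video tokenizer: one token per 32x32-pixel patch after
# spatial compression, plus 4x temporal compression (one set of tokens
# covers 4 consecutive frames).
PATCH_PIXELS = 32 * 32
TEMPORAL_COMPRESSION = 4

tokens_per_frame = (WIDTH * HEIGHT) / PATCH_PIXELS             # 8,100
tokens_per_second = tokens_per_frame * FPS / TEMPORAL_COMPRESSION

print(f"tokens per frame:  {tokens_per_frame:,.0f}")
print(f"tokens per second: {tokens_per_second:,.0f}")          # ~121,500
```

Loosen either compression assumption and the required rate only climbs, which is the economics World Labs is pointing at.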
World Labs therefore believes that, as these trends in AI development play out, simple methods that can ride exponentially declining compute costs will win out in the market.
As AI applications proliferate, driving down compute costs has become not just a priority for hardware makers but an inevitable feature of the competitive landscape. On October 13, OpenAI and Broadcom announced a strategic partnership to deploy 10 gigawatts of AI accelerators designed by OpenAI. Galaxy Securities noted that with this deal, OpenAI now has a diversified compute supply spanning NVIDIA, AMD, and Broadcom, breaking its reliance on any single provider and using competition to push compute costs down.
Even so, the expectation that compute demand will keep growing remains intact. In its blog post, World Labs stated that although RTFM's current goal is only real-time inference on a single H100 GPU, it expects larger models run at higher inference budgets to continue improving.
In fact, the Jevons Paradox has long been at work in the AI industry: as models become more computationally efficient, total compute consumption rises rather than falls. DeepSeek R1, released earlier this year, delivered strong performance on comparatively limited hardware, yet industry figures including Jensen Huang have argued that its success will only increase demand for compute, not reduce it.
Returning to world models themselves: on September 16 this year, World Labs released the world generation model Marble, which can generate a 3D world from a single image or text prompt. Compared with the previous model released at the end of last year, it achieves higher-quality geometric structure and more diverse styles. Fei-Fei Li has previously said that the significance of world models lies in going beyond understanding and reasoning over text to comprehending and reasoning about how the physical world operates.
Companies from AI developers to device makers have now begun investing in world models. It is reported that xAI recruited several experts from NVIDIA this summer, joining rivals such as Meta and Google in doubling down on world models. In China, robotics companies including Unitree and Zhiyuan have also open-sourced their world models.
Dongwu Securities stated that as compute becomes cheaper and more accessible, developers will take more complex models and systems as the new baseline, increasing parameters, context length, and parallelism. Iterations in model architecture may reduce the compute required for a single inference or training run, but video-generating world models such as Genie 3 may need orders-of-magnitude more compute to meet demand. Compared with the 4G/5G era, AI compute has a higher demand ceiling and a better competitive landscape, supporting a higher valuation framework and a stronger beta for AI compute.