# Advancing AI for the physical world

For decades, robots have excelled in structured settings like assembly lines, where tasks are predictable and tightly scripted.

> "The emergence of vision-language-action (VLA) models for physical systems is enabling systems to perceive, reason, and act with increasing autonomy alongside humans in environments that are far less structured."
>
> – Ashley Llorens, Corporate Vice President and Managing Director, Microsoft Research Accelerator

Physical AI, where agentic AI meets physical systems, is poised to redefine robotics in the same way that generative models have transformed language and vision processing.

Today, we are announcing Rho-alpha (ρα), our first robotics model derived from [Microsoft's Phi series](https://azure.microsoft.com/en-us/products/phi/?msockid=2cc5f4526b166732274ce7816a3b66c8) of vision-language models.

We invite organizations interested in evaluating Rho-alpha for their robots and use cases to express interest in the [Rho-alpha Research Early Access Program](https://forms.office.com/pages/responsepage.aspx?id=v4j5cvGGr0GRqy180BHbR_oe6q893mZLsFcgjoh0ZH1UNUhTTURDTDU1RzRDREFHNFVCSTA0STQ4Vy4u&route=shorturl). Rho-alpha will also be made available via Microsoft Foundry at a later date.

Rho-alpha translates natural language commands into control signals for robotic systems performing bimanual manipulation tasks. It can be described as a VLA+ model in that it expands the set of perceptual and learning modalities beyond those typically used by VLAs. For perception, Rho-alpha adds tactile sensing, with efforts underway to accommodate additional modalities such as force. For learning, we are working toward enabling Rho-alpha to continually improve during deployment by learning from feedback provided by people.

Through these advancements, we aim to make physical systems more easily adaptable, viewing adaptability as a hallmark of intelligence. We believe robots that can adapt more easily to dynamic situations and to human preferences will be more useful in the environments where we live and work, and more trusted by the people who deploy and operate them.

- Prompt: "Push the green button with the right gripper"
- Prompt: "Pull out the red wire"
- Prompt: "Flip the top switch on"
- Prompt: "Turn the knob to position 5"
- Prompt: "Rotate the BusyBox clockwise"
- Prompt: "Move the top slider to position 2"

The footage above demonstrates Rho-alpha interacting with the BusyBox, a physical interaction benchmark recently introduced by Microsoft Research, cued by natural language instructions. (The videos show robot operation at real-time speed.)

Our team is working toward end-to-end optimization of Rho-alpha's training pipeline and training data corpus for performance and efficiency on bimanual manipulation tasks of interest to Microsoft and our partners. The model is currently under evaluation on dual-arm setups and humanoid robots.
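The control pattern described above — multimodal observations plus a language instruction in, low-level control signals out — can be sketched as a standard VLA inference loop. This is an illustrative sketch only: Rho-alpha's interfaces have not been published, so the class and method names below (`StubVLAPolicy`, `predict_action_chunk`, the 14-DoF dual-arm action shape) are assumptions, not the actual API.

```python
import numpy as np

class StubVLAPolicy:
    """Placeholder for a vision-language-action model (hypothetical API)."""

    def predict_action_chunk(self, rgb, tactile, instruction, horizon=8):
        # A real VLA encodes the camera frame, tactile reading, and text
        # instruction, then decodes a short sequence ("chunk") of control
        # targets. Here we return zeros shaped (horizon, 14): an assumed
        # dual-arm setup with 7 degrees of freedom per arm.
        return np.zeros((horizon, 14))

def control_loop(policy, replans=3):
    """Run the usual sense -> predict -> act cycle, replanning per chunk."""
    instruction = "Push the green button with the right gripper"
    executed = []
    for _ in range(replans):
        rgb = np.zeros((224, 224, 3), dtype=np.uint8)  # camera frame (stub)
        tactile = np.zeros(16)                         # tactile readings (stub)
        chunk = policy.predict_action_chunk(rgb, tactile, instruction)
        for action in chunk:        # execute the chunk open-loop
            executed.append(action) # ...would be sent to the robot controller
    return np.array(executed)

actions = control_loop(StubVLAPolicy())
print(actions.shape)  # (24, 14): 3 replanning steps x 8-action chunks
```

Chunked prediction like this is a common design choice in the VLA literature because it amortizes the cost of each model call over several control ticks.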
We will publish a technical description in the coming months.

Rho-alpha achieves tactile-aware behaviors infused with vision-language understanding through co-training on trajectories from physical demonstrations and simulated tasks, together with web-scale visual question answering data. We plan to use the same blueprint to extend the model to additional sensing modalities across a variety of real-world tasks.

> "While generating training data by teleoperating robotic systems has become a standard practice, there are many settings where teleoperation is impractical or impossible. We are working with Microsoft Research to enrich pre-training datasets collected from physical robots with diverse synthetic demonstrations using a combination of simulation and reinforcement learning."
>
> – Abhishek Gupta, Assistant Professor, University of Washington

Simulation plays a key role in our approach to overcoming the general lack of pretraining-scale robotics data, especially data containing tactile feedback and other less common sensing modalities. Our training pipeline generates synthetic data via a multistage process based on reinforcement learning, using the open [NVIDIA Isaac Sim](https://developer.nvidia.com/isaac/sim) framework. We combine these simulated trajectories with commercial and openly available physical demonstration datasets.

> "Training foundation models that can reason and act requires overcoming the scarcity of diverse, real-world data. By leveraging NVIDIA Isaac Sim on Azure to generate physically accurate synthetic datasets, Microsoft Research is accelerating the development of versatile models like Rho-alpha that can master complex manipulation tasks."
>
> – Deepu Talla, Vice President of Robotics and Edge AI, NVIDIA

While extending perception capabilities can enable Rho-alpha to adjust a robot's course of action during operation, robots can still make mistakes that are hard for them to recover from. Human operators can set a robot back on track using intuitive teleoperation devices such as a 3D mouse. We are focused on tooling and model adaptation techniques that enable Rho-alpha to learn from corrective feedback during system operation.

- Prompt: "Pick up the power plug and insert it into the bottom socket of the square surge protector"
- Prompt: "Place the tray into the toolbox and close the toolbox"
- Prompt: "Take the tray out of the toolbox and put it on the table"

The videos above show a tactile-sensor-equipped dual-UR5e-arm setup controlled by Rho-alpha performing plug insertion and toolbox packing. In the plug insertion episode, the right arm has difficulty inserting the plug into the outlet and is helped by real-time human guidance. (The videos show robot operation at real-time speed.)

Robotics manufacturers, integrators, and end users have unique insights into the use cases and scenarios where emerging physical AI technologies offer transformative potential.
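One common pattern for learning from real-time human corrections, as in the plug-insertion episode above, is DAgger-style intervention logging: while the operator overrides the policy (e.g., via a 3D mouse), the corrected actions are recorded alongside the observations and added to a fine-tuning dataset. The sketch below is hedged: `run_episode_with_interventions`, the lambda stand-ins, and the 14-dimensional action vectors are all illustrative assumptions, not Rho-alpha's actual tooling.

```python
import numpy as np

def run_episode_with_interventions(policy_action, human_override, steps=10):
    """Execute policy actions unless the human intervenes; log corrections."""
    correction_log = []           # (observation, corrected_action) pairs
    for t in range(steps):
        obs = {"t": t}            # stand-in for the real multimodal observation
        proposed = policy_action(obs)
        override = human_override(t)
        if override is not None:
            action = override     # human takes over via teleoperation
            correction_log.append((obs, action))
        else:
            action = proposed     # autonomous operation continues
        # ... `action` would be sent to the robot controller here ...
    return correction_log

# Example: the operator corrects a failed insertion wiggle at steps 4-6.
policy = lambda obs: np.zeros(14)                       # stub policy output
human = lambda t: np.ones(14) if 4 <= t <= 6 else None  # stub 3D-mouse input
log = run_episode_with_interventions(policy, human)
print(len(log))  # 3 corrections captured for later fine-tuning
```

The appeal of this scheme is that corrections are collected exactly in the states where the policy actually fails, which is where additional training data is most valuable.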
To empower these stakeholders, we are working toward foundational technologies like Rho-alpha, along with associated tooling, that will enable them to train, deploy, and continuously adapt their own cloud-hosted physical AI using their own data for their own robots and scenarios.

If you're interested in experimenting with and helping shape the future of our physical AI foundations and tools, express your interest in our Research Early Access Program.

Humanoid robots are among the evaluation platforms for Rho-alpha.