{"id":342921,"date":"2026-03-17T00:40:11","date_gmt":"2026-03-17T00:40:11","guid":{"rendered":"https:\/\/www.newsbeep.com\/il\/342921\/"},"modified":"2026-03-17T00:40:11","modified_gmt":"2026-03-17T00:40:11","slug":"nvidia-enters-production-with-dynamo-the-broadly-adopted-inference-operating-system-for-ai-factories","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/il\/342921\/","title":{"rendered":"NVIDIA Enters Production With Dynamo, the Broadly Adopted Inference Operating System for AI Factories"},"content":{"rendered":"<p align=\"left\">News Summary:<\/p>\n<p><br \/>\n\tNVIDIA Dynamo 1.0 provides a production-grade, open source foundation for inference at scale.<br \/>\n\tDynamo and NVIDIA TensorRT-LLM optimizations integrate natively into open source frameworks such as LangChain, llm-d, LMCache, SGLang and vLLM to boost inference performance.<br \/>\n\tDynamo boosts inference performance of NVIDIA Blackwell GPUs by up to 7x, lowering token cost and increasing revenue opportunity for millions of GPUs with free, open source software.<br \/>\n\tNVIDIA inference platform integrated by cloud service providers, Amazon Web Services (AWS), Microsoft Azure, Google Cloud and Oracle Cloud Infrastructure (OCI), along with NVIDIA cloud partners Alibaba Cloud, CoreWeave, Together AI and Nebius \u2014 and adopted by AI-native companies Cursor and Perplexity; inference endpoint providers Baseten, Deep Infra and Fireworks; and global enterprises ByteDance, Meituan, PayPal and Pinterest.<\/p>\n<p align=\"left\">GTC\u2014NVIDIA today announced NVIDIA Dynamo 1.0, open source software for generative and agentic inference at scale, with widespread global adoption. 
Together with the NVIDIA Blackwell platform, Dynamo 1.0 enables cloud providers, AI innovators and global enterprises to deliver high-performance AI inference with unmatched scale, efficiency and speed.<\/p>\n<p align=\"left\">As agentic AI systems move into production across industries, scaling inference within a data center has become a complex challenge of resource orchestration, with requests of varying sizes and modalities, as well as performance objectives, arriving in unpredictable bursts.<\/p>\n<p align=\"left\">Just as a computer\u2019s operating system coordinates hardware and applications, Dynamo 1.0 functions as the distributed \u201coperating system\u201d of AI factories, seamlessly orchestrating GPU and memory resources across the cluster to power complex AI workloads. In recent industry benchmarks, Dynamo boosted the inference performance of NVIDIA Blackwell GPUs by up to 7x, lowering token cost and increasing revenue opportunity for millions of GPUs with free, open source software.<\/p>\n<p align=\"left\">\u201cInference is the engine of intelligence, powering every query, every agent and every application,\u201d said Jensen Huang, founder and CEO of NVIDIA. \u201cWith NVIDIA Dynamo, we\u2019ve created the first-ever \u2018operating system\u2019 for AI factories. The rapid adoption across our ecosystem shows this next wave of agentic AI is here, and NVIDIA is powering it at global scale.\u201d<\/p>\n<p align=\"left\">Dynamo 1.0 splits inference work across GPUs by adding smarter \u201ctraffic control\u201d and the ability to move data between GPUs and lower-cost storage, reducing wasted work and easing memory limits. 
For agentic AI and long prompts, it can route requests to GPUs that already have the most relevant \u201cshort-term memory\u201d from earlier steps, then offload that memory when it is not needed.<\/p>\n<p align=\"left\">NVIDIA Inference Platform Gains Momentum<br \/><br \/>\nNVIDIA is accelerating the open source ecosystem by integrating Dynamo and NVIDIA TensorRT\u2122-LLM library optimizations into popular frameworks from providers such as LangChain, llm-d, LMCache, SGLang, vLLM and more. Core Dynamo building blocks like KVBM for smarter memory management, NVIDIA NIXL for fast GPU-to-GPU data movement and NVIDIA Grove for simplified scaling are also available as standalone modules. NVIDIA also contributes TensorRT-LLM CUDA\u00ae kernels to the FlashInfer project so they can be natively integrated into open source frameworks.<\/p>\n<p align=\"left\">The NVIDIA inference platform is supported across the AI ecosystem, including:<\/p>\n<p><br \/>\n\tCloud Service Providers: <a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/accelerate-generative-ai-inference-with-nvidia-dynamo-and-amazon-eks\/\" rel=\"nofollow noopener\" target=\"_blank\" title=\"\">Amazon Web Services<\/a> (AWS), <a href=\"https:\/\/blog.aks.azure.com\/2025\/10\/24\/dynamo-on-aks\" rel=\"nofollow noopener\" target=\"_blank\" title=\"\">Microsoft Azure<\/a>, <a href=\"https:\/\/cloud.google.com\/blog\/products\/compute\/scaling-moe-inference-with-nvidia-dynamo-on-google-cloud-a4x\" rel=\"nofollow noopener\" target=\"_blank\" title=\"\">Google Cloud<\/a>, <a href=\"https:\/\/blogs.nvidia.com\/blog\/think-smart-dynamo-ai-inference-data-center\/\" rel=\"nofollow noopener\" target=\"_blank\" title=\"\">OCI<\/a><br \/>\n\tNVIDIA Cloud Partners: <a href=\"https:\/\/www.alibabacloud.com\/help\/en\/ack\/cloud-native-ai-suite\/user-guide\/deploy-dynamo-pd-separated-inference-services?spm=a2c63.p38356.0.i0\" rel=\"nofollow noopener\" target=\"_blank\" title=\"\">Alibaba Cloud<\/a>, 
CoreWeave, Crusoe, DigitalOcean, Gcore, GMI Cloud, Lightning AI, Nebius, Nscale, Together AI, Vultr<br \/>\n\tAI-Native Companies: Cursor, Hebbia, Perplexity<br \/>\n\tInference Endpoint Providers: Baseten, Deep Infra, Fireworks<br \/>\n\tGlobal Enterprises: AstraZeneca, BlackRock, ByteDance, Coupang, Instacart, Meituan, PayPal, Pinterest, Shopee, SoftBank Corp.<\/p>\n<p align=\"left\">Chen Goldberg, executive vice president of product and engineering at CoreWeave, said: \u201cAs AI moves from experimental pilots to continuous, large-scale production, the underlying infrastructure must be as dynamic as the models it supports. Supporting NVIDIA Dynamo allows us to offer a more seamless, resilient environment for deploying complex AI agents. This foundation provides the durability and high-performance orchestration required to move the industry\u2019s most ambitious agentic workloads into global production.\u201d<\/p>\n<p align=\"left\">Danila Shtan, chief technology officer of Nebius, said: \u201cDelivering reliable AI inference at scale isn\u2019t just about powerful GPUs, it\u2019s about the software that turns that performance into real customer outcomes. We value how NVIDIA\u2019s software stack, from Dynamo to TensorRT-LLM, brings deep optimization, predictable performance and faster time to deployment, helping us offer customers a simpler, higher-performance path to production AI.\u201d<\/p>\n<p align=\"left\">Matt Madrigal, chief technology officer of Pinterest, said: \u201cDelivering an intuitive, multimodal AI experience to hundreds of millions of users requires real-time intelligence at global scale. As a significant adopter in open source, we\u2019re committed to building scalable AI technologies. 
With NVIDIA Dynamo optimizing our deployment, we\u2019re expanding the seamless and personalized experiences we deliver, powered by high-performance AI infrastructure.\u201d<\/p>\n<p align=\"left\">Vipul Ved Prakash, cofounder and CEO of Together AI, said: \u201cAI natives require inference that can reliably and efficiently scale with their application. NVIDIA Dynamo 1.0, combined with cutting-edge inference research from Together AI, helps us deliver a high-performance stack to offer accelerated, cost-effective inference for large-scale production workloads.\u201d<\/p>\n<p align=\"left\">Dynamo 1.0 is available today to developers worldwide. To learn more and get started, read the <a href=\"https:\/\/developer.nvidia.com\/blog\/nvidia-dynamo-1-production-ready\/\" rel=\"nofollow noopener\" target=\"_blank\" title=\"\">blog<\/a> and visit the <a href=\"https:\/\/developer.nvidia.com\/dynamo\" rel=\"nofollow noopener\" target=\"_blank\" title=\"\">Dynamo webpage<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"News Summary: NVIDIA Dynamo 1.0 provides a production-grade, open source foundation for inference at scale. Dynamo 
and&hellip;\n","protected":false},"author":2,"featured_media":342922,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[345,343,344,85,46,125],"class_list":{"0":"post-342921","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-il","12":"tag-israel","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/posts\/342921","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/comments?post=342921"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/posts\/342921\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/media\/342922"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/media?parent=342921"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/categories?post=342921"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/tags?post=342921"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}