{"id":226036,"date":"2026-01-07T21:23:12","date_gmt":"2026-01-07T21:23:12","guid":{"rendered":"https:\/\/www.newsbeep.com\/il\/226036\/"},"modified":"2026-01-07T21:23:12","modified_gmt":"2026-01-07T21:23:12","slug":"nvidia-ceo-jensen-huang-explains-why-sram-isnt-here-to-eat-hbms-lunch-high-bandwidth-memory-offers-more-flexibility-in-ai-deployments-across-a-range-of-workloads","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/il\/226036\/","title":{"rendered":"Nvidia CEO Jensen Huang explains why SRAM isn&#8217;t here to eat HBM&#8217;s lunch \u2014 high bandwidth memory offers more flexibility in AI deployments across a range of workloads"},"content":{"rendered":"<p id=\"ab7f5e46-002b-4baf-8bfb-82dc79c8b51a\">During a <a data-analytics-id=\"inline-link\" href=\"https:\/\/www.tomshardware.com\/tag\/ces\" data-auto-tag-linker=\"true\" data-mrf-recirculation=\"inline-link\" data-before-rewrite-localise=\"https:\/\/www.tomshardware.com\/tag\/ces\" rel=\"nofollow noopener\" target=\"_blank\">CES<\/a> 2026 Q&amp;A in Las Vegas, Nvidia CEO Jensen Huang was thrown a bit of a curveball. 
With <a data-analytics-id=\"inline-link\" href=\"https:\/\/www.tomshardware.com\/tech-industry\/semiconductors\/nvidia-confirms-20-billion-groq-deal-to-bolster-ai-inference-dominance\" data-mrf-recirculation=\"inline-link\" data-before-rewrite-localise=\"https:\/\/www.tomshardware.com\/tech-industry\/semiconductors\/nvidia-confirms-20-billion-groq-deal-to-bolster-ai-inference-dominance\" rel=\"nofollow noopener\" target=\"_blank\">SRAM-heavy accelerators<\/a>, cheaper memory, and open weight AI models gaining traction, could Nvidia eventually ease its dependence on expensive HBM and the margins that come along with it?<\/p>\n<p class=\"paywall\" aria-hidden=\"true\">During an analyst Q&amp;A at CES 2026, captured by <a data-analytics-id=\"inline-link\" href=\"https:\/\/x.com\/rwang07\/status\/2008346238094094495?s=20\" target=\"_blank\" data-url=\"https:\/\/x.com\/rwang07\/status\/2008346238094094495?s=20\" referrerpolicy=\"no-referrer-when-downgrade\" data-hl-processed=\"none\" data-mrf-recirculation=\"inline-link\" rel=\"nofollow\">SemiAnalysis \/ Ray Wang<\/a>, Huang&#8217;s response was <a data-analytics-id=\"inline-link\" href=\"https:\/\/www.tomshardware.com\/tech-industry\/samsung-earns-nvidias-certification-for-its-hbm3-memory-stock-jumps-5-percent-as-company-finally-catches-up-to-sk-hynix-and-micron-in-hbm3e-production\" data-mrf-recirculation=\"inline-link\" data-before-rewrite-localise=\"https:\/\/www.tomshardware.com\/tech-industry\/samsung-earns-nvidias-certification-for-its-hbm3-memory-stock-jumps-5-percent-as-company-finally-catches-up-to-sk-hynix-and-micron-in-hbm3e-production\" rel=\"nofollow noopener\" target=\"_blank\">not a roadmap away from HBM<\/a>, nor an endorsement of leaner, cheaper inference hardware. Instead, he laid out his view of AI workloads as inherently unstable, constantly reshaped by new model architectures, new modalities, and new deployment patterns. 
Against that backdrop, he suggested, efficiency gains achieved by tuning hardware for a single problem tend to be short-lived.<\/p>\n<p><a id=\"elk-3c5737c6-d88e-4d67-9b3e-147bfed6c3f4\" class=\"paywall\" aria-hidden=\"true\" data-url=\"\" href=\"\" target=\"_blank\" referrerpolicy=\"no-referrer-when-downgrade\" data-hl-processed=\"none\"\/>Why SRAM looks attractive<a id=\"elk-seasonal\" class=\"paywall\" aria-hidden=\"true\" data-url=\"\" href=\"\" target=\"_blank\" referrerpolicy=\"no-referrer-when-downgrade\" data-hl-processed=\"none\"\/><\/p>\n<p id=\"a45d875b-92a2-4e1f-9864-f518b8e1eb71-0\">Let\u2019s take a step back for a moment and consider what Huang is getting at here. The industry, by and large, is actively searching for ways to make AI cheaper. SRAM accelerators, GDDR inference, and <a data-analytics-id=\"inline-link\" href=\"https:\/\/www.tomshardware.com\/tech-industry\/artificial-intelligence\/openai-intros-two-lightweight-open-model-language-models-that-can-run-on-consumer-gpus-optimized-to-run-on-devices-with-just-16gb-of-memory\" data-mrf-recirculation=\"inline-link\" data-before-rewrite-localise=\"https:\/\/www.tomshardware.com\/tech-industry\/artificial-intelligence\/openai-intros-two-lightweight-open-model-language-models-that-can-run-on-consumer-gpus-optimized-to-run-on-devices-with-just-16gb-of-memory\" rel=\"nofollow noopener\" target=\"_blank\">open weight models<\/a> are all being pitched as pressure valves on Nvidia\u2019s most expensive components, and Huang\u2019s remarks are a reminder that while these ideas work in isolation, they collide with reality once they\u2019re exposed to production-scale AI systems.<\/p>\n<p>Huang did not dispute the performance advantages of SRAM-centric designs. In fact, he was explicit about their speed. &#8220;For some workloads, it could be insanely fast,&#8221; he said, noting that SRAM access avoids the latency penalties of even the fastest external memory. 
&#8220;SRAM\u2019s a lot faster than going off to even HBM memories.&#8221;<\/p>\n<p>This is why SRAM-heavy accelerators look so compelling in <a data-analytics-id=\"inline-link\" href=\"https:\/\/www.tomshardware.com\/tag\/benchmark\" data-auto-tag-linker=\"true\" data-mrf-recirculation=\"inline-link\" data-before-rewrite-localise=\"https:\/\/www.tomshardware.com\/tag\/benchmark\" rel=\"nofollow noopener\" target=\"_blank\">benchmarks<\/a> and controlled demos. Designs that favor on-chip SRAM can deliver high throughput in constrained scenarios, but they run up against capacity limits in production AI workloads because SRAM cannot match the balance of bandwidth and density that HBM provides. That is why most modern AI accelerators continue to pair compute with high-bandwidth DRAM packages.<\/p>\n<p>But Huang repeatedly returned to scale and variation as the breaking point. SRAM capacity simply does not grow fast enough to accommodate modern models once they leave the lab. Even within a single deployment, models can exceed on-chip memory as they add context length, routing logic, or additional modalities.<\/p>\n<p>The moment a model spills beyond SRAM, the efficiency advantage collapses. At that point, the system either stalls or falls back on external memory, and the specialized design loses its edge. Huang\u2019s argument was grounded in how production AI systems evolve after deployment. 
&#8220;If I keep everything on SRAM, then of course I don\u2019t need HBM memory,&#8221; he said, adding that &#8220;&#8230;the problem is the size of my model that I can keep inside these SRAMs is like 100 times smaller.&#8221;<\/p>\n<p class=\"vanilla-image-block\" style=\"padding-top:56.25%;\">\n<p><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/il\/wp-content\/uploads\/2026\/01\/vfLy5PcKHn93qzbpMvFh34.jpg\" alt=\"Jensen Huang\"   loading=\"lazy\" data-new-v2-image=\"true\" data-original-mos=\"https:\/\/www.newsbeep.com\/il\/wp-content\/uploads\/2026\/01\/vfLy5PcKHn93qzbpMvFh34.jpg\" data-pin-media=\"https:\/\/www.newsbeep.com\/il\/wp-content\/uploads\/2026\/01\/vfLy5PcKHn93qzbpMvFh34.jpg\" class=\"inline\"\/>\n<\/p>\n<p>(Image credit: Tom&#8217;s Hardware)<\/p>\n<p><a id=\"elk-455b0b85-6b96-424b-adc9-b1f33923b662\" class=\"paywall\" aria-hidden=\"true\" data-url=\"\" href=\"\" target=\"_blank\" referrerpolicy=\"no-referrer-when-downgrade\" data-hl-processed=\"none\"\/>Workloads that refuse to stay still<\/p>\n<p id=\"a5d8efa5-be7e-44ab-a558-ec8a75ca1bb4\">Some of Huang\u2019s more revealing comments came when he described how modern AI workloads behave in the wild. &#8220;Workloads are changing shape all the time,&#8221; he said. &#8220;Sometimes you have MOEs (Mixture of Experts). Sometimes you have multimodality stuff. Sometimes you\u2019ve got diffusion models. Sometimes you have autoregressive models. Sometimes you have SSMs (State Space Models).&#8221;<\/p>\n<p>Each of those architectures stresses hardware differently. Some are memory-bound, while others push interconnect bandwidth. Some demand low latency, while others tolerate batching and delay. &#8220;These models are all slightly different in shape and size,&#8221; Huang said, summing it up bluntly. More importantly, those pressures shift dynamically. &#8220;Sometimes they move the pressure on the NVLink. Sometimes they move the pressure on HBM memory. 
Sometimes they move the pressure on all three,&#8221; he said.<\/p>\n<p>This is the core argument supporting Nvidia\u2019s case for flexibility. A platform optimized narrowly for one memory pattern or execution model risks leaving expensive silicon idle when the workload changes. In shared data centers, where utilization across weeks and months determines <a data-analytics-id=\"inline-link\" href=\"https:\/\/www.tomshardware.com\/pc-components\/storage\/perfect-storm-of-demand-and-supply-driving-up-storage-costs\" data-mrf-recirculation=\"inline-link\" data-before-rewrite-localise=\"https:\/\/www.tomshardware.com\/pc-components\/storage\/perfect-storm-of-demand-and-supply-driving-up-storage-costs\" rel=\"nofollow noopener\" target=\"_blank\">whether it\u2019s economically viable<\/a>, that is a serious liability.<\/p>\n<p>&#8220;You might be able to take one particular workload and push it to the extreme,&#8221; Huang said. &#8220;But that 10% of the workload, or even 5% of the workload, if it\u2019s not being used, then all of a sudden that part of the data center could have been used for something else.&#8221; In other words, Huang is arguing that peak efficiency on a single task matters less than consistent usefulness across many.<\/p>\n<p><a id=\"elk-e9e0463b-2d10-401f-9b07-9c641013d4ff\" class=\"paywall\" aria-hidden=\"true\" data-url=\"\" href=\"\" target=\"_blank\" referrerpolicy=\"no-referrer-when-downgrade\" data-hl-processed=\"none\"\/>Open models still run into memory limits<\/p>\n<p class=\"vanilla-image-block\" style=\"padding-top:56.25%;\">\n<p><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/il\/wp-content\/uploads\/2026\/01\/hKZBTZM3y3iii7QUCZpjhD.jpg\" alt=\"Nvidia Keynote\"   loading=\"lazy\" data-new-v2-image=\"true\" data-original-mos=\"https:\/\/www.newsbeep.com\/il\/wp-content\/uploads\/2026\/01\/hKZBTZM3y3iii7QUCZpjhD.jpg\" 
data-pin-media=\"https:\/\/www.newsbeep.com\/il\/wp-content\/uploads\/2026\/01\/hKZBTZM3y3iii7QUCZpjhD.jpg\" class=\"inline\"\/>\n<\/p>\n<p>(Image credit: Tom&#8217;s Hardware)<\/p>\n<p id=\"cc103d21-5505-4e37-99aa-c5c7bee438ed\">The original question also touched on open AI models and whether they might reduce <a data-analytics-id=\"inline-link\" href=\"https:\/\/www.tomshardware.com\/tech-industry\/nvidia-skips-new-gpus-at-ces-2026-as-its-roadmap-shifts-toward-rack-scale-ai-systems\" data-mrf-recirculation=\"inline-link\" data-before-rewrite-localise=\"https:\/\/www.tomshardware.com\/tech-industry\/nvidia-skips-new-gpus-at-ces-2026-as-its-roadmap-shifts-toward-rack-scale-ai-systems\" rel=\"nofollow noopener\" target=\"_blank\">Nvidia\u2019s leverage over the AI stack<\/a>. The suggestion was that open models, combined with SRAM-heavy designs and cheaper memory, could reduce reliance on Nvidia\u2019s most expensive GPUs and improve margins across the stack.<\/p>\n<p>While Huang has praised open models publicly and Nvidia has released its own open weights and datasets, his CES remarks made clear that openness does not eliminate infrastructure constraints. Training and serving competitive models still require enormous compute and memory resources, regardless of licensing. Open weights do not eliminate the need for large memory pools, fast interconnects, or flexible execution engines; they just change who owns the model.<\/p>\n<p>This is important because many open models are evolving rapidly and, as they incorporate larger context windows, more experts, and multimodal inputs, their memory footprints will grow. Huang\u2019s emphasis on flexibility applies here as well; supporting open models at scale does not reduce the importance of HBM or general-purpose GPUs. In many cases, it increases it.<\/p>\n<p>The implication is that open source AI and alternative memory strategies are not existential threats to Nvidia\u2019s platform. 
They are additional variables that increase workload diversity. That diversity, in Nvidia\u2019s view, strengthens the case for hardware that can adapt rather than specialize.<\/p>\n<p><a id=\"elk-48de0c70-364e-4a08-a356-edcc59530067\" class=\"paywall\" aria-hidden=\"true\" data-url=\"\" href=\"\" target=\"_blank\" referrerpolicy=\"no-referrer-when-downgrade\" data-hl-processed=\"none\"\/>Why Nvidia keeps choosing HBM<\/p>\n<p class=\"vanilla-image-block\" style=\"padding-top:56.25%;\">\n<p><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/il\/wp-content\/uploads\/2026\/01\/WPsDAmkaFLUsYpETvNW3n6.jpg\" alt=\"SK hynix HBM4 s'mores\"   loading=\"lazy\" data-new-v2-image=\"true\" data-original-mos=\"https:\/\/www.newsbeep.com\/il\/wp-content\/uploads\/2026\/01\/WPsDAmkaFLUsYpETvNW3n6.jpg\" data-pin-media=\"https:\/\/www.newsbeep.com\/il\/wp-content\/uploads\/2026\/01\/WPsDAmkaFLUsYpETvNW3n6.jpg\" class=\"inline\"\/>\n<\/p>\n<p>(Image credit: SK hynix)<\/p>\n<p id=\"646e2ffd-0239-4a46-baca-720170d68610\">Ultimately, Huang\u2019s CES comments amount to a clear statement of priorities. Nvidia is willing to accept higher bill of materials costs, <a data-analytics-id=\"inline-link\" href=\"https:\/\/www.tomshardware.com\/tech-industry\/chip-scarcity-assaults-auto-industry-amid-the-worsening-nexperia-and-dram-crisis\" data-mrf-recirculation=\"inline-link\" data-before-rewrite-localise=\"https:\/\/www.tomshardware.com\/tech-industry\/chip-scarcity-assaults-auto-industry-amid-the-worsening-nexperia-and-dram-crisis\" rel=\"nofollow noopener\" target=\"_blank\">reliance on scarce HBM<\/a>, and complex system designs because they preserve optionality. 
That optionality protects customers from being locked into a narrow performance envelope and protects Nvidia from sudden shifts in model architecture that could devalue a more rigid accelerator lineup.<\/p>\n<p>This stance also helps explain why Nvidia is less aggressive than some rivals in pushing single-purpose inference chips or extreme SRAM-heavy designs. Those approaches can win benchmarks and attract attention, but they assume a level of workload predictability that the current AI ecosystem no longer offers.<\/p>\n<p>Huang\u2019s argument is not that specialized hardware has no place. Rather, it is that in shared data centers, flexibility remains the dominant economic factor. As long as AI research continues to explore new architectures and hybrid pipelines, that logic is unlikely to change.<\/p>\n<p>For now, Huang seems confident that customers will continue to pay for that flexibility, even as they complain about the cost of HBM and the <a data-analytics-id=\"inline-link\" href=\"https:\/\/www.tomshardware.com\/news\/lowest-gpu-prices\" data-mrf-recirculation=\"inline-link\" data-before-rewrite-localise=\"https:\/\/www.tomshardware.com\/news\/lowest-gpu-prices\" rel=\"nofollow noopener\" target=\"_blank\">price of GPUs<\/a>. His remarks suggest the company sees no contradiction there. 
That view may be challenged if AI models stabilize or fragment into predictable tiers, but, right now, Huang made it clear that Nvidia does not believe that moment has arrived yet.<\/p>\n","protected":false},"excerpt":{"rendered":"Tom&#8217;s Hardware Premium Roadmaps (Image credit: Future) During a CES 2026 Q&amp;A in Las Vegas, Nvidia CEO Jensen&hellip;\n","protected":false},"author":2,"featured_media":226037,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[345,343,344,85,46,125],"class_list":{"0":"post-226036","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-il","12":"tag-israel","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/posts\/226036","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/comments?post=226036"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/posts\/226036\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/media\/226037"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/media?parent=226036"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/categories?post=226036"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/tags?post=226036"}],"curies":[{"name":"wp","href":"https:\/\/api.w.or
g\/{rel}","templated":true}]}}