{"id":530125,"date":"2026-03-11T23:11:13","date_gmt":"2026-03-11T23:11:13","guid":{"rendered":"https:\/\/www.newsbeep.com\/ca\/530125\/"},"modified":"2026-03-11T23:11:13","modified_gmt":"2026-03-11T23:11:13","slug":"introducing-nemotron-3-super-an-open-hybrid-mamba-transformer-moe-for-agentic-reasoning","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/ca\/530125\/","title":{"rendered":"Introducing Nemotron 3 Super: An Open Hybrid Mamba-Transformer MoE for Agentic Reasoning"},"content":{"rendered":"<p>Agentic AI systems need models with the specialized depth to solve dense technical problems autonomously. They must excel at <a href=\"https:\/\/www.nvidia.com\/en-us\/glossary\/ai-reasoning\/\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\">reasoning<\/a>, coding, and long-context analysis, while remaining efficient enough to run continuously at scale.\u00a0<\/p>\n<p><a href=\"https:\/\/www.nvidia.com\/en-us\/glossary\/multi-agent-systems\/\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\">Multi-agent systems<\/a> generate up to 15x the <a href=\"https:\/\/blogs.nvidia.com\/blog\/ai-tokens-explained\/\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\">tokens<\/a> of standard chats, re-sending history, tool outputs, and reasoning steps at every turn. Over long tasks, this \u201ccontext explosion\u201d causes goal drift, where agents gradually lose alignment with the original objective. And using massive reasoning models for every sub-task\u2014the \u201cthinking tax\u201d\u2014makes multi-agent applications too expensive and sluggish for practical use.<\/p>\n<p>Today, we are releasing <a href=\"https:\/\/huggingface.co\/nvidia\/NVIDIA-Nemotron-3-Super-120B-A12B-FP8\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">Nemotron 3 Super<\/a> to address these limitations. 
The new Super model is a 120B total, 12B active-parameter model that delivers maximum compute efficiency and accuracy for complex multi-agent applications such as software development and cybersecurity triaging. This model follows <a href=\"https:\/\/developer.nvidia.com\/blog\/inside-nvidia-nemotron-3-techniques-tools-and-data-that-make-it-efficient-and-accurate\/\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\">the introduction of Nemotron 3 Nano<\/a> in December.<\/p>\n<p>Super addresses the \u201cthinking tax\u201d with its hybrid mixture-of-experts (<a href=\"https:\/\/www.nvidia.com\/en-us\/glossary\/mixture-of-experts\/\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\">MoE<\/a>) architecture. It delivers over 5x the throughput of the previous Nemotron Super. This model tackles the \u201ccontext explosion\u201d with a native 1M-token context window that gives agents long-term memory for aligned, high-accuracy reasoning. The model is fully open with open weights, datasets, and recipes so developers can easily customize, optimize, and deploy it on their own <a href=\"https:\/\/www.nvidia.com\/en-us\/glossary\/ai-infrastructure\/\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\">infrastructure<\/a>.<\/p>\n<p>What makes Nemotron 3 Super different<a href=\"#what_makes_nemotron_3_super_different\" class=\"heading-anchor-link\"><\/a><\/p>\n<p>Nemotron 3 Super isn\u2019t just a bigger Nano. 
It introduces architectural innovations that allow the model to mitigate some of the typical efficiency-accuracy tradeoffs for high-capacity reasoning models:<\/p>\n<p>Latent MoE that calls 4x as many expert specialists for the same inference cost, by compressing tokens before they reach the experts.<\/p>\n<p>Multi-token prediction (MTP) that predicts multiple future tokens in one forward pass, dramatically reducing generation time for long sequences and enabling built-in speculative decoding.<\/p>\n<p>Hybrid Mamba-Transformer backbone integrating Mamba layers for sequence efficiency with Transformer layers for precision reasoning, delivering higher throughput with 4x improved memory and compute efficiency.<\/p>\n<p>Native NVFP4 pretraining optimized for NVIDIA Blackwell, significantly cutting memory requirements and speeding up inference by 4x on NVIDIA B200 compared to FP8 on NVIDIA H100, while maintaining accuracy.<\/p>\n<p>Multi-environment <a href=\"https:\/\/www.nvidia.com\/en-us\/glossary\/reinforcement-learning\/\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\">reinforcement learning<\/a> (RL) post-training across 21 environment configurations using NVIDIA <a href=\"https:\/\/docs.nvidia.com\/nemo\/gym\/latest\/index.html\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\">NeMo Gym<\/a> and NVIDIA <a href=\"https:\/\/docs.nvidia.com\/nemo\/rl\/latest\/index.html\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\">NeMo RL<\/a>, trained with more than 1.2 million environment rollouts.<\/p>\n<p>These advantages come together to create a model that is well suited for long-running autonomous agents. 
On <a href=\"https:\/\/pinchbench.com\/?score=best\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">PinchBench<\/a>\u2014a new benchmark for determining how well LLM models perform as the brain of an OpenClaw agent\u2014Nemotron 3 Super scores 85.6% across the full test suite, making it the best open model in its class.<\/p>\n<p>See it in action<a href=\"#see_it_in_action\" class=\"heading-anchor-link\"><\/a><\/p>\n<p>If you want to go hands on with Nemotron 3 Super, follow the tutorial video below. This will walk you through how to use the model from <a href=\"http:\/\/build.nvidia.com\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\">build.nvidia.com<\/a> to OpenCode.<\/p>\n<p>Video 1. A tutorial walkthrough of Nemotron 3 Super<\/p>\n<p>Diving deep into the architecture<a href=\"#diving_deep_into_the_architecture\" class=\"heading-anchor-link\"><\/a><\/p>\n<p>Hybrid Mamba-Transformer MoE backbone<a href=\"#hybrid_mamba-transformer_moe_backbone\" class=\"heading-anchor-link\"><\/a><\/p>\n<p>Super builds on the same hybrid philosophy as Nano but at a fundamentally different scale. The backbone interleaves three layer types:<\/p>\n<p>Mamba-2 layers handle the majority of sequence processing. State space models (SSMs) provide linear-time complexity with respect to sequence length, which is what makes the 1M-token context window practical rather than theoretical. When an agent needs to reason over an entire codebase, a long conversation history, or a stack of retrieved documents, Mamba layers keep the memory footprint manageable.<\/p>\n<p>Transformer attention layers are interleaved at key depths. Pure SSMs can struggle with precise associative recall\u2014the kind of task where you need to find one specific fact buried in a long context. 
The attention layers preserve this capability, ensuring that Super maintains high-fidelity retrieval even when the \u201cneedle\u201d sits in the middle of a haystack of conflicting information.<\/p>\n<p>MoE layers scale effective parameter count without the cost of dense computation. Only a subset of experts activates per token, keeping latency low and throughput high\u2014critical when many agents are running concurrently in a shared deployment.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1696\" height=\"313\" src=\"https:\/\/www.newsbeep.com\/ca\/wp-content\/uploads\/2026\/03\/image3-3.webp.webp\" alt=\"Architecture diagram of Nemotron-3-Super-120B-A12B showing five groups of repeating layer blocks connected in sequence. Each block contains six layers in order: Mamba-2, Latent MoE, Mamba-2, Attention, Mamba-2, Latent MoE.\" class=\"lazyload wp-image-113812\"  data-\/>Figure 1. A layer pattern diagram showing repeating blocks of Mamba-2\/MoE pairs interleaved with attention layers<\/p>\n<p>Latent MoE<a href=\"#latent_moe\" class=\"heading-anchor-link\"><\/a><\/p>\n<p>Standard MoE architectures route tokens directly from the model\u2019s full hidden dimension to the experts. As models grow, this routing layer becomes a bottleneck\u2014it increases compute costs and limits how many experts you can practically deploy.<\/p>\n<p>Super introduces latent MoE: Before routing decisions are made, token embeddings are projected into a compressed, low-rank latent space. Expert computation happens in this smaller dimension, and results are projected back to the full model dimension afterward.<\/p>\n<p>Why this matters in practice:<\/p>\n<p>More experts, same cost. By compressing tokens before they reach the experts, latent MoE enables the model to consult 4x as many experts for the same computational cost as a standard MoE layer.<\/p>\n<p>Finer-grained specialization. 
With more experts available, the model can afford highly specialized routing\u2014for example, activating distinct experts for Python syntax versus SQL logic\u2014with each expert activated only when strictly necessary. This granularity is especially valuable in agentic settings where a single conversation may span tool calls, code generation, data analysis, and conversational reasoning within a few turns.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1175\" height=\"770\" src=\"https:\/\/www.newsbeep.com\/ca\/wp-content\/uploads\/2026\/03\/image2-2.webp.webp\" alt=\"A diagram comparing standard MoE and Latent MoE transformer architectures side by side.\" class=\"lazyload wp-image-113728\"  data-\/>Figure 2. Side-by-side comparison of standard MoE vs. latent MoE architectures<\/p>\n<p>Multi-token prediction (MTP)<a href=\"#multi-token_prediction_mtp\" class=\"heading-anchor-link\"><\/a><\/p>\n<p>Standard <a href=\"https:\/\/www.nvidia.com\/en-us\/glossary\/large-language-models\/\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\">language models<\/a> are trained to predict one token at a time\u2014a fundamentally myopic objective. Super is trained with MTP, where specialized prediction heads forecast several future tokens simultaneously from each position.<\/p>\n<p>This has two concrete benefits:<\/p>\n<p>Stronger reasoning during training. Predicting multiple future tokens forces the model to internalize longer-range structure and logical dependencies. Rather than learning to guess plausible next words, the model must learn to anticipate coherent sequences. This produces measurable gains on chain-of-thought tasks where each step must follow logically from the last.<\/p>\n<p>Built-in speculative decoding at inference. By predicting multiple future tokens simultaneously in one forward pass, MTP dramatically reduces the time required to generate long sequences. 
The MTP heads provide draft predictions that can be verified in parallel, enabling up to 3x wall-clock speedups for structured generation tasks like code and tool calls\u2014without requiring a separate draft model.<\/p>\n<p>Both benefits stem from the same design decision. Unlike architectures that train independent prediction heads per offset, Super uses a shared-weight design across all MTP heads. This keeps the parameter overhead minimal while improving training stability\u2014the heads learn to agree on coherent continuations rather than diverging into offset-specific shortcuts. The same weight sharing also makes the speculative drafts more consistent at longer draft lengths, which is where independently trained heads typically degrade.<\/p>\n<p>Native NVFP4 pretraining<a href=\"#native_nvfp4_pretraining\" class=\"heading-anchor-link\"><\/a><\/p>\n<p>Most quantized models start as full-precision and get compressed after training, which inevitably introduces accuracy loss. Super takes a different approach: The majority of floating-point multiply-accumulate operations during pretraining run in <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvfp4-for-efficient-and-accurate-low-precision-inference\/\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\">NVFP4,<\/a> the NVIDIA 4-bit floating-point format. Optimized for Blackwell, this significantly cuts memory requirements and speeds up inference compared to FP8, while maintaining accuracy.<\/p>\n<p>Training natively in reduced precision means the model learns to be accurate within the constraints of 4-bit arithmetic from the very first gradient update. 
The result is a model that is mathematically stable and accurate despite running on a significantly reduced memory footprint.<\/p>\n<p>How we trained Nemotron 3 Super<a href=\"#how_we_trained_nemotron_3_super\" class=\"heading-anchor-link\"><\/a><\/p>\n<p>Nemotron 3 Super is trained in three sequential phases, each building on the last. <a href=\"https:\/\/huggingface.co\/collections\/nvidia\/nemotron-pre-training-datasets\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">Pretraining<\/a> establishes broad world knowledge and language understanding at scale. Supervised fine-tuning shapes the model\u2019s behavior across the task types it will encounter in deployment. Reinforcement learning then refines that behavior against verifiable outcomes across diverse agentic environments.<\/p>\n<p>Pretraining<a href=\"#pretraining\" class=\"heading-anchor-link\"><\/a><\/p>\n<p>Super is pretrained on 25 trillion tokens using NVFP4, the NVIDIA 4-bit floating-point format optimized for NVIDIA Blackwell. Rather than quantizing a full-precision model after the fact, Super trains natively in reduced precision from the first gradient update\u2014meaning the model learns to be accurate within the constraints of 4-bit arithmetic throughout pretraining, not just at inference. The pretraining corpus spans 10 trillion unique curated tokens, with the model seeing 25 trillion total tokens across the run, including additional compute focused on reasoning and coding.<\/p>\n<p>Supervised fine-tuning<a href=\"#supervised_fine-tuning\" class=\"heading-anchor-link\"><\/a><\/p>\n<p>Before reinforcement learning, Super undergoes supervised fine-tuning on about 7 million SFT samples. They\u2019re drawn from a broader post-training corpus of 40 million samples covering reasoning, instruction following, coding, safety, and multi-step agent tasks. This stage establishes the behavioral foundation that RL then refines. 
The model learns the format and structure of correct responses across task types, giving the subsequent RL phase a stable starting point rather than optimizing from a raw pretrained checkpoint.<\/p>\n<p>Multi-environment reinforcement learning<a href=\"#multi-environment_reinforcement_learning\" class=\"heading-anchor-link\"><\/a><\/p>\n<p>To align Super with real agentic behavior, the model is post-trained using reinforcement learning across diverse environments in <a href=\"https:\/\/docs.nvidia.com\/nemo\/gym\/0.1.0\/about\/index.html\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\">NeMo Gym,<\/a> the NVIDIA open source library for building and scaling RL training environments. These environments evaluate the model\u2019s ability to perform sequences of actions\u2014generating correct tool calls, writing functional code, producing multi-part plans that satisfy verifiable criteria\u2014not just providing satisfying single-turn responses. These trajectories form the core training data to run reinforcement learning at scale with the <a href=\"https:\/\/docs.nvidia.com\/nemo\/rl\/latest\/index.html\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\">NeMo RL<\/a> open library.<\/p>\n<p>This trajectory-based reinforcement produces a model that behaves reliably under multi-step workflows, reduces reasoning drift, and handles the kinds of structured operations common in agentic pipelines.<\/p>\n<p>Benchmarking Nemotron 3 Super<a href=\"#benchmarking_nemotron_3_super\" class=\"heading-anchor-link\"><\/a><\/p>\n<p>Nemotron 3 Super achieves leading accuracy across a number of important agentic benchmarks while maintaining incredible throughput.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1531\" height=\"566\" src=\"https:\/\/www.newsbeep.com\/ca\/wp-content\/uploads\/2026\/03\/image4-1.webp.webp\" alt=\"A bar chart benchmarking Nemotron 3 Super 120B against GPT OSS 120B and Qwen3 122B across accuracy 
and throughput metrics.\" class=\"lazyload wp-image-113821\"  data-\/>Figure 3. A chart comparing Nemotron 3 Super accuracy on key benchmarks against similarly sized open models.<\/p>\n<p>The \u201cSuper + Nano\u201d deployment pattern<a href=\"#the_\u201csuper_+_nano\u201d_deployment_pattern\" class=\"heading-anchor-link\"><\/a><\/p>\n<p>Nemotron 3 Nano is an excellent choice for achieving high accuracy in executing targeted, individual steps within an agentic workflow. However, when multi-agent applications escalate to complex, multi-step activities, they require a high-capacity model for superior planning and reasoning. Think of a computer-use agent that must choose among different tool modalities in order to, say, create a presentation with 10 high-quality slides.<\/p>\n<p>Nemotron 3 Super is ideal for this role. For instance, in software development, simple merge requests can be addressed by Nemotron 3 Nano, while complex coding tasks that require deeper understanding of the codebase can be handled by Nemotron 3 Super. 
And expert-level coding tasks can be addressed by proprietary models.<\/p>\n<p>Building with Super\u2019s open resources<a href=\"#building_with_super\u2019s_open_resources\" class=\"heading-anchor-link\"><\/a><\/p>\n<p><a href=\"https:\/\/huggingface.co\/nvidia\/NVIDIA-Nemotron-3-Super-120B-A12B-FP8\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">Nemotron 3 Super<\/a> is fully open\u2014weights, datasets, and recipes\u2014so developers can easily customize, optimize, and deploy the model on their own infrastructure for maximum privacy and security.<\/p>\n<p>Model weights<a href=\"#model_weights\" class=\"heading-anchor-link\"><\/a><\/p>\n<p>Full parameter checkpoints for Nemotron 3 Super are available on <a href=\"https:\/\/huggingface.co\/nvidia\/NVIDIA-Nemotron-3-Super-120B-A12B-FP8\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">Hugging Face<\/a> and through <a href=\"https:\/\/build.nvidia.com\/nvidia\/nemotron-3-super-120b-a12b\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\">NVIDIA NIM<\/a>. The <a href=\"https:\/\/www.nvidia.com\/en-us\/agreements\/enterprise-software\/nvidia-nemotron-open-model-license\/\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\">NVIDIA Nemotron Open Model License<\/a> gives enterprises the flexibility to maintain data control and deploy anywhere.<\/p>\n<p>End-to-end training and evaluation recipes<a href=\"#end-to-end_training_and_evaluation_recipes\" class=\"heading-anchor-link\"><\/a><\/p>\n<p>We are releasing the complete training and evaluation recipe for Nemotron 3 Super, covering the full pipeline from pretraining through alignment. 
This enables developers to reproduce Super\u2019s training, adapt the recipe for domain-specific variants, or use it as a starting point for their own hybrid architecture research.<\/p>\n<p>Deployment cookbooks<a href=\"#deployment_cookbooks\" class=\"heading-anchor-link\"><\/a><\/p>\n<p>We\u2019ve built <a href=\"https:\/\/github.com\/NVIDIA-NeMo\/Nemotron\/tree\/main\/usage-cookbook\/Nemotron-3-Super\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">ready-to-use cookbooks<\/a> for major inference engines, each with configuration templates, performance tuning guidance, and reference scripts:<\/p>\n<p><a href=\"https:\/\/github.com\/NVIDIA-NeMo\/Nemotron\/blob\/main\/usage-cookbook\/Nemotron-3-Super\/vllm_cookbook.ipynb\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">vLLM Cookbook<\/a>: High-throughput continuous batching and streaming for Super.<\/p>\n<p><a href=\"https:\/\/github.com\/NVIDIA-NeMo\/Nemotron\/blob\/main\/usage-cookbook\/Nemotron-3-Super\/sglang_cookbook.ipynb\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">SGLang Cookbook<\/a>: Fast, lightweight inference optimized for multi-agent tool-calling workloads.<\/p>\n<p><a href=\"https:\/\/github.com\/NVIDIA-NeMo\/Nemotron\/blob\/main\/usage-cookbook\/Nemotron-3-Super\/trtllm_cookbook.ipynb\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">NVIDIA TensorRT LLM Cookbook<\/a>: Fully optimized TensorRT LLM engines with latent MoE kernels for production-grade, low-latency deployment.<\/p>\n<p>Fine-tuning cookbooks<a href=\"#fine-tuning_cookbooks\" class=\"heading-anchor-link\"><\/a><\/p>\n<p>Explore our Nemotron 3 Super customization cookbooks to efficiently fine-tune for your domain (LoRA\/SFT) or advance its agentic reasoning capabilities (GRPO\/DAPO):<\/p>\n<p>Open datasets<a href=\"#open_datasets\" class=\"heading-anchor-link\"><\/a><\/p>\n<p>Nemotron 3 Super is built on a fully 
open, end-to-end data pipeline that spans pretraining, post-training, and interactive reinforcement learning\u2014giving developers reproducible building blocks for agentic AI.<\/p>\n<p><a href=\"https:\/\/huggingface.co\/collections\/nvidia\/nemotron-pre-training-datasets\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">Pretraining corpora<\/a>: 10 trillion unique curated tokens, seen across a 25-trillion-token training run, plus an additional 10 billion tokens focused on reasoning and 15 million coding problems. All aggressively deduplicated and quality-filtered to maximize signal-to-noise.<\/p>\n<p><a href=\"https:\/\/huggingface.co\/collections\/nvidia\/nemotron-post-training-v3\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">Post-training datasets<\/a>: 40 million new supervised and alignment samples, covering reasoning, instruction following, coding, safety, and multi-step agent tasks across supervised fine-tuning, preference data, and RL trajectories (about 7 million used directly for SFT).<\/p>\n<p><a href=\"https:\/\/huggingface.co\/collections\/nvidia\/nemo-gym\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">RL tasks and environments<\/a>: Interactive RL across 21 environment configurations and 37 datasets (~10 of which are being released), including software engineer-style agent training and tool-augmented search\/planning tasks\u2014moving beyond static text into dynamic, verifiable execution workflows and generating ~1.2 million environment rollouts during training.<\/p>\n<p>Open training and evaluation infrastructure<a href=\"#open_training_and_evaluation_infrastructure\" class=\"heading-anchor-link\"><\/a><\/p>\n<p>NVIDIA publishes development techniques and tools, giving researchers and enterprises the flexibility to customize Nemotron 3 Super or build their own reasoning models. 
All recipes integrate with the Nemotron GitHub repository, <a href=\"https:\/\/docs.nvidia.com\/nemo\/gym\/latest\/index.html\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\">NeMo Gym<\/a>, <a href=\"https:\/\/docs.nvidia.com\/nemo\/rl\/latest\/index.html\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\">NeMo RL<\/a>, <a href=\"https:\/\/nvidia-nemo.github.io\/DataDesigner\/latest\/\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">NVIDIA NeMo Data Designer<\/a>, <a href=\"https:\/\/docs.nvidia.com\/nemo\/curator\/latest\/\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\">NVIDIA NeMo Curator<\/a>, and <a href=\"https:\/\/docs.nvidia.com\/nemo\/evaluator\/latest\/\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\">NVIDIA NeMo Evaluator<\/a>\u2014providing a complete, reproducible pipeline from data to deployment.<\/p>\n<p>All Nemotron models are released with an open evaluation approach, including a published <a href=\"https:\/\/github.com\/NVIDIA-NeMo\/Nemotron\/blob\/main\/docs\/nemotron\/super3\/evaluate.md\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">evaluation recipe<\/a> that enables anyone to rerun and inspect the full evaluation pipeline from Nemotron 3 Super.<\/p>\n<p>Get started<a href=\"#get_started\" class=\"heading-anchor-link\"><\/a><\/p>\n<p>Nemotron 3 Super is live now. Available across leading inference platforms and packaged as NVIDIA NIM, Super can run anywhere from the workstation to the cloud. 
Try it on <a href=\"https:\/\/www.perplexity.ai\/\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">Perplexity<\/a> with a Pro subscription or through API, <a href=\"https:\/\/openrouter.ai\/nvidia\/nemotron-3-super-120b-a12b:free\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">OpenRouter<\/a>, or <a href=\"https:\/\/build.nvidia.com\/nvidia\/nemotron-3-super-120b-a12b\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\">build.nvidia.com<\/a>.<\/p>\n<p>Download the weights from <a href=\"https:\/\/huggingface.co\/nvidia\/NVIDIA-Nemotron-3-Super-120B-A12B-FP8\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">Hugging Face<\/a>, launch an optimized instance through NVIDIA NIM, fine-tune with <a href=\"https:\/\/unsloth.ai\/docs\/models\/nemotron-3-super\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">Unsloth<\/a>, or start with the cookbooks to get running in minutes.<\/p>\n<p>Super is also available through <a href=\"https:\/\/www.baseten.co\/library\/nvidia-nemotron-super\/\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">Baseten<\/a>, <a href=\"https:\/\/developers.cloudflare.com\/changelog\/post\/2026-03-11-nemotron-3-super-workers-ai\/\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">Cloudflare<\/a>, <a href=\"https:\/\/wandb.ai\/site\/inference-model\/cw_nvidia_nemotron-3-super-120b-a12b\/\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">CoreWeave<\/a>, <a href=\"https:\/\/deepinfra.com\/nvidia\/NVIDIA-Nemotron-3-Super-120B-A12B\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">DeepInfra<\/a>, <a href=\"https:\/\/app.fireworks.ai\/models\/fireworks\/nvidia-nemotron-3-super-120b-a12b-fp8\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">Fireworks AI,<\/a> 
<a href=\"https:\/\/friendli.ai\/blog\/nvidia-nemotron-3-super\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">FriendliAI, <\/a><a href=\"http:\/\/inference.net\/blog\/nemotron-finetuning\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">Inference.net<\/a>, <a href=\"https:\/\/lightning.ai\/models\/lightning-ai-nvidia-nemotron-3-super-120b\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">Lightning AI<\/a>, <a href=\"https:\/\/modal.com\/docs\/examples\/nemotron_inference\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">Modal,<\/a> <a href=\"https:\/\/tokenfactory.nebius.com\/playground?models=nvidia\/nemotron-3-super-120b-a12b\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">Nebius<\/a>, and <a href=\"http:\/\/www.together.ai\/blog\/nvidia-nemotron-3-super\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">Together AI<\/a>.<\/p>\n<p>Check out our <a href=\"https:\/\/github.com\/NVIDIA-NeMo\/Nemotron\/tree\/main\/usage-cookbook\/Nemotron-3-Super\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">GitHub repository<\/a> which has getting started instructions for platforms like OpenCode, OpenHands, and OpenClaw.<\/p>\n<p>For the full technical details, read the <a href=\"https:\/\/research.nvidia.com\/labs\/nemotron\/files\/NVIDIA-Nemotron-3-Super-Technical-Report.pdf\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\">Nemotron 3 Super technical report<\/a>.<\/p>\n<p>Stay up-to-date on<a href=\"https:\/\/www.nvidia.com\/en-us\/ai-data-science\/foundation-models\/nemotron\/\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\"> NVIDIA Nemotron<\/a> by subscribing to <a href=\"https:\/\/www.nvidia.com\/en-us\/ai-data-science\/generative-ai\/news\/\" data-wpel-link=\"internal\" target=\"_self\" 
rel=\"follow nofollow noopener\">NVIDIA news<\/a> and following NVIDIA AI on <a href=\"https:\/\/www.linkedin.com\/showcase\/nvidia-ai\/posts\/?feedView=all\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">LinkedIn<\/a>, <a href=\"https:\/\/x.com\/NVIDIAAIDev\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow\">X<\/a>, <a href=\"https:\/\/discord.com\/invite\/nvidiadeveloper\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">Discord<\/a>, and <a href=\"https:\/\/www.youtube.com\/@NVIDIADeveloper\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">YouTube<\/a>. Visit the <a href=\"https:\/\/developer.nvidia.com\/nemotron\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\">Nemotron developer page<\/a> for resources to get started. Explore open Nemotron models and datasets on <a href=\"https:\/\/huggingface.co\/collections\/nvidia\/nvidia-nemotron-v3\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">Hugging Face<\/a> and <a href=\"https:\/\/build.nvidia.com\/blueprints\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\">Blueprints<\/a> on <a href=\"http:\/\/build.nvidia.com\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\">build.nvidia.com<\/a>. 
And engage with <a href=\"https:\/\/www.youtube.com\/playlist?list=PL5B692fm6--vEL0FwctKghCpyEnBGAQJA\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">Nemotron livestreams<\/a>, <a href=\"https:\/\/www.youtube.com\/playlist?list=PL5B692fm6--vdRKB14FImVi7MTJ77zjn4\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">tutorials<\/a>, and the developer community on the <a href=\"https:\/\/forums.developer.nvidia.com\/c\/ai-data-science\/nvidia-nemotron\/669\" data-wpel-link=\"internal\" target=\"_self\" rel=\"follow nofollow noopener\">NVIDIA forum<\/a> and <a href=\"https:\/\/discord.com\/invite\/nvidiadeveloper\" data-wpel-link=\"external\" target=\"_blank\" rel=\"follow nofollow noopener\">Discord<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"Agentic AI systems need models with the specialized depth to solve dense technical problems autonomously. They must excel&hellip;\n","protected":false},"author":2,"featured_media":530126,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[62,276,277,49,48,61],"class_list":{"0":"post-530125","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-ca","12":"tag-canada","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/posts\/530125","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/comments?post=530125"}],"version-history":[{"count":0,"href
":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/posts\/530125\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/media\/530126"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/media?parent=530125"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/categories?post=530125"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/tags?post=530125"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}