Jensen Huang CES2026 Latest Speech: Three Key Topics, One "Chip Monster"

Author: Li Hailun and Su Yang

On January 6, Beijing time, NVIDIA CEO Jensen Huang once again took the main stage at CES 2026, dressed in his signature leather jacket.

At CES 2025, NVIDIA showcased mass-produced Blackwell chips and a complete physical AI technology stack. During the event, Jensen Huang emphasized that an “Era of Physical AI” was beginning. He painted an imaginative future: autonomous vehicles with reasoning capabilities, robots that can understand and think, and AI agents capable of handling long-context tasks with millions of tokens.

A year has passed in the blink of an eye, and the AI industry has evolved dramatically. Huang revisited the past year’s changes during the keynote, with a particular focus on open-source models.

He said that open-source reasoning models like DeepSeek R1 have made the entire industry realize: once open and global collaboration truly kicks off, AI diffusion will accelerate rapidly. Although open-source models still lag behind cutting-edge models by about half a year in overall capability, they are catching up every six months, and their download and usage volumes have already exploded.

Compared with 2025, when it showcased mostly visions and possibilities, NVIDIA this time is systematically tackling the “how to get there” problem: building out the long-running infrastructure of compute, networking, and storage around reasoning AI, sharply lowering reasoning costs, and embedding these capabilities directly into real-world scenarios such as autonomous driving and robotics.

In Jensen Huang’s CES speech, three main themes were discussed:

● At the system and infrastructure level, NVIDIA has rebuilt its computing, networking, and storage architecture around long-horizon reasoning. Centered on the Rubin platform, NVLink 6, Spectrum-X Ethernet, and the inference context memory storage platform, these updates directly target bottlenecks such as high reasoning costs, difficulty maintaining context, and limited scalability, so that AI can think longer, stay affordable, and keep running over the long term.

● At the model level, NVIDIA places reasoning AI (Reasoning / Agentic AI) at the core. Through models and tools like Alpamayo, Nemotron, Cosmos Reason, it promotes AI from “content generation” to continuous thinking, shifting from “single-response models” to “long-term working agents.”

● At the application and deployment level, these capabilities are directly integrated into physical AI scenarios such as autonomous driving and robotics. Whether it’s the Alpamayo-driven autonomous driving system or the GR00T and Jetson robot ecosystem, they are advancing large-scale deployment through collaborations with cloud providers and enterprise platforms.

01 From Roadmap to Mass Production: Rubin Performance Data Disclosed for the First Time

At this CES, NVIDIA fully disclosed the technical details of the Rubin architecture for the first time.

In his speech, Huang started with test-time scaling, a concept that can be understood this way: to make AI smarter, it is no longer just about making it “study harder,” but about letting it “think a bit longer when it encounters a problem.”

In the past, AI capability improvements mainly relied on increasing computational power during training, making models larger and larger; now, the new change is that even if models no longer grow, giving them more time and compute power during each use to think can also significantly improve results.
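To make test-time scaling concrete, here is a toy sketch (not NVIDIA code) of one common pattern, self-consistency: sample several reasoning attempts for the same question and take a majority vote over the answers. The mock sample_answer function and its per-sample accuracy are purely illustrative assumptions; the point is that spending more compute per query raises accuracy without changing the model.

```python
import random
from collections import Counter

def sample_answer(question: str, correct: str = "42", p_correct: float = 0.6) -> str:
    """Stand-in for one stochastic reasoning pass of a fixed model.
    Returns the right answer with probability p_correct (illustrative only)."""
    return correct if random.random() < p_correct else random.choice(["41", "43", "7"])

def self_consistency(question: str, n_samples: int) -> str:
    """Test-time scaling: spend more compute per query by sampling several
    reasoning chains, then majority-vote on the final answer."""
    votes = Counter(sample_answer(question) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    random.seed(0)
    for n in (1, 5, 25):
        hits = sum(self_consistency("toy question", n) == "42" for _ in range(1000))
        print(f"{n:>2} samples per query -> accuracy {hits / 1000:.2%}")
```

With the stand-in model fixed, accuracy climbs as the per-query sample budget grows, which is exactly the trade the next paragraphs are about: more inference compute per question in exchange for better answers.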

How to make “AI think a bit longer” economically feasible? The new generation AI computing platform of the Rubin architecture is designed to solve this problem.

Huang explained that this is a complete next-generation AI computing system: through the co-design of the Vera CPU, Rubin GPU, NVLink 6, ConnectX-9, BlueField-4, and Spectrum-6, it achieves a dramatic reduction in reasoning costs.

NVIDIA Rubin GPU is the core chip responsible for AI computation within the Rubin architecture, aiming to significantly reduce the unit costs of reasoning and training.

Simply put, the core task of Rubin GPU is “making AI more cost-effective and smarter to use.”

The core capability of Rubin GPU is: the same GPU can do more work. It can handle more reasoning tasks at once, remember longer contexts, and communicate faster with other GPUs. This means many scenarios that previously relied on “multi-GPU stacking” can now be done with fewer GPUs.

As a result, reasoning becomes not only faster but also markedly cheaper.

Huang reviewed the hardware parameters of the Rubin-architecture NVL72: 220 trillion transistors, 260 TB/s of bandwidth, and the industry’s first platform to support rack-scale confidential computing.

Overall, compared to Blackwell, the Rubin GPU makes generational leaps on key metrics: NVFP4 reasoning performance of up to 50 PFLOPS (5x), training performance of up to 35 PFLOPS (3.5x), HBM4 memory bandwidth of up to 22 TB/s (2.8x), and per-GPU NVLink interconnect bandwidth doubled to 3.6 TB/s.
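Working backwards from these multiples gives a rough sense of the per-GPU baseline being compared against. The snippet below only does that arithmetic on the figures quoted above, assuming each multiple is relative to the corresponding Blackwell per-GPU number.

```python
# Rubin per-GPU figures and generational multiples quoted in the keynote.
rubin_specs = {
    "NVFP4 reasoning (PFLOPS)": (50.0, 5.0),
    "Training (PFLOPS)":        (35.0, 3.5),
    "HBM4 bandwidth (TB/s)":    (22.0, 2.8),
    "NVLink bandwidth (TB/s)":  (3.6, 2.0),
}

for metric, (rubin_value, multiple) in rubin_specs.items():
    blackwell_value = rubin_value / multiple  # implied previous-generation baseline
    print(f"{metric:26} Rubin {rubin_value:5.1f} | implied Blackwell {blackwell_value:5.1f} | {multiple}x")
```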

These improvements collectively enable a single GPU to handle more reasoning tasks and longer contexts, fundamentally reducing dependence on the number of GPUs.

The Vera CPU is a core component designed specifically for data movement and Agentic processing. It carries 88 of NVIDIA’s self-developed Olympus cores and 1.5 TB of system memory (three times that of the previous Grace CPU), and achieves coherent memory access between CPU and GPU via 1.8 TB/s NVLink-C2C.

Unlike traditional general-purpose CPUs, Vera focuses on data scheduling and multi-step reasoning logic in AI inference scenarios. Essentially, it is a system coordinator that enables “AI to think a bit longer” efficiently.

NVLink 6, with 3.6 TB/s of bandwidth and in-network compute capability, lets the 72 GPUs in a Rubin rack work together like a single super GPU, which is key infrastructure for reducing reasoning costs.

This setup allows data and intermediate results needed for reasoning to flow rapidly between GPUs, avoiding repeated waiting, copying, or recomputation.

In the Rubin architecture, NVLink-6 handles internal GPU collaboration, BlueField-4 manages context and data scheduling, and ConnectX-9 provides high-speed external network connectivity. This ensures that the Rubin system can communicate efficiently with other racks, data centers, and cloud platforms, which is essential for large-scale training and reasoning tasks.

NVIDIA gives concrete, intuitive numbers: relative to the Blackwell platform, reasoning token costs can be reduced by up to 10x, and the number of GPUs needed to train mixture-of-experts (MoE) models can be cut to one-quarter.
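As a back-of-the-envelope illustration of what those two ratios mean in practice, the sketch below applies them to hypothetical baseline numbers. The $2-per-million-tokens price and the 1,024-GPU training cluster are invented for illustration; only the 10x and one-quarter ratios come from NVIDIA’s claim.

```python
# Hypothetical baselines (illustrative only); the ratios are NVIDIA's claims.
blackwell_cost_per_m_tokens = 2.00   # assumed $ per million reasoning tokens
blackwell_gpus_for_moe      = 1024   # assumed GPU count to train an MoE model

rubin_cost_per_m_tokens = blackwell_cost_per_m_tokens / 10   # "up to 10x cheaper tokens"
rubin_gpus_for_moe      = blackwell_gpus_for_moe // 4        # "one-quarter of the GPUs"

print(f"Token cost:   ${blackwell_cost_per_m_tokens:.2f} -> ${rubin_cost_per_m_tokens:.2f} per million tokens")
print(f"MoE training: {blackwell_gpus_for_moe} GPUs -> {rubin_gpus_for_moe} GPUs")
```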

NVIDIA states that Microsoft has committed to deploying hundreds of thousands of Vera Rubin chips in the next-generation Fairwater AI superfactory, and cloud providers like CoreWeave will offer Rubin instances in the second half of 2026. This infrastructure that “lets AI think a bit longer” is moving from technical demonstration to large-scale commercial deployment.

02 How to Solve the “Storage Bottleneck”?

Allowing AI to “think a bit longer” still faces a key technical challenge: where to store the context data?

When AI handles complex tasks requiring multiple rounds of dialogue and multi-step reasoning, it generates a large amount of context data (KV Cache). Traditional architectures either cram these into expensive, capacity-limited GPU memory or store them in regular storage (which is too slow to access). If this “storage bottleneck” isn’t solved, even the strongest GPU will be hampered.
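To see why this is a real bottleneck, here is a quick estimate of how fast KV cache memory grows with context length, using the standard per-token size formula. The model configuration (80 layers, 8 KV heads, head dimension 128, FP16 storage) is a hypothetical 70B-class setup chosen for illustration, not any specific NVIDIA model.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_elem: int = 2) -> int:
    """Standard KV-cache size estimate: keys + values for every layer,
    KV head, and token (bytes_per_elem=2 assumes FP16/BF16 storage)."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return per_token * context_tokens

# Hypothetical 70B-class config: 80 layers, 8 KV heads (GQA), head_dim 128.
for ctx in (8_000, 128_000, 1_000_000):
    gib = kv_cache_bytes(80, 8, 128, ctx) / 2**30
    print(f"{ctx:>9,} tokens of context -> ~{gib:7.1f} GiB of KV cache")
```

Under these assumptions a million-token context alone needs roughly 300 GiB of KV cache, far beyond what a single GPU’s memory can spare, which is why a dedicated layer for context data matters.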

To address this, NVIDIA fully disclosed at CES the inference context memory storage platform driven by BlueField-4, with the core goal of creating a “third layer” between GPU memory and traditional storage. It is designed to be fast enough, have ample capacity, and support long-term AI operation.

From a technical perspective, this platform isn’t just a single component but a result of coordinated design:

BlueField-4 accelerates management and access to context data at the hardware level, reducing data movement and system overhead;

Spectrum-X Ethernet provides high-performance networking, supporting RDMA-based high-speed data sharing;

Software components like DOCA, NIXL, and Dynamo optimize scheduling, reduce latency, and improve overall throughput at the system level.

We can understand this platform as extending the context data, originally only stored in GPU memory, into an independent, high-speed, shareable “memory layer.” This relieves GPU pressure while enabling rapid sharing of context information across multiple nodes and AI agents.
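Conceptually, this “third layer” behaves like a capacity-tiered cache: hot context stays in GPU memory, and overflow spills to a fast shared tier instead of being dropped or recomputed. The sketch below is only a conceptual model of that tiering, with made-up capacities and an LRU spill policy; it does not reflect the BlueField-4, DOCA, or Dynamo APIs.

```python
from __future__ import annotations

from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier context store: a small "GPU memory" tier that spills
    least-recently-used entries into a larger shared "context memory" tier."""

    def __init__(self, gpu_capacity: int):
        self.gpu_capacity = gpu_capacity
        self.gpu_tier: OrderedDict[str, bytes] = OrderedDict()  # fast but scarce
        self.shared_tier: dict[str, bytes] = {}                 # slower but ample

    def put(self, session_id: str, kv_blob: bytes) -> None:
        self.gpu_tier[session_id] = kv_blob
        self.gpu_tier.move_to_end(session_id)
        while len(self.gpu_tier) > self.gpu_capacity:
            evicted_id, evicted_blob = self.gpu_tier.popitem(last=False)
            self.shared_tier[evicted_id] = evicted_blob         # spill, do not discard

    def get(self, session_id: str) -> bytes | None:
        if session_id in self.gpu_tier:                         # hit in GPU memory
            self.gpu_tier.move_to_end(session_id)
            return self.gpu_tier[session_id]
        if session_id in self.shared_tier:                      # promote back from shared tier
            self.put(session_id, self.shared_tier.pop(session_id))
            return self.gpu_tier[session_id]
        return None                                             # would require a full recompute

cache = TieredKVCache(gpu_capacity=2)
for sid in ("chat-a", "chat-b", "chat-c"):
    cache.put(sid, b"...kv blocks...")
print(sorted(cache.gpu_tier), sorted(cache.shared_tier))        # chat-a spilled to the shared tier
```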

In practical terms, NVIDIA states that in specific scenarios this approach can increase tokens processed per second by up to 5x, with similar gains in energy efficiency.

Huang repeatedly emphasized that AI is evolving from single-use chatbots into true intelligent collaborators: they need to understand the real world, reason continuously, call tools to complete tasks, and retain both short-term and long-term memory. These are the core traits of Agentic AI. The inference context memory storage platform is built for exactly this kind of long-running, iterative thinking, expanding context capacity and accelerating cross-node sharing so that multi-turn conversations and multi-agent collaboration stay stable instead of slowing down over time.
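As a rough code-level picture of what a “long-term working agent” with memory looks like, here is a toy loop with a short-term scratchpad and a persistent long-term store. The plan_next_step function is a stand-in for a real reasoning model; everything here is an illustrative assumption, not NVIDIA’s agent stack.

```python
def plan_next_step(goal: str, scratchpad: list) -> str:
    """Stand-in for a reasoning model deciding the next action from context."""
    steps = ["search knowledge base", "draft answer", "done"]
    return steps[min(len(scratchpad), len(steps) - 1)]

def run_agent(goal: str, long_term_memory: dict) -> str:
    scratchpad = []                                   # short-term, per-task context
    while (action := plan_next_step(goal, scratchpad)) != "done":
        scratchpad.append(f"did: {action}")           # keep intermediate results around
    long_term_memory[goal] = scratchpad               # persist across sessions
    return f"finished '{goal}' in {len(scratchpad)} steps"

memory = {}
print(run_agent("answer a billing question", memory))
print(memory)
```

The scratchpad plays the role of the per-task context that must stay fast to access, while the long-term store is the kind of shared, durable memory the storage platform above is meant to serve.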

03 Next-Generation DGX SuperPOD: 576 GPUs Working Together

NVIDIA announced at CES the new generation DGX SuperPOD based on the Rubin architecture, extending Rubin from a single rack to a complete data center solution.

What is DGX SuperPOD?

If Rubin NVL72 is a “super rack” equipped with 72 GPUs, then DGX SuperPOD is a connected array of multiple such racks, forming a larger AI computing cluster. The version announced consists of 8 Vera Rubin NVL72 racks, totaling 576 GPUs working together.

As AI task scales continue to grow, a single rack’s 72 GPUs may no longer suffice, whether for training ultra-large models, serving thousands of Agentic AIs simultaneously, or handling complex tasks with hundreds of millions of tokens of context. Multiple racks must work in concert, and DGX SuperPOD is the standardized solution designed for exactly these scenarios.

For enterprises and cloud providers, DGX SuperPOD offers a “ready-to-use” large-scale AI infrastructure solution, with no need to work out how to connect hundreds of GPUs, configure the network, or manage storage.

The new DGX SuperPOD’s five core components:

○ 8 Vera Rubin NVL72 racks – the core of computing power, each with 72 GPUs, totaling 576 GPUs;

○ NVLink 6 expanded network – enabling these 8 racks’ 576 GPUs to work together like a super GPU;

○ Spectrum-X Ethernet expanded network – connecting different SuperPODs and linking to storage and external networks;

○ Inference context memory storage platform – providing shared context data storage for long-duration reasoning tasks;

○ NVIDIA Mission Control software – managing system scheduling, monitoring, and optimization.

This upgrade centers the SuperPOD infrastructure on the DGX Vera Rubin NVL72 rack system. Each NVL72 is a complete AI supercomputer, internally connected via NVLink 6, capable of large-scale reasoning and training within a single rack. The new DGX SuperPOD, composed of multiple NVL72 racks, forms a system-level cluster capable of long-term operation.
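Multiplying out the per-GPU figures quoted earlier shows the scale this configuration targets. The sketch below simply aggregates the article’s numbers and assumes ideal linear scaling, which real workloads will not reach.

```python
racks_per_superpod = 8
gpus_per_rack      = 72
total_gpus = racks_per_superpod * gpus_per_rack              # 576 GPUs

# Per-Rubin-GPU figures quoted above, assumed to scale linearly (ideal case).
nvfp4_pflops_per_gpu = 50
hbm4_tbps_per_gpu    = 22

print(f"GPUs:                 {total_gpus}")
print(f"Peak NVFP4 compute:   {total_gpus * nvfp4_pflops_per_gpu / 1000:.1f} EFLOPS (ideal)")
print(f"Aggregate HBM4 b/w:   {total_gpus * hbm4_tbps_per_gpu:,} TB/s (ideal)")
```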

When scaling from “single rack” to “multi-rack,” a new bottleneck emerges: how to reliably and efficiently transfer massive data between racks. To address this, NVIDIA simultaneously announced a new generation Ethernet switch based on Spectrum-6 chips, and for the first time introduced “co-packaged optics” (CPO) technology.

In simple terms, this means integrating the optical modules directly into the switch chip, shortening signal transmission distances from meters to millimeters, significantly reducing power consumption and latency, and improving overall system stability.

04 NVIDIA Open-Sources an Entire AI “Family”: From Data to Code

At CES, Huang announced an expansion of the open-source ecosystem (Open Model Universe), adding and updating a series of models, datasets, codebases, and tools. The ecosystem covers six major fields: biomedical AI (Clara), physics simulation (Earth-2), Agentic AI (Nemotron), physical AI (Cosmos), robotics (GR00T), and autonomous driving (Alpamayo).

Training an AI model requires not only computational power but also high-quality datasets, pre-trained models, training code, evaluation tools, and a complete infrastructure. For most companies and research institutions, building all this from scratch is too time-consuming.

Specifically, NVIDIA open-sourced six levels of content: computing platforms (DGX, HGX, etc.), domain-specific training datasets, pre-trained foundational models, inference and training code libraries, complete training scripts, and end-to-end solution templates.

The Nemotron series is a key update, covering four application areas.

For reasoning, there are the Nemotron 3 Nano and Nemotron 2 Nano VL small-scale reasoning models, plus reinforcement learning tools such as NeMo RL and NeMo Gym. For RAG (retrieval-augmented generation), NVIDIA provides Nemotron Embed VL (a vector embedding model), Nemotron Rerank VL (a re-ranking model), related datasets, and the NeMo Retriever library. For safety, there are Nemotron Content Safety models and datasets along with the NeMo Guardrails library.

For speech, there are Nemotron ASR automatic speech recognition models, the Granary speech dataset, and NeMo speech-processing libraries. This means that an enterprise wanting to build a RAG-based AI customer-service system can use NVIDIA’s pre-trained models and open-source code directly, without training its own embedding or re-ranking models.
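For readers unfamiliar with how the embedding and re-ranking pieces fit together, here is a minimal, self-contained sketch of a RAG flow: embed documents, retrieve the closest ones for a query, then re-rank the candidates before handing them to a generator. The bag-of-words “embedder” and term-overlap “re-ranker” are placeholders standing in for models like Nemotron Embed VL and Nemotron Rerank VL; this is not the NeMo Retriever API.

```python
import math
from collections import Counter

DOCS = [
    "Refunds are processed within 5 business days of approval.",
    "To reset your password, open account settings and choose security.",
    "Shipping is free for orders above fifty dollars.",
]

def embed(text: str) -> Counter:
    """Placeholder embedding model: a bag-of-words term-count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    """Stage 1: coarse retrieval by embedding similarity."""
    q = embed(query)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def rerank(query: str, candidates: list[str]) -> list[str]:
    """Stage 2: placeholder re-ranker scoring exact term overlap with the query."""
    q_terms = set(query.lower().split())
    return sorted(candidates, key=lambda d: len(q_terms & set(d.lower().split())), reverse=True)

if __name__ == "__main__":
    question = "how long do refunds take"
    context = rerank(question, retrieve(question))
    print("Context passed to the generator:", context[0])
```

In a production setup, the two placeholder scoring functions would be replaced by calls to the embedding and re-ranking models, while the overall retrieve-then-rerank shape stays the same.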

05 Physical AI: Moving Toward Commercialization

The physical AI field also sees model updates: Cosmos for understanding and generating physical-world video, the general-purpose robot foundation model Isaac GR00T, and the autonomous-driving vision-language-action model Alpamayo.

Huang claimed at CES that the “ChatGPT moment” for physical AI is near, but many challenges remain: the physical world is too complex and variable, and collecting real-world data is slow, expensive, and never enough.

What to do? Synthetic data is a solution. NVIDIA launched Cosmos.

This is an open-source foundational model for physical AI worlds, pre-trained with massive videos, real driving and robotics data, and 3D simulations. It can understand how the world works, linking language, images, 3D, and actions.

Huang said Cosmos enables many physical AI skills, such as content generation, reasoning, and trajectory prediction (even from a single image). It can generate realistic videos from 3D scenes, produce physically consistent motion from driving data, and generate panoramic videos from simulators, multi-camera footage, or text descriptions. Even rare scenarios can be reconstructed.

Huang also officially released Alpamayo, an open-source toolchain for autonomous driving and the first open-source vision-language-action (VLA) reasoning model. Unlike earlier releases that opened up only code, NVIDIA is publishing the complete development resources from data to deployment.

Alpamayo’s biggest breakthrough is that it is a “reasoning” autonomous-driving model. Traditional autonomous systems follow a “perception-planning-control” pipeline: brake at a red light, slow down for a pedestrian, follow preset rules. Alpamayo introduces reasoning capability, understanding causal relationships in complex scenes, predicting the intentions of other vehicles and pedestrians, and even handling multi-step decision-making.

For example, at an intersection, it doesn’t just recognize “a car ahead,” but can reason “that car might turn left, so I should wait for it to pass first.” This ability upgrades autonomous driving from “rule-based driving” to “thinking like a human.”
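The contrast between the two approaches can be sketched in a few lines of toy code. Neither function has anything to do with the actual Alpamayo model; they only illustrate the difference between reacting to what is perceived and first predicting what another road user intends to do.

```python
from dataclasses import dataclass

@dataclass
class Vehicle:
    distance_m: float        # distance to the other car
    turn_signal: str         # "left", "right", or "none"
    decelerating: bool

def rule_based_policy(other: Vehicle) -> str:
    """Traditional pipeline: a fixed perception -> rule -> action mapping."""
    return "brake" if other.distance_m < 15 else "proceed"

def reasoning_policy(other: Vehicle) -> str:
    """Toy 'reasoning' policy: first infer the other driver's intention,
    then choose an action based on that predicted intent."""
    likely_turning_left = other.turn_signal == "left" or other.decelerating
    if likely_turning_left and other.distance_m < 40:
        return "yield"       # wait for the oncoming car to complete its turn
    return "proceed"

oncoming = Vehicle(distance_m=30, turn_signal="left", decelerating=True)
print("rule-based:", rule_based_policy(oncoming))   # proceed (no fixed rule triggered)
print("reasoning: ", reasoning_policy(oncoming))    # yield (predicted left turn)
```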

Huang announced that NVIDIA’s DRIVE system has entered mass production, with the first application being the all-new Mercedes-Benz CLA, scheduled to hit US roads in 2026. The vehicle will feature L2++ level autonomous driving, using a hybrid architecture of end-to-end AI models plus a traditional pipeline.

In robotics, substantial progress has also been made.

Huang stated that leading global robotics companies, including Boston Dynamics, Franka Robotics, LEM Surgical, LG Electronics, Neura Robotics, and XRlabs, are developing products based on the NVIDIA Isaac platform and the GR00T foundation model, covering industrial robots, surgical robots, humanoid robots, and consumer robots.

On stage, Huang was surrounded by robots of various forms and purposes arranged on a tiered set: humanoid robots, bipedal and wheeled service robots, industrial arms, construction machinery, drones, and surgical assistants, forming a picture of the robot ecosystem.

From physical AI applications to the Rubin AI computing platform, the inference context memory storage platform, and the open-source AI “full family” of models, the moves NVIDIA showcased at CES add up to a narrative about AI infrastructure for the reasoning era. As Huang repeatedly emphasized, when physical AI needs to think continuously, run for long periods, and integrate with the real world, the question is no longer just whether the compute is sufficient, but who can truly build the entire system.

At CES 2026, NVIDIA has already provided an answer.
