E22: NVIDIA'S HUGE AI Announcements Will Change Everything

NVIDIA’s AI Infrastructure: Blackwell, Rubin, and Beyond – A Deep Dive

Key Concepts:

Blackwell: NVIDIA’s current generation data center architecture, focused on high performance and efficiency.
Rubin: The next-generation data center architecture, emphasizing extreme co-design of six specialized chips for optimal performance.
GB300 NVL72: A rack configuration built on Blackwell Ultra, containing 72 GPUs.
Kyber: A future rack architecture based on Rubin Ultra, significantly increasing compute density.
Extreme Co-Design: The simultaneous design and manufacturing of multiple chips optimized for specific data center requirements.
Mixture of Experts (MoE): AI model architecture driving increased compute demand due to complex reasoning and token generation.
North-South Traffic: Data transfer between the compute rack and external storage.
East-West Traffic: Data transfer between multiple GPU racks.
DPU (Data Processing Unit): Handles data processing tasks like compression, encryption, and offloading from CPUs/GPUs.
NVLink: High-speed interconnect technology for communication between GPUs.
Telemetry: System health and status monitoring data.
Goodput: The share of a system's time spent on productive work; higher goodput means more tokens generated.
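
Goodput is commonly expressed as a fraction of total time. A minimal sketch, assuming time is split into just two buckets (productive and lost); the function name and the sample figures are illustrative, not NVIDIA data:

```python
def goodput_fraction(productive_hours: float, lost_hours: float) -> float:
    """Fraction of total time spent on useful work (e.g. generating tokens)."""
    total = productive_hours + lost_hours
    if total <= 0:
        raise ValueError("total time must be positive")
    return productive_hours / total


# Illustrative numbers only: 22 productive hours, 2 hours lost to failures or restarts
example = goodput_fraction(22.0, 2.0)
print(f"goodput: {example:.1%}")
```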

1. Introduction & The Shift Beyond GPUs

The interview focuses on NVIDIA’s evolution beyond being solely a GPU manufacturer, highlighting its comprehensive approach to AI infrastructure. Joe Delair, Product Lead of AI Infrastructure at NVIDIA, walks through the company's current and upcoming technologies, including Blackwell, Rubin, and the future Kyber architecture. The discussion stresses why investors should understand NVIDIA’s ecosystem, particularly ahead of the upcoming GTC conference. A giveaway is also announced: viewers who attend any free online GTC session through the provided link can enter to win an NVIDIA RTX 5090 graphics card by submitting a screenshot as proof of attendance.

2. Rubin: Extreme Co-Design for Next-Gen AI

Jensen Huang’s keynote at a recent event detailed NVIDIA’s co-design of six distinct chips for the Rubin generation. This “extreme co-design” is the fundamental principle behind Rubin: NVIDIA analyzes data center requirements and optimizes the chips together for performance, energy efficiency, and cost. The primary driver of these requirements is the increasing complexity of AI models, specifically Mixture of Experts (MoE) models, which generate significantly more tokens because of their reasoning capabilities and growing model sizes. Rubin is designed to address this escalating compute demand.

3. Blackwell vs. Rubin: Performance Gains

For inference workloads, Rubin is projected to deliver up to 10x better performance than Blackwell. This improvement is not simply a result of Moore’s Law (roughly 70% more transistors between generations) but stems from the holistic co-design approach. The 10x gain is measured at rack scale, that is, across the entire system rather than for a single chip.
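
To see why transistor scaling alone cannot account for the claimed gain, the two figures quoted above can be divided out. This is a rough sketch: treating the contributions as multiplicative is a simplifying assumption, and the variable names are illustrative.

```python
rack_speedup = 10.0        # "up to 10x" inference gain, measured at rack scale
transistor_scaling = 1.7   # roughly 70% more transistors between generations

# Assumption: contributions multiply, so the remainder is attributed to co-design
codesign_factor = rack_speedup / transistor_scaling
print(f"implied gain beyond transistor count: {codesign_factor:.1f}x")
```

Under that assumption, most of the projected improvement would have to come from system-level design rather than from the larger transistor budget.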

4. The Blackwell Rack Architecture (GB300 NVL72)

A Blackwell Ultra compute tray holds two “superchips,” each containing two Blackwell Ultra GPUs and one Grace CPU, for a total of four GPUs and two CPUs per tray. The tray also includes ConnectX-8 SuperNICs for high-speed networking. The rack uses a hybrid cooling system: liquid cooling for the superchips and air cooling for the other components.
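
The tray composition above implies the per-rack totals. A minimal sketch of that arithmetic, using the counts given in this section and the 72-GPU rack size from the key concepts; the number of trays per rack is derived here rather than stated in the interview summary, and the constant names are illustrative:

```python
GPUS_PER_SUPERCHIP = 2
CPUS_PER_SUPERCHIP = 1
SUPERCHIPS_PER_TRAY = 2
GPUS_PER_RACK = 72  # rack size quoted in the key concepts

gpus_per_tray = GPUS_PER_SUPERCHIP * SUPERCHIPS_PER_TRAY
cpus_per_tray = CPUS_PER_SUPERCHIP * SUPERCHIPS_PER_TRAY

# Derived, not quoted: how many such trays are needed to reach 72 GPUs
trays_per_rack, leftover_gpus = divmod(GPUS_PER_RACK, gpus_per_tray)
cpus_per_rack = trays_per_rack * cpus_per_tray

print(f"{gpus_per_tray} GPUs and {cpus_per_tray} CPUs per tray")
print(f"{trays_per_rack} compute trays and {cpus_per_rack} CPUs per rack")
```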

North-South Traffic: Managed by a BlueField DPU, handling data transfer between the compute rack and storage.
East-West Traffic: Facilitated by ConnectX-8 SuperNICs, connecting multiple racks.
Network Traffic Distinction: North-South traffic flows between a rack and external storage, while East-West traffic flows between GPU racks.
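
The two traffic classes can be expressed as a small lookup. This is a sketch only: the endpoint labels (`gpu_rack`, `storage`) and the `classify` helper are hypothetical names for illustration, following the definitions in the key concepts list (North-South between a compute rack and external storage, East-West between GPU racks).

```python
from enum import Enum


class Traffic(Enum):
    NORTH_SOUTH = "north-south"  # compute rack <-> external storage (DPU path)
    EAST_WEST = "east-west"      # GPU rack <-> GPU rack (SuperNIC path)


def classify(src: str, dst: str) -> Traffic:
    """Classify a flow by endpoint kind; rack-to-rack flows are assumed to span distinct racks."""
    endpoints = {src, dst}
    if endpoints == {"gpu_rack"}:
        return Traffic.EAST_WEST
    if endpoints == {"gpu_rack", "storage"}:
        return Traffic.NORTH_SOUTH
    raise ValueError(f"unrecognized endpoints: {src!r}, {dst!r}")


print(classify("gpu_rack", "storage").value)
print(classify("gpu_rack", "gpu_rack").value)
```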

5. The Role of CPUs, DPUs, and ConnectX

NVIDIA’s Grace CPU isn’t intended to replace GPUs but to complement them. CPUs handle tasks GPUs are less suited for, such as running applications generated by AI models (e.g., Python applications) and database analytics. The CPU and GPU work in tandem, leveraging each chip’s strengths.

The BlueField DPU offloads tasks such as compression and encryption from the CPU and GPU, accelerating data access and improving overall performance. ConnectX-8 SuperNICs manage East-West traffic, enabling high-speed communication between GPU racks.

6. The Six Chips of Rubin: A Detailed Breakdown

The six chips co-designed for Rubin are:

GPU: The core compute engine.
CPU: Handles management tasks and CPU-optimized workloads.
DPU (BlueField DPU): Manages North-South traffic, compression, and encryption.
ConnectX-8: Facilitates East-West connectivity with inline encryption.
NVLink Switch (NVLink 5): Provides high-speed communication (1.8 terabytes per second) between GPUs within a rack, and performs collective operations such as all-reduce to accelerate training.
Spectrum-X: Handles high-speed networking, with co-packaged photonics.

7. Compute Fabric & The NVLink Switch

The interconnected network of GPUs, CPUs, and other components is referred to as a “compute fabric.” The NVLink switch is central to this fabric, enabling all-to-all connectivity between the 72 GPUs in a rack at 1.8 terabytes per second per GPU. The switch also performs some computation itself, such as collective operations, which further accelerates processing.
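
Two ideas in this section lend themselves to a short illustration: the aggregate bandwidth of the fabric and what an all-reduce collective computes. The sketch below assumes the quoted 1.8 TB/s figure applies per GPU, and `all_reduce_sum` is a hypothetical pure-Python helper that shows only the result of the operation, not how the switch hardware carries it out.

```python
GPUS_PER_RACK = 72
NVLINK_TBPS_PER_GPU = 1.8  # assumption: the quoted figure is per GPU

aggregate_tbps = GPUS_PER_RACK * NVLINK_TBPS_PER_GPU
print(f"aggregate fabric bandwidth: about {aggregate_tbps:.1f} TB/s")


def all_reduce_sum(per_gpu_values):
    """Return one copy of the elementwise sum for every participant."""
    total = [sum(column) for column in zip(*per_gpu_values)]
    return [list(total) for _ in per_gpu_values]


# Three participants, each holding a two-element gradient (toy values)
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
result = all_reduce_sum(grads)
print(result[0])  # every participant ends up with the same summed vector
```

Performing the reduction inside the switch means the GPUs do not have to exchange partial sums among themselves, which is the acceleration this section describes.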

8. Telemetry & Rack Management

A dedicated top-of-rack switch handles telemetry data – system health, status, and diagnostics – providing essential monitoring and management capabilities. This is separate from the high-bandwidth data processing network.

9. Rubin Ultra & The Kyber Architecture: Scaling to New Heights

Rubin Ultra, slated for 2026-2027, introduces a new rack architecture (Kyber) with significantly increased compute density. Kyber uses 18 compute blades per canister and four canisters per rack, for a total of 288 GPUs, a 4x increase over the 72 GPUs in a Blackwell or Rubin rack. Reaching this density requires a blade-based architecture in place of the traditional tray-based design.
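
The density figures above can be checked with a few lines of arithmetic. This sketch uses only the numbers quoted in this section; the GPUs-per-blade value is derived here, not stated in the interview summary, and the constant names are illustrative.

```python
BLADES_PER_CANISTER = 18
CANISTERS_PER_RACK = 4
KYBER_GPUS_PER_RACK = 288
CURRENT_GPUS_PER_RACK = 72

blades_per_rack = BLADES_PER_CANISTER * CANISTERS_PER_RACK

# Derived, not quoted: GPUs each blade must carry to reach 288 per rack
gpus_per_blade, remainder = divmod(KYBER_GPUS_PER_RACK, blades_per_rack)
density_increase = KYBER_GPUS_PER_RACK / CURRENT_GPUS_PER_RACK

print(f"{blades_per_rack} blades per rack, {gpus_per_blade} GPUs per blade")
print(f"density increase: {density_increase:.0f}x")
```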

10. Photonics Co-Packaging & Future Efficiency Gains

A key innovation in Rubin is the integration of optics directly into the chip package (photonics co-packaging) through Spectrum-X. This eliminates the need for separate fiber optic transceivers, improving both energy efficiency and reliability (an estimated 10x better reliability). The same technology is also being applied to InfiniBand connectivity.

11. The Future of Co-Design & Innovation

NVIDIA plans to continue its extreme co-design approach with each new generation of chips. Not every generation will involve co-designing all six chips, but flagship releases like Rubin will maintain this strategy. The core driver of performance gains is the interaction of these co-designed chips, which delivers more than traditional Moore’s Law scaling alone.

Conclusion:

NVIDIA is positioning itself as a provider of complete AI infrastructure, not just a GPU manufacturer. The Rubin architecture, with its emphasis on extreme co-design, represents a significant step forward in performance and efficiency, and the future Kyber architecture promises even greater compute density and scalability. Understanding these advancements matters for investors following the rapidly evolving AI landscape. The company’s commitment to innovation and holistic system design will likely continue to drive its leadership in AI hardware and software.
