NVIDIA's unveiling of the Blackwell architecture marks a pivotal moment in the evolution of artificial intelligence (AI) and accelerated computing. Named in honor of mathematician David Harold Blackwell, this new generation of GPUs is engineered to meet the escalating demands of AI workloads, offering unprecedented performance, efficiency, and scalability.
(Image: NVIDIA Blackwell AI supercomputer)
At the heart of the Blackwell architecture is the Blackwell GPU, which packs 208 billion transistors. It reaches that count by joining two reticle-limited dies with a 10 terabytes per second (TB/s) chip-to-chip interconnect, so the pair operates as a single unified GPU. Manufactured on a custom-built TSMC 4NP process, a Blackwell GPU delivers up to 20 petaflops of FP4 compute. (GB200 is the name of the Grace Blackwell Superchip, which pairs two of these GPUs with a Grace CPU.)
A significant advancement within Blackwell is the second-generation Transformer Engine. It combines custom Tensor Core technology with NVIDIA's TensorRT-LLM and NeMo Framework software to accelerate both inference and training for large language models (LLMs) and Mixture-of-Experts (MoE) models. By supporting 4-bit floating point (FP4) in place of 8-bit formats, the engine halves the bits stored and moved per value, which roughly doubles the compute throughput and the model size that a given memory and bandwidth budget can support.
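To make the move from eight bits to four more concrete, here is a minimal sketch of block-scaled 4-bit quantization. It assumes an E2M1-style grid (one sign bit, two exponent bits, one mantissa bit) and a single shared scale per block; the function names are hypothetical, and this is an illustration of the general idea rather than how NVIDIA's Transformer Engine is actually implemented.

```python
# Illustrative only: not NVIDIA's Transformer Engine code.
# Magnitudes representable by an E2M1 4-bit float (assumed format).
FP4_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_block_fp4(values):
    """Return (scale, quantized), with every quantized value on the FP4 grid."""
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return 1.0, [0.0 for _ in values]
    # Choose the scale so the largest magnitude in the block maps to 6.0.
    scale = max_abs / FP4_MAGNITUDES[-1]
    quantized = []
    for v in values:
        target = abs(v) / scale
        # Round to the nearest representable magnitude.
        nearest = min(FP4_MAGNITUDES, key=lambda m: abs(m - target))
        quantized.append(nearest if v >= 0 else -nearest)
    return scale, quantized


def dequantize_block(scale, quantized):
    """Map grid values back to the original range."""
    return [scale * q for q in quantized]


if __name__ == "__main__":
    scale, q = quantize_block_fp4([1.0, 6.0, 2.4, -0.3])
    print("scale:", scale)
    print("on grid:", q)
    print("restored:", dequantize_block(scale, q))
```

Each value now needs only four bits plus a share of the per-block scale, at the cost of rounding error for values that fall between grid points.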
NVIDIA's headline efficiency and performance claims apply to the GB200 NVL72, a rack-scale system built from GB200 superchips. According to NVIDIA, it delivers up to 30 times the LLM inference performance of the same number of H100 GPUs, while reducing cost and energy consumption by up to 25 times. NVIDIA attributes the gains mainly to FP4 arithmetic in the Transformer Engine and to faster GPU-to-GPU communication. The reductions align with the growing demand for sustainable and cost-effective computing in AI.
To provide fast communication among GPUs within a server cluster, Blackwell introduces the fifth-generation NVIDIA NVLink interconnect, which can scale up to 576 GPUs to accelerate trillion-parameter AI models. The NVIDIA NVLink Switch Chip enables 130 TB/s of GPU bandwidth in a 72-GPU NVLink domain (NVL72) and delivers four times the bandwidth efficiency with NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) FP8 support. This interconnect is crucial for managing the growing complexity and scale of AI models.
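As a rough back-of-the-envelope check on those figures, the sketch below splits the quoted 130 TB/s evenly across the 72 GPUs of an NVL72 domain and derives an idealized transfer time. The even split, the function names, and the 100 GB payload are assumptions for illustration; real transfers also pay latency and protocol overhead.

```python
# Figures quoted in the text; everything derived from them is an estimate.
NVL72_AGGREGATE_TB_PER_S = 130.0
NVL72_GPU_COUNT = 72


def per_gpu_bandwidth_tb_per_s(aggregate_tb_per_s, gpu_count):
    """Even split of aggregate NVLink bandwidth across the GPUs in a domain."""
    if gpu_count <= 0:
        raise ValueError("gpu_count must be positive")
    return aggregate_tb_per_s / gpu_count


def ideal_transfer_seconds(payload_gb, bandwidth_tb_per_s):
    """Lower bound on transfer time, ignoring latency and protocol overhead."""
    if bandwidth_tb_per_s <= 0:
        raise ValueError("bandwidth_tb_per_s must be positive")
    return payload_gb / (bandwidth_tb_per_s * 1000.0)  # 1 TB = 1000 GB


if __name__ == "__main__":
    share = per_gpu_bandwidth_tb_per_s(NVL72_AGGREGATE_TB_PER_S, NVL72_GPU_COUNT)
    print(f"per-GPU share: {share:.2f} TB/s")
    # Hypothetical 100 GB payload, chosen only for illustration.
    millis = ideal_transfer_seconds(100, share) * 1000
    print(f"ideal time to move 100 GB: {millis:.1f} ms")
```

The per-GPU share works out to roughly 1.8 TB/s, which is consistent with the per-GPU bandwidth NVIDIA quotes for fifth-generation NVLink.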
Recognizing the importance of data security in AI applications, Blackwell incorporates NVIDIA Confidential Computing, which uses hardware-based security to protect sensitive data and AI models from unauthorized access. Blackwell is the first GPU in the industry to offer Trusted Execution Environment Input/Output (TEE-I/O) capabilities, working with TEE-I/O capable hosts and providing inline protection over NVIDIA NVLink. Enterprises can therefore secure even the largest models while keeping performance high, protecting AI intellectual property and enabling confidential AI training, inference, and federated learning.
NVIDIA's Blackwell architecture is set to power a range of new products. The GeForce RTX 50 Series Desktop and Laptop GPUs, designed for gamers, creators, and developers, are among the first consumer products to feature this architecture. The flagship RTX 5090, with 92 billion transistors and more than 3,352 trillion AI operations per second (TOPS) of computing power, will be available on January 30, 2025, for $1,999. The RTX 5070 is slated for launch in February 2025 at $549.
For developers and AI enthusiasts, NVIDIA introduced Project DIGITS, a $3,000 desktop computer powered by the GB10 Grace Blackwell Superchip. Set to launch in May 2025, the machine lets users locally run AI models with up to 200 billion parameters, a scale that previously required expensive cloud infrastructure.
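The 200-billion-parameter figure is easier to judge with a little arithmetic on weight storage. The sketch below estimates the memory needed for a model's weights alone at several precisions. The text does not say which precision the 200 billion figure assumes, so the bit widths here are assumptions, and the helper name is hypothetical; activations, caches, and runtime overhead are deliberately left out.

```python
def weight_footprint_gb(num_params, bits_per_param):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    if num_params < 0 or bits_per_param <= 0:
        raise ValueError("num_params must be >= 0 and bits_per_param > 0")
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    params = 200_000_000_000  # the 200 billion parameters quoted in the text
    for bits in (16, 8, 4):  # assumed precisions, for comparison only
        gb = weight_footprint_gb(params, bits)
        print(f"{bits:>2}-bit weights: {gb:.0f} GB")
```

Halving the bits per parameter halves the footprint, which is why low-precision formats matter so much for running large models on a single desktop machine.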
Blackwell's impact is expected to extend across industries, and NVIDIA has announced partnerships with major corporations to integrate Blackwell-powered solutions into their operations. For instance, Japanese automaker Toyota plans to build its next-generation autonomous vehicles using NVIDIA's DriveOS operating system, powered by Blackwell technology. Similarly, Aurora, a company specializing in autonomous freight trucks, intends to launch its driverless trucks commercially with NVIDIA's hardware in April 2025. These collaborations underscore Blackwell's versatility in sectors ranging from automotive to logistics: by supplying the computational power that advanced AI applications need, it lets companies build more sophisticated and efficient systems and speeds the adoption of AI.
NVIDIA's Blackwell architecture represents a significant milestone in the field of AI and accelerated computing. With its groundbreaking performance, enhanced efficiency, robust security features, and scalable design, Blackwell is set to redefine the capabilities of AI hardware. As industries continue to integrate AI into their operations, NVIDIA's Blackwell architecture provides the foundation upon which the next generation of AI-driven innovations will be built.