| id | title | status | source_sections | related_topics | key_equations | key_terms | images | examples | open_questions |
|---|---|---|---|---|---|---|---|---|---|
| gb10-superchip | NVIDIA GB10 Grace Blackwell Superchip | established | Web research: NVIDIA newsroom, WCCFTech, Phoronix, The Register, Arm | [memory-and-storage ai-frameworks ai-workloads connectivity physical-specs] | [flops-fp4 nvlink-c2c-bandwidth] | [gb10 grace-blackwell superchip cortex-x925 cortex-a725 blackwell-gpu tensor-core cuda-core nvlink-c2c soc] | [] | [] | [Exact clock speeds for CPU and GPU dies under sustained load; Detailed per-precision TFLOPS breakdown (FP4/FP8/FP16/FP32/FP64); Thermal throttling behavior and sustained vs. peak performance] |
NVIDIA GB10 Grace Blackwell Superchip
The GB10 is a system-on-a-chip (SoC) that combines an NVIDIA Grace CPU and an NVIDIA Blackwell GPU on a single package, connected via NVLink Chip-to-Chip (NVLink-C2C) interconnect. It is the core silicon in the Dell Pro Max GB10 and the NVIDIA DGX Spark.
1. Architecture Overview
The GB10 is composed of two distinct compute dies:
- CPU tile: Designed by MediaTek, based on the ARMv9.2 architecture
- GPU tile: Designed by NVIDIA, based on the Blackwell architecture
These are stitched together using TSMC's 2.5D advanced packaging technology and connected via NVIDIA's proprietary NVLink-C2C interconnect, which provides 600 GB/s of bidirectional bandwidth between the CPU and GPU dies.
2. CPU: Grace (ARM)
The Grace CPU portion contains 20 cores in a big.LITTLE-style configuration:
- 10x ARM Cortex-X925 — high-performance cores
- 10x ARM Cortex-A725 — efficiency cores
Architecture: ARMv9.2
This is the same Grace CPU lineage used in NVIDIA's data center Grace Hopper and Grace Blackwell products, adapted for desktop power envelopes.
3. GPU: Blackwell
The Blackwell GPU portion features:
- 6,144 CUDA cores (comparable to the RTX 5070 core count)
- 5th-generation Tensor Cores — optimized for AI inference and training
- Peak performance: 1 PFLOP (1,000 TFLOPS) at FP4 precision
The Tensor Cores are the key differentiator for AI workloads, providing hardware acceleration for mixed-precision matrix operations used in deep learning.
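The quoted FP4 peak can be put in perspective with a back-of-envelope calculation. The sketch below computes the ideal time for a single large matrix multiply at the 1 PFLOP figure from the spec; the matrix dimensions are illustrative assumptions, not GB10 benchmarks, and real kernels will not reach the theoretical peak.

```python
# Back-of-envelope check against the GB10's quoted FP4 peak.
peak_fp4_flops = 1e15  # 1 PFLOP = 1,000 TFLOPS = 1e15 FLOP/s (from the spec)

# A GEMM of (M x K) @ (K x N) costs ~2*M*K*N floating-point operations.
# These dimensions are an illustrative assumption (a transformer-scale layer).
M, K, N = 4096, 4096, 4096
gemm_flops = 2 * M * K * N

ideal_seconds = gemm_flops / peak_fp4_flops
print(f"GEMM cost: {gemm_flops:.3e} FLOPs")
print(f"Ideal time at FP4 peak: {ideal_seconds * 1e6:.1f} microseconds")
```

Sustained throughput depends on clocks, thermals, and memory bandwidth (all listed as open questions above), so this bounds performance from above rather than predicting it.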
4. NVLink-C2C Interconnect
The CPU and GPU communicate via NVLink Chip-to-Chip:
- Bidirectional bandwidth: 600 GB/s
- Enables unified coherent memory — both CPU and GPU see the same 128 GB LPDDR5X pool
- Eliminates the PCIe bottleneck found in traditional discrete GPU systems
This coherent memory architecture means there is no need to explicitly copy data between "host" and "device" memory, simplifying AI development workflows.
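The bandwidth advantage over a conventional discrete-GPU setup can be sketched numerically. In the example below, the 600 GB/s figure comes from the spec above; the PCIe 5.0 x16 baseline (~64 GB/s per direction) and the 70 GB payload are assumptions for illustration only.

```python
# Illustrative comparison: moving a large set of model weights across
# the CPU-GPU link. The PCIe baseline models a conventional discrete
# GPU, not the GB10 itself.
nvlink_c2c_gbps = 600.0  # GB/s, bidirectional (GB10 spec)
pcie5_x16_gbps = 64.0    # GB/s, approximate PCIe 5.0 x16 peak (assumption)

payload_gb = 70.0        # e.g. a 70 GB weight set (assumption)

t_nvlink = payload_gb / nvlink_c2c_gbps
t_pcie = payload_gb / pcie5_x16_gbps
print(f"NVLink-C2C: {t_nvlink:.3f} s, PCIe 5.0 x16: {t_pcie:.3f} s")
```

In practice the coherent memory model means such a bulk copy is often unnecessary in the first place — both processors address the same pool — so the comparison understates the workflow benefit.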
5. Power Envelope
- System TDP: ~140W (from related specifications)
- External PSU: 280W USB Type-C adapter (headroom for storage, networking, peripherals)
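The headroom implied by these two figures is simple arithmetic; the sketch below uses only the numbers quoted above, and how the margin divides between storage, networking, and peripherals is not specified.

```python
# Power-budget arithmetic from the quoted figures.
adapter_w = 280     # external USB Type-C PSU rating (from the spec)
system_tdp_w = 140  # approximate system TDP (from the spec)

headroom_w = adapter_w - system_tdp_w
print(f"Headroom for storage, networking, peripherals: ~{headroom_w} W")
```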
6. Key Relationships
- Provides compute for: ai-workloads, ai-frameworks
- Memory subsystem: memory-and-storage
- Housed in: physical-specs
- Connected externally via: connectivity
- Scales via: multi-unit-stacking