From 1a2e908a149ec458c9bfca627801a34868c91d8a Mon Sep 17 00:00:00 2001 From: Joe DiPrima Date: Sat, 14 Feb 2026 13:37:25 -0600 Subject: [PATCH] Initial knowledge base for Dell Pro Max GB10 expert agent Bootstrap expert agent context system with 12 topic files, glossary, equations/bounds reference, open questions tracker, worked example, and CLAUDE.md agent operating manual. Co-Authored-By: Claude Opus 4.6 --- .claude/settings.local.json | 18 ++ CLAUDE.md | 167 ++++++++++++++ context/ai-frameworks.md | 76 +++++++ context/ai-workloads.md | 78 +++++++ context/connectivity.md | 61 +++++ context/dgx-os-software.md | 81 +++++++ context/equations-and-bounds.md | 113 +++++++++ context/gb10-superchip.md | 72 ++++++ context/memory-and-storage.md | 50 ++++ context/multi-unit-stacking.md | 52 +++++ context/open-questions.md | 125 ++++++++++ context/physical-specs.md | 62 +++++ context/setup-and-config.md | 83 +++++++ context/skus-and-pricing.md | 62 +++++ examples/llm-memory-estimation.md | 48 ++++ phases/phase-01-initial-build.md | 48 ++++ reference/glossary.yaml | 366 ++++++++++++++++++++++++++++++ 17 files changed, 1562 insertions(+) create mode 100644 .claude/settings.local.json create mode 100644 CLAUDE.md create mode 100644 context/ai-frameworks.md create mode 100644 context/ai-workloads.md create mode 100644 context/connectivity.md create mode 100644 context/dgx-os-software.md create mode 100644 context/equations-and-bounds.md create mode 100644 context/gb10-superchip.md create mode 100644 context/memory-and-storage.md create mode 100644 context/multi-unit-stacking.md create mode 100644 context/open-questions.md create mode 100644 context/physical-specs.md create mode 100644 context/setup-and-config.md create mode 100644 context/skus-and-pricing.md create mode 100644 examples/llm-memory-estimation.md create mode 100644 phases/phase-01-initial-build.md create mode 100644 reference/glossary.yaml diff --git a/.claude/settings.local.json b/.claude/settings.local.json new 
file mode 100644 index 0000000..4adefbf --- /dev/null +++ b/.claude/settings.local.json @@ -0,0 +1,18 @@ +{ + "permissions": { + "allow": [ + "Bash", + "Edit", + "Read", + "Write", + "Glob", + "Grep", + "WebFetch", + "WebSearch", + "Skill(constraint-lookup)", + "Skill(phase-analysis)" + ], + "deny": [], + "ask": [] + } +} diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..a640c2b --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,167 @@ +# Dell Pro Max GB10 - Expert Knowledge Base + +**Project:** Domain expert agent for the Dell Pro Max with NVIDIA GB10 Grace Blackwell desktop AI system +**Format:** Linked context files (Markdown + YAML) with cross-references +**Status:** Active research + +## YOU ARE THE EXPERT AGENT + +**You (Claude) are the Dell Pro Max GB10 expert.** The `context/` files, `reference/glossary.yaml`, +`examples/`, and source materials are YOUR knowledge base. They exist so you can give accurate, +deeply-sourced answers to technical questions about the Dell Pro Max GB10 hardware, software, +configuration, AI development workflows, and troubleshooting. + +**ALWAYS consult the context system before answering any Dell Pro Max GB10 question or proposing +new ideas.** Do not rely on your training data alone — the context files contain curated, +cross-validated data that is more precise and more specific than general knowledge. + +--- + +## How to Answer a Question + +1. **Identify the topic(s).** Use the Quick Topic Lookup table (below) to determine + which context file(s) are relevant. Most questions touch 1-3 topics. + +2. **Read the relevant context file(s).** Each file in `context/` is a self-contained + deep dive on one topic. Read the full file — don't guess from the filename. + +3. **Follow cross-references.** Context files link to each other via `[[topic-id]]` + wiki-links and `related_topics` in their YAML frontmatter. If a question spans + topics, follow these links. + +4. 
**Check equations-and-bounds.md for numbers.** If the question involves a number, + formula, or physical bound, check here first. + +5. **Check glossary.yaml for definitions.** Use this when the user asks "what is X?" + or when you need to verify a term's meaning. + +6. **Check open-questions.md for known unknowns.** If the question touches something + uncertain, this file catalogs what is known vs. unknown. + +7. **Cite your sources.** Reference the specific context file and section. If data + came from external literature, include the citation. + +--- + +## Quick Topic Lookup + +| User asks about... | Read this file | +|---------------------------------------------------|-----------------------------------------| +| GB10 chip, Grace Blackwell, SoC, CPU, GPU cores | `context/gb10-superchip.md` | +| Memory, LPDDR5X, unified memory, bandwidth | `context/memory-and-storage.md` | +| SSD, NVMe, storage options, 2TB, 4TB | `context/memory-and-storage.md` | +| Ports, USB-C, HDMI, ethernet, QSFP, connectivity | `context/connectivity.md` | +| Network, 10GbE, ConnectX-7, SmartNIC, Wi-Fi 7 | `context/connectivity.md` | +| DGX OS, Ubuntu, Linux, OS setup, drivers | `context/dgx-os-software.md` | +| CUDA, PyTorch, NeMo, RAPIDS, AI frameworks | `context/ai-frameworks.md` | +| LLM, model inference, Llama, 200B parameters | `context/ai-workloads.md` | +| Stacking, multi-unit, ConnectX-7, 400B models | `context/multi-unit-stacking.md` | +| Physical size, dimensions, weight, form factor | `context/physical-specs.md` | +| Power, 280W adapter, TDP, thermals | `context/physical-specs.md` | +| Price, SKUs, configurations, purchasing | `context/skus-and-pricing.md` | +| Setup, first boot, initial config, wizard | `context/setup-and-config.md` | +| Troubleshooting, reinstall OS, recovery | `context/setup-and-config.md` | +| Formulas, bounds, constants, performance numbers | `context/equations-and-bounds.md` | +| What we don't know, gaps, unknowns | `context/open-questions.md` | +| Term 
definitions, units, acronyms | `reference/glossary.yaml` | +| Worked calculations, example workflows | `examples/*.md` | + +--- + +## How to Formulate New Ideas + +When the user asks you to reason about something novel: + +1. **Ground it in existing data.** Read relevant context files first. +2. **Check the bounds.** Verify reasoning doesn't violate known constraints + (e.g., memory limits, TFLOPS ceilings, power envelope). +3. **Cross-validate.** Multiple sources often cover the same quantity — use them as + cross-checks. +4. **Flag uncertainty honestly.** If reasoning depends on uncertain parameters, say so. +5. **Preserve new insights.** If reasoning produces a genuinely new finding, offer to + add it to the appropriate context file so it persists for future sessions. + +--- + +## Conventions (CRITICAL) + +- **Architecture is ARM, not x86.** The GB10 uses ARMv9.2 cores. Never assume x86 compatibility. +- **Memory is unified.** CPU and GPU share 128GB LPDDR5X — there is no separate VRAM pool. +- **OS is Linux only.** DGX OS 7 is based on Ubuntu 24.04. Windows is not supported. +- **Power is via USB-C.** The 280W adapter connects over USB Type-C, not a barrel jack or ATX PSU. +- **Units:** Use metric (mm, kg) for physical specs. Use GB and TB for memory/storage. +- **Model names:** "Dell Pro Max GB10" or "Dell Pro Max with GB10" — this is the Dell-branded product. "DGX Spark" is NVIDIA's own-brand equivalent using the same GB10 superchip. +- **TFLOPS figures:** 1 PFLOP (1,000 TFLOPS) is at FP4 precision. Always state the precision when quoting performance.
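The unit and precision conventions above can be encoded as small helpers. A minimal sketch — the function names are illustrative, not part of the knowledge base:

```python
def gbps_to_gb_per_s(gbps: float) -> float:
    """Convert a network rate in Gbps to GB/s (divide by 8 bits per byte)."""
    return gbps / 8.0

def format_perf(tflops: float, precision: str) -> str:
    """Format a performance figure; the precision must always be stated."""
    if not precision:
        raise ValueError("TFLOPS figures must state precision (e.g. 'FP4')")
    return f"{tflops:g} TFLOPS @ {precision}"

# The headline figure, stated with its precision:
#   format_perf(1000, "FP4")  -> "1000 TFLOPS @ FP4"
# A 200 Gbps QSFP port moves at most 25 GB/s:
#   gbps_to_gb_per_s(200)     -> 25.0
```

This mirrors the validation rule in `context/equations-and-bounds.md`: network bandwidth is quoted in Gbps, not GB/s, and unqualified "1 PFLOP" claims are rejected.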
+ +## DO NOT + +- Do not assume x86 software compatibility — this is an ARM system +- Do not confuse the Dell Pro Max GB10 with Dell's other Pro Max desktops (which use Intel/AMD) +- Do not state the 1 PFLOP figure without specifying FP4 precision +- Do not assume Windows can be installed +- Do not confuse "unified memory" with "system RAM + VRAM" — it is a single shared pool +- Do not assume standard PCIe GPU upgrades are possible — the GPU is part of the SoC +- Do not quote bandwidth numbers without specifying the interface (NVLink-C2C, memory bus, network) + +--- + +## Evidence Tiers + +| Tier | Label | Meaning | +|------|---------------|------------------------------------------------------------| +| T0 | Spec Sheet | Official Dell/NVIDIA published specifications | +| T1 | Documented | In official manuals, user guides, or support articles | +| T2 | Benchmarked | Independent review measurements (Phoronix, etc.) | +| T3 | Inferred | Grounded reasoning from known specs, not directly tested | +| T4 | Speculative | Consistent with architecture but no confirming data | + +- Tag individual claims, not sections. One paragraph can mix tiers. +- A derivation inherits the highest (least certain) tier of its inputs. +- Mention the tier to the user when presenting T3 or T4 claims. 
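The tier-inheritance rule above reduces to a one-line check. A minimal sketch — the `TIERS` ordering mirrors the table, and the function name is illustrative:

```python
# Evidence tiers ordered from most certain (T0) to least certain (T4).
TIERS = ["T0", "T1", "T2", "T3", "T4"]

def derived_tier(input_tiers):
    """A derivation inherits the highest (least certain) tier of its inputs."""
    return max(input_tiers, key=TIERS.index)

# Example: combining a T0 spec value with a T3 inferred value yields a T3
# result, which should be flagged to the user when presented.
```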
+ +--- + +## Key Concepts Quick Map + +``` +Dell Pro Max GB10 (product) + │ + ├── GB10 Superchip (SoC) ──── Grace CPU (ARM), Blackwell GPU, NVLink-C2C + │ │ + │ ├── Memory System ──── 128GB unified LPDDR5X, 273 GB/s + │ │ + │ └── AI Compute ──── 1 PFLOP FP4, Tensor Cores (5th gen), CUDA cores + │ │ + │ ├── AI Frameworks ──── PyTorch, NeMo, RAPIDS, CUDA + │ │ + │ └── AI Workloads ──── LLM inference (up to 200B), fine-tuning + │ + ├── Connectivity ──── USB-C, HDMI 2.1b, 10GbE, ConnectX-7 QSFP + │ │ + │ └── Multi-Unit Stacking ──── 2x units via ConnectX-7, up to 400B models + │ + ├── DGX OS 7 ──── Ubuntu 24.04, NVIDIA drivers, CUDA toolkit + │ + ├── Physical ──── 150x150x51mm, 1.31kg, 280W USB-C PSU + │ + └── SKUs ──── 2TB ($3,699) / 4TB ($3,999) +``` + +--- + +## How to Add Content + +- **New findings on existing topic:** Edit the relevant `context/*.md` file +- **New topic:** Create a new file in `context/`, add cross-references to related topics, and add a row to the Quick Topic Lookup table above +- **Split a topic:** When a context file exceeds ~500 lines, decompose into subtopics +- **New research phase:** Create a new file in `phases/` +- **New worked example:** Add to `examples/` +- **Archive, never delete:** Move superseded files to `_archive/` + +--- + +## History + +| Phase | Date | Summary | +|-------|------------|------------------------------------------------------| +| 1 | 2026-02-14 | Initial knowledge base created from web research | diff --git a/context/ai-frameworks.md b/context/ai-frameworks.md new file mode 100644 index 0000000..2397be4 --- /dev/null +++ b/context/ai-frameworks.md @@ -0,0 +1,76 @@ +--- +id: ai-frameworks +title: "AI Frameworks and Development Tools" +status: established +source_sections: "Web research: NVIDIA newsroom, Arm learning paths" +related_topics: [dgx-os-software, gb10-superchip, ai-workloads] +key_equations: [] +key_terms: [pytorch, nemo, rapids, cuda, ngc, jupyter, tensorrt, llama-cpp] +images: [] +examples: [] 
+open_questions: + - "TensorFlow support status on ARM GB10 (official vs. community)" + - "Full NGC catalog availability — which containers work on GB10?" + - "vLLM or other inference server support on ARM Blackwell" + - "JAX support status" +--- + +# AI Frameworks and Development Tools + +The Dell Pro Max GB10 supports a broad AI software ecosystem, pre-configured through DGX OS. + +## 1. Core Frameworks + +### PyTorch +- Primary deep learning framework +- ARM64-native builds available +- Full CUDA support on Blackwell GPU + +### NVIDIA NeMo +- Framework for fine-tuning and customizing large language models +- Supports supervised fine-tuning (SFT), RLHF, and other alignment techniques +- Optimized for NVIDIA hardware + +### NVIDIA RAPIDS +- GPU-accelerated data science libraries +- Includes cuDF (DataFrames), cuML (machine learning), cuGraph (graph analytics) +- Drop-in replacements for pandas, scikit-learn, and NetworkX + +## 2. Inference Tools + +### CUDA Toolkit +- Low-level GPU compute API +- Compiler (nvcc) for custom CUDA kernels +- Profiling and debugging tools + +### llama.cpp +- Quantized LLM inference engine +- ARM-optimized builds available for GB10 +- Supports GGUF model format +- Documented in [Arm Learning Path](https://learn.arm.com/learning-paths/laptops-and-desktops/dgx_spark_llamacpp/) + +### TensorRT (expected) +- NVIDIA's inference optimizer +- Blackwell architecture support expected + +## 3. Development Environment + +- **Jupyter Notebooks** — pre-installed for interactive development +- **Python** — system Python with AI/ML package ecosystem +- **NVIDIA NGC Catalog** — library of pre-trained models, containers, and SDKs +- **Containers** — Docker/container support for reproducible environments + +## 4. Software Compatibility Notes + +Since the GB10 is an ARM system: + +- All Python packages must have ARM64 wheels or be compilable from source +- Most popular ML libraries (PyTorch, NumPy, etc.) 
have ARM64 support +- Some niche packages may require building from source +- x86-only binary packages will not work + +## Key Relationships + +- Runs on: [[dgx-os-software]] +- Accelerated by: [[gb10-superchip]] +- Powers: [[ai-workloads]] diff --git a/context/ai-workloads.md b/context/ai-workloads.md new file mode 100644 index 0000000..2a694d7 --- /dev/null +++ b/context/ai-workloads.md @@ -0,0 +1,78 @@ +--- +id: ai-workloads +title: "AI Workloads and Model Capabilities" +status: established +source_sections: "Web research: NVIDIA newsroom, Dell product page, WCCFTech" +related_topics: [gb10-superchip, memory-and-storage, ai-frameworks, multi-unit-stacking] +key_equations: [model-memory-estimate] +key_terms: [llm, inference, fine-tuning, quantization, fp4, fp8, fp16, parameter-count] +images: [] +examples: [llm-memory-estimation.md] +open_questions: + - "Actual tokens/sec benchmarks for common models (Llama 3.3 70B, Mixtral, etc.)" + - "Maximum batch size for inference at various model sizes" + - "Fine-tuning performance — how long to SFT a 7B model on this hardware?" + - "Stable Diffusion / image generation performance" + - "Training from scratch — is it practical for any meaningful model size?" +--- + +# AI Workloads and Model Capabilities + +The Dell Pro Max GB10 is designed primarily for **local AI inference and fine-tuning**, bringing capabilities previously requiring cloud or data center hardware to a desktop form factor. + +## 1. Headline Capabilities + +- **Up to 200 billion parameter models** locally (with quantization) +- **1 PFLOP (1,000 TFLOPS)** at FP4 precision +- **Llama 3.3 70B** confirmed to run locally (single unit) +- **Up to 400B parameter models** with two-unit stacking (see [[multi-unit-stacking]]) + +## 2. Model Size vs. 
Memory + +With 128 GB of unified memory, the system can hold: + +| Precision | Bytes/Param | Max Params (approx) | Example Models | +|-----------|-------------|----------------------|---------------------------| +| FP4 | 0.5 B | ~200B+ | Large quantized models | +| FP8/INT8 | 1 B | ~100B | Llama 3.3 70B, Mixtral | +| FP16 | 2 B | ~50-55B | Medium models at full prec | +| FP32 | 4 B | ~25-28B | Small models, debugging | + +*Note: Actual usable capacity is less than 128 GB due to OS, KV cache, framework overhead, and activation memory. Estimates assume ~85-90% of memory available for model weights.* + +## 3. Primary Use Cases + +### Local LLM Inference +- Run large language models privately, no cloud dependency +- Interactive chat, code generation, document analysis +- Privacy-sensitive applications (medical, legal, financial) + +### Fine-Tuning +- Supervised fine-tuning (SFT) of models using NVIDIA NeMo +- LoRA/QLoRA for parameter-efficient fine-tuning of larger models +- Custom domain adaptation + +### AI Prototyping +- Rapid iteration on model architectures +- Dataset preprocessing with RAPIDS +- Experiment tracking and evaluation + +### Data Science +- GPU-accelerated analytics with RAPIDS +- Large-scale data processing +- Graph analytics + +## 4. 
Target Users + +- AI researchers and developers +- Privacy-conscious organizations +- Academic institutions +- AI prototyping teams +- Independent developers building AI applications + +## Key Relationships + +- Compute provided by: [[gb10-superchip]] +- Memory constraints: [[memory-and-storage]] +- Frameworks used: [[ai-frameworks]] +- Scaling beyond single unit: [[multi-unit-stacking]] diff --git a/context/connectivity.md b/context/connectivity.md new file mode 100644 index 0000000..679a134 --- /dev/null +++ b/context/connectivity.md @@ -0,0 +1,61 @@ +--- +id: connectivity +title: "Connectivity and Networking" +status: established +source_sections: "Web research: Dell product page, WCCFTech, Phoronix" +related_topics: [gb10-superchip, multi-unit-stacking, physical-specs, setup-and-config] +key_equations: [] +key_terms: [usb-c, hdmi, connectx-7, smartnic, qsfp, wifi-7, bluetooth, displayport-alt-mode, 10gbe] +images: [] +examples: [] +open_questions: + - "Which USB-C ports support DisplayPort Alt Mode (all or specific ones)?" + - "Maximum display resolution and refresh rate via HDMI 2.1b and DP Alt Mode" + - "Can the QSFP ports be used for general networking or only for multi-unit stacking?" +--- + +# Connectivity and Networking + +The Dell Pro Max GB10 provides extensive I/O for a system of its size, including high-speed networking for multi-unit configurations. + +## 1. USB Ports + +- **1x USB Type-C (20 Gbps)** — power input port (280W adapter connects here) +- **3x USB Type-C (20 Gbps)** — general purpose +- USB-C ports support **DisplayPort Alt Mode** for display output + +## 2. Display Output + +- **1x HDMI 2.1b** — dedicated display output +- **USB-C DisplayPort Alt Mode** — additional display(s) via USB-C + +## 3. 
Wired Networking + +- **1x 10 GbE Ethernet** (RJ45) — standard network connectivity +- **2x QSFP 200 Gbps ports** — via NVIDIA ConnectX-7 SmartNIC + - Each port supports 200 Gbps + - Primary use: [[multi-unit-stacking]] for scaling to 2-unit configurations + - Based on ConnectX-7 SmartNIC technology + +## 4. Wireless + +- **Wi-Fi 7** (IEEE 802.11be) +- **Bluetooth 5.4** + +## 5. Port Summary Table + +| Port | Count | Speed/Spec | Notes | +|--------------------|-------|----------------|--------------------------| +| USB-C (power) | 1 | 20 Gbps | 280W power delivery | +| USB-C (data) | 3 | 20 Gbps | DP Alt Mode supported | +| HDMI | 1 | 2.1b | Display output | +| RJ45 Ethernet | 1 | 10 GbE | Standard networking | +| QSFP | 2 | 200 Gbps each | ConnectX-7 SmartNIC | +| Wi-Fi | 1 | Wi-Fi 7 | 802.11be | +| Bluetooth | 1 | 5.4 | Integrated | + +## Key Relationships + +- Enables: [[multi-unit-stacking]] +- Setup guide: [[setup-and-config]] +- Physical port locations: [[physical-specs]] diff --git a/context/dgx-os-software.md b/context/dgx-os-software.md new file mode 100644 index 0000000..18cd1ad --- /dev/null +++ b/context/dgx-os-software.md @@ -0,0 +1,81 @@ +--- +id: dgx-os-software +title: "DGX OS and System Software" +status: established +source_sections: "Web research: NVIDIA DGX OS 7 User Guide, Dell support articles, Phoronix" +related_topics: [ai-frameworks, setup-and-config, gb10-superchip] +key_equations: [] +key_terms: [dgx-os, ubuntu, cuda, nvidia-driver, dgx-spark, kernel] +images: [] +examples: [] +open_questions: + - "Can a stock Ubuntu 24.04 ARM be installed instead of DGX OS?" + - "Full list of pre-installed NVIDIA packages and versions" + - "OTA update mechanism and cadence for DGX OS" + - "Does DGX OS include Docker/container runtime by default?" +--- + +# DGX OS and System Software + +The Dell Pro Max GB10 ships with NVIDIA DGX OS 7, a purpose-built Linux distribution for AI development. + +## 1. 
DGX OS 7 Overview + +- **Base:** Ubuntu 24.04 LTS (Noble Numbat) +- **Kernel:** Linux 6.8 +- **Architecture:** ARM64 (aarch64) +- **NVIDIA branding:** Also called "DGX OS for DGX Spark" + +DGX OS is not a separate distribution — it is Ubuntu 24.04 with NVIDIA's customizations layered on top: + +- Pre-configured NVIDIA GPU drivers +- CUDA toolkit and libraries +- Platform-specific optimizations and configurations +- Diagnostic and monitoring tools +- System-specific firmware management + +## 2. Pre-installed Software Stack + +The system ships ready to run AI workloads with: + +- **CUDA toolkit** — GPU compute API and compiler +- **NVIDIA drivers** — optimized for GB10 Blackwell GPU +- **Python** — system Python plus development environments +- **GCC** — ARM-native compiler toolchain +- **OpenJDK** — Java runtime +- **Jupyter notebooks** — interactive development environment + +For AI frameworks, see [[ai-frameworks]]. + +## 3. First Boot and Setup + +DGX OS uses a **setup wizard** on first boot that handles: + +- User account creation +- Network configuration +- System preferences +- Software configuration + +The process is designed for fast onboarding. See [[setup-and-config]] for detailed walkthrough. + +## 4. OS Reinstallation + +Dell provides a documented process for reinstalling DGX OS: + +- Boot to GRUB menu +- Select "Install DGX OS 7.2.1 for DGX Spark" from DGX Spark Installation Options +- Installation takes approximately **25-30 minutes** + +Source: [Dell Support KB Article](https://www.dell.com/support/kbdoc/en-us/000382042/how-to-reinstall-the-nvidia-dgx-operating-system-on-dell-pro-max-with-grace-blackwell-systems) + +## 5. Important Notes + +- **ARM-only:** All software must be ARM64/aarch64 compatible. x86 binaries will not run natively. +- **No Windows:** This system does not support Windows installation. +- **Package management:** Standard Ubuntu `apt` package manager, plus NVIDIA's own repositories. 
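The ARM-only constraint is worth guarding against in setup scripts before attempting to install or run binaries. A minimal sketch, assuming the GB10 reports `aarch64` under Linux (the helper names are illustrative):

```python
import platform

SUPPORTED_MACHINES = {"aarch64", "arm64"}

def is_supported_arch(machine: str) -> bool:
    """True if a binary built for `machine` can run natively on the GB10."""
    return machine.lower() in SUPPORTED_MACHINES

def check_host() -> None:
    """Fail early when a setup script is run against the wrong architecture."""
    m = platform.machine()
    if not is_supported_arch(m):
        raise SystemExit(f"{m} detected: x86 binaries will not run natively on this ARM system")
```

Real dependency resolution is handled by `apt` and `pip` (which select `aarch64`/`arm64` packages automatically); a check like this is only useful in custom provisioning scripts.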
+ +## Key Relationships + +- Runs on: [[gb10-superchip]] +- Provides platform for: [[ai-frameworks]] +- Setup process: [[setup-and-config]] diff --git a/context/equations-and-bounds.md b/context/equations-and-bounds.md new file mode 100644 index 0000000..471203a --- /dev/null +++ b/context/equations-and-bounds.md @@ -0,0 +1,113 @@ +--- +id: equations-and-bounds +title: "Equations and Bounds" +status: established +source_sections: "Derived from context files and official specifications" +related_topics: [gb10-superchip, memory-and-storage, ai-workloads, connectivity] +key_equations: [flops-fp4, memory-bandwidth, model-memory-estimate, nvlink-c2c-bandwidth, storage-throughput] +key_terms: [tflops, pflop, bandwidth, throughput, fp4, fp8, fp16, fp32] +images: [] +examples: [llm-memory-estimation.md] +open_questions: + - "Sustained vs. peak TFLOPS under real workloads" + - "Actual memory bandwidth under mixed CPU+GPU access patterns" +--- + +# Equations and Bounds + +Reference for all quantitative specifications, formulas, and validation ranges for the Dell Pro Max GB10. + +## 1. Compute Performance + +### Peak TFLOPS by Precision + +| Precision | Peak TFLOPS | Source | Notes | +|-----------|-------------|----------|------------------------------------| +| FP4 | 1,000 | T0 Spec | Headline figure, 1 PFLOP | +| FP8 | ~500 | T3 Infer | Typical 2:1 ratio from FP4 | +| FP16 | ~250 | T3 Infer | Typical 4:1 ratio from FP4 | +| FP32 | ~125 | T3 Infer | Typical 8:1 ratio from FP4 | + +*Note: FP8/FP16/FP32 values are inferred from typical Blackwell architecture ratios. Actual values not yet independently confirmed.* + +### GPU Cores +- **CUDA cores:** 6,144 (T0 Spec) +- **Tensor Cores:** 5th generation (count TBD) + +## 2. 
Memory + +### Bandwidth +- **Memory bandwidth:** 273 GB/s (T0 Spec, LPDDR5X at 9,400 MT/s) +- **NVLink-C2C bandwidth:** 600 GB/s bidirectional (T0 Spec, CPU-GPU interconnect) + +### Capacity +- **Total unified memory:** 128 GB LPDDR5X (T0 Spec) +- **Usable for models:** ~109-115 GB (T3 Infer, after OS/framework/KV cache overhead) + +## 3. Model Memory Estimation + +### Formula: Memory Required for Model Weights + +``` +Memory (GB) = Parameters (billions) × Bytes_per_parameter +``` + +| Precision | Bytes/Param | Formula | +|-----------|-------------|-----------------------------------| +| FP4 | 0.5 | Params_B × 0.5 | +| FP8/INT8 | 1.0 | Params_B × 1.0 | +| FP16 | 2.0 | Params_B × 2.0 | +| FP32 | 4.0 | Params_B × 4.0 | + +### Total Inference Memory (approximate) + +``` +Total Memory ≈ Model_Weights + KV_Cache + Activation_Memory + Framework_Overhead +``` + +Rule of thumb: budget **1.2-1.5x** the raw model weight size for total inference memory. + +### Maximum Model Sizes (single unit, 128 GB) + +| Precision | Max Params (raw) | Max Params (practical, ~110 GB usable) | +|-----------|-------------------|----------------------------------------| +| FP4 | 256B | ~200B | +| FP8/INT8 | 128B | ~100B | +| FP16 | 64B | ~55B | +| FP32 | 32B | ~27B | + +## 4. Networking Bounds + +| Interface | Bandwidth | Direction | +|---------------------|--------------------|-----------------| +| NVLink-C2C | 600 GB/s | Bidirectional | +| LPDDR5X memory | 273 GB/s | System memory | +| QSFP (per port) | 200 Gbps (25 GB/s) | Network | +| QSFP (total) | 400 Gbps (50 GB/s) | 2 ports combined| +| 10 GbE Ethernet | 10 Gbps (1.25 GB/s)| Network | +| USB-C (per port) | 20 Gbps (2.5 GB/s) | I/O | + +## 5. Power Bounds + +| Parameter | Value | +|---------------------|---------| +| PSU rating | 280W | +| System TDP | ~140W | +| Power delivery | USB-C PD| + +## 6. 
Physical Bounds + +| Parameter | Value | +|---------------|---------------| +| Volume | ~1.15 L | +| Weight | 1.31 kg | +| Footprint | 150 × 150 mm | +| Height | 51 mm | + +## 7. Validation Rules + +When checking calculations: +- Model size estimates should not exceed 128 GB (single) or 256 GB (stacked) +- TFLOPS claims must specify precision — reject unqualified "1 PFLOP" statements +- Memory bandwidth (273 GB/s) is the system memory bus, NOT the NVLink-C2C (600 GB/s) +- Network bandwidth (QSFP) is in Gbps, not GB/s — divide by 8 for bytes diff --git a/context/gb10-superchip.md b/context/gb10-superchip.md new file mode 100644 index 0000000..dbfa281 --- /dev/null +++ b/context/gb10-superchip.md @@ -0,0 +1,72 @@ +--- +id: gb10-superchip +title: "NVIDIA GB10 Grace Blackwell Superchip" +status: established +source_sections: "Web research: NVIDIA newsroom, WCCFTech, Phoronix, The Register, Arm" +related_topics: [memory-and-storage, ai-frameworks, ai-workloads, connectivity, physical-specs] +key_equations: [flops-fp4, nvlink-c2c-bandwidth] +key_terms: [gb10, grace-blackwell, superchip, cortex-x925, cortex-a725, blackwell-gpu, tensor-core, cuda-core, nvlink-c2c, soc] +images: [] +examples: [] +open_questions: + - "Exact clock speeds for CPU and GPU dies under sustained load" + - "Detailed per-precision TFLOPS breakdown (FP4/FP8/FP16/FP32/FP64)" + - "Thermal throttling behavior and sustained vs. peak performance" +--- + +# NVIDIA GB10 Grace Blackwell Superchip + +The GB10 is a system-on-a-chip (SoC) that combines an NVIDIA Grace CPU and an NVIDIA Blackwell GPU on a single package, connected via NVLink Chip-to-Chip (NVLink-C2C) interconnect. It is the core silicon in the Dell Pro Max GB10 and the NVIDIA DGX Spark. + +## 1. 
Architecture Overview + +The GB10 is composed of two distinct compute dies: + +- **CPU tile:** Designed by MediaTek, based on ARM architecture v9.2 +- **GPU tile:** Designed by NVIDIA, based on the Blackwell architecture + +These are stitched together using TSMC's 2.5D advanced packaging technology and connected via NVIDIA's proprietary NVLink-C2C interconnect, which provides **600 GB/s of bidirectional bandwidth** between the CPU and GPU dies. + +## 2. CPU: Grace (ARM) + +The Grace CPU portion contains **20 cores** in a big.LITTLE-style configuration: + +- **10x ARM Cortex-X925** — high-performance cores +- **10x ARM Cortex-A725** — efficiency cores + +Architecture: ARMv9.2 + +This is the same Grace CPU lineage used in NVIDIA's data center Grace Hopper and Grace Blackwell products, adapted for desktop power envelopes. + +## 3. GPU: Blackwell + +The Blackwell GPU portion features: + +- **6,144 CUDA cores** (comparable to the RTX 5070 core count) +- **5th-generation Tensor Cores** — optimized for AI inference and training +- Peak performance: **1 PFLOP (1,000 TFLOPS) at FP4 precision** + +The Tensor Cores are the key differentiator for AI workloads, providing hardware acceleration for mixed-precision matrix operations used in deep learning. + +## 4. NVLink-C2C Interconnect + +The CPU and GPU communicate via NVLink Chip-to-Chip: + +- **Bidirectional bandwidth:** 600 GB/s +- Enables **unified coherent memory** — both CPU and GPU see the same 128GB LPDDR5X pool +- Eliminates the PCIe bottleneck found in traditional discrete GPU systems + +This coherent memory architecture means there is no need to explicitly copy data between "host" and "device" memory, simplifying AI development workflows. + +## 5. 
Power Envelope + +- **System TDP:** ~140W (from related specifications) +- **External PSU:** 280W USB Type-C adapter (headroom for storage, networking, peripherals) + +## Key Relationships + +- Provides compute for: [[ai-workloads]], [[ai-frameworks]] +- Memory subsystem: [[memory-and-storage]] +- Housed in: [[physical-specs]] +- Connected externally via: [[connectivity]] +- Scales via: [[multi-unit-stacking]] diff --git a/context/memory-and-storage.md b/context/memory-and-storage.md new file mode 100644 index 0000000..c923e5a --- /dev/null +++ b/context/memory-and-storage.md @@ -0,0 +1,50 @@ +--- +id: memory-and-storage +title: "Memory and Storage" +status: established +source_sections: "Web research: Dell product page, WCCFTech, Phoronix" +related_topics: [gb10-superchip, ai-workloads, skus-and-pricing] +key_equations: [memory-bandwidth, storage-throughput] +key_terms: [lpddr5x, unified-memory, nvme, pcie-gen4, sed] +images: [] +examples: [] +open_questions: + - "Is the M.2 SSD user-replaceable or soldered?" + - "Exact sequential and random IOPS for the included NVMe drives" + - "Memory channel configuration (number of channels)" +--- + +# Memory and Storage + +The Dell Pro Max GB10 features a unified memory architecture and NVMe solid-state storage. + +## 1. System Memory + +- **Capacity:** 128 GB LPDDR5X +- **Speed:** Up to 9,400 MT/s (megatransfers per second) +- **Bandwidth:** 273 GB/s +- **Architecture:** Unified coherent memory shared between CPU and GPU via [[gb10-superchip|NVLink-C2C]] + +### Unified Memory Model + +Unlike traditional desktop systems with separate system RAM and GPU VRAM, the GB10's memory is a **single shared pool**. Both the Grace CPU and Blackwell GPU access the same 128 GB with full cache coherence. 
This means: + +- No PCIe transfer bottleneck between CPU and GPU memory +- AI models up to ~200B parameters can fit in memory (with quantization) +- Frameworks see the full 128 GB as available device memory + +The LPDDR5X is likely soldered to the SoC package (not user-upgradeable), consistent with the compact form factor. + +## 2. Storage + +- **Interface:** PCIe Gen 4 M.2 NVMe +- **Options:** 2 TB or 4 TB +- **SED-ready:** Self-Encrypting Drive support available on 4 TB option + +Storage configurations map to SKU pricing — see [[skus-and-pricing]]. + +## Key Relationships + +- Accessed by: [[gb10-superchip]] +- Determines model capacity: [[ai-workloads]] +- SKU differentiation: [[skus-and-pricing]] diff --git a/context/multi-unit-stacking.md b/context/multi-unit-stacking.md new file mode 100644 index 0000000..d08cf2b --- /dev/null +++ b/context/multi-unit-stacking.md @@ -0,0 +1,52 @@ +--- +id: multi-unit-stacking +title: "Multi-Unit Stacking" +status: provisional +source_sections: "Web research: WCCFTech, NVIDIA newsroom" +related_topics: [connectivity, gb10-superchip, ai-workloads, memory-and-storage] +key_equations: [] +key_terms: [connectx-7, smartnic, qsfp, stacking, nvlink] +images: [] +examples: [] +open_questions: + - "Exact cable/interconnect required between units (QSFP type, length limits)" + - "Software configuration steps for multi-unit mode" + - "Performance overhead of inter-unit communication vs. single unit" + - "Does stacking appear as a single device to frameworks or require explicit multi-node code?" + - "Can more than 2 units be stacked?" +--- + +# Multi-Unit Stacking + +Two Dell Pro Max GB10 units can be connected together to create a more powerful combined system, effectively doubling the available compute and memory. + +## 1. How It Works + +Each Dell Pro Max GB10 has **2x QSFP 200 Gbps ports** powered by the NVIDIA ConnectX-7 SmartNIC. 
These ports enable direct unit-to-unit connection: + +- **Combined memory:** 256 GB unified (128 GB per unit) +- **Combined compute:** 2 PFLOP FP4 (1 PFLOP per unit) +- **Interconnect bandwidth:** Up to 400 Gbps (2x 200 Gbps QSFP) + +## 2. Model Capacity + +| Configuration | Memory | Max Model Size (approx) | +|---------------|---------|-------------------------| +| Single unit | 128 GB | ~200B parameters (FP4) | +| Dual stacked | 256 GB | ~400B parameters (FP4) | + +This enables running models like **Llama 3.1 405B** (with quantization) that would not fit in a single unit's memory. + +## 3. Physical Configuration + +The compact form factor (150x150x51mm per unit) is designed to be **stackable** — two units can sit on top of each other on a desk, connected via short QSFP cables. + +## 4. Open Areas + +This feature is one of the less-documented aspects of the system. Key unknowns include the exact software configuration, whether it presents as a single logical device, and inter-node communication overhead. See open questions in frontmatter. + +## Key Relationships + +- Connected via: [[connectivity]] (QSFP/ConnectX-7 ports) +- Extends capacity of: [[ai-workloads]] +- Doubles resources from: [[gb10-superchip]], [[memory-and-storage]] diff --git a/context/open-questions.md b/context/open-questions.md new file mode 100644 index 0000000..84567ca --- /dev/null +++ b/context/open-questions.md @@ -0,0 +1,125 @@ +--- +id: open-questions +title: "Open Questions" +status: active +source_sections: "Aggregated from all context files" +related_topics: [gb10-superchip, memory-and-storage, connectivity, dgx-os-software, ai-frameworks, ai-workloads, multi-unit-stacking, physical-specs, setup-and-config, skus-and-pricing] +--- + +# Open Questions + +Catalog of known unknowns, research gaps, and unresolved questions about the Dell Pro Max GB10. + +## Hardware + +### GB10 Superchip +- **Q:** What are the exact clock speeds for CPU and GPU dies under sustained load? 
+ - *Status:* Unknown. No official boost/base clocks published. + - *Would resolve:* Performance prediction, thermal modeling +- **Q:** What is the detailed per-precision TFLOPS breakdown (FP4/FP8/FP16/FP32/FP64)? + - *Status:* Only FP4 (1,000 TFLOPS) is officially published. Others are inferred. + - *Would resolve:* Accurate workload performance estimation +- **Q:** What is the thermal throttling behavior? + - *Status:* Unknown. Sustained vs. peak performance delta not documented. + - *Would resolve:* Real-world performance expectations + +### Memory +- **Q:** Is the LPDDR5X soldered or socketed? + - *Status:* Almost certainly soldered (given LPDDR5X and form factor), but not confirmed. + - *Would resolve:* Upgradeability +- **Q:** What is the memory channel configuration? + - *Status:* Unknown. Number of channels not published. + - *Would resolve:* Memory performance modeling + +### Storage +- **Q:** Is the M.2 SSD user-replaceable? + - *Status:* Unknown. Owner's manual may clarify. + - *Would resolve:* Storage upgrade path +- **Q:** What are the exact sequential and random IOPS? + - *Status:* Unknown. Drive model not publicly identified. + - *Would resolve:* Storage performance expectations + +## Software + +### DGX OS +- **Q:** Can stock Ubuntu 24.04 ARM be installed instead of DGX OS? + - *Status:* Likely possible but unsupported. Not documented. + - *Would resolve:* OS flexibility +- **Q:** Full list of pre-installed NVIDIA packages and versions? + - *Status:* Partially known. Full manifest not published. + - *Would resolve:* Development environment baseline +- **Q:** Does DGX OS include Docker/container runtime by default? + - *Status:* Unknown. + - *Would resolve:* Container workflow setup +- **Q:** OTA update mechanism and cadence? + - *Status:* Unknown. + - *Would resolve:* Maintenance planning + +### AI Frameworks +- **Q:** TensorFlow support status on ARM GB10? + - *Status:* Unknown. Official vs. community builds unclear. 
+ - *Would resolve:* Framework selection for TF users +- **Q:** Full NGC catalog availability for GB10? + - *Status:* Unknown. Which containers have ARM builds. + - *Would resolve:* Software ecosystem breadth +- **Q:** vLLM or other inference server support on ARM Blackwell? + - *Status:* Unknown. + - *Would resolve:* Production inference deployment options +- **Q:** JAX support status? + - *Status:* Unknown. + - *Would resolve:* Framework selection for JAX users + +## Networking / Multi-Unit + +- **Q:** What cable/interconnect is required for multi-unit stacking? + - *Status:* QSFP cables, but exact type/spec not documented. + - *Would resolve:* Multi-unit setup purchasing +- **Q:** Software configuration steps for multi-unit mode? + - *Status:* Not documented publicly. + - *Would resolve:* Multi-unit deployment +- **Q:** Does stacking appear as a single logical device to frameworks? + - *Status:* Unknown. May require explicit multi-node code. + - *Would resolve:* Development complexity for stacked setups +- **Q:** Can more than 2 units be stacked? + - *Status:* Only 2-unit configuration documented. + - *Would resolve:* Maximum scaling potential +- **Q:** Can QSFP ports be used for general networking? + - *Status:* Unknown. May be reserved for stacking. + - *Would resolve:* Network architecture options + +## Physical / Environmental + +- **Q:** Noise levels under load? + - *Status:* No dB measurements published. + - *Would resolve:* Office/desk suitability +- **Q:** Operating temperature range? + - *Status:* Unknown. + - *Would resolve:* Deployment environment requirements +- **Q:** VESA mount compatibility? + - *Status:* Unknown. + - *Would resolve:* Mounting options +- **Q:** Cooling solution details (fan count, heatsink type)? + - *Status:* Unknown. + - *Would resolve:* Thermal management understanding + +## Performance Benchmarks + +- **Q:** Actual tokens/sec for common LLMs (Llama 3.3 70B, Mixtral, etc.)? 
+ - *Status:* No published benchmarks from Dell or independent reviewers yet. + - *Would resolve:* Real-world inference performance expectations +- **Q:** Fine-tuning time estimates for common model sizes? + - *Status:* Unknown. + - *Would resolve:* Training workflow planning +- **Q:** Stable Diffusion / image generation performance? + - *Status:* Unknown. + - *Would resolve:* Non-LLM AI workload suitability + +--- + +## Resolved Questions + +*(Move questions here as they get answered, with date and resolution)* + +| Date | Question | Resolution | Source | +|------|----------|------------|--------| +| — | — | — | — | diff --git a/context/physical-specs.md b/context/physical-specs.md new file mode 100644 index 0000000..e2ae708 --- /dev/null +++ b/context/physical-specs.md @@ -0,0 +1,62 @@ +--- +id: physical-specs +title: "Physical Specifications" +status: established +source_sections: "Web research: Dell product page, WCCFTech" +related_topics: [connectivity, gb10-superchip, skus-and-pricing] +key_equations: [volume-calculation] +key_terms: [form-factor, micro-desktop, usb-c-psu, tdp] +images: [] +examples: [] +open_questions: + - "Noise levels under load (dB)" + - "Operating temperature range" + - "VESA mount compatibility" + - "Cooling solution details (fan count, heatsink type)" +--- + +# Physical Specifications + +The Dell Pro Max GB10 is an ultra-compact mini desktop designed to sit on or near a desk. + +## 1. Dimensions and Weight + +| Spec | Value | +|---------------|----------------------------| +| Width | 150 mm (5.9 in) | +| Depth | 150 mm (5.9 in) | +| Height | 51 mm (2.0 in) | +| Volume | ~1.15 liters | +| Weight | 1.31 kg (2.89 lbs) base | + +For reference, the footprint is roughly the size of a large coaster or small book. + +## 2. Power Supply + +- **External adapter:** 280W USB Type-C +- **Connection:** USB-C power delivery +- **System TDP:** ~140W + +The PSU is external, keeping the unit itself compact and cool. 
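The listed volume and the power headroom both follow directly from the figures above; a quick arithmetic check (no assumptions beyond the specs in this file):

```python
w_mm, d_mm, h_mm = 150, 150, 51
volume_l = (w_mm * d_mm * h_mm) / 1_000_000  # mm^3 -> litres

psu_w, tdp_w = 280, 140
headroom_w = psu_w - tdp_w  # left over for peripherals, storage, networking

print(volume_l)    # ~1.15 litres, matching the table
print(headroom_w)  # watts of headroom above system TDP
```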
The 280W rating provides headroom beyond the ~140W system TDP for peripherals, storage, and networking. + +## 3. Form Factor + +- **Classification:** Micro desktop / Mini PC +- **Design:** Stackable (for [[multi-unit-stacking]]) +- **Chassis:** Compact rectangular enclosure + +## 4. Scale Comparison + +| Compared to... | Dell Pro Max GB10 | +|-------------------------|----------------------------| +| Mac Mini M4 Pro | Similar footprint, thinner | +| NVIDIA DGX Spark | Identical hardware | +| Traditional desktop | ~20x smaller by volume | +| Laptop | Comparable weight | + +## Key Relationships + +- Houses: [[gb10-superchip]] +- External ports: [[connectivity]] +- Stacking design: [[multi-unit-stacking]] +- Pricing: [[skus-and-pricing]] diff --git a/context/setup-and-config.md b/context/setup-and-config.md new file mode 100644 index 0000000..22e6643 --- /dev/null +++ b/context/setup-and-config.md @@ -0,0 +1,83 @@ +--- +id: setup-and-config +title: "Setup and Configuration" +status: provisional +source_sections: "Web research: NVIDIA DGX OS 7 User Guide, Dell support KB" +related_topics: [dgx-os-software, connectivity, physical-specs] +key_equations: [] +key_terms: [first-boot, setup-wizard, grub, reinstall, dgx-os] +images: [] +examples: [] +open_questions: + - "Full first-boot wizard steps with screenshots" + - "BIOS/firmware update procedure" + - "Network boot (PXE) capabilities" + - "Remote management / BMC / IPMI availability" + - "Factory reset procedure beyond OS reinstall" +--- + +# Setup and Configuration + +Guide for initial setup, configuration, and recovery of the Dell Pro Max GB10. + +## 1. Initial Setup (First Boot) + +### Physical Setup +1. Place the unit on a stable surface (stackable design allows multiple units) +2. Connect the **280W USB-C power adapter** to the designated power USB-C port +3. Connect a display via **HDMI 2.1b** or **USB-C DisplayPort Alt Mode** +4. Connect keyboard and mouse (USB-C or Bluetooth) +5. 
Optionally connect **10GbE Ethernet** for wired networking + +### First Boot Wizard +On first power-on, DGX OS presents a setup wizard: +1. Language and locale selection +2. User account creation +3. Network configuration (Wi-Fi 7 or Ethernet) +4. System preferences +5. Software configuration + +The wizard is designed for fast onboarding — the system is ready to use shortly after. + +## 2. OS Reinstallation + +If you need to reinstall DGX OS from scratch: + +1. Power on or reboot the system +2. Access the **GRUB boot menu** +3. Navigate to **DGX Spark Installation Options** +4. Select **"Install DGX OS 7.2.1 for DGX Spark"** +5. Follow on-screen prompts +6. Installation takes approximately **25-30 minutes** + +Source: [Dell Support — How to Reinstall DGX OS](https://www.dell.com/support/kbdoc/en-us/000382042/how-to-reinstall-the-nvidia-dgx-operating-system-on-dell-pro-max-with-grace-blackwell-systems) + +## 3. Post-Setup Configuration + +### Recommended Steps +- Update DGX OS packages: `sudo apt update && sudo apt upgrade` +- Verify GPU is detected: `nvidia-smi` +- Verify CUDA toolkit: `nvcc --version` +- Configure SSH for remote access +- Set up development environment (Jupyter, conda/venv, etc.) + +### Network Configuration +- **Wi-Fi 7:** Configure via Network Manager or `nmcli` +- **10GbE Ethernet:** Auto-configured via DHCP or manual static IP +- **QSFP ports:** For [[multi-unit-stacking]] configuration + +## 4. 
Troubleshooting + +| Symptom | Check | +|-----------------------------|----------------------------------------------| +| No display output | Try both HDMI and USB-C DP Alt Mode | +| GPU not detected | Run `nvidia-smi`, check driver installation | +| Network not connecting | Verify cable/Wi-Fi config, run `ip addr` | +| System won't boot | Access GRUB menu, try OS reinstall | +| Slow AI performance | Check `nvidia-smi` for thermal throttling | + +## Key Relationships + +- Operating system: [[dgx-os-software]] +- Physical ports: [[connectivity]] +- Hardware: [[physical-specs]] diff --git a/context/skus-and-pricing.md b/context/skus-and-pricing.md new file mode 100644 index 0000000..a0f1665 --- /dev/null +++ b/context/skus-and-pricing.md @@ -0,0 +1,62 @@ +--- +id: skus-and-pricing +title: "SKUs and Pricing" +status: established +source_sections: "Web research: Dell product page, WCCFTech, Phoronix" +related_topics: [memory-and-storage, physical-specs] +key_equations: [] +key_terms: [fcm1253, sku] +images: [] +examples: [] +open_questions: + - "Are there additional SKU variants beyond 2TB/4TB?" + - "Enterprise/volume pricing" + - "Warranty and support tiers available" + - "Availability by region" +--- + +# SKUs and Pricing + +The Dell Pro Max GB10 is available in two primary storage configurations. + +## 1. Available Models + +| Model | Storage | SED | Price (USD) | +|-------------------|---------|------|-------------| +| FCM1253 (2TB) | 2 TB | No | $3,699 | +| FCM1253 (4TB) | 4 TB | Yes | $3,999 | + +Both models share identical compute and memory specifications: + +- NVIDIA GB10 Superchip +- 128 GB LPDDR5X +- All connectivity options + +The only differentiator between SKUs is storage capacity and SED (Self-Encrypting Drive) support. + +## 2. Model Number + +- **Dell model identifier:** Dell Pro Max FCM1253 +- **Form factor designation:** Micro + +## 3. 
Release Timeline + +- **Announced:** CES 2025 (as NVIDIA Project DIGITS) +- **Available:** October 15, 2025 +- **Current status:** Shipping + +## 4. Competitive Positioning + +| Product | Price | Memory | AI Compute | +|---------------------------|--------|--------|----------------| +| Dell Pro Max GB10 (2TB) | $3,699 | 128 GB | 1 PFLOP FP4 | +| Dell Pro Max GB10 (4TB) | $3,999 | 128 GB | 1 PFLOP FP4 | +| NVIDIA DGX Spark | $2,999 | 128 GB | 1 PFLOP FP4 | +| Mac Studio M4 Ultra | $3,999 | 192 GB | ~55 TOPS (ANE) | + +*Note: The NVIDIA DGX Spark uses the same GB10 hardware at a lower price point. The Dell version adds Dell's enterprise support, warranty, and supply chain.* + +## Key Relationships + +- Storage options: [[memory-and-storage]] +- Physical form factor: [[physical-specs]] diff --git a/examples/llm-memory-estimation.md b/examples/llm-memory-estimation.md new file mode 100644 index 0000000..f4b759f --- /dev/null +++ b/examples/llm-memory-estimation.md @@ -0,0 +1,48 @@ +# Worked Example: LLM Memory Estimation on Dell Pro Max GB10 + +## Problem + +Estimate whether Llama 3.3 70B can run on a single Dell Pro Max GB10, and at what precision. + +## Given + +- **Model:** Llama 3.3 70B (70 billion parameters) +- **Available memory:** 128 GB unified LPDDR5X +- **Usable memory:** ~110 GB (after OS, framework, overhead) + +## Calculation + +### Step 1: Raw Model Weight Memory + +| Precision | Bytes/Param | Memory for 70B | +|-----------|-------------|-----------------------| +| FP4 | 0.5 | 70 × 0.5 = 35 GB | +| FP8/INT8 | 1.0 | 70 × 1.0 = 70 GB | +| FP16 | 2.0 | 70 × 2.0 = 140 GB | +| FP32 | 4.0 | 70 × 4.0 = 280 GB | + +### Step 2: Total Memory with Overhead (1.3x multiplier) + +| Precision | Weights | Total (~1.3x) | Fits in 110 GB? 
| +|-----------|---------|----------------|-----------------| +| FP4 | 35 GB | ~46 GB | Yes | +| FP8/INT8 | 70 GB | ~91 GB | Yes | +| FP16 | 140 GB | ~182 GB | No | +| FP32 | 280 GB | ~364 GB | No | + +### Step 3: Conclusion + +- **FP4 quantized:** Fits comfortably (46/110 GB = 42% utilization). Plenty of room for large KV cache and batch sizes. +- **FP8/INT8 quantized:** Fits (91/110 GB = 83% utilization). Tight but workable for single-request inference. +- **FP16 (half precision):** Does NOT fit in a single unit. Would require 2-unit stacking (see [[multi-unit-stacking]]). +- **FP32 (full precision):** Does NOT fit even with stacking. + +## Verification + +NVIDIA confirms Llama 3.3 70B runs locally on a single GB10 unit. This is consistent with FP8 or FP4 quantized inference, which our calculation shows fitting within memory bounds. + +## Sources + +- Memory specs: [[memory-and-storage]] +- Estimation formulas: [[equations-and-bounds]] +- Model capabilities: [[ai-workloads]] diff --git a/phases/phase-01-initial-build.md b/phases/phase-01-initial-build.md new file mode 100644 index 0000000..a98378a --- /dev/null +++ b/phases/phase-01-initial-build.md @@ -0,0 +1,48 @@ +# Phase 1: Initial Knowledge Base Build + +**Date:** 2026-02-14 +**Goal:** Bootstrap the expert agent context system for the Dell Pro Max GB10 + +## What Was Done + +1. Created full directory structure following the expert agent template +2. Researched Dell Pro Max GB10 specifications from multiple sources +3. 
Created 10 context files covering all major topics: + - `gb10-superchip.md` — SoC architecture, CPU/GPU details, NVLink-C2C + - `memory-and-storage.md` — 128GB LPDDR5X, NVMe storage options + - `connectivity.md` — All ports, networking, wireless + - `dgx-os-software.md` — DGX OS 7, Ubuntu 24.04, software stack + - `ai-frameworks.md` — PyTorch, NeMo, RAPIDS, CUDA, llama.cpp + - `ai-workloads.md` — LLM inference, fine-tuning, model capacity + - `multi-unit-stacking.md` — Dual-unit configuration via ConnectX-7 + - `physical-specs.md` — Dimensions, weight, power supply + - `skus-and-pricing.md` — 2TB/4TB models, pricing, competitive positioning + - `setup-and-config.md` — First boot, OS reinstall, troubleshooting +4. Created `equations-and-bounds.md` with formulas and validation ranges +5. Created `open-questions.md` with 25+ tracked unknowns +6. Created `reference/glossary.yaml` with 35 term definitions +7. Created worked example: LLM memory estimation +8. Created `CLAUDE.md` with full agent operating manual + +## Sources Used + +- Dell product page (dell.com) +- NVIDIA newsroom (nvidianews.nvidia.com) +- WCCFTech review/specs article +- Phoronix Linux benchmarking preview +- NVIDIA DGX OS 7 User Guide (docs.nvidia.com) +- Dell Support KB articles +- Arm Learning Paths (learn.arm.com) +- The Register GB10 architecture article + +## What Changed + +- All files are new (initial build) + +## Known Gaps + +- No independent benchmark data yet (Phoronix review in progress) +- Multi-unit stacking details are sparse +- Some TFLOPS figures are inferred (only FP4 officially published) +- Owner's manual details not yet integrated (403 from Dell support) +- No hands-on configuration walkthrough yet diff --git a/reference/glossary.yaml b/reference/glossary.yaml new file mode 100644 index 0000000..749b2f1 --- /dev/null +++ b/reference/glossary.yaml @@ -0,0 +1,366 @@ +terms: + - term: "gb10" + full_name: "NVIDIA GB10 Superchip" + definition: | + System-on-chip combining an NVIDIA 
Grace CPU and Blackwell GPU + connected via NVLink-C2C. The core silicon in the Dell Pro Max GB10 + and NVIDIA DGX Spark. + unit: null + typical_range: null + related_terms: ["grace-blackwell", "superchip", "nvlink-c2c"] + related_topics: ["gb10-superchip"] + + - term: "grace-blackwell" + full_name: "Grace Blackwell Architecture" + definition: | + NVIDIA's combined CPU+GPU architecture pairing a Grace ARM CPU + with a Blackwell GPU via NVLink-C2C coherent interconnect. + unit: null + typical_range: null + related_terms: ["gb10", "blackwell-gpu", "grace-cpu"] + related_topics: ["gb10-superchip"] + + - term: "superchip" + full_name: "Superchip" + definition: | + NVIDIA's term for a system-on-chip that integrates both CPU and GPU + dies on a single package with high-bandwidth interconnect. + unit: null + typical_range: null + related_terms: ["gb10", "soc"] + related_topics: ["gb10-superchip"] + + - term: "soc" + full_name: "System-on-Chip" + definition: | + An integrated circuit that combines multiple components (CPU, GPU, + memory controller, I/O) on a single die or package. + unit: null + typical_range: null + related_terms: ["gb10", "superchip"] + related_topics: ["gb10-superchip"] + + - term: "cortex-x925" + full_name: "ARM Cortex-X925" + definition: | + ARM's high-performance CPU core design (ARMv9.2 architecture). + The GB10 contains 10 of these as its "big" cores. + unit: null + typical_range: null + related_terms: ["cortex-a725", "gb10"] + related_topics: ["gb10-superchip"] + + - term: "cortex-a725" + full_name: "ARM Cortex-A725" + definition: | + ARM's efficiency-focused CPU core design (ARMv9.2 architecture). + The GB10 contains 10 of these as its "LITTLE" cores. + unit: null + typical_range: null + related_terms: ["cortex-x925", "gb10"] + related_topics: ["gb10-superchip"] + + - term: "blackwell-gpu" + full_name: "NVIDIA Blackwell GPU" + definition: | + NVIDIA's GPU architecture generation. 
In the GB10, it provides + 6,144 CUDA cores and 5th-gen Tensor Cores. + unit: null + typical_range: null + related_terms: ["cuda-core", "tensor-core", "gb10"] + related_topics: ["gb10-superchip"] + + - term: "cuda-core" + full_name: "CUDA Core" + definition: | + NVIDIA's basic parallel processing unit for general-purpose GPU + computing. The GB10 has 6,144 CUDA cores. + unit: "cores" + typical_range: "6,144 in GB10" + related_terms: ["blackwell-gpu", "tensor-core"] + related_topics: ["gb10-superchip"] + + - term: "tensor-core" + full_name: "Tensor Core (5th Generation)" + definition: | + Specialized GPU cores for matrix multiply-accumulate operations, + critical for deep learning inference and training. 5th-gen Tensor + Cores in Blackwell support FP4, FP8, FP16, and other precisions. + unit: "cores" + typical_range: null + related_terms: ["blackwell-gpu", "fp4", "fp8"] + related_topics: ["gb10-superchip", "ai-workloads"] + + - term: "nvlink-c2c" + full_name: "NVLink Chip-to-Chip" + definition: | + NVIDIA's proprietary die-to-die interconnect connecting the Grace CPU + and Blackwell GPU within the GB10 superchip. Provides 600 GB/s + bidirectional bandwidth and enables unified coherent memory. + unit: "GB/s" + typical_range: "600 GB/s bidirectional" + related_terms: ["gb10", "unified-memory"] + related_topics: ["gb10-superchip", "memory-and-storage"] + + - term: "unified-memory" + full_name: "Unified Coherent Memory" + definition: | + Memory architecture where CPU and GPU share the same physical memory + pool with hardware cache coherence. Eliminates explicit host-device + memory copies. In the GB10, both processors see the full 128 GB. + unit: "GB" + typical_range: "128 GB in GB10" + related_terms: ["lpddr5x", "nvlink-c2c"] + related_topics: ["memory-and-storage", "gb10-superchip"] + + - term: "lpddr5x" + full_name: "Low-Power DDR5X" + definition: | + Latest generation of low-power DRAM. In the GB10, runs at up to + 9,400 MT/s providing 273 GB/s of memory bandwidth. 
+ unit: "MT/s" + typical_range: "9,400 MT/s in GB10" + related_terms: ["unified-memory"] + related_topics: ["memory-and-storage"] + + - term: "tflops" + full_name: "Tera Floating-Point Operations Per Second" + definition: | + Unit of compute performance. 1 TFLOPS = 10^12 floating-point + operations per second. ALWAYS specify the precision (FP4, FP8, + FP16, FP32) when quoting TFLOPS figures. + unit: "TFLOPS" + typical_range: "1,000 TFLOPS FP4 for GB10" + related_terms: ["pflop", "fp4"] + related_topics: ["gb10-superchip", "equations-and-bounds"] + + - term: "pflop" + full_name: "Peta Floating-Point Operations Per Second" + definition: | + 1 PFLOP = 1,000 TFLOPS = 10^15 floating-point operations per second. + The GB10's headline figure is 1 PFLOP at FP4 precision. + unit: "PFLOP" + typical_range: "1 PFLOP FP4 for GB10" + related_terms: ["tflops", "fp4"] + related_topics: ["gb10-superchip", "equations-and-bounds"] + + - term: "fp4" + full_name: "4-bit Floating Point" + definition: | + Ultra-low precision numerical format using 4 bits per value. + Used for quantized inference. The GB10's 1 PFLOP headline + is measured at FP4 precision. + unit: "bits" + typical_range: null + related_terms: ["fp8", "fp16", "quantization", "tflops"] + related_topics: ["ai-workloads", "equations-and-bounds"] + + - term: "fp8" + full_name: "8-bit Floating Point" + definition: | + Low-precision numerical format using 8 bits per value. Common + for quantized LLM inference with good accuracy/performance tradeoff. + unit: "bits" + typical_range: null + related_terms: ["fp4", "fp16", "quantization"] + related_topics: ["ai-workloads", "equations-and-bounds"] + + - term: "fp16" + full_name: "16-bit Floating Point (Half Precision)" + definition: | + Standard training precision for many deep learning models. + Good balance of range, precision, and memory efficiency. 
+ unit: "bits" + typical_range: null + related_terms: ["fp4", "fp8", "fp32"] + related_topics: ["ai-workloads", "equations-and-bounds"] + + - term: "quantization" + full_name: "Model Quantization" + definition: | + Technique for reducing model memory footprint by using lower-precision + number formats (FP4, FP8, INT4, INT8) for model weights. Enables + running larger models in limited memory at some accuracy cost. + unit: null + typical_range: null + related_terms: ["fp4", "fp8", "parameter-count"] + related_topics: ["ai-workloads"] + + - term: "parameter-count" + full_name: "Model Parameter Count" + definition: | + The number of trainable weights in a neural network, typically + expressed in billions (B). Determines memory requirements and + roughly correlates with model capability. + unit: "billions (B)" + typical_range: "7B-200B on single GB10, up to 400B stacked" + related_terms: ["quantization", "unified-memory"] + related_topics: ["ai-workloads", "memory-and-storage"] + + - term: "dgx-os" + full_name: "NVIDIA DGX OS 7" + definition: | + NVIDIA's customized Linux distribution based on Ubuntu 24.04 LTS. + Includes pre-configured GPU drivers, CUDA toolkit, and platform + optimizations for DGX/DGX Spark hardware. + unit: null + typical_range: null + related_terms: ["ubuntu", "cuda"] + related_topics: ["dgx-os-software"] + + - term: "dgx-spark" + full_name: "NVIDIA DGX Spark" + definition: | + NVIDIA's own-branded desktop AI computer using the GB10 superchip. + Same hardware as the Dell Pro Max GB10, different branding and + support channel. Priced at $2,999. + unit: null + typical_range: null + related_terms: ["gb10"] + related_topics: ["skus-and-pricing"] + + - term: "connectx-7" + full_name: "NVIDIA ConnectX-7 SmartNIC" + definition: | + High-performance network interface card integrated into the + Dell Pro Max GB10. Provides 2x QSFP 200 Gbps ports, primarily + used for multi-unit stacking. 
+ unit: "Gbps" + typical_range: "200 Gbps per port" + related_terms: ["qsfp", "smartnic"] + related_topics: ["connectivity", "multi-unit-stacking"] + + - term: "qsfp" + full_name: "Quad Small Form-factor Pluggable" + definition: | + High-speed networking connector standard. The Dell Pro Max GB10 + has 2x QSFP ports supporting 200 Gbps each via ConnectX-7. + unit: "Gbps" + typical_range: "200 Gbps per port in GB10" + related_terms: ["connectx-7"] + related_topics: ["connectivity", "multi-unit-stacking"] + + - term: "smartnic" + full_name: "Smart Network Interface Card" + definition: | + Network adapter with onboard processing capability for offloading + network tasks from the main CPU. The ConnectX-7 in the GB10 is + a SmartNIC. + unit: null + typical_range: null + related_terms: ["connectx-7", "qsfp"] + related_topics: ["connectivity"] + + - term: "10gbe" + full_name: "10 Gigabit Ethernet" + definition: | + Standard Ethernet networking at 10 Gbps. The Dell Pro Max GB10 + includes one 10GbE RJ45 port for general network connectivity. + unit: "Gbps" + typical_range: "10 Gbps" + related_terms: [] + related_topics: ["connectivity"] + + - term: "pytorch" + full_name: "PyTorch" + definition: | + Open-source deep learning framework. Primary ML framework + supported on the GB10 with ARM64-native builds and full + CUDA acceleration. + unit: null + typical_range: null + related_terms: ["cuda", "nemo"] + related_topics: ["ai-frameworks"] + + - term: "nemo" + full_name: "NVIDIA NeMo" + definition: | + NVIDIA's framework for building, customizing, and deploying + generative AI models. Supports fine-tuning (SFT, RLHF) and + is optimized for NVIDIA hardware. + unit: null + typical_range: null + related_terms: ["pytorch", "cuda"] + related_topics: ["ai-frameworks"] + + - term: "rapids" + full_name: "NVIDIA RAPIDS" + definition: | + Suite of GPU-accelerated data science libraries including cuDF + (DataFrames), cuML (ML), and cuGraph (graph analytics). 
Drop-in + replacements for pandas, scikit-learn, and NetworkX. + unit: null + typical_range: null + related_terms: ["cuda"] + related_topics: ["ai-frameworks"] + + - term: "cuda" + full_name: "Compute Unified Device Architecture" + definition: | + NVIDIA's parallel computing platform and API for GPU-accelerated + computing. Pre-installed on the GB10 via DGX OS. + unit: null + typical_range: null + related_terms: ["cuda-core", "pytorch", "nemo"] + related_topics: ["ai-frameworks", "dgx-os-software"] + + - term: "ngc" + full_name: "NVIDIA NGC Catalog" + definition: | + NVIDIA's hub for GPU-optimized AI software including pre-trained + models, containers, SDKs, and Helm charts. + unit: null + typical_range: null + related_terms: ["cuda", "nemo"] + related_topics: ["ai-frameworks"] + + - term: "llama-cpp" + full_name: "llama.cpp" + definition: | + Open-source C/C++ inference engine for running quantized LLMs. + Supports ARM-optimized builds for GB10 and GGUF model format. + unit: null + typical_range: null + related_terms: ["quantization"] + related_topics: ["ai-frameworks", "ai-workloads"] + + - term: "fcm1253" + full_name: "Dell Pro Max FCM1253" + definition: | + Dell's model number for the Pro Max with GB10 desktop system. + Available in 2TB and 4TB storage configurations. + unit: null + typical_range: null + related_terms: ["gb10"] + related_topics: ["skus-and-pricing"] + + - term: "sed" + full_name: "Self-Encrypting Drive" + definition: | + Storage drive with built-in hardware encryption. Available + on the 4TB configuration of the Dell Pro Max GB10. + unit: null + typical_range: null + related_terms: [] + related_topics: ["memory-and-storage", "skus-and-pricing"] + + - term: "tdp" + full_name: "Thermal Design Power" + definition: | + Maximum amount of heat a cooling system must dissipate. + The GB10 system TDP is approximately 140W. 
+ unit: "watts" + typical_range: "~140W for GB10 system" + related_terms: [] + related_topics: ["physical-specs", "gb10-superchip"] + + - term: "displayport-alt-mode" + full_name: "DisplayPort Alternate Mode" + definition: | + Protocol allowing DisplayPort video signals to be carried + over a USB Type-C connector. Used for display output on + the GB10's USB-C ports. + unit: null + typical_range: null + related_terms: ["usb-c", "hdmi"] + related_topics: ["connectivity"]