Phase 4: NVIDIA Spark Playbooks Integration

Date: 2026-02-14
Goal: Integrate the official NVIDIA playbooks from build.nvidia.com/spark into the knowledge base

Source

  build.nvidia.com/spark — official NVIDIA DGX Spark playbooks

Key Discoveries

Critical Technical Facts (previously unknown)

  1. CUDA compute capability: sm_121 — required for compiling CUDA kernels on Blackwell GB10 (-DCMAKE_CUDA_ARCHITECTURES="121")
  2. CUDA toolkit version: 13.0 — PyTorch wheels use cu130 index
  3. DGX Dashboard runs on port 11000 — JupyterLab ports in /opt/nvidia/dgx-dashboard-service/jupyterlab_ports.yaml
  4. TensorRT-LLM confirmed — container tensorrt-llm/release:1.2.0rc6
  5. PyTorch NGC container: nvcr.io/nvidia/pytorch:25.11-py3
  6. RAPIDS container: version 25.10
  7. UMA buffer cache flush: sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
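The build- and environment-related facts above translate into on-device commands roughly like the following. This is a sketch to be run on the unit itself: the `GGML_CUDA` flag assumes llama.cpp's standard CMake build, and only the cache-flush line is taken verbatim from the playbooks.

```shell
# Configure a CUDA project (here llama.cpp) for the GB10's Blackwell GPU;
# compute capability sm_121 maps to CMAKE_CUDA_ARCHITECTURES="121".
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="121"
cmake --build build --config Release -j"$(nproc)"

# Install PyTorch wheels built against CUDA 13.0 from the cu130 index.
pip install torch --index-url https://download.pytorch.org/whl/cu130

# Flush the UMA buffer cache when CPU-side file caching squeezes
# GPU allocations out of the shared memory pool.
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```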

Fine-Tuning (fully documented)

  • Full SFT: Llama 3.2 3B (all parameters, bfloat16)
  • LoRA: Llama 3.1 8B (rank 8 default)
  • LoRA + FSDP: Llama 3.1 70B (multi-node via Docker Swarm)
  • QLoRA 4-bit: Llama 3.1 70B (single unit)
  • Dependencies: transformers, peft, datasets, trl, bitsandbytes
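A quick back-of-envelope check on why the full-SFT recipe tops out at 3B while 70B needs LoRA/QLoRA (my arithmetic, not from the playbooks):

```shell
# Per-parameter bytes for full SFT with Adam, assuming bf16 weights and
# gradients plus two fp32 optimizer moments (ignoring activations and
# any fp32 master weights): 2 + 2 + 4 + 4 = 12 B/param.
awk 'BEGIN { printf "%.0f GB\n", 3e9 * (2 + 2 + 4 + 4) / 1e9 }'
# → 36 GB for a 3B model
```

Roughly 36 GB of weight/optimizer state leaves headroom for activations in the GB10's unified memory; the same arithmetic at 70B (~840 GB) is why the larger models appear only with LoRA, QLoRA, or multi-node FSDP.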

Inference Tools

  • llama.cpp: Build with CUDA sm_121, provides OpenAI-compatible API (streaming, function calling)
  • Nemotron-3-Nano 30B: MoE (3B active), ~38 GB at Q8, built-in reasoning/tool-calling
  • Speculative Decoding: EAGLE-3 (built-in drafting) and Draft-Target (8B+70B, FP4)
  • Ollama + Open WebUI: Docker container, ports 12000 (Sync) or 8080 (direct)
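The ~38 GB figure for Nemotron at Q8 is consistent with simple weight arithmetic (my estimate, not from the playbooks): 8-bit quantization costs about one byte per parameter, and the remainder is KV cache, quantization metadata, and runtime overhead.

```shell
# 30B parameters at 8 bits/param ≈ 30 GB of raw weights; the observed
# ~38 GB on disk/in memory adds KV cache and runtime overhead.
awk 'BEGIN { printf "%.0f GB\n", 30e9 * 8 / 8 / 1e9 }'
# → 30 GB
```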

Image Generation

  • ComfyUI confirmed working (SD, SDXL, Flux) on port 8188
  • Native Blackwell GPU acceleration with CUDA 13.0

Scientific Computing

  • scRNA-seq: RAPIDS-singlecell, ~130s full pipeline, exact nearest-neighbor graph
  • Portfolio Optimization: cuOpt + cuML, Mean-CVaR model, ~7 min pipeline

Development Environment

  • VS Code: ARM64 .deb install or remote SSH via Sync
  • Cursor: Remote SSH via Sync
  • NVIDIA AI Workbench: Launchable via Sync
  • NVIDIA Sync: Full details documented (SSH key automation, mDNS, port forwarding)
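For the Remote-SSH workflows above, NVIDIA Sync manages the key and host entry automatically; a hand-written equivalent would look roughly like this sketch, where the host alias, mDNS hostname, and user are all placeholders rather than values from the playbooks.

```shell
# Hypothetical ~/.ssh/config entry for VS Code / Cursor Remote-SSH.
# Substitute the mDNS hostname and user that Sync reports for the unit.
cat >> ~/.ssh/config <<'EOF'
Host spark
    HostName spark-unit.local   # placeholder mDNS name
    User nvidia                 # placeholder user
EOF
```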

Files Updated

  • context/gb10-superchip.md — sm_121 CUDA architecture
  • context/ai-frameworks.md — Major expansion: CUDA 13.0, TensorRT-LLM, Ollama, ComfyUI, NGC containers, UMA tip
  • context/ai-workloads.md — Fine-tuning scripts, Nemotron, speculative decoding, image gen, scientific computing
  • context/dgx-os-software.md — NVIDIA Sync §8 (full detail), DGX Dashboard §9 (port, features)
  • context/setup-and-config.md — NVIDIA Sync cross-reference
  • context/equations-and-bounds.md — sm_121, CUDA 13.0
  • context/open-questions.md — 11 new resolved questions, 1 new open question
  • CLAUDE.md — Phase 4 added to history

Remaining Gaps

  • Quantitative speculative decoding speedup (tokens/sec improvement not published)
  • ComfyUI image generation benchmarks (images/sec)
  • Fine-tuning wall-clock times
  • Full list of Ollama-compatible models tested on GB10