Phase 4: NVIDIA Spark Playbooks Integration
Date: 2026-02-14
Goal: Integrate official NVIDIA playbooks from build.nvidia.com/spark into the knowledge base
Source
- https://build.nvidia.com/spark (main page, 9 playbooks + connection guide)
Key Discoveries
Critical Technical Facts (previously unknown)
- CUDA compute capability: `sm_121` — required for compiling CUDA kernels on Blackwell GB10 (`-DCMAKE_CUDA_ARCHITECTURES="121"`)
- CUDA toolkit version: 13.0 — PyTorch wheels use the `cu130` index
- DGX Dashboard runs on port 11000 — JupyterLab ports in `/opt/nvidia/dgx-dashboard-service/jupyterlab_ports.yaml`
- TensorRT-LLM confirmed — container `tensorrt-llm/release:1.2.0rc6`
- PyTorch NGC container: `nvcr.io/nvidia/pytorch:25.11-py3`
- RAPIDS container: version 25.10
- UMA buffer cache flush: `sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'`
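The container facts above combine into a single launch command. A minimal sketch that composes one (the image tag is from these notes; the workspace mount path and the `--ipc=host` choice are my assumptions, not from the playbooks):

```python
# Compose a docker run invocation for the PyTorch NGC container.
# The image tag matches the notes; the mount path is a hypothetical example.
def pytorch_container_cmd(workdir: str = "/home/user/work") -> list[str]:
    return [
        "docker", "run", "--rm", "-it",
        "--gpus", "all",        # expose the GB10 GPU to the container
        "--ipc=host",           # commonly recommended for NGC PyTorch images
        "-v", f"{workdir}:/workspace",
        "nvcr.io/nvidia/pytorch:25.11-py3",
    ]

print(" ".join(pytorch_container_cmd()))
```

Building the command as a list (rather than one string) keeps it safe to pass straight to `subprocess.run`.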
Fine-Tuning (fully documented)
- Full SFT: Llama 3.2 3B (all parameters, bfloat16)
- LoRA: Llama 3.1 8B (rank 8 default)
- LoRA + FSDP: Llama 3.1 70B (multi-node via Docker Swarm)
- QLoRA 4-bit: Llama 3.1 70B (single unit)
- Dependencies: transformers, peft, datasets, trl, bitsandbytes
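The QLoRA entry implies simple weight-memory arithmetic: 70B parameters at 4 bits fit on a single unit where bf16 cannot. A back-of-envelope sketch (nominal parameter count, optimizer and activation overhead ignored):

```python
def weight_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate model weight memory in GiB, ignoring all overheads."""
    return n_params * bits_per_param / 8 / 2**30

bf16_70b = weight_gib(70e9, 16)  # full-precision-ish baseline, ~130 GiB
nf4_70b = weight_gib(70e9, 4)    # 4-bit quantized base weights, ~33 GiB
print(f"70B bf16: {bf16_70b:.0f} GiB, 4-bit: {nf4_70b:.0f} GiB")
```

The 4x reduction is exactly the bits ratio; real QLoRA adds small LoRA adapters and quantization constants on top.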
Inference Tools
- llama.cpp: Build with CUDA sm_121, provides OpenAI-compatible API (streaming, function calling)
- Nemotron-3-Nano 30B: MoE (3B active), ~38 GB at Q8, built-in reasoning/tool-calling
- Speculative Decoding: EAGLE-3 (built-in drafting) and Draft-Target (8B+70B, FP4)
- Ollama + Open WebUI: Docker container, ports 12000 (Sync) or 8080 (direct)
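Because llama.cpp's server speaks the OpenAI chat-completions protocol, stock clients work unchanged. A stdlib-only sketch of the request shape (the model name is a hypothetical placeholder for whatever GGUF the server loaded):

```python
import json

def chat_request(prompt: str, stream: bool = True) -> dict:
    """Build an OpenAI-style chat completion body for llama.cpp's server."""
    return {
        "model": "local-gguf",  # placeholder; server uses its loaded model
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,       # server emits SSE chunks when True
    }

body = json.dumps(chat_request("Hello from GB10")).encode()
# POST `body` to the server's /v1/chat/completions endpoint with urllib.request
```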
Image Generation
- ComfyUI confirmed working (SD, SDXL, Flux) on port 8188
- Native Blackwell GPU acceleration with CUDA 13.0
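ComfyUI can also be driven headlessly: a workflow graph is queued by POSTing JSON to its `/prompt` endpoint on the same port as the UI. A hedged sketch of the request shape (the one-node workflow here is a placeholder, not a working graph):

```python
import json

def queue_prompt(workflow: dict, host: str = "127.0.0.1", port: int = 8188):
    """Build the URL and JSON payload to queue a workflow on local ComfyUI."""
    payload = json.dumps({"prompt": workflow}).encode()
    url = f"http://{host}:{port}/prompt"
    return url, payload  # pass to urllib.request.Request(url, data=payload)

# Placeholder graph; in practice export a real workflow in API format
url, payload = queue_prompt({"3": {"class_type": "KSampler", "inputs": {}}})
```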
Scientific Computing
- scRNA-seq: RAPIDS-singlecell, ~130s full pipeline, exact nearest-neighbor graph
- Portfolio Optimization: cuOpt + cuML, Mean-CVaR model, ~7 min pipeline
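For reference, the CVaR measure that the Mean-CVaR pipeline optimizes is easy to state in plain Python: CVaR at level α is the average loss over the worst (1 − α) share of scenarios. A toy illustration (scenario numbers invented, not from the playbook):

```python
def cvar(returns, alpha: float = 0.95) -> float:
    """Conditional Value-at-Risk: mean loss of the worst (1 - alpha) tail."""
    losses = sorted((-r for r in returns), reverse=True)  # largest losses first
    k = max(1, int(round(len(losses) * (1 - alpha))))
    return sum(losses[:k]) / k

# Toy scenario set of daily portfolio returns (made-up numbers)
scenarios = [0.01, 0.02, -0.03, 0.005, -0.08, 0.015, 0.0, -0.01, 0.02, 0.01]
print(cvar(scenarios, alpha=0.9))  # with 10 scenarios: the single worst loss
```

cuOpt minimizes this quantity subject to portfolio constraints; the definition itself is the whole model's risk term.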
Development Environment
- VS Code: ARM64 .deb install or remote SSH via Sync
- Cursor: Remote SSH via Sync
- NVIDIA AI Workbench: Launchable via Sync
- NVIDIA Sync: Full details documented (SSH key automation, mDNS, port forwarding)
Files Updated
- `context/gb10-superchip.md` — sm_121 CUDA architecture
- `context/ai-frameworks.md` — Major expansion: CUDA 13.0, TensorRT-LLM, Ollama, ComfyUI, NGC containers, UMA tip
- `context/ai-workloads.md` — Fine-tuning scripts, Nemotron, speculative decoding, image gen, scientific computing
- `context/dgx-os-software.md` — NVIDIA Sync §8 (full detail), DGX Dashboard §9 (port, features)
- `context/setup-and-config.md` — NVIDIA Sync cross-reference
- `context/equations-and-bounds.md` — sm_121, CUDA 13.0
- `context/open-questions.md` — 11 new resolved questions, 1 new open question
- `CLAUDE.md` — Phase 4 added to history
Remaining Gaps
- Quantitative speculative decoding speedup (tokens/sec improvement not published)
- ComfyUI image generation benchmarks (images/sec)
- Fine-tuning wall-clock times
- Full list of Ollama-compatible models tested on GB10
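The first three gaps are all throughput numbers, so closing them is just token counts over wall-clock time. A trivial helper for when the benchmarks are run (the figures below are made up to show the arithmetic, not measured results):

```python
def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Generation throughput; speedup is the ratio between two runs."""
    return n_tokens / elapsed_s

# Hypothetical comparison, NOT measured data: baseline vs speculative decoding
baseline = tokens_per_second(512, 20.0)     # 25.6 tok/s
speculative = tokens_per_second(512, 12.8)  # 40.0 tok/s
print(f"speedup: {speculative / baseline:.2f}x")
```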