id: ai-frameworks
title: AI Frameworks and Development Tools
status: established
source_sections: Web research: NVIDIA newsroom, Arm learning paths, NVIDIA DGX Spark User Guide, build.nvidia.com/spark playbooks
related_topics: [dgx-os-software, gb10-superchip, ai-workloads]
key_equations: []
key_terms: [pytorch, nemo, rapids, cuda, ngc, jupyter, tensorrt, tensorrt-llm, llama-cpp, docker, nvidia-container-runtime, fex, ollama, comfyui, sm_121, cu130, speculative-decoding]
images: []
examples: []
open_questions:
  - TensorFlow support status on ARM GB10 (official vs. community)
  - Full NGC catalog availability — which containers work on GB10?
  - vLLM or other inference server support on ARM Blackwell
  - JAX support status

AI Frameworks and Development Tools

The Dell Pro Max GB10 supports a broad AI software ecosystem, pre-configured through DGX OS.

1. Core Frameworks

PyTorch

  • Primary deep learning framework
  • ARM64-native builds available
  • Full CUDA support on Blackwell GPU

NVIDIA NeMo

  • Framework for fine-tuning and customizing large language models
  • Supports supervised fine-tuning (SFT), RLHF, and other alignment techniques
  • Optimized for NVIDIA hardware

NVIDIA RAPIDS

  • GPU-accelerated data science libraries
  • Includes cuDF (DataFrames), cuML (machine learning), cuGraph (graph analytics)
  • Drop-in replacements for pandas, scikit-learn, and NetworkX

2. Inference Tools

CUDA Toolkit (v13.0)

  • CUDA compute capability: sm_121 (Blackwell on GB10) — use -DCMAKE_CUDA_ARCHITECTURES="121" when compiling
  • PyTorch CUDA wheels: cu130 (e.g., pip3 install torch --index-url https://download.pytorch.org/whl/cu130)
  • Low-level GPU compute API, compiler (nvcc), profiling and debugging tools
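As a sketch, compiling a standalone CUDA source for the GB10's compute capability looks like the following. The kernel and filenames are hypothetical; this assumes nvcc from the CUDA 13.0 toolkit is on PATH on the device itself:

```shell
# Hypothetical smoke test: compile a trivial kernel for Blackwell (sm_121).
cat > hello.cu <<'EOF'
#include <cstdio>
__global__ void hello() { printf("hello from the GB10 GPU\n"); }
int main() { hello<<<1, 1>>>(); cudaDeviceSynchronize(); return 0; }
EOF
nvcc -arch=sm_121 hello.cu -o hello
./hello
```

The `-arch=sm_121` flag is the nvcc equivalent of the CMake `-DCMAKE_CUDA_ARCHITECTURES="121"` setting noted above.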

llama.cpp

  • Quantized LLM inference engine
  • ARM-optimized builds available for GB10
  • Supports GGUF model format
  • Build with CUDA: cmake .. -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="121" (T1, build.nvidia.com/spark)
  • Provides OpenAI-compatible API via llama-server (chat completions, streaming, function calling)
  • Documented in Arm Learning Path
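Putting the documented CUDA flags into a full build-and-serve recipe, it might look like this (the model path is a placeholder; run on the GB10 itself):

```shell
# Sketch: build llama.cpp with CUDA for the GB10 and serve a GGUF model.
# Flags per the Arm Learning Path / build.nvidia.com/spark playbook.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="121"
cmake --build build --config Release -j"$(nproc)"

# llama-server exposes an OpenAI-compatible API (chat completions, streaming):
./build/bin/llama-server -m /path/to/model.gguf --host 0.0.0.0 --port 8080
```

This uses the out-of-source `cmake -B build` form rather than the in-tree `cmake ..` shown above; the flags are identical.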

TensorRT-LLM

  • NVIDIA's LLM inference optimizer — confirmed available (T1, build.nvidia.com/spark)
  • Container: tensorrt-llm/release:1.2.0rc6
  • Supports speculative decoding for faster inference:
    • EAGLE-3: Built-in drafting head, no separate draft model needed
    • Draft-Target: Pairs small (8B) and large (70B) models, uses FP4 quantization
  • Configurable KV cache memory fraction for memory management
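A minimal launch of the container above might look like this (the full NGC registry path and the mount point are assumptions; consult the build.nvidia.com/spark playbook for the exact invocation):

```shell
# Sketch: start the TensorRT-LLM release container with GPU access.
# Mount a host model directory into the container (path is illustrative).
docker run --rm -it --gpus all --ipc=host \
  -v "$HOME/models:/models" \
  nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc6
```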

Ollama

  • LLM runtime with model library — runs via Docker on GB10 (T1, build.nvidia.com/spark)
  • Container: ghcr.io/open-webui/open-webui:ollama (bundles Open WebUI + Ollama)
  • Models available from ollama.com/library (e.g., gpt-oss:20b)
  • Port: 12000 (via NVIDIA Sync) or 8080 (direct)
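For the direct (non-Sync) route, a typical launch of the bundled image follows the standard Open WebUI pattern (volume names are illustrative):

```shell
# Sketch: run the bundled Open WebUI + Ollama container; UI at http://localhost:8080
docker run -d --gpus all -p 8080:8080 \
  -v ollama:/root/.ollama -v open-webui:/app/backend/data \
  --name open-webui ghcr.io/open-webui/open-webui:ollama

# Pull a model from ollama.com/library inside the running container:
docker exec open-webui ollama pull gpt-oss:20b
```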

3. Development Environment

  • DGX Dashboard — web-based system monitor at http://localhost:11000 with integrated JupyterLab (T0 Spec). JupyterLab ports are configured in /opt/nvidia/dgx-dashboard-service/jupyterlab_ports.yaml.
  • VS Code — ARM64 .deb available; also remote SSH via NVIDIA Sync or manual SSH (T1, build.nvidia.com/spark)
  • Cursor — supported via NVIDIA Sync remote SSH launch (T1, build.nvidia.com/spark)
  • NVIDIA AI Workbench — launchable via NVIDIA Sync (T1, build.nvidia.com/spark)
  • Python — system Python with AI/ML package ecosystem
  • NVIDIA NGC Catalog — library of pre-trained models, containers, and SDKs
  • Docker + NVIDIA Container Runtime — pre-installed for containerized workflows (T0 Spec)
  • NVIDIA AI Enterprise — enterprise-grade AI software and services
  • Tutorials & Playbooks: https://build.nvidia.com/spark
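Since Docker and the NVIDIA Container Runtime come pre-installed, a quick sanity check that containers can see the GPU is to run nvidia-smi inside an NGC image confirmed for GB10 (the PyTorch container listed in this note):

```shell
# Sanity check: the NVIDIA runtime should expose the Blackwell GPU in-container.
docker run --rm --gpus all nvcr.io/nvidia/pytorch:25.11-py3 nvidia-smi
```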

Key NGC Containers (confirmed ARM64)

Container                       Tag        Use Case
------------------------------  ---------  ------------------------------
nvcr.io/nvidia/pytorch          25.11-py3  PyTorch training & fine-tuning
tensorrt-llm/release            1.2.0rc6   Optimized LLM inference
RAPIDS                          25.10      GPU-accelerated data science
ghcr.io/open-webui/open-webui   ollama     Open WebUI + Ollama LLM chat

4. Image Generation

ComfyUI

  • Node-based image generation UI for Stable Diffusion, SDXL, Flux, etc. (T1, build.nvidia.com/spark)
  • Runs natively on GB10 Blackwell GPU
  • Requires: Python 3.8+, CUDA toolkit, PyTorch with cu130
  • Port: 8188 (--listen 0.0.0.0 for remote access)
  • Storage: ~20 GB minimum (plus model files, e.g., SD 1.5 ~2 GB)
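A native install following the requirements above might be sketched as follows (the repo URL is the upstream ComfyUI project; the cu130 wheel index is the one noted in the CUDA section):

```shell
# Sketch: native ComfyUI setup on the GB10 (Python 3.8+, CUDA toolkit assumed).
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip3 install torch --index-url https://download.pytorch.org/whl/cu130
pip3 install -r requirements.txt

# --listen 0.0.0.0 exposes the UI on port 8188 for remote access:
python3 main.py --listen 0.0.0.0
```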

5. UMA Memory Management Tip

DGX Spark uses Unified Memory Architecture (UMA) — CPU and GPU share the same LPDDR5X pool. If GPU memory appears low due to filesystem buffer cache:

sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'

This frees cached memory back to the unified pool without data loss. (T1, build.nvidia.com/spark)
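Before dropping caches, it can be worth checking how much memory the buffer cache is actually holding; this read-only inspection needs no root:

```shell
# Read-only check of page/buffer-cache usage on Linux (no root required).
grep -E '^(MemTotal|MemAvailable|Buffers|Cached):' /proc/meminfo
```

If `Cached` dominates while `MemAvailable` is still large, dropping caches may be unnecessary, since the kernel reclaims cache under pressure anyway.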

6. Software Compatibility Notes

Since the GB10 is an ARM system:

  • All Python packages must have ARM64 wheels or be compilable from source
  • Most popular ML libraries (PyTorch, NumPy, etc.) have ARM64 support
  • Some niche packages may require building from source
  • x86-only binary packages will not run natively
  • FEX emulator can translate x86 binaries to ARM at a performance cost (used for Steam/Proton gaming — see ai-workloads)
  • Container images must be ARM64/aarch64 builds
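A quick architecture check before installing native packages (DGX OS on the GB10 reports aarch64; the echo messages are illustrative):

```shell
# Check the machine architecture; x86-only binaries will not run natively
# on an aarch64 host, and vice versa.
arch="$(uname -m)"
if [ "$arch" = "aarch64" ]; then
  echo "ARM64 host: use aarch64 wheels and container images"
else
  echo "Host is $arch: aarch64 GB10 binaries need emulation or a cross-build"
fi
```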

Key Relationships