id: ai-frameworks
title: AI Frameworks and Development Tools
status: established
source_sections: Web research: NVIDIA newsroom, Arm learning paths, NVIDIA DGX Spark User Guide, build.nvidia.com/spark playbooks
related_topics: [dgx-os-software, gb10-superchip, ai-workloads]
key_equations: []
key_terms: [pytorch, nemo, rapids, cuda, ngc, jupyter, tensorrt, tensorrt-llm, llama-cpp, docker, nvidia-container-runtime, fex, ollama, comfyui, sm_121, cu130, speculative-decoding]
images: []
examples: []
open_questions:
  - TensorFlow support status on ARM GB10 (official vs. community)
  - Full NGC catalog availability — which containers work on GB10?
  - vLLM or other inference server support on ARM Blackwell
  - JAX support status

AI Frameworks and Development Tools

The Dell Pro Max GB10 supports a broad AI software ecosystem, pre-configured through DGX OS.

1. Core Frameworks

PyTorch

  • Primary deep learning framework
  • ARM64-native builds available
  • Full CUDA support on Blackwell GPU

NVIDIA NeMo

  • Framework for fine-tuning and customizing large language models
  • Supports supervised fine-tuning (SFT), RLHF, and other alignment techniques
  • Optimized for NVIDIA hardware

NVIDIA RAPIDS

  • GPU-accelerated data science libraries
  • Includes cuDF (DataFrames), cuML (machine learning), cuGraph (graph analytics)
  • Drop-in replacements for pandas, scikit-learn, and NetworkX

2. Inference Tools

CUDA Toolkit (v13.0)

  • CUDA compute capability: sm_121 (Blackwell on GB10) — use -DCMAKE_CUDA_ARCHITECTURES="121" when compiling
  • PyTorch CUDA wheels: cu130 (e.g., pip3 install torch --index-url https://download.pytorch.org/whl/cu130)
  • Low-level GPU compute API, compiler (nvcc), profiling and debugging tools
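As a sketch, compiling a standalone CUDA source for the GB10's compute capability looks like the following. The kernel and filenames are hypothetical; this assumes nvcc from the CUDA 13.0 toolkit is on PATH on the device itself:

```shell
# Hypothetical smoke test: compile a trivial kernel for Blackwell (sm_121).
cat > hello.cu <<'EOF'
#include <cstdio>
__global__ void hello() { printf("hello from the GB10 GPU\n"); }
int main() { hello<<<1, 1>>>(); cudaDeviceSynchronize(); return 0; }
EOF
nvcc -arch=sm_121 hello.cu -o hello
./hello
```

The `-arch=sm_121` flag is the nvcc equivalent of the CMake `-DCMAKE_CUDA_ARCHITECTURES="121"` setting noted above.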

llama.cpp

  • Quantized LLM inference engine
  • ARM-optimized builds available for GB10
  • Supports GGUF model format
  • Build with CUDA: cmake .. -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="121" (T1, build.nvidia.com/spark)
  • Provides OpenAI-compatible API via llama-server (chat completions, streaming, function calling)
  • Documented in Arm Learning Path
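Putting the documented CUDA flags into a full build-and-serve recipe, it might look like this (the model path is a placeholder; run on the GB10 itself):

```shell
# Sketch: build llama.cpp with CUDA for the GB10 and serve a GGUF model.
# Flags per the Arm Learning Path / build.nvidia.com/spark playbook.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="121"
cmake --build build --config Release -j"$(nproc)"

# llama-server exposes an OpenAI-compatible API (chat completions, streaming):
./build/bin/llama-server -m /path/to/model.gguf --host 0.0.0.0 --port 8080
```

This uses the out-of-source `cmake -B build` form rather than the in-tree `cmake ..` shown above; the flags are identical.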

TensorRT-LLM

  • NVIDIA's LLM inference optimizer — confirmed available (T1, build.nvidia.com/spark)
  • Container: tensorrt-llm/release:1.2.0rc6
  • Supports speculative decoding for faster inference:
    • EAGLE-3: Built-in drafting head, no separate draft model needed
    • Draft-Target: Pairs small (8B) and large (70B) models, uses FP4 quantization
  • Configurable KV cache memory fraction for memory management
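A minimal launch of the container above might look like this (the full NGC registry path and the mount point are assumptions; consult the build.nvidia.com/spark playbook for the exact invocation):

```shell
# Sketch: start the TensorRT-LLM release container with GPU access.
# Mount a host model directory into the container (path is illustrative).
docker run --rm -it --gpus all --ipc=host \
  -v "$HOME/models:/models" \
  nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc6
```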

Ollama

  • LLM runtime with model library — runs via Docker on GB10 (T1, build.nvidia.com/spark)
  • Container: ghcr.io/open-webui/open-webui:ollama (bundles Open WebUI + Ollama)
  • Models available from ollama.com/library (e.g., gpt-oss:20b)
  • Port: 12000 (via NVIDIA Sync) or 8080 (direct)
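For the direct (non-Sync) route, a typical launch of the bundled image follows the standard Open WebUI pattern (volume names are illustrative):

```shell
# Sketch: run the bundled Open WebUI + Ollama container; UI at http://localhost:8080
docker run -d --gpus all -p 8080:8080 \
  -v ollama:/root/.ollama -v open-webui:/app/backend/data \
  --name open-webui ghcr.io/open-webui/open-webui:ollama

# Pull a model from ollama.com/library inside the running container:
docker exec open-webui ollama pull gpt-oss:20b
```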

3. Development Environment

  • DGX Dashboard — web-based system monitor at http://localhost:11000 with integrated JupyterLab (T0 Spec). JupyterLab ports are configured in /opt/nvidia/dgx-dashboard-service/jupyterlab_ports.yaml.
  • VS Code — ARM64 .deb available; also remote SSH via NVIDIA Sync or manual SSH (T1, build.nvidia.com/spark)
  • Cursor — supported via NVIDIA Sync remote SSH launch (T1, build.nvidia.com/spark)
  • NVIDIA AI Workbench — launchable via NVIDIA Sync (T1, build.nvidia.com/spark)
  • Python — system Python with AI/ML package ecosystem
  • NVIDIA NGC Catalog — library of pre-trained models, containers, and SDKs
  • Docker + NVIDIA Container Runtime — pre-installed for containerized workflows (T0 Spec)
  • NVIDIA AI Enterprise — enterprise-grade AI software and services
  • Tutorials & Playbooks: https://build.nvidia.com/spark
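Since Docker and the NVIDIA Container Runtime come pre-installed, a quick sanity check that containers can see the GPU is to run nvidia-smi inside an NGC image confirmed for GB10 (the PyTorch container listed in this note):

```shell
# Sanity check: the NVIDIA runtime should expose the Blackwell GPU in-container.
docker run --rm --gpus all nvcr.io/nvidia/pytorch:25.11-py3 nvidia-smi
```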

Key NGC Containers (confirmed ARM64)

Container                       Tag        Use Case
------------------------------  ---------  ------------------------------
nvcr.io/nvidia/pytorch          25.11-py3  PyTorch training & fine-tuning
tensorrt-llm/release            1.2.0rc6   Optimized LLM inference
RAPIDS                          25.10      GPU-accelerated data science
ghcr.io/open-webui/open-webui   ollama     Open WebUI + Ollama LLM chat

4. Image Generation

ComfyUI

  • Node-based image generation UI for Stable Diffusion, SDXL, Flux, etc. (T1, build.nvidia.com/spark)
  • Runs natively on GB10 Blackwell GPU
  • Requires: Python 3.8+, CUDA toolkit, PyTorch with cu130
  • Port: 8188 (--listen 0.0.0.0 for remote access)
  • Storage: ~20 GB minimum (plus model files, e.g., SD 1.5 ~2 GB)
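A native install following the requirements above might be sketched as follows (the repo URL is the upstream ComfyUI project; the cu130 wheel index is the one noted in the CUDA section):

```shell
# Sketch: native ComfyUI setup on the GB10 (Python 3.8+, CUDA toolkit assumed).
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip3 install torch --index-url https://download.pytorch.org/whl/cu130
pip3 install -r requirements.txt

# --listen 0.0.0.0 exposes the UI on port 8188 for remote access:
python3 main.py --listen 0.0.0.0
```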

5. UMA Memory Management Tip

DGX Spark uses Unified Memory Architecture (UMA) — CPU and GPU share the same LPDDR5X pool. If GPU memory appears low due to filesystem buffer cache:

sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'

This frees cached memory back to the unified pool without data loss. (T1, build.nvidia.com/spark)
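Before dropping caches, it can be worth checking how much memory the buffer cache is actually holding; this read-only inspection needs no root:

```shell
# Read-only check of page/buffer-cache usage on Linux (no root required).
grep -E '^(MemTotal|MemAvailable|Buffers|Cached):' /proc/meminfo
```

If `Cached` dominates while `MemAvailable` is still large, dropping caches may be unnecessary, since the kernel reclaims cache under pressure anyway.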

6. Software Compatibility Notes

Since the GB10 is an ARM system:

  • All Python packages must have ARM64 wheels or be compilable from source
  • Most popular ML libraries (PyTorch, NumPy, etc.) have ARM64 support
  • Some niche packages may require building from source
  • x86-only binary packages will not run natively
  • FEX emulator can translate x86 binaries to ARM at a performance cost (used for Steam/Proton gaming — see ai-workloads)
  • Container images must be ARM64/aarch64 builds
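A quick architecture check before installing native packages (DGX OS on the GB10 reports aarch64; the echo messages are illustrative):

```shell
# Check the machine architecture; x86-only binaries will not run natively
# on an aarch64 host, and vice versa.
arch="$(uname -m)"
if [ "$arch" = "aarch64" ]; then
  echo "ARM64 host: use aarch64 wheels and container images"
else
  echo "Host is $arch: aarch64 GB10 binaries need emulation or a cross-build"
fi
```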

Key Relationships