---
id: ai-frameworks
title: "AI Frameworks and Development Tools"
status: established
source_sections: "Web research: NVIDIA newsroom, Arm learning paths, NVIDIA DGX Spark User Guide, build.nvidia.com/spark playbooks"
related_topics: [dgx-os-software, gb10-superchip, ai-workloads]
key_equations: []
key_terms: [pytorch, nemo, rapids, cuda, ngc, jupyter, tensorrt, tensorrt-llm, llama-cpp, docker, nvidia-container-runtime, fex, ollama, comfyui, sm_121, cu130, speculative-decoding]
images: []
examples: []
open_questions:
  - "TensorFlow support status on ARM GB10 (official vs. community)"
  - "Full NGC catalog availability — which containers work on GB10?"
  - "vLLM or other inference server support on ARM Blackwell"
  - "JAX support status"
---

# AI Frameworks and Development Tools

The Dell Pro Max GB10 supports a broad AI software ecosystem, pre-configured through DGX OS.

## 1. Core Frameworks

### PyTorch
- Primary deep learning framework
- ARM64-native builds available
- Full CUDA support on the Blackwell GPU

### NVIDIA NeMo
- Framework for fine-tuning and customizing large language models
- Supports supervised fine-tuning (SFT), RLHF, and other alignment techniques
- Optimized for NVIDIA hardware

### NVIDIA RAPIDS
- GPU-accelerated data science libraries
- Includes cuDF (DataFrames), cuML (machine learning), cuGraph (graph analytics)
- Drop-in replacements for pandas, scikit-learn, and NetworkX

## 2. Inference Tools

### CUDA Toolkit (v13.0)
- **CUDA compute capability:** `sm_121` (Blackwell on GB10) — use `-DCMAKE_CUDA_ARCHITECTURES="121"` when compiling
- **PyTorch CUDA wheels:** `cu130` (e.g., `pip3 install torch --index-url https://download.pytorch.org/whl/cu130`)
- Low-level GPU compute API, compiler (nvcc), profiling and debugging tools

### llama.cpp
- Quantized LLM inference engine
- ARM-optimized builds available for GB10
- Supports the GGUF model format
- Build with CUDA: `cmake .. -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="121"` (T1, build.nvidia.com/spark)
- Provides an **OpenAI-compatible API** via `llama-server` (chat completions, streaming, function calling)
- Documented in the [Arm Learning Path](https://learn.arm.com/learning-paths/laptops-and-desktops/dgx_spark_llamacpp/)

### TensorRT-LLM
- NVIDIA's LLM inference optimizer — **confirmed available** (T1, build.nvidia.com/spark)
- Container: `tensorrt-llm/release:1.2.0rc6`
- Supports **speculative decoding** for faster inference:
  - **EAGLE-3:** Built-in drafting head, no separate draft model needed
  - **Draft-Target:** Pairs a small (8B) draft model with a large (70B) target model, uses FP4 quantization
- Configurable KV cache memory fraction for memory management

### Ollama
- LLM runtime with model library — runs via Docker on GB10 (T1, build.nvidia.com/spark)
- Container: `ghcr.io/open-webui/open-webui:ollama` (bundles Open WebUI + Ollama)
- Models available from ollama.com/library (e.g., `gpt-oss:20b`)
- Port: 12000 (via NVIDIA Sync) or 8080 (direct)

## 3. Development Environment

- **DGX Dashboard** — web-based system monitor at `http://localhost:11000` with integrated JupyterLab (T0 Spec). JupyterLab ports are configured in `/opt/nvidia/dgx-dashboard-service/jupyterlab_ports.yaml`.
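The `llama-server` endpoint mentioned under llama.cpp speaks the OpenAI chat-completions wire format, so any OpenAI-style client works against it. A minimal standard-library sketch (the host, port, and prompt are assumptions — adjust to your deployment; on a single-model server the `model` field does not select a model):

```python
import json
import urllib.request

LLAMA_SERVER = "http://localhost:8080"  # assumption: llama-server's direct port


def build_chat_request(prompt: str, stream: bool = False) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": "local",  # placeholder; a single-model server serves whatever it loaded
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


def chat(prompt: str) -> str:
    """POST to /v1/chat/completions and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{LLAMA_SERVER}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires a running llama-server):
#   print(chat("Summarize the GB10 in one sentence."))
```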
- **VS Code** — ARM64 .deb available; also remote SSH via NVIDIA Sync or manual SSH (T1, build.nvidia.com/spark)
- **Cursor** — supported via NVIDIA Sync remote SSH launch (T1, build.nvidia.com/spark)
- **NVIDIA AI Workbench** — launchable via NVIDIA Sync (T1, build.nvidia.com/spark)
- **Python** — system Python with the AI/ML package ecosystem
- **NVIDIA NGC Catalog** — library of pre-trained models, containers, and SDKs
- **Docker + NVIDIA Container Runtime** — pre-installed for containerized workflows (T0 Spec)
- **NVIDIA AI Enterprise** — enterprise-grade AI software and services
- **Tutorials & Playbooks:** https://build.nvidia.com/spark

### Key NGC Containers (confirmed ARM64)

| Container | Tag | Use Case |
|-----------|-----|----------|
| `nvcr.io/nvidia/pytorch` | `25.11-py3` | PyTorch training & fine-tuning |
| `tensorrt-llm/release` | `1.2.0rc6` | Optimized LLM inference |
| RAPIDS | `25.10` | GPU-accelerated data science |
| `ghcr.io/open-webui/open-webui` | `ollama` | Open WebUI + Ollama LLM chat |

## 4. Image Generation

### ComfyUI
- Node-based image generation UI for Stable Diffusion, SDXL, Flux, etc. (T1, build.nvidia.com/spark)
- Runs natively on the GB10 Blackwell GPU
- Requires: Python 3.8+, CUDA toolkit, PyTorch with `cu130`
- Port: 8188 (`--listen 0.0.0.0` for remote access)
- Storage: ~20 GB minimum (plus model files, e.g., SD 1.5 ~2 GB)

## 5. UMA Memory Management Tip

DGX Spark uses a Unified Memory Architecture (UMA) — the CPU and GPU share the same LPDDR5X pool. If GPU memory appears low due to the filesystem buffer cache, drop the caches:

```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```

This frees cached memory back to the unified pool without data loss. (T1, build.nvidia.com/spark)

## 6. Software Compatibility Notes

Since the GB10 is an ARM system:
- All Python packages must have ARM64 wheels or be compilable from source
- Most popular ML libraries (PyTorch, NumPy, etc.) have ARM64 support
- Some niche packages may require building from source
- x86-only binary packages will not run natively
- The **FEX emulator** can translate x86 binaries to ARM at a performance cost (used for Steam/Proton gaming — see [[ai-workloads]])
- Container images must be ARM64/aarch64 builds

## Key Relationships
- Runs on: [[dgx-os-software]]
- Accelerated by: [[gb10-superchip]]
- Powers: [[ai-workloads]]
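The wheel-availability point in section 6 can be sanity-checked programmatically. A standard-library sketch (function names are illustrative, not from any GB10 tooling; pip's tag matching is more elaborate than this — manylinux tags also apply — but the architecture part must match):

```python
import platform
import sysconfig


def is_arm64_python() -> bool:
    """True when the interpreter itself is an ARM64/aarch64 build.

    An x86_64-only binary wheel cannot be installed into such an
    interpreter; pip accepts only wheels whose platform tag matches
    the interpreter, or source distributions it can compile.
    """
    return platform.machine().lower() in ("aarch64", "arm64")


def native_platform_tag() -> str:
    """The platform portion of wheel tags this interpreter accepts
    natively, e.g. 'linux_aarch64' on an ARM64 Linux system."""
    return sysconfig.get_platform().replace("-", "_").replace(".", "_")


# Usage: a quick check before chasing a confusing "no matching
# distribution found" error from pip.
if is_arm64_python():
    print(f"ARM64 interpreter; native wheel tag: {native_platform_tag()}")
else:
    print(f"Non-ARM interpreter ({platform.machine()}); "
          f"x86-only wheels installed here will not run natively on the GB10.")
```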