---
id: ai-frameworks
title: "AI Frameworks and Development Tools"
status: established
source_sections: "Web research: NVIDIA newsroom, Arm learning paths, NVIDIA DGX Spark User Guide, build.nvidia.com/spark playbooks"
related_topics: [dgx-os-software, gb10-superchip, ai-workloads]
key_equations: []
key_terms: [pytorch, nemo, rapids, cuda, ngc, jupyter, tensorrt, tensorrt-llm, llama-cpp, docker, nvidia-container-runtime, fex, ollama, comfyui, sm_121, cu130, speculative-decoding]
images: []
examples: []
open_questions:
  - "TensorFlow support status on ARM GB10 (official vs. community)"
  - "Full NGC catalog availability — which containers work on GB10?"
  - "vLLM or other inference server support on ARM Blackwell"
  - "JAX support status"
---

# AI Frameworks and Development Tools

The Dell Pro Max GB10 supports a broad AI software ecosystem, pre-configured through DGX OS.

## 1. Core Frameworks

### PyTorch
- Primary deep learning framework
- ARM64-native builds available
- Full CUDA support on the Blackwell GPU

### NVIDIA NeMo
- Framework for fine-tuning and customizing large language models
- Supports supervised fine-tuning (SFT), RLHF, and other alignment techniques
- Optimized for NVIDIA hardware

### NVIDIA RAPIDS
- GPU-accelerated data science libraries
- Includes cuDF (DataFrames), cuML (machine learning), cuGraph (graph analytics)
- Drop-in replacements for pandas, scikit-learn, and NetworkX

## 2. Inference Tools

### CUDA Toolkit (v13.0)
- **CUDA compute capability:** `sm_121` (Blackwell on GB10) — use `-DCMAKE_CUDA_ARCHITECTURES="121"` when compiling
- **PyTorch CUDA wheels:** `cu130` (e.g., `pip3 install torch --index-url https://download.pytorch.org/whl/cu130`)
- Low-level GPU compute API, compiler (nvcc), profiling and debugging tools

### llama.cpp
- Quantized LLM inference engine
- ARM-optimized builds available for GB10
- Supports the GGUF model format
- Build with CUDA: `cmake .. -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="121"` (T1, build.nvidia.com/spark)
- Provides an **OpenAI-compatible API** via `llama-server` (chat completions, streaming, function calling)
- Documented in the [Arm Learning Path](https://learn.arm.com/learning-paths/laptops-and-desktops/dgx_spark_llamacpp/)

### TensorRT-LLM
- NVIDIA's LLM inference optimizer — **confirmed available** (T1, build.nvidia.com/spark)
- Container: `tensorrt-llm/release:1.2.0rc6`
- Supports **speculative decoding** for faster inference:
  - **EAGLE-3:** Built-in drafting head, no separate draft model needed
  - **Draft-Target:** Pairs a small (8B) draft model with a large (70B) target model, uses FP4 quantization
- Configurable KV cache memory fraction for memory management

### Ollama
- LLM runtime with model library — runs via Docker on GB10 (T1, build.nvidia.com/spark)
- Container: `ghcr.io/open-webui/open-webui:ollama` (bundles Open WebUI + Ollama)
- Models available from ollama.com/library (e.g., `gpt-oss:20b`)
- Port: 12000 (via NVIDIA Sync) or 8080 (direct)

## 3. Development Environment

- **DGX Dashboard** — web-based system monitor at `http://localhost:11000` with integrated JupyterLab (T0 Spec). JupyterLab ports are configured in `/opt/nvidia/dgx-dashboard-service/jupyterlab_ports.yaml`.
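The `llama-server` endpoint mentioned under llama.cpp speaks the OpenAI chat-completions wire format, so any OpenAI-style client works against it. A minimal standard-library sketch (the host, port, and prompt are assumptions — adjust to your deployment; on a single-model server the `model` field does not select a model):

```python
import json
import urllib.request

LLAMA_SERVER = "http://localhost:8080"  # assumption: llama-server's direct port


def build_chat_request(prompt: str, stream: bool = False) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": "local",  # placeholder; a single-model server serves whatever it loaded
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


def chat(prompt: str) -> str:
    """POST to /v1/chat/completions and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{LLAMA_SERVER}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires a running llama-server):
#   print(chat("Summarize the GB10 in one sentence."))
```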
- **VS Code** — ARM64 .deb available; also remote SSH via NVIDIA Sync or manual SSH (T1, build.nvidia.com/spark)
- **Cursor** — supported via NVIDIA Sync remote SSH launch (T1, build.nvidia.com/spark)
- **NVIDIA AI Workbench** — launchable via NVIDIA Sync (T1, build.nvidia.com/spark)
- **Python** — system Python with the AI/ML package ecosystem
- **NVIDIA NGC Catalog** — library of pre-trained models, containers, and SDKs
- **Docker + NVIDIA Container Runtime** — pre-installed for containerized workflows (T0 Spec)
- **NVIDIA AI Enterprise** — enterprise-grade AI software and services
- **Tutorials & Playbooks:** https://build.nvidia.com/spark

### Key NGC Containers (confirmed ARM64)

| Container | Tag | Use Case |
|-----------|-----|----------|
| `nvcr.io/nvidia/pytorch` | `25.11-py3` | PyTorch training & fine-tuning |
| `tensorrt-llm/release` | `1.2.0rc6` | Optimized LLM inference |
| RAPIDS | `25.10` | GPU-accelerated data science |
| `ghcr.io/open-webui/open-webui` | `ollama` | Open WebUI + Ollama LLM chat |

## 4. Image Generation

### ComfyUI
- Node-based image generation UI for Stable Diffusion, SDXL, Flux, etc. (T1, build.nvidia.com/spark)
- Runs natively on the GB10 Blackwell GPU
- Requires: Python 3.8+, CUDA toolkit, PyTorch with `cu130`
- Port: 8188 (`--listen 0.0.0.0` for remote access)
- Storage: ~20 GB minimum (plus model files, e.g., SD 1.5 ~2 GB)

## 5. UMA Memory Management Tip

DGX Spark uses a Unified Memory Architecture (UMA) — the CPU and GPU share the same LPDDR5X pool. If GPU memory appears low due to the filesystem buffer cache, drop the caches:

```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```

This frees cached memory back to the unified pool without data loss. (T1, build.nvidia.com/spark)

## 6. Software Compatibility Notes

Since the GB10 is an ARM system:
- All Python packages must have ARM64 wheels or be compilable from source
- Most popular ML libraries (PyTorch, NumPy, etc.) have ARM64 support
- Some niche packages may require building from source
- x86-only binary packages will not run natively
- The **FEX emulator** can translate x86 binaries to ARM at a performance cost (used for Steam/Proton gaming — see [[ai-workloads]])
- Container images must be ARM64/aarch64 builds

## Key Relationships
- Runs on: [[dgx-os-software]]
- Accelerated by: [[gb10-superchip]]
- Powers: [[ai-workloads]]
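The wheel-availability point in section 6 can be sanity-checked programmatically. A standard-library sketch (function names are illustrative, not from any GB10 tooling; pip's tag matching is more elaborate than this — manylinux tags also apply — but the architecture part must match):

```python
import platform
import sysconfig


def is_arm64_python() -> bool:
    """True when the interpreter itself is an ARM64/aarch64 build.

    An x86_64-only binary wheel cannot be installed into such an
    interpreter; pip accepts only wheels whose platform tag matches
    the interpreter, or source distributions it can compile.
    """
    return platform.machine().lower() in ("aarch64", "arm64")


def native_platform_tag() -> str:
    """The platform portion of wheel tags this interpreter accepts
    natively, e.g. 'linux_aarch64' on an ARM64 Linux system."""
    return sysconfig.get_platform().replace("-", "_").replace(".", "_")


# Usage: a quick check before chasing a confusing "no matching
# distribution found" error from pip.
if is_arm64_python():
    print(f"ARM64 interpreter; native wheel tag: {native_platform_tag()}")
else:
    print(f"Non-ARM interpreter ({platform.machine()}); "
          f"x86-only wheels installed here will not run natively on the GB10.")
```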