
id	title	status	source_sections	related_topics	key_equations	key_terms	images	examples	open_questions
learning-and-ai	Learning & AI	established	reference/sources/github-unitree-rl-gym.md, reference/sources/github-unitree-rl-lab.md, reference/sources/github-xr-teleoperate.md, reference/sources/paper-bfm-zero.md, reference/sources/paper-gait-conditioned-rl.md	[simulation locomotion-control manipulation sdk-programming whole-body-control motion-retargeting push-recovery-balance]	[]	[gait_conditioned_rl curriculum_learning sim_to_real lerobot xr_teleoperate teleoperation]	[]	[]	[Optimal reward function design for G1 locomotion Training time estimates for different policy types How to fine-tune the stock locomotion policy LLM-based task planning integration status (firmware v3.2+)]

Learning & AI

Reinforcement learning, imitation learning, and AI-based control for the G1.

1. Reinforcement Learning

Official RL Frameworks

Framework	Repository	Base Library	Sim Engine	G1 Support	Tier
unitree_rl_gym	unitreerobotics/unitree_rl_gym	legged_gym + rsl_rl	Isaac Gym	Yes	T0
unitree_rl_lab	unitreerobotics/unitree_rl_lab	Isaac Lab	Isaac Lab	G1-29dof	T0

unitree_rl_gym — Complete RL Pipeline

The primary framework for training locomotion policies: [T0]

Supported robots: Go2, H1, H1_2, G1
Algorithm: PPO (via rsl_rl)
Training: Parallel environments, GPU/CPU device selection, checkpoint management
Pipeline: Train → Play → Sim2Sim (MuJoCo validation) → Sim2Real (unitree_sdk2_python)
Deployment: Python scripts and C++ binaries with network interface configuration

unitree_rl_lab — Isaac Lab Integration

Advanced RL training on NVIDIA Isaac Lab: [T0]

Supported robots: Go2, H1, G1-29dof
Simulation backends: Isaac Lab (NVIDIA) and MuJoCo (cross-sim validation)
Deployment: Simulation → Sim-to-sim → Real robot via unitree_sdk2
Language mix: Python 65.1%, C++ 31.3%

Key RL Research on G1

Paper	Contribution	Validated on G1?	Tier
Gait-Conditioned RL (arXiv:2505.20619)	Multi-phase curriculum, gait-specific reward routing	Yes	T1
Getting-Up Policies (arXiv:2502.12152)	Two-stage fall recovery via RL	Yes	T1
HoST (arXiv:2502.08378)	Multi-critic RL for diverse posture recovery	Yes	T1
Fall-Safety (arXiv:2511.07407)	Unified prevention + mitigation + recovery	Yes (zero-shot)	T1
Vision Locomotion (arXiv:2602.06382)	End-to-end depth-based locomotion	Yes	T1
Safe Control (arXiv:2502.02858)	Projected Safe Set for collision avoidance	Yes	T1
ASAP (sim-to-real correction)	Aligning Simulation and Real-World Physics: residual network corrects sim-trained policy using real-world data. 52.7% tracking error reduction on G1.	Yes	T1

WBC-AGILE — Open-Source Training Framework [T1]

NVIDIA's WBC-AGILE (nvidia-isaac/WBC-AGILE) provides the training framework for GR00T-WBC policies:

Repository: nvidia-isaac/WBC-AGILE (GitHub)
Purpose: Train locomotion + WBC policies for G1 and other humanoids
Framework: Isaac Lab + RSL-RL (PPO)
G1 support: Built-in G1 configuration
Deployment: Exports to ONNX, drop-in replacement for GR00T-WBC pre-trained policies
Use cases: Retraining with corrected dynamics, fine-tuning PD gains, adding push recovery curriculum
GB10 compatible: Isaac Sim/Lab officially supported on GB10/DGX Spark

Note: The GR00T-WBC repository is inference-only — it does NOT contain training code. WBC-AGILE is the separate training framework.

2. Imitation Learning

Data Collection — Teleoperation

System	Device	Repository	Features
XR Teleoperate	Vision Pro, PICO 4, Quest 3	unitreerobotics/xr_teleoperate	Hand tracking, data recording
Kinect Teleoperate	Azure Kinect DK	unitreerobotics/kinect_teleoperate	Body tracking, safety wake-up

Training Frameworks

Framework	Repository	Purpose
unitree_IL_lerobot	unitreerobotics/unitree_IL_lerobot	Modified LeRobot for G1 dual-arm training
HuggingFace LeRobot	huggingface.co/docs/lerobot/en/unitree_g1	Standard LeRobot with G1 config

LeRobot G1 integration: Supports both the 29-DOF and 23-DOF versions and includes gr00t_wbc locomotion integration for whole-body control during manipulation tasks. [T1]

Imitation Learning Workflow

1. Teleoperate (XR/Kinect) → record episodes
2. Process data → extract observation-action pairs
3. Train policy (LeRobot / custom) → behavior cloning or diffusion policy
4. Deploy → unitree_sdk2 on real robot
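Steps 2 and 3 can be sketched in a few lines. This is a minimal illustration, not the LeRobot data format: the episode record layout (`"obs"` / `"action"` keys) and the helper names `extract_pairs` and `bc_loss` are assumptions made for this example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical record layout: one dict per recorded timestep.
Step = Dict[str, Sequence[float]]
Pair = Tuple[Tuple[float, ...], Tuple[float, ...]]


def extract_pairs(episodes: List[List[Step]]) -> List[Pair]:
    """Flatten recorded episodes into (observation, action) training pairs."""
    pairs: List[Pair] = []
    for episode in episodes:
        for step in episode:
            pairs.append((tuple(step["obs"]), tuple(step["action"])))
    return pairs


def bc_loss(policy: Callable[[Sequence[float]], Sequence[float]],
            pairs: List[Pair]) -> float:
    """Behavior-cloning objective: mean squared error between the
    policy's output and the demonstrated action, over all action dims."""
    total, count = 0.0, 0
    for obs, action in pairs:
        predicted = policy(obs)
        total += sum((p - a) ** 2 for p, a in zip(predicted, action))
        count += len(action)
    return total / count
```

A behavior-cloning trainer minimizes `bc_loss` over the policy parameters; a diffusion policy replaces this objective with a denoising loss but consumes the same pairs.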

3. Policy Deployment

Deployment Options

Method	Language	Latency	Use Case
unitree_sdk2_python	Python	Higher	Prototyping, research
unitree_sdk2 (C++)	C++	Lower	Production, real-time control

Deployment Checklist

Validate in simulation — Run policy in unitree_mujoco or Isaac Lab
Cross-sim validate — Test in a second simulator (Sim2Sim)
Low-gain start — Deploy with reduced gains initially
Tethered testing — Support robot with a safety harness for first real-world tests
Gradual ramp-up — Increase to full gains after verifying stability
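The low-gain start and gradual ramp-up can be expressed as a time-based scale factor on the nominal PD gains. The sketch below is only an illustration: the hold time, ramp time, and 30% starting scale are placeholder assumptions, and `gain_scale` / `scaled_gains` are hypothetical helpers rather than unitree_sdk2 functions.

```python
from typing import List, Sequence, Tuple


def gain_scale(t: float, hold_s: float = 5.0, ramp_s: float = 10.0,
               start: float = 0.3) -> float:
    """Gain multiplier at t seconds after the policy is enabled.

    Holds at `start` for `hold_s`, then ramps linearly to 1.0 over `ramp_s`.
    All default values are placeholders, not recommended settings.
    """
    if t <= hold_s:
        return start
    fraction = min((t - hold_s) / ramp_s, 1.0)
    return start + (1.0 - start) * fraction


def scaled_gains(kp: Sequence[float], kd: Sequence[float],
                 scale: float) -> Tuple[List[float], List[float]]:
    """Apply the same multiplier to stiffness and damping gains."""
    return [k * scale for k in kp], [k * scale for k in kd]
```

Scaling kp and kd by the same factor is the simplest choice but lowers the damping ratio at reduced gains; keeping kd at its nominal value during the ramp is a reasonable alternative.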

Safety Wrappers

When deploying custom policies, add safety layers: [T2 — Best practice]

Joint limit clamping (see equations-and-bounds)
Torque saturation limits
Fall detection with emergency stop
Velocity bounds for safe walking speeds
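A minimal sketch of how these four layers might wrap a policy's output is shown below. The limit values are placeholders (real bounds belong in equations-and-bounds), the fall test assumes a unit gravity vector expressed in the body frame, and all names are hypothetical rather than part of unitree_sdk2.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class SafetyLimits:
    # Placeholder values; take real ones from the robot's documented bounds.
    q_min: Sequence[float]       # lower joint position limits (rad)
    q_max: Sequence[float]       # upper joint position limits (rad)
    tau_max: Sequence[float]     # per-joint torque magnitude limits (N*m)
    max_tilt_rad: float = 0.8    # tilt angle treated as a fall
    max_speed: float = 0.8       # forward speed command bound (m/s)


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def clamp_joint_targets(q_cmd: Sequence[float], lim: SafetyLimits) -> List[float]:
    return [clamp(q, lo, hi) for q, lo, hi in zip(q_cmd, lim.q_min, lim.q_max)]


def saturate_torques(tau_cmd: Sequence[float], lim: SafetyLimits) -> List[float]:
    return [clamp(t, -m, m) for t, m in zip(tau_cmd, lim.tau_max)]


def has_fallen(gravity_body: Sequence[float], lim: SafetyLimits) -> bool:
    """gravity_body is the unit gravity vector in the body frame,
    (0, 0, -1) when the torso is upright."""
    cos_tilt = clamp(-gravity_body[2], -1.0, 1.0)
    return math.acos(cos_tilt) > lim.max_tilt_rad


def safe_step(q_cmd: Sequence[float], vx_cmd: float,
              gravity_body: Sequence[float], lim: SafetyLimits
              ) -> Tuple[Optional[List[float]], float, bool]:
    """Return (joint targets, speed command, estop flag) after all checks."""
    if has_fallen(gravity_body, lim):
        return None, 0.0, True   # caller decides what the e-stop does
    speed = clamp(vx_cmd, -lim.max_speed, lim.max_speed)
    return clamp_joint_targets(q_cmd, lim), speed, False
```

`saturate_torques` is kept separate because it only applies when the policy commands torques directly instead of joint position targets.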

4. Foundation Models

BFM-Zero (arXiv:2511.04131)

First behavioral foundation model for real humanoids: [T1]

Key innovation: Promptable control without retraining (reward optimization, pose reaching, motion tracking)
Training: Motion capture data regularization + online off-policy unsupervised RL
Validation: Deployed on G1 hardware
Significance: Enables flexible task specification without policy retraining

Behavior Foundation Model (arXiv:2509.13780)

Uses masked online distillation with Conditional Variational Autoencoder (CVAE)
Models behavioral distributions from large-scale datasets
Tested on G1 (1.3 m, 29-DOF) [T1]

LLM Integration (Firmware v3.2+)

Preliminary LLM integration support on EDU models [T2]
Natural language task commands via Jetson Orin [T2]
Status and capabilities not yet fully documented — see open questions

5. Motion Tracking Policies

RL policies trained to imitate reference motions (from mocap) while maintaining balance: [T1 — Research papers]

Framework	Paper	Approach	G1 Validated?
BFM-Zero	arXiv:2511.04131	Foundation model with motion tracking mode	Yes
H2O	arXiv:2403.01623	Real-time human-to-humanoid tracking	Humanoid (not G1 specifically)
OmniH2O	arXiv:2406.08858	Multi-modal input tracking	Humanoid
HumanPlus	arXiv:2406.10454	RGB-camera shadowing → imitation	Humanoid

BFM-Zero is the most directly G1-relevant: it provides a "motion tracking" mode in which the policy receives a reference pose and tracks it while maintaining balance. It generalizes zero-shot to unseen motions and is open-source. See motion-retargeting for the full retargeting pipeline.

Key insight: These policies learn to track the reference motion and maintain balance at the same time, so push recovery is implicit: the same policy handles both. Training with a perturbation curriculum further improves robustness. See push-recovery-balance.

6. Residual Policy Learning

Training a small correction policy on top of an existing base controller: [T1 — Established technique]

a_final = a_base + α * a_residual     (α ∈ [0, 1] for safety scaling)

Base policy: Stock G1 controller or a pre-trained locomotion policy
Residual policy: Small network trained to improve specific behavior (e.g., push recovery)
Scaling factor α: Limits maximum deviation from base behavior

Use case for G1: Enhance the stock controller's push recovery without replacing it entirely. Train the residual in simulation with a perturbation curriculum, then deploy it as an overlay. See push-recovery-balance §3b.
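The blending rule is small enough to write out directly. The sketch below assumes both policies emit actions of the same length; `blend_actions` is a hypothetical helper name.

```python
from typing import List, Sequence


def blend_actions(a_base: Sequence[float], a_residual: Sequence[float],
                  alpha: float) -> List[float]:
    """Compute a_final = a_base + alpha * a_residual elementwise."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(a_base) != len(a_residual):
        raise ValueError("base and residual actions must have equal length")
    return [b + alpha * r for b, r in zip(a_base, a_residual)]
```

With α = 0 the output is exactly the base action, which gives a safe fallback at deployment. If the residual network's output is itself bounded by some r_max per joint, the deviation from the base action is at most α · r_max.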

7. Perturbation Curriculum

Training RL policies with progressively increasing external disturbances: [T1 — Multiple G1 papers]

Stage 1: No perturbations (learn basic locomotion)
Stage 2: Small random pushes (10-30 N, occasional)
Stage 3: Medium pushes (30-80 N, more frequent)
Stage 4: Large pushes (80-200 N) + terrain variation
Stage 5: Large pushes + concurrent upper-body task

This is the primary method for achieving the "always-on balance" goal. Papers arXiv:2505.20619 and arXiv:2511.07407 demonstrate this approach on real G1 hardware. See push-recovery-balance §3a for detailed parameters.
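The stage list above maps naturally onto a small lookup table plus a sampler. In this sketch the force ranges come from the list, Stage 5 is assumed to reuse the Stage 4 "large" range, and the promotion rule (advance at a 90% success rate) is a placeholder assumption rather than a value from the cited papers.

```python
import math
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PushStage:
    label: str
    force_n: Tuple[float, float]  # (min, max) push magnitude in newtons


# Stage 5 reuses the Stage 4 range (assumption: "large" means 80-200 N).
STAGES = (
    PushStage("no perturbations", (0.0, 0.0)),
    PushStage("small pushes", (10.0, 30.0)),
    PushStage("medium pushes", (30.0, 80.0)),
    PushStage("large pushes + terrain variation", (80.0, 200.0)),
    PushStage("large pushes + upper-body task", (80.0, 200.0)),
)


def sample_push(stage: int, rng: random.Random) -> Tuple[float, float]:
    """Sample a horizontal push (fx, fy) in newtons for a 1-based stage."""
    lo, hi = STAGES[stage - 1].force_n
    magnitude = rng.uniform(lo, hi)
    heading = rng.uniform(0.0, 2.0 * math.pi)
    return magnitude * math.cos(heading), magnitude * math.sin(heading)


def next_stage(stage: int, success_rate: float, threshold: float = 0.9) -> int:
    """Advance one stage once the hypothetical success threshold is met."""
    if success_rate >= threshold and stage < len(STAGES):
        return stage + 1
    return stage
```

Push frequency ("occasional", "more frequent") is not quantified in the list, so it is left out of the table rather than guessed.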

8. MuJoCo Playground Training Pipeline — Verified (2026-02-15) [T1]

GPU-parallelized RL training for G1 locomotion using MuJoCo Playground (Google DeepMind) on the Dell Pro Max GB10.

Setup

Framework: MuJoCo Playground (playground package from GitHub, not PyPI)
Environment: G1JoystickFlatTerrain (29-DOF, 103-dim obs, velocity tracking with phase-based gait)
Training: Brax PPO, JAX + CUDA 12 on Blackwell GPU, 8192 parallel MJX environments
Throughput: ~17K steps/sec on GB10 Blackwell

G1 Environment Details (from source inspection)

Observation (103 dims): linvel(3) + gyro(3) + gravity(3) + command(3) + joint_pos-default(29) + joint_vel(29) + last_act(29) + phase(4)
Privileged state (165 dims): state(103) + clean sensors + actuator force + contact + feet velocity
Actions: 29 joint position targets (all DOF), residual from default pose, scaled by 0.25
Control rate: 50 Hz (0.02s ctrl_dt), physics at 500 Hz (0.002s sim_dt)
Push perturbations: Enabled by default (0.1-2.0 m/s velocity impulse, every 5-10 s)
23 reward terms including velocity tracking, gait phase, orientation, foot slip, joint deviation
Domain randomization: Friction (0.4-1.0), mass (±10%), torso mass offset (±1 kg), armature (1.0-1.05x)
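The numbers above can be cross-checked with a few lines of arithmetic. The sketch below rebuilds the observation width from the listed terms, derives the number of physics substeps per control step, and applies the residual action mapping. The term order follows the list above and may not match the environment's internal ordering; the names are illustrative, not MuJoCo Playground identifiers.

```python
from typing import Dict, List, Sequence, Tuple

# Observation terms in the order listed above, with their widths.
OBS_LAYOUT = (
    ("linvel", 3), ("gyro", 3), ("gravity", 3), ("command", 3),
    ("joint_pos_minus_default", 29), ("joint_vel", 29),
    ("last_act", 29), ("phase", 4),
)
OBS_DIM = sum(width for _, width in OBS_LAYOUT)

CTRL_DT = 0.02    # control period (s), 50 Hz
SIM_DT = 0.002    # physics period (s), 500 Hz
SUBSTEPS = round(CTRL_DT / SIM_DT)   # physics steps per policy step
ACTION_SCALE = 0.25


def obs_slices() -> Dict[str, Tuple[int, int]]:
    """Map each observation term to its (start, stop) index range."""
    slices: Dict[str, Tuple[int, int]] = {}
    start = 0
    for name, width in OBS_LAYOUT:
        slices[name] = (start, start + width)
        start += width
    return slices


def joint_targets(default_pose: Sequence[float],
                  action: Sequence[float]) -> List[float]:
    """Position targets as a scaled residual around the default pose."""
    if len(default_pose) != len(action):
        raise ValueError("pose and action must have equal length")
    return [q0 + ACTION_SCALE * a for q0, a in zip(default_pose, action)]
```

The widths sum to 103, matching the stated observation size, and each policy action is held for 10 physics steps.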

Also Available

G1JoystickRoughTerrain — same env with procedural terrain
H1 gait tracking environments — reference pattern for extending G1 with tracking rewards
No existing whole-body tracking env for G1 (only H1 and Spot have gait tracking variants)

Training Results (locomotion-only baseline)

5M steps (tiny): 6 min 41 sec, reward -6.4 → -2.8
200M steps (full): reward progression -6.4 → +8.8 at 117M steps (training in progress)
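These figures allow a rough wall-clock estimate for the full run. The calculation below uses only the numbers quoted in this section; treating the tiny run's end-to-end rate as a lower bound (because it likely includes startup and compilation overhead) is an assumption.

```python
def steps_per_second(steps: int, seconds: float) -> float:
    return steps / seconds


def hours_for(steps: int, rate: float) -> float:
    """Wall-clock hours to run `steps` at `rate` steps per second."""
    return steps / rate / 3600.0


TINY_STEPS = 5_000_000
TINY_SECONDS = 6 * 60 + 41      # 6 min 41 sec
FULL_STEPS = 200_000_000
QUOTED_RATE = 17_000            # ~17K steps/sec from the Setup list

tiny_rate = steps_per_second(TINY_STEPS, TINY_SECONDS)
print(f"tiny run: {tiny_rate:,.0f} steps/s end to end")
print(f"full run at quoted rate: {hours_for(FULL_STEPS, QUOTED_RATE):.1f} h")
print(f"full run at tiny-run rate: {hours_for(FULL_STEPS, tiny_rate):.1f} h")
```

The tiny run works out to roughly 12.5K steps/s end to end, so the 200M-step run should take somewhere between about 3.3 h (at the quoted ~17K steps/s) and about 4.5 h (at the tiny-run rate).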

Planned: Unified Whole-Body Control Training

Research direction: fork G1JoystickFlatTerrain to add upper-body pose tracking for telepresence (Apple Vision Pro mocap + joystick locomotion). The approach follows the ExBody/ExBody2 paradigm: decouple velocity tracking (lower body) from keypoint tracking (upper body) and train with a 4-stage curriculum for ~400M steps. See plans/eager-shimmying-raccoon.md for the full plan.

Key Open-Source Repos for G1 Whole-Body RL

Repo	Approach	G1 Validated?
MuJoCo Playground	GPU-parallelized MJX training, native G1 env	Yes [T1]
BFM-Zero (LeCAR-Lab)	Foundation model, motion tracking mode	Yes [T1]
BeyondMimic (HybridRobotics)	Whole-body tracking from LAFAN1	Yes (claimed)
H2O / OmniH2O (LeCAR-Lab)	Real-time teleoperation	Humanoid (not G1-specific)
ExBody2 (UC San Diego)	Expressive whole-body with velocity decoupling	Humanoid

Key Relationships

Trains in: simulation (MuJoCo, Isaac Lab, Isaac Gym)
Deploys via: sdk-programming (unitree_sdk2 DDS interface)
Controls: locomotion-control (RL-trained gait policies)
Controls: manipulation (learned manipulation policies)
Data from: manipulation (teleoperation → imitation learning)
Enables: motion-retargeting (RL-based motion tracking policies)
Enables: push-recovery-balance (perturbation curriculum, residual policies)
Coordinated by: whole-body-control (WBC training frameworks)
