| id | title | status | source_sections | related_topics | key_equations | key_terms | images | examples | open_questions |
|---|---|---|---|---|---|---|---|---|---|
| learning-and-ai | Learning & AI | established | reference/sources/github-unitree-rl-gym.md, reference/sources/github-unitree-rl-lab.md, reference/sources/github-xr-teleoperate.md, reference/sources/paper-bfm-zero.md, reference/sources/paper-gait-conditioned-rl.md | [simulation locomotion-control manipulation sdk-programming whole-body-control motion-retargeting push-recovery-balance] | [] | [gait_conditioned_rl curriculum_learning sim_to_real lerobot xr_teleoperate teleoperation] | [] | [] | [Optimal reward function design for G1 locomotion; Training time estimates for different policy types; How to fine-tune the stock locomotion policy; LLM-based task planning integration status (firmware v3.2+)] |
Learning & AI
Reinforcement learning, imitation learning, and AI-based control for the G1.
1. Reinforcement Learning
Official RL Frameworks
| Framework | Repository | Base Library | Sim Engine | G1 Support | Tier |
|---|---|---|---|---|---|
| unitree_rl_gym | unitreerobotics/unitree_rl_gym | legged_gym + rsl_rl | Isaac Gym | Yes | T0 |
| unitree_rl_lab | unitreerobotics/unitree_rl_lab | Isaac Lab | Isaac Lab | G1-29dof | T0 |
unitree_rl_gym — Complete RL Pipeline
The primary framework for training locomotion policies: [T0]
- Supported robots: Go2, H1, H1_2, G1
- Algorithm: PPO (via rsl_rl)
- Training: Parallel environments, GPU/CPU device selection, checkpoint management
- Pipeline: Train → Play → Sim2Sim (MuJoCo validation) → Sim2Real (unitree_sdk2_python)
- Deployment: Python scripts and C++ binaries with network interface configuration
unitree_rl_lab — Isaac Lab Integration
Advanced RL training on NVIDIA Isaac Lab: [T0]
- Supported robots: Go2, H1, G1-29dof
- Simulation backends: Isaac Lab (NVIDIA) and MuJoCo (cross-sim validation)
- Deployment: Simulation → Sim-to-sim → Real robot via unitree_sdk2
- Language mix: Python 65.1%, C++ 31.3%
Key RL Research on G1
| Paper | Contribution | Validated on G1? | Tier |
|---|---|---|---|
| Gait-Conditioned RL (arXiv:2505.20619) | Multi-phase curriculum, gait-specific reward routing | Yes | T1 |
| Getting-Up Policies (arXiv:2502.12152) | Two-stage fall recovery via RL | Yes | T1 |
| HoST (arXiv:2502.08378) | Multi-critic RL for diverse posture recovery | Yes | T1 |
| Fall-Safety (arXiv:2511.07407) | Unified prevention + mitigation + recovery | Yes (zero-shot) | T1 |
| Vision Locomotion (arXiv:2602.06382) | End-to-end depth-based locomotion | Yes | T1 |
| Safe Control (arXiv:2502.02858) | Projected Safe Set for collision avoidance | Yes | T1 |
2. Imitation Learning
Data Collection — Teleoperation
| System | Device | Repository | Features |
|---|---|---|---|
| XR Teleoperate | Vision Pro, PICO 4, Quest 3 | unitreerobotics/xr_teleoperate | Hand tracking, data recording |
| Kinect Teleoperate | Azure Kinect DK | unitreerobotics/kinect_teleoperate | Body tracking, safety wake-up |
Training Frameworks
| Framework | Repository | Purpose |
|---|---|---|
| unitree_IL_lerobot | unitreerobotics/unitree_IL_lerobot | Modified LeRobot for G1 dual-arm training |
| HuggingFace LeRobot | huggingface.co/docs/lerobot/en/unitree_g1 | Standard LeRobot with G1 config |
LeRobot G1 integration: Supports both the 29-DOF and 23-DOF versions and includes gr00t_wbc locomotion integration for whole-body control during manipulation tasks. [T1]
Imitation Learning Workflow
1. Teleoperate (XR/Kinect) → record episodes
2. Process data → extract observation-action pairs
3. Train policy (LeRobot / custom) → behavior cloning or diffusion policy
4. Deploy → unitree_sdk2 on real robot
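Step 2 of the workflow can be sketched as follows. This is a minimal illustration, assuming each recorded frame is a dictionary with `obs` and `action` fields. That frame layout and the names `extract_pairs` and `build_dataset` are hypothetical; they are not the recording format of xr_teleoperate or the dataset API of LeRobot.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical frame layout: {"obs": [...], "action": [...]}
Frame = Dict[str, Sequence[float]]
Pair = Tuple[List[float], List[float]]


def extract_pairs(episode: List[Frame]) -> List[Pair]:
    """Turn one recorded episode into (observation, action) pairs.

    Frames missing either field are skipped rather than filled in.
    """
    pairs: List[Pair] = []
    for frame in episode:
        if "obs" not in frame or "action" not in frame:
            continue
        pairs.append((list(frame["obs"]), list(frame["action"])))
    return pairs


def build_dataset(episodes: List[List[Frame]]) -> List[Pair]:
    """Concatenate the pairs from every episode into one training set."""
    dataset: List[Pair] = []
    for episode in episodes:
        dataset.extend(extract_pairs(episode))
    return dataset
```

The resulting list is what a behavior cloning trainer would consume in step 3: the policy is fit to predict each action from its paired observation.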
3. Policy Deployment
Deployment Options
| Method | Language | Latency | Use Case |
|---|---|---|---|
| unitree_sdk2_python | Python | Higher | Prototyping, research |
| unitree_sdk2 (C++) | C++ | Lower | Production, real-time control |
Deployment Checklist
- Validate in simulation — Run policy in unitree_mujoco or Isaac Lab
- Cross-sim validate — Test in a second simulator (Sim2Sim)
- Low-gain start — Deploy with reduced gains initially
- Tethered testing — Support robot with a safety harness for first real-world tests
- Gradual ramp-up — Increase to full gains after verifying stability
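The low-gain start and gradual ramp-up items can be sketched as a time-based gain multiplier. This is an illustrative sketch only: the linear schedule, the default starting scale of 0.3, and the choice to scale stiffness and damping by the same factor are all assumptions, not values from Unitree documentation.

```python
from typing import List, Sequence, Tuple


def gain_scale(t: float, ramp_time: float, start_scale: float = 0.3) -> float:
    """Return a gain multiplier that ramps linearly from start_scale to 1.0.

    t is the time in seconds since the policy was enabled; ramp_time is the
    duration of the ramp. Both values here are hypothetical tuning knobs.
    """
    if ramp_time <= 0.0:
        raise ValueError("ramp_time must be positive")
    if not 0.0 < start_scale <= 1.0:
        raise ValueError("start_scale must be in (0, 1]")
    frac = min(max(t / ramp_time, 0.0), 1.0)
    # Interpolate so that frac == 1.0 gives exactly 1.0.
    return start_scale * (1.0 - frac) + 1.0 * frac


def scaled_gains(
    kp: Sequence[float], kd: Sequence[float], scale: float
) -> Tuple[List[float], List[float]]:
    """Apply the same multiplier to per-joint stiffness and damping gains."""
    return [k * scale for k in kp], [k * scale for k in kd]
```

In practice the ramp would only advance while the robot remains stable on the tether; a purely time-based schedule is the simplest possible stand-in for that check.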
Safety Wrappers
When deploying custom policies, add safety layers: [T2 — Best practice]
- Joint limit clamping (see equations-and-bounds)
- Torque saturation limits
- Fall detection with emergency stop
- Velocity bounds for safe walking speeds
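The four layers above could be combined along these lines. This is a minimal sketch under stated assumptions: the `SafetyLimits` container, the function names, and the default thresholds (0.8 rad tilt, 1.0 m/s speed) are hypothetical placeholders, and real joint and torque limits should come from equations-and-bounds.

```python
from dataclasses import dataclass
from typing import List, Sequence


def clamp(x: float, lo: float, hi: float) -> float:
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


@dataclass
class SafetyLimits:
    q_min: Sequence[float]    # per-joint lower position limit (rad)
    q_max: Sequence[float]    # per-joint upper position limit (rad)
    tau_max: Sequence[float]  # per-joint absolute torque limit (N*m)
    max_tilt: float = 0.8     # assumed roll/pitch threshold for a fall (rad)
    max_speed: float = 1.0    # assumed walking speed bound (m/s)


def clamp_joints(q_cmd: Sequence[float], lim: SafetyLimits) -> List[float]:
    """Joint limit clamping on commanded positions."""
    return [clamp(q, lo, hi) for q, lo, hi in zip(q_cmd, lim.q_min, lim.q_max)]


def saturate_torques(tau_cmd: Sequence[float], lim: SafetyLimits) -> List[float]:
    """Symmetric torque saturation."""
    return [clamp(t, -m, m) for t, m in zip(tau_cmd, lim.tau_max)]


def bound_speed(v_cmd: float, lim: SafetyLimits) -> float:
    """Velocity bound on the commanded walking speed."""
    return clamp(v_cmd, -lim.max_speed, lim.max_speed)


def has_fallen(roll: float, pitch: float, lim: SafetyLimits) -> bool:
    """Fall detection from base orientation; the caller triggers the e-stop."""
    return abs(roll) > lim.max_tilt or abs(pitch) > lim.max_tilt
```

A control loop would call `has_fallen` first and stop commanding the policy output if it returns true, then pass the policy's commands through the clamping functions before sending them to the robot.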
4. Foundation Models
BFM-Zero (arXiv:2511.04131)
First behavioral foundation model for real humanoids: [T1]
- Key innovation: Promptable control without retraining (reward optimization, pose reaching, motion tracking)
- Training: Motion capture data regularization + online off-policy unsupervised RL
- Validation: Deployed on G1 hardware
- Significance: Enables flexible task specification without policy retraining
Behavior Foundation Model (arXiv:2509.13780)
- Uses masked online distillation with Conditional Variational Autoencoder (CVAE)
- Models behavioral distributions from large-scale datasets
- Tested on G1 (1.3 m, 29-DOF) [T1]
LLM Integration (Firmware v3.2+)
- Preliminary LLM integration support on EDU models [T2]
- Natural language task commands via Jetson Orin [T2]
- Status and capabilities not yet fully documented — see open questions
5. Motion Tracking Policies
RL policies trained to imitate reference motions (from mocap) while maintaining balance: [T1 — Research papers]
| Framework | Paper | Approach | G1 Validated? |
|---|---|---|---|
| BFM-Zero | arXiv:2511.04131 | Foundation model with motion tracking mode | Yes |
| H2O | arXiv:2403.01623 | Real-time human-to-humanoid tracking | Humanoid (not G1 specifically) |
| OmniH2O | arXiv:2406.08858 | Multi-modal input tracking | Humanoid |
| HumanPlus | arXiv:2406.10454 | RGB camera shadowing → imitation | Humanoid |
BFM-Zero is the most directly G1-relevant: it provides a "motion tracking" mode in which the policy receives a reference pose and tracks it while maintaining balance. It reports zero-shot generalization to unseen motions and is open-source. See motion-retargeting for the full retargeting pipeline.
Key insight: These policies learn to track the reference motion and maintain balance simultaneously. Push recovery is implicit, since the same policy handles both. Training with a perturbation curriculum further improves robustness. See push-recovery-balance.
6. Residual Policy Learning
Training a small correction policy on top of an existing base controller: [T1 — Established technique]
a_final = a_base + α * a_residual (α ∈ [0, 1] for safety scaling)
- Base policy: Stock G1 controller or a pre-trained locomotion policy
- Residual policy: Small network trained to improve specific behavior (e.g., push recovery)
- Scaling factor α: Limits maximum deviation from base behavior
Use case for G1: Enhance the stock controller's push recovery without replacing it entirely. Train the residual in simulation with a perturbation curriculum, then deploy it as an overlay on the base controller. See push-recovery-balance §3b.
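The blending rule a_final = a_base + α * a_residual can be written directly. This is a small sketch, assuming actions are flat per-joint lists of equal length; the function name `residual_action` is illustrative and does not come from any Unitree package.

```python
from typing import List, Sequence


def residual_action(
    a_base: Sequence[float], a_residual: Sequence[float], alpha: float
) -> List[float]:
    """Blend a base action with a scaled residual correction.

    alpha must lie in [0, 1]: 0 reproduces the base controller unchanged,
    1 applies the full residual.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if len(a_base) != len(a_residual):
        raise ValueError("base and residual actions must have the same length")
    return [b + alpha * r for b, r in zip(a_base, a_residual)]
```

For example, `residual_action([1.0, 2.0], [0.5, -1.0], 0.5)` yields `[1.25, 1.5]`. Setting α to 0 gives a simple fallback to the base controller if the residual misbehaves.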
7. Perturbation Curriculum
Training RL policies with progressively increasing external disturbances: [T1 — Multiple G1 papers]
Stage 1: No perturbations (learn basic locomotion)
Stage 2: Small random pushes (10-30 N, occasional)
Stage 3: Medium pushes (30-80 N, more frequent)
Stage 4: Large pushes (80-200 N) + terrain variation
Stage 5: Large pushes + concurrent upper-body task
This is the primary method for achieving the "always-on balance" goal. Papers arXiv:2505.20619 and arXiv:2511.07407 demonstrate this approach on real G1 hardware. See push-recovery-balance §3a for detailed parameters.
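The five stages could be encoded as a table that a training loop samples from. This is a sketch under stated assumptions: the force ranges come from the stage list above, but the `Stage` structure, the uniform sampling, and the promotion rule (advance once the success rate reaches a hypothetical 0.8 threshold) are illustrative choices that the cited papers may not share. Stage 5 does not say whether terrain variation continues, so it is assumed to be off here.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Stage:
    name: str
    push_range_n: Optional[Tuple[float, float]]  # push force range in newtons
    terrain_variation: bool = False
    upper_body_task: bool = False


STAGES = [
    Stage("no perturbations", None),
    Stage("small pushes", (10.0, 30.0)),
    Stage("medium pushes", (30.0, 80.0)),
    Stage("large pushes + terrain", (80.0, 200.0), terrain_variation=True),
    # Terrain setting for this stage is an assumption (not given in the list).
    Stage("large pushes + upper-body task", (80.0, 200.0), upper_body_task=True),
]


def sample_push_force(stage_index: int, rng: random.Random) -> float:
    """Draw a push magnitude (N) for the given stage; 0.0 if pushes are off."""
    push_range = STAGES[stage_index].push_range_n
    if push_range is None:
        return 0.0
    return rng.uniform(*push_range)


def next_stage(stage_index: int, success_rate: float, threshold: float = 0.8) -> int:
    """Advance one stage once the success rate reaches the assumed threshold."""
    if success_rate >= threshold:
        return min(stage_index + 1, len(STAGES) - 1)
    return stage_index
```

Push frequency ("occasional", "more frequent") is left out because the stage list gives no numbers for it; see push-recovery-balance §3a for the detailed parameters.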
Key Relationships
- Trains in: simulation (MuJoCo, Isaac Lab, Isaac Gym)
- Deploys via: sdk-programming (unitree_sdk2 DDS interface)
- Controls: locomotion-control (RL-trained gait policies)
- Controls: manipulation (learned manipulation policies)
- Data from: manipulation (teleoperation → imitation learning)
- Enables: motion-retargeting (RL-based motion tracking policies)
- Enables: push-recovery-balance (perturbation curriculum, residual policies)
- Coordinated by: whole-body-control (WBC training frameworks)