---
id: locomotion-control
title: "Locomotion & Balance Control"
status: established
source_sections: "reference/sources/paper-gait-conditioned-rl.md, reference/sources/paper-getting-up-policies.md, reference/sources/official-product-page.md"
related_topics: [joint-configuration, sensors-perception, equations-and-bounds, learning-and-ai, safety-limits, whole-body-control, push-recovery-balance, motion-retargeting]
key_equations: [zmp, com, inverse_dynamics]
key_terms: [gait, state_estimation, gait_conditioned_rl, curriculum_learning, sim_to_real]
images: []
examples: []
open_questions:
  - "Exact RL policy observation/action space dimensions"
  - "How to replace the stock locomotion policy with a custom one"
  - "Stair climbing capability and limits"
  - "Running gait availability (H1-2 can run at 3.3 m/s — can G1?)"
---

# Locomotion & Balance Control

Walking, balance, gait generation, and whole-body control for bipedal locomotion.

## 1. Control Architecture

The G1 uses a reinforcement-learning-based locomotion controller running on the proprietary locomotion computer. Users interact with it via high-level commands; low-level balance and gait control is handled internally.
[T1 — Confirmed from RL papers and developer docs]

```
User Commands (high-level API)
            │
            ▼
┌─────────────────────────┐
│  Locomotion Computer    │  (192.168.123.161, proprietary)
│                         │
│  RL Policy (gait-       │ ← IMU, joint encoders (500 Hz)
│  conditioned, multi-    │
│  phase curriculum)      │
│                         │
│  Motor Commands ────────┼──→ Joint Actuators
└─────────────────────────┘
```

### Key Architecture Details

- **Framework:** Gait-conditioned reinforcement learning with multi-phase curriculum (arXiv:2505.20619) [T1]
- **Gait switching:** One-hot gait ID enables dynamic switching between gaits [T1]
- **Reward design:** Gait-specific reward routing mechanism with biomechanically inspired shaping [T1]
- **Training:** Policies trained in simulation (Isaac Gym / MuJoCo), transferred to physical hardware [T1]
- **Biomechanical features:** Straight-knee stance promotion, coordinated arm-leg swing, natural motion without motion capture data [T1]

## 2. Gait Modes

| Mode | Description | Verified | Tier |
|---|---|---|---|
| Standing | Static balance, both feet grounded | Yes | T1 |
| Walking | Dynamic bipedal walking | Yes | T1 |
| Walk-to-stand | Smooth transition from walking to standing | Yes | T1 |
| Stand-to-walk | Smooth transition from standing to walking | Yes | T1 |

[T1 — Validated in arXiv:2505.20619 on real G1 hardware]

## 3. Performance

| Metric | Value | Notes | Tier |
|---|---|---|---|
| Maximum walking speed | 2.0 m/s | 7.2 km/h | T0 |
| Verified terrain | Tile, concrete, carpet | Office-environment surfaces | T1 |
| Balance recovery | Light push recovery | Stable recovery from perturbations | T1 |
| Gait transition | Smooth | No abrupt mode switches | T1 |

For comparison, the H1-2 (larger Unitree humanoid) achieves 3.3 m/s running. Whether the G1 has a running gait is unconfirmed. [T3]
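The one-hot gait conditioning described above can be sketched as a small observation-assembly helper. Everything here — the gait ordering, field names, and dimensions — is a hypothetical illustration; the real policy's observation layout is unpublished (see open questions).

```python
import numpy as np

# Hypothetical gait set, mirroring the modes in §2; the stock policy's actual
# gait IDs and ordering are not publicly documented.
GAITS = ["stand", "walk", "walk_to_stand", "stand_to_walk"]

def gait_one_hot(gait: str) -> np.ndarray:
    """One-hot gait ID, the mechanism gait-conditioned RL uses for switching."""
    vec = np.zeros(len(GAITS), dtype=np.float32)
    vec[GAITS.index(gait)] = 1.0
    return vec

def build_observation(imu_rpy, ang_vel, joint_pos, joint_vel, command, gait):
    """Concatenate proprioception, user command, and gait ID into one policy input."""
    return np.concatenate([
        np.asarray(imu_rpy, dtype=np.float32),    # base orientation (roll, pitch, yaw)
        np.asarray(ang_vel, dtype=np.float32),    # base angular velocity
        np.asarray(joint_pos, dtype=np.float32),  # joint encoder positions
        np.asarray(joint_vel, dtype=np.float32),  # joint encoder velocities
        np.asarray(command, dtype=np.float32),    # (vx, vy, yaw_rate) user command
        gait_one_hot(gait),                       # one-hot gait ID
    ])
```

Dynamic gait switching then amounts to swapping the one-hot block between policy steps while the rest of the observation keeps streaming.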
## 4. Balance Control

The RL-based locomotion policy implicitly handles balance through learned behavior rather than explicit ZMP or capture-point controllers: [T1]

- **Inputs:** IMU data (orientation, angular velocity), joint encoder feedback (position, velocity), gait command
- **Outputs:** Target joint positions/torques for all leg joints
- **Rate:** 500 Hz control loop
- **Learned behaviors:** Center-of-mass tracking, foot placement, push recovery, arm counterbalancing

While classical bipedal control uses explicit ZMP constraints (see [[equations-and-bounds]]), the G1's RL policy learns these constraints implicitly during training. For deep coverage of enhanced push recovery, perturbation training, and always-on balance architectures, see [[push-recovery-balance]].

## 5. Fall Recovery

Multiple research approaches have been validated on the G1: [T1 — Research papers]

- **Two-stage RL:** Supine and prone recovery policies (arXiv:2502.12152) — overcome limitations of hand-crafted controllers
- **HoST framework:** Multi-critic RL with curriculum training for diverse posture recovery (arXiv:2502.08378)
- **Unified fall-safety:** Combined fall prevention + impact mitigation + recovery from sparse demonstrations (arXiv:2511.07407) — zero-shot sim-to-real transfer

## 6. Terrain Adaptation

| Terrain Type | Status | Notes | Tier |
|---|---|---|---|
| Flat tile | Verified | Standard office floor | T1 |
| Concrete | Verified | Indoor/outdoor flat surfaces | T1 |
| Carpet | Verified | Standard office carpet | T1 |
| Stairs | Unconfirmed | Research papers suggest capability | T4 |
| Rough terrain | Sim only | Trained in sim, real-world unconfirmed | T3 |
| Slopes | Unconfirmed | — | T4 |
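The target-joint-position outputs listed in §4 are typically executed by a joint-level PD law at the motor drivers. This is the standard pattern for RL position targets on torque-controlled actuators, not a confirmed detail of the G1 firmware, and the gains below are illustrative placeholders:

```python
import numpy as np

def pd_torque(q_target, q, dq, kp=60.0, kd=1.5):
    """Joint-level PD execution of a position target:
        tau = kp * (q_target - q) - kd * dq
    kp/kd here are illustrative values, not the G1's actual motor gains."""
    q_target = np.asarray(q_target, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    dq = np.asarray(dq, dtype=np.float64)
    return kp * (q_target - q) - kd * dq
```

The policy runs at 500 Hz while the PD loop tracks whatever target was most recently commanded, which is one reason a stalled or slow custom policy (see §9) degrades so quickly.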
## 7. User Control Interface

Users control locomotion through the high-level sport mode API: [T0]

- **Velocity commands:** Set forward/lateral velocity and yaw rate
- **Posture commands:** Stand, sit, lie down
- **Attitude adjustment:** Modify body orientation
- **Trajectory tracking:** Follow waypoint sequences

Low-level joint control is also possible (bypassing the locomotion controller) but requires the user to implement their own balance control. This is advanced and carries significant fall risk. [T2]

## 8. Locomotion Computer Internals

The locomotion computer is a **Rockchip RK3588** (8-core ARM Cortex-A76/A55, 8 GB LPDDR4X, 32 GB eMMC) running Linux kernel 5.10.176-rt86+ (real-time patched). [T1 — Security research papers arXiv:2509.14096, arXiv:2509.14139]

### Software Architecture

A centralized `master_service` orchestrator (9.2 MB binary) supervises **26 daemons**: [T1]

| Daemon | Role | Resource Usage |
|---|---|---|
| `ai_sport` | Primary locomotion/balance policy | 145% CPU, 135 MB RAM |
| `state_estimator` | IMU + encoder fusion | ~30% CPU |
| `motion_switcher` | Gait mode management | — |
| `robot_state_service` | State broadcasting | — |
| `dex3_service_l/r` | Left/right hand control | — |
| `webrtc_bridge` | Video streaming | — |
| `ros_bridge` | ROS2 interface | — |
| Others | OTA, BLE, WiFi, telemetry, etc. | — |

The `ai_sport` daemon is the stock RL policy. When you enter debug mode (L2+R2), this daemon is shut down, allowing direct motor control via `rt/lowcmd`.

Configuration files use proprietary **FMX encryption** (Blowfish-ECB + LCG stream cipher with static keys). This has been partially reverse-engineered by security researchers but not fully cracked. [T1]

### Can You Access the Locomotion Computer?
**Root access is technically possible** via known BLE exploits (UniPwn, FreeBOT jailbreak), but **no one has publicly documented deploying a custom policy to it**: [T1]

| Method | Status | Notes |
|---|---|---|
| SSH from network | Blocked | No SSH server exposed by default |
| FreeBOT jailbreak (app WiFi field injection) | Works on firmware ≤1.6.0 | Patched Oct 2025 |
| UniPwn BLE exploit (Bin4ry/UniPwn on GitHub) | Works on unpatched firmware | Hardcoded AES keys + command injection |
| RockUSB physical flash | Blocked by SecureBoot on G1 | Works on Go2 only |
| Replacing `ai_sport` binary after root | **Not documented** | Nobody has published doing this |
| Extracting stock policy weights | **Not documented** | Binary analysis not published |

**Bottom line:** Getting root on the RK3588 is solved. Getting a custom locomotion policy running natively on it is not — the `master_service` orchestrator, FMX encryption, and lack of documentation are barriers nobody has publicly overcome. [T1]

### How Every Research Group Actually Deploys

All published research (BFM-Zero, gait-conditioned RL, fall recovery, etc.) uses the same approach: [T1]

1. Enter debug mode (L2+R2) — shuts down `ai_sport`
2. Run the custom policy on the **Jetson Orin NX** or an external computer
3. Read `rt/lowstate`, compute actions, publish `rt/lowcmd` via DDS
4. Motor commands travel over the internal DDS network to the RK3588, which passes them to the motor drivers

This works but has inherent limitations:

- DDS network latency (~2 ms round trip) vs. native on-board execution
- No access to the RK3588's real-time Linux kernel guarantees
- Policy frequency limited by DDS throughput and compute (typically 200-500 Hz from a Jetson)
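The read-compute-publish cycle above can be sketched as a fixed-rate loop. The transport is deliberately abstracted behind injected callbacks: `read_lowstate`, `publish_lowcmd`, and `policy_step` are hypothetical names for illustration, and on real hardware the callbacks would wrap DDS subscribe/publish on `rt/lowstate` and `rt/lowcmd`. The sleep-based pacing is a crude stand-in for a real-time timer.

```python
import time

CONTROL_HZ = 500          # matches the 500 Hz loop cited above
DT = 1.0 / CONTROL_HZ

def policy_step(lowstate):
    """Placeholder for the custom policy: map state -> joint targets.
    This one just echoes measured positions (a 'hold pose' policy)."""
    return list(lowstate["q"])

def run_loop(read_lowstate, publish_lowcmd, steps):
    """Fixed-rate control loop: read state, compute action, publish command.
    I/O callbacks are injected so the sketch stays transport-agnostic."""
    next_deadline = time.perf_counter()
    for _ in range(steps):
        state = read_lowstate()          # on hardware: latest rt/lowstate sample
        targets = policy_step(state)
        publish_lowcmd(targets)          # on hardware: rt/lowcmd via DDS
        next_deadline += DT
        remaining = next_deadline - time.perf_counter()
        if remaining > 0:
            time.sleep(remaining)        # crude pacing; RT deployments do better
```

Note how the loop accumulates absolute deadlines rather than sleeping a fixed `DT` each iteration; otherwise per-step compute time would steadily drag the rate below 500 Hz.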
## 9. Custom Policy Replacement (Practical)

### When to Replace

- You need whole-body coordination (mocap + balance)
- You need push recovery beyond what the stock controller provides
- You want to run a custom RL policy trained with perturbation curriculum

### How to Replace (Debug Mode)

1. Suspend robot on stand or harness
2. Enter damping state, press **L2+R2** — `ai_sport` shuts down
3. Send `MotorCmd_` messages on `rt/lowcmd` from Jetson or external PC
4. Read `rt/lowstate` for joint positions, velocities, and IMU data
5. Publish at 500 Hz for smooth control (C++ recommended over Python for lower latency)
6. **To exit debug mode: reboot the robot** (no other way)

### Risks

- **Fall risk:** If your policy fails, the robot falls immediately — no stock controller safety net
- **Hardware damage:** Incorrect joint commands can damage actuators
- **Always test in simulation first** (see [[simulation]])

### Alternative: Residual Overlay

Instead of full replacement, train a residual policy that adds small corrections to the stock controller output. See [[push-recovery-balance]] for details.

### WBC Frameworks

For coordinated whole-body control (balance + task), see [[whole-body-control]], particularly GR00T-WBC, which is designed for exactly this use case on the G1.

## Key Relationships

- Uses: [[joint-configuration]] (leg joints as actuators, 500 Hz commands)
- Uses: [[sensors-perception]] (IMU + encoders for state estimation)
- Trained via: [[learning-and-ai]] (RL training pipeline)
- Bounded by: [[equations-and-bounds]] (ZMP, joint limits)
- Governed by: [[safety-limits]] (fall detection, torque limits)
- Extended by: [[push-recovery-balance]] (enhanced perturbation robustness)
- Coordinated by: [[whole-body-control]] (WBC for combined loco-manipulation)
- Enables: [[motion-retargeting]] (balance during mocap playback)
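As a closing illustration of the residual-overlay alternative from §9: the composite command is the stock action plus a clipped learned correction. The clip bound and shapes below are illustrative assumptions, not values from any published G1 deployment.

```python
import numpy as np

def overlay_action(stock_action, residual, max_residual=0.05):
    """Add a small learned correction to the stock controller's joint targets.
    Clipping the residual keeps the composite command close to stock behavior,
    so a bad residual degrades tracking gracefully instead of causing a fall."""
    residual = np.clip(np.asarray(residual, dtype=np.float64),
                       -max_residual, max_residual)
    return np.asarray(stock_action, dtype=np.float64) + residual
```

The clip bound is the key safety knob: at `max_residual=0` the robot behaves exactly like stock, so the bound can be widened gradually during training and deployment.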