---
id: locomotion-control
title: "Locomotion & Balance Control"
status: established
source_sections: "reference/sources/paper-gait-conditioned-rl.md, reference/sources/paper-getting-up-policies.md, reference/sources/official-product-page.md"
related_topics: [joint-configuration, sensors-perception, equations-and-bounds, learning-and-ai, safety-limits, whole-body-control, push-recovery-balance, motion-retargeting]
key_equations: [zmp, com, inverse_dynamics]
key_terms: [gait, state_estimation, gait_conditioned_rl, curriculum_learning, sim_to_real]
images: []
examples: []
open_questions:
  - "Exact RL policy observation/action space dimensions"
  - "How to replace the stock locomotion policy with a custom one"
  - "Stair climbing capability and limits"
  - "Running gait availability (H1-2 can run at 3.3 m/s — can G1?)"
---

# Locomotion & Balance Control

Walking, balance, gait generation, and whole-body control for bipedal locomotion.

## 1. Control Architecture

The G1 uses a reinforcement-learning-based locomotion controller running on the proprietary locomotion computer. Users interact with it via high-level commands; low-level balance and gait control is handled internally.
[T1 — Confirmed from RL papers and developer docs]

```
User Commands (high-level API)
            │
            ▼
┌─────────────────────────┐
│  Locomotion Computer    │  (192.168.123.161, proprietary)
│                         │
│  RL Policy (gait-       │ ← IMU, joint encoders (500 Hz)
│  conditioned, multi-    │
│  phase curriculum)      │
│                         │
│  Motor Commands ────────┼──→ Joint Actuators
└─────────────────────────┘
```

### Key Architecture Details

- **Framework:** Gait-conditioned reinforcement learning with multi-phase curriculum (arXiv:2505.20619) [T1]
- **Gait switching:** One-hot gait ID enables dynamic switching between gaits [T1]
- **Reward design:** Gait-specific reward routing mechanism with biomechanically inspired shaping [T1]
- **Training:** Policies trained in simulation (Isaac Gym / MuJoCo), transferred to physical hardware [T1]
- **Biomechanical features:** Straight-knee stance promotion, coordinated arm-leg swing, natural motion without motion capture data [T1]

## 2. Gait Modes

| Mode | Description | Verified | Tier |
|---|---|---|---|
| Standing | Static balance, both feet grounded | Yes | T1 |
| Walking | Dynamic bipedal walking | Yes | T1 |
| Walk-to-stand | Smooth transition from walking to standing | Yes | T1 |
| Stand-to-walk | Smooth transition from standing to walking | Yes | T1 |

[T1 — Validated in arXiv:2505.20619 on real G1 hardware]

## 3. Performance

| Metric | Value | Notes | Tier |
|---|---|---|---|
| Maximum walking speed | 2.0 m/s | 7.2 km/h | T0 |
| Verified terrain | Tile, concrete, carpet | Office-environment surfaces | T1 |
| Balance recovery | Light push recovery | Stable recovery from perturbations | T1 |
| Gait transition | Smooth | No abrupt mode switches | T1 |

For comparison, the H1-2 (larger Unitree humanoid) achieves 3.3 m/s running. Whether the G1 has a running gait is unconfirmed. [T3]
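The one-hot gait conditioning described above can be sketched as a small observation-assembly helper. Everything here — the gait ordering, field names, and dimensions — is a hypothetical illustration; the real policy's observation layout is unpublished (see open questions).

```python
import numpy as np

# Hypothetical gait set, mirroring the modes in §2; the stock policy's actual
# gait IDs and ordering are not publicly documented.
GAITS = ["stand", "walk", "walk_to_stand", "stand_to_walk"]

def gait_one_hot(gait: str) -> np.ndarray:
    """One-hot gait ID, the mechanism gait-conditioned RL uses for switching."""
    vec = np.zeros(len(GAITS), dtype=np.float32)
    vec[GAITS.index(gait)] = 1.0
    return vec

def build_observation(imu_rpy, ang_vel, joint_pos, joint_vel, command, gait):
    """Concatenate proprioception, user command, and gait ID into one policy input."""
    return np.concatenate([
        np.asarray(imu_rpy, dtype=np.float32),    # base orientation (roll, pitch, yaw)
        np.asarray(ang_vel, dtype=np.float32),    # base angular velocity
        np.asarray(joint_pos, dtype=np.float32),  # joint encoder positions
        np.asarray(joint_vel, dtype=np.float32),  # joint encoder velocities
        np.asarray(command, dtype=np.float32),    # (vx, vy, yaw_rate) user command
        gait_one_hot(gait),                       # one-hot gait ID
    ])
```

Dynamic gait switching then amounts to swapping the one-hot block between policy steps while the rest of the observation keeps streaming.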
## 4. Balance Control

The RL-based locomotion policy implicitly handles balance through learned behavior rather than explicit ZMP or capture-point controllers: [T1]

- **Inputs:** IMU data (orientation, angular velocity), joint encoder feedback (position, velocity), gait command
- **Outputs:** Target joint positions/torques for all leg joints
- **Rate:** 500 Hz control loop
- **Learned behaviors:** Center-of-mass tracking, foot placement, push recovery, arm counterbalancing

While classical bipedal control uses explicit ZMP constraints (see [[equations-and-bounds]]), the G1's RL policy learns these constraints implicitly during training. For deep coverage of enhanced push recovery, perturbation training, and always-on balance architectures, see [[push-recovery-balance]].

## 5. Fall Recovery

Multiple research approaches have been validated on the G1: [T1 — Research papers]

- **Two-stage RL:** Supine and prone recovery policies (arXiv:2502.12152) — overcome limitations of hand-crafted controllers
- **HoST framework:** Multi-critic RL with curriculum training for diverse posture recovery (arXiv:2502.08378)
- **Unified fall-safety:** Combined fall prevention + impact mitigation + recovery from sparse demonstrations (arXiv:2511.07407) — zero-shot sim-to-real transfer

## 6. Terrain Adaptation

| Terrain Type | Status | Notes | Tier |
|---|---|---|---|
| Flat tile | Verified | Standard office floor | T1 |
| Concrete | Verified | Indoor/outdoor flat surfaces | T1 |
| Carpet | Verified | Standard office carpet | T1 |
| Stairs | Unconfirmed | Research papers suggest capability | T4 |
| Rough terrain | Sim only | Trained in sim, real-world unconfirmed | T3 |
| Slopes | Unconfirmed | — | T4 |
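The target-joint-position outputs listed in §4 are typically executed by a joint-level PD law at the motor drivers. This is the standard pattern for RL position targets on torque-controlled actuators, not a confirmed detail of the G1 firmware, and the gains below are illustrative placeholders:

```python
import numpy as np

def pd_torque(q_target, q, dq, kp=60.0, kd=1.5):
    """Joint-level PD execution of a position target:
        tau = kp * (q_target - q) - kd * dq
    kp/kd here are illustrative values, not the G1's actual motor gains."""
    q_target = np.asarray(q_target, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    dq = np.asarray(dq, dtype=np.float64)
    return kp * (q_target - q) - kd * dq
```

The policy runs at 500 Hz while the PD loop tracks whatever target was most recently commanded, which is one reason a stalled or slow custom policy (see §9) degrades so quickly.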
## 7. User Control Interface

Users control locomotion through the high-level sport mode API: [T0]

- **Velocity commands:** Set forward/lateral velocity and yaw rate
- **Posture commands:** Stand, sit, lie down
- **Attitude adjustment:** Modify body orientation
- **Trajectory tracking:** Follow waypoint sequences

Low-level joint control is also possible (bypassing the locomotion controller) but requires the user to implement their own balance control. This is advanced and carries significant fall risk. [T2]

## 8. Locomotion Computer Internals

The locomotion computer is a **Rockchip RK3588** (8-core ARM Cortex-A76/A55, 8 GB LPDDR4X, 32 GB eMMC) running Linux kernel 5.10.176-rt86+ (real-time patched). [T1 — Security research papers arXiv:2509.14096, arXiv:2509.14139]

### Software Architecture

A centralized `master_service` orchestrator (9.2 MB binary) supervises **26 daemons**: [T1]

| Daemon | Role | Resource Usage |
|---|---|---|
| `ai_sport` | Primary locomotion/balance policy | 145% CPU, 135 MB RAM |
| `state_estimator` | IMU + encoder fusion | ~30% CPU |
| `motion_switcher` | Gait mode management | — |
| `robot_state_service` | State broadcasting | — |
| `dex3_service_l/r` | Left/right hand control | — |
| `webrtc_bridge` | Video streaming | — |
| `ros_bridge` | ROS2 interface | — |
| Others | OTA, BLE, WiFi, telemetry, etc. | — |

The `ai_sport` daemon is the stock RL policy. When you enter debug mode (L2+R2), this daemon is shut down, allowing direct motor control via `rt/lowcmd`.

Configuration files use proprietary **FMX encryption** (Blowfish-ECB + LCG stream cipher with static keys). This has been partially reverse-engineered by security researchers but not fully cracked. [T1]

### Can You Access the Locomotion Computer?
**Root access is technically possible** via known BLE exploits (UniPwn, FreeBOT jailbreak), but **no one has publicly documented deploying a custom policy to it**: [T1]

| Method | Status | Notes |
|---|---|---|
| SSH from network | Blocked | No SSH server exposed by default |
| FreeBOT jailbreak (app WiFi field injection) | Works on firmware ≤1.6.0 | Patched Oct 2025 |
| UniPwn BLE exploit (Bin4ry/UniPwn on GitHub) | Works on unpatched firmware | Hardcoded AES keys + command injection |
| RockUSB physical flash | Blocked by SecureBoot on G1 | Works on Go2 only |
| Replacing `ai_sport` binary after root | **Not documented** | Nobody has published doing this |
| Extracting stock policy weights | **Not documented** | Binary analysis not published |

**Bottom line:** Getting root on the RK3588 is solved. Getting a custom locomotion policy running natively on it is not — the `master_service` orchestrator, FMX encryption, and lack of documentation are barriers nobody has publicly overcome. [T1]

### How Every Research Group Actually Deploys

All published research (BFM-Zero, gait-conditioned RL, fall recovery, etc.) uses the same approach: [T1]

1. Enter debug mode (L2+R2) — shuts down `ai_sport`
2. Run the custom policy on the **Jetson Orin NX** or an external computer
3. Read `rt/lowstate`, compute actions, publish `rt/lowcmd` via DDS
4. Motor commands travel over the internal DDS network to the RK3588, which passes them to the motor drivers

This works but has inherent limitations:

- DDS network latency (~2 ms round trip) vs. native on-board execution
- No access to the RK3588's real-time Linux kernel guarantees
- Policy frequency limited by DDS throughput and compute (typically 200-500 Hz from a Jetson)
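The read-compute-publish cycle above can be sketched as a fixed-rate loop. The transport is deliberately abstracted behind injected callbacks: `read_lowstate`, `publish_lowcmd`, and `policy_step` are hypothetical names for illustration, and on real hardware the callbacks would wrap DDS subscribe/publish on `rt/lowstate` and `rt/lowcmd`. The sleep-based pacing is a crude stand-in for a real-time timer.

```python
import time

CONTROL_HZ = 500          # matches the 500 Hz loop cited above
DT = 1.0 / CONTROL_HZ

def policy_step(lowstate):
    """Placeholder for the custom policy: map state -> joint targets.
    This one just echoes measured positions (a 'hold pose' policy)."""
    return list(lowstate["q"])

def run_loop(read_lowstate, publish_lowcmd, steps):
    """Fixed-rate control loop: read state, compute action, publish command.
    I/O callbacks are injected so the sketch stays transport-agnostic."""
    next_deadline = time.perf_counter()
    for _ in range(steps):
        state = read_lowstate()          # on hardware: latest rt/lowstate sample
        targets = policy_step(state)
        publish_lowcmd(targets)          # on hardware: rt/lowcmd via DDS
        next_deadline += DT
        remaining = next_deadline - time.perf_counter()
        if remaining > 0:
            time.sleep(remaining)        # crude pacing; RT deployments do better
```

Note how the loop accumulates absolute deadlines rather than sleeping a fixed `DT` each iteration; otherwise per-step compute time would steadily drag the rate below 500 Hz.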
## 9. Custom Policy Replacement (Practical)

### When to Replace

- You need whole-body coordination (mocap + balance)
- You need push recovery beyond what the stock controller provides
- You want to run a custom RL policy trained with perturbation curriculum

### How to Replace (Debug Mode)

1. Suspend robot on stand or harness
2. Enter damping state, press **L2+R2** — `ai_sport` shuts down
3. Send `MotorCmd_` messages on `rt/lowcmd` from Jetson or external PC
4. Read `rt/lowstate` for joint positions, velocities, and IMU data
5. Publish at 500 Hz for smooth control (C++ recommended over Python for lower latency)
6. **To exit debug mode: reboot the robot** (no other way)

### Risks

- **Fall risk:** If your policy fails, the robot falls immediately — no stock controller safety net
- **Hardware damage:** Incorrect joint commands can damage actuators
- **Always test in simulation first** (see [[simulation]])

### Alternative: Residual Overlay

Instead of full replacement, train a residual policy that adds small corrections to the stock controller output. See [[push-recovery-balance]] for details.

### WBC Frameworks

For coordinated whole-body control (balance + task), see [[whole-body-control]], particularly GR00T-WBC, which is designed for exactly this use case on the G1.

## Key Relationships

- Uses: [[joint-configuration]] (leg joints as actuators, 500 Hz commands)
- Uses: [[sensors-perception]] (IMU + encoders for state estimation)
- Trained via: [[learning-and-ai]] (RL training pipeline)
- Bounded by: [[equations-and-bounds]] (ZMP, joint limits)
- Governed by: [[safety-limits]] (fall detection, torque limits)
- Extended by: [[push-recovery-balance]] (enhanced perturbation robustness)
- Coordinated by: [[whole-body-control]] (WBC for combined loco-manipulation)
- Enables: [[motion-retargeting]] (balance during mocap playback)
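As a closing illustration of the residual-overlay alternative from §9: the composite command is the stock action plus a clipped learned correction. The clip bound and shapes below are illustrative assumptions, not values from any published G1 deployment.

```python
import numpy as np

def overlay_action(stock_action, residual, max_residual=0.05):
    """Add a small learned correction to the stock controller's joint targets.
    Clipping the residual keeps the composite command close to stock behavior,
    so a bad residual degrades tracking gracefully instead of causing a fall."""
    residual = np.clip(np.asarray(residual, dtype=np.float64),
                       -max_residual, max_residual)
    return np.asarray(stock_action, dtype=np.float64) + residual
```

The clip bound is the key safety knob: at `max_residual=0` the robot behaves exactly like stock, so the bound can be widened gradually during training and deployment.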