| id | title | status | source_sections | related_topics | key_equations | key_terms | images | examples | open_questions |
|---|---|---|---|---|---|---|---|---|---|
| push-recovery-balance | Push Recovery & Robust Balance | established | reference/sources/paper-gait-conditioned-rl.md, reference/sources/paper-getting-up-policies.md, reference/sources/paper-safe-control-cluttered.md, reference/sources/paper-residual-policy.md, reference/sources/paper-cbf-humanoid.md | [locomotion-control whole-body-control safety-limits equations-and-bounds learning-and-ai simulation] | [com zmp inverse_dynamics] | [push_recovery ankle_strategy hip_strategy stepping_strategy residual_policy control_barrier_function support_polygon perturbation_curriculum] | [] | [] | [What is the max recoverable push force for the stock G1 controller? Does residual policy overlay work with the proprietary locomotion computer, or does it require full replacement? What is the minimum viable sensor set for push detection (IMU only vs. IMU + F/T)? What perturbation force ranges should be used in training curriculum?] |
Push Recovery & Robust Balance
Making the G1 robust to external pushes and maintaining balance during all activities — the "always-on" stability layer.
1. Push Recovery Strategies
When a humanoid is pushed, it can respond with progressively more aggressive strategies depending on perturbation magnitude: [T1 — Established biomechanics/robotics]
Ankle Strategy (Small Perturbations)
- Mechanism: Ankle torque adjusts center of pressure (CoP) within the foot
- Range: Small pushes that don't move CoM outside the foot support area
- Speed: Fastest response (~50ms)
- G1 applicability: Yes — G1 has ankle pitch and roll joints [T1]
- Limitation: Only works for small perturbations; foot must remain flat
Hip Strategy (Medium Perturbations)
- Mechanism: Rapid hip flexion/extension shifts CoM back over support
- Range: Pushes that exceed ankle authority but don't require stepping
- Speed: Medium (~100-200ms)
- G1 applicability: Yes — G1 hip has 3 DOF with ±154° pitch [T1]
- Often combined with: Upper body countermotion (arms swing opposite to push direction)
Stepping Strategy (Large Perturbations)
- Mechanism: Take a recovery step to create a new support polygon under the shifted CoM
- Range: Large pushes where CoM exits current support polygon
- Speed: Slowest (~300-500ms, must plan and execute step)
- G1 applicability: Yes — requires whole-body coordination [T1]
- Most complex: needs free space to step into and requires foot-placement planning
Combined/Learned Strategy
Modern RL-based controllers learn a blended strategy that seamlessly transitions between ankle, hip, and stepping responses based on perturbation magnitude. This is the approach used by the G1's stock controller and by research push-recovery policies. [T1]
2. What the Stock Controller Already Does
The G1's proprietary RL-based locomotion controller (running on the locomotion computer at 192.168.123.161) already handles basic push recovery: [T1]
- Light push recovery during standing and walking — confirmed in arXiv:2505.20619
- Gait-conditioned policy implicitly learns balance through training
- 500 Hz control loop provides fast response to perturbations
However, the stock controller's push recovery limits are not documented. Key unknowns:
- Maximum recoverable impulse (N·s) during standing
- Maximum recoverable impulse during walking
- Whether it uses stepping recovery or only ankle/hip strategies
- How it performs when the upper body is doing something unexpected (e.g., mocap)
3. Enhancing Push Recovery
3a. Perturbation Curriculum Training
The most validated approach: train an RL policy in simulation with random external forces applied during training. [T1 — Multiple G1 papers]
Training Loop (in sim):
1. Run the locomotion policy
2. At random intervals, apply an external force to the robot torso
   - Direction: random (forward, backward, lateral)
   - Magnitude: curriculum (start small, increase as the policy improves)
   - Duration: 0.1-0.5 s impulse
3. Reward: stay upright, track the velocity command, minimize energy
4. Penalty: falling, excessive joint acceleration
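Step 2 of the loop above can be sketched as a simple force sampler; the ranges mirror the curriculum table in this section, and the function name is illustrative rather than taken from any G1 codebase.

```python
import numpy as np

def sample_push(rng, max_force):
    """Draw one random horizontal push: direction uniform on the circle,
    magnitude up to the current curriculum cap, duration 0.1-0.5 s."""
    theta = rng.uniform(0.0, 2.0 * np.pi)
    magnitude = rng.uniform(0.0, max_force)                       # N
    duration = rng.uniform(0.1, 0.5)                              # s
    force = magnitude * np.array([np.cos(theta), np.sin(theta), 0.0])
    return force, duration

rng = np.random.default_rng(0)
force, duration = sample_push(rng, max_force=100.0)
impulse = np.linalg.norm(force) * duration                        # N*s, the metric in section 6
```

In a vectorized simulator the sampled force would be applied to the torso body for `duration` seconds, then cleared.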
Key papers validated on G1:
| Paper | Approach | Validated? | Key Finding |
|---|---|---|---|
| arXiv:2505.20619 | Gait-conditioned RL with perturbations | Yes (real G1) | Push robustness during walking |
| arXiv:2511.07407 | Unified fall prevention + mitigation + recovery | Yes (zero-shot) | Combined strategy from sparse demos |
| arXiv:2502.12152 | Two-stage recovery (supine + prone) | Yes (real G1) | Get-up after falling |
| arXiv:2502.08378 | HoST multi-critic RL | Yes (real G1) | Diverse posture recovery |
Perturbation Curriculum Parameters (Typical)
| Parameter | Start | End | Notes |
|---|---|---|---|
| Max force (N) | 20 | 100-200 | Ramp over training |
| Force duration (s) | 0.1 | 0.5 | Short impulses to sustained pushes |
| Direction | Forward only | Omnidirectional | Add lateral/backward progressively |
| Frequency | Rare (every 10s) | Frequent (every 2s) | Increase as policy improves |
| Application point | Torso center | Random (torso, shoulders) | Vary to generalize |
[T2 — Ranges from research papers, not G1-specific tuning]
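The ramps in the table can be expressed as one interpolation over training progress; the linear schedule and parameter names here are illustrative assumptions, not G1-specific tuning.

```python
def curriculum(progress):
    """Interpolate perturbation parameters from start to end values as
    training progresses; `progress` is in [0, 1]. Endpoints follow the
    [T2] table ranges above."""
    lerp = lambda a, b: a + progress * (b - a)
    return {
        "max_force_n": lerp(20.0, 200.0),     # push magnitude cap
        "max_duration_s": lerp(0.1, 0.5),     # impulse length
        "push_interval_s": lerp(10.0, 2.0),   # pushes become more frequent
    }
```

A real schedule would usually gate `progress` on policy performance rather than raw step count.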
3b. Residual Policy Learning
Train a small "correction" policy that adds to the output of an existing base controller: [T1 — Established technique]
Base Controller Output (stock or trained): a_base
Residual Policy Output (small corrections): a_residual
Final Action: a = a_base + α * a_residual (α < 1 for safety)
Why this matters for G1:
- The stock locomotion controller is good but not customizable
- A residual policy can be trained on top of it to improve push recovery
- The scaling factor α limits how much the residual can deviate from base behavior
- This is the safest path to "enhanced" balance without replacing the stock controller
Implementation on G1 (Approach A — Overlay):
- Read current lowstate (joint positions/velocities, IMU)
- Estimate what the stock controller "wants" (by observing lowcmd at previous timestep)
- Compute residual correction based on detected perturbation
- Add correction to the stock controller's output on rt/lowcmd
- Clamp to joint limits
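The overlay's final two steps can be sketched as follows; the scaling and clamping mirror the a = a_base + α * a_residual rule above, and the limit values in the usage note are placeholders, not actual G1 joint limits.

```python
import numpy as np

def blend_residual(a_base, a_residual, alpha, q_min, q_max):
    """Combine the stock controller's output with a scaled residual
    correction, then clamp to joint limits. alpha < 1 bounds how far
    the residual can pull the command from the base behavior."""
    a = a_base + alpha * np.clip(a_residual, -1.0, 1.0)  # bound the raw residual too
    return np.clip(a, q_min, q_max)
```

For example, with `alpha=0.1` and limits of ±0.15 rad, a saturated residual of 5.0 still moves the command by at most 0.1 rad from the stock output.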
Challenge: The stock controller runs on the locomotion computer. The residual runs on the Jetson. There's a ~2ms DDS round-trip latency between them. This may cause instability if the residual and stock controller fight each other. [T3 — Architectural inference, not tested]
3c. Control Barrier Functions (CBFs)
A formal safety framework that guarantees the robot stays within a "safe set": [T1 — Control theory]
Safety constraint: h(x) ≥ 0 (e.g., CoM is within support polygon)
CBF condition: ḣ(x,u) + α·h(x) ≥ 0 (safety is maintained over time)
At each timestep, solve a QP:
minimize || u - u_desired ||^2 (stay close to desired action)
subject to ḣ(x,u) + α·h(x) ≥ 0 (CBF safety constraint)
u_min ≤ u ≤ u_max (actuator limits)
G1-specific work:
- arXiv:2502.02858 uses Projected Safe Set Algorithm (p-SSA) on real G1 for collision avoidance in cluttered environments
- The same CBF framework can be applied to balance: define h(x) as the distance of CoM projection from the edge of the support polygon
- Pros: Formal guarantee (if the model is accurate); minimal modification to the existing controller (just a safety filter)
- Cons: Requires an accurate dynamics model; real-time QP is computationally expensive; can be conservative (may reject valid actions)
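For a single constraint that is affine in u (a·u + b ≥ 0, which is what the CBF condition becomes for control-affine dynamics), the QP above has a closed-form projection. This sketch omits the actuator box limits, which would require a numerical QP solver.

```python
import numpy as np

def cbf_filter(u_des, a, b):
    """Solve min ||u - u_des||^2 subject to a @ u + b >= 0 in closed form.
    a and b would come from linearizing hdot(x, u) + alpha * h(x) >= 0
    in u for the chosen barrier function h."""
    margin = a @ u_des + b
    if margin >= 0.0:
        return u_des                            # desired action already safe
    return u_des - (margin / (a @ a)) * a       # project onto the constraint boundary
```

For example, `cbf_filter(np.array([1.0, 0.0]), a=np.array([0.0, 1.0]), b=-0.5)` returns `[1.0, 0.5]`: the closest action that sits exactly on the safety boundary.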
4. Always-On Balance Architecture
How to maintain balance as a background process during other activities (mocap playback, manipulation, teleoperation):
Option A: Residual Overlay on Stock Controller
┌──────────┐ high-level ┌──────────────┐ rt/lowcmd ┌──────────┐
│ Task │ commands │ Stock Loco │ (legs) │ Joint │
│ (mocap, │────────────►│ Controller │─────────────►│ Actuators│
│ manip) │ │ (proprietary)│ └──────────┘
│ │ rt/lowcmd │ │
│ │─(arms only)─►│ │
└──────────┘ └──────────────┘
+ optional residual corrections on leg joints
- Stock controller handles balance automatically
- User code controls arms/waist for task
- Optional: add small residual corrections to leg joints for enhanced stability
- Risk level: Low
- Balance authority: Whatever the stock controller provides
Option B: GR00T-WBC (Recommended)
┌──────────┐ upper-body ┌──────────────┐ rt/lowcmd ┌──────────┐
│ Task │ targets │ GR00T-WBC │ (all joints)│ Joint │
│ (mocap, │────────────►│ │─────────────►│ Actuators│
│ manip) │ │ Loco Policy │ └──────────┘
│ │ │ (RL, trained │
│ │ │ with pushes)│
└──────────┘ └──────────────┘
- Trained locomotion policy handles balance (including push recovery if perturbation-trained)
- Upper-body targets come from task (mocap, manipulation, teleoperation)
- WBC coordinator resolves conflicts between task and balance
- Risk level: Medium (need to validate RL locomotion policy)
- Balance authority: Full (can be specifically trained for perturbation robustness)
Option C: Full Custom Policy
┌──────────┐ reference ┌──────────────┐ rt/lowcmd ┌──────────┐
│ Mocap │ motion │ Custom RL │ (all joints)│ Joint │
│ Reference │────────────►│ Tracking + │─────────────►│ Actuators│
│ │ │ Balance │ └──────────┘
│ │ │ Policy │
└──────────┘ └──────────────┘
- Single RL policy that simultaneously tracks reference motion AND maintains balance
- BFM-Zero approach — trained on diverse motions with perturbation curriculum
- Risk level: High (full low-level control, must handle everything)
- Balance authority: Maximum (policy sees everything, controls everything)
- Best for: Production deployment after extensive sim validation
5. Fall Recovery (When Push Recovery Fails)
Even with robust push recovery, falls will happen during development. Recovery capability matters: [T1 — Research papers]
| Approach | Paper | Method | G1 Validated? |
|---|---|---|---|
| Two-stage RL | arXiv:2502.12152 | Separate supine/prone recovery policies | Yes |
| HoST | arXiv:2502.08378 | Multi-critic RL, diverse posture recovery | Yes |
| Unified safety | arXiv:2511.07407 | Prevention + mitigation + recovery combined | Yes (zero-shot) |
Fall Detection
- IMU-based: Detect excessive tilt angle (e.g., pitch/roll > 45°) or angular velocity
- Joint-based: Detect unexpected ground contact (arm joints hitting torque limits)
- CoM-based: Estimate CoM position, detect when it exits recoverable region
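The IMU-based check can be sketched directly from the orientation quaternion; the 45° and angular-rate thresholds are illustrative, not tuned for the G1.

```python
import numpy as np

def fall_detected(quat_wxyz, gyro_rad_s, tilt_limit_rad=np.radians(45.0),
                  rate_limit_rad_s=4.0):
    """Trigger on excessive tilt or angular rate. Tilt is the angle between
    the body z-axis and world up; for a unit quaternion (w, x, y, z) the
    cosine of that angle is 1 - 2*(x^2 + y^2)."""
    w, x, y, z = quat_wxyz
    cos_tilt = 1.0 - 2.0 * (x * x + y * y)
    tilted = cos_tilt < np.cos(tilt_limit_rad)
    spinning = np.linalg.norm(gyro_rad_s) > rate_limit_rad_s
    return tilted or spinning
```

In practice this would run at the control rate and require the condition to hold for several consecutive ticks before declaring a fall, to avoid triggering on transient spikes.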
Fall Mitigation
- arXiv:2511.07407 trains a policy that, when a fall is inevitable, actively reduces impact:
  - Tuck arms in
  - Rotate to distribute impact
  - Reduce angular velocity before ground contact
6. Metrics for Push Recovery
Quantitative measures to evaluate balance robustness: [T2 — Research community standards]
| Metric | Definition | Target (Good) | Target (Excellent) |
|---|---|---|---|
| Max recoverable push (standing) | Maximum impulse (N·s) the robot survives while standing | 30 N·s | 60+ N·s |
| Max recoverable push (walking) | Maximum impulse during walking | 20 N·s | 40+ N·s |
| Recovery time | Time from perturbation to return to steady state | < 2s | < 1s |
| Success rate | % of randomized pushes survived (test distribution) | > 90% | > 98% |
| CoM deviation | Maximum CoM displacement during recovery | < 0.3m | < 0.15m |
| No-step recovery range | Max push recovered without taking a step | 20 N·s | 40 N·s |
[T3 — Targets are estimates based on research papers, not G1-specific benchmarks]
7. Training Push-Robust Policies for G1
Recommended Sim Environment
- Isaac Gym (via unitree_rl_gym) for massively parallel training
- MuJoCo (via MuJoCo Menagerie g1.xml) for validation
- Domain randomization: Friction (0.3-1.5), mass (±15%), motor strength (±10%), latency (0-10ms)
Reward Design for Push Robustness
# Pseudocode — typical reward structure
reward = (
    + w_alive   * alive_bonus         # Stay upright
    + w_track   * velocity_tracking   # Follow commanded velocity
    + w_smooth  * action_smoothness   # Minimize jerk
    - w_energy  * energy_penalty      # Minimize energy use
    - w_fall    * fall_penalty        # Heavy penalty for falling
    - w_slip    * foot_slip_penalty   # Minimize foot sliding
    + w_upright * upright_bonus       # Reward torso verticality
)
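Made runnable, with each term precomputed by the environment; the weight values here are illustrative placeholders, not tuned gains.

```python
def push_robust_reward(terms, weights=None):
    """Weighted sum matching the pseudocode above; `terms` holds the
    per-step reward components already computed by the environment."""
    w = weights or {"alive": 1.0, "track": 2.0, "smooth": 0.1,
                    "energy": 0.01, "fall": 100.0, "slip": 0.5, "upright": 0.5}
    return (  w["alive"]   * terms["alive_bonus"]
            + w["track"]   * terms["velocity_tracking"]
            + w["smooth"]  * terms["action_smoothness"]
            - w["energy"]  * terms["energy_penalty"]
            - w["fall"]    * terms["fall_penalty"]
            - w["slip"]    * terms["foot_slip_penalty"]
            + w["upright"] * terms["upright_bonus"])
```

Keeping the weights in a dict makes them easy to sweep during curriculum tuning.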
Training Stages (Multi-Phase Curriculum)
- Phase 1: Stand without falling (no perturbations)
- Phase 2: Walk on flat terrain (no perturbations)
- Phase 3: Walk with small random pushes (10-30N)
- Phase 4: Walk with medium pushes (30-80N) + terrain variation
- Phase 5: Walk with large pushes (80-200N) + task (upper body motion)
[T2 — Based on curriculum strategies in published G1 papers]
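Phase transitions are usually gated on performance rather than a fixed step count; a minimal sketch of that gating, with an assumed 90% survival threshold:

```python
def next_phase(phase, success_rate, threshold=0.90, max_phase=5):
    """Advance to the next curriculum phase only once the policy survives
    the current phase's pushes at the required rate."""
    return min(phase + 1, max_phase) if success_rate >= threshold else phase
```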
8. Development Roadmap
Recommended progression for achieving "always-on balance during mocap":
Phase 1: Evaluate stock controller push limits
└── Push test on real G1, document max impulse
Phase 2: Train push-robust locomotion policy in sim
└── unitree_rl_gym + perturbation curriculum
└── Validate in MuJoCo (Sim2Sim)
Phase 3: Deploy on real G1 (locomotion only)
└── Start with gentle pushes, increase gradually
Phase 4: Add upper-body mocap tracking
└── GR00T-WBC or custom WBC layer
└── Test: can it maintain balance while arms track mocap?
Phase 5: Combined push + mocap testing
└── Push robot while it replays mocap motion
└── Iterate on perturbation curriculum if needed
Key Relationships
- Extends: locomotion-control (enhanced version of stock balance)
- Component of: whole-body-control (balance as a constraint in WBC)
- Protects: motion-retargeting (ensures stability during mocap playback)
- Governed by: safety-limits (fall detection, e-stop integration)
- Trained via: learning-and-ai (RL with perturbation curriculum)
- Tested in: simulation (MuJoCo/Isaac with external force application)
- Bounded by: equations-and-bounds (CoM, ZMP, support polygon)