---
id: push-recovery-balance
title: Push Recovery & Robust Balance
status: established
source_sections:
  - reference/sources/paper-gait-conditioned-rl.md
  - reference/sources/paper-getting-up-policies.md
  - reference/sources/paper-safe-control-cluttered.md
  - reference/sources/paper-residual-policy.md
  - reference/sources/paper-cbf-humanoid.md
related_topics: [locomotion-control, whole-body-control, safety-limits, equations-and-bounds, learning-and-ai, simulation]
key_equations: [com, zmp, inverse_dynamics]
key_terms: [push_recovery, ankle_strategy, hip_strategy, stepping_strategy, residual_policy, control_barrier_function, support_polygon, perturbation_curriculum]
images: []
examples: []
open_questions:
  - What is the max recoverable push force for the stock G1 controller?
  - Does residual policy overlay work with the proprietary locomotion computer, or does it require full replacement?
  - What is the minimum viable sensor set for push detection (IMU only vs. IMU + F/T)?
  - What perturbation force ranges should be used in training curriculum?
---

Push Recovery & Robust Balance

Making the G1 robust to external pushes and maintaining balance during all activities — the "always-on" stability layer.

1. Push Recovery Strategies

When a humanoid is pushed, it can respond with progressively more aggressive strategies depending on perturbation magnitude: [T1 — Established biomechanics/robotics]

Ankle Strategy (Small Perturbations)

  • Mechanism: Ankle torque adjusts center of pressure (CoP) within the foot
  • Range: Small pushes that don't move CoM outside the foot support area
  • Speed: Fastest response (~50ms)
  • G1 applicability: Yes — G1 has ankle pitch and roll joints [T1]
  • Limitation: Only works for small perturbations; foot must remain flat

Hip Strategy (Medium Perturbations)

  • Mechanism: Rapid hip flexion/extension shifts CoM back over support
  • Range: Pushes that exceed ankle authority but don't require stepping
  • Speed: Medium (~100-200ms)
  • G1 applicability: Yes — G1 hip has 3 DOF with ±154° pitch [T1]
  • Often combined with: Upper body countermotion (arms swing opposite to push direction)

Stepping Strategy (Large Perturbations)

  • Mechanism: Take a recovery step to create a new support polygon under the shifted CoM
  • Range: Large pushes where CoM exits current support polygon
  • Speed: Slowest (~300-500ms, must plan and execute step)
  • G1 applicability: Yes — requires whole-body coordination [T1]
  • Most complex: Needs free space to step into, foot placement planning

Combined/Learned Strategy

Modern RL-based controllers learn a blended strategy that seamlessly transitions between ankle, hip, and stepping responses based on perturbation magnitude. This is the approach used by the G1's stock controller and by research push-recovery policies. [T1]
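For hand-designed (non-learned) controllers, the transition between these strategies is often gated on the instantaneous capture point (ICP) from the linear inverted pendulum model: if the ICP stays inside the foot, ankle torque suffices; slightly outside, hip strategy; far outside, a step is required. A minimal 1-D sketch, assuming LIPM dynamics (the `hip_margin` value is illustrative, not a G1 parameter):

```python
import math

def capture_point(com_x, com_vx, com_height, g=9.81):
    """Instantaneous capture point (LIPM): where the CoM would come to rest
    if the CoP could be placed there. omega is the LIPM natural frequency."""
    omega = math.sqrt(g / com_height)
    return com_x + com_vx / omega

def select_strategy(icp, foot_min, foot_max, hip_margin=0.05):
    """Pick a recovery strategy from where the capture point lands (1-D)."""
    if foot_min <= icp <= foot_max:
        return "ankle"   # CoP shift inside the foot suffices
    if foot_min - hip_margin <= icp <= foot_max + hip_margin:
        return "hip"     # needs angular-momentum (hip) strategy
    return "step"        # must relocate the support polygon
```

For example, a forward CoM velocity of 0.8 m/s at 0.7 m CoM height puts the capture point ~0.21 m ahead of the CoM, well beyond a typical foot, so only a step can recover.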

2. What the Stock Controller Already Does

The G1's proprietary RL-based locomotion controller (running on the locomotion computer at 192.168.123.161) already handles basic push recovery: [T1]

  • Light push recovery during standing and walking — confirmed in arXiv:2505.20619
  • Gait-conditioned policy implicitly learns balance through training
  • 500 Hz control loop provides fast response to perturbations

However, the stock controller's push recovery limits are not documented. Key unknowns:

  • Maximum recoverable impulse (N·s) during standing
  • Maximum recoverable impulse during walking
  • Whether it uses stepping recovery or only ankle/hip strategies
  • How it performs when the upper body is doing something unexpected (e.g., mocap)

3. Enhancing Push Recovery

3a. Perturbation Curriculum Training

The most validated approach: train an RL policy in simulation with random external forces applied during training. [T1 — Multiple G1 papers]

Training Loop (in sim):
1. Run locomotion policy
2. At random intervals, apply external force to robot torso
   - Direction: random (forward, backward, lateral)
   - Magnitude: curriculum (start small, increase as policy improves)
   - Duration: 0.1-0.5s impulse
3. Reward: stay upright, track velocity command, minimize energy
4. Penalty: falling, excessive joint acceleration
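The perturbation-sampling part of step 2 can be sketched as follows. This is a generic sketch (not from any specific G1 codebase): in a real setup the returned force would be passed to the simulator's external-force API (e.g. applied to the torso body) for the sampled duration.

```python
import math
import random

def sample_push(max_force, min_dur=0.1, max_dur=0.5):
    """Draw one random perturbation: a direction in the horizontal plane,
    a magnitude up to the current curriculum cap, and an impulse duration
    in the 0.1-0.5 s range used above."""
    angle = random.uniform(0.0, 2.0 * math.pi)
    force = random.uniform(0.0, max_force)
    return {
        "fx": force * math.cos(angle),        # forward/backward component (N)
        "fy": force * math.sin(angle),        # lateral component (N)
        "duration_s": random.uniform(min_dur, max_dur),
    }
```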

Key papers validated on G1:

| Paper | Approach | Validated? | Key Finding |
|---|---|---|---|
| arXiv:2505.20619 | Gait-conditioned RL with perturbations | Yes (real G1) | Push robustness during walking |
| arXiv:2511.07407 | Unified fall prevention + mitigation + recovery | Yes (zero-shot) | Combined strategy from sparse demos |
| arXiv:2502.12152 | Two-stage recovery (supine + prone) | Yes (real G1) | Get-up after falling |
| arXiv:2502.08378 | HoST multi-critic RL | Yes (real G1) | Diverse posture recovery |

Perturbation Curriculum Parameters (Typical)

| Parameter | Start | End | Notes |
|---|---|---|---|
| Max force (N) | 20 | 100-200 | Ramp over training |
| Force duration (s) | 0.1 | 0.5 | Short impulses to sustained pushes |
| Direction | Forward only | Omnidirectional | Add lateral/backward progressively |
| Frequency | Rare (every 10s) | Frequent (every 2s) | Increase as policy improves |
| Application point | Torso center | Random (torso, shoulders) | Vary to generalize |

[T2 — Ranges from research papers, not G1-specific tuning]
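The ramp from start to end values is usually a simple interpolation driven either by training progress or by a success-rate gate. A minimal sketch using the table's max-force row:

```python
def curriculum_value(progress, start, end):
    """Linearly interpolate a curriculum parameter from its start to its end
    value. `progress` in [0, 1] is the fraction of training completed (or a
    success-rate gate, if the curriculum advances on performance)."""
    progress = min(max(progress, 0.0), 1.0)
    return start + (end - start) * progress

# e.g. max push force ramping 20 N -> 200 N: halfway through training,
# curriculum_value(0.5, 20.0, 200.0) gives 110.0 N
```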

3b. Residual Policy Learning

Train a small "correction" policy that adds to the output of an existing base controller: [T1 — Established technique]

Base Controller Output (stock or trained):  a_base
Residual Policy Output (small corrections):  a_residual
Final Action:  a = a_base + α * a_residual     (α < 1 for safety)

Why this matters for G1:

  • The stock locomotion controller is good but not customizable
  • A residual policy can be trained on top of it to improve push recovery
  • The scaling factor α limits how much the residual can deviate from base behavior
  • This is the safest path to "enhanced" balance without replacing the stock controller
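The combination step itself is small; the safety-relevant details are the scale factor and an explicit per-joint clip so a misbehaving residual has bounded authority. A sketch (the `alpha` and `residual_clip` values are illustrative, not tuned for the G1):

```python
import numpy as np

def blend_residual(a_base, a_residual, alpha=0.2, residual_clip=0.1):
    """Combine the base controller's action with a scaled, clipped residual.
    alpha < 1 bounds the residual's authority; the clip additionally bounds
    the worst-case per-joint deviation even if the residual policy diverges."""
    a_residual = np.clip(a_residual, -residual_clip, residual_clip)
    return a_base + alpha * a_residual
```

In practice the result must still be clamped to joint limits before being published, as in the overlay steps below.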

Implementation on G1 (Approach A — Overlay):

  1. Read current lowstate (joint positions/velocities, IMU)
  2. Estimate what the stock controller "wants" (by observing lowcmd at previous timestep)
  3. Compute residual correction based on detected perturbation
  4. Add correction to the stock controller's output on rt/lowcmd
  5. Clamp to joint limits

Challenge: The stock controller runs on the locomotion computer. The residual runs on the Jetson. There's a ~2ms DDS round-trip latency between them. This may cause instability if the residual and stock controller fight each other. [T3 — Architectural inference, not tested]

3c. Control Barrier Functions (CBFs)

A formal safety framework that guarantees the robot stays within a "safe set": [T1 — Control theory]

Safety constraint:  h(x) ≥ 0     (e.g., CoM is within support polygon)
CBF condition:      ḣ(x,u) + α·h(x) ≥ 0     (safety is maintained over time)

At each timestep, solve a QP:

minimize    || u - u_desired ||^2     (stay close to desired action)
subject to  ḣ(x,u) + α·h(x) ≥ 0     (CBF safety constraint)
            u_min ≤ u ≤ u_max         (actuator limits)

G1-specific work:

  • arXiv:2502.02858 uses Projected Safe Set Algorithm (p-SSA) on real G1 for collision avoidance in cluttered environments
  • The same CBF framework can be applied to balance: define h(x) as the distance of CoM projection from the edge of the support polygon

Pros:

  • Formal guarantee (if the model is accurate)
  • Minimal modification to the existing controller (just a safety filter)

Cons:

  • Requires an accurate dynamics model
  • Real-time QP solving is computationally expensive
  • Conservative (may reject valid actions)
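With a single constraint that is affine in u (i.e. control-affine dynamics, so ḣ = a·u + b), the QP above has a closed-form solution: project u_desired onto the half-space. A sketch under that assumption:

```python
import numpy as np

def cbf_filter(u_des, a, b, alpha, h, u_min, u_max):
    """Minimal CBF safety filter for one affine constraint.
    Assumes hdot = a @ u + b, so the CBF condition hdot + alpha*h >= 0
    becomes a @ u >= -(b + alpha*h). With one constraint, the QP reduces
    to a Euclidean projection onto the half-space."""
    rhs = -(b + alpha * h)
    if a @ u_des >= rhs:
        u = u_des                                   # already safe
    else:
        # project u_des onto the boundary a @ u = rhs
        u = u_des + (rhs - a @ u_des) / (a @ a) * a
    # box-clamp to actuator limits (note: clamping can re-violate the CBF
    # constraint; a full QP solver handles both constraint sets jointly)
    return np.clip(u, u_min, u_max)
```

Multiple constraints (e.g. all edges of the support polygon plus obstacle distances) require an actual QP solver; this projection form only illustrates the filtering behavior.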

4. Always-On Balance Architecture

How to maintain balance as a background process during other activities (mocap playback, manipulation, teleoperation):

Option A: Residual Overlay on Stock Controller

┌──────────┐  high-level  ┌──────────────┐  rt/lowcmd   ┌──────────┐
│ Task      │  commands    │ Stock Loco   │  (legs)      │ Joint    │
│ (mocap,   │────────────►│ Controller   │─────────────►│ Actuators│
│ manip)    │              │ (proprietary)│              └──────────┘
│           │  rt/lowcmd   │              │
│           │─(arms only)─►│              │
└──────────┘              └──────────────┘
    + optional residual corrections on leg joints
  • Stock controller handles balance automatically
  • User code controls arms/waist for task
  • Optional: add small residual corrections to leg joints for enhanced stability
  • Risk level: Low
  • Balance authority: Whatever the stock controller provides
Option B: Trained Whole-Body Policy

┌──────────┐  upper-body  ┌──────────────┐  rt/lowcmd   ┌──────────┐
│ Task      │  targets     │ GR00T-WBC    │  (all joints)│ Joint    │
│ (mocap,   │────────────►│              │─────────────►│ Actuators│
│ manip)    │              │ Loco Policy  │              └──────────┘
│           │              │ (RL, trained │
│           │              │  with pushes)│
└──────────┘              └──────────────┘
  • Trained locomotion policy handles balance (including push recovery if perturbation-trained)
  • Upper-body targets come from task (mocap, manipulation, teleoperation)
  • WBC coordinator resolves conflicts between task and balance
  • Risk level: Medium (need to validate RL locomotion policy)
  • Balance authority: Full (can be specifically trained for perturbation robustness)

Option C: Full Custom Policy

┌──────────┐  reference   ┌──────────────┐  rt/lowcmd   ┌──────────┐
│ Mocap     │  motion      │ Custom RL    │  (all joints)│ Joint    │
│ Reference │────────────►│ Tracking +   │─────────────►│ Actuators│
│           │              │ Balance      │              └──────────┘
│           │              │ Policy       │
└──────────┘              └──────────────┘
  • Single RL policy that simultaneously tracks reference motion AND maintains balance
  • BFM-Zero approach — trained on diverse motions with perturbation curriculum
  • Risk level: High (full low-level control, must handle everything)
  • Balance authority: Maximum (policy sees everything, controls everything)
  • Best for: Production deployment after extensive sim validation

5. Fall Recovery (When Push Recovery Fails)

Even with robust push recovery, falls will happen during development. Recovery capability matters: [T1 — Research papers]

| Approach | Paper | Method | G1 Validated? |
|---|---|---|---|
| Two-stage RL | arXiv:2502.12152 | Separate supine/prone recovery policies | Yes |
| HoST | arXiv:2502.08378 | Multi-critic RL, diverse posture recovery | Yes |
| Unified safety | arXiv:2511.07407 | Prevention + mitigation + recovery combined | Yes (zero-shot) |

Fall Detection

  • IMU-based: Detect excessive tilt angle (e.g., pitch/roll > 45°) or angular velocity
  • Joint-based: Detect unexpected ground contact (arm joints hitting torque limits)
  • CoM-based: Estimate CoM position, detect when it exits recoverable region
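The IMU-based check is the simplest to implement. A sketch that derives torso tilt directly from the IMU quaternion (thresholds are illustrative, not tuned for the G1):

```python
import math

def tilt_angle_deg(qw, qx, qy, qz):
    """Tilt of the body z-axis from world vertical, from a unit quaternion
    (w, x, y, z): the world-z component of the rotated z-axis is
    1 - 2*(qx^2 + qy^2), and the tilt is its arccosine."""
    cos_tilt = 1.0 - 2.0 * (qx * qx + qy * qy)
    cos_tilt = max(-1.0, min(1.0, cos_tilt))  # guard numerical drift
    return math.degrees(math.acos(cos_tilt))

def fall_detected(quat, gyro, tilt_limit_deg=45.0, gyro_limit=4.0):
    """Trigger on excessive tilt (deg) or angular rate (rad/s)."""
    return (tilt_angle_deg(*quat) > tilt_limit_deg
            or max(abs(g) for g in gyro) > gyro_limit)
```

Using the tilt angle rather than separate pitch/roll thresholds avoids missing diagonal falls where neither Euler angle alone crosses its limit.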

Fall Mitigation

  • arXiv:2511.07407 trains a policy that, when fall is inevitable, actively reduces impact:
    • Tuck arms in
    • Rotate to distribute impact
    • Reduce angular velocity before ground contact

6. Metrics for Push Recovery

Quantitative measures to evaluate balance robustness: [T2 — Research community standards]

| Metric | Definition | Target (Good) | Target (Excellent) |
|---|---|---|---|
| Max recoverable push (standing) | Maximum impulse (N·s) survived while standing | 30 N·s | 60+ N·s |
| Max recoverable push (walking) | Maximum impulse survived during walking | 20 N·s | 40+ N·s |
| Recovery time | Time from perturbation to return to steady state | < 2 s | < 1 s |
| Success rate | % of randomized pushes survived (test distribution) | > 90% | > 98% |
| CoM deviation | Maximum CoM displacement during recovery | < 0.3 m | < 0.15 m |
| No-step recovery range | Max push recovered without taking a step | 20 N·s | 40 N·s |

[T3 — Targets are estimates based on research papers, not G1-specific benchmarks]
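Several of these metrics fall out of the same randomized push-test log. A sketch of the aggregation, assuming each trial is recorded as a small dict (the field names are illustrative):

```python
def push_metrics(trials):
    """Summarize randomized push trials. Each trial is a dict with
    'impulse' (N*s), 'survived' (bool), and 'recovery_time_s'
    (None if the robot fell)."""
    survived = [t for t in trials if t["survived"]]
    return {
        "success_rate": len(survived) / len(trials),
        "max_recovered_impulse": max((t["impulse"] for t in survived), default=0.0),
        "mean_recovery_time_s": (
            sum(t["recovery_time_s"] for t in survived) / len(survived)
            if survived else None
        ),
    }
```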

7. Training Push-Robust Policies for G1

Simulation Stack

  • Isaac Gym (via unitree_rl_gym) for massively parallel training
  • MuJoCo (via MuJoCo Menagerie g1.xml) for validation
  • Domain randomization: friction (0.3-1.5), mass (±15%), motor strength (±10%), latency (0-10 ms)
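Drawing one randomized environment from the ranges above is a per-episode sample. A minimal sketch (field names are illustrative; the actual keys depend on the training framework):

```python
import random

def sample_domain_randomization():
    """Draw one randomized environment, using the ranges listed above."""
    return {
        "friction": random.uniform(0.3, 1.5),
        "mass_scale": random.uniform(0.85, 1.15),           # ±15%
        "motor_strength_scale": random.uniform(0.90, 1.10), # ±10%
        "latency_s": random.uniform(0.0, 0.010),            # 0-10 ms
    }
```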

Reward Design for Push Robustness

# Pseudocode — typical reward structure
reward = (
    + w_alive * alive_bonus            # Stay upright
    + w_track * velocity_tracking      # Follow commanded velocity
    + w_smooth * action_smoothness     # Minimize jerk
    - w_energy * energy_penalty        # Minimize energy use
    - w_fall * fall_penalty            # Heavy penalty for falling
    - w_slip * foot_slip_penalty       # Minimize foot sliding
    + w_upright * upright_bonus        # Reward torso verticality
)

Training Stages (Multi-Phase Curriculum)

  1. Phase 1: Stand without falling (no perturbations)
  2. Phase 2: Walk on flat terrain (no perturbations)
  3. Phase 3: Walk with small random pushes (10-30N)
  4. Phase 4: Walk with medium pushes (30-80N) + terrain variation
  5. Phase 5: Walk with large pushes (80-200N) + task (upper body motion)

[T2 — Based on curriculum strategies in published G1 papers]
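The phases above can be encoded as a table plus a performance-gated advance rule. A sketch (the 90% gate is an illustrative choice, not from the cited papers):

```python
PHASES = [
    # (phase, push force range in N, setting)
    (1, (0.0, 0.0),     "stand only"),
    (2, (0.0, 0.0),     "walk, flat terrain"),
    (3, (10.0, 30.0),   "walk + small pushes"),
    (4, (30.0, 80.0),   "walk + medium pushes + terrain variation"),
    (5, (80.0, 200.0),  "walk + large pushes + upper-body task"),
]

def phase_for(success_rate, current_phase):
    """Advance one phase when the policy clears a success-rate gate."""
    if success_rate > 0.9 and current_phase < len(PHASES):
        return current_phase + 1
    return current_phase
```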

8. Development Roadmap

Recommended progression for achieving "always-on balance during mocap":

Phase 1: Evaluate stock controller push limits
    └── Push test on real G1, document max impulse
Phase 2: Train push-robust locomotion policy in sim
    └── unitree_rl_gym + perturbation curriculum
    └── Validate in MuJoCo (Sim2Sim)
Phase 3: Deploy on real G1 (locomotion only)
    └── Start with gentle pushes, increase gradually
Phase 4: Add upper-body mocap tracking
    └── GR00T-WBC or custom WBC layer
    └── Test: can it maintain balance while arms track mocap?
Phase 5: Combined push + mocap testing
    └── Push robot while it replays mocap motion
    └── Iterate on perturbation curriculum if needed

Key Relationships