| id | title | status | source_sections | related_topics | key_equations | key_terms | images | examples | open_questions |
|---|---|---|---|---|---|---|---|---|---|
| push-recovery-balance | Push Recovery & Robust Balance | established | reference/sources/paper-gait-conditioned-rl.md, reference/sources/paper-getting-up-policies.md, reference/sources/paper-safe-control-cluttered.md, reference/sources/paper-residual-policy.md, reference/sources/paper-cbf-humanoid.md | [locomotion-control whole-body-control safety-limits equations-and-bounds learning-and-ai simulation] | [com zmp inverse_dynamics] | [push_recovery ankle_strategy hip_strategy stepping_strategy residual_policy control_barrier_function support_polygon perturbation_curriculum] | [] | [] | [What is the max recoverable push force for the stock G1 controller? Does residual policy overlay work with the proprietary locomotion computer, or does it require full replacement? What is the minimum viable sensor set for push detection (IMU only vs. IMU + F/T)? What perturbation force ranges should be used in training curriculum?] |
Push Recovery & Robust Balance
Making the G1 robust to external pushes and maintaining balance during all activities — the "always-on" stability layer.
1. Push Recovery Strategies
When a humanoid is pushed, it can respond with progressively more aggressive strategies depending on perturbation magnitude: [T1 — Established biomechanics/robotics]
Ankle Strategy (Small Perturbations)
- Mechanism: Ankle torque adjusts center of pressure (CoP) within the foot
- Range: Small pushes that don't move CoM outside the foot support area
- Speed: Fastest response (~50ms)
- G1 applicability: Yes — G1 has ankle pitch and roll joints [T1]
- Limitation: Only works for small perturbations; foot must remain flat
Hip Strategy (Medium Perturbations)
- Mechanism: Rapid hip flexion/extension shifts CoM back over support
- Range: Pushes that exceed ankle authority but don't require stepping
- Speed: Medium (~100-200ms)
- G1 applicability: Yes — G1 hip has 3 DOF with ±154° pitch [T1]
- Often combined with: Upper body countermotion (arms swing opposite to push direction)
Stepping Strategy (Large Perturbations)
- Mechanism: Take a recovery step to create a new support polygon under the shifted CoM
- Range: Large pushes where CoM exits current support polygon
- Speed: Slowest (~300-500ms, must plan and execute step)
- G1 applicability: Yes — requires whole-body coordination [T1]
- Most complex: needs free space to step into and requires foot-placement planning
Combined/Learned Strategy
Modern RL-based controllers learn a blended strategy that seamlessly transitions between ankle, hip, and stepping responses based on perturbation magnitude. This is the approach used by the G1's stock controller and by research push-recovery policies. [T1]
2. What the Stock Controller Already Does
The G1's proprietary RL-based locomotion controller (running on the locomotion computer at 192.168.123.161) already handles basic push recovery: [T1]
- Light push recovery during standing and walking — confirmed in arXiv:2505.20619
- Gait-conditioned policy implicitly learns balance through training
- 500 Hz control loop provides fast response to perturbations
However, the stock controller's push recovery limits are not documented. Key unknowns:
- Maximum recoverable impulse (N·s) during standing
- Maximum recoverable impulse during walking
- Whether it uses stepping recovery or only ankle/hip strategies
- How it performs when the upper body is doing something unexpected (e.g., mocap)
3. Enhancing Push Recovery
3a. Perturbation Curriculum Training
The most validated approach: train an RL policy in simulation with random external forces applied during training. [T1 — Multiple G1 papers]
Training Loop (in sim):
1. Run the locomotion policy
2. At random intervals, apply an external force to the robot torso
   - Direction: random (forward, backward, lateral)
   - Magnitude: curriculum (start small, increase as the policy improves)
   - Duration: 0.1-0.5 s impulse
3. Reward: stay upright, track the velocity command, minimize energy
4. Penalty: falling, excessive joint acceleration
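Step 2 of the loop above can be sketched as a simple force sampler; the ranges mirror the curriculum table in this section, and the function name is illustrative rather than taken from any G1 codebase.

```python
import numpy as np

def sample_push(rng, max_force):
    """Draw one random horizontal push: direction uniform on the circle,
    magnitude up to the current curriculum cap, duration 0.1-0.5 s."""
    theta = rng.uniform(0.0, 2.0 * np.pi)
    magnitude = rng.uniform(0.0, max_force)                       # N
    duration = rng.uniform(0.1, 0.5)                              # s
    force = magnitude * np.array([np.cos(theta), np.sin(theta), 0.0])
    return force, duration

rng = np.random.default_rng(0)
force, duration = sample_push(rng, max_force=100.0)
impulse = np.linalg.norm(force) * duration                        # N*s, the metric in section 6
```

In a vectorized simulator the sampled force would be applied to the torso body for `duration` seconds, then cleared.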
Key papers validated on G1:
| Paper | Approach | Validated? | Key Finding |
|---|---|---|---|
| arXiv:2505.20619 | Gait-conditioned RL with perturbations | Yes (real G1) | Push robustness during walking |
| arXiv:2511.07407 | Unified fall prevention + mitigation + recovery | Yes (zero-shot) | Combined strategy from sparse demos |
| arXiv:2502.12152 | Two-stage recovery (supine + prone) | Yes (real G1) | Get-up after falling |
| arXiv:2502.08378 | HoST multi-critic RL | Yes (real G1) | Diverse posture recovery |
Perturbation Curriculum Parameters (Typical)
| Parameter | Start | End | Notes |
|---|---|---|---|
| Max force (N) | 20 | 100-200 | Ramp over training |
| Force duration (s) | 0.1 | 0.5 | Short impulses to sustained pushes |
| Direction | Forward only | Omnidirectional | Add lateral/backward progressively |
| Frequency | Rare (every 10s) | Frequent (every 2s) | Increase as policy improves |
| Application point | Torso center | Random (torso, shoulders) | Vary to generalize |
[T2 — Ranges from research papers, not G1-specific tuning]
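The ramps in the table can be expressed as one interpolation over training progress; the linear schedule and parameter names here are illustrative assumptions, not G1-specific tuning.

```python
def curriculum(progress):
    """Interpolate perturbation parameters from start to end values as
    training progresses; `progress` is in [0, 1]. Endpoints follow the
    [T2] table ranges above."""
    lerp = lambda a, b: a + progress * (b - a)
    return {
        "max_force_n": lerp(20.0, 200.0),     # push magnitude cap
        "max_duration_s": lerp(0.1, 0.5),     # impulse length
        "push_interval_s": lerp(10.0, 2.0),   # pushes become more frequent
    }
```

A real schedule would usually gate `progress` on policy performance rather than raw step count.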
3b. Residual Policy Learning
Train a small "correction" policy that adds to the output of an existing base controller: [T1 — Established technique]
Base Controller Output (stock or trained): a_base
Residual Policy Output (small corrections): a_residual
Final Action: a = a_base + α * a_residual (α < 1 for safety)
Why this matters for G1:
- The stock locomotion controller is good but not customizable
- A residual policy can be trained on top of it to improve push recovery
- The scaling factor α limits how much the residual can deviate from base behavior
- This is the safest path to "enhanced" balance without replacing the stock controller
Implementation on G1 (Approach A — Overlay):
- Read current lowstate (joint positions/velocities, IMU)
- Estimate what the stock controller "wants" (by observing lowcmd at previous timestep)
- Compute residual correction based on detected perturbation
- Add correction to the stock controller's output on rt/lowcmd
- Clamp to joint limits
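The overlay's final two steps can be sketched as follows; the scaling and clamping mirror the a = a_base + α * a_residual rule above, and the limit values in the usage note are placeholders, not actual G1 joint limits.

```python
import numpy as np

def blend_residual(a_base, a_residual, alpha, q_min, q_max):
    """Combine the stock controller's output with a scaled residual
    correction, then clamp to joint limits. alpha < 1 bounds how far
    the residual can pull the command from the base behavior."""
    a = a_base + alpha * np.clip(a_residual, -1.0, 1.0)  # bound the raw residual too
    return np.clip(a, q_min, q_max)
```

For example, with `alpha=0.1` and limits of ±0.15 rad, a saturated residual of 5.0 still moves the command by at most 0.1 rad from the stock output.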
Challenge: The stock controller runs on the locomotion computer. The residual runs on the Jetson. There's a ~2ms DDS round-trip latency between them. This may cause instability if the residual and stock controller fight each other. [T3 — Architectural inference, not tested]
3c. Control Barrier Functions (CBFs)
A formal safety framework that guarantees the robot stays within a "safe set": [T1 — Control theory]
Safety constraint: h(x) ≥ 0 (e.g., CoM is within support polygon)
CBF condition: ḣ(x,u) + α·h(x) ≥ 0 (safety is maintained over time)
At each timestep, solve a QP:
minimize || u - u_desired ||^2 (stay close to desired action)
subject to ḣ(x,u) + α·h(x) ≥ 0 (CBF safety constraint)
u_min ≤ u ≤ u_max (actuator limits)
G1-specific work:
- arXiv:2502.02858 uses Projected Safe Set Algorithm (p-SSA) on real G1 for collision avoidance in cluttered environments
- The same CBF framework can be applied to balance: define h(x) as the distance of CoM projection from the edge of the support polygon
- Pros: Formal guarantee (if the model is accurate); minimal modification to the existing controller (just a safety filter)
- Cons: Requires an accurate dynamics model; real-time QP is computationally expensive; can be conservative (may reject valid actions)
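For a single constraint that is affine in u (a·u + b ≥ 0, which is what the CBF condition becomes for control-affine dynamics), the QP above has a closed-form projection. This sketch omits the actuator box limits, which would require a numerical QP solver.

```python
import numpy as np

def cbf_filter(u_des, a, b):
    """Solve min ||u - u_des||^2 subject to a @ u + b >= 0 in closed form.
    a and b would come from linearizing hdot(x, u) + alpha * h(x) >= 0
    in u for the chosen barrier function h."""
    margin = a @ u_des + b
    if margin >= 0.0:
        return u_des                            # desired action already safe
    return u_des - (margin / (a @ a)) * a       # project onto the constraint boundary
```

For example, `cbf_filter(np.array([1.0, 0.0]), a=np.array([0.0, 1.0]), b=-0.5)` returns `[1.0, 0.5]`: the closest action that sits exactly on the safety boundary.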
4. Always-On Balance Architecture
How to maintain balance as a background process during other activities (mocap playback, manipulation, teleoperation):
Option A: Residual Overlay on Stock Controller
┌──────────┐ high-level ┌──────────────┐ rt/lowcmd ┌──────────┐
│ Task │ commands │ Stock Loco │ (legs) │ Joint │
│ (mocap, │────────────►│ Controller │─────────────►│ Actuators│
│ manip) │ │ (proprietary)│ └──────────┘
│ │ rt/lowcmd │ │
│ │─(arms only)─►│ │
└──────────┘ └──────────────┘
+ optional residual corrections on leg joints
- Stock controller handles balance automatically
- User code controls arms/waist for task
- Optional: add small residual corrections to leg joints for enhanced stability
- Risk level: Low
- Balance authority: Whatever the stock controller provides
Option B: GR00T-WBC (Recommended)
┌──────────┐ upper-body ┌──────────────┐ rt/lowcmd ┌──────────┐
│ Task │ targets │ GR00T-WBC │ (all joints)│ Joint │
│ (mocap, │────────────►│ │─────────────►│ Actuators│
│ manip) │ │ Loco Policy │ └──────────┘
│ │ │ (RL, trained │
│ │ │ with pushes)│
└──────────┘ └──────────────┘
- Trained locomotion policy handles balance (including push recovery if perturbation-trained)
- Upper-body targets come from task (mocap, manipulation, teleoperation)
- WBC coordinator resolves conflicts between task and balance
- Risk level: Medium (need to validate RL locomotion policy)
- Balance authority: Full (can be specifically trained for perturbation robustness)
Option C: Full Custom Policy
┌──────────┐ reference ┌──────────────┐ rt/lowcmd ┌──────────┐
│ Mocap │ motion │ Custom RL │ (all joints)│ Joint │
│ Reference │────────────►│ Tracking + │─────────────►│ Actuators│
│ │ │ Balance │ └──────────┘
│ │ │ Policy │
└──────────┘ └──────────────┘
- Single RL policy that simultaneously tracks reference motion AND maintains balance
- BFM-Zero approach — trained on diverse motions with perturbation curriculum
- Risk level: High (full low-level control, must handle everything)
- Balance authority: Maximum (policy sees everything, controls everything)
- Best for: Production deployment after extensive sim validation
5. Fall Recovery (When Push Recovery Fails)
Even with robust push recovery, falls will happen during development. Recovery capability matters: [T1 — Research papers]
| Approach | Paper | Method | G1 Validated? |
|---|---|---|---|
| Two-stage RL | arXiv:2502.12152 | Separate supine/prone recovery policies | Yes |
| HoST | arXiv:2502.08378 | Multi-critic RL, diverse posture recovery | Yes |
| Unified safety | arXiv:2511.07407 | Prevention + mitigation + recovery combined | Yes (zero-shot) |
Fall Detection
- IMU-based: Detect excessive tilt angle (e.g., pitch/roll > 45°) or angular velocity
- Joint-based: Detect unexpected ground contact (arm joints hitting torque limits)
- CoM-based: Estimate CoM position, detect when it exits recoverable region
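The IMU-based check can be sketched directly from the orientation quaternion; the 45° and angular-rate thresholds are illustrative, not tuned for the G1.

```python
import numpy as np

def fall_detected(quat_wxyz, gyro_rad_s, tilt_limit_rad=np.radians(45.0),
                  rate_limit_rad_s=4.0):
    """Trigger on excessive tilt or angular rate. Tilt is the angle between
    the body z-axis and world up; for a unit quaternion (w, x, y, z) the
    cosine of that angle is 1 - 2*(x^2 + y^2)."""
    w, x, y, z = quat_wxyz
    cos_tilt = 1.0 - 2.0 * (x * x + y * y)
    tilted = cos_tilt < np.cos(tilt_limit_rad)
    spinning = np.linalg.norm(gyro_rad_s) > rate_limit_rad_s
    return tilted or spinning
```

In practice this would run at the control rate and require the condition to hold for several consecutive ticks before declaring a fall, to avoid triggering on transient spikes.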
Fall Mitigation
- arXiv:2511.07407 trains a policy that, when a fall is inevitable, actively reduces impact:
  - Tuck arms in
  - Rotate to distribute impact
  - Reduce angular velocity before ground contact
6. Metrics for Push Recovery
Quantitative measures to evaluate balance robustness: [T2 — Research community standards]
| Metric | Definition | Target (Good) | Target (Excellent) |
|---|---|---|---|
| Max recoverable push (standing) | Maximum impulse (N·s) the robot survives while standing | 30 N·s | 60+ N·s |
| Max recoverable push (walking) | Maximum impulse during walking | 20 N·s | 40+ N·s |
| Recovery time | Time from perturbation to return to steady state | < 2s | < 1s |
| Success rate | % of randomized pushes survived (test distribution) | > 90% | > 98% |
| CoM deviation | Maximum CoM displacement during recovery | < 0.3m | < 0.15m |
| No-step recovery range | Max push recovered without taking a step | 20 N·s | 40 N·s |
[T3 — Targets are estimates based on research papers, not G1-specific benchmarks]
7. Training Push-Robust Policies for G1
Recommended Sim Environment
- Isaac Gym (via unitree_rl_gym) for massively parallel training
- MuJoCo (via MuJoCo Menagerie g1.xml) for validation
- Domain randomization: Friction (0.3-1.5), mass (±15%), motor strength (±10%), latency (0-10ms)
Reward Design for Push Robustness
# Pseudocode — typical reward structure
reward = (
    + w_alive   * alive_bonus         # Stay upright
    + w_track   * velocity_tracking   # Follow commanded velocity
    + w_smooth  * action_smoothness   # Minimize jerk
    - w_energy  * energy_penalty      # Minimize energy use
    - w_fall    * fall_penalty        # Heavy penalty for falling
    - w_slip    * foot_slip_penalty   # Minimize foot sliding
    + w_upright * upright_bonus       # Reward torso verticality
)
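Made runnable, with each term precomputed by the environment; the weight values here are illustrative placeholders, not tuned gains.

```python
def push_robust_reward(terms, weights=None):
    """Weighted sum matching the pseudocode above; `terms` holds the
    per-step reward components already computed by the environment."""
    w = weights or {"alive": 1.0, "track": 2.0, "smooth": 0.1,
                    "energy": 0.01, "fall": 100.0, "slip": 0.5, "upright": 0.5}
    return (  w["alive"]   * terms["alive_bonus"]
            + w["track"]   * terms["velocity_tracking"]
            + w["smooth"]  * terms["action_smoothness"]
            - w["energy"]  * terms["energy_penalty"]
            - w["fall"]    * terms["fall_penalty"]
            - w["slip"]    * terms["foot_slip_penalty"]
            + w["upright"] * terms["upright_bonus"])
```

Keeping the weights in a dict makes them easy to sweep during curriculum tuning.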
Training Stages (Multi-Phase Curriculum)
- Phase 1: Stand without falling (no perturbations)
- Phase 2: Walk on flat terrain (no perturbations)
- Phase 3: Walk with small random pushes (10-30N)
- Phase 4: Walk with medium pushes (30-80N) + terrain variation
- Phase 5: Walk with large pushes (80-200N) + task (upper body motion)
[T2 — Based on curriculum strategies in published G1 papers]
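Phase transitions are usually gated on performance rather than a fixed step count; a minimal sketch of that gating, with an assumed 90% survival threshold:

```python
def next_phase(phase, success_rate, threshold=0.90, max_phase=5):
    """Advance to the next curriculum phase only once the policy survives
    the current phase's pushes at the required rate."""
    return min(phase + 1, max_phase) if success_rate >= threshold else phase
```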
8. Development Roadmap
Recommended progression for achieving "always-on balance during mocap":
Phase 1: Evaluate stock controller push limits
└── Push test on real G1, document max impulse
Phase 2: Train push-robust locomotion policy in sim
└── unitree_rl_gym + perturbation curriculum
└── Validate in MuJoCo (Sim2Sim)
Phase 3: Deploy on real G1 (locomotion only)
└── Start with gentle pushes, increase gradually
Phase 4: Add upper-body mocap tracking
└── GR00T-WBC or custom WBC layer
└── Test: can it maintain balance while arms track mocap?
Phase 5: Combined push + mocap testing
└── Push robot while it replays mocap motion
└── Iterate on perturbation curriculum if needed
Key Relationships
- Extends: locomotion-control (enhanced version of stock balance)
- Component of: whole-body-control (balance as a constraint in WBC)
- Protects: motion-retargeting (ensures stability during mocap playback)
- Governed by: safety-limits (fall detection, e-stop integration)
- Trained via: learning-and-ai (RL with perturbation curriculum)
- Tested in: simulation (MuJoCo/Isaac with external force application)
- Bounded by: equations-and-bounds (CoM, ZMP, support polygon)