| id | title | status | source_sections | related_topics | key_equations | key_terms | images | examples | open_questions |
|---|---|---|---|---|---|---|---|---|---|
| motion-retargeting | Motion Capture & Retargeting | established | reference/sources/paper-bfm-zero.md, reference/sources/paper-h2o.md, reference/sources/paper-omnih2o.md, reference/sources/paper-humanplus.md, reference/sources/dataset-amass-g1.md, reference/sources/github-groot-wbc.md, reference/sources/community-mocap-retarget-tools.md | [whole-body-control joint-configuration simulation learning-and-ai equations-and-bounds push-recovery-balance] | [inverse_kinematics kinematic_scaling] | [motion_retargeting mocap amass smpl kinematic_scaling inverse_kinematics] | [] | [] | [What AMASS motions have been successfully replayed on physical G1? What is the end-to-end latency from mocap capture to robot execution? Which retargeting approach gives best visual fidelity on G1 (IK vs. RL)? Can video-based pose estimation (MediaPipe/OpenPose) provide sufficient accuracy for G1 retargeting?] |
Motion Capture & Retargeting
Capturing human motion and replaying it on the G1, including the kinematic mapping problem, data sources, and execution approaches.
1. The Retargeting Problem
A human has ~200+ degrees of freedom (skeleton + soft tissue). The G1 has 23-43 DOF depending on configuration. Retargeting must solve four mismatches: [T1 — Established robotics problem]
| Mismatch | Human | G1 (29-DOF) | Challenge |
|---|---|---|---|
| DOF count | ~200+ | 29 | Many human motions have no G1 equivalent |
| Limb proportions | Variable | Fixed (1.32m height, 0.6m legs, ~0.45m arms) | Workspace scaling needed |
| Joint ranges | Very flexible | Constrained (e.g., knee 0-165°, hip pitch ±154°) | Motions may exceed limits |
| Dynamics | ~70kg average | ~35kg, different mass distribution | Forces/torques don't scale linearly |
What Works Well on G1
- Walking, standing, stepping motions
- Upper-body gestures (waving, pointing, reaching)
- Pick-and-place style manipulation
- Simple dance or expressive motions
What's Difficult or Impossible
- Motions requiring finger dexterity (without hands attached)
- Deep squats or ground-level motions (joint limit violations)
- Fast acrobatic motions (torque/speed limits)
- Motions requiring more DOF than available (e.g., spine articulation with 1-DOF waist)
2. Retargeting Approaches
2a. IK-Based Retargeting (Classical)
Solve inverse kinematics to map human end-effector positions to G1 joint angles: [T1]
Pipeline:
Mocap data (human skeleton) → Extract key points (hands, feet, head, pelvis)
→ Scale to G1 proportions → Solve IK per frame → Smooth trajectory
→ Check joint limits → Execute or reject
Tools:
- Pinocchio: C++/Python rigid body dynamics with fast IK solver (see whole-body-control)
- MuJoCo IK: Built-in inverse kinematics in MuJoCo simulator
- Drake: MIT's robotics toolbox with optimization-based IK
- IKPy / ikflow: Lightweight Python IK libraries
Pros: Fast, interpretable, no training required, deterministic
Cons: Frame-by-frame IK can produce jerky motions; doesn't account for dynamics/balance; may violate torque limits even when joint limits are satisfied
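The per-frame IK step can be sketched with damped least-squares (Levenberg-Marquardt) on a toy planar 2-link arm — the link lengths, forward kinematics, and target below are illustrative, not the G1 model, which would need Pinocchio or MuJoCo for real FK/Jacobians:

```python
import numpy as np

L1, L2 = 0.22, 0.23  # illustrative link lengths, not actual G1 segments

def fk(q):
    """Planar 2-link forward kinematics: joint angles -> end-effector (x, y)."""
    return np.array([L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1]),
                     L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])])

def jacobian(q):
    """Analytic Jacobian of fk with respect to the two joint angles."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [ L1 * c1 + L2 * c12,  L2 * c12]])

def ik_step(q, target, damping=1e-2):
    """One damped least-squares IK update; damping keeps the solve
    well-conditioned near singular configurations."""
    err = target - fk(q)
    J = jacobian(q)
    dq = np.linalg.solve(J.T @ J + damping * np.eye(2), J.T @ err)
    return q + dq

q = np.array([0.3, 0.5])
target = np.array([0.25, 0.25])  # reachable: |target| < L1 + L2
for _ in range(100):
    q = ik_step(q, target)
assert np.linalg.norm(fk(q) - target) < 1e-4  # converged
```

Running this per mocap frame, seeded from the previous frame's solution, is the "Solve IK per frame" step of the pipeline above; the jerkiness noted in the cons comes from each frame being solved independently.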
2b. Optimization-Based Retargeting
Solve a trajectory optimization over the full motion: [T1]
minimize Σ_t || FK(q_t) - x_human_t ||^2 (tracking error)
+ Σ_t || q_t - q_{t-1} ||^2 (smoothness)
subject to q_min ≤ q_t ≤ q_max (joint limits)
CoM_t ∈ support_polygon_t (balance)
|| tau_t || ≤ tau_max (torque limits)
no self-collision (collision avoidance)
Tools: CasADi, Pinocchio + ProxQP, Drake, Crocoddyl
Pros: Globally smooth, respects all constraints, can enforce balance
Cons: Slow (offline only); requires an accurate dynamics model; formulation is complex
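On a single joint the tracking + smoothness trade-off above reduces to a box-constrained least-squares problem, which a generic solver handles directly; a minimal sketch with SciPy (the reference trajectory, joint limits, and smoothness weight are made up, and the balance/torque/collision constraints are omitted):

```python
import numpy as np
from scipy.optimize import minimize

T = 50
x_human = 1.2 * np.sin(np.linspace(0, 2 * np.pi, T))  # illustrative reference
q_min, q_max = -1.0, 1.0                              # illustrative joint limits
w_smooth = 5.0

def cost(q):
    track = np.sum((q - x_human) ** 2)    # Sigma_t ||q_t - x_human_t||^2
    smooth = np.sum(np.diff(q) ** 2)      # Sigma_t ||q_t - q_{t-1}||^2
    return track + w_smooth * smooth

# L-BFGS-B enforces the joint-limit box constraint q_min <= q_t <= q_max
res = minimize(cost, x0=np.clip(x_human, q_min, q_max),
               bounds=[(q_min, q_max)] * T, method="L-BFGS-B")
q_opt = res.x
assert np.all(q_opt >= q_min - 1e-8) and np.all(q_opt <= q_max + 1e-8)
```

The full problem adds the CoM, torque, and collision constraints per timestep, which is why the real tools (CasADi, Crocoddyl) are built around sparse nonlinear programming rather than a dense generic solver.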
2c. RL-Based Motion Tracking (Recommended for G1)
Train an RL policy that imitates reference motions while maintaining balance: [T1 — Multiple papers validated on G1]
Pipeline:
Mocap data → Retarget to G1 skeleton (rough IK) → Use as reference
→ Train RL policy in sim: reward = tracking + balance + energy
→ Deploy on real G1 via sim-to-real transfer
This is the approach used by BFM-Zero, H2O, OmniH2O, and HumanPlus. The RL policy learns to:
- Track the reference motion as closely as possible
- Maintain balance even when the reference motion would be unstable
- Respect joint and torque limits naturally (they're part of the sim environment)
- Recover from perturbations (if trained with perturbation curriculum)
Key advantage: Balance is baked into the policy — you don't need a separate balance controller.
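The reward = tracking + balance + energy shaping can be sketched as below; the exponential forms and weights are generic assumptions for illustration, not the reward terms of any specific paper:

```python
import numpy as np

def imitation_reward(q, q_ref, com_xy, support_center_xy, tau,
                     w_track=1.0, w_balance=0.5, w_energy=0.01):
    """Illustrative shaped reward for RL motion tracking.
    Forms and weights are assumptions, not taken from BFM-Zero/H2O."""
    r_track = np.exp(-2.0 * np.sum((q - q_ref) ** 2))             # joint tracking
    r_balance = np.exp(-10.0 * np.sum((com_xy - support_center_xy) ** 2))
    r_energy = -np.sum(tau ** 2)                                  # torque penalty
    return w_track * r_track + w_balance * r_balance + w_energy * r_energy
```

Because joint and torque limits are enforced by the simulator itself, the policy never needs an explicit feasibility term — infeasible actions simply produce bad rollouts and low reward.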
Key RL Motion Tracking Frameworks
| Framework | Paper | G1 Validated? | Key Feature |
|---|---|---|---|
| BFM-Zero | arXiv:2511.04131 | Yes | Zero-shot generalization to unseen motions, open-source |
| H2O | arXiv:2403.01623 | On humanoid (not G1 specifically) | Real-time teleoperation |
| OmniH2O | arXiv:2406.08858 | On humanoid | Multi-modal input (VR, RGB, mocap) |
| HumanPlus | arXiv:2406.10454 | On humanoid | RGB camera → shadow → imitate |
| GMT | Generic Motion Tracking | In sim | Tracks diverse AMASS motions |
2d. Hybrid Approach: IK + WBC
Use IK for the upper body, WBC for balance: [T1 — GR00T-WBC approach]
Mocap data → IK retarget (upper body only: arms, waist)
→ Feed to GR00T-WBC as upper-body targets
→ WBC locomotion policy handles legs/balance automatically
→ Execute on G1
This is likely the most practical near-term approach for the G1, using GR00T-WBC as the coordination layer. See whole-body-control for details.
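The split amounts to partitioning the joint vector: the retargeter owns the waist/arm targets, the locomotion policy owns the legs. A sketch, assuming the common 29-DOF ordering of 12 leg joints followed by 3 waist and 14 arm joints — verify the index map against your robot's model file before use:

```python
import numpy as np

# Assumed 29-DOF ordering: legs [0:12], waist [12:15], arms [15:29].
# Check against the actual G1 joint index map (g1.xml / URDF).
LEG_IDX = np.arange(0, 12)
UPPER_IDX = np.arange(12, 29)

def upper_body_targets(q_retargeted):
    """Extract only the waist + arm joints to feed the WBC as targets;
    the legs stay under the locomotion policy's control."""
    return q_retargeted[UPPER_IDX]

q = np.zeros(29)
assert upper_body_targets(q).shape == (17,)
```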
3. Motion Capture Sources
3a. AMASS — Archive of Motion Capture as Surface Shapes
The largest publicly available human motion dataset: [T1]
| Property | Value |
|---|---|
| Motions | 11,000+ sequences from 15 mocap datasets |
| Format | SMPL body model parameters |
| G1 retarget | Available on HuggingFace (unitree) — pre-retargeted |
| License | Research use (check individual sub-datasets) |
G1-specific: Unitree has published AMASS motions retargeted to the G1 skeleton on HuggingFace. This provides ready-to-use reference trajectories for RL training or direct playback.
3b. CMU Motion Capture Database
Classic academic motion capture archive: [T1]
| Property | Value |
|---|---|
| Subjects | 144 subjects |
| Motions | 2,500+ sequences |
| Categories | Walking, running, sports, dance, interaction, etc. |
| Formats | BVH, C3D, ASF+AMC |
| License | Free for research |
| URL | mocap.cs.cmu.edu |
3c. Real-Time Sources (Live Mocap)
| Source | Device | Latency | Accuracy | G1 Integration |
|---|---|---|---|---|
| XR Teleoperate | Vision Pro, Quest 3, PICO 4 | Low (~50ms) | High (VR tracking) | Official (unitreerobotics/xr_teleoperate) |
| Kinect | Azure Kinect DK | Medium (~100ms) | Medium | Official (kinect_teleoperate) |
| MediaPipe | RGB camera | Low (~30ms) | Low-Medium | Community, needs retarget code |
| OpenPose | RGB camera | Medium | Medium | Community, needs retarget code |
| OptiTrack/Vicon | Marker-based system | Very low (~5ms) | Very high | Custom integration needed |
For the user's goal (mocap → robot), the XR teleoperation system is the most direct path for real-time, while AMASS provides offline motion libraries.
3d. Video-Based Pose Estimation
Extract human pose from standard RGB video without mocap hardware: [T2]
- MediaPipe Pose: 33 landmarks, real-time on CPU, Google
- OpenPose: 25 body keypoints, GPU required
- HMR2.0 / 4DHumans: SMPL mesh recovery from single image — richer than keypoints
- MotionBERT: Temporal pose estimation from video sequences
These are lower fidelity than marker-based mocap but require only a webcam. HumanPlus (arXiv:2406.10454) uses RGB camera input specifically for humanoid shadowing.
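Turning estimator keypoints into a joint angle is a simple vector computation; for example, the elbow angle from shoulder/elbow/wrist landmarks (the coordinates below are made up, and real MediaPipe landmarks would need normalization and a depth estimate):

```python
import numpy as np

def joint_angle(a, b, c):
    """Interior angle at b (radians) formed by points a-b-c,
    e.g. shoulder-elbow-wrist for the elbow joint."""
    u = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    v = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))  # clip guards rounding error

shoulder, elbow, wrist = [0.0, 0.0, 0.0], [0.3, 0.0, 0.0], [0.3, 0.25, 0.0]
angle_deg = np.degrees(joint_angle(shoulder, elbow, wrist))  # 90.0
```

Per-joint angles computed this way can seed the kinematic retargeting step directly for 1-DOF joints like the elbow and knee; 3-DOF joints (hips, shoulders) need full rotations, not a single angle.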
4. The Retargeting Pipeline
End-to-end pipeline from human motion to G1 execution:
┌───────────────┐     ┌───────────────┐     ┌────────────────┐
│ Motion        │     │ Skeleton      │     │ Kinematic      │
│ Source        │────►│ Extraction    │────►│ Retargeting    │
│ (mocap/video) │     │ (SMPL/joints) │     │ (scale + IK)   │
└───────────────┘     └───────────────┘     └───────┬────────┘
                                                    │
                                                    ▼
┌───────────────┐     ┌───────────────┐     ┌────────────────┐
│ Execute on    │     │ WBC / RL      │     │ Feasibility    │
│ Real G1       │◄────│ Policy        │◄────│ Check          │
│ (sdk2)        │     │ (balance +    │     │ (joint limits, │
└───────────────┘     │  tracking)    │     │  stability)    │
                      └───────────────┘     └────────────────┘
Step 1: Motion Source
- Offline: AMASS dataset, CMU mocap, recorded demonstrations
- Real-time: XR headset, Kinect, RGB camera
Step 2: Skeleton Extraction
- AMASS: Already in SMPL format, extract joint angles
- BVH/C3D: Parse standard mocap formats
- Video: Run pose estimator (MediaPipe, OpenPose, HMR2.0)
- Output: Human joint positions/rotations per frame
Step 3: Kinematic Retargeting
- Map human skeleton to G1 skeleton (limb length scaling)
- Solve IK for each frame or use direct joint angle mapping
- Handle DOF mismatch (project higher-DOF human motion to G1 subspace)
- Clamp to G1 joint limits (see equations-and-bounds)
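The scaling and clamping steps above can be sketched as follows; the human segment lengths are rough adult averages and the G1 figures are the approximate values from the mismatch table in section 1, so treat all numbers as assumptions:

```python
import numpy as np

# Illustrative segment lengths (metres): G1 values are the approximate
# figures from section 1; the human values are a rough adult average.
HUMAN_LEG, G1_LEG = 0.90, 0.60
HUMAN_ARM, G1_ARM = 0.70, 0.45

def scale_targets(feet_pos, hand_pos, pelvis_pos):
    """Scale human end-effector positions (relative to the pelvis)
    by the robot/human limb-length ratio before solving IK."""
    feet = pelvis_pos + (feet_pos - pelvis_pos) * (G1_LEG / HUMAN_LEG)
    hands = pelvis_pos + (hand_pos - pelvis_pos) * (G1_ARM / HUMAN_ARM)
    return feet, hands

def clamp_to_limits(q, q_min, q_max):
    """Project retargeted joint angles into the G1's joint limits."""
    return np.clip(q, q_min, q_max)

pelvis = np.zeros(3)
feet, hands = scale_targets(np.array([0.0, 0.0, -0.9]),
                            np.array([0.7, 0.0, 0.0]), pelvis)
# feet z scales to -0.6, hand x scales to 0.45
```

Scaling relative to the pelvis preserves the motion's shape while shrinking its workspace; scaling in absolute coordinates instead would shift the whole motion and break foot-ground contact.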
Step 4: Feasibility Check
- Verify all joint angles within limits
- Check CoM remains within support polygon (static stability)
- Estimate required torques (inverse dynamics) — reject if exceeding actuator limits
- Check for self-collisions
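The first two checks above (joint limits and static CoM stability) can be sketched per frame; the support polygon here is a hypothetical rectangular footprint, and the torque and self-collision checks are omitted:

```python
import numpy as np

def in_convex_polygon(point, vertices):
    """Static stability test: is the CoM ground projection inside a
    convex, counter-clockwise support polygon? Uses edge cross products."""
    v = np.asarray(vertices, dtype=float)
    p = np.asarray(point, dtype=float)
    edges = np.roll(v, -1, axis=0) - v
    to_p = p - v
    cross = edges[:, 0] * to_p[:, 1] - edges[:, 1] * to_p[:, 0]
    return bool(np.all(cross >= 0))

def frame_feasible(q, q_min, q_max, com_xy, support_polygon):
    """Reject a frame if any joint exceeds its limits or the CoM
    leaves the support polygon (torque/collision checks omitted)."""
    if np.any(q < q_min) or np.any(q > q_max):
        return False
    return in_convex_polygon(com_xy, support_polygon)

# Hypothetical two-foot footprint as a CCW rectangle (metres)
poly = [(-0.1, -0.15), (0.1, -0.15), (0.1, 0.15), (-0.1, 0.15)]
assert frame_feasible(np.zeros(3), -np.ones(3), np.ones(3), (0.0, 0.0), poly)
assert not frame_feasible(np.zeros(3), -np.ones(3), np.ones(3), (0.5, 0.0), poly)
```

Note this is only a static check: dynamic motions can be feasible with the CoM momentarily outside the support polygon, which is exactly what the RL/WBC execution policies in Step 5 handle.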
Step 5: Execution Policy
- Direct playback: Send retargeted joint angles via rt/lowcmd (no balance guarantee)
- WBC execution: Feed to GR00T-WBC as upper-body targets, let locomotion policy handle balance
- RL tracking: Use trained motion tracking policy (BFM-Zero style) that simultaneously tracks and balances
Step 6: Deploy on Real G1
- Via unitree_sdk2_python (prototyping) or unitree_sdk2 C++ (production)
- 500 Hz control loop, 2ms DDS latency
- Always validate in simulation first (see simulation)
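A fixed-rate loop for the 500 Hz control cycle can be sketched as below; `send_command` is a placeholder callback (a real deployment would publish `rt/lowcmd` via the SDK, which also provides its own timing utilities):

```python
import time

CONTROL_HZ = 500
DT = 1.0 / CONTROL_HZ  # 2 ms period

def control_loop(n_steps, send_command):
    """Fixed-rate loop skeleton: send one command per 2 ms tick,
    sleeping the remainder of each period. Accumulates against an
    absolute deadline so timing errors don't drift."""
    next_t = time.perf_counter()
    for step in range(n_steps):
        send_command(step)            # placeholder: publish rt/lowcmd here
        next_t += DT
        remaining = next_t - time.perf_counter()
        if remaining > 0:
            time.sleep(remaining)

ticks = []
control_loop(100, ticks.append)       # ~0.2 s of simulated control
assert len(ticks) == 100
```

`time.sleep` is not hard-real-time; for production the C++ SDK on a real-time kernel is the appropriate path, as the note above suggests.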
5. SMPL Body Model
SMPL (Skinned Multi-Person Linear model) is the standard representation for human body shape and pose in mocap datasets: [T1]
- Parameters: 72 pose parameters (24 joints x 3 rotations) + 10 shape parameters
- Output: 6,890 vertices mesh + joint locations
- Extensions: SMPL-X (hands + face), SMPL+H (hands)
- Relevance: AMASS uses SMPL, so retargeting from AMASS means mapping SMPL joints → G1 joints
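The 72-parameter pose vector is just 24 stacked axis-angle rotations, so extracting per-joint rotations is a reshape; joint indices below follow the standard SMPL ordering but should be double-checked against the smplx package before use:

```python
import numpy as np

# SMPL pose: 72 values = 24 joints x 3 axis-angle components each.
pose = np.random.default_rng(0).normal(size=72) * 0.1
joint_rots = pose.reshape(24, 3)          # one axis-angle vector per joint

# Standard SMPL ordering (verify against smplx): pelvis=0, l_knee=4, r_knee=5
PELVIS, L_KNEE, R_KNEE = 0, 4, 5
left_knee_aa = joint_rots[L_KNEE]
left_knee_angle = np.linalg.norm(left_knee_aa)  # rotation magnitude (rad)
assert joint_rots.shape == (24, 3)
```

For 1-DOF G1 joints like the knee, the axis-angle magnitude (projected onto the joint axis) is the quantity to map; 3-DOF joints take the full rotation.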
SMPL to G1 Joint Mapping (Approximate)
| SMPL Joint | G1 Joint(s) | Notes |
|---|---|---|
| Pelvis | Waist (yaw) | G1 has 1-3 waist DOF vs. SMPL's 3 |
| L/R Hip | left/right_hip_pitch/roll/yaw | Direct mapping, 3-DOF each |
| L/R Knee | left/right_knee | Direct mapping, 1-DOF |
| L/R Ankle | left/right_ankle_pitch/roll | Direct mapping, 2-DOF |
| L/R Shoulder | left/right_shoulder_pitch/roll/yaw | Direct mapping, 3-DOF |
| L/R Elbow | left/right_elbow | Direct mapping, 1-DOF |
| L/R Wrist | left/right_wrist_yaw(+pitch+roll) | 1-DOF (23-DOF) or 3-DOF (29-DOF) |
| Spine | Waist (limited) | SMPL has 3 spine joints, G1 has 1-3 waist |
| Head/Neck | — | G1 has no head/neck DOF |
| Fingers | Hand joints (if equipped) | Only with Dex3-1 or INSPIRE |
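The table above can be written down as a name-level mapping; the G1 joint names follow the common URDF convention but must be verified against the actual model file (g1.xml) before use:

```python
# Name-level SMPL -> G1 joint mapping mirroring the table above.
# Joint names are assumptions from the common naming convention;
# verify against g1.xml / the URDF before relying on them.
SMPL_TO_G1 = {
    "pelvis":         ["waist_yaw"],
    "left_hip":       ["left_hip_pitch", "left_hip_roll", "left_hip_yaw"],
    "right_hip":      ["right_hip_pitch", "right_hip_roll", "right_hip_yaw"],
    "left_knee":      ["left_knee"],
    "right_knee":     ["right_knee"],
    "left_ankle":     ["left_ankle_pitch", "left_ankle_roll"],
    "right_ankle":    ["right_ankle_pitch", "right_ankle_roll"],
    "left_shoulder":  ["left_shoulder_pitch", "left_shoulder_roll", "left_shoulder_yaw"],
    "right_shoulder": ["right_shoulder_pitch", "right_shoulder_roll", "right_shoulder_yaw"],
    "left_elbow":     ["left_elbow"],
    "right_elbow":    ["right_elbow"],
    "left_wrist":     ["left_wrist_yaw"],
    "right_wrist":    ["right_wrist_yaw"],
    "head":           [],   # G1 has no head/neck DOF
    "neck":           [],
}

n_g1 = sum(len(v) for v in SMPL_TO_G1.values())
assert n_g1 == 23  # matches the 23-DOF configuration (1-DOF wrists)
```

The 29-DOF variant expands the wrist entries to 3-DOF and the waist to 3-DOF; SMPL joints with empty mappings (head, neck, spine beyond the waist) are simply dropped during retargeting.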
6. Key Software & Repositories
| Tool | Purpose | Language | License |
|---|---|---|---|
| GR00T-WBC | End-to-end WBC + retargeting for G1 | Python/C++ | Apache 2.0 |
| Pinocchio | Rigid body dynamics, IK, Jacobians | C++/Python | BSD-2 |
| xr_teleoperate | Real-time VR mocap → G1 | Python | Unitree |
| unitree_mujoco | Simulate retargeted motions | C++/Python | BSD-3 |
| smplx (Python) | SMPL body model processing | Python | MIT |
| rofunc | Robot learning from human demos + retargeting | Python | MIT |
| MuJoCo Menagerie | G1 model (g1.xml) for IK/simulation | MJCF | BSD-3 |
Key Relationships
- Requires: joint-configuration (target skeleton — DOF, joint limits, link lengths)
- Executed via: whole-body-control (WBC provides balance during playback)
- Stabilized by: push-recovery-balance (perturbation robustness during execution)
- Trained in: simulation (RL tracking policies trained in MuJoCo/Isaac)
- Training methods: learning-and-ai (RL, imitation learning frameworks)
- Bounded by: equations-and-bounds (joint limits, torque limits for feasibility)