---
id: motion-retargeting
title: Motion Capture & Retargeting
status: established
source_sections:
  - reference/sources/paper-bfm-zero.md
  - reference/sources/paper-h2o.md
  - reference/sources/paper-omnih2o.md
  - reference/sources/paper-humanplus.md
  - reference/sources/dataset-amass-g1.md
  - reference/sources/github-groot-wbc.md
  - reference/sources/community-mocap-retarget-tools.md
related_topics: [whole-body-control, joint-configuration, simulation, learning-and-ai, equations-and-bounds, push-recovery-balance]
key_equations: [inverse_kinematics, kinematic_scaling]
key_terms: [motion_retargeting, mocap, amass, smpl, kinematic_scaling, inverse_kinematics]
images: []
examples: []
open_questions:
  - What AMASS motions have been successfully replayed on physical G1?
  - What is the end-to-end latency from mocap capture to robot execution?
  - Which retargeting approach gives best visual fidelity on G1 (IK vs. RL)?
  - Can video-based pose estimation (MediaPipe/OpenPose) provide sufficient accuracy for G1 retargeting?
---

Motion Capture & Retargeting

Capturing human motion and replaying it on the G1, including the kinematic mapping problem, data sources, and execution approaches.

1. The Retargeting Problem

A human has on the order of 200 degrees of freedom (skeleton + soft tissue). The G1 has 23-43 DOF depending on configuration. Retargeting must solve four mismatches: [T1 — Established robotics problem]

| Mismatch | Human | G1 (29-DOF) | Challenge |
|---|---|---|---|
| DOF count | ~200+ | 29 | Many human motions have no G1 equivalent |
| Limb proportions | Variable | Fixed (1.32m height, 0.6m legs, ~0.45m arms) | Workspace scaling needed |
| Joint ranges | Very flexible | Constrained (e.g., knee 0-165°, hip pitch ±154°) | Motions may exceed limits |
| Dynamics | ~70kg average | ~35kg, different mass distribution | Forces/torques don't scale linearly |

What Works Well on G1

  • Walking, standing, stepping motions
  • Upper-body gestures (waving, pointing, reaching)
  • Pick-and-place style manipulation
  • Simple dance or expressive motions

What's Difficult or Impossible

  • Motions requiring finger dexterity (without hands attached)
  • Deep squats or ground-level motions (joint limit violations)
  • Fast acrobatic motions (torque/speed limits)
  • Motions requiring more DOF than available (e.g., spine articulation with 1-DOF waist)

2. Retargeting Approaches

2a. IK-Based Retargeting (Classical)

Solve inverse kinematics to map human end-effector positions to G1 joint angles: [T1]

Pipeline:
Mocap data (human skeleton) → Extract key points (hands, feet, head, pelvis)
    → Scale to G1 proportions → Solve IK per frame → Smooth trajectory
    → Check joint limits → Execute or reject

Tools:

  • Pinocchio: C++/Python rigid body dynamics with fast IK solver (see whole-body-control)
  • MuJoCo IK: Built-in inverse kinematics in MuJoCo simulator
  • Drake: MIT's robotics toolbox with optimization-based IK
  • IKPy / ikflow: Lightweight Python IK libraries

Pros: Fast, interpretable, deterministic, no training required.
Cons: Frame-by-frame IK can produce jerky motions; it ignores dynamics and balance, and may violate torque limits even when joint limits are satisfied.
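A minimal sketch of the per-frame IK step, using a planar 2-link arm as a stand-in for a G1 arm chain (a real pipeline would use Pinocchio's FK/Jacobian on the full model). Link lengths, joint limits, and the damping value are illustrative assumptions, not official G1 numbers:

```python
import numpy as np

# Toy 2-link arm standing in for a G1 arm chain. Lengths (m) and joint
# limits (rad) are hypothetical, chosen only for illustration.
L1, L2 = 0.25, 0.20
Q_MIN = np.array([-2.5, 0.0])
Q_MAX = np.array([ 2.5, 2.4])

def fk(q):
    """End-effector position of the 2-link arm."""
    return np.array([
        L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1]),
        L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1]),
    ])

def jacobian(q):
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([
        [-L1 * s1 - L2 * s12, -L2 * s12],
        [ L1 * c1 + L2 * c12,  L2 * c12],
    ])

def ik_frame(target, q0, iters=200, damping=1e-3):
    """Damped least-squares IK for one mocap frame, warm-started from the
    previous frame's solution to keep the trajectory smooth."""
    q = q0.copy()
    for _ in range(iters):
        err = target - fk(q)
        J = jacobian(q)
        # Levenberg-Marquardt style step: (J^T J + lambda I)^-1 J^T err
        dq = np.linalg.solve(J.T @ J + damping * np.eye(2), J.T @ err)
        q = np.clip(q + dq, Q_MIN, Q_MAX)   # enforce joint limits each step
    return q

# Retarget a short sequence of (already scaled) wrist targets frame by frame.
targets = [np.array([0.35, 0.10]), np.array([0.33, 0.15]), np.array([0.30, 0.20])]
q = np.zeros(2)
for x in targets:
    q = ik_frame(x, q)
```

Warm-starting each frame from the previous solution is what keeps naive per-frame IK from jumping between IK branches; without it, the jerkiness mentioned above gets worse.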

2b. Optimization-Based Retargeting

Solve a trajectory optimization over the full motion: [T1]

minimize    Σ_t || FK(q_t) - x_human_t ||^2          (tracking error)
          + Σ_t || q_t - q_{t-1} ||^2                 (smoothness)
subject to  q_min ≤ q_t ≤ q_max                       (joint limits)
            CoM_t ∈ support_polygon_t                   (balance)
            || tau_t || ≤ tau_max                       (torque limits)
            no self-collision                           (collision avoidance)

Tools: CasADi, Pinocchio + ProxQP, Drake, Crocoddyl
Pros: Globally smooth, respects all constraints, can enforce balance.
Cons: Slow (offline only), requires an accurate dynamics model, and the problem formulation is complex.
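The tracking + smoothness structure above can be sketched on a toy problem with SciPy: one joint, T frames, joint limits as box bounds. A real retargeter would optimize all G1 joints through FK with balance and torque constraints (e.g. via CasADi or Crocoddyl); the reference signal and weights here are made up for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# One joint over T frames. Tracking term follows a synthetic "human"
# reference; smoothness term penalises frame-to-frame jumps; joint
# limits enter as box bounds (deliberately tighter than the reference).
T = 50
t = np.linspace(0, 1, T)
rng = np.random.default_rng(0)
q_ref = 0.8 * np.sin(2 * np.pi * t) + 0.05 * rng.standard_normal(T)
Q_MIN, Q_MAX = -0.6, 0.6   # illustrative limits
W_SMOOTH = 5.0             # illustrative smoothness weight

def cost(q):
    tracking = np.sum((q - q_ref) ** 2)        # Σ_t ||q_t - ref_t||^2
    smoothness = np.sum(np.diff(q) ** 2)       # Σ_t ||q_t - q_{t-1}||^2
    return tracking + W_SMOOTH * smoothness

res = minimize(cost, x0=np.clip(q_ref, Q_MIN, Q_MAX),
               method="L-BFGS-B", bounds=[(Q_MIN, Q_MAX)] * T)
q_opt = res.x
```

Unlike per-frame IK, the whole trajectory is optimized at once, so the smoothness term couples frames and clipping artifacts at the joint limits get rounded off rather than left as hard kinks.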

2c. RL-Based Motion Tracking (Learned)

Train an RL policy that imitates reference motions while maintaining balance: [T1 — Multiple papers validated on G1]

Pipeline:
Mocap data → Retarget to G1 skeleton (rough IK) → Use as reference
    → Train RL policy in sim: reward = tracking + balance + energy
    → Deploy on real G1 via sim-to-real transfer

This is the approach used by BFM-Zero, H2O, OmniH2O, and HumanPlus. The RL policy learns to:

  • Track the reference motion as closely as possible
  • Maintain balance even when the reference motion would be unstable
  • Respect joint and torque limits naturally (they're part of the sim environment)
  • Recover from perturbations (if trained with perturbation curriculum)

Key advantage: Balance is baked into the policy — you don't need a separate balance controller.
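A sketch of the per-step reward shape implied by "reward = tracking + balance + energy" in the pipeline above. The kernel form, weights, and widths are illustrative assumptions, not values from BFM-Zero, H2O, or any specific paper:

```python
import numpy as np

def tracking_reward(q, q_ref, sigma=0.5):
    """Exponential kernel on joint-space tracking error."""
    return np.exp(-np.sum((q - q_ref) ** 2) / sigma ** 2)

def balance_reward(com_xy, support_center_xy, sigma=0.1):
    """Rewards keeping the CoM near the support-polygon centre."""
    return np.exp(-np.sum((com_xy - support_center_xy) ** 2) / sigma ** 2)

def energy_penalty(tau, dq):
    """Penalises mechanical power |tau * dq|, summed over joints."""
    return np.sum(np.abs(tau * dq))

def step_reward(q, q_ref, com_xy, support_xy, tau, dq,
                w_track=1.0, w_bal=0.5, w_energy=1e-3):
    # Weighted sum: track the reference, stay balanced, spend little energy.
    return (w_track * tracking_reward(q, q_ref)
            + w_bal * balance_reward(com_xy, support_xy)
            - w_energy * energy_penalty(tau, dq))
```

Because balance and joint/torque limits live in the simulated environment and reward, the trained policy trades off tracking fidelity against staying upright automatically, which is the "key advantage" noted above.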

Key RL Motion Tracking Frameworks

| Framework | Paper | G1 Validated? | Key Feature |
|---|---|---|---|
| BFM-Zero | arXiv:2511.04131 | Yes | Zero-shot generalization to unseen motions, open-source |
| H2O | arXiv:2403.01623 | On humanoid (not G1 specifically) | Real-time teleoperation |
| OmniH2O | arXiv:2406.08858 | On humanoid | Multi-modal input (VR, RGB, mocap) |
| HumanPlus | arXiv:2406.10454 | On humanoid | RGB camera → shadow → imitate |
| GMT | Generic Motion Tracking | In sim | Tracks diverse AMASS motions |

2d. Hybrid Approach: IK + WBC

Use IK for the upper body, WBC for balance: [T1 — GR00T-WBC approach]

Mocap data → IK retarget (upper body only: arms, waist)
    → Feed to GR00T-WBC as upper-body targets
    → WBC locomotion policy handles legs/balance automatically
    → Execute on G1

This is likely the most practical near-term approach for the G1, using GR00T-WBC as the coordination layer. See whole-body-control for details.

3. Motion Capture Sources

3a. AMASS — Archive of Motion Capture as Surface Shapes

The largest publicly available human motion dataset: [T1]

| Property | Value |
|---|---|
| Motions | 11,000+ sequences from 15 mocap datasets |
| Format | SMPL body model parameters |
| G1 retarget | Available on HuggingFace (unitree) — pre-retargeted |
| License | Research use (check individual sub-datasets) |

G1-specific: Unitree has published AMASS motions retargeted to the G1 skeleton on HuggingFace. This provides ready-to-use reference trajectories for RL training or direct playback.

3b. CMU Motion Capture Database

Classic academic motion capture archive: [T1]

| Property | Value |
|---|---|
| Subjects | 144 |
| Motions | 2,500+ sequences |
| Categories | Walking, running, sports, dance, interaction, etc. |
| Formats | BVH, C3D, ASF+AMC |
| License | Free for research |
| URL | mocap.cs.cmu.edu |

3c. Real-Time Sources (Live Mocap)

| Source | Device | Latency | Accuracy | G1 Integration |
|---|---|---|---|---|
| XR Teleoperate | Vision Pro, Quest 3, PICO 4 | Low (~50ms) | High (VR tracking) | Official (unitreerobotics/xr_teleoperate) |
| Kinect | Azure Kinect DK | Medium (~100ms) | Medium | Official (kinect_teleoperate) |
| MediaPipe | RGB camera | Low (~30ms) | Low-Medium | Community, needs retarget code |
| OpenPose | RGB camera | Medium | Medium | Community, needs retarget code |
| OptiTrack/Vicon | Marker-based system | Very low (~5ms) | Very high | Custom integration needed |

For the user's goal (mocap → robot), the XR teleoperation system is the most direct path for real-time, while AMASS provides offline motion libraries.

3d. Video-Based Pose Estimation

Extract human pose from standard RGB video without mocap hardware: [T2]

  • MediaPipe Pose: 33 landmarks, real-time on CPU, Google
  • OpenPose: 25 body keypoints, GPU required
  • HMR2.0 / 4DHumans: SMPL mesh recovery from single image — richer than keypoints
  • MotionBERT: Temporal pose estimation from video sequences

These are lower fidelity than marker-based mocap but require only a webcam. HumanPlus (arXiv:2406.10454) uses RGB camera input specifically for humanoid shadowing.

4. The Retargeting Pipeline

End-to-end pipeline from human motion to G1 execution:

┌─────────────┐     ┌──────────────┐     ┌───────────────┐
│ Motion       │     │ Skeleton     │     │ Kinematic      │
│ Source       │────►│ Extraction   │────►│ Retargeting    │
│ (mocap/video)│     │ (SMPL/joints)│     │ (scale + IK)   │
└─────────────┘     └──────────────┘     └───────┬───────┘
                                                  │
                                                  ▼
┌─────────────┐     ┌──────────────┐     ┌───────────────┐
│ Execute on   │     │ WBC / RL     │     │ Feasibility    │
│ Real G1      │◄───│ Policy       │◄───│ Check          │
│ (sdk2)       │     │ (balance +   │     │ (joint limits, │
└─────────────┘     │  tracking)   │     │  stability)    │
                    └──────────────┘     └───────────────┘

Step 1: Motion Source

  • Offline: AMASS dataset, CMU mocap, recorded demonstrations
  • Real-time: XR headset, Kinect, RGB camera

Step 2: Skeleton Extraction

  • AMASS: Already in SMPL format, extract joint angles
  • BVH/C3D: Parse standard mocap formats
  • Video: Run pose estimator (MediaPipe, OpenPose, HMR2.0)
  • Output: Human joint positions/rotations per frame

Step 3: Kinematic Retargeting

  • Map human skeleton to G1 skeleton (limb length scaling)
  • Solve IK for each frame or use direct joint angle mapping
  • Handle DOF mismatch (project higher-DOF human motion to G1 subspace)
  • Clamp to G1 joint limits (see equations-and-bounds)
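A minimal sketch of step 3 for one arm: scale a human wrist position about the shoulder by the limb-length ratio, then clamp retargeted joint angles to the G1's limits. The lengths are the approximate values quoted earlier in this note; the limits shown are placeholders, not URDF numbers:

```python
import numpy as np

# Limb lengths (m): typical adult shoulder-to-wrist vs. the ~0.45 m G1 arm
# mentioned in the mismatch table. Both values are approximate.
HUMAN_ARM_LEN = 0.60
G1_ARM_LEN = 0.45

def scale_to_g1(p_wrist, p_shoulder):
    """Scale a human wrist position about the shoulder by the limb ratio,
    so the target stays inside the G1 arm's reachable workspace."""
    ratio = G1_ARM_LEN / HUMAN_ARM_LEN
    return p_shoulder + ratio * (p_wrist - p_shoulder)

def clamp_to_limits(q, q_min, q_max):
    """Project retargeted joint angles into the G1 joint-limit box."""
    return np.minimum(np.maximum(q, q_min), q_max)

# Example: wrist 0.6 m in front of the shoulder maps to 0.45 m.
shoulder = np.zeros(3)
wrist = np.array([0.6, 0.0, 0.0])
target = scale_to_g1(wrist, shoulder)
```

Scaling about the shoulder (rather than about the pelvis) preserves the direction of reach, which usually matters more for visual fidelity than absolute hand position.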

Step 4: Feasibility Check

  • Verify all joint angles within limits
  • Check CoM remains within support polygon (static stability)
  • Estimate required torques (inverse dynamics) — reject if exceeding actuator limits
  • Check for self-collisions
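The first two checks of step 4 can be sketched as a single per-frame predicate. For brevity this approximates the support polygon by the axis-aligned bounding box of the foot contact points; a real pipeline would use the convex hull, and would add the inverse-dynamics torque check and self-collision test:

```python
import numpy as np

def frame_feasible(q, q_min, q_max, com_xy, contact_points_xy, margin=0.01):
    """Joint-limit check plus a crude static-stability check.

    contact_points_xy: (N, 2) array of foot contact points in the ground
    plane. The support polygon is approximated by their bounding box,
    shrunk by a safety margin (m).
    """
    if np.any(q < q_min) or np.any(q > q_max):
        return False                      # joint-limit violation
    lo = contact_points_xy.min(axis=0) + margin
    hi = contact_points_xy.max(axis=0) - margin
    return bool(np.all(com_xy >= lo) and np.all(com_xy <= hi))
```

Frames that fail are either rejected outright or handed to the execution policy (WBC/RL) to fix at runtime, per step 5.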

Step 5: Execution Policy

  • Direct playback: Send retargeted joint angles via rt/lowcmd (no balance guarantee)
  • WBC execution: Feed to GR00T-WBC as upper-body targets, let locomotion policy handle balance
  • RL tracking: Use trained motion tracking policy (BFM-Zero style) that simultaneously tracks and balances

Step 6: Deploy on Real G1

  • Via unitree_sdk2_python (prototyping) or unitree_sdk2 C++ (production)
  • 500 Hz control loop, 2ms DDS latency
  • Always validate in simulation first (see simulation)
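The timing skeleton of the 500 Hz loop can be sketched as follows. The DDS publish through unitree_sdk2_python is replaced by a caller-supplied `send_cmd` placeholder, since the exact SDK call is outside the scope of this note:

```python
import time

PERIOD = 0.002  # 500 Hz control loop, 2 ms period

def run_loop(trajectory, send_cmd):
    """Replay a retargeted joint trajectory at a fixed rate.

    send_cmd is a placeholder for the actual rt/lowcmd publish via
    unitree_sdk2_python. Absolute deadlines (rather than fixed sleeps)
    prevent timing drift across frames.
    """
    next_tick = time.perf_counter()
    for q_target in trajectory:
        send_cmd(q_target)
        next_tick += PERIOD
        sleep = next_tick - time.perf_counter()
        if sleep > 0:
            time.sleep(sleep)
```

Python's `time.sleep` jitter is acceptable for prototyping; the C++ SDK with a real-time scheduler is the production path, as noted above.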

5. SMPL Body Model

SMPL (Skinned Multi-Person Linear model) is the standard representation for human body shape and pose in mocap datasets: [T1]

  • Parameters: 72 pose parameters (24 joints × 3 rotations) + 10 shape parameters
  • Output: 6,890 vertices mesh + joint locations
  • Extensions: SMPL-X (hands + face), SMPL+H (hands)
  • Relevance: AMASS uses SMPL, so retargeting from AMASS means mapping SMPL joints → G1 joints

SMPL to G1 Joint Mapping (Approximate)

| SMPL Joint | G1 Joint(s) | Notes |
|---|---|---|
| Pelvis | Waist (yaw) | G1 has 1-3 waist DOF vs. SMPL's 3 |
| L/R Hip | left/right_hip_pitch/roll/yaw | Direct mapping, 3-DOF each |
| L/R Knee | left/right_knee | Direct mapping, 1-DOF |
| L/R Ankle | left/right_ankle_pitch/roll | Direct mapping, 2-DOF |
| L/R Shoulder | left/right_shoulder_pitch/roll/yaw | Direct mapping, 3-DOF |
| L/R Elbow | left/right_elbow | Direct mapping, 1-DOF |
| L/R Wrist | left/right_wrist_yaw (+pitch+roll) | 1-DOF on the 23-DOF model, 3-DOF on the 29-DOF model |
| Spine | Waist (limited) | SMPL has 3 spine joints, G1 has 1-3 waist |
| Head/Neck | — | G1 has no head/neck DOF |
| Fingers | Hand joints (if equipped) | Only with Dex3-1 or INSPIRE |
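The mapping table can be encoded directly as a lookup, here for the 29-DOF configuration. The G1 joint names below are approximations in the style used throughout this note and should be checked against the actual URDF/MJCF model before use:

```python
# SMPL joint -> G1 joint names (29-DOF configuration). Names are
# illustrative; verify against the G1 URDF/MJCF. Unmapped SMPL joints
# (head, extra spine joints, toes) map to empty lists.
SMPL_TO_G1 = {
    "L_Hip":      ["left_hip_pitch", "left_hip_roll", "left_hip_yaw"],
    "R_Hip":      ["right_hip_pitch", "right_hip_roll", "right_hip_yaw"],
    "L_Knee":     ["left_knee"],
    "R_Knee":     ["right_knee"],
    "L_Ankle":    ["left_ankle_pitch", "left_ankle_roll"],
    "R_Ankle":    ["right_ankle_pitch", "right_ankle_roll"],
    "L_Shoulder": ["left_shoulder_pitch", "left_shoulder_roll", "left_shoulder_yaw"],
    "R_Shoulder": ["right_shoulder_pitch", "right_shoulder_roll", "right_shoulder_yaw"],
    "L_Elbow":    ["left_elbow"],
    "R_Elbow":    ["right_elbow"],
    "L_Wrist":    ["left_wrist_yaw", "left_wrist_pitch", "left_wrist_roll"],
    "R_Wrist":    ["right_wrist_yaw", "right_wrist_pitch", "right_wrist_roll"],
    "Spine1":     ["waist_yaw", "waist_roll", "waist_pitch"],
    "Head":       [],   # no head/neck DOF on the G1
}
```

Summing the mapped DOF recovers the 29 of the full-waist, 3-DOF-wrist configuration; the 23-DOF variant drops the wrist pitch/roll and two waist DOF.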

6. Key Software & Repositories

| Tool | Purpose | Language | License |
|---|---|---|---|
| GR00T-WBC | End-to-end WBC + retargeting for G1 | Python/C++ | Apache 2.0 |
| Pinocchio | Rigid body dynamics, IK, Jacobians | C++/Python | BSD-2 |
| xr_teleoperate | Real-time VR mocap → G1 | Python | Unitree |
| unitree_mujoco | Simulate retargeted motions | C++/Python | BSD-3 |
| smplx (Python) | SMPL body model processing | Python | MIT |
| rofunc | Robot learning from human demos + retargeting | Python | MIT |
| MuJoCo Menagerie | G1 model (g1.xml) for IK/simulation | MJCF | BSD-3 |

7. Apple Vision Pro Telepresence Paths (Researched 2026-02-15) [T1/T2]

Available Integration Options

| Path | Approach | App Required? | GR00T-WBC Compatible? | Retargeting |
|---|---|---|---|---|
| xr_teleoperate | WebXR via Safari | No (browser) | No (uses stock SDK) | Pinocchio IK |
| VisionProTeleop | Native visionOS app | Yes (App Store / open-source) | Yes (via bridge) | Custom (flexible) |
| iPhone streamer | Socket.IO protocol | Custom visionOS app | Yes (built-in) | Pinocchio IK in GR00T-WBC |

xr_teleoperate (Unitree Official)

  • Vision Pro connects via Safari to https://<host>:8012 (WebXR)
  • TeleVuer (Python, built on Vuer) serves the 3D interface
  • WebSocket for tracking data, WebRTC for video feedback
  • Pinocchio IK solves wrist poses → G1 arm joint angles
  • Supports G1_29 and G1_23 variants
  • Limitation: Bypasses GR00T-WBC — sends motor commands directly via DDS

VisionProTeleop (MIT, Open-Source)

  • Native visionOS app "Tracking Streamer" — on App Store + source on GitHub
  • Python library avp_stream receives data via gRPC
  • 25 finger joints/hand, head pose, wrist positions (native ARKit, better than WebXR)
  • Robot-agnostic — needs a bridge to publish to GR00T-WBC's ControlPolicy/upper_body_pose ROS2 topic
  • Best path for GR00T-WBC integration with RL-based balance

GR00T-WBC Integration Point

The single integration point is the ControlPolicy/upper_body_pose ROS2 topic. Any source that publishes target_upper_body_pose (17 joint angles: 3 waist + 7 left arm + 7 right arm) and optionally navigate_cmd (velocity [vx, vy, wz]) can drive the robot. The InterpolationPolicy smooths targets before execution.
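The smoothing role this note attributes to the InterpolationPolicy can be sketched as a per-tick rate limiter on the 17 upper-body targets, so a jumpy teleop stream cannot command step changes. The rate-limit value is a hypothetical placeholder, not taken from GR00T-WBC:

```python
import numpy as np

N_UPPER = 17      # 3 waist + 7 left arm + 7 right arm joint targets
MAX_STEP = 0.02   # max change per control tick (rad) — illustrative value

def smooth_step(current, target):
    """Move the current pose toward the latest commanded target, with the
    per-tick change clamped to MAX_STEP per joint."""
    delta = np.clip(target - current, -MAX_STEP, MAX_STEP)
    return current + delta

# A step change of 0.5 rad is reached gradually over 25 ticks, not at once.
current = np.zeros(N_UPPER)
target = np.full(N_UPPER, 0.5)
for _ in range(100):
    current = smooth_step(current, target)
```

Any teleop source (WebXR, VisionProTeleop bridge, or the iPhone streamer) publishing `target_upper_body_pose` would pass through this kind of smoothing before execution.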

Key Relationships