Human2Any

Human2Any achieves successful real-world rollouts across diverse task contexts and robot embodiments.

Abstract

Human videos are a scalable source of supervision for robot manipulation, as they are abundant and naturally capture rich object interactions. However, transferring human demonstrations to robots remains challenging due to embodiment mismatch, scene variation, and robot-specific feasibility constraints. We present Human2Any, a framework for learning reusable object-centric interaction priors from human videos without requiring real-world robot demonstrations in the target task contexts. Human2Any represents manipulation through object-object interaction motion, capturing task-relevant scene changes while abstracting away embodiment-specific details. It composes learned interaction priors with robot-side feasibility reasoning and motion planning, allowing the same human-derived knowledge to adapt to different embodiments, scene geometries, and task contexts. We validate Human2Any across diverse manipulation settings, including real-world experiments on a Franka tabletop setup and an RBY-1 humanoid mobile robot, demonstrating robust interaction-centric manipulation without real-world robot training data.

Interactive Planning Visualizer

Method Overview

Framework overview. From human videos, Human2Any learns object–object interaction priors between tool and target objects that are independent of embodiment. Separate agent–object priors capture how a given robot grasps and controls tools. At deployment, the two are composed under the current robot, scene, and task constraints to generate executable manipulation trajectories.
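One way to picture this composition is as a chain of rigid-body transforms. The sketch below is illustrative and not the authors' implementation. It assumes that an object–object prior can be reduced to a trajectory of tool poses expressed in the target object's frame, and that an agent–object prior can be reduced to a single grasp transform. All function names, frame names, and numbers are hypothetical.

```python
import math

def transform(yaw, x, y, z):
    """4x4 homogeneous transform: rotation about z by `yaw`, then translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, x],
            [s, c, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

def compose(a, b):
    """Matrix product a @ b of two 4x4 transforms."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert(t):
    """Inverse of a rigid transform [R p]: rotation R^T, translation -R^T p."""
    rt = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [-sum(rt[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [rt[0] + [p[0]], rt[1] + [p[1]], rt[2] + [p[2]],
            [0.0, 0.0, 0.0, 1.0]]

def end_effector_trajectory(world_T_target, target_T_tool_traj, ee_T_tool):
    """world_T_ee(t) = world_T_target * target_T_tool(t) * inverse(ee_T_tool)."""
    tool_T_ee = invert(ee_T_tool)
    return [compose(compose(world_T_target, target_T_tool), tool_T_ee)
            for target_T_tool in target_T_tool_traj]

# Hypothetical numbers for illustration only.
world_T_bowl = transform(0.0, 0.5, 0.0, 0.0)           # target object in the scene
bowl_T_cup_traj = [transform(0.0, 0.0, 0.0, 0.30),     # embodiment-independent
                   transform(0.0, 0.0, 0.0, 0.15)]     # tool motion w.r.t. target
top_grasp = transform(0.0, 0.0, 0.0, -0.10)            # cup 10 cm below gripper
side_grasp = transform(0.0, 0.05, 0.0, 0.0)            # cup 5 cm ahead of gripper

for name, grasp in [("top", top_grasp), ("side", side_grasp)]:
    traj = end_effector_trajectory(world_T_bowl, bowl_T_cup_traj, grasp)
    print(name, [[round(t[i][3], 3) for i in range(3)] for t in traj])
```

The same tool-relative-to-target trajectory yields different end-effector paths for the two grasps, which is the sense in which the object–object part can stay fixed while the robot-specific part changes. Feasibility checks and motion planning, which the framework also applies, are outside this sketch.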

Human Data Visualizer

Real-world Results

Franka tabletop

Pour In Bowl

The robot avoids the kettle, grasps the cup by its handle, and rotates it during the transfer motion to align the cup for pouring into the bowl.

Hang Mug Tree

The robot aligns its grasp to avoid non-target objects and reorients the mug on the fly to accurately place the handle onto the mug tree across various object arrangements.

Sort Utensils

The robot sequentially places a bowl onto a plate and a utensil onto the bowl, demonstrating long-horizon skill composition across different scene layouts.

RBY-1 humanoid mobile robot

Pour Cup

The robot securely grasps the cup and uses coordinated whole-body motion to pour from a variety of cups into the target container.

Use Roller

The robot grasps the handle of the roller precisely and moves to press the target dough.

Diffusion Visualization

We visualize the particles during the diffusion process. Each particle corresponds to a full end-effector trajectory and is color-coded by its score. The accompanying video shows the execution of the best final particle.

Viewer panels: No Steering and Ours. The diffusion step runs from step 19 (noisy) to step 0 (clean), and particle color encodes score from 0.0 (low) to 1.0 (high).
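The scoring and selection shown in the particle view can be sketched as follows. This is a minimal illustration and not the authors' method: the page does not specify the score function, the steering rule, or the color map, so `toy_score`, `GOAL`, and the blue-to-red mapping below are all hypothetical stand-ins. The sketch only shows how a set of candidate trajectories might be scored, rescaled to the 0.0 to 1.0 range, colored, and reduced to a single best particle.

```python
import math

GOAL = (0.5, 0.0, 0.2)  # hypothetical goal position for the final waypoint

def toy_score(trajectory):
    """Hypothetical score: higher when the last waypoint is closer to GOAL."""
    return -math.dist(trajectory[-1], GOAL)

def normalize_scores(raw_scores):
    """Rescale raw scores linearly to [0, 1]; all-equal scores map to 1.0."""
    lo, hi = min(raw_scores), max(raw_scores)
    if hi == lo:
        return [1.0 for _ in raw_scores]
    return [(s - lo) / (hi - lo) for s in raw_scores]

def score_to_rgb(score):
    """Map a score in [0, 1] to a color from blue (low) to red (high)."""
    s = min(1.0, max(0.0, score))
    return (s, 0.0, 1.0 - s)

def best_particle(particles, score_fn):
    """Return the index of the highest-scoring particle and all scores."""
    scores = normalize_scores([score_fn(p) for p in particles])
    best = max(range(len(particles)), key=lambda i: scores[i])
    return best, scores

# Each particle is a full end-effector trajectory (list of xyz waypoints).
particles = [
    [(0.0, 0.0, 0.3), (0.2, 0.0, 0.3)],
    [(0.0, 0.0, 0.3), (0.5, 0.0, 0.2)],
    [(0.0, 0.0, 0.3), (0.5, 0.4, 0.2)],
]

best, scores = best_particle(particles, toy_score)
colors = [score_to_rgb(s) for s in scores]
print("best particle:", best)
```

In a steered sampler the scores would also influence the particles at intermediate diffusion steps; here they are only used to rank and color the final set.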

Human2Any: Human-to-Robot Transfer via Constraint-Aware Compositional Planning
