Human2Any achieves successful real-world rollouts across diverse task contexts and robot embodiments.
Framework overview. From human videos, Human2Any learns object–object interaction priors between tool and target objects that are independent of embodiment. Separate agent–object priors capture how a given robot grasps and controls tools. At deployment, the two are composed under the current robot, scene, and task constraints to generate executable manipulation trajectories.
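As a rough illustration of the composition step, the sketch below treats the object–object prior as a tool path in the world frame and the agent–object prior as a fixed grasp transform from tool to end-effector. It is a planar (x, y, theta) simplification under assumed names (`compose`, `ee_trajectory`, `tool_path`, `grasp_a`, `grasp_b` are all hypothetical), not the Human2Any implementation.

```python
import math

def compose(a, b):
    """Compose two planar poses (x, y, theta): pose b expressed in a's parent frame."""
    ax, ay, ath = a
    bx, by, bth = b
    c, s = math.cos(ath), math.sin(ath)
    return (ax + c * bx - s * by, ay + s * bx + c * by, ath + bth)

def ee_trajectory(world_from_tool, tool_from_ee):
    """Map a tool path to an end-effector path through a fixed grasp transform."""
    return [compose(pose, tool_from_ee) for pose in world_from_tool]

# Assumed embodiment-independent tool path (stand-in for an object-object prior).
tool_path = [(0.0, 0.0, 0.0), (0.2, 0.0, math.pi / 2)]

# Two assumed grasps (stand-ins for agent-object priors of two different robots).
grasp_a = (0.1, 0.0, 0.0)
grasp_b = (-0.05, 0.0, math.pi)

ee_a = ee_trajectory(tool_path, grasp_a)
ee_b = ee_trajectory(tool_path, grasp_b)
```

The same `tool_path` yields a different end-effector path for each grasp, which is the sense in which the tool motion can be shared across embodiments while the grasp is robot-specific.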
The robot avoids the kettle, grasps the cup by its handle, and rotates it during the transfer motion to align the cup for pouring into the bowl.
The robot aligns its grasp to avoid non-target objects and reorients the mug on the fly to accurately place the handle onto the mug tree across various object arrangements.
The robot sequentially places a bowl onto a plate and a utensil onto the bowl, demonstrating long-horizon skill composition for different scene layouts.
The robot securely grasps the cup and uses coordinated whole-body motion to pour from a variety of cups into the target container.
The robot grasps the handle of the roller precisely and moves to press the target dough.
We visualize the particles during the diffusion process, with each particle corresponding to a full end-effector trajectory and color-coded according to its score. The execution of the best final particle is demonstrated in the accompanying video.
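The scoring and selection of particles can be pictured with a toy example. The sketch below is only an assumption-laden illustration: the score function (goal error plus a collision penalty), the softmax-style `resample` step, and all names are hypothetical and are not taken from Human2Any.

```python
import math
import random

def score_trajectory(traj, goal, obstacle, obstacle_radius=0.1):
    """Hypothetical score (higher is better): end near the goal, stay off the obstacle."""
    end_x, end_y = traj[-1]
    goal_error = math.hypot(end_x - goal[0], end_y - goal[1])
    collisions = sum(
        1 for x, y in traj
        if math.hypot(x - obstacle[0], y - obstacle[1]) < obstacle_radius
    )
    return -goal_error - 1.0 * collisions

def resample(particles, scores, temperature=0.1, rng=None):
    """Assumed steering step: draw particles with probability proportional to exp(score / temperature)."""
    rng = rng or random.Random(0)
    top = max(scores)
    weights = [math.exp((s - top) / temperature) for s in scores]
    return rng.choices(particles, weights=weights, k=len(particles))

def best_particle(particles, scores):
    """Return the index and trajectory of the highest-scoring particle."""
    idx = max(range(len(particles)), key=lambda i: scores[i])
    return idx, particles[idx]

goal, obstacle = (1.0, 0.0), (0.5, 0.0)
particles = [
    [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)],  # reaches the goal through the obstacle
    [(0.0, 0.0), (0.5, 0.3), (1.0, 0.0)],  # reaches the goal with a detour
    [(0.0, 0.0), (0.5, 0.3), (0.8, 0.3)],  # avoids the obstacle but misses the goal
]
scores = [score_trajectory(p, goal, obstacle) for p in particles]
best_idx, best_traj = best_particle(particles, scores)
```

Here each particle is a full 2D waypoint path, mirroring the one-particle-per-trajectory view in the visualization; in a real system the score would come from the task and scene constraints rather than this hand-written function.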
No Steering
Ours