A central question in robot learning is how to acquire skills from the kinds of data that humans learn from: passive observation, embodied practice, and the experience of failure. Human videos provide the first of these in abundance, and prior work has shown they can initialize useful policies. Far less clear is whether they can support the second and third: whether priors extracted from human videos can ground a robot's own attempts well enough to evaluate them, correct them, and improve from them. In this work, we show that human videos can be used to learn embodiment-agnostic action, dynamics, and value representations that transfer across robot embodiments, providing the predictive foundation for robots to autonomously improve from their own rollouts and failures. We introduce Dynamics-Guided Action Correction (DGAC), a training-free approach that leverages these adapted models to repair failed states: each failure becomes a query for which the learned models propose and rank corrective actions, turning failures into supervision for the next policy update. Across seven real-world manipulation tasks spanning both a mobile manipulator and a static manipulator arm, our approach improves success rates from 40% to 81% across multiple policy backbones, demonstrating cross-embodiment robot self-improvement from human-video priors.
Left: We pretrain shared policy, dynamics, and value representations from human videos to support cross-embodiment robot self-improvement. The policy model predicts wrist actions represented by a 6-DoF pose and a hand-closure variable. The dynamics model forecasts action-conditioned world states represented by DINO-v3 visual features and 3D point trajectories. The value model learns an embodiment-agnostic progress representation that estimates a state's proximity to task success.
Right: Building on these pretrained models, we develop a self-improvement pipeline that learns from autonomous robot experience. Successful and failed rollouts are used to adapt the dynamics and value models, while Dynamics-Guided Action Correction (DGAC) converts recoverable failures into corrective supervision. The resulting trajectories are then used for policy improvement through advantage-conditioned policy extraction, enabling continual learning without human intervention.
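As a rough illustration of how value estimates could feed advantage conditioning, the sketch below labels each transition in a rollout with an advantage and a binary "improving" flag that a policy could be conditioned on. The one-step value difference, the zero threshold, and the name `label_advantages` are assumptions made for this example; the page does not specify the estimator actually used.

```python
import numpy as np

def label_advantages(values, threshold=0.0):
    """Label each transition with a progress-based advantage (hypothetical helper).

    values: per-step value estimates V(s_0), ..., V(s_T) for one rollout.
    Returns the one-step advantages V(s_{t+1}) - V(s_t) and a boolean flag
    marking transitions whose advantage exceeds the threshold.
    """
    values = np.asarray(values, dtype=float)
    advantages = values[1:] - values[:-1]
    return advantages, advantages > threshold

# Toy rollout: value rises, dips once, then rises again.
advantages, improving = label_advantages([0.1, 0.3, 0.2, 0.6])
print(improving.tolist())
```

Transitions flagged as improving would be the ones a conditioned policy is asked to imitate at deployment time, under the assumptions stated above.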
DGAC converts failed states into corrective supervision. It uses the learned dynamics and value models to rank candidate actions and identify the best correction for near-failure but recoverable states, without human intervention.
Given a failed state, DGAC samples candidate corrective actions (colored trajectories), predicts their future states and values, and selects the highest-value proposal (green) as the corrective action, adding it to the repair dataset for policy supervision.
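The sample, predict, score, and select loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `dgac_select`, the toy sampler, the additive toy dynamics, and the distance-to-goal toy value are all hypothetical stand-ins for the learned policy, dynamics, and value models, and the 2-D state replaces the 6-DoF wrist pose plus hand closure used on the real system.

```python
import numpy as np

def dgac_select(state, sample_actions, dynamics, value, num_candidates=16, rng=None):
    """Pick the candidate action whose predicted next state scores highest.

    sample_actions, dynamics, and value are callables standing in for the
    learned models (hypothetical interfaces for this sketch).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    candidates = sample_actions(state, num_candidates, rng)   # (K, action_dim)
    predicted = [dynamics(state, a) for a in candidates]      # one-step forecasts
    scores = np.array([value(s) for s in predicted])          # predicted progress
    best = int(np.argmax(scores))
    return candidates[best], float(scores[best])

# Toy stand-ins (assumptions): 2-D state, additive dynamics, value = -distance to goal.
GOAL = np.array([1.0, 0.0])

def toy_sample_actions(state, k, rng):
    return rng.normal(scale=0.5, size=(k, state.shape[0]))

def toy_dynamics(state, action):
    return state + action

def toy_value(state):
    return -float(np.linalg.norm(state - GOAL))

failed_state = np.array([0.0, 0.0])
action, score = dgac_select(failed_state, toy_sample_actions, toy_dynamics, toy_value)

# The selected correction is stored as a supervision pair for the next policy update.
repair_dataset = [(failed_state, action)]
```

Because selection only queries the models and takes an argmax, no gradient step is involved at this stage, which is consistent with the training-free description of DGAC.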
Ours: 85.3% average success rate, the highest among all compared methods.
† Run without human-intervened corrections for a fair comparison.
Across 5 real-world tasks, our framework achieves the highest average success rate when compared against 6 representative baselines.
Our framework also generalizes to a different policy backbone (𝜋0.5).
Each pair shows Before Self-Improvement (left) vs. After Self-Improvement (right) using the proposed DGAC module. Repeated trials show that self-improvement converts initial failures into consistent task success across different embodiments. All videos are played at 1x speed.
We further show additional full-rollout results.
This work was supported by the Technical University of Munich (TUM) and the State of Bavaria through the REACT project, the TUM Georg Nemetschek Institute via the SPAICR project, the Munich Center for Machine Learning (MCML), and ETH Zurich. We thank Helen Oleynikova for her support during the initial phase of the project.
@article{chenzhang2026robot,
title = {Robot Self-Improvement via Human-Video Dynamics Models},
author = {Chen, Hanzhi and Zhang, Anran and Schaefer, Simon and
Chen, Kejia and Chen, Shi and Cremers, Daniel and
Mees, Oier and Leutenegger, Stefan},
journal = {arXiv preprint arXiv:2606.21406},
year = {2026}
}