SteadyTray: Learning Object Balancing Tasks in Humanoid Tray Transport via Residual Reinforcement Learning
Anlun Huang, Zhenyu Wu, Soofiyan Atar, Yuheng Zhi, Michael Yip
- Year: 2026
- Access: Open access
Abstract
Stabilizing unsecured payloads against the inherent oscillations of dynamic bipedal locomotion remains a critical engineering bottleneck for humanoids in unstructured environments. To solve this, we introduce ReST-RL, a hierarchical reinforcement learning architecture that explicitly decouples locomotion from payload stabilization, evaluated via the SteadyTray benchmark. Rather than relying on monolithic end-to-end learning, our framework integrates a robust base locomotion policy with a dynamic residual module engineered to actively cancel gait-induced perturbations at the end-effector. This architectural separation ensures steady tray transport without degrading the underlying bipedal stability. In simulation, the residual design significantly outperforms end-to-end baselines in gait smoothness and orientation accuracy, achieving a 96.9% success rate in variable velocity tracking and 74.5% robustness against external force disturbances. Successfully deployed on the Unitree G1 humanoid hardware, this modular approach demonstrates highly reliable zero-shot sim-to-real generalization across various objects and external force disturbances.
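The decoupling described in the abstract can be illustrated with a minimal sketch of a residual controller. It is not the authors' implementation. It assumes the common residual-RL formulation in which a scaled correction is added to the base policy's action and the sum is clipped to actuator limits. The names `compose_residual_action`, `make_residual_controller`, `residual_scale`, and `limit` are hypothetical, as is the choice to treat the base locomotion policy as fixed.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Policy = Callable[[Sequence[float]], Vector]


def compose_residual_action(
    base_action: Sequence[float],
    residual_action: Sequence[float],
    residual_scale: float = 0.25,  # hypothetical gain on the correction
    limit: float = 1.0,            # hypothetical symmetric action bound
) -> Vector:
    """Add a scaled residual correction to the base action, then clip."""
    if len(base_action) != len(residual_action):
        raise ValueError("base and residual actions must have the same length")
    combined = [b + residual_scale * r for b, r in zip(base_action, residual_action)]
    # Clip each component to [-limit, limit] so the correction cannot
    # push the command outside the assumed actuator range.
    return [max(-limit, min(limit, a)) for a in combined]


def make_residual_controller(
    base_policy: Policy,
    residual_policy: Policy,
    residual_scale: float = 0.25,
    limit: float = 1.0,
) -> Policy:
    """Wrap a base policy and a residual policy into one controller.

    Assumption: the base policy is treated as fixed and only the residual
    policy would be trained; the source does not state this detail.
    """

    def controller(observation: Sequence[float]) -> Vector:
        base = base_policy(observation)          # locomotion command
        residual = residual_policy(observation)  # stabilizing correction
        return compose_residual_action(base, residual, residual_scale, limit)

    return controller


if __name__ == "__main__":
    # Toy stand-ins for the two policies; real policies would be networks.
    base = lambda obs: [0.5, -0.2, 0.9]
    residual = lambda obs: [0.4, 0.0, 0.8]
    controller = make_residual_controller(base, residual)
    print(controller([0.0]))
```

Keeping the correction small relative to the base action is one way to limit how far the residual module can disturb the underlying gait, which matches the abstract's stated goal of stabilizing the tray without degrading bipedal stability.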