Home /Research /Before the Body Moves: Learning Anticipatory Joint Intent for Language-Conditioned Humanoid Control

LOCOMOTION

Before the Body Moves: Learning Anticipatory Joint Intent for Language-Conditioned Humanoid Control

Haozhe Jia, Honglei Jin, Yuan Zhang, Youcheng Fan, Shaofeng Liang, Lei Wang, Shuxu Jin, Kuimou Yu, Zinuo Zhang, Jianfei Song, Wenshuo Chen, Yutao Yue

Year: 2026
Citations: 0
Access: Open access

Abstract

Natural language is an intuitive interface for humanoid robots, yet streaming whole-body control requires control representations that are executable now and anticipatory of future physical transitions. Existing language-conditioned humanoid systems typically generate kinematic references that a low-level tracker must repair reactively, or use latent/action policies whose outputs do not explicitly encode upcoming contact changes, support transfers, and balance preparation. We propose \textbf{DAJI} (\emph{Dynamics-Aligned Joint Intent}), a hierarchical framework that learns an anticipatory joint-intent interface between language generation and closed-loop control. DAJI-Act distills a future-aware teacher into a deployable diffusion action policy through student-driven rollouts, while DAJI-Flow autoregressively generates future intent chunks from language and intent history. Experiments show that DAJI achieves strong results in anticipatory latent learning, single-instruction generation, and streaming instruction following, reaching 94.42\% rollout success on HumanML3D-style generation and 0.152 subsequence FID on BABEL.

Keywords

humanoid controllanguage-conditionedanticipatory joint intentdiffusion policywhole-body control

Before the Body Moves: Learning Anticipatory Joint Intent for Language-Conditioned Humanoid Control

Abstract

Keywords

Related papers

Trust Region Policy Optimization

Legged Robots That Balance

Being there: putting brain, body, and world together again

Small-scale soft-bodied robot with multimodal locomotion