LG-H-PPO: offline hierarchical PPO for robot path planning on a latent graph

Xiang Han

Year: 2026
Citations: 2
Access: Open access

Abstract

The path planning capability of autonomous robots in complex environments is crucial to their widespread real-world deployment. However, long-horizon decision-making and sparse reward signals pose significant challenges for traditional reinforcement learning (RL) algorithms. Offline hierarchical reinforcement learning (HRL) offers an effective approach by decomposing tasks into two stages: high-level subgoal generation and low-level subgoal attainment. Advanced offline HRL methods, such as Guider and HIQL, typically introduce a latent space in the high-level policy to represent subgoals, which lets them handle high-dimensional states and improves generalization. However, these approaches require the high-level policy to search for and generate subgoals within a continuous latent space. This remains a complex and sample-inefficient problem for policy optimization algorithms, particularly the policy-gradient-based PPO, and often leads to unstable training and slow convergence. To address this core limitation, this paper proposes a novel offline hierarchical PPO framework, LG-H-PPO (Latent Graph-based Hierarchical PPO). The core innovation of LG-H-PPO lies in discretizing the continuous latent space into a structured "latent graph." By transforming high-level planning from challenging "continuous creation" into simple "discrete selection," LG-H-PPO substantially reduces the learning difficulty for the high-level policy. Preliminary experiments on standard D4RL offline navigation benchmarks demonstrate that LG-H-PPO achieves significant advantages over advanced baselines such as Guider and HIQL in both convergence speed and final task success rate. The main contribution of this paper is the introduction of graph structures into latent-variable HRL planning. This simplifies the action space of the high-level policy and improves the training efficiency and stability of offline HRL algorithms on long-horizon navigation tasks. It also lays a foundation for future offline HRL research that combines latent-variable representations with explicit graph planning.
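
To make the shift from "continuous creation" to "discrete selection" concrete, the following is a minimal sketch of one way a latent graph could be built and used. The abstract does not say how LG-H-PPO constructs its graph, so everything here is an illustrative assumption: nodes come from plain k-means over latent states, edges link nodes visited consecutively in a single offline trajectory, and the high-level action is a sample from a categorical distribution masked to the current node's neighbors. The names `build_latent_graph` and `select_subgoal` are hypothetical.

```python
import numpy as np


def build_latent_graph(latents, num_nodes, num_iters=20, seed=0):
    """Discretize latent states into graph nodes (assumed: plain k-means)
    and connect nodes that occur consecutively in one offline trajectory."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from randomly chosen latent states.
    start = rng.choice(len(latents), size=num_nodes, replace=False)
    centroids = latents[start].copy()
    for _ in range(num_iters):
        # Assign every latent state to its nearest centroid.
        dists = np.linalg.norm(latents[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Move each non-empty centroid to the mean of its members.
        for k in range(num_nodes):
            members = latents[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    # Final assignment against the updated centroids.
    dists = np.linalg.norm(latents[:, None, :] - centroids[None, :, :], axis=-1)
    labels = dists.argmin(axis=1)
    # Add a directed edge for every observed node-to-node transition.
    adjacency = np.zeros((num_nodes, num_nodes), dtype=bool)
    adjacency[labels[:-1], labels[1:]] = True
    np.fill_diagonal(adjacency, False)  # self-loops are not useful subgoals
    return centroids, labels, adjacency


def select_subgoal(logits, adjacency, current_node, rng):
    """High-level 'discrete selection': sample the next node from a
    categorical distribution restricted to neighbors of the current node."""
    mask = adjacency[current_node]
    if not mask.any():
        return current_node  # isolated node: stay in place
    masked = np.where(mask, logits, -np.inf)
    probs = np.exp(masked - masked.max())  # softmax over allowed nodes only
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))
```

In this sketch the high-level policy only has to output one logit per node, and the centroid of the selected node would serve as the latent subgoal handed to the low-level policy. With several trajectories, transitions that cross episode boundaries would need to be excluded before filling the adjacency matrix.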

Keywords

Reinforcement learning; Graph; Robot; Motion planning; Offline learning; Convergence (economics); Path (computing); Core (optical fiber); Task (project management)
