LG-H-PPO: offline hierarchical PPO for robot path planning on a latent graph
Xiang Han
- Year: 2026
- Citations: 2
- Access: Open access
Abstract
Path planning in complex environments is essential for deploying autonomous robots in the real world. However, long-horizon decision-making and sparse reward signals pose significant challenges for traditional reinforcement learning (RL) algorithms. Offline hierarchical reinforcement learning (HRL) addresses this by decomposing a task into two stages: high-level subgoal generation and low-level subgoal attainment. Advanced offline HRL methods, such as Guider and HIQL, typically introduce a latent space in the high-level policy to represent subgoals, which lets them handle high-dimensional states and improves generalization. However, these approaches require the high-level policy to search for and generate subgoals within a continuous latent space. This remains a complex and sample-inefficient problem for policy optimization algorithms, particularly policy-gradient methods such as PPO, and often leads to unstable training and slow convergence. To address this limitation, this paper proposes LG-H-PPO (Latent Graph-based Hierarchical PPO), a novel offline hierarchical PPO framework. Its core innovation is to discretize the continuous latent space into a structured "latent graph." By turning high-level planning from difficult "continuous creation" into simpler "discrete selection," LG-H-PPO substantially reduces the learning difficulty for the high-level policy. Preliminary experiments on standard D4RL offline navigation benchmarks show that LG-H-PPO achieves significant advantages over advanced baselines such as Guider and HIQL in both convergence speed and final task success rate. The main contribution of this paper is the introduction of graph structure into latent-variable HRL planning, which simplifies the high-level policy's action space and improves the training efficiency and stability of offline HRL algorithms on long-horizon navigation tasks. It lays a foundation for future offline HRL research that combines latent-variable representations with explicit graph planning.
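The abstract does not say how the latent graph is built or how the high-level policy chooses among its nodes, so the following is only an illustrative sketch of the general idea under stated assumptions. It assumes that nodes come from k-means clustering of latent states, that a directed edge links two nodes whenever consecutive states in an offline trajectory fall in different clusters, and that the high-level policy samples a subgoal from a softmax masked to the current node's neighbors. The names `kmeans`, `build_latent_graph`, and `masked_probs` are hypothetical and are not taken from the paper.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, labels).

    Assumption: the paper may discretize the latent space differently.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each non-empty centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, labels


def build_latent_graph(trajectories, k, seed=0):
    """Cluster all latent states into k nodes and add a directed edge a -> b
    whenever a trajectory moves from cluster a to a different cluster b."""
    flat = [z for traj in trajectories for z in traj]
    centroids, labels = kmeans(flat, k, seed=seed)
    edges = {j: set() for j in range(k)}
    start = 0
    for traj in trajectories:
        traj_labels = labels[start:start + len(traj)]
        start += len(traj)
        for a, b in zip(traj_labels, traj_labels[1:]):
            if a != b:
                edges[a].add(b)
    return centroids, edges


def masked_probs(logits, neighbors):
    """Softmax over graph neighbors only; every other node gets probability 0.

    Raises ValueError if `neighbors` is empty.
    """
    top = max(logits[j] for j in neighbors)  # subtract max for stability
    exps = {j: math.exp(logits[j] - top) for j in neighbors}
    total = sum(exps.values())
    return [exps.get(j, 0.0) / total for j in range(len(logits))]


# Toy usage: one 2-D "latent" trajectory that passes through three regions.
trajectory = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0),
              (5.1, 0.0), (10.0, 0.0), (10.1, 0.0)]
centroids, edges = build_latent_graph([trajectory], k=3)

# "Discrete selection": sample the next subgoal node among allowed neighbors.
logits = [0.0, 1.0, 2.0]  # stand-in for high-level policy outputs
probs = masked_probs(logits, neighbors={0, 2})
subgoal = random.Random(0).choices(range(len(probs)), weights=probs)[0]
```

In this sketch the high-level action space is a finite set of node indices, so a categorical policy (as used in discrete-action PPO) could replace a continuous latent-subgoal generator. The masking step is one possible way to keep selections on graph edges observed in the offline data.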