NavFormer: A Transformer Architecture for Robot Target-Driven Navigation in Unknown and Dynamic Environments

Haitong Wang, Aaron Hao Tan, Goldie Nejat

Year: 2024
Citations: 33

Abstract

In unknown, cluttered, and dynamic environments such as disaster scenes, mobile robots need to perform target-driven navigation to find people or objects of interest, where the only information provided about these targets consists of images of the individual targets. In this letter, we introduce NavFormer, a novel end-to-end transformer architecture developed for robot target-driven navigation in unknown and dynamic environments. NavFormer leverages the strengths of both 1) transformers for sequential data processing and 2) self-supervised learning (SSL) for visual representation to reason about spatial layouts and to perform collision avoidance in dynamic settings. The architecture uniquely combines dual visual encoders: a static encoder that extracts invariant environment features for spatial reasoning, and a general encoder for dynamic obstacle avoidance. The primary robot navigation task is decomposed into two sub-tasks for training: single-robot exploration and multi-robot collision avoidance. We perform cross-task training to enable the transfer of learned skills to the complex primary navigation task. Simulated experiments demonstrate that NavFormer can effectively navigate a mobile robot in diverse unknown environments, outperforming existing state-of-the-art methods. A comprehensive ablation study is performed to evaluate the impact of the main design choices of NavFormer. Furthermore, real-world experiments validate the generalizability of NavFormer.
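
The data flow described in the abstract can be pictured with a small sketch. It is not the authors' implementation: the abstract gives no layer sizes, fusion rule, or attention details, so everything below is an assumption made for illustration. The encoders are stand-in random linear projections (`make_encoder` is a hypothetical helper), features are fused by plain concatenation, the target image is embedded with the static encoder as an arbitrary choice, and a single causal attention head stands in for the transformer.

```python
import math
import random


def make_encoder(in_dim, out_dim, seed):
    """Hypothetical stand-in for a visual encoder: a fixed random linear map."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]

    def encode(image):
        return [sum(w * x for w, x in zip(row, image)) for row in weights]

    return encode


def build_tokens(observations, target_image, static_encoder, general_encoder):
    """One token per time step: static, general, and target features concatenated.

    Concatenation and the use of the static encoder for the target image are
    assumptions of this sketch, not details taken from the abstract.
    """
    target_features = static_encoder(target_image)
    return [
        static_encoder(obs) + general_encoder(obs) + target_features
        for obs in observations
    ]


def causal_attention(tokens):
    """Single-head self-attention where step t attends only to steps <= t."""
    scale = math.sqrt(len(tokens[0]))
    outputs = []
    for t, query in enumerate(tokens):
        history = tokens[: t + 1]
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in history]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append([
            sum(w * value[d] for w, value in zip(weights, history))
            for d in range(len(query))
        ])
    return outputs


# Toy usage: five flattened 8-pixel "images" and one target image.
rng = random.Random(0)
observations = [[rng.random() for _ in range(8)] for _ in range(5)]
target_image = [rng.random() for _ in range(8)]
static_encoder = make_encoder(8, 4, seed=1)
general_encoder = make_encoder(8, 4, seed=2)

tokens = build_tokens(observations, target_image, static_encoder, general_encoder)
context = causal_attention(tokens)
```

Each entry of `context` summarises the observation history up to that time step. In a full model, a learned policy head would map it to a robot action, and the encoders would be trained networks instead of fixed projections.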

Keywords

Architecture, Transformer, Robot, Computer science, Artificial intelligence, Human–computer interaction, Engineering, Geography, Electrical engineering
