Real-time multimodal fusion and semantic mapping for robotic tower crane perception

Yifan Lu, Xiuzhi DENG, Peter E.D. Love, Zhou We, Weili Fang

Year: 2026
Citations: 3

Abstract

Robotic tower crane operation requires real-time perception of complex and rapidly changing construction environments. Conventional Simultaneous Localization and Mapping (SLAM) methods assume smooth sensor motion and emphasize geometry over semantics, which limits their suitability for crane-mounted sensing affected by vibration, rotation, and intermittent movement. This research proposes a multimodal perception framework that integrates Light Detection and Ranging (LiDAR), camera, and Inertial Measurement Unit (IMU) data within a tightly coupled fusion and semantic reconstruction pipeline. A Mahony-filter-based attitude optimization module stabilizes attitude estimates under high-frequency vibration, while a LiDAR–visual–inertial fusion strategy inspired by Fast LiDAR-Inertial-Visual Odometry 2 (FAST-LIVO2) achieves centimeter-level three-dimensional (3D) mapping. To enhance scene understanding, an improved RandLA-Net (a point-cloud network built on random sampling and local feature aggregation) with color-aware spatial encoding jointly exploits geometric and visual cues for point-level semantic segmentation. In a field deployment on an operational tower crane, the framework yields the lowest global reconstruction errors and the highest semantic accuracy among the methods compared. The framework provides a robust perception foundation for autonomous planning, safety monitoring, and intelligent lifting assistance.
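
For readers unfamiliar with the Mahony filter named above, the following is a minimal sketch of the textbook gyroscope-plus-accelerometer form of that complementary filter. It is not the authors' attitude optimization module, whose details the abstract does not give. The class name `MahonyFilter`, the helper `gravity_in_body`, and the gain values `kp` and `ki` are illustrative assumptions.

```python
import math


def gravity_in_body(q):
    """Direction of the world z-axis expressed in the body frame for quaternion q = (w, x, y, z)."""
    q0, q1, q2, q3 = q
    return (
        2.0 * (q1 * q3 - q0 * q2),
        2.0 * (q0 * q1 + q2 * q3),
        q0 * q0 - q1 * q1 - q2 * q2 + q3 * q3,
    )


class MahonyFilter:
    """Textbook Mahony complementary filter (IMU-only variant, no magnetometer)."""

    def __init__(self, kp=1.0, ki=0.0):
        self.kp = kp                      # proportional gain (assumed value)
        self.ki = ki                      # integral gain (assumed value)
        self.q = [1.0, 0.0, 0.0, 0.0]     # body-to-world quaternion (w, x, y, z)
        self.e_int = [0.0, 0.0, 0.0]      # integral of the attitude error

    def update(self, gyro, accel, dt):
        """Fuse one gyro sample (rad/s) and one accel sample (any unit) over dt seconds."""
        gx, gy, gz = gyro
        ax, ay, az = accel
        norm = math.sqrt(ax * ax + ay * ay + az * az)
        if norm > 1e-9:  # skip the correction if the accelerometer reading is unusable
            ax, ay, az = ax / norm, ay / norm, az / norm
            vx, vy, vz = gravity_in_body(self.q)
            # Error is the cross product of measured and estimated gravity directions.
            ex = ay * vz - az * vy
            ey = az * vx - ax * vz
            ez = ax * vy - ay * vx
            self.e_int[0] += self.ki * ex * dt
            self.e_int[1] += self.ki * ey * dt
            self.e_int[2] += self.ki * ez * dt
            # Feed the error back into the gyro rates (PI correction).
            gx += self.kp * ex + self.e_int[0]
            gy += self.kp * ey + self.e_int[1]
            gz += self.kp * ez + self.e_int[2]
        # Integrate the quaternion kinematics q_dot = 0.5 * q * (0, g).
        q0, q1, q2, q3 = self.q
        dq0 = 0.5 * (-q1 * gx - q2 * gy - q3 * gz)
        dq1 = 0.5 * (q0 * gx + q2 * gz - q3 * gy)
        dq2 = 0.5 * (q0 * gy - q1 * gz + q3 * gx)
        dq3 = 0.5 * (q0 * gz + q1 * gy - q2 * gx)
        q = [q0 + dq0 * dt, q1 + dq1 * dt, q2 + dq2 * dt, q3 + dq3 * dt]
        n = math.sqrt(sum(c * c for c in q))
        self.q = [c / n for c in q]
        return tuple(self.q)
```

The proportional gain trades off how quickly the estimate follows the accelerometer against how much vibration-induced accelerometer noise leaks into the attitude; a crane-mounted sensor would presumably need this tuned against the observed vibration spectrum.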

Keywords

Odometry, Semantic mapping, Sensor fusion, Ranging, Perception, Robot, Visual odometry, Software deployment, Active perception, Tower crane
