Real-time multimodal fusion and semantic mapping for robotic tower crane perception
Yifan Lu, Xiuzhi DENG, Peter E.D. Love, Zhou We, Weili Fang
- Year: 2026
- Citations: 3
Abstract
Robotic tower crane operation requires real-time perception of complex and rapidly changing construction environments. Conventional Simultaneous Localization and Mapping (SLAM) methods assume smooth sensor motion and emphasize geometry over semantics, which limits their suitability for crane-mounted sensing affected by vibration, rotation, and intermittent movement. This research proposes a multimodal perception framework that integrates Light Detection and Ranging (LiDAR), camera, and Inertial Measurement Unit (IMU) data within a tightly coupled fusion and semantic reconstruction pipeline. A Mahony-filter-based attitude optimization module suppresses the effect of high-frequency vibration on the attitude estimate, while a LiDAR–visual–inertial fusion strategy inspired by Fast LiDAR-Inertial-Visual Odometry (FAST-LIVO2) achieves centimeter-level three-dimensional (3D) mapping. To enhance scene understanding, an improved Random Sampling and Local Feature Aggregation Network (RandLA-Net) with color-aware spatial encoding jointly exploits geometric and visual cues for point-level semantic segmentation. In field deployment on an operational tower crane, the framework yields the lowest global reconstruction errors and the highest semantic accuracy among the methods compared. The framework provides a robust perception foundation for autonomous planning, safety monitoring, and intelligent lifting assistance.
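The abstract does not give the equations or gains of the attitude module, so the following is only a minimal sketch of the standard Mahony complementary filter (gyroscope and accelerometer only), included to illustrate the kind of update such a module might perform. The function names `mahony_update` and `gravity_in_body`, the gains `kp` and `ki`, the time step `dt`, and the `[w, x, y, z]` quaternion convention are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def quat_multiply(p, q):
    """Hamilton product of two quaternions given as [w, x, y, z]."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    ])


def gravity_in_body(q):
    """World z-axis expressed in the body frame for attitude q."""
    w, x, y, z = q
    return np.array([
        2.0 * (x * z - w * y),
        2.0 * (w * x + y * z),
        w * w - x * x - y * y + z * z,
    ])


def mahony_update(q, gyro, accel, bias_int, dt, kp=1.0, ki=0.0):
    """One Mahony filter step (illustrative gains, not the paper's values).

    q        : current attitude quaternion [w, x, y, z]
    gyro     : angular rate in rad/s, body frame
    accel    : accelerometer reading, body frame (any consistent unit)
    bias_int : integral feedback term carried between steps
    """
    gyro = np.asarray(gyro, dtype=float)
    accel = np.asarray(accel, dtype=float)
    norm = np.linalg.norm(accel)
    if norm > 0.0:
        a = accel / norm
        # Error between measured and predicted gravity directions.
        e = np.cross(a, gravity_in_body(q))
        bias_int = bias_int + ki * e * dt
        # Proportional-integral correction applied to the gyro rate.
        gyro = gyro + kp * e + bias_int
    # Integrate the quaternion kinematics and renormalize.
    q_dot = 0.5 * quat_multiply(q, np.concatenate(([0.0], gyro)))
    q = q + q_dot * dt
    return q / np.linalg.norm(q), bias_int


# Usage: a stationary sensor whose initial estimate is tilted 0.3 rad about x.
q = np.array([np.cos(0.15), np.sin(0.15), 0.0, 0.0])
bias = np.zeros(3)
for _ in range(2000):
    q, bias = mahony_update(q, np.zeros(3), np.array([0.0, 0.0, 9.81]), bias, dt=0.01)
print(gravity_in_body(q))
```

In this sketch the accelerometer acts as a slow gravity reference that pulls the gyro-integrated attitude back toward level, which is why a lower `kp` lets short vibration-induced acceleration spikes have less influence on the estimate. A deployed system would also need to handle sustained crane accelerations, which this example ignores.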