TriFusion-AE: Language-Guided Depth and LiDAR Fusion for Robust Point Cloud Processing
Susmit Neogi
- 发表年份
- 2025
- 访问权限
- 开放获取
摘要
LiDAR-based perception is central to autonomous driving and robotics, yet raw point clouds remain highly vulnerable to noise, occlusion, and adversarial corruptions. Autoencoders offer a natural framework for denoising and reconstruction, but their performance degrades under challenging real-world conditions. In this work, we propose TriFusion-AE, a multimodal cross-attention autoencoder that integrates textual priors, monocular depth maps from multi-view images, and LiDAR point clouds to improve robustness. By aligning semantic cues from text, geometric (depth) features from images, and spatial structure from LiDAR, TriFusion-AE learns representations that are resilient to stochastic noise and adversarial perturbations. Interestingly, while showing limited gains under mild perturbations, our model achieves significantly more robust reconstruction under strong adversarial attacks and heavy noise, where CNN-based autoencoders collapse. We evaluate on the nuScenes-mini dataset to reflect realistic low-data deployment scenarios. Our multimodal fusion framework is designed to be model-agnostic, enabling seamless integration with any CNN-based point cloud autoencoder for joint representation learning.
关键词
相关论文
Artificial intelligence: a modern approach
1995
Are we ready for autonomous driving? The KITTI vision benchmark suite
Andreas Geiger, P Lenz, R. Urtasun
2012
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
Martı́n Abadi, Ashish Agarwal, Paul Barham 等 20 位作者
2016
Vision meets robotics: The KITTI dataset
Andreas Geiger, Philip Lenz, Christoph Stiller 等 4 位作者
2013