VGGT-SLAM 2.0: Real-time Dense Feed-forward Scene Reconstruction
Dominic Maggio, Luca Carlone
- 发表年份
- 2026
- 访问权限
- 开放获取
摘要
We present VGGT-SLAM 2.0, a real-time RGB feed-forward SLAM system which substantially improves upon VGGT-SLAM for incrementally aligning submaps created from VGGT. Firstly, we remove high-dimensional 15-degree-of-freedom drift and planar degeneracy from VGGT-SLAM by creating a new factor graph design while still addressing the reconstruction ambiguity of VGGT given unknown camera intrinsics. Secondly, by studying the attention layers of VGGT, we show that one of the layers is well suited to assist in image retrieval verification for free without additional training, which enables both rejecting false positive matches and allows for completing more loop closures. Finally, we conduct a suite of experiments which includes showing VGGT-SLAM 2.0 can easily be adapted for open-set object detection and demonstrating real-time performance while running online onboard a ground robot using a Jetson Thor. We test in environments ranging from cluttered indoor apartments and office scenes to a 4,200 square foot barn, and we also demonstrate VGGT-SLAM 2.0 achieves the highest accuracy on the TUM dataset with about 23 percent less pose error than VGGT-SLAM. Code will be released upon publication.
关键词
相关论文
Artificial intelligence: a modern approach
1995
Are we ready for autonomous driving? The KITTI vision benchmark suite
Andreas Geiger, P Lenz, R. Urtasun
2012
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
Martı́n Abadi, Ashish Agarwal, Paul Barham 等 20 位作者
2016
Vision meets robotics: The KITTI dataset
Andreas Geiger, Philip Lenz, Christoph Stiller 等 4 位作者
2013