How to Enable LLM with 3D Capacity? A Survey of Spatial Reasoning in LLM

Jirong Zha, Yuxuan Fan, Xiao Yang, Chen Gao, Xinlei Chen

Year: 2025
Citations: 4

Abstract

3D spatial understanding is essential in real-world applications such as robotics, autonomous vehicles, virtual reality, and medical imaging. Recently, Large Language Models (LLMs), having demonstrated remarkable success across various domains, have been leveraged to enhance 3D understanding tasks, showing potential to surpass traditional computer vision methods. In this survey, we present a comprehensive review of methods integrating LLMs with 3D spatial understanding. We propose a taxonomy that categorizes existing methods into three branches: image-based methods deriving 3D understanding from 2D visual data, point cloud-based methods working directly with 3D representations, and hybrid modality-based methods combining multiple data streams. We systematically review representative methods along these categories, covering data representations, architectural modifications, and training strategies that bridge textual and 3D modalities. Finally, we discuss current limitations, including dataset scarcity and computational challenges, while highlighting promising research directions in spatial perception, multi-modal fusion, and real-world applications.

Keywords

Spatial intelligence, Visual reasoning, Scarcity, Taxonomy (biology), Bridge (graph theory), Point (geometry), Spatial analysis, Virtual reality
