Vision Language Model Empowered Surgical Planning

Yi-He Chen, Runsheng Yu, Xin Wang, Wensheng Wang, Ning Tan, Youzhi Zhang

Year: 2024
Citations: 1

Abstract

The integration of a flexible endoscope with a surgical manipulator is crucial in minimally invasive surgery (MIS), facilitating detailed visualization of the operative field within the patient’s body. During MIS, the remote center of motion (RCM) constraints are essential for achieving visual servoing control and ensuring accurate tracking control of the robotic endoscope. Existing work requires an exact trajectory for tracking control and does not connect the two tasks under the RCM constraints. In this paper, we exploit GPT-V to develop Vision Language Model Empowered surgical Planning (VLM-EP), which uses environmental observations and a task description to complete the tracking task without an exact trajectory, and connects the two tasks through an exploration procedure within the in vivo safety range. Simulated experiments show that VLM-EP significantly outperforms the state-of-the-art control-based baseline. We also demonstrate a practical implementation of VLM-EP in real-world scenarios, showing that it effectively handles both the tracking control task and the visual servoing control task.

Keywords

Computer science, Surgical planning, Human–computer interaction, Artificial intelligence, Medicine, Radiology
