Vision Language Model Empowered Surgical Planning
Yi-He Chen, Runsheng Yu, Xin Wang, Wensheng Wang, Ning Tan, Youzhi Zhang
- Year: 2024
- Citations: 1
Abstract
The integration of a flexible endoscope with a surgical manipulator is crucial in minimally invasive surgery (MIS), facilitating detailed visualization of the operative field within the patient’s body. During MIS, remote center of motion (RCM) constraints are essential for achieving visual servoing control and ensuring accurate tracking control of the robotic endoscope. Existing work requires an exact trajectory for tracking control and does not connect the two tasks (visual servoing and tracking) under the RCM constraints. In this paper, we exploit GPT-V to develop Vision Language Model Empowered surgical Planning (VLM-EP), which uses environmental observations and a task description to complete the tracking task without an exact trajectory, and connects the two tasks through an exploration procedure within the in vivo safety range. Our simulated experiments show that VLM-EP significantly outperforms the state-of-the-art control-based baseline. We also demonstrate a practical implementation of VLM-EP in real-world scenarios, which shows that VLM-EP effectively handles both the tracking control task and the visual servoing control task.