Multimodal Optimal Transport for Training-free Temporal Segmentation in Surgical Robotics
Omar Mohamed, Edoardo Fazzari, Ayah Al-Naji, Hamdan Alhadhrami, Khalfan Hableel, Saif Alkindi, Ivan Laptev, Cesare Stefanini
- 发表年份
- 2026
- 访问权限
- 开放获取
摘要
Automated recognition of surgical phases and steps is a fundamental capability for intraoperative decision support, workflow automation, and skill assessment in robotic-assisted surgery. Existing approaches either depend on large-scale annotated surgical datasets or require expensive domain-specific pretraining on thousands of labeled videos, limiting their practical deployability across diverse robotic platforms and clinical environments. In this work, we propose TASOT (Text-Augmented Action Segmentation Optimal Transport), an annotation-free framework for surgical temporal segmentation that requires no task-specific annotations or surgical-domain pretraining. TASOT extends the Action Segmentation Optimal Transport (ASOT) formulation by incorporating temporally aligned textual descriptions generated directly from the input video, fusing visual and semantic cues within a unified unbalanced Gromov-Wasserstein optimal transport objective. Visual representations are extracted using DINOv3, while temporal captions produced by a vision-language model are encoded via CLIP and temporally aligned to individual frames, providing complementary semantic structure to the transport cost. We evaluate TASOT on three public surgical datasets and four benchmark settings spanning laparoscopic and robotic procedures, showing substantial improvements over the strongest zero-shot baselines: +18.9 F1 on Cholec80, +33.7 on AutoLaparo, +23.7 on StrasByPass70, and +4.5 on BernByPass70. These results suggest that fine-grained surgical workflow understanding in robotic settings can be achieved without manual training annotations or surgical-specific pretraining pipelines, offering a promising alternative for real-world robotic surgical systems.
关键词
相关论文
Campbell-Walsh urology
Alan J. Wein editor-in-chief
2012
Principles of Robot Motion: Theory, Algorithms, and Implementations
Howie Choset, Jean‐Claude Latombe
2005
Minimally Invasive versus Abdominal Radical Hysterectomy for Cervical Cancer
Pedro T. Ramírez, Michael Frumovitz, René Pareja 等 19 位作者
2018
Guideline for Management of the Clinical T1 Renal Mass
Steven C. Campbell, Andrew C. Novick, Arie S. Belldegrun 等 12 位作者
2009