HRI

CapStARE: Capsule-based Spatiotemporal Architecture for Robust and Efficient Gaze Estimation

Miren Samaniego, Igor Rodriguez, Elena Lazkano

Year of publication: 2025
Access: Open access

Abstract

We introduce CapStARE, a capsule-based spatio-temporal architecture for gaze estimation that integrates a ConvNeXt backbone, capsule formation with attention routing, and dual GRU decoders specialized for slow and rapid gaze dynamics. This modular design enables efficient part-whole reasoning and disentangled temporal modeling. It achieves state-of-the-art mean angular error on ETH-XGaze (3.36°) and MPIIFaceGaze (2.65°) while maintaining real-time inference (< 10 ms). The model also generalizes well to unconstrained conditions in Gaze360 (9.06°) and to human-robot interaction scenarios in RT-GENE (4.76°), outperforming or matching existing methods with fewer parameters and greater interpretability. These results demonstrate that CapStARE offers a practical and robust solution for real-time gaze estimation in interactive systems. The code and results for this article are available at https://github.com/toukapy/capsStare
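
Gaze benchmarks such as these are conventionally scored by the mean angular error, in degrees, between predicted and ground-truth gaze directions. The sketch below illustrates that metric only; it is not taken from the CapStARE repository. It assumes gaze is given as (pitch, yaw) pairs in radians and uses one common pitch/yaw-to-vector convention, and the function names are hypothetical.

```python
import math


def gaze_vector(pitch, yaw):
    """Convert (pitch, yaw) in radians to a unit 3D gaze vector.

    Assumed convention (one of several in use across datasets):
    x = -cos(pitch) * sin(yaw), y = -sin(pitch), z = -cos(pitch) * cos(yaw).
    """
    return (
        -math.cos(pitch) * math.sin(yaw),
        -math.sin(pitch),
        -math.cos(pitch) * math.cos(yaw),
    )


def angular_error_deg(pred, target):
    """Angle in degrees between two gaze directions given as (pitch, yaw)."""
    p = gaze_vector(*pred)
    t = gaze_vector(*target)
    dot = sum(a * b for a, b in zip(p, t))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    dot = max(-1.0, min(1.0, dot))
    return math.degrees(math.acos(dot))


def mean_angular_error_deg(preds, targets):
    """Average angular error over paired predictions and ground truths."""
    if not preds or len(preds) != len(targets):
        raise ValueError("preds and targets must be non-empty and equal in length")
    errors = [angular_error_deg(p, t) for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)
```

Under this convention, a prediction that is off by 10 degrees in yaw at zero pitch scores an angular error of 10 degrees, so a dataset-level figure is simply the average of such per-sample angles.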

Keywords

cs.CV
