Benchmarking YOLOv8 to YOLOv13 for robust hand gesture recognition in human–robot interaction

Yifang Gao, Wei Luo, Shunshun Zhang, Nur Syazreen Ahmad, Xiaojun Wang, Patrick Goh

Year: 2025
Citations: 6
Access: Open access

Abstract

Real-time, accurate hand gesture detection is essential for safe and intuitive Human-Robot Interaction (HRI), enabling robots to interpret non-verbal cues and respond appropriately in dynamic environments. This study evaluates the YOLOv8n through YOLOv13n models on static hand gesture recognition using the TSL detection dataset, which contains 5469 grayscale images across 31 gesture classes. All models were trained with the same data augmentation protocol and assessed using standard object detection metrics: precision, recall, and mean average precision (mAP) computed at an IoU threshold of 0.50 and averaged over IoU thresholds from 0.50 to 0.95. Computational efficiency was assessed by inference speed, frame rate, model size, and total training duration. YOLOv9t achieved the best detection accuracy on every accuracy metric, with the highest mAP at 0.50 (0.990), mAP at 0.50 to 0.95 (0.876), precision (0.975), and recall (0.966). In contrast, YOLOv10n achieved the lowest inference latency (0.7 ms). These findings highlight the trade-off between accuracy and efficiency in gesture detection and show that YOLOv9t and YOLOv10n are strong choices for accuracy-critical and latency-critical applications, respectively.
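
As an illustration of the metrics named in the abstract, the following minimal Python sketch shows how IoU, precision, recall, and a latency-to-frame-rate conversion are typically computed. It is not the authors' evaluation code. The function names, the (x1, y1, x2, y2) box format, and the 0.05 step between IoU thresholds (the common COCO-style convention) are assumptions introduced here, and the frame rate derived from inference latency alone is only an upper bound because it ignores pre- and post-processing.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Overlap is zero when the boxes do not intersect.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def latency_ms_to_fps(latency_ms):
    """Upper-bound frame rate implied by a per-image inference latency in ms."""
    return 1000.0 / latency_ms


# Assumed COCO-style thresholds for mAP at 0.50 to 0.95: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

if __name__ == "__main__":
    # A detection counts as a true positive at mAP 0.50 only if IoU >= 0.50.
    overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
    print(f"IoU = {overlap:.3f}, match at 0.50: {overlap >= 0.50}")
    print(precision_recall(tp=8, fp=2, fn=2))
    print(f"{latency_ms_to_fps(0.7):.0f} FPS upper bound at 0.7 ms")
```

In this sketch a detection is matched to a ground-truth box only when their IoU meets the chosen threshold, so raising the threshold from 0.50 toward 0.95 demands tighter localization, which is why mAP at 0.50 to 0.95 is lower than mAP at 0.50 for the same model.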

Keywords

Gesture; Benchmarking; Inference; Pattern recognition (psychology); Recall; Gesture recognition; Object detection; Precision and recall; Frame (networking)
