HRI

Multimodal Interaction for Human-Robot Collaboration in Assembly: An LLM-Enhanced Approach

Khansa Rekik, Grimaldo Silva, Attique Bashir, Rainer Müller

Year: 2025
Citations: 2

Abstract

As Robot as a Service (RaaS) models gain interest in industrial automation, the need for intuitive and adaptive human-robot interaction (HRI) increases. This paper introduces a multimodal interaction framework for human-robot collaboration in assembly tasks, enhanced by Large Language Models (LLMs). The system combines explicit user inputs (speech commands, gestures, and graphical interfaces) with implicit intent recognition to generate and prioritize tasks in real time. Leveraging LLMs for natural language understanding and task planning, the approach enables flexible and adaptive task execution, allowing the robot to respond to both direct requests and contextual cues. A pilot user study evaluates the performance and user satisfaction of each modality, revealing trade-offs between ease of use, response speed, and accuracy. The results demonstrate the promise of the approach in industrial applications, while also identifying opportunities for improvement toward broader use.
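The abstract does not say how tasks from different modalities are ranked, so the following is only a minimal sketch of one way explicit and implicit inputs could feed a shared prioritized task queue. The rule that explicit requests (speech, gesture, GUI) outrank implicitly inferred intents, and all names used here (`SOURCE_PRIORITY`, `Task`, `TaskQueue`), are assumptions made for illustration, not the authors' method. The LLM-based language understanding and planning steps are left out entirely.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical ranking: lower number is served first.
# Assumption: explicit requests outrank implicitly inferred intents.
SOURCE_PRIORITY = {"speech": 0, "gesture": 0, "gui": 0, "implicit": 1}


@dataclass(order=True)
class Task:
    priority: int
    seq: int  # arrival order, breaks ties within a priority level
    action: str = field(compare=False)
    source: str = field(compare=False)


class TaskQueue:
    """Priority queue of robot tasks fed by several input modalities."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, action, source):
        """Queue a task; raises KeyError for an unknown source."""
        task = Task(SOURCE_PRIORITY[source], next(self._counter), action, source)
        heapq.heappush(self._heap, task)
        return task

    def next_task(self):
        """Return the highest-priority task, or None if the queue is empty."""
        return heapq.heappop(self._heap) if self._heap else None


queue = TaskQueue()
queue.submit("fetch screwdriver", "implicit")  # inferred from context
queue.submit("hand over bracket", "speech")    # spoken request
queue.submit("hold part", "gui")               # button press
order = []
while (task := queue.next_task()) is not None:
    order.append(task.action)
```

In this sketch the two explicit requests are served in arrival order before the inferred one, even though the inferred task was queued first. A real system would likely need richer priorities (urgency, safety, task dependencies) than a fixed per-modality rank.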

Keywords

Task (project management); Multimodal interaction; Modality (human–computer interaction); Robot; Usability; Natural language; Human–robot interaction; Natural language understanding; User interface
