Enhancement of Virtual Assistants Through Multimodal AI for Emotion Recognition

Shaun George Rajesh, Smriti Vipin Madangarli, Gauri Santosh Pisharady, Rolla Subrahmanyam

Year
2025
Citations
9

Abstract

Emotion recognition is increasingly critical to human-computer interaction, because emotions shape how people interact and affect their overall well-being. Machines that can detect and respond to emotional cues as humans do are essential in multiple industries: emotionally responsive agents have applications in education, healthcare, gaming, marketing, customer service, human-robot interaction, and entertainment. This study explores how multimodal Artificial Intelligence (AI) can enhance virtual assistants, using several emotion recognition techniques to create more empathetic and effective systems. The proposed methodology combines facial expressions and textual cues to improve the system's emotional awareness and to achieve user satisfaction through empathetic conversation. The Facial Emotion Recognition (FER) model achieved 71% real-time accuracy and the Textual Emotion Recognition (TER) model achieved 59% validation accuracy, demonstrating effective Multimodal Emotion Recognition (MER). Unlike prior multimodal emotion-aware systems, our lightweight architecture ensures real-time inference and uniquely integrates facial and textual emotion recognition with DialoGPT-based response generation, demonstrating compatibility with large language models for empathetic dialogue.
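
The abstract does not say how the facial and textual predictions are combined. As a minimal sketch of one common option, weighted late fusion, the Python below averages two per-emotion probability distributions and picks the strongest label. The label set, the function names `fuse_emotions` and `top_emotion`, and the use of the reported accuracies (0.71 and 0.59) as fusion weights are all assumptions made for illustration, not details taken from the paper.

```python
from typing import Dict

# Hypothetical label set; the abstract does not list the emotion classes used.
EMOTIONS = ("angry", "happy", "neutral", "sad")


def fuse_emotions(
    face_probs: Dict[str, float],
    text_probs: Dict[str, float],
    face_weight: float = 0.71,  # assumption: weight FER by its reported accuracy
    text_weight: float = 0.59,  # assumption: weight TER by its reported accuracy
) -> Dict[str, float]:
    """Weighted late fusion of two per-emotion probability distributions."""
    total = face_weight + text_weight
    return {
        label: (
            face_weight * face_probs.get(label, 0.0)
            + text_weight * text_probs.get(label, 0.0)
        )
        / total
        for label in EMOTIONS
    }


def top_emotion(fused: Dict[str, float]) -> str:
    """Return the label with the highest fused probability."""
    return max(fused, key=fused.get)


# Made-up example scores: the face model leans "happy", the text model leans "sad".
face = {"angry": 0.10, "happy": 0.60, "neutral": 0.20, "sad": 0.10}
text = {"angry": 0.05, "happy": 0.25, "neutral": 0.20, "sad": 0.50}

fused = fuse_emotions(face, text)
print(top_emotion(fused), fused)
```

In a pipeline like the one described, the fused label could then be used to condition the prompt passed to the response generator; that step is omitted here because the abstract gives no detail on it.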

Keywords

Computer science; Emotion recognition; Human–computer interaction; Speech recognition; Artificial intelligence; Pattern recognition (psychology); Computer vision
