
Enhanced graph attention network by integrating Long Short-Term Memory for artificial emotion representation in multi-modality datasets.

Abstract

Emotion representation is a critical aspect of artificial intelligence, particularly in human-computer interaction and affective computing. Emotion recognition from multi-modal data remains challenging due to the complex semantic relationships among textual, audio, and visual features. This study proposes a hybrid model combining an Enhanced Graph Attention Network (E-GAT) and a Bidirectional Long Short-Term Memory network (Bi-LSTM) to address this challenge. First, the E-GAT captures structural dependencies among emotional features by constructing a semantic graph from text embeddings. Second, the Bi-LSTM models the temporal dynamics of sequential data, enabling effective integration of contextual information. We evaluated the model on three benchmark datasets: SemEval-2018 (text-only), RAVDESS (audio-visual), and CMU-MOSEI (multi-modal). Experimental results show that the proposed model achieves state-of-the-art performance: 58.5% accuracy and a 68.7% F1-score on SemEval-2018, outperforming baseline models. On the multi-modal datasets, it achieves 78.9% accuracy (RAVDESS) and 82.3% accuracy (CMU-MOSEI), demonstrating robust cross-modal generalization. This work advances emotion recognition by providing a unified framework for both text-only and multi-modal scenarios, with applications in human-computer interaction and mental health monitoring.
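The abstract describes a two-stage pipeline: graph attention over a semantic graph built from text embeddings, followed by a Bi-LSTM over the resulting node sequence. The sketch below is not the authors' implementation; it is a minimal PyTorch illustration of that pipeline, assuming a `torch_geometric` `GATConv` layer for the graph attention, mean pooling over tokens before classification, and hypothetical dimensions (`embed_dim`, `gat_dim`, `lstm_dim`, `num_classes`) and graph construction.

```python
# Minimal sketch of an E-GAT + Bi-LSTM hybrid, assuming the pipeline the
# abstract outlines. Dimensions, layer counts, pooling, and the semantic-graph
# construction are all assumptions, not the paper's actual configuration.
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv  # graph attention layer


class GATBiLSTM(nn.Module):
    def __init__(self, embed_dim=300, gat_dim=128, lstm_dim=128, num_classes=11):
        super().__init__()
        # Graph attention over the semantic graph
        # (nodes = token embeddings, edges = semantic links between features).
        self.gat = GATConv(embed_dim, gat_dim, heads=4, concat=False)
        # Bi-LSTM over the node features in sequence order, capturing the
        # temporal/contextual dynamics the abstract attributes to the Bi-LSTM.
        self.bilstm = nn.LSTM(gat_dim, lstm_dim, batch_first=True,
                              bidirectional=True)
        self.classifier = nn.Linear(2 * lstm_dim, num_classes)

    def forward(self, x, edge_index):
        # x: (num_tokens, embed_dim) node embeddings for one utterance
        # edge_index: (2, num_edges) semantic-graph connectivity
        h = torch.relu(self.gat(x, edge_index))   # structural dependencies
        h, _ = self.bilstm(h.unsqueeze(0))        # (1, num_tokens, 2*lstm_dim)
        pooled = h.mean(dim=1)                    # average over tokens
        return self.classifier(pooled)            # emotion logits


# Toy usage: 5 tokens, a small hand-made semantic graph, 11 emotion labels
# (11 matches SemEval-2018 Task 1's emotion inventory, but is an assumption here).
x = torch.randn(5, 300)
edge_index = torch.tensor([[0, 1, 2, 3, 1], [1, 2, 3, 4, 4]])
logits = GATBiLSTM()(x, edge_index)
print(logits.shape)  # torch.Size([1, 11])
```

In the full model, the semantic-graph construction, the attention-head configuration, and the fusion of audio and visual streams for RAVDESS and CMU-MOSEI would follow the paper's specification rather than this text-only sketch.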
