Speech Emotion Recognition using Attention-based LSTM-Network with Residual Connection

arXiv:2606.03359v1

Abstract: Speech emotion recognition is an important component of modern human-computer interaction systems. However, many state-of-the-art approaches rely on large pretrained models with high computational and memory requirements, limiting their applicability. This paper proposes ResLSTM-SA, a lightweight architecture that integrates residual connections with soft attention within an LSTM-based framework. Evaluated on the RAVDESS dataset under strict speaker-independent partitioning, the proposed model outperforms conventional attention-based LSTM baselines and several previously reported CNN- and hybrid CNN-LSTM architectures in terms of unweighted average recall (UAR). The best-performing variant (ResLSTM-SA-h64) achieves a maximum UAR of 0.6517 with only 46.8k trainable parameters, delivering competitive accuracy with three orders of magnitude fewer parameters than large-scale self-supervised alternatives, thereby enabling efficient deployment on edge devices and real-time voice assistants. The source code is available at https://github.com/Mak-Sim/ResLSTM-SER.
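For readers unfamiliar with the metric, UAR is the mean of the per-class recalls, so every emotion class counts equally regardless of how many samples it has. The following is a minimal sketch of that calculation in plain Python. The function name `unweighted_average_recall` and the toy labels are assumptions made for illustration; they are not taken from the paper or its repository.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall, with every class weighted equally."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    if not total:
        raise ValueError("no samples given")
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy labels, made up for illustration only (not results from the paper).
y_true = ["angry"] * 4 + ["happy"] * 4 + ["neutral"] * 2
y_pred = (["angry"] * 4
          + ["happy", "happy", "angry", "angry"]
          + ["happy", "neutral"])

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.4f}, accuracy = {accuracy:.4f}")
```

In this toy case the per-class recalls are 1.0, 0.5, and 0.5, so UAR is about 0.667 while plain accuracy is 0.7. The gap shows why UAR is the more conservative figure when class sizes differ.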
Originally published by arXiv CS.