Improving Speech Emotion Recognition Performance using Differentiable Architecture Search

05/23/2023
by Thejan Rajapakshe, et al.

Speech Emotion Recognition (SER) is a critical enabler of emotion-aware communication in human-computer interactions. Deep Learning (DL) has improved the performance of SER models by increasing model complexity. However, designing DL architectures requires prior experience and experimental evaluations. Encouragingly, Neural Architecture Search (NAS) allows automatic search for an optimum DL model. In particular, Differentiable Architecture Search (DARTS) is an efficient method of using NAS to search for optimised models. In this paper, we propose DARTS for a joint CNN and LSTM architecture for improving SER performance. Our choice of the CNN-LSTM coupling is inspired by results showing that similar models offer improved performance. While SER researchers have considered CNNs and RNNs separately, the viability of using DARTS jointly for CNN and LSTM still needs exploration. Experimenting with the IEMOCAP dataset, we demonstrate that our approach outperforms best-reported results using DARTS for SER.
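For readers unfamiliar with DARTS, the core idea is a continuous relaxation: each edge in the network computes a softmax-weighted mixture of candidate operations, so the architecture weights can be optimised by gradient descent and the strongest operation kept afterwards. The following is a minimal sketch of that mixing and selection step only, not the search space or training procedure used in the paper. The candidate operations, the names `CANDIDATE_OPS`, `mixed_op`, and `discretize`, and the example values are all hypothetical and chosen for illustration.

```python
import math

def softmax(alphas):
    """Numerically stable softmax over a list of architecture parameters."""
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate operations on a 1-D feature vector (assumption:
# a real search space would hold convolutions, pooling, skip connections, etc.).
CANDIDATE_OPS = {
    "identity": lambda x: list(x),
    "zero": lambda x: [0.0] * len(x),
    "avg_pool_3": lambda x: [
        sum(x[max(0, i - 1): i + 2]) / len(x[max(0, i - 1): i + 2])
        for i in range(len(x))
    ],
}

def mixed_op(x, alphas):
    """Continuous relaxation: softmax-weighted sum of all candidate outputs."""
    weights = softmax(alphas)
    out = [0.0] * len(x)
    for w, op in zip(weights, CANDIDATE_OPS.values()):
        y = op(x)
        out = [o + w * v for o, v in zip(out, y)]
    return out

def discretize(alphas):
    """After the search, keep the operation with the largest parameter."""
    names = list(CANDIDATE_OPS)
    return names[max(range(len(alphas)), key=lambda i: alphas[i])]

# Usage: with equal parameters every candidate contributes one third.
features = [3.0, 3.0, 3.0]
print(mixed_op(features, [0.0, 0.0, 0.0]))
print(discretize([0.1, -1.0, 2.0]))
```

In a full DARTS setup the parameters in `alphas` would be learned alongside the network weights; here they are fixed by hand to keep the sketch self-contained.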

research
03/25/2022

EmotionNAS: Two-stream Architecture Search for Speech Emotion Recognition

Speech emotion recognition (SER) is a crucial research topic in human-co...
research
03/31/2022

Neural Architecture Search for Speech Emotion Recognition

Deep neural networks have brought significant advancements to speech emo...
research
10/31/2022

Multilingual Speech Emotion Recognition With Multi-Gating Mechanism and Neural Architecture Search

Speech emotion recognition (SER) classifies audio into emotion categorie...
research
04/08/2019

Direct Modelling of Speech Emotion from Raw Speech

Speech emotion recognition is a challenging task and heavily depends on ...
research
01/08/2022

Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks

State-of-the-art automatic speech recognition (ASR) system development i...
research
06/12/2023

MFAS: Emotion Recognition through Multiple Perspectives Fusion Architecture Search Emulating Human Cognition

Speech emotion recognition aims to identify and analyze emotional states...
research
09/20/2023

Grassroots Operator Search for Model Edge Adaptation

Hardware-aware Neural Architecture Search (HW-NAS) is increasingly being...
