Deep Architecture Enhancing Robustness to Noise, Adversarial Attacks, and Cross-corpus Setting for Speech Emotion Recognition

05/18/2020
by Siddique Latif, et al.

Speech emotion recognition (SER) systems can achieve high accuracy when the training and test data are identically distributed, but this assumption is frequently violated in practice, and the performance of SER systems plummets under unforeseen data shifts. Designing robust models for accurate SER is challenging, which limits the use of SER in practical applications. In this paper, we propose a deeper neural network architecture in which we fuse DenseNet, LSTM, and Highway Network to learn powerful discriminative features that are robust to noise. We also propose data augmentation with our network architecture to further improve robustness. We comprehensively evaluate the architecture coupled with data augmentation against (1) noise, (2) adversarial attacks, and (3) cross-corpus settings. Our evaluations on the widely used IEMOCAP and MSP-IMPROV datasets show promising results when compared with existing studies and state-of-the-art models.
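The abstract does not say which noise types or signal-to-noise ratios are used, so the following is only a minimal sketch of one common way to build noisy test or augmentation data: mixing a noise recording into clean speech at a chosen SNR. The function names `mix_at_snr` and `measured_snr_db`, the synthetic sine-wave "speech", the Gaussian noise, and the 10 dB target are all assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus scaled noise so the mixture has the requested SNR in dB.

    Hypothetical helper for illustration; not the paper's augmentation pipeline.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Tile or trim the noise so that it covers the whole utterance.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must both have non-zero power")

    # SNR_dB = 10 * log10(P_speech / P_scaled_noise); solve for the noise gain.
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    gain = np.sqrt(target_noise_power / noise_power)
    return speech + gain * noise


def measured_snr_db(clean, noisy):
    """Compute the SNR in dB of a noisy signal given its clean reference."""
    residual = noisy - clean
    return 10 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))


# Assumed toy data: a 1-second 220 Hz tone at 16 kHz stands in for speech.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
noise = rng.standard_normal(8000)  # shorter than the utterance, so it is tiled
noisy = mix_at_snr(clean, noise, snr_db=10.0)
```

Sweeping `snr_db` over a range of values (for example from clean-like to heavily corrupted) would give a simple robustness curve for any SER model; the same helper could be applied to training utterances as a form of noise augmentation.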


research
07/12/2022

Multitask Learning from Augmented Auxiliary Data for Improving Speech Emotion Recognition

Despite the recent progress in speech emotion recognition (SER), state-o...
research
11/09/2022

A Comparative Study of Data Augmentation Techniques for Deep Learning Based Emotion Recognition

Automated emotion recognition in speech is a long-standing problem. Whil...
research
04/18/2021

Best Practices for Noise-Based Augmentation to Improve the Performance of Emotion Recognition "In the Wild"

Emotion recognition as a key component of high-stake downstream applicat...
research
03/09/2022

Robust Federated Learning Against Adversarial Attacks for Speech Emotion Recognition

Due to the development of machine learning and speech processing, speech...
research
01/17/2022

AugLy: Data Augmentations for Robustness

We introduce AugLy, a data augmentation library with a focus on adversar...
research
05/15/2020

"I have vxxx bxx connexxxn!": Facing Packet Loss in Deep Speech Emotion Recognition

In applications that use emotion recognition via speech, frame-loss can ...
research
10/21/2020

Dynamic Layer Customization for Noise Robust Speech Emotion Recognition in Heterogeneous Condition Training

Robustness to environmental noise is important to creating automatic spe...
