CopyPaste: An Augmentation Method for Speech Emotion Recognition

10/27/2020
by   Raghavendra Pappagari, et al.

Data augmentation is a widely used strategy for training robust machine learning models. It partially alleviates the problem of limited data for tasks like speech emotion recognition (SER), where collecting data is expensive and challenging. This study proposes CopyPaste, a novel, perceptually motivated augmentation procedure for SER. Assuming that the presence of emotions other than neutral dictates a speaker's overall perceived emotion in a recording, the concatenation of an emotional utterance (emotion E) and a neutral utterance can still be labeled with emotion E. We hypothesize that SER performance can be improved by using these concatenated utterances in model training. To verify this, three CopyPaste schemes are tested on two deep learning models: one trained independently and another using transfer learning from an x-vector model, a speaker recognition model. We observed that all three CopyPaste schemes improve SER performance on all three datasets considered: MSP-Podcast, Crema-D, and IEMOCAP. Additionally, CopyPaste performs better than noise augmentation, and using the two together improves SER performance further. Our experiments on noisy test sets suggest that CopyPaste is effective even in noisy test conditions.
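The core idea in the abstract, concatenating an emotional utterance with a neutral one and keeping the emotional label, can be illustrated with a minimal sketch. This is not the authors' implementation, and it does not reproduce the three CopyPaste schemes, which the abstract does not detail. The function name `copy_paste`, the `emotional_first` parameter, and the assumption that both inputs are 1-D waveforms at the same sample rate are all hypothetical choices made for illustration.

```python
import numpy as np

NEUTRAL = "neutral"  # assumed name for the neutral class label


def copy_paste(emotional, neutral, emotion_label, emotional_first=True):
    """Concatenate an emotional and a neutral waveform (hypothetical helper).

    Both inputs are assumed to be 1-D arrays sampled at the same rate.
    The result keeps the emotional utterance's label, following the
    abstract's assumption that non-neutral emotion dictates the overall
    perceived emotion of a recording.
    """
    if emotion_label == NEUTRAL:
        raise ValueError("the first utterance must carry a non-neutral label")
    emotional = np.asarray(emotional)
    neutral = np.asarray(neutral)
    if emotional.ndim != 1 or neutral.ndim != 1:
        raise ValueError("expected 1-D waveforms")
    # The ordering of the two segments is an illustrative choice only.
    parts = (emotional, neutral) if emotional_first else (neutral, emotional)
    return np.concatenate(parts), emotion_label


# Usage with synthetic stand-in signals (not real speech data).
emo = np.ones(16000, dtype=np.float32)   # stand-in for an "angry" utterance
neu = np.zeros(8000, dtype=np.float32)   # stand-in for a neutral utterance
augmented, label = copy_paste(emo, neu, "angry")
```

The augmented waveform would then be added to the training set alongside the original utterances, with `label` as its target.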


research
04/18/2021

Best Practices for Noise-Based Augmentation to Improve the Performance of Emotion Recognition "In the Wild"

Emotion recognition as a key component of high-stake downstream applicat...
research
02/12/2020

x-vectors meet emotions: A study on dependencies between emotion and speaker recognition

In this work, we explore the dependencies between speaker recognition an...
research
08/05/2021

Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation

Automatic speech emotion recognition (SER) is a challenging task that pl...
research
02/17/2023

Gaussian-smoothed Imbalance Data Improves Speech Emotion Recognition

In speech emotion recognition tasks, models learn emotional representati...
research
01/10/2022

A study on cross-corpus speech emotion recognition and data augmentation

Models that can handle a wide range of speakers and acoustic conditions ...
research
08/10/2022

Data Augmentation for Improving Emotion Recognition in Software Engineering Communication

Emotions (e.g., Joy, Anger) are prevalent in daily software engineering ...
research
05/05/2022

M2R2: Missing-Modality Robust emotion Recognition framework with iterative data augmentation

This paper deals with the utterance-level modalities missing problem wit...
