SpeechEQ: Speech Emotion Recognition based on Multi-scale Unified Datasets and Multitask Learning

06/27/2022
by Zuheng Kang, et al.

Speech emotion recognition (SER) faces many challenges; one of the main ones is the lack of a unified standard across frameworks. In this paper, we propose SpeechEQ, a framework that unifies SER tasks based on a multi-scale unified metric. This metric is trained by multitask learning (MTL), which comprises two emotion recognition tasks, Emotion States Category (ESC) and Emotion Intensity Scale (EIS), and two auxiliary tasks, phoneme recognition and gender recognition. For this framework, we also build a Mandarin SER dataset, the SpeechEQ Dataset (SEQD). Experiments on the public Mandarin CASIA and ESD datasets show that our method outperforms baseline methods by a relatively large margin, yielding an 8.0% improvement in accuracy. Additional experiments on IEMOCAP with four emotion categories (i.e., angry, happy, sad, and neutral) show that the proposed method achieves a state-of-the-art weighted accuracy (WA) of 78.16%.
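
To make the multitask setup concrete, the sketch below shows one plausible way to wire a shared speech encoder to four task heads (emotion category, emotion intensity, phoneme, and gender), as described in the abstract. It is a minimal PyTorch illustration, not the authors' SpeechEQ implementation; all module choices, dimensions, class counts, and loss weights are assumptions.

```python
# Hypothetical sketch of a multitask SER model: a shared speech encoder with
# four task heads (emotion category, emotion intensity, phoneme, gender).
# All names, dimensions, and class counts are illustrative assumptions,
# not the actual SpeechEQ architecture.
import torch
import torch.nn as nn


class MultitaskSER(nn.Module):
    def __init__(self, feat_dim=80, hidden_dim=256,
                 n_emotions=4, n_intensity=5, n_phonemes=60):
        super().__init__()
        # Shared encoder over frame-level acoustic features (e.g., mel filterbanks).
        self.encoder = nn.LSTM(feat_dim, hidden_dim, num_layers=2,
                               batch_first=True, bidirectional=True)
        enc_dim = hidden_dim * 2
        # Task-specific heads.
        self.emotion_head = nn.Linear(enc_dim, n_emotions)     # Emotion States Category
        self.intensity_head = nn.Linear(enc_dim, n_intensity)  # Emotion Intensity Scale
        self.phoneme_head = nn.Linear(enc_dim, n_phonemes)     # frame-level phoneme logits
        self.gender_head = nn.Linear(enc_dim, 2)                # binary gender logits

    def forward(self, feats):
        # feats: (batch, time, feat_dim)
        frames, _ = self.encoder(feats)   # (batch, time, enc_dim)
        utt = frames.mean(dim=1)          # simple utterance-level pooling
        return {
            "emotion": self.emotion_head(utt),
            "intensity": self.intensity_head(utt),
            "phoneme": self.phoneme_head(frames),  # per-frame predictions
            "gender": self.gender_head(utt),
        }


def multitask_loss(outputs, targets, weights=(1.0, 1.0, 0.3, 0.3)):
    # Weighted sum of per-task cross-entropy losses; the weights are placeholders.
    ce = nn.functional.cross_entropy
    return (weights[0] * ce(outputs["emotion"], targets["emotion"])
            + weights[1] * ce(outputs["intensity"], targets["intensity"])
            + weights[2] * ce(outputs["phoneme"].transpose(1, 2), targets["phoneme"])
            + weights[3] * ce(outputs["gender"], targets["gender"]))
```

In a setup like this, the per-task loss weights and whether the phoneme head is supervised at the frame level (as sketched) or with a sequence-level objective such as CTC are training choices that would be fixed by the MTL recipe.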


