The Role of Phonetic Units in Speech Emotion Recognition

08/02/2021
by Jiahong Yuan, et al.

We propose a method for emotion recognition through emotion-dependent speech recognition using Wav2vec 2.0. Our method achieved a significant improvement over most previously reported results on IEMOCAP, a benchmark emotion dataset. We employ different types of phonetic units and compare them in terms of the accuracy and robustness of emotion recognition, both within and across datasets and languages. Models of phonemes, broad phonetic classes, and syllables all significantly outperform the utterance model, demonstrating that phonetic units are helpful and should be incorporated in speech emotion recognition. Broad phonetic classes give the best performance. Further research is needed to determine the optimal set of broad phonetic classes for emotion recognition. Finally, we found that Wav2vec 2.0 can be fine-tuned to recognize units that are coarser-grained or larger than phonemes, such as broad phonetic classes and syllables.
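
To make the idea of emotion-dependent phonetic units concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each unit label is tagged with the utterance's emotion (for example `vowel_ang`) and that an utterance-level emotion is read off the recognized units by majority vote. The abstract does not specify the label format, the broad-class inventory, or the decision rule, so `BROAD_CLASS`, `to_emotion_units`, and `vote_emotion` are all hypothetical names and choices.

```python
from collections import Counter

# Hypothetical mapping from a few ARPAbet phonemes to broad phonetic classes.
# The actual class inventory used in the paper is not given in the abstract.
BROAD_CLASS = {
    "AA": "vowel", "IY": "vowel", "EH": "vowel",
    "P": "stop", "T": "stop", "K": "stop",
    "S": "fricative", "F": "fricative",
    "M": "nasal", "N": "nasal",
}


def to_emotion_units(phonemes, emotion, use_broad_classes=False):
    """Build emotion-dependent training labels for one utterance.

    Each phoneme (or its broad class) is suffixed with the emotion tag,
    so a recognizer trained on these labels predicts unit and emotion jointly.
    """
    if use_broad_classes:
        units = [BROAD_CLASS.get(p, "other") for p in phonemes]
    else:
        units = list(phonemes)
    return [f"{unit}_{emotion}" for unit in units]


def vote_emotion(recognized_units):
    """Assumed decision rule: the most frequent emotion tag wins.

    Returns None when no unit carries an emotion tag.
    """
    emotions = [u.rsplit("_", 1)[1] for u in recognized_units if "_" in u]
    if not emotions:
        return None
    return Counter(emotions).most_common(1)[0][0]


if __name__ == "__main__":
    labels = to_emotion_units(["S", "AA", "T"], "ang", use_broad_classes=True)
    print(labels)
    print(vote_emotion(["vowel_ang", "stop_hap", "nasal_ang"]))
```

Under these assumptions, switching `use_broad_classes` changes only the label vocabulary the recognizer is fine-tuned on, which is one way to compare unit types while keeping the rest of the pipeline fixed.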


research
12/12/2017

Learning Spontaneity to Improve Emotion Recognition In Speech

We investigate the effect and usefulness of spontaneity in speech (i.e. ...
research
08/21/2023

Refashioning Emotion Recognition Modelling: The Advent of Generalised Large Models

After the inception of emotion recognition or affective computing, it ha...
research
09/21/2023

The Broad Impact of Feature Imitation: Neural Enhancements Across Financial, Speech, and Physiological Domains

Initialization of neural network weights plays a pivotal role in determi...
research
10/07/2021

SERAB: A multi-lingual benchmark for speech emotion recognition

Recent developments in speech emotion recognition (SER) often leverage d...
research
01/05/2021

Fixed-MAML for Few Shot Classification in Multilingual Speech Emotion Recognition

In this paper, we analyze the feasibility of applying few-shot learning ...
research
10/11/2021

Cross Domain Emotion Recognition using Few Shot Knowledge Transfer

Emotion recognition from text is a challenging task due to diverse emoti...
research
03/09/2023

Hierarchical Network with Decoupled Knowledge Distillation for Speech Emotion Recognition

The goal of Speech Emotion Recognition (SER) is to enable computers to r...
