On the use of Self-supervised Pre-trained Acoustic and Linguistic Features for Continuous Speech Emotion Recognition

11/18/2020
by   Manon Macary, et al.
0

Pre-training for feature extraction is an increasingly studied approach to get better continuous representations of audio and text content. In the present work, we use wav2vec and camemBERT as self-supervised learned models to represent our data in order to perform continuous emotion recognition from speech (SER) on AlloSat, a large French emotional database describing the satisfaction dimension, and on the state of the art corpus SEWA focusing on valence, arousal and liking dimensions. To the authors' knowledge, this paper presents the first study showing that the joint use of wav2vec and BERT-like pre-trained features is very relevant to deal with continuous SER task, usually characterized by a small amount of labeled training data. Evaluated by the well-known concordance correlation coefficient (CCC), our experiments show that we can reach a CCC value of 0.825 instead of 0.592 when using MFCC in conjunction with word2vec word embedding on the AlloSat dataset.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/11/2020

Recognizing More Emotions with Less Data Using Self-supervised Transfer Learning

We propose a novel transfer learning method for speech emotion recogniti...
research
02/07/2022

Speech Emotion Recognition using Self-Supervised Features

Self-supervised pre-trained features have consistently delivered state-o...
research
05/05/2023

A vector quantized masked autoencoder for audiovisual speech emotion recognition

While fully-supervised models have been shown to be effective for audiov...
research
09/22/2021

Alzheimers Dementia Detection using Acoustic Linguistic features and Pre-Trained BERT

Alzheimers disease is a fatal progressive brain disorder that worsens wi...
research
09/09/2023

Speech Emotion Recognition with Distilled Prosodic and Linguistic Affect Representations

We propose EmoDistill, a novel speech emotion recognition (SER) framewor...
research
02/26/2023

Mingling or Misalignment? Temporal Shift for Speech Emotion Recognition with Pre-trained Representations

Fueled by recent advances of self-supervised models, pre-trained speech ...
research
09/07/2020

Is Everything Fine, Grandma? Acoustic and Linguistic Modeling for Robust Elderly Speech Emotion Recognition

Acoustic and linguistic analysis for elderly emotion recognition is an u...

Please sign up or login with your details

Forgot password? Click here to reset