Self-Supervised Learning for Audio-Based Emotion Recognition

07/23/2023
by Peranut Nimitsurachat, et al.

Emotion recognition models using audio input data can enable the development of interactive systems with applications in mental healthcare, marketing, gaming, and social media analysis. While the field of affective computing using audio data is rich, a major barrier to achieving consistently high-performing models is the paucity of available training labels. Self-supervised learning (SSL) is a family of methods that can learn despite a scarcity of supervised labels by predicting properties of the data itself. To understand the utility of self-supervised learning for audio-based emotion recognition, we apply self-supervised pre-training to the classification of emotions from the acoustic modality of CMU-MOSEI. Unlike prior work that has experimented with raw acoustic data, our technique is applied to encoded acoustic data. The model is first pre-trained to recover randomly masked timestamps of the acoustic data, and then fine-tuned on a small sample of annotated data. The performance of the final model is evaluated with several metrics against a baseline deep learning model with an identical backbone architecture. We find that self-supervised pre-training consistently improves the performance of the model across all metrics. This work shows the utility of self-supervised learning for affective computing: pre-training is most useful when the number of training examples is small, and the effect is most pronounced for emotions that are easier to classify, such as happiness, sadness, and anger. It further demonstrates that self-supervised learning works when applied to embedded feature representations rather than the traditional approach of pre-training on the raw input space.
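As a rough illustration of the two-stage recipe described above, the sketch below pre-trains a small transformer backbone to reconstruct randomly masked timesteps of pre-encoded acoustic features, then fine-tunes the same backbone for emotion classification. This is not the authors' implementation; the feature dimension, masking ratio, backbone size, and optimizer settings are placeholder assumptions.

# Minimal sketch (not the authors' code): masked-timestep pre-training on
# pre-encoded acoustic features, followed by emotion-classification fine-tuning.
# Feature dimension, masking ratio, and backbone size are illustrative assumptions.
import torch
import torch.nn as nn

FEAT_DIM = 74      # dimensionality of encoded acoustic features per timestep (assumption)
NUM_EMOTIONS = 6   # CMU-MOSEI emotion labels
MASK_RATIO = 0.15  # fraction of timesteps hidden during pre-training (assumption)


class Backbone(nn.Module):
    """Shared encoder used for both pre-training and fine-tuning."""

    def __init__(self, feat_dim=FEAT_DIM, d_model=128, n_layers=2):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x):                    # x: (batch, time, feat_dim)
        return self.encoder(self.proj(x))    # (batch, time, d_model)


class MaskedReconstructor(nn.Module):
    """Pre-training head: reconstruct the features at masked timesteps."""

    def __init__(self, backbone, feat_dim=FEAT_DIM, d_model=128):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(d_model, feat_dim)

    def forward(self, x):
        # Zero out a random subset of timesteps, then try to recover them.
        mask = torch.rand(x.shape[:2], device=x.device) < MASK_RATIO
        x_masked = x.masked_fill(mask.unsqueeze(-1), 0.0)
        recon = self.head(self.backbone(x_masked))
        # Reconstruction loss is computed only at the masked positions.
        return nn.functional.mse_loss(recon[mask], x[mask])


class EmotionClassifier(nn.Module):
    """Fine-tuning head: mean-pooled representation -> emotion logits."""

    def __init__(self, backbone, d_model=128, num_emotions=NUM_EMOTIONS):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(d_model, num_emotions)

    def forward(self, x):
        return self.head(self.backbone(x).mean(dim=1))


if __name__ == "__main__":
    backbone = Backbone()

    # Stage 1: self-supervised pre-training on unlabeled encoded audio.
    pretrainer = MaskedReconstructor(backbone)
    opt = torch.optim.Adam(pretrainer.parameters(), lr=1e-4)
    unlabeled = torch.randn(8, 50, FEAT_DIM)          # placeholder batch
    loss = pretrainer(unlabeled)
    loss.backward(); opt.step(); opt.zero_grad()

    # Stage 2: fine-tuning on a small labeled sample.
    clf = EmotionClassifier(backbone)
    opt = torch.optim.Adam(clf.parameters(), lr=1e-4)
    feats = torch.randn(4, 50, FEAT_DIM)              # placeholder labeled batch
    labels = torch.randint(0, NUM_EMOTIONS, (4,))
    ce = nn.functional.cross_entropy(clf(feats), labels)
    ce.backward(); opt.step()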


Related research

11/20/2020
Self-Supervised learning with cross-modal transformers for emotion recognition
Emotion recognition is a challenging task due to limited availability of...

12/18/2022
BEATs: Audio Pre-Training with Acoustic Tokenizers
The massive growth of self-supervised learning (SSL) has been witnessed ...

07/20/2022
BYEL: Bootstrap on Your Emotion Latent
According to the problem of dataset construction cost for training in de...

06/21/2022
Exploring the Effectiveness of Self-supervised Learning and Classifier Chains in Emotion Recognition of Nonverbal Vocalizations
We present an emotion recognition system for nonverbal vocalizations (NV...

04/21/2023
A vector quantized masked autoencoder for speech emotion recognition
Recent years have seen remarkable progress in speech emotion recognition...

05/30/2023
Leveraging Semantic Information for Efficient Self-Supervised Emotion Recognition with Audio-Textual Distilled Models
In large part due to their implicit semantic modeling, self-supervised l...

04/01/2022
WavFT: Acoustic model finetuning with labelled and unlabelled data
Unsupervised and self-supervised learning methods have leveraged unlabel...
