Supervised Contrastive Learning with Nearest Neighbor Search for Speech Emotion Recognition

08/31/2023
by Xuechen Wang, et al.

Speech Emotion Recognition (SER) is a challenging task due to limited data and blurred boundaries between certain emotions. In this paper, we present a comprehensive approach to improving SER performance throughout the model lifecycle, covering the pre-training, fine-tuning, and inference stages. To address data scarcity, we build on a pre-trained model, wav2vec2.0. During fine-tuning, we propose a novel loss function that combines cross-entropy loss with a supervised contrastive learning loss to improve the model's discriminative ability. This increases inter-class distances and decreases intra-class distances, mitigating the issue of blurred boundaries. Finally, to exploit the improved distances, we propose an interpolation method at the inference stage that combines the model's prediction with the output of a k-nearest neighbors model. Our experiments on IEMOCAP demonstrate that the proposed methods outperform current state-of-the-art results.
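The two ideas in the abstract, a fine-tuning loss that mixes cross-entropy with a supervised contrastive term, and an inference step that blends the model's prediction with a k-nearest neighbors vote, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact formulation, so the contrastive term below follows the commonly used supervised contrastive form, and the mixing weights `alpha` and `lam`, the `temperature`, the Euclidean distance, and all function names are assumptions made for illustration.

```python
import math

def cross_entropy(logits, label):
    """Cross-entropy of one example from raw logits (log-sum-exp for stability)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]

def sup_con_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over L2-normalised embeddings (assumed form)."""
    normed = []
    for e in embeddings:
        norm = math.sqrt(sum(v * v for v in e)) or 1.0
        normed.append([v / norm for v in e])
    n, total, anchors = len(normed), 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        # Temperature-scaled cosine similarity to every other example.
        sims = {j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
                for j in range(n) if j != i}
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

def combined_loss(logits_batch, embeddings, labels, alpha=0.5):
    """Weighted sum of the two terms; alpha is a hypothetical mixing weight."""
    ce = sum(cross_entropy(l, y) for l, y in zip(logits_batch, labels)) / len(labels)
    return (1 - alpha) * ce + alpha * sup_con_loss(embeddings, labels)

def knn_probs(query, bank, bank_labels, num_classes, k=3):
    """Class distribution voted by the k nearest stored embeddings (Euclidean)."""
    nearest = sorted((math.dist(query, e), y) for e, y in zip(bank, bank_labels))[:k]
    probs = [0.0] * num_classes
    for _, y in nearest:
        probs[y] += 1.0 / len(nearest)
    return probs

def interpolate(model_probs, neighbor_probs, lam=0.5):
    """Blend model and kNN distributions; lam is a hypothetical mixing weight."""
    return [lam * q + (1 - lam) * p for p, q in zip(model_probs, neighbor_probs)]
```

In this sketch, embeddings that cluster by emotion label yield a small contrastive term, and the same clustering is what makes the nearest-neighbor vote informative at inference. Both mixing weights would need tuning on held-out data.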


