Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition

10/12/2021
by Li-Wei Chen, et al.

Although wav2vec 2.0 was proposed for automatic speech recognition (ASR), it can also be used for speech emotion recognition (SER), and its performance can be significantly improved using different fine-tuning strategies. Two baseline methods, vanilla fine-tuning (V-FT) and task adaptive pretraining (TAPT), are first presented. We show that V-FT is able to outperform state-of-the-art models on the IEMOCAP dataset. TAPT, an existing NLP fine-tuning strategy, further improves the performance on SER. We also introduce a novel fine-tuning method termed P-TAPT, which modifies the TAPT objective to learn contextualized emotion representations. Experiments show that P-TAPT performs better than TAPT, especially under low-resource settings. Compared to prior works in this literature, our top-line system achieved a 7.4% absolute improvement in unweighted accuracy (UA) over the state-of-the-art performance on IEMOCAP. Our code is publicly available.
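
The abstract reports results in unweighted accuracy (UA). In the SER literature, UA usually means the mean of per-class recalls, so that rare emotion classes count as much as frequent ones. The sketch below assumes that convention and is not taken from the authors' code. The function names and the toy labels are hypothetical and serve only to contrast UA with plain (weighted) accuracy.

```python
from collections import defaultdict


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls (assumed definition of UA)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    # Every class contributes equally, regardless of how many samples it has.
    return sum(correct[c] / total[c] for c in total) / len(total)


def weighted_accuracy(y_true, y_pred):
    """Fraction of all samples classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, imbalanced toy labels (not IEMOCAP data).
y_true = ["neutral"] * 6 + ["angry"] * 2 + ["sad"] * 2
y_pred = ["neutral"] * 6 + ["angry", "neutral"] + ["neutral", "neutral"]

ua = unweighted_accuracy(y_true, y_pred)  # recalls: 1.0, 0.5, 0.0
wa = weighted_accuracy(y_true, y_pred)    # 7 of 10 samples correct
print(f"UA = {ua:.2f}, WA = {wa:.2f}")
```

On these toy labels, a classifier that favors the majority class scores noticeably higher on weighted accuracy than on UA, which is why UA is the more demanding metric on imbalanced emotion datasets.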

research
07/29/2022

Domain Specific Wav2vec 2.0 Fine-tuning For The SE&R 2022 Challenge

This paper presents our efforts to build a robust ASR model for the shar...
research
10/26/2022

Fast Yet Effective Speech Emotion Recognition with Self-distillation

Speech emotion recognition (SER) is the task of recognising human's emot...
research
12/10/2018

Data Fine-tuning

In real-world applications, commercial off-the-shelf systems are utilize...
research
02/01/2021

On Scaling Contrastive Representations for Low-Resource Speech Recognition

Recent advances in self-supervised learning through contrastive training...
research
10/24/2019

Unsupervised Representation Learning with Future Observation Prediction for Speech Emotion Recognition

Prior works on speech emotion recognition utilize various unsupervised l...
research
01/17/2023

BERT-ERC: Fine-tuning BERT is Enough for Emotion Recognition in Conversation

Previous works on emotion recognition in conversation (ERC) follow a two...
research
04/22/2019

Exploring Unsupervised Pretraining and Sentence Structure Modelling for Winograd Schema Challenge

Winograd Schema Challenge (WSC) was proposed as an AI-hard problem in te...
