PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Speech Models

06/08/2023
by Tiantian Feng, et al.

Many recent studies have focused on fine-tuning pre-trained models for speech emotion recognition (SER), achieving promising performance compared to traditional methods that rely largely on low-level, knowledge-inspired acoustic features. These pre-trained speech models learn general-purpose speech representations using self-supervised or weakly-supervised learning objectives on large-scale datasets. Despite the significant advances that pre-trained architectures have brought to SER, fine-tuning these large models for different datasets requires saving a copy of the entire weight set per dataset, making them impractical to deploy in real-world settings. As an alternative, this work explores parameter-efficient fine-tuning (PEFT) approaches for adapting pre-trained speech models to emotion recognition. Specifically, we evaluate the efficacy of adapter tuning, embedding prompt tuning, and LoRA (low-rank adaptation) on four popular SER testbeds. Our results reveal that LoRA achieves the best fine-tuning performance in emotion recognition while enhancing fairness and requiring only a minimal number of extra weight parameters. Furthermore, our findings offer novel insights into future research directions in SER, distinct from existing approaches that focus on directly fine-tuning the model architecture. Our code is publicly available at: https://github.com/usc-sail/peft-ser.
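The paper's exact LoRA configuration is not reproduced in this abstract; as a rough illustration of why LoRA is parameter-efficient, here is a minimal sketch of low-rank adaptation applied to a single linear layer, using NumPy as a stand-in for a transformer projection (all names and dimensions here are illustrative assumptions, not the paper's):

```python
import numpy as np

class LoRALinear:
    """Minimal LoRA sketch: the pre-trained weight W stays frozen;
    only the low-rank factors A (r x d_in) and B (d_out x r) are trained."""

    def __init__(self, d_in, d_out, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in))     # frozen pre-trained weight
        self.A = rng.standard_normal((r, d_in)) * 0.01  # trainable, small random init
        self.B = np.zeros((d_out, r))                   # trainable, zero init (no change at start)
        self.scale = alpha / r

    def forward(self, x):
        # y = W x + (alpha / r) * B A x ; the update B @ A has rank <= r
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

    def trainable_params(self):
        # Only A and B are updated during fine-tuning
        return self.A.size + self.B.size

layer = LoRALinear(d_in=768, d_out=768, r=8)
# Trainable: 2 * 8 * 768 = 12,288 params vs 768 * 768 = 589,824 for full fine-tuning
print(layer.trainable_params(), layer.W.size)
```

Because `B` is initialized to zero, the adapted layer is exactly the frozen pre-trained layer at the start of fine-tuning, and only the small `A`/`B` factors need to be stored per downstream dataset.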


Related research

05/18/2023: TrustSER: On the Trustworthiness of Fine-tuning Pre-trained Speech Embeddings For Speech Emotion Recognition
Recent studies have explored the use of pre-trained embeddings for speec...

11/04/2021: A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding
Self-supervised speech representations such as wav2vec 2.0 and HuBERT ar...

08/10/2022: Non-Contrastive Self-supervised Learning for Utterance-Level Information Extraction from Speech
In recent studies, self-supervised pre-trained models tend to outperform...

10/26/2022: Fast Yet Effective Speech Emotion Recognition with Self-distillation
Speech emotion recognition (SER) is the task of recognising human's emot...

02/26/2023: Mingling or Misalignment? Temporal Shift for Speech Emotion Recognition with Pre-trained Representations
Fueled by recent advances of self-supervised models, pre-trained speech ...

06/01/2023: How to Estimate Model Transferability of Pre-Trained Speech Models?
In this work, we introduce a "score-based assessment" framework for esti...

09/07/2023: LanSER: Language-Model Supported Speech Emotion Recognition
Speech emotion recognition (SER) models typically rely on costly human-l...
