Unsupervised Personalization of an Emotion Recognition System: The Unique Properties of the Externalization of Valence in Speech

01/19/2022
by Kusha Sridhar, et al.

Predicting valence from speech is an important but challenging problem. The externalization of valence in speech relies on speaker-dependent cues, which contributes to performance that is often significantly lower than for other emotional attributes such as arousal and dominance. A practical way to improve valence prediction from speech is to adapt the models to the target speakers in the test set. Adapting a speech emotion recognition (SER) system to a particular speaker is hard, especially with deep neural networks (DNNs), since it requires optimizing millions of parameters. This study proposes an unsupervised approach to this problem: it searches the train set for speakers whose acoustic patterns are similar to those of the speaker in the test set. Speech samples from the selected speakers are used to create the adaptation set. The approach leverages transfer learning, adapting pre-trained models with these speech samples. We propose three alternative adaptation strategies: unique speaker, oversampling, and weighting approaches. These methods differ in how they use the adaptation set to personalize the valence models. The results demonstrate that a valence prediction model can be efficiently personalized with these unsupervised approaches, leading to relative improvements as high as 13.52
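The speaker-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how acoustic similarity is measured or how the weighting strategy is defined, so the cosine similarity between per-speaker mean feature vectors, the `k` cutoff, the similarity-proportional sample weights, and the function names `select_similar_speakers` and `adaptation_weights` are all assumptions made for illustration, as is the toy data.

```python
import numpy as np

def select_similar_speakers(train_feats, train_speakers, test_feats, k=2):
    """Rank train speakers by similarity to the test speaker (assumed metric:
    cosine similarity between mean feature vectors) and keep the top k."""
    test_centroid = test_feats.mean(axis=0)
    speakers = np.unique(train_speakers)
    sims = []
    for spk in speakers:
        centroid = train_feats[train_speakers == spk].mean(axis=0)
        denom = np.linalg.norm(centroid) * np.linalg.norm(test_centroid)
        sims.append(float(centroid @ test_centroid) / denom)
    sims = np.asarray(sims)
    order = np.argsort(-sims)[:k]  # most similar speakers first
    return speakers[order], sims[order]

def adaptation_weights(train_speakers, selected, sims):
    """Assumed weighting scheme: each sample from a selected speaker gets that
    speaker's (non-negative) similarity; all other samples get zero.
    Weights are normalized to sum to 1."""
    weights = np.zeros(len(train_speakers), dtype=float)
    for spk, sim in zip(selected, sims):
        weights[train_speakers == spk] = max(sim, 0.0)
    total = weights.sum()
    return weights / total if total > 0 else weights

# Toy data (hypothetical): 2-D "acoustic" features for three train speakers.
train_feats = np.array([[1.0, 0.0], [0.9, 0.1],     # speaker A
                        [0.0, 1.0], [0.1, 0.9],     # speaker B
                        [-1.0, 0.0], [-0.9, -0.1]])  # speaker C
train_speakers = np.array(["A", "A", "B", "B", "C", "C"])
test_feats = np.array([[1.0, 0.2], [0.8, 0.0]])     # unlabeled test speaker

selected, sims = select_similar_speakers(train_feats, train_speakers, test_feats, k=2)
weights = adaptation_weights(train_speakers, selected, sims)
print(selected, weights)
```

In a sketch like this, the resulting weights could scale each sample's loss while fine-tuning a pre-trained model. An oversampling variant might instead repeat samples from the closest speakers, and a single-speaker variant would correspond to `k=1`; these readings of the three strategy names are likewise assumptions rather than details given in the abstract.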

research
04/15/2021

Speaker Attentive Speech Emotion Recognition

Speech Emotion Recognition (SER) task has known significant improvements...
research
09/05/2023

Personalized Adaptation with Pre-trained Speech Encoders for Continuous Emotion Recognition

There are individual differences in expressive behaviors driven by cultu...
research
04/06/2022

Emotional Speech Recognition with Pre-trained Deep Visual Models

In this paper, we propose a new methodology for emotional speech recogni...
research
03/22/2019

Towards adversarial learning of speaker-invariant representation for speech emotion recognition

Speech emotion recognition (SER) has attracted great attention in recent...
research
02/24/2023

Pre-Finetuning for Few-Shot Emotional Speech Recognition

Speech models have long been known to overfit individual speakers for ma...
research
06/21/2019

Unsupervised Phoneme and Word Discovery from Multiple Speakers using Double Articulation Analyzer and Neural Network with Parametric Bias

This paper describes a new unsupervised machine learning method for simu...
research
10/26/2022

Effect of different splitting criteria on the performance of speech emotion recognition

Traditional speech emotion recognition (SER) evaluations have been perfo...
